Co-constructing intent with AI agents
uxdesign.cc
How can we move beyond the confines of a simple input field to help agents evolve from mere tools into true partners that can perceive our unspoken intentions and spark new ways of thinking?

When we share our vague, half-formed ideas with AI agents, are we looking for the predictable, standard answers we expect, or a genuine surprise we didn't see coming?

As the capabilities of AI agents evolve at a breathtaking pace, we increasingly expect them to be intuitive and understanding. Yet the reality often falls short. Vague or abstract questions typically yield only generic, catch-all answers, trapping us in a frustrating cycle of rephrasing and refining that may eventually land on a satisfactory result, but only after considerable effort.

Clear questions vs. vague or abstract questions

This is the dominant mode of human-AI interaction today. But is this the future we really want?

True connection often sparks in a dialogue between equals. The conversations that leave a lasting impression aren't the simple question-and-answer exchanges. Instead, they are the ones where our true intent gradually surfaces through a back-and-forth of clarifying questions, mutual confirmations, and shared moments of insight.

If AI agents are to make the leap from tool to partner, perhaps we need to reimagine their role. Should they merely provide answers on command, or could they become true companions in our explorations, ones that provoke our thoughts and ultimately help us discover what we truly want?

Speed ≠ understanding

Imagine sending a morning greeting to your family on a hazy, half-awake morning. Your finger instinctively finds the Send button. A light tap, and it's done. How different would this simple, natural action have been just a few decades ago?

You would have had to carefully type out lines of code on a screen, where a single typo or an extra space would cause the computer to rebuke you with an even more cryptic string of garbled text.

At its core, the difference between these two experiences lies in the challenge of translating between fuzzy human intent and precise computer instructions. Back in the 1980s, Don Norman defined this challenge with two concepts: the Gulf of Execution, which separates our thoughts from the machine's commands, and the Gulf of Evaluation, which separates the machine's feedback from our ability to understand it. The narrower these gulfs, the more seamless the process of conveying our intent and interpreting the results. Decades of progress in human-computer interaction, from the command-line interface to the graphical user interface, have been dedicated to bridging this divide.

Whitenton, K. (2018, March 12). The two UX gulfs: evaluation and execution. Nielsen Norman Group.

Today's AI-powered applications are following the same trajectory, continuously lowering the barrier to execution and making it ever faster to communicate in natural language. But does this speed truly mean our intent is being understood?

Are AI agents really helping us explore?

"What's the capital of the Netherlands?"
"The capital of the Netherlands is Amsterdam, but the seat of government, central ministries, parliament, and foreign embassies are all in The Hague."

"How high is Mount Everest?"
"The latest official height of Mount Everest is 8,848.86 meters, a result from a joint survey by China and Nepal in 2020."

"Generate a working timer demo."
"Sure, here's a simple countdown timer in Python. The user can input the number of seconds, and the script will count down and print a message at the end."
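Taken at face value, that last request is the easy kind. A minimal sketch of the countdown script the agent describes might look like this (the exact implementation is an assumption; the article only paraphrases the agent's reply):

    import time

    def countdown(seconds: int) -> None:
        # Count down once per second, then announce the end.
        for remaining in range(seconds, 0, -1):
            print(f"{remaining} second(s) remaining...")
            time.sleep(1)
        print("Time's up!")

    if __name__ == "__main__":
        # The user inputs the number of seconds, as the agent promised.
        countdown(int(input("Enter the number of seconds: ")))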
Yes. For tasks with a clear intent, a specific goal, and low context requirements, like summarizing articles, translating text, or doing basic math, AI agents deliver accurate and prompt answers. But when faced with the messy, complex challenges of real life, does that efficiency hold up?

Impeccable, but useless

"Help me plan a trip to Japan that the whole family will love."
"Of course. To plan a trip to Japan the whole family can enjoy, here are some suggestions you can tailor to your needs. The milder weather in spring (March-May) and autumn (September-November) is ideal for family travel. Determine the trip's duration based on everyone's vacation time. If you're traveling between cities, consider a Japan Rail Pass, but you'll need to calculate whether it's cost-effective for your specific itinerary..."

Our conversation with the agent seems free of any Gulf of Execution. We had no trouble sending the request, and the agent promptly returned a standard-issue response. But how much is that response really worth to us?

The parents want to unwind in a hot spring, the kids are dying to go to Universal Studios, and your partner is hoping for a more niche, authentic cultural experience. Everyone has different hopes for the trip, but the agent's generic advice fails to address any of them. So why didn't we just give the agent all these details from the start?

The slot machine conversation trap

When we turn to AI with these kinds of vague, complex problems, 99% of the time we are funneled into a single input box. It's the dominant interface for AI today, a model that originated with ChatGPT's goal of giving people the most direct path to experiencing the power of large language models.

The predominant way we interact with AI is almost entirely centered around the input field.

However, the thought of cramming every detail into that tiny box (everyone's preferences, the family budget, all the nuances from memory) and then endlessly editing it is just exhausting.

"This is too much trouble, just simplify it."

Our brains are wired for shortcuts. To get our vague idea out quickly, we subconsciously strip away all the context, preferences, and other details that are hard to articulate, compressing everything into the oversimplified phrase "make the family happy." We toss it into the input box and pin all our hopes on the agent's abilities.

Then, like a gambler, we pull the lever and pray for a lucky spin that happens to read our minds. To increase its hit rate with such pitiful context, the agent can only flex its capabilities, calling on every tool at its disposal to generate a broad, catch-all answer.

The result isn't a helpful guide that inspires new thinking, but an undigested information dump. This interaction becomes less like a conversation and more like a slot machine, defined by uncertainty. It invisibly adds to our cognitive load and pushes us further away from discovering what we really need.

Even as AI agents have evolved to handle high-dimensional, ambiguous, and exploratory tasks, the way we communicate with them remains a low-dimensional channel, ill-suited to expressing our own complex thoughts.

"However, difficulties in obtaining the desired outcome arise from both the AI's interpretation and the translation of intentions into prompts. An evolution in the user experience of AI systems is necessary, integrating GUI-like characteristics with intent-based interaction." (On the usability of generative AI: Human generative AI)

Stop guessing, start exploring the real problem

Let's revisit the original idea.
If you truly wanted to plan a trip to make your whole family happy, how would you do it without an AI? You'd probably engage in a series of exploratory actions: reflecting, researching, and running what-if scenarios to find a plan that balances everyone's different needs.

Our daily reality isn't about clear instructions and direct execution; it's about navigating vague and messy challenges. Whether planning a family vacation or kicking off a new project at work, the hardest problem we face is often how to transform a fuzzy impulse into a clear and valuable goal.

So how can we design our interactions with AI to help us explore these vague, fragile impulses? How can we build a more coherent, natural dialogue instead of getting stuck in a constant guessing game?

"Good design is thorough down to the last detail. Nothing must be arbitrary or left to chance. Care and accuracy in the design process show respect towards the user." (Dieter Rams)

Like partners: The power of co-constructing intent

"Do you think this potted plant would look better somewhere else?"
"Oh? What's on your mind? I thought you liked it where it was."
"It's not that I don't... I just feel like nothing looks right lately. I guess I'm just looking for a change of scenery."

When we talk things over with friends, partners, or family, we rarely expect an immediate, clear-cut answer. The conversation often begins with a vague impulse or a half-formed idea.

They might build on your thought: "How about by the window? The sunlight might help it thrive." Or they might probe deeper, sensing the motive behind the question: "Have you been feeling a bit drained lately? It sounds like you want to move more than just the plant; maybe you're looking to bring something new into your life."

Human conversation is a dynamic, exploratory journey. It's not about simply transferring information. It's about two people taking a fuzzy idea and, through a back-and-forth exchange, co-discovering, refining, and even shaping it into something entirely new: uncharted territory neither had imagined at the start. This is a process of intent co-construction.

As our relationship with AI evolves from tool to partner, we find ourselves sharing more of these ambiguous intentions. To meet this changing need, how can we learn from our human relationships to design interactions that foster deep connection and co-construct intent with our AI counterparts?

Anthropic's official introduction: "Meet Claude, your thinking partner" (screenshot via Anthropic)

Reading between the lines with multimodality

Picture a perfect sunny weekend. You're driving with the windows down, your favorite album playing, on your way to that new park you've been wanting to visit.

You tell your voice assistant your destination. It instantly displays three routes, color-coded by time and traffic, and helpfully highlights the one its algorithm deems fastest.

You subconsciously take its advice, but halfway there, something feels wrong.

While it may be the shortest path physically, the route involves constant lane changes on streets barely wide enough for one car. You're flanked by parked cars whose doors could swing open at any moment and kids who might dart into the road. Your nerves are frayed, your palms are sweating on the wheel, and you find yourself muttering about the cramped, crowded conditions, nearly rear-ending an e-bike.

Through it all, the navigation remains indifferent, stubbornly sticking to its original recommendation.

Yes, multimodal inputs allow us to give clearer commands. But when our initial command is incomplete, we still end up with a generic solution. A true partner would think:

"They seem stressed by this complex route. Should I suggest a longer but easier alternative?"
"I'm detecting swearing and frequent hard braking. Is this road too difficult for them to handle?"
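To make that concrete, here is a deliberately simple sketch of how such a partner might weigh these cues before speaking up. Every signal name and threshold is an illustrative assumption, not a real navigation API:

    from dataclasses import dataclass

    @dataclass
    class DrivingSignals:
        hard_brakes_per_km: float   # from motion sensors (assumed)
        voice_stress: float         # 0..1, from speech emotion analysis (assumed)
        lane_width_m: float         # from map data (assumed)

    def should_offer_easier_route(s: DrivingSignals) -> bool:
        # Require several corroborating stress cues, not a single spike,
        # before interrupting the driver with a suggestion.
        cues = 0
        cues += s.hard_brakes_per_km > 2    # frequent hard braking
        cues += s.voice_stress > 0.6        # audible agitation
        cues += s.lane_width_m < 2.5        # barely one car wide
        return cues >= 2

    # The scenario above: hard braking, muttering, a cramped lane.
    print(should_offer_easier_route(DrivingSignals(3.0, 0.8, 2.2)))  # True

The point is not the particular thresholds but the fusion: no single channel is trusted on its own.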
The real breakthrough isn't just understanding what users say, but how they say it: combining their words with environmental cues and situational context. Do they type fluently or constantly backspace? Do they circle a data point with confidence or hesitation? These subconscious signals often reveal our true state of mind.

Hume AI can analyze the emotion in a speaker's voice and respond with empathetic intelligence.

The AI we need isn't just one that can process text, voice, images, and gestures simultaneously. We need a partner that, while respecting our privacy, can keenly and continuously read between the lines, detecting the unspoken truth in the dissonance between these multimodal signals.

"To design the best UX, pay attention to what users do, not what they say. Self-reported claims are unreliable, as are user speculations about future behavior. Users do not know what they want." (Jakob Nielsen)

Now, let's take this one step further. Imagine an AI that, through multimodal sensing, has perfectly understood our true intent. If it simply serves up a flawless answer like a data report, is that really the best way for us to learn and grow?

Information as a flowing process

Let's rewind and take that drive to the park again. This time, instead of an AI, your co-pilot is a living, breathing friend.

When you reach that same algorithm-approved turnoff, you tense up at the sight of the narrow lane. Your friend notices immediately and guides you through the challenge:

"This road looks rough. Let me guide you to a better one."
"Turn right just after that coffee shop up ahead."
"We're almost there. See the people with picnic blankets?"

The journey is seamless. You realize your friend didn't necessarily give you more information than the AI, but they delivered the right information at the right time, in a way that made sense in the moment.

Similarly, AI-generated information can be delivered through diverse mediums; text is by no means the only way. Think about a recent conversation that stuck with you. Was it memorable for its dictionary-like volume of facts? More likely, you were captivated by how the story was told, in a way that helped you visualize it. This power of visualization is rooted in metaphor.

"We often think we use metaphors to explain ideas, but I believe good metaphors don't explain but rather transform how our minds engage with ideas, opening entirely new ways of thinking." (The Secret of Good Metaphors)

Files that look like paper, directories that look like folders, icons for calculators, notepads, and clocks: back in the earliest days of personal computing, designers used graphical metaphors based on familiar physical objects to make strange and complex command lines feel intuitive and accessible.

Apple Lisa 2 (1984): features like desktop icons, the menu bar, and graphical windows significantly lowered the barrier to entry for personal computers.

Metaphors work by tapping into our past experiences and connecting them to something new, bridging the gap to understanding. So how does this apply to AI output?

Think about how we typically use an AI to explore a complex topic. We might ask it a direct question, have it synthesize industry reports, or feed it a pile of research to summarize. Even with the AI's best efforts, clicking open a result to find a wall of text can feel overwhelming.

We can't see its thought process.
We don't know if it considered all the angles we did. We don't know where to begin. What we truly need isn't just a final answer, but to feel like a friend is walking us through their thinking, transforming information delivery from a static report into a guided process of shared discovery.

Metaso: visualizes its entire thinking process on a canvas as it works on a problem.

But what if, even after seeing the process, the answer is still too abstract?

We naturally understand information through different forms: charts for trends, diagrams for processes, and stories told through sound and images. Any good communication orchestrates different dimensions of information into a presentation that conveys meaning more effectively.

Google NotebookLM can transform source materials into various easy-to-digest formats, such as narrated video overviews, conversational podcasts, and interactive mind maps. This shifts learning from a process of passive consumption to a dynamic, co-creative experience.

NotebookLM (Google): can autonomously transform source materials into various accessible formats like illustrated videos, podcasts, or mind maps, turning passive learning into active co-creation.

However, there's a risk. When an AI uses carefully crafted metaphors to present an output that is clear, beautiful, and logically flawless, it can feel like an unchallengeable final answer.

Is that how our conversations with human partners work?

When a friend shares an idea, we don't just agree. Our responses are filled with questions, doubts, and counter-arguments. Sometimes a single insightful comment can change the direction of an entire project. A meaningful dialogue is less about the period at the end of a sentence and more about the comma or the question mark that keeps the conversation going.

Progressive construction through dialogue and memory

"Let's go hiking this weekend. I want to challenge myself."
"Sounds good! But remember last time? You said your knee was bothering you halfway up. Are you sure? We could find an easier trail."
"I'm fine, my knee's all better."
"Don't push yourself..."

A true partner remembers your past knee injury. They remember you're directionally challenged and that you're not a fan of reading long texts. This long-term memory allows your interactions to build on a shared history, moving beyond simple Q&A into a state of mutual understanding where you can anticipate each other's needs without lengthy explanations.

Google's Project Astra remembers what it sees and hears in real time, allowing it to answer contextual questions like, "Where did I leave my glasses?" The Dia browser's memory feature continuously learns from your browsing history to develop a genuine understanding of your tastes.

For an AI to co-construct intent like a partner, persistent memory is not just a feature; it's essential.

"Agent failures aren't only model failures; they are context failures." (The New Skill in AI is Not Prompting, It's Context Engineering)

But memory alone isn't enough; we need to use it to foster deeper exploration. As we said from the start, the goal isn't to get an instant answer, but to refine our intentions and formulate better, more insightful questions.

ChatGPT Study Mode: when given a task, its first instinct isn't to jump straight to an answer. Instead, it begins by asking the user clarifying questions to better define the problem.
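Stripped to its skeleton, this clarify-before-answering pattern can be sketched in a few lines. In a real system an LLM would judge which details are missing; here a hypothetical slot list stands in for that judgment, reusing the trip-planning example from earlier:

    # A minimal, rule-based sketch of "clarify before answering".
    # The slots and questions are illustrative assumptions.
    REQUIRED_SLOTS = {
        "who": "Who is coming along, and what does each person enjoy?",
        "when": "When are you travelling, and for how long?",
        "budget": "Roughly what budget are you working with?",
    }

    def missing_slots(known: dict) -> list:
        # The agent's "first instinct": find what it doesn't yet know.
        return [s for s in REQUIRED_SLOTS if s not in known]

    known: dict = {}
    while missing_slots(known):
        slot = missing_slots(known)[0]
        print("Agent:", REQUIRED_SLOTS[slot])
        known[slot] = input("You: ")  # each answer refines the shared intent
    print("Agent: Thanks. Now I can draft a plan around:", known)

Only once the shared picture is complete does the agent move on to an answer; until then, every turn is spent building intent together.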
When a vague idea or question surfaces, we want an AI that is more than an answer machine. We want a true thinking partner: one that can reach beyond the immediate context, draw on our shared history to initiate meaningful dialogue, and guide us as we peel back the layers of our own thoughts. In this progressive, co-constructive process, it helps us finally articulate what we truly intend.

Where co-construction ends, we begin

Deeper insights through multimodality, dynamic presentations that clarify information, and a back-and-forth conversational loop that feels like chatting with a friend... As our dialogue with an AI becomes deeper and more meaningful, so too does our understanding of the problem, and our own intent becomes clearer.

But is that the end of the journey?

In the film Her, through countless conversations with the AI Samantha, Theodore is compelled to confront his emotions, his past failed marriage, and his own conflicting fear and desire to reconnect. Throughout this process, Samantha's curiosity, learning, and gentle challenges to his preconceptions help him see himself with new clarity, allowing him to truly feel and face his life again.

Screenshot via Her

The world of Her is not some distant future; in many ways, it is a portrait of our present moment. In a future where AI companions will be a long-term presence in our lives, their ultimate purpose may not be to replace human connection, but to act as a catalyst for our own growth.

The ultimate value of co-constructive interaction is not just to help us understand ourselves more deeply. It is to act as an engine, converting that profound self-awareness into the motivation and clarity needed to achieve our potential in the real world.

Of course, times change, but the fundamentals do not. This has always been the goal of the pioneers of human-computer interaction:

"Boosting mankind's capability for coping with complex, urgent problems." (Doug Engelbart)

References

Johnson, J. (2020). Designing with the mind in mind: Simple guide to understanding user interface design guidelines. Morgan Kaufmann.

Whitenton, K. (2018, March 12). The two UX gulfs: Evaluation and execution. Nielsen Norman Group. https://www.nngroup.com/articles/two-ux-gulfs-evaluation-execution/

DOC. (n.d.). The secret of good metaphors. https://www.doc.cc/articles/good-metaphors

Nielsen, J., Gibbons, S., & Mugunthan, T. (2024, January 30). Accordion editing and apple picking: Early generative-AI user behaviors. Nielsen Norman Group. https://www.nngroup.com/articles/accordion-editing-apple-picking/

Varanasi, L. (2025, May 25). Meta chief AI scientist Yann LeCun says current AI models lack 4 key human traits. Business Insider. https://www.businessinsider.com/meta-yann-lecun-ai-models-lack-4-key-human-traits-2025-5

Perry, T. S., & Voelcker, J. (2023, August 7). How the graphical user interface was invented. IEEE Spectrum. https://spectrum.ieee.org/graphical-user-interface

Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), 6.

Ravera, A., & Gena, C. (2025). On the usability of generative AI: Human generative AI. arXiv preprint arXiv:2502.17714.

"Co-constructing intent with AI agents" was originally published in UX Collective on Medium, where people are continuing the conversation by highlighting and responding to this story.