
Navigating embedding vectors
AI, feedback & the need for greater user control.

As of March 2025, we still lack meaningful control over AI-generated outputs. From a user experience point of view, most of the time this is acceptable. However, when using AI tools to help with complex information discovery or nuanced creative tasks, the prompting process quickly becomes convoluted, imprecise and frustrating. Technically, this shouldn't need to be the case.

Every time we revise a prompt, a new cycle of input and output tokens is generated. This is an awkward way of working when you are honing in on a final output. The back-and-forth text prompting needed to direct AI tools is inefficient, quickly strays from naturally constructed phrases, and previous incorrect responses pollute the attention mechanism.

This lack of predictability currently prevents users from gaining an intuitive working knowledge of AI tools, which in turn limits what they can get out of the models.

What If?

What if we had customisable UI controls that would allow users to navigate towards a desired output without having to use imprecise language prompts?

Older electronic products had direct mechanical feedback between a user's input and a corresponding action. That experience feels distant when using current AI tools. But does it need to be?

Dieter Rams. World Receiver T 1000 Radio, 1963. Brooklyn Museum.

Why Is This Better?

This isn't just about convenience: it's about creating a more natural way for users to collaborate with AI tools and harness their power. The most efficient way for users to solve problems is to learn by doing. The most natural way is by trial, error and refinement. Rewriting a prompt resets all the input token embeddings, which means that users lose any sense of control when working with AI tools.

A more sensible approach would be to allow users to move through the AI model space and let them navigate to a desired outcome.

Wireflow: Enhancing AI prompts with a control panel and concept vector sliders.

Erm, I Still Don't Get It

To illustrate this concept more clearly, let's use an analogy. Imagine a game where the multi-dimensional geometry of an AI model is represented by intergalactic space. Each time you prompt, a spaceship pops up somewhere in this space. You have a destination in mind, say a specific star system that you want to explore. At the moment, the only way to navigate towards your star system is to prompt. Each time you do so, the spaceship teleports to another somewhat random position. You are unsure whether your new prompt will land you closer to or further from your destination. Your prompts balloon in length, and your uncertainty increases as each additional word has less impact on the spaceship's position.

If, on the other hand, you had navigational controls, instead of blindly jumping about the universe you could increase or decrease various values and more effectively learn which ones move you towards your destination.

You might find that you need to re-prompt a couple of times first to start closer to your destination. But when you're closing in, being able to navigate through the vector space with sliders is significantly more effective.

(But what about prompt weightings? By adding + and - to words in a prompt it is possible to change their importance! This is a useful hack, but it isn't intuitive or efficient. With successive, lengthy prompts users are still blindly guessing with new token embeddings.)

What's Needed For A UI Control Panel?

UI controls would need to be inferred from each prompt.

The input embeddings go through many cycles of attention processing, so controls would need to directly alter the prompt's final input embedding vectors, prior to the output content generation process.

Proposed and existing data flow through an LLM attention head.

So, How Could This Work?

When a prompt is being processed, a copy of the final input vector embeddings would need to be stored prior to the output generation. From these copied embeddings it should be possible to infer the most relevant values to provide as controls. It should also be possible to allow users to input their own values.

If a user needs to fine-tune an output, they could adjust controls which would shift the token embeddings. These new embeddings would be fed directly into the output generation, skipping the input prompt stage.

While I'm at the edge of my knowledge of ML models, it seems that mathematically it might be possible to effect a change in the token embeddings by altering the Value (V) in the equation below. This equation describes the attention layer within a Large Language Model: the Query (Q) relates to the token generation of the input prompt, the Key (K) maps the input prompt to the model space, and the Value (V) is a weighting layer that intentionally guides output generation.
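For reference, this is the standard scaled dot-product attention formula from Attention Is All You Need (see References), where d_k is the dimensionality of the key vectors:

    Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V

As a rough sketch of how the two-step flow might look in code, here is a minimal, hypothetical example using Hugging Face transformers. It assumes a recent library version where generate() accepts inputs_embeds for decoder-only models, and it nudges the raw token embeddings with a placeholder concept direction, which is a simplification of altering the final, attention-processed embeddings. The prompt, slider value and concept direction are illustrative, not a real implementation.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "A three-day travel itinerary for Lisbon:"
    input_ids = tok(prompt, return_tensors="pt").input_ids

    # Step 1: keep a copy of the prompt's input embeddings
    # (the vectors the attention layers receive).
    embeds = model.get_input_embeddings()(input_ids)   # shape: (1, seq_len, hidden_size)

    # Step 2: a concept direction to expose as a slider. A random placeholder here;
    # in practice it would be inferred from the prompt or supplied by the user.
    concept = torch.randn(embeds.shape[-1])
    concept = concept / concept.norm()

    # Step 3: the user moves the slider; nudge every token embedding along the direction.
    alpha = 0.5
    shifted = embeds + alpha * concept

    # Step 4: generate directly from the shifted embeddings,
    # without re-tokenising a brand new prompt.
    out = model.generate(inputs_embeds=shifted, max_new_tokens=60, do_sample=True)
    print(tok.decode(out[0], skip_special_tokens=True))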
Where This Approach Works Best

Working With Near-Known & Unknown Information

When new information can shift a user's initial intention. E.g. travel planning: if a user wrote an initial prompt for a personalised travel itinerary, they could then shift subjective parameters to tailor the plan without having to re-write long prompts.

Content Generation

The tasks that stand to gain the most are those where prompting is part of the creative process, when it's beneficial for the temperature parameter to be higher. E.g. when using image generation tools, users either have a conscious target in mind that they are trying to match, or they discover what feels right as they use the generative tool. Endless prompting harms the creative process and is computationally expensive. Concept vector sliders should expand a user's creative flow state rather than frustrate it.

Deep Research | Searching Within Complex Vertical Databases

Interrogating data with nuanced vector-based search would be useful for scientific experiments that involve large databases. E.g. for research studies attempting to map animal communication, it might be useful to explore the contextual differences in the way animals communicate. The same sound pattern might be made, but expressed differently depending on comfort and safety versus threat and danger. Navigating a database with UI sliders that control various embedding vectors and provide feedback analysis on search terms could be useful.
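As a rough sketch of what such a slider could mean for vector search, the following hypothetical example assumes the recordings have already been embedded by some encoder and that a comfort-versus-threat concept axis has been derived from labelled examples. The data here is random placeholder material, purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical pre-computed embeddings for a corpus of animal calls (one row per recording).
    call_embeddings = rng.normal(size=(1000, 512))
    call_embeddings /= np.linalg.norm(call_embeddings, axis=1, keepdims=True)

    # Hypothetical concept axis, e.g. mean of "threat" examples minus mean of "comfort" examples.
    concept_axis = rng.normal(size=512)
    concept_axis /= np.linalg.norm(concept_axis)

    def search(query_embedding, slider, k=10):
        """Shift the query along the concept axis by the slider amount,
        then rank the corpus by cosine similarity."""
        q = query_embedding + slider * concept_axis
        q /= np.linalg.norm(q)
        scores = call_embeddings @ q      # rows are unit length, so this is cosine similarity
        return np.argsort(-scores)[:k]

    # The same search term, explored at different points along the comfort/threat axis.
    query = rng.normal(size=512)
    print(search(query, slider=-1.0))     # towards "comfort"
    print(search(query, slider=+1.0))     # towards "threat"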
Generative AI: Two Example Use Cases

1. Writing | Feedback & Modulation Control

Before making style changes to text, it would be useful for writers to receive feedback. As I'm writing this article, for example, when I'm deep in a writing flow I'm unsure whether I'm keeping an acceptable level of complexity and tone across sections. Variance is of course fine, but feedback would be helpful.

Then, when making style changes, users need more precise control. Default commands, such as Apple Intelligence's Friendly, Professional and Concise, or Gemini's Rephrase, Shorten and Elaborate, offer little feedback or control. How Friendly or Professional is the text to begin with? And when applying the change, how much more Friendly, Professional, Shorter or Longer does the user want it to be? Perhaps there are also more nuanced stylistic changes that I'd like to explore.

An initial mock-up of how a simple control panel could function within Google Docs' existing UI.

So Wait, What's New?

Feedback

Users can quickly review a text based on customisable values (a sketch of how such values could be computed follows at the end of this section).

User Interface Controls

Following feedback, users can then make informed and confident changes along several nuanced concept vectors at once, without a multi-step prompt dialogue. Using these concept sliders, users can pinpoint a specific intention that might be difficult or inefficient to describe with words.

Easier Development, Deployment & Modulation of Personal Styles

A fully customisable control panel can help users create and deploy a personal style and then modulate it for a given context.

The impact of document analytics and vector sliders like this would be considerable. Instead of giving full agency to AI to re-write texts, using a copilot to quickly analyse and variably modulate text could help users be more intentional with their writing and improve their writing skills rather than losing them to AI.
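One plausible way to compute this kind of feedback is to build a concept direction from a handful of contrasting example sentences and project each paragraph onto it, much like the Concept Activation Vectors mentioned in the Notes below. The encoder, example sentences and scores in this sketch are illustrative assumptions, not a tested recipe.

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence encoder would do

    # Build a "friendly vs. formal" direction from a few contrasting examples.
    friendly = model.encode(["Hey! Great to hear from you.",
                             "Thanks so much, that really helps!"])
    formal = model.encode(["Dear Sir or Madam, please find the report attached.",
                           "We acknowledge receipt of your correspondence."])
    direction = friendly.mean(axis=0) - formal.mean(axis=0)
    direction /= np.linalg.norm(direction)

    def friendliness(paragraph: str) -> float:
        """Project a paragraph onto the concept direction; higher means friendlier."""
        v = model.encode(paragraph)
        return float(np.dot(v / np.linalg.norm(v), direction))

    for p in ["Hi team, quick update on where we are.",
              "The committee hereby requests further documentation."]:
        print(round(friendliness(p), 3), p)

The same projection, run per section, would give the kind of per-document feedback described above; moving a slider would then mean asking the model to rewrite towards a target value along that direction.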
2. Multi-Media Content Generation

Compared to text-based LLMs, text-to-media generation tools currently suffer from an even greater lack of traction between intention, prompt and output. This is because they have huge dual model spaces: a text input analysis space as well as an output vector space, which have to be matched together.

As well as media labelling issues and black holes within training data (e.g. there are hardly any images of wine glasses filled to the brim), another significant problem is a UX one.

Users lack intuition about how to prompt text-to-image models effectively. With vector sliders, users would have greater certainty in knowing whether a desired outcome is even achievable in the model, rather than simply being a prompt failure. By removing the uncertainty involved with prompts, users would increasingly enjoy working with generative AI tools and be more effective with fewer overall prompt attempts. Efficiencies in text prompting can only be beneficial from a business standpoint.

Mock-up of a text-to-image generator showing the usefulness of subjective concept vectors.

I'm Almost Lost Again, What's New?

Two-Step Prompts | Text + Concept Vector Sliders

With a more straightforward initial prompt, users could now make further changes using subjective concept vectors. In the above example, atmosphere is added to the image. There is feedback on how atmospheric the image currently is, which informs the user when changing this value.

Control Panels Change the Final Input Embeddings

This is crucial. When users decide to make a change, they would now be able to carefully fine-tune an existing prompt without reshuffling all the vector embeddings.
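As a hypothetical sketch of what an atmosphere slider could look like for a text-to-image model, the following assumes a recent diffusers Stable Diffusion pipeline that exposes encode_prompt and accepts prompt_embeds. Blending two fixed text embeddings stands in for a true concept vector; the checkpoint, prompts and slider value are illustrative assumptions rather than a production design.

    import torch
    from diffusers import StableDiffusionPipeline

    # Example checkpoint; move the pipeline to a GPU if one is available.
    pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

    base = "a quiet fishing village at dawn"
    atmospheric = "a quiet fishing village at dawn, misty, moody, volumetric light"

    # Encode both prompts once; the slider then works on embeddings, not on new prompts.
    base_emb, _ = pipe.encode_prompt(base, device=pipe.device,
                                     num_images_per_prompt=1,
                                     do_classifier_free_guidance=False)
    atmos_emb, _ = pipe.encode_prompt(atmospheric, device=pipe.device,
                                      num_images_per_prompt=1,
                                      do_classifier_free_guidance=False)

    # "Atmosphere" slider: blend the two embeddings instead of re-rolling the whole prompt.
    alpha = 0.6
    mixed = (1 - alpha) * base_emb + alpha * atmos_emb

    image = pipe(prompt_embeds=mixed, num_inference_steps=30).images[0]
    image.save("village_alpha_0.6.png")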
It took over an hour of repeated prompts to Adobe Firefly to get the three images for the above mock-up. Every time I re-prompted Firefly I felt as though I was playing roulette. I was never certain what any of Firefly's controls or presets were doing. Perhaps it's a skill issue, but even after finding an image to use as a firm compositional lock and as a style transfer, I was frustrated by an inability to nudge the image in any meaningful, non-random way.

It definitely feels as though something is going wrong. These models are incredibly powerful, and they should be able to handle incremental changes and nuanced inference. There is obviously a lot of untapped potential in the combination of LLMs and diffusion models.

Doing More With Less. Why This Is Worth Pursuing.

Part of the problem with prompt engineering is that users have to communicate with an AI that has an unknown exposure to the world. Users don't know what information they need to provide to an AI or how that information should be provided. To make matters worse, models frequently change, and in turn, their sensitivities to words and phrases change.

If users had greater model-space control, this would ease some of these tensions. Users could write shorter prompts to establish a baseline, which they could then refine with concept vectors. A multi-step user interface means shorter, less perfect, and more efficient prompts, with increased fine control of the output for the last mile of accuracy.

A two-step process of prompting and then fine-tuning the final input embeddings should also be more computationally efficient. From a UX perspective it would be more satisfying, because this method is in sync with how we think and work, particularly when working through unknown problems and when needing generative AI to perform at higher temperatures (hallucinations) for creative work.

Notes

The ideas in this article can be seen as part of the wider evolving research and discussion surrounding the Large Concept Models being developed by Meta. Essentially this is an LLM that is specifically organised around conceptually related terms, an approach that should make navigating concepts more predictable and reliable from a user experience point of view. Articles for further reading:
- Mehul Gupta's Meta Large Concept Models (LCM): End of LLMs?
- Vishal Rajput's Forget LLMs, It's Time For Large Concept Models (LCMs).

I first encountered Concept Activation Vectors (CAVs) in 2020, while working alongside Nord Projects on a research project for Google AI. This project, which explored subjectivity, style, and inference in images, won an Interaction Award (IxDA).

The idea of identifying and working with subjective inference, which Nord Projects explored, has stayed with me ever since. It has influenced the central ideas of this piece and shaped my thinking on how similar concepts could be applied as user controls within LLM and GenAI models.

References

Attention In Transformers, step-by-step. Grant Sanderson (3Blue1Brown, YouTube). https://www.youtube.com/watch?v=eMlx5fFNoYc

Large Language Models II: Attention, Transformers and LLMs. Mitul Tiwari. https://www.linkedin.com/pulse/large-language-models-ii-attention-transformers-llms-mitul-tiwari-zg0uf/

Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. https://arxiv.org/abs/1706.03762

What Is ChatGPT Doing and Why Does It Work? Stephen Wolfram. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

King - Man + Woman is Queen; but why? Piotr Migdał. https://p.migdal.pl/blog/2017/01/king-man-woman-queen-why

Don't Use Cosine Similarity Carelessly. Piotr Migdał. https://p.migdal.pl/blog/2025/01/dont-use-cosine-similarity

Open sourcing the Embedding Projector: a tool for visualizing high dimensional data. Daniel Smilkov and the Big Picture group. https://research.google/blog/open-sourcing-the-embedding-projector-a-tool-for-visualizing-high-dimensional-data/

How AI Understands Images (CLIP). Mike Pound (Computerphile). https://www.youtube.com/watch?v=KcSXcpluDe4

www.tomhatton.co.uk

Navigating embedding vectors was originally published in UX Collective on Medium.