
Beyond Training Data: How RAG Lets LLMs Retrieve, Not Guess
Author(s): DarkBones. Originally published on Towards AI.

Source: Image by the author, generated with Flux.

Large Language Models (LLMs) like GPT-4 don't actually know anything; they predict words based on old training data. Retrieval-Augmented Generation (RAG) changes that by letting AI pull in fresh, real-world knowledge before answering.

RAG enhances LLMs by enabling them to retrieve relevant information from external sources before generating a response. Because LLMs rely on static training data and don't update automatically, RAG gives them access to fresh, domain-specific, or private knowledge without the need for costly retraining.

Let's explore how RAG works, why it is useful, and how it differs from traditional LLM prompting.

What is Retrieval-Augmented Generation (RAG) in AI?

Retrieval-Augmented Generation (RAG) helps AI models retrieve external information before generating a response. But how exactly does this process work, and why is it important?

Large Language Models excel at many tasks. They can code, draft emails, hallucinate ingredients for the perfect sandwich, and even write articles, although I still prefer doing that myself. However, they have a major limitation: they lack real-time knowledge. Because training LLMs is a time-consuming process, they do not know about recent events. If you ask one about last week, it will either display a disclaimer, provide an outdated answer, or generate something completely inaccurate.

Some LLMs overcome their biggest limitation of stale training data by retrieving up-to-date information before responding. RAG fetches relevant information before generating an answer, making AI responses more accurate and reducing hallucinations.

RAG Explained in Simple Terms

But how does RAG actually work? Instead of looking it up ourselves, let's ask our favorite LLM:

Source: Image by the author.

This is not quite what we were hoping for. No problem, we can ask Bob instead.

Source: Image by the author.

Surprisingly, Bob did not know the answer either, but he was able to retrieve it. Here is what happened:

1. We asked Bob about RAG.
2. Bob went to the library and asked the librarian for information.
3. The librarian pointed him to the right aisle.
4. Bob retrieved the information.
5. Bob augmented his understanding by consuming the information before generating an answer.

Now Bob sounds like an expert. Thanks, Bob.

This breakdown reveals that Bob is effectively functioning as a RAG agent. With that insight, let's explore exactly how a RAG agent operates.

RAG Simplified

Let's transform our interaction with Bob into an actual RAG system:

- Bob represents the RAG system.
- The librarian acts as an embedder.
- The library functions as a vector database.

Source: Image by the author.

Rather than prompting an LLM directly, a RAG system acts as a knowledge bridge: retrieving, augmenting, and then generating responses.

Vectorizing the Input

The RAG system first forwards the prompt to the embedder, which converts it into a vector. This vector is a numeric representation of the prompt. The idea is that information with similar meaning will have similar vector representations. Vectors unlock relevance.
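To make this step concrete, here is a minimal sketch of what vectorizing the input can look like in code. It assumes the open-source sentence-transformers library and an example model name; the article's own embedder is not specified, so treat the details as illustrative.

```python
# Minimal sketch of the embedding step, assuming the sentence-transformers
# library. The model name is an example (it produces 384-dimensional vectors;
# the author's embedder mentioned later in the article uses 768 dimensions).
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

prompt = "Hey LLM, tell me about RAG"
vector = embedder.encode(prompt)  # a NumPy array of floats

print(vector.shape)  # (384,) -- one number per dimension
```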
This vector allows the system to retrieve the most meaningful information from the vector database. When the vector representation of the user's prompt is sent to the database, it retrieves the most relevant matches.

The RAG system then enhances the user's prompt by including the retrieved information:

<context>the information returned from the database</context>
<user-prompt>the user's original prompt</user-prompt>

That is the entire process: Retrieve, Augment, and Generate. RAG.

Adding to the Knowledge Base

However, the system cannot retrieve information that has not been added to the database. How do we store new data? The process is straightforward. Instead of using the vector to find relevant information, the system stores the data along with its vector representation.

Source: Image by the author.

If you were only interested in the big picture, congratulations. You now understand the core concept. However, if you're a fellow neckbeard, let's talk a bit more about vectors and embedders.

What is a Vector?

In simple terms, a vector is a set of coordinates that describe how to move from A to B. Look at this graph:

Source: Still frame from the "Photograph" music video by Nickelback, edited with a custom graph overlay by the author.

This graph has two dimensions. Each point (A, B, C, and D) can be described using a two-number coordinate system. The first number tells us how far to move to the right from the origin (0), while the second number tells us how far to move up. To reach A, the vector is [3, 7]. To reach D, the vector is [3, 0].

Dimensionality of Vectors

The same principle applies in three dimensions. To move from your desk to the coffee machine, you must travel a certain distance along the x, y, and z axes, forming a three-number coordinate system.

Source: Image by the author.

Humans struggle to visualize beyond three dimensions. Computers thrive in multi-dimensional spaces. The math remains the same. Four dimensions? That requires a four-number coordinate system. One hundred dimensions? That requires a 100-number coordinate system.

Source: Meme remix combining "This is Fine" by KC Green (original) with custom artwork by the author.

The embedder I use operates in a mind-bending, 768-dimensional coordinate system, far beyond human perception. When you have finished trying to visualize that, we can return to simpler, easy-to-draw, two-dimensional graphs.

How Vector Embeddings Help LLMs Retrieve Data

Vectors by themselves are simply n-dimensional coordinates that represent points in n-dimensional space. But vectors aren't just numbers; they encode meaning. Their true power lies in the information they represent. In the same way that map coordinates point to places, these vectors point to pieces of information. A specialized LLM, an embedder, is trained on a large corpus of text to figure out similarities and to place these pieces of information somewhere in n-dimensional space such that similar topics tend to be grouped together. It is like going to a social event: you're likely to stick with your friends, colleagues, or at least a group of like-minded people.

Grouping Similar Concepts Together

Source: Image by the author.

This graph shows how words that are similar in meaning tend to get grouped together in this n-dimensional space. Modern embedders (like BERT) don't use single-word embeddings anymore; they generate contextual embeddings. The ability to group similar concepts in vector space makes embeddings powerful.
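To tie the retrieve, augment, and store steps described above together, here is a minimal in-memory sketch. It makes several assumptions that are not part of the article: the sentence-transformers library, a couple of made-up example documents, and plain NumPy similarity math standing in for a real vector database.

```python
# Minimal, in-memory sketch of the store / retrieve / augment steps.
# The model name, the example documents, and top_n are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# "Adding to the knowledge base": store each piece of text alongside its vector.
documents = [
    "RAG stands for Retrieval Augmented Generation.",
    "An eclipse occurs when one celestial body moves into the shadow of another.",
]
document_vectors = embedder.encode(documents)  # shape: (num_documents, dim)

def retrieve(query: str, top_n: int = 1) -> list[str]:
    """Return the top_n documents whose vectors are most similar to the query."""
    query_vector = embedder.encode(query)
    # Cosine similarity between the query and every stored document.
    similarities = document_vectors @ query_vector / (
        np.linalg.norm(document_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    best = np.argsort(similarities)[::-1][:top_n]
    return [documents[i] for i in best]

# "Augment": wrap the retrieved context around the user's original prompt.
user_prompt = "Hey LLM, tell me about RAG"
context = "\n".join(retrieve(user_prompt))
augmented_prompt = (
    f"<context>{context}</context>\n"
    f"<user-prompt>{user_prompt}</user-prompt>"
)
print(augmented_prompt)  # this is what would be sent to the LLM
```

In a real system the documents and vectors would live in a dedicated vector database rather than a Python list, but the retrieval logic is conceptually the same.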
However, early embedding models like Word2Vec had a significant limitation that modern models have addressed.

Quick Tech Tangent

If you've been working on AI systems for as long as I have, you might be familiar with Word2Vec. While groundbreaking when it came out in 2013, it has a major flaw: it assigns a single vector to each word, no matter the context.

Take the word "bat":

- Are we talking about the flying mammal? Then it should be near "mammal", "cave", and "nocturnal".
- Or do we mean a baseball bat? Then it belongs near "ball", "pitch", and "base" (but what base? Military?).
- And what if we're in the world of fiction? Then "bat" relates to "vampire" and "transformation".

Word2Vec can't tell the difference. It picks one and sticks with it.

One thing I find particularly fascinating about Word2Vec is that, since words are now represented by numbers, you can actually do arithmetic on them. You can write equations like king - man + woman = queen, a legendary example of how AI models map relationships in vector space. It's wild, but it works (most of the time).

Tangent over.

How are Vectors Used?

Now that we understand vectors, the next step is straightforward. We embed the information we want the LLM to access, and when we ask a question about that information, the question itself should be close to the relevant content in vector space. The vector database retrieves the n most relevant pieces of content, where n is a configurable number. It also returns the cosine similarity score for each result, indicating how closely the retrieved content matches the query.

Cosine Similarity

Cosine similarity doesn't just compare numbers; it measures meaning by calculating the angle between two vectors. A smaller angle indicates greater similarity, meaning the retrieved data is more relevant to the prompt.

Source: Image by the author.

In our example, A and B represent the phrases "RAG stands for Retrieval Augmented Generation" and "Hey LLM, tell me about RAG". Since they are closely related, their vectors are similar. If we instead ask "Describe an Eclipse", its vector will be far from the others, making it unrelated. However, if "RAG stands for Retrieval Augmented Generation" is the only entry in the database, it will still be retrieved, even if it is not relevant to the query.

Limitations of RAG

Typically, we do not store and retrieve entire documents in the vector database. If we did, a single large document could easily exceed the context window of the LLM. If the system is configured to return the ten most relevant pieces of information, and each of them is the size of a full article, your computer quickly turns into a space heater. To prevent this, we split the information into chunks of a predefined size, such as 1000 characters, while trying to keep sentences and paragraphs intact.

However, splitting information into chunks introduces a new problem. Just as Word2Vec struggles to determine meaning from a single word, RAG often fails to understand the full context of a single chunk, especially when that chunk is extracted from the middle of a document.

Source: Image by the author.
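The chunking step described above might look something like the rough sketch below. The sentence-splitting rule and the 1000-character limit are illustrative choices, not a prescribed implementation.

```python
# Rough sketch of chunking: split text into chunks of roughly 1000 characters
# while trying to keep whole sentences intact. The regex and the limit are
# assumptions for illustration only.
import re

def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Each chunk would then be embedded and stored alongside its vector,
# just like the whole documents in the earlier sketch.
```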
Here is a problem I encountered recently. I keep a detailed work diary where I document all my professional achievements. It is extremely useful during performance reviews. However, when I ask my RAG system what I achieved at my current company, it confidently includes accomplishments from my previous jobs. Because I write this diary in the first person and also include information from other sources written in the first person, the system cannot distinguish between them. As a result, it starts attributing achievements to me that I had nothing to do with. That is how I realized something was wrong: my system was suddenly telling me about all the interesting things I supposedly did away from the computer, which is impossible since I never leave my desk.

Conclusion

RAG makes LLMs more useful by letting them retrieve information they wouldn't otherwise have access to. But it's not magic. It comes with its own challenges, from handling context properly to avoiding irrelevant results.

But as I learned firsthand, fetching information isn't the same as understanding it. That's why making RAG systems context-aware is the next big challenge, one I'll tackle in my next article.