
Retrieval Augmented Generation (RAG) — An Introduction
The model hallucinated! It was giving me OK answers and then it just started hallucinating. We’ve all heard or experienced it.
Natural language generation models can sometimes hallucinate, i.e., generate text that is not quite accurate for the prompt provided. In layman’s terms, they make things up that are not strictly related to the given context, or that are plainly inaccurate. Some hallucinations are understandable, for example mentioning something related to, but not exactly, the topic in question; other times the output looks like legitimate information but is simply not correct. It is made up.
This is clearly a problem when we start using generative models to complete tasks and intend to consume the information they generate to make decisions.
The problem is not necessarily tied to how the model generates the text, but to the information it uses to generate a response. Once you train an LLM, the information encoded in the training data is crystallized; it becomes a static representation of everything the model knows up until that point in time. To update the model’s world view or knowledge base, it needs to be retrained, and training Large Language Models requires time and money.
One of the main motivations for developing RAG is the increasing demand for factually accurate, contextually relevant, and up-to-date generated content.[1]
When thinking about ways to make generative models aware of the wealth of new information created every day, researchers started exploring efficient ways to keep these models up to date that didn’t require continuously retraining them.
They came up with the idea of hybrid models: generative models that have a way of fetching external information to complement the data the LLM was trained on and already knows. These models pair an information retrieval component, which gives the model access to up-to-date data, with the generative capabilities LLMs are already well known for. The goal is to ensure both fluency and factual correctness when producing text.
This hybrid model architecture is called Retrieval Augmented Generation, or RAG for short.
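To make that flow concrete, here is a minimal sketch in Python. Everything in it is illustrative: `embed` stands in for a real sentence-embedding model, `generate` stands in for an LLM call, and the tiny document list plays the role of the external knowledge base.

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a deterministic pseudo-random unit vector.
    A real system would call a sentence-embedding model here."""
    seed = int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    """Stand-in for the generative model (an LLM call in practice)."""
    return f"[LLM answer conditioned on]\n{prompt}"

# The external knowledge base, embedded once, offline.
documents = [
    "The Brooklyn Bridge opened to traffic in 1883.",
    "RAG pairs a retriever with a generative model.",
    "BM25 ranks documents by keyword overlap.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def rag_answer(query: str, k: int = 2) -> str:
    # Retrieval: cosine similarity between query and document vectors
    # (vectors are unit-normalized, so a dot product suffices).
    scores = doc_vectors @ embed(query)
    top_docs = [documents[i] for i in np.argsort(scores)[::-1][:k]]
    # Augmentation: ground the prompt in the retrieved passages.
    context = "\n".join(top_docs)
    # Generation: the LLM answers from the supplied context.
    return generate(f"Context:\n{context}\n\nQuestion: {query}")

print(rag_answer("When did the Brooklyn Bridge open?"))
```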
The RAG era
Given the critical need to keep models updated in a time- and cost-effective way, RAG has become an increasingly popular architecture.
Its retrieval mechanism pulls information from external sources that are not encoded in the LLM. For example, you can see RAG in action in the real world when you ask Gemini something about the Brooklyn Bridge: at the bottom of the answer, you’ll see the external sources it pulled information from.
Example of external sources being shown as part of the output of the RAG model. (Image by author)
By grounding the final output in information obtained from the retrieval module, the output of these generative AI applications is less likely to propagate biases originating from the outdated, point-in-time view of the training data.
The second piece of the RAG architecture is the one most visible to us as consumers: the generation model. This is typically an LLM that processes the retrieved information and generates human-like text.
RAG combines retrieval mechanisms with generative language models to enhance the accuracy of outputs[1]
As for its internal architecture, the retrieval module relies on dense vectors to identify the relevant documents to use, while the generative model uses the typical transformer-based LLM architecture.
A basic flow of the RAG system along with its components. (Image and caption taken from the paper referenced in [1])
This architecture addresses very important pain-points of generative models, but it’s not a silver bullet. It also comes with some challenges and limitations.
The retrieval module may struggle to find the most relevant and up-to-date documents.
This part of the architecture relies heavily on Dense Passage Retrieval (DPR) [2, 3]. Compared to other techniques such as BM25, which is based on TF-IDF, DPR does a much better job of finding the semantic similarity between a query and documents. Leveraging semantic meaning instead of simple keyword matching is especially useful in open-domain applications; think of tools like Gemini or ChatGPT, which are not experts in a particular domain but know a little bit about everything.
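The contrast is easy to demonstrate. In the sketch below, the rank_bm25 and sentence-transformers packages are just convenient illustrations (a bi-encoder here stands in for a full DPR setup, which trains separate query and passage encoders [3]), and all-MiniLM-L6-v2 is simply a common lightweight model choice, not a prescription.

```python
from rank_bm25 import BM25Okapi                # pip install rank-bm25
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

corpus = [
    "The physician prescribed medication for the illness.",
    "Stock prices fell sharply after the announcement.",
]
query = "What did the doctor give the patient?"

# BM25: keyword overlap. "doctor" never appears in the corpus,
# so the relevant document gets no credit for containing "physician".
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
print(bm25.get_scores(query.lower().split()))  # near-zero for both documents

# Dense retrieval: the embedding space places "doctor" and "physician"
# close together, so semantic similarity surfaces the right document.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
print(util.cos_sim(query_emb, doc_emb))        # document 0 scores clearly higher
```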
However, DPR has its shortcomings too. Its dense vector representations can lead to irrelevant or off-topic documents being retrieved, and DPR models seem to retrieve information based on knowledge that already exists within their parameters, i.e., facts must already be encoded in order to be accessible by retrieval [2].
[…] if we extend our definition of retrieval to also encompass the ability to navigate and elucidate concepts previously unknown or unencountered by the model—a capacity akin to how humans research and retrieve information—our findings imply that DPR models fall short of this mark.[2]
To mitigate these challenges, researchers have explored more sophisticated query expansion and contextual disambiguation. Query expansion is a set of techniques that modify the original user query by adding relevant terms, with the goal of connecting the intent of the user’s query with relevant documents [4].
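A minimal sketch of the idea: the user’s query is rewritten with additional related terms before retrieval. For brevity, the expansion terms here come from a hand-written synonym table; in approaches like the one studied in [4], they would be generated by a language model.

```python
# Toy synonym table; in practice, expansion terms are produced by an
# LLM or mined from query logs or a thesaurus.
EXPANSIONS = {
    "doctor": ["physician", "clinician"],
    "car": ["automobile", "vehicle"],
}

def expand_query(query: str) -> str:
    """Append related terms so vocabulary-sensitive retrievers can match
    documents that phrase the same concept differently."""
    extra = []
    for token in query.lower().split():
        extra.extend(EXPANSIONS.get(token.strip("?.,"), []))
    return query if not extra else f"{query} {' '.join(extra)}"

print(expand_query("What did the doctor give the patient?"))
# -> "What did the doctor give the patient? physician clinician"
```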
There are also cases where the generative module fails to fully incorporate the information gathered in the retrieval phase into its responses. To address this, there have been improvements to attention mechanisms and hierarchical fusion techniques [5].
Model performance is an important metric, especially when the goal of these applications is to be a seamless part of our day-to-day lives and make the most mundane tasks almost effortless. However, running RAG end-to-end is computationally expensive: for every query the user makes, there is one step for information retrieval and another for text generation. This is where techniques such as model pruning [6] and knowledge distillation [7] come into play, ensuring that, even with the additional step of searching for up-to-date information outside the trained model’s data, the overall system remains performant.
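To give a flavor of the second technique, here is the classic knowledge-distillation loss in PyTorch, in the spirit of [7]: the smaller student model is trained to match the teacher’s softened output distribution in addition to the ground-truth labels. The temperature and mixing weight are typical example values, not prescriptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend the usual cross-entropy with a KL term that pulls the
    student's softened distribution toward the teacher's."""
    # Soft targets: both distributions are smoothed by the temperature.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale to match the hard-label loss gradients
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Tiny usage example: random logits for a batch of 4 over 10 classes.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```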
Lastly, while the information retrieval module in the RAG architecture is intended to mitigate bias by accessing external sources that are more up-to-date than the training data, it does not fully eliminate bias. If the external sources are not meticulously chosen, they can introduce new biases or even amplify existing biases from the training data.
Conclusion
Utilizing RAG in generative applications significantly improves a model’s capacity to stay up to date and gives its users more accurate results.
When used in domain-specific applications, its potential is even clearer: with a narrower scope and an external library of documents pertaining only to a particular domain, these models can retrieve new information far more effectively.
However, ensuring generative models are constantly up-to-date is far from a solved problem.
Technical challenges, such as handling unstructured data or ensuring model performance, continue to be active research topics.
I hope you enjoyed learning a bit more about RAG, and the role this type of architecture plays in keeping generative applications up to date without retraining the model.
Thanks for reading!
References
[1] Gupta, S., Ranjan, R., & Singh, S. N. (2024). A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions. (arXiv)
[2] Reichman, B., & Heck, L. (2024). Retrieval-Augmented Generation: Is Dense Passage Retrieval Retrieving? (arXiv)
[3] Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. T. (2020). Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 6769-6781). (arXiv)
[4] Koo, H., Kim, M., & Hwang, S. J. (2024). Optimizing Query Generation for Enhanced Document Retrieval in RAG. (arXiv)
[5] Izacard, G., & Grave, E. (2021). Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 874-880). (arXiv)
[6] Han, S., Pool, J., Tran, J., & Dally, W. J. (2015). Learning both Weights and Connections for Efficient Neural Networks. In Advances in Neural Information Processing Systems (pp. 1135-1143). (arXiv)
[7] Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108. (arXiv)