
RAGent: A Multi-Agent PDF Whisperer Built on LangChain + LangGraph
Author(s): Dwaipayan Bandyopadhyay
Originally published on Towards AI.
Retrieval Augmented Generation (RAG) is a well-known approach in the field of Generative AI. It usually follows a linear flow: chunk a document, store the chunks in a vector database, retrieve the chunks relevant to the user query, and feed them to an LLM to produce the final response. More recently, the term “Agentic AI” has been taking the internet by storm. In simple terms, it refers to breaking a problem down into smaller pieces, assigning each piece to an “agent” capable of handling that specific task, and combining those agents to build a complex workflow. What if we combine this agentic approach with Retrieval Augmented Generation? In this article, we explain an architecture of that kind, which we developed using LangGraph, FAISS and OpenAI.
Source : Image by Author
We will not explore AI agents in depth in this article; otherwise, this would become a full-fledged book. But to give a brief overview, think of an “AI agent” as an assistant: someone or something that has mastered one particular task. Multiple agents with different capabilities are combined to form a full graphical agentic workflow, in which agents can communicate with each other, understand what the previous agent returned, and so on.
In our approach, we divided “Retrieval Augmented Generation” into three different tasks and created one agent for each, so that every agent handles one specific job: one agent takes care of retrieval, another takes care of augmentation, and the last one takes care of generation. We then combined all three agents into a complete end-to-end agentic workflow. Let’s dive into the coding section.
Coding Section Starts
Firstly, we will install all the necessary packages. The best practice is to create a virtual environment first and then install the following packages.
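A minimal set of installs that covers the imports used throughout this article (the exact package list is inferred from the code below and is an assumption; pin versions as your environment requires):

pip install langchain langchain-openai langchain-community langgraph faiss-cpu pypdf python-dotenv streamlit ipython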
After they are installed successfully, we will import all the necessary packages to create the Retriever agent first.
Coding the Retriever Agent
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pypdf import PdfReader
import re
from dotenv import load_dotenv
import streamlit as st

load_dotenv()

LLM = ChatOpenAI(model_name="gpt-4o", temperature=0.0)

def extract_text_from_pdf(pdf_path):
    try:
        pdf = PdfReader(pdf_path)
        output = []
        for i, page in enumerate(pdf.pages, 1):
            text = page.extract_text()
            # Rejoin words split by hyphenated line breaks ("data-\nbase" -> "database")
            text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
            # Turn single newlines into spaces, but keep paragraph breaks
            text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
            text = re.sub(r"\n\s*\n", "\n\n", text)
            output.append((text, i))  # Tuple of (text, page number)
        return output
    except Exception as e:
        st.error(f"Error reading PDF: {e}")
        return []

def text_to_docs(text_with_pages):
    docs = []
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    for text, page_num in text_with_pages:
        chunks = text_splitter.split_text(text)
        for i, chunk in enumerate(chunks):
            doc = Document(
                page_content=chunk,
                metadata={"source": f"page-{page_num}", "page_num": page_num}
            )
            docs.append(doc)
    return docs

def create_vectordb(pdf_path):
    text_with_pages = extract_text_from_pdf(pdf_path)
    if not text_with_pages:
        raise ValueError("No text extracted from PDF.")
    docs = text_to_docs(text_with_pages)
    embeddings = OpenAIEmbeddings()
    return FAISS.from_documents(docs, embeddings)

# Define Tools
def retrieve_from_pdf(query: str, vectordb) -> dict:
    """Retrieve the most relevant text and page number using similarity search."""
    docs = vectordb.similarity_search(query, k=3)  # top 3 results; we keep the best one
    if docs:
        doc = docs[0]
        content = f"Page {doc.metadata['page_num']}: {doc.page_content}"
        page_num = doc.metadata["page_num"]
        return {"content": content, "page_num": page_num}
    return {"content": "No content retrieved.", "page_num": None}

RETRIEVE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Retrieve Agent. Your task is to fetch the most relevant text from a PDF based on the user's query.
    - Use the provided retrieval function to get content and a single page number.
    - Return the content directly with the page number included (e.g., 'Page X: text').
    - If no content is found, return "No content retrieved."
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}"),
])
Explanation of the Code –
In this retriever agent code, we first import all the necessary modules and classes. We store our credentials, such as the OpenAI API key, in a .env file, which is why the dotenv module is used here along with the load_dotenv function call. Next, we initialise the LLM, providing the required arguments such as the model name and temperature.
Descriptions of Functions
extract_text_from_pdf reads and extracts the content of the PDF and cleans it up a bit: it fixes hyphenated line breaks that split a word into two pieces, converts single newlines into spaces unless they are part of paragraph spacing, and normalises paragraph breaks. The cleaning is done page by page, which is why a loop runs over the pages using the enumerate function. Finally, the function returns the cleaned content together with its page number as a list of tuples. Any unexpected error is handled by the try-except block, which ensures the code works without breaking at runtime.
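To make the first clean-up step concrete, here is a minimal sketch of what the hyphenation regex does (the input string is hypothetical):

import re

raw = "Normalisation reduces redun-\ndancy in a data-\nbase."
fixed = re.sub(r"(\w+)-\n(\w+)", r"\1\2", raw)
print(fixed)  # Normalisation reduces redundancy in a database.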
text_to_docs performs the chunking. Here the RecursiveCharacterTextSplitter class from the langchain module is used, with a chunk size of 4000 characters and an overlap of 200. A loop then runs over the text_with_pages argument, which receives the output of the previous function, extract_text_from_pdf, in its list-of-tuples format; two loop variables unpack both items of each tuple. The cleaned text is split into chunks, and each chunk is wrapped in a Document object, which will later be converted into embeddings. Besides the page content, the Document object holds the page number and a string label including the page number as metadata. Each Document is appended to a list, which is returned.
create_vectordb — This function uses the two functions above to create embeddings and store them in a FAISS (Facebook AI Similarity Search) vector store. FAISS is a lightweight vector store that keeps its index locally and makes similarity search easy. This function simply creates and returns the vector database. That’s it.
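Because FAISS keeps its index locally, the index can also be persisted between runs so the PDF is not re-embedded on every start. A minimal sketch, assuming the create_vectordb function above and a hypothetical faiss_index folder (recent langchain-community versions require the allow_dangerous_deserialization flag when loading a pickled local index):

import os
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings

INDEX_DIR = "faiss_index"  # hypothetical folder for the saved index

if os.path.isdir(INDEX_DIR):
    # Reload the saved index instead of re-embedding the PDF
    vectordb = FAISS.load_local(INDEX_DIR, OpenAIEmbeddings(),
                                allow_dangerous_deserialization=True)
else:
    vectordb = create_vectordb("dbms_notes.pdf")
    vectordb.save_local(INDEX_DIR)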
retrieve_from_pdf — In this function, we perform the similarity search and fetch the top 3 chunks. If any are found, we keep only the first chunk, since it contains the most similar content, and return it along with its page number as a dictionary.
RETRIEVE_PROMPT is a ChatPromptTemplate containing the instruction, i.e. the system message for the LLM, describing its job as the retriever agent. It also takes the entire chat history of the session via a MessagesPlaceholder and accepts the user query as the human input.
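To see how the prompt consumes the history, here is a minimal sketch of invoking the retriever chain on its own, outside the graph (the query and history values are hypothetical):

from langchain_core.messages import AIMessage, HumanMessage

chain = RETRIEVE_PROMPT | LLM
result = chain.invoke({
    "query": "What is a primary key?",
    "chat_history": [
        HumanMessage(content="What is a relation?"),
        AIMessage(content="A relation is a table made of rows and columns."),
    ],
})
print(result.content)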
Coding the Augmentation Agent
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from typing import Optional

def augment_with_context(content: str, page_num: Optional[int]) -> str:
    """Augment retrieved content with source context."""
    if content != "No content retrieved." and page_num:
        return f"{content}\n\nAdditional context: Sourced from page {page_num}."
    return f"{content}\n\nAdditional context: No specific page identified."

AUGMENT_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Augment Agent. Enhance the retrieved content with additional context.
    - If content is available, append a note with the single page number.
    - If no content is retrieved, return "No augmented content."
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "Retrieved content: {retrieved_content}\nPage number: {page_num}"),
])
Explanation of the Functions
augment_with_context — This is a very straightforward function that solidifies the information found by the retrieval agent with source context. If content and a page number are available, a note citing the page number is appended to the retrieved content; otherwise, the content is returned with a note that no specific page was identified.
AUGMENT_PROMPT mirrors RETRIEVE_PROMPT: a system message instructs the LLM to act as the Augment Agent and enhance the retrieved content, the chat history is injected through a MessagesPlaceholder, and the human message carries the retrieved content and its page number.
Coding the Generator Agent
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

GENERATE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Generate Agent. Create a detailed response based on the augmented content.
    - Focus on DBMS and SQL content.
    - Append "Source: Page X" at the end if a page number is available.
    - If the user query contains terms like "explain", "simple", "simplify", or similar, do not return any page number; otherwise return the proper page number.
    - If the question is not DBMS-related, reply "Not applicable."
    - Use the chat history to maintain context.
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}\nAugmented content: {augmented_content}"),
])
The generator agent consists only of a prompt template with instructions on how to generate the final response from the retrieved content and the augmented information produced in the previous two steps.
After all these separate agents are created, it’s time to bring them under a single umbrella and form the entire end-to-end workflow using LangGraph.
Code for the Graph Creation using LangGraph
import streamlit as st
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Optional
import re
from IPython.display import display, Image
from retriever import (
    LLM,
    extract_text_from_pdf,
    text_to_docs,
    create_vectordb,
    retrieve_from_pdf,
    RETRIEVE_PROMPT,
)
from augmentation import augment_with_context, AUGMENT_PROMPT
from generation import GENERATE_PROMPT
from dotenv import load_dotenv

load_dotenv()

PDF_FILE_PATH = "dbms_notes.pdf"

# Define the Agent State
class AgentState(TypedDict):
    query: str
    chat_history: List[dict]
    retrieved_content: Optional[str]
    page_num: Optional[int]  # Single page number instead of a list
    augmented_content: Optional[str]
    response: Optional[str]

def format_for_display(text):
    def replace_latex(match):
        latex_expr = match.group(1)
        return f"$${latex_expr}$$"  # Use $$ for Streamlit Markdown to render LaTeX
    # Wrap \frac expressions in $ delimiters so Streamlit renders them
    text = re.sub(r'\\frac\{([^}]+)\}\{([^}]+)\}', r'$\\frac{\1}{\2}$', text)
    return text

# Define Multi-Agent Nodes
def retrieve_agent(state: AgentState) -> AgentState:
    chain = RETRIEVE_PROMPT | LLM
    retrieved = retrieve_from_pdf(state["query"], st.session_state.vectordb)
    # Note: the LLM response is not used directly; the state carries the similarity-search result
    response = chain.invoke({"query": state["query"], "chat_history": state["chat_history"]})
    return {
        "retrieved_content": retrieved["content"],
        "page_num": retrieved["page_num"],
    }

def augment_agent(state: AgentState) -> AgentState:
    chain = AUGMENT_PROMPT | LLM
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        # Prepare input for the LLM
        input_data = {
            "retrieved_content": state["retrieved_content"],
            "page_num": str(state["page_num"]) if state["page_num"] else "None",
            "chat_history": state["chat_history"],
        }
        # Invoke the LLM to generate augmented content
        response = chain.invoke(input_data)
        augmented_content = response.content  # Use the LLM's output
    else:
        augmented_content = "No augmented content."
    return {"augmented_content": augmented_content}

def generate_agent(state: AgentState) -> AgentState:
    chain = GENERATE_PROMPT | LLM
    response = chain.invoke({
        "query": state["query"],
        "augmented_content": state["augmented_content"] or "No augmented content.",
        "chat_history": state["chat_history"],
    })
    return {"response": response.content}

# Define Conditional Edge Logic
def decide_augmentation(state: AgentState) -> str:
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        return "augmentation"
    return "generation"

workflow = StateGraph(AgentState)
workflow.add_node("retrieve_agent", retrieve_agent)
workflow.add_node("augment_agent", augment_agent)
workflow.add_node("generate_agent", generate_agent)
workflow.set_entry_point("retrieve_agent")
workflow.add_conditional_edges(
    "retrieve_agent",
    decide_augmentation,
    {
        "augmentation": "augment_agent",
        "generation": "generate_agent",
    },
)
workflow.add_edge("augment_agent", "generate_agent")
workflow.add_edge("generate_agent", END)
agent = workflow.compile()
# display(Image(agent.get_graph().draw_mermaid_png(output_file_path="tutor_agent.png")))

st.set_page_config(page_title="🤖 RAGent", layout="wide")
st.title("🤖 RAGent : Your Personal Teaching Assistant")
st.markdown("Ask any question from your book and get detailed answers with a single source page!")

# Initialize session state for vector database
if "vectordb" not in st.session_state:
    with st.spinner("Loading PDF content... This may take a minute."):
        try:
            st.session_state.vectordb = create_vectordb(PDF_FILE_PATH)
        except Exception as e:
            st.error(f"Failed to load PDF: {e}")
            st.stop()

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# User input
user_input = st.chat_input("Ask anything from the PDF")
if user_input:
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)
    # Display assistant response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        # Prepare chat history for the agent (exclude current input)
        chat_history = [
            {"type": "human", "content": msg["content"]} if msg["role"] == "user"
            else {"type": "ai", "content": msg["content"]}
            for msg in st.session_state.messages[:-1]
        ]
        # Prepare initial state
        initial_state = {
            "query": user_input,
            "chat_history": chat_history,
            "retrieved_content": None,
            "page_num": None,
            "augmented_content": None,
            "response": None,
        }
        # Run the agent with a spinner
        with st.spinner("Processing..."):
            final_state = agent.invoke(initial_state)
            answer = final_state["response"]
            formatted_answer = format_for_display(answer)
        # Display response
        message_placeholder.markdown(formatted_answer)
        # Update chat history
        st.session_state.messages.append({
            "role": "assistant",
            "content": formatted_answer,
        })
Explanation of the Code
AgentState class — This class defines the schema of the workflow state; the same structure is carried through the entire workflow. It is passed as an argument when the StateGraph is created.
format_for_display function — This function contains a nested helper intended for LaTeX-based outputs. We use it because the document may contain fractions that Streamlit might not render properly, so the re.sub call wraps \frac expressions in $ delimiters as an extra precaution.
retrieve_agent function — This node uses the retrieve_from_pdf function we defined earlier. First, we create a chain from the retrieve prompt and the LLM, then invoke it with the user’s query while also passing the entire chat_history, and finally return the retrieved content and page number into the state.
augment_agent function — Here we build a chain with AUGMENT_PROMPT and check whether the retriever agent returned any content. If it did, we invoke the chain with the retrieved content, page number, and chat_history, and return the augmented content from the LLM’s response; otherwise we return “No augmented content.”
generate_agent function — Finally, we pass the augmented content, the user query, and the chat history so that the LLM can leverage the augmented information, generate the final response, and display it to the user.
decide_augmentation function — This conditional step checks whether the augmentation agent needs to run at all; if nothing was retrieved, the workflow skips straight to generation.
After all the necessary agents are created, it’s time to combine them into an end-to-end workflow, which is done using LangGraph’s StateGraph class. When initialising StateGraph, we pass the AgentState class defined earlier as its parameter, to indicate that throughout the entire workflow these are the only keys the state will contain, nothing else. We then add the nodes to the StateGraph, set the entry point manually so the graph knows which node executes first, and add edges between the nodes to define the shape of the workflow. A conditional edge in between signifies that the node it connects to may or may not be called on any given run.
Finally, we compile the entire workflow to verify that everything is wired correctly and that the resulting graph is valid. We can display the graph using the IPython display module and LangGraph’s Mermaid rendering. If everything goes correctly, the graph will look like the image below.
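The rendering itself is the commented-out line in the script above; run it in a notebook to both display the graph and write it to tutor_agent.png:

from IPython.display import display, Image

# Draw the compiled LangGraph workflow as a Mermaid PNG and save it to disk
display(Image(agent.get_graph().draw_mermaid_png(output_file_path="tutor_agent.png")))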
Source : Image by Author
The rest of the code is entirely Streamlit-based, and the UI can be designed however you like. We have taken a very basic approach so that it remains user-friendly. We also use a few session-state entries to maintain the chat history, the vector database, and so on. The workflow does not start until the user provides a query.
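To try it out, save the graph script as, say, app.py (a hypothetical filename) next to retriever.py, augmentation.py, and generation.py, put your OPENAI_API_KEY in the .env file, and launch the app with:

streamlit run app.py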
Screenshots of the Application in Working Condition –
Source : Image by Author
Source : Image by Author
This article has been written in collaboration with Biswajit Das
Published via Towards AI