Creating a Medical Question-Answering Chatbot Using Open-Source BioMistral LLM, LangChain, Chroma's Vector Storage, and RAG: A Step-by-Step Guide
www.marktechpost.com
In this tutorial, we'll build a powerful, PDF-based question-answering chatbot tailored for medical or health-related content. We'll leverage the open-source BioMistral LLM and LangChain's flexible data-orchestration capabilities to process PDF documents into manageable text chunks. We'll then encode these chunks using Hugging Face embeddings, capturing deep semantic relationships, and store them in a Chroma vector database for high-efficiency retrieval. Finally, by employing a Retrieval-Augmented Generation (RAG) system, we'll integrate the retrieved context directly into our chatbot's responses, ensuring clear, authoritative answers for users. This approach allows us to rapidly sift through large volumes of medical PDFs and provide context-rich, accurate, and easy-to-understand insights.

Setting up tools

```python
!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf

from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS, Chroma
from langchain_community.llms import LlamaCpp
from langchain.chains import RetrievalQA, LLMChain

import pathlib
import textwrap
from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
    # Convert bullet characters to Markdown list items and indent as a blockquote.
    text = text.replace('•', '  *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))


from google.colab import drive
drive.mount('/content/drive')
```

First, we install and configure the Python packages for document processing, embedding generation, local LLM inference, and retrieval-based workflows with LlamaCpp. We use langchain_community for PDF loading and text splitting, import RetrievalQA and LLMChain for question answering, and include a to_markdown utility plus Google Drive mounting.

Setting up API key access

```python
from google.colab import userdata
# Or use `os.getenv('HUGGINGFACEHUB_API_TOKEN')` to fetch an environment variable.
import os
from getpass import getpass

HF_API_KEY = userdata.get("HF_API_KEY")
os.environ["HF_API_KEY"] = HF_API_KEY  # export the key value, not the literal string
```

Here, we fetch the Hugging Face API key from Colab's user secrets and export it as an environment variable. Alternatively, you can read the HUGGINGFACEHUB_API_TOKEN environment variable to avoid exposing sensitive credentials directly in your code.

Loading and Extracting PDFs from a Directory

```python
loader = PyPDFDirectoryLoader('/content/drive/My Drive/Data')
docs = loader.load()
```

We use PyPDFDirectoryLoader to scan the specified folder for PDFs and extract their text into a list of documents, laying the groundwork for tasks like question answering, summarization, or keyword extraction.

Splitting Loaded Text Documents into Manageable Chunks

```python
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)
```

In this snippet, RecursiveCharacterTextSplitter breaks each document in docs into segments of roughly 300 characters with a 50-character overlap, so context is preserved across chunk boundaries.

Initializing Hugging Face Embeddings

```python
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
```

Using HuggingFaceEmbeddings, we create an embedding object backed by the BAAI/bge-base-en-v1.5 model. It converts text into numerical vectors that capture semantic meaning, which is what makes similarity search over the chunks possible.

Building a Vector Store and Running a Similarity Search

```python
vectorstore = Chroma.from_documents(chunks, embeddings)

query = "who is at risk of heart disease"
search = vectorstore.similarity_search(query)
to_markdown(search[0].page_content)
```

We first build a Chroma vector store (Chroma.from_documents) from the text chunks and the specified embedding model. We then issue the query "who is at risk of heart disease" and perform a similarity search against the stored embeddings. The top result (search[0].page_content) is converted to Markdown for clearer display.

Creating a Retriever and Fetching Relevant Documents

```python
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})
retriever.get_relevant_documents(query)
```

We convert the Chroma vector store into a retriever (vectorstore.as_retriever) that efficiently fetches the five most relevant chunks for a given query.

Initializing the BioMistral-7B Model with LlamaCpp

```python
llm = LlamaCpp(
    model_path="/content/drive/MyDrive/Model/BioMistral-7B.Q4_K_M.gguf",
    temperature=0.3,
    max_tokens=2048,
    top_p=1
)
```

We set up the open-source BioMistral LLM locally using LlamaCpp, pointing to a pre-downloaded GGUF model file. We also configure generation parameters such as temperature, max_tokens, and top_p, which control randomness, the maximum number of generated tokens, and the nucleus-sampling strategy.

Setting Up a Retrieval-Augmented Generation (RAG) Chain with a Custom Prompt

```python
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate

template = """<|context|>
You are an AI assistant that follows instruction extremely well.
Please be truthful and give direct answers</s>
<|user|>
{query}</s>
<|assistant|>
"""

prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {'context': retriever, 'query': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```

Using the above, we set up a RAG pipeline with the LangChain framework. It creates a custom prompt with instructions and placeholders, incorporates the retriever for context, and uses the local language model to generate answers. The flow is defined as a series of operations: RunnablePassthrough for direct query handling, ChatPromptTemplate for prompt construction, the LLM for response generation, and finally StrOutputParser to produce a clean text string.
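One caveat: as written, the template interpolates only {query}, so the documents returned under the 'context' key are not actually rendered into the prompt text. If you want the retrieved chunks injected explicitly, a minimal variant could look like the sketch below. This is our own addition, not part of the original notebook: the format_docs helper, the modified template, and the rag_chain_with_context name are assumptions, and it reuses the retriever, llm, and imports from the cells above.

```python
# Hedged sketch (not from the original tutorial): render the retrieved chunks
# into the prompt so the LLM can ground its answer in them.
def format_docs(docs):
    # Join the page contents of the retrieved chunks into one context string.
    return "\n\n".join(doc.page_content for doc in docs)

context_template = """<|context|>
You are an AI assistant that follows instruction extremely well.
Use the following context to answer truthfully and directly.
{context}</s>
<|user|>
{query}</s>
<|assistant|>
"""
context_prompt = ChatPromptTemplate.from_template(context_template)

rag_chain_with_context = (
    {'context': retriever | format_docs, 'query': RunnablePassthrough()}
    | context_prompt
    | llm
    | StrOutputParser()
)
```

Invoking rag_chain_with_context with a question behaves like the original chain, except that the top-k retrieved chunks now appear verbatim in the prompt. The <|context|>/<|user|>/<|assistant|> tags follow the article's template; whether they match the chat format BioMistral was fine-tuned on is worth verifying separately.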
Invoking the RAG Chain to Answer a Health-Related Query

```python
response = rag_chain.invoke("Why should I care about my heart health?")
to_markdown(response)
```

Now we call the previously constructed RAG chain with a user's query. The query is passed to the retriever, relevant context is pulled from the document collection, and that context is fed into the LLM to generate a concise, accurate answer, which to_markdown renders for display.
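For a medical chatbot it is often useful to show which PDF passages backed an answer. A minimal sketch of one way to do this, assuming the retriever and rag_chain defined above (this reuses get_relevant_documents from earlier and is not part of the original notebook):

```python
# Hedged sketch: surface the source files and pages behind an answer.
question = "Why should I care about my heart health?"
answer = rag_chain.invoke(question)
sources = retriever.get_relevant_documents(question)

print(answer)
for doc in sources:
    # PyPDF-based loaders typically record the file path and page number in metadata.
    print(f"- {doc.metadata.get('source')} (page {doc.metadata.get('page')})")
```

Because this reruns retrieval separately from the chain, it costs one extra similarity search; for production use you might restructure the chain to return both the answer and the supporting documents in a single call.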
In conclusion, by integrating BioMistral via LlamaCpp and taking advantage of LangChain's flexibility, we are able to build a context-aware medical RAG chatbot. From chunk-based indexing to a seamless RAG pipeline, it streamlines the process of mining large volumes of PDF data for relevant insights. Users receive clear and easily readable answers because the final responses are formatted in Markdown. This design can be extended or tailored for various domains, ensuring scalability and precision in knowledge retrieval across diverse documents.

Use the Colab Notebook here.