• How to Build a Powerful and Intelligent Question-Answering System by Using Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain Framework

    In this tutorial, we demonstrate how to build a powerful and intelligent question-answering system by combining the strengths of Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain framework. The pipeline leverages real-time web search using Tavily, semantic document caching with Chroma vector store, and contextual response generation through the Gemini model. These tools are integrated through LangChain’s modular components, such as RunnableLambda, ChatPromptTemplate, ConversationBufferMemory, and GoogleGenerativeAIEmbeddings. It goes beyond simple Q&A by introducing a hybrid retrieval mechanism that checks for cached embeddings before invoking fresh web searches. The retrieved documents are intelligently formatted, summarized, and passed through a structured LLM prompt, with attention to source attribution, user history, and confidence scoring. Key functions such as advanced prompt engineering, sentiment and entity analysis, and dynamic vector store updates make this pipeline suitable for advanced use cases like research assistance, domain-specific summarization, and intelligent agents.
    !pip install -qU langchain-community tavily-python langchain-google-genai streamlit matplotlib pandas tiktoken chromadb langchain_core pydantic langchain
    We install and upgrade a comprehensive set of libraries required to build an advanced AI search assistant. It includes tools for retrieval, LLM integration, data handling, visualization, and tokenization. These components form the core foundation for constructing a real-time, context-aware QA system.
    import os
    import getpass
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import json
    import time
    from typing import List, Dict, Any, Optional
    from datetime import datetime
    We import essential Python libraries used throughout the notebook. It includes standard libraries for environment variables, secure input, time tracking, and data types. Additionally, it brings in core data science tools like pandas, matplotlib, and numpy for data handling, visualization, and numerical computations, as well as json for parsing structured data.
    if "TAVILY_API_KEY" not in os.environ:
    os.environ= getpass.getpassif "GOOGLE_API_KEY" not in os.environ:
    os.environ= getpass.getpassimport logging
    logging.basicConfigs - %s - %s - %s')
    logger = logging.getLoggerWe securely initialize API keys for Tavily and Google Gemini by prompting users only if they’re not already set in the environment, ensuring safe and repeatable access to external services. It also configures a standardized logging setup using Python’s logging module, which helps monitor execution flow and capture debug or error messages throughout the notebook.
    from langchain_community.retrievers import TavilySearchAPIRetriever
    from langchain_community.vectorstores import Chroma
    from langchain_core.documents import Document
    from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
    from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
    from langchain_core.runnables import RunnablePassthrough, RunnableLambda
    from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.chains.summarize import load_summarize_chain
    from langchain.memory import ConversationBufferMemory
    We import key components from the LangChain ecosystem and its integrations. It brings in the TavilySearchAPIRetriever for real-time web search, Chroma for vector storage, and GoogleGenerativeAI modules for chat and embedding models. Core LangChain modules like ChatPromptTemplate, RunnableLambda, ConversationBufferMemory, and output parsers enable flexible prompt construction, memory handling, and pipeline execution.
    class SearchQueryError(Exception):
        """Exception raised for errors in the search query."""
        pass


    def format_docs(docs):
        formatted_content = []
        for i, doc in enumerate(docs):
            metadata = doc.metadata
            source = metadata.get('source', 'Unknown source')
            title = metadata.get('title', 'Untitled')
            score = metadata.get('score', 0)
            formatted_content.append(
                f"Document {i+1} [Score: {score:.2f}]:\n"
                f"Title: {title}\n"
                f"Source: {source}\n"
                f"Content: {doc.page_content}\n"
            )
        return "\n\n".join(formatted_content)

    We define two essential components for search and document handling. The SearchQueryError class creates a custom exception to manage invalid or failed search queries gracefully. The format_docs function processes a list of retrieved documents by extracting metadata such as title, source, and relevance score and formatting them into a clean, readable string.
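    To illustrate the expected output, here is a minimal, hypothetical usage sketch; the document content, URL, and score below are made up purely for demonstration:

    # Hypothetical example: formatting a single hand-built document
    sample_doc = Document(
        page_content="Breath of the Wild launched in March 2017.",
        metadata={"source": "https://example.com/botw", "title": "BotW overview", "score": 0.92},
    )
    print(format_docs([sample_doc]))
    # Prints a numbered block containing the title, source, score, and content
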
    class SearchResultsParser:
        def parse(self, text):
            try:
                if isinstance(text, str):
                    import re
                    import json
                    # Try to pull a JSON object out of the raw string
                    json_match = re.search(r'\{.*\}', text, re.DOTALL)
                    if json_match:
                        json_str = json_match.group(0)
                        return json.loads(json_str)
                    return {"answer": text, "sources": [], "confidence": 0.5}
                elif hasattr(text, 'content'):
                    # Chat message objects expose their text via .content
                    return {"answer": text.content, "sources": [], "confidence": 0.5}
                else:
                    return {"answer": str(text), "sources": [], "confidence": 0.5}
            except Exception as e:
                logger.warning(f"Failed to parse JSON: {e}")
                return {"answer": str(text), "sources": [], "confidence": 0.5}
    The SearchResultsParser class provides a robust method for extracting structured information from LLM responses. It attempts to parse a JSON-like string from the model output, falling back to a plain-text response format if parsing fails. It gracefully handles both string outputs and message objects, ensuring consistent downstream processing. In case of errors, it logs a warning and returns a fallback response containing the raw answer, empty sources, and a default confidence score, enhancing the system’s fault tolerance.
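    As a quick sanity check, the parser can be exercised on raw strings (the model outputs below are hypothetical, used only for illustration):

    # Hypothetical example: a JSON-bearing string and a plain-text fallback
    parser = SearchResultsParser()
    structured = parser.parse('{"answer": "2017", "sources": ["doc1"], "confidence": 0.9}')
    fallback = parser.parse("The game was released in 2017.")
    print(structured["confidence"])   # parsed from the embedded JSON
    print(fallback["confidence"])     # default 0.5 when no JSON is found
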
    class EnhancedTavilyRetriever:
        def __init__(self, api_key=None, max_results=5, search_depth="advanced", include_domains=None, exclude_domains=None):
            self.api_key = api_key
            self.max_results = max_results
            self.search_depth = search_depth
            self.include_domains = include_domains or []
            self.exclude_domains = exclude_domains or []
            self.retriever = self._create_retriever()
            self.previous_searches = []

        def _create_retriever(self):
            try:
                return TavilySearchAPIRetriever(
                    api_key=self.api_key,
                    k=self.max_results,
                    search_depth=self.search_depth,
                    include_domains=self.include_domains,
                    exclude_domains=self.exclude_domains
                )
            except Exception as e:
                logger.error(f"Failed to create Tavily retriever: {e}")
                raise

        def invoke(self, query, **kwargs):
            if not query or not query.strip():
                raise SearchQueryError("Empty search query")

            try:
                start_time = time.time()
                results = self.retriever.invoke(query, **kwargs)
                end_time = time.time()

                # Record per-query telemetry for later analysis
                search_record = {
                    "timestamp": datetime.now().isoformat(),
                    "query": query,
                    "num_results": len(results),
                    "response_time": end_time - start_time
                }
                self.previous_searches.append(search_record)

                return results
            except Exception as e:
                logger.error(f"Search failed: {e}")
                raise SearchQueryError(f"Failed to perform search: {str(e)}")

        def get_search_history(self):
            return self.previous_searches
    The EnhancedTavilyRetriever class is a custom wrapper around the TavilySearchAPIRetriever, adding greater flexibility, control, and traceability to search operations. It supports advanced features like limiting search depth, domain inclusion/exclusion filters, and configurable result counts. The invoke method performs web searches and tracks each query’s metadata, storing it for later analysis.
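    For example, a domain-restricted retriever can be created and its telemetry inspected as follows; this is only a sketch, the domain filter is illustrative, and running it performs a live web search that requires a valid TAVILY_API_KEY:

    # Sketch: domain-filtered retrieval plus query telemetry
    wiki_retriever = EnhancedTavilyRetriever(
        max_results=3,
        search_depth="advanced",
        include_domains=["wikipedia.org"],   # illustrative filter, adjust as needed
    )
    docs = wiki_retriever.invoke("Nintendo Switch launch year")
    print(len(docs), "results")
    print(wiki_retriever.get_search_history()[-1]["response_time"])
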
    class SearchCache:
        def __init__(self):
            self.embedding_function = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
            self.vector_store = None
            self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

        def add_documents(self, documents):
            if not documents:
                return

            try:
                if self.vector_store is None:
                    self.vector_store = Chroma.from_documents(
                        documents=documents,
                        embedding=self.embedding_function
                    )
                else:
                    self.vector_store.add_documents(documents)
            except Exception as e:
                logger.error(f"Failed to add documents to cache: {e}")

        def search(self, query, k=3):
            if self.vector_store is None:
                return []

            try:
                return self.vector_store.similarity_search(query, k=k)
            except Exception as e:
                logger.error(f"Vector search failed: {e}")
                return []

    The SearchCache class implements a semantic caching layer that stores and retrieves documents using vector embeddings for efficient similarity search. It uses GoogleGenerativeAIEmbeddings to convert documents into dense vectors and stores them in a Chroma vector database. The add_documents method initializes or updates the vector store, while the search method enables fast retrieval of the most relevant cached documents based on semantic similarity. This reduces redundant API calls and improves response times for repeated or related queries, serving as a lightweight hybrid memory layer in the AI assistant pipeline.
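    A minimal usage sketch of the cache in isolation (embedding calls require a valid GOOGLE_API_KEY, and the document below is fabricated for illustration):

    # Sketch: caching a document and retrieving it by semantic similarity
    cache = SearchCache()
    cache.add_documents([
        Document(page_content="Breath of the Wild was released in 2017.",
                 metadata={"source": "https://example.com/botw"}),
    ])
    hits = cache.search("When did Breath of the Wild come out?", k=1)
    print(hits[0].page_content if hits else "cache miss")
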
    search_cache = SearchCache()
    enhanced_retriever = EnhancedTavilyRetriever(max_results=5)
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

    system_template = """You are a research assistant that provides accurate answers based on the search results provided.
    Follow these guidelines:
    1. Only use the context provided to answer the question
    2. If the context doesn't contain the answer, say "I don't have sufficient information to answer this question."
    3. Cite your sources by referencing the document numbers
    4. Don't make up information
    5. Keep the answer concise but complete

    Context: {context}
    Chat History: {chat_history}
    """

    system_message = SystemMessagePromptTemplate.from_template(system_template)
    human_template = "Question: {question}"
    human_message = HumanMessagePromptTemplate.from_template(human_template)
    prompt = ChatPromptTemplate.from_messages([system_message, human_message])

    We initialize the core components of the AI assistant: a semantic SearchCache, the EnhancedTavilyRetriever for web-based querying, and a ConversationBufferMemory to retain chat history across turns. It also defines a structured prompt using ChatPromptTemplate, guiding the LLM to act as a research assistant. The prompt enforces strict rules for factual accuracy, context usage, source citation, and concise answering, ensuring reliable and grounded responses.
    def get_llm(model_name="gemini-2.0-flash-lite", temperature=0.2, response_mode="json"):
        try:
            return ChatGoogleGenerativeAI(
                model=model_name,
                temperature=temperature,
                convert_system_message_to_human=True,
                top_p=0.95,
                top_k=40,
                max_output_tokens=2048
            )
        except Exception as e:
            logger.error(f"Failed to initialize LLM: {e}")
            raise

    output_parser = SearchResultsParser()

    We define the get_llm function, which initializes a Google Gemini language model with configurable parameters such as model name, temperature, and decoding settings (top_p, top_k, and maximum output tokens). It ensures robustness with error handling for failed model initialization. An instance of SearchResultsParser is also created to standardize and structure the LLM’s raw responses, enabling consistent downstream processing of answers and metadata.
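    Because the factory exposes its parameters, different handles can coexist for different tasks; this sketch assumes GOOGLE_API_KEY is already set and makes a live model call:

    # Sketch: two LLM handles with different decoding behavior
    answer_llm = get_llm()                        # defaults: gemini-2.0-flash-lite, temperature 0.2
    deterministic_llm = get_llm(temperature=0.0)  # used later for summarization and query analysis
    print(deterministic_llm.invoke("Reply with the single word: ready").content)
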
    def plot_search_metrics(search_history):
        if not search_history:
            print("No search history available")
            return

        df = pd.DataFrame(search_history)

        plt.figure(figsize=(12, 6))

        plt.subplot(1, 2, 1)
        plt.plot(range(len(df)), df['response_time'], marker='o')
        plt.title('Search Response Times')
        plt.xlabel('Search Index')
        plt.ylabel('Time (seconds)')
        plt.grid(True)

        plt.subplot(1, 2, 2)
        plt.bar(range(len(df)), df['num_results'])
        plt.title('Number of Results per Search')
        plt.xlabel('Search Index')
        plt.ylabel('Number of Results')
        plt.grid(True)

        plt.tight_layout()
        plt.show()

    The plot_search_metrics function visualizes performance trends from past queries using Matplotlib. It converts the search history into a DataFrame and plots two subgraphs: one showing response time per search and the other displaying the number of results returned. This aids in analyzing the system’s efficiency and search quality over time, helping developers fine-tune the retriever or identify bottlenecks in real-world usage.
    def retrieve_with_fallback(query):
        cached_results = search_cache.search(query)

        if cached_results:
            logger.info(f"Retrieved {len(cached_results)} documents from cache")
            return cached_results

        logger.info("No cache hit, performing web search")
        search_results = enhanced_retriever.invoke(query)
        search_cache.add_documents(search_results)
        return search_results


    def summarize_documents(documents, query):
        llm = get_llm(temperature=0)
        summarize_prompt = ChatPromptTemplate.from_template(
            """Create a concise summary of the following documents related to this query: {query}

    {documents}

    Provide a comprehensive summary that addresses the key points relevant to the query.
    """
        )

        chain = (
            {"documents": lambda docs: format_docs(docs), "query": lambda _: query}
            | summarize_prompt
            | llm
            | StrOutputParser()
        )

        return chain.invoke(documents)

    These two functions enhance the assistant’s intelligence and efficiency. The retrieve_with_fallback function implements a hybrid retrieval mechanism: it first attempts to fetch semantically relevant documents from the local Chroma cache and, if unsuccessful, falls back to a real-time Tavily web search, caching the new results for future use. Meanwhile, summarize_documents leverages a Gemini LLM to generate concise summaries from retrieved documents, guided by a structured prompt that ensures relevance to the query. Together, they enable low-latency, informative, and context-aware responses.
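    The caching behavior can be observed by issuing the same query twice; the second call should be served from Chroma rather than Tavily. This is only a sketch, and running it assumes both API keys are set and performs live calls:

    # Sketch: first call hits the web, second call should hit the semantic cache
    first = retrieve_with_fallback("breath of the wild release date")
    second = retrieve_with_fallback("breath of the wild release date")
    print(len(first), len(second))
    print(summarize_documents(second, "breath of the wild release date"))
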
    def advanced_chain(query_engine="enhanced", model="gemini-1.5-pro", include_history=True):
        llm = get_llm(model_name=model)

        if query_engine == "enhanced":
            retriever = lambda query: retrieve_with_fallback(query)
        else:
            retriever = enhanced_retriever.invoke

        def chain_with_history(input_dict):
            query = input_dict["question"]
            chat_history = memory.load_memory_variables({})["chat_history"] if include_history else []
            docs = retriever(query)
            context = format_docs(docs)

            prompt_value = prompt.invoke({
                "context": context,
                "question": query,
                "chat_history": chat_history
            })

            # Generate the answer, then record the turn in conversation memory
            response = llm.invoke(prompt_value)
            memory.save_context({"input": query}, {"output": response.content})

            return response

        return RunnableLambda(chain_with_history) | StrOutputParser()

    The advanced_chain function defines a modular, end-to-end reasoning workflow for answering user queries using cached or real-time search. It initializes the specified Gemini model, selects the retrieval strategy (cached fallback or direct search), constructs a response pipeline incorporating chat history when enabled, formats documents into context, and prompts the LLM using a system-guided template. The chain also logs the interaction in memory and returns the final answer, parsed into clean text. This design enables flexible experimentation with models and retrieval strategies while maintaining conversation coherence.
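    Because the builder exposes its knobs as arguments, alternative configurations are easy to spin up, for example a chain that always searches the web directly and ignores prior turns. The snippet below is a sketch of a live call, with the "standard" engine name chosen only to bypass the cached path:

    # Sketch: bypass the semantic cache and disable chat history
    direct_chain = advanced_chain(query_engine="standard", include_history=False)
    print(direct_chain.invoke({"question": "who directed breath of the wild?"}))
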
    qa_chain = advanced_chain()


    def analyze_query(query):
        llm = get_llm(temperature=0)
        analysis_prompt = ChatPromptTemplate.from_template(
            """Analyze the following query and provide:
    1. Main topic
    2. Sentiment (positive, negative, neutral)
    3. Key entities mentioned
    4. Query type (factual, opinion, how-to, etc.)

    Query: {query}

    Return the analysis in JSON format with the following structure:
    {{
        "topic": "main topic",
        "sentiment": "sentiment",
        "entities": ["entity1", "entity2"],
        "type": "query type"
    }}
    """
        )

        chain = analysis_prompt | llm | output_parser

        return chain.invoke({"query": query})


    print("Advanced Tavily-Gemini Implementation")
    print("=" * 50)

    query = "what year was breath of the wild released and what was its reception?"
    print(f"Query: {query}")

    We initialize the final components of the intelligent assistant. qa_chain is the assembled reasoning pipeline ready to process user queries using retrieval, memory, and Gemini-based response generation. The analyze_query function performs a lightweight semantic analysis on a query, extracting the main topic, sentiment, entities, and query type using the Gemini model and a structured JSON prompt. The example query, about Breath of the Wild’s release and reception, showcases how the assistant is triggered and prepared for full-stack inference and semantic interpretation. The printed heading marks the start of interactive execution.
    try:
        print("\nSearching for answer...")
        answer = qa_chain.invoke({"question": query})
        print("\nAnswer:")
        print(answer)

        print("\nAnalyzing query...")
        try:
            query_analysis = analyze_query(query)
            print("\nQuery Analysis:")
            print(json.dumps(query_analysis, indent=2))
        except Exception as e:
            print(f"Query analysis error (non-critical): {e}")
    except Exception as e:
        print(f"Error in search: {e}")

    history = enhanced_retriever.get_search_history()
    print("\nSearch History:")
    for i, h in enumerate(history):
        print(f"{i+1}. Query: {h['query']} - Results: {h['num_results']} - Time: {h['response_time']:.2f}s")

    print("\nAdvanced search with domain filtering:")
    specialized_retriever = EnhancedTavilyRetriever(
        max_results=3,
        search_depth="advanced",
        include_domains=["nintendo.com", "zelda.com"],
        exclude_domains=["reddit.com", "twitter.com"]
    )

    try:
        specialized_results = specialized_retriever.invoke("breath of the wild sales")
        print(f"Found {len(specialized_results)} specialized results")

        summary = summarize_documents(specialized_results, "breath of the wild sales")
        print("\nSummary of specialized results:")
        print(summary)
    except Exception as e:
        print(f"Error in specialized search: {e}")

    print("\nSearch Metrics:")
    plot_search_metrics(history)

    We demonstrate the complete pipeline in action. It performs a search using the qa_chain, displays the generated answer, and then analyzes the query for sentiment, topic, entities, and type. It also retrieves and prints each query’s search history, response time, and result count. Finally, it runs a domain-filtered search focused on Nintendo-related sites, summarizes the results, and visualizes search performance using plot_search_metrics, offering a comprehensive view of the assistant’s capabilities in real-time use.
    In conclusion, following this tutorial gives users a comprehensive blueprint for creating a highly capable, context-aware, and scalable RAG system that bridges real-time web intelligence with conversational AI. The Tavily Search API lets users directly pull fresh and relevant content from the web. The Gemini LLM adds robust reasoning and summarization capabilities, while LangChain’s abstraction layer allows seamless orchestration between memory, embeddings, and model outputs. The implementation includes advanced features such as domain-specific filtering, query analysis, and fallback strategies using a semantic vector cache built with Chroma and GoogleGenerativeAIEmbeddings. Also, structured logging, error handling, and analytics dashboards provide transparency and diagnostics for real-world deployment.

    Check out the Colab Notebook for the full implementation.
  • Sierra made the games of my childhood. Are they still fun to play?

    Image: Sludge Vohaul!

    Get ready for some nostalgia.

    Nate Anderson | May 17, 2025 7:00 am

    My Ars colleagues were kicking back at the Orbital HQ water cooler the other day, and—as gracefully aging gamers are wont to do—they began to reminisce about classic Sierra On-Line adventure games. I was a huge fan of these games in my youth, so I settled in for some hot buttered nostalgia.
    Would we remember the limited-palette joys of early King's Quest, Space Quest, or Quest for Glory titles? Would we branch out beyond games with "Quest" in their titles, seeking rarer fare like Freddy Pharkas: Frontier Pharmacist? What about the gothic stylings of The Colonel's Bequest or the voodoo-curious Gabriel Knight?
    Nope. The talk was of acorns. Dry acorns, in fact!
    The scene in question came from King's Quest III, where our hero Gwydion must acquire some exceptionally desiccated acorns to advance the plot. It sounds simple enough. As one walkthrough puts it, "Go east one screen and north one screen to the acorn tree. Try picking up acorns until you get some dry ones. Try various spots underneath the tree." Easy! And clear!
    Except it wasn't either one, because the game rather notoriously won't always give you the acorns, even when you enter the right command. This led many gamers to believe they were in the wrong spot, when in reality, they just had to keep entering the "get acorns" command while moving pixel by pixel around the tree until the game finally supplied them. One of our staffers admitted to having purchased the King's Quest III hint book solely because of this "puzzle."

    This wasn't quite the "fun" I had remembered from these games, but as I cast my mind back, I dimly began to recall similar situations. Space Quest II: Vohaul's Revenge had been my first Sierra title, and after my brother and I spent weeks on the game only to get stuck and die repeatedly in some pitch-dark tunnels, we implored my dad to call Sierra's 1-900 pay hint line. He thought about it. I could see it pained him because he had never before called a 1-900 number in his life. In this case, the call cost a piratical 75 cents for the first minute and 50 cents for each additional minute. But after listening to us whine for several days straight, my dad decided that his sanity was worth the fee, and he called.

    Much like with the acorn example above, we had known what to do—we had just not done it to the game's rather exacting and sometimes obscure standards. The key was to use a glowing gem as a light source, which my brother and I had long understood. The problem was the text parser, which demanded that we "put gem in mouth" to use its light in the tunnels. There was no other place to put the gem, no other way to hold or attach it. (We tried them all.) No other attempts to use the light of this shining crystal, no matter how clear, well-intentioned, or succinctly expressed, would work. You put the gem in your mouth, or you died in the darkness.
    Returning from my reveries to the conversation at hand, I caught Ars Senior Editor Lee Hutchinson's cynical remark that these kinds of puzzles were "the only way to make 2–3 hours of 'game' last for months." This seemed rather shocking, almost offensive. How could one say such a thing about the games that colored my memories of childhood?
    So I decided to replay Space Quest II for the first time in 35 years in an attempt to defend my own past.
    Big mistake.

    We're not on Endor anymore, Dorothy.

    Play it again, Sam
    In my memory, the Space Quest series was filled with sharply written humor, clever puzzles, and enchanting art. But when I fired up the original version of the game, I found that only one of these was true. The art, despite its blockiness and limited colors, remained charming.
    As for the gameplay, the puzzles were not so much "clever" as "infuriating," "obvious," or (more often) "rather obscure."
    Finding the glowing gem discussed above requires you to swim into one small spot of a multi-screen river, with no indication in advance that anything of importance is in that exact location. Trying to "call" a hunter who has captured you does nothing… until you do it a second time. And the less said about trying to throw a puzzle at a Labian Terror Beast, typing out various word permutations while death bears down upon you, the better.

    The whole game was also filled with far more no-warning insta-deaths than I had remembered. On the opening screen, for instance, after your janitorial space-broom floats off into the cosmic ether, you can walk your character right off the edge of the orbital space station he is cleaning. The game doesn't stop you; indeed, it kills you and then mocks you for "an obvious lack of common sense." It then calls you a "wing nut" with an "inability to sustain life." Game over.
    The game's third screen, which features nothing more to do than simply walking around, will also kill you in at least two different ways. Walk into the room still wearing your spacesuit and your boss will come over and chew you out. Game over.
    If you manage to avoid that fate by changing into your indoor uniform first, it's comically easy to tap the wrong arrow key and fall off the room's completely guardrail-free elevator platform. Game over.

    Do NOT touch any part of this root monster.

    Get used to it because the game will kill you in so, so many ways: touching any single pixel of a root monster whose branches form a difficult maze; walking into a giant mushroom; stepping over an invisible pit in the ground; getting shot by a guard who zips in on a hovercraft; drowning in an underwater tunnel; getting swiped at by some kind of giant ape; not putting the glowing gem in your mouth; falling into acid; and many more.
    I used the word "insta-death" above, but the game is not even content with this. At one key point late in the game, a giant Aliens-style alien stalks the hallways, and if she finds you, she "kisses" you. But then she leaves! You are safe after all! Of course, if you have seen the films, you will recognize that you are not safe, but the game lets you go on for a bit before the alien's baby inevitably bursts from your chest, killing you. Game over.

    This is why the official hint book suggests that you "save your game a lot, especially when it seems that you're entering a dangerous area. That way, if you die, you don't have to retrace your steps much." Presumably, this was once considered entertaining.
    When it comes to the humor, most of it is broad. (When you are told to "say the word," you have to say "the word.") Sometimes it is condescending. ("You quickly glance around the room to see if anyone saw you blow it.") Or it might just be potty jokes. (Plungers, jock straps, toilet paper, alien bathrooms, and fouling one's trousers all make appearances.)
    My total gameplay time: a few hours.
    "By Grabthar's hammer!" I thought. "Lee was right!"

    When I admitted this to him, Lee told me that he had actually spent time learning to speedrun the Space Quest games during the pandemic. "According to my notes, a clean run of SQ2 in 'fast' mode—assuming good typing skills—takes about 20 minutes straight-up," he said. Yikes.

    What a fiendish plot!

    And yet
    The past was a different time. Computer memory was small, graphics capabilities were low, and computer games had emerged from the "let them live just long enough to encourage spending another quarter" arcade model. Mouse adoption took a while; text parsers made sense even though they created plenty of frustration. So yes—some of these games were a few hours of gameplay stretched out with insta-death, obscure puzzles, and the sheer amount of time it took just to walk across the game's various screens. (Seriously, "walking around" took a ridiculous amount of the game's playtime, especially when a puzzle made you backtrack three screens, type some command, and then return.)

    Let's get off this rock.

    Judged by current standards, the Sierra games are no longer what I would play for fun.
    All the same, I loved them. They introduced me to the joy of exploring virtual worlds and to the power of evocative artwork. I went into space, into fairy tales, and into the past, and I did so while finding the games' humor humorous and their plotlines compelling. ("An army of life insurance salesmen?" I thought at the time. "Hilarious and brilliant!") If the games can feel a bit arbitrary or vexing today, my child-self's love of repetition was able to treat them as engaging challenges rather than "unfair" design.
    Replaying Space Quest II, encountering the half-remembered jokes and visual designs, brought back these memories. The novelist Thomas Wolfe knew that you can't go home again, and it was probably inevitable that the game would feel dated to me now. But playing it again did take me back to that time before the Internet, when not even hint lines, insta-death, and EGA graphics could dampen the wonder of the new worlds computers were capable of showing us.

    Literal bathroom humor.

    Space Quest II, along with several other Sierra titles, is freely and legally available online at sarien.net—though I found many, many glitches in the implementation. Windows users can buy the entire Space Quest collection through Steam or Good Old Games. There's even a fan remake that runs on macOS, Windows, and Linux.

    Nate Anderson
    Deputy Editor

    Nate is the deputy editor at Ars Technica. His most recent book is In Emergency, Break Glass: What Nietzsche Can Teach Us About Joyful Living in a Tech-Saturated World, which is much funnier than it sounds.

  • Visual Grounding for Advanced RAG Frameworks

    Author(s): Felix Pappe
    Originally published on Towards AI.

    Image created by the author using gpt-image-1
    AI chatbots and advanced Retrieval-Augmented Generation (RAG) systems are increasingly adept at providing up-to-date, context-aware answers based on previously retrieved text chunks.
    However, despite their seemingly reliable responses, a significant issue remains: users often lack a clear way to verify the source of these answers without returning to the original, often lengthy, documents themselves.
    This is particularly cumbersome when the source is a multi-page academic paper, technical manual, or book.
    Even when a link is provided, users are left searching through dozens of pages, trying to locate the exact section that the chatbot used for the generation of the final response.

    This manual cross-checking is not only time-consuming but also undermines trust in the LLM’s output. As a result, AI-generated answers may appear correct at first glance yet remain opaque, leaving users uncertain about their reliability.
    This lack of verifiability can lead to misinformation and misinterpretation.
    In this blog post, I would like to offer a solution to this problem with a visual grounding approach using the Docling parsing tool, Qdrant vector store, and LangChain.

    This RAG framework doesn’t just retrieve relevant text.
    It also highlights the exact location of the extracted text directly on the page of the source document. By connecting answers to their visual origin, it makes the LLM’s output easy to verify. The result is a transparent, verifiable, and user-friendly RAG framework that builds trust while maintaining accuracy, and it is what we build in this blog post.
    Docling
    The foundation of the visual grounding approach introduced in this blog post is the Docling document processing pipeline.
    Docling is an open-source tool for layout-aware document parsing and grounding, achieving results comparable to paid solutions like Mistral OCR. Moreover, Docling provides an additional key feature for visual grounding that other document-to-markdown solutions don’t have.

    This feature is the decomposition of the input document into smaller sub-elements, including headings, text chunks, formulas, and tables, using different models in a sophisticated processing pipeline, which is presented in the following image.
    Docling pipeline (from the Docling paper)
    The output of this processing pipeline is not a markdown file but a DoclingDocument, which consists of all the detected and extracted elements from the input document.

    This intermediate DoclingDocument class, enhanced with metadata from the extracted elements, allows the original document to be transformed into various file types and supports the visual grounding discussed in this blog post.
    RAG framework
    Like all RAG frameworks, this one consists of two phases that can be divided into two scripts.
    In the offline indexing phase, input documents are split into chunks and encoded into vector representations.
    These vectors are stored in a specialized vector database for later retrieval. In the online retrieval and generation phase, text chunks related to a user’s input are retrieved and passed to the LLM to generate a final response. These two phases are implemented in two separate scripts in this post. The following image illustrates the final visual grounding result produced by the two scripts explained in this blog post.
    In the first Python script, the Docling paper is uploaded to a Qdrant vector store during the offline indexing phase.
    In the second script, relevant passages are retrieved based on a given question and are then highlighted directly on the document.
    Image created by the author and designed in Canva
    Indexing phase
    The indexing script handles data preprocessing and stores the embedded text chunks in a vector database for the second online retrieval phase.
    I split the script into several parts to explain in detail what happens at each part, providing a deeper understanding of the entire code.
    Imports
    Let’s start very gently with the import of all required libraries and packages, including docling, langchain, and qdrant.
    import os
    from pathlib import Path
    from uuid import uuid4

    from dotenv import load_dotenv
    from transformers import AutoTokenizer
    from qdrant_client import QdrantClient
    from qdrant_client.http.models import Distance, VectorParams
    from langchain_qdrant import QdrantVectorStore
    from langchain_huggingface.embeddings import HuggingFaceEmbeddings
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.chunking import HybridChunker
    from langchain_docling.loader import DoclingLoader, ExportType
    from docling_core.types.doc import ImageRefMode
    Configuration and environment variables
    In the next step, the environment variables are loaded from the .env file and read in with getenv().
    Inside the .env file, your HF_TOKEN for the embedding model and your MISTRAL_API_KEY for the LLM must be included.

    Of course, you can also adjust the code to your needs and choose any other embedding model or LLM.
    But if you change the embedding model, also change the DIM variable, which refers to the final vector dimensions of the embedded chunks. Afterwards, the Docling processing pipeline is defined, enabling all functionalities to generate a complete DoclingDocument.
    This pipeline includes the detection and extraction of code blocks, formulas, tables, pages, and page images of the document.
    Moreover, the extracted images are scaled by 2.0 for a higher resolution.
    load_dotenv()

    HF_TOKEN = os.getenv("HF_TOKEN")
    MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")

    SOURCES = ["attention-04.pdf"]
    OUTPUT_DIR = Path("output")
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

    EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
    COLLECTION_NAME = "demo_collection"
    DIM = 384

    pipeline_options = PdfPipelineOptions(
        do_code_enrichment=True,
        do_formula_enrichment=True,
        do_table_structure=True,
        generate_picture_images=True,
        generate_page_images=True,
        images_scale=2.0,
    )
    pipeline_options.table_structure_options.do_cell_matching = True
    Setting up document converter
    In the next step, the DocumentConverter from Docling is configured using the previously defined pipeline options.
    Later in the code, this converter will be used to transform an input PDF into a DoclingDocument, which consists of modular components such as headings, text paragraphs, images, and tables.
    Furthermore, the embedding model is initialized, which is necessary to embed the extracted sentences from the document into a vector representation.
    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )
    embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
    Initialize qdrant collection
    A local Qdrant vector storage is then set up, allowing you to experiment with the script on your device.
    It either creates a new vector storage if none exists with the current name or connects to an existing one if there is a match.
    client = QdrantClient(path="langchain_qdrant")

    collections = [col.name for col in client.get_collections().collections]
    if COLLECTION_NAME not in collections:
        client.create_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(size=DIM, distance=Distance.COSINE),
        )
        print(f"Created new collection '{COLLECTION_NAME}'.")
    else:
        print(f"Using existing collection '{COLLECTION_NAME}'.")

    vector_store = QdrantVectorStore(
        client=client,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
    )
    Convert pdf to docling and save outputs
    Subsequently, all PDF files listed in SOURCES are converted into DoclingDocuments and saved in dl_doc.
    These documents are then transformed into JSON and markdown files, which are stored on your local device.
    The markdown file is used to assess the quality of the file transformation process, while the JSON file is necessary for the subsequent visual grounding process in the second script.
    for source in SOURCES:
        dl_doc = converter.convert(source=source).document

        # JSON export
        out_json = OUTPUT_DIR / f"{dl_doc.origin.binary_hash}.json"
        dl_doc.save_as_json(out_json)

        # Markdown export with embedded images
        out_md = OUTPUT_DIR / f"{dl_doc.origin.binary_hash}.md"
        dl_doc.save_as_markdown(out_md, image_mode=ImageRefMode.EMBEDDED)
    Chunking the document
    Finally, we come to the main part of the script: chunking the document into smaller text passages.
    This is achieved using the HybridChunker provided by docling.
    The best feature of this HybridChunker is that it tries to keep related passages together based on the markdown formatting and merges passages with each other if they are too small.
    A small max_tokens size has been selected to implement a small-to-big retrieval approach.
    This means that initially, a small chunk of text that closely matches the user's query is retrieved.
    Following this, a larger context chunk that surrounds the retrieved section is additionally retrieved and provided to the language model for generating the final answer.
    In this case, the larger context chunk refers to the paragraph containing the smaller chunk.
    chunker = HybridChunker(
        tokenizer=EMBED_MODEL,
        max_tokens=64,
        merge_peers=True,
    )

    loader = DoclingLoader(
        file_path=SOURCES,
        converter=converter,
        export_type=ExportType.DOC_CHUNKS,
        chunker=chunker,
    )
    docs = loader.load()
    Embedding the document
    In the final step, the generated chunks are stored in the vector database with a unique identifier, from which they can be retrieved during the online phase.
    ids = [str(uuid4()) for _ in docs]
    vector_store.add_documents(documents=docs, ids=ids)
    print("Documents have been embedded into the vector store.")
    Retrieval and generation part
    Now, the online retrieval and generation part leverages the previously embedded text chunks to generate an answer for the user’s input based on the knowledge in the vector store.
    Imports
    As in the previous indexing script, all the necessary packages are included first.
    import os
    import re
    from pathlib import Path

    from dotenv import load_dotenv
    from PIL import ImageDraw, Image
    from pydantic import BaseModel, Field
    from qdrant_client import QdrantClient
    from langchain_core.output_parsers import PydanticOutputParser
    from langchain_core.prompts import PromptTemplate
    from langchain_huggingface.embeddings import HuggingFaceEmbeddings
    from langchain_mistralai import ChatMistralAI
    from langchain_qdrant import QdrantVectorStore
    from docling.chunking import DocMeta
    from docling.datamodel.document import DoclingDocument
    Configuration and environment variables
    Then, the necessary environment variables are loaded and configuration variables are set.
    load_dotenv()

    MISTRAL_API_KEY = os.environ["MISTRAL_API_KEY"]
    OUTPUT_DIR = Path("output")
    EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
    COLLECTION_NAME = "demo_collection"
    QUESTION = (
        "How does attention is computed in the transformer architecture?"
    )
    TOP_K = 3

    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    Build JSON lookup table
    In the next step, all JSON files in the OUTPUT directory are detected.
    For each file, its name is extracted using the stem attribute.
    Afterwards, it’s verified that the name consists only of digits, representing the hash value of the file, before adding the numeric name as a key and the file’s path as a value in the doc_store dictionary.
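    The lookup code itself is not reproduced in this excerpt; a minimal sketch consistent with the description above and with the doc_store variable used in the later snippets (the int conversion assumes the filenames are the numeric binary_hash values written by the indexing script) could look like this:
    doc_store: dict[int, Path] = {}
    for json_path in OUTPUT_DIR.glob("*.json"):
        name = json_path.stem          # filename without the .json extension
        if name.isdigit():             # keep only files named after a document's binary hash
            doc_store[int(name)] = json_path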
    Initializing embedding model, vector store, and llm
    After that, the embedding model and the LLM are initialized.
    It is important to use the same embedding model and Qdrant collection name as in the previous indexing script.
    Optionally, you may want to define these names in a separate configuration script and import this script into both the indexing and retrieval scripts.
    For the LLM, I selected Mistral, but you can also choose GPT-4, Gemini, Llama, or any other model you prefer.
    embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
    client = QdrantClient(path="langchain_qdrant")

    try:
        client.get_collection(collection_name=COLLECTION_NAME)
    except Exception:
        print(
            f"Collection {COLLECTION_NAME} not found; create it before indexing."
        )

    vector_store = QdrantVectorStore(
        client=client,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
    )

    llm = ChatMistralAI(
        api_key=MISTRAL_API_KEY,
        model="mistral-large-latest",
        temperature=0,
        max_retries=2,
    )
    Preparing the chain
    Thereafter, all components of the RAG chain are set up, beginning with the Answer Pydantic schema, which consists of two fields.

    The first field, answerable, is a boolean value that indicates whether the question can be answered using the retrieved knowledge.
    The second field, answer, provides the actual response.
    This structure is employed to address the hallucination issue that many LLMs encounter, helping to prevent the dissemination of incorrect information to the user. Following that, the PydanticOutputParser and its get_format_instructions() method generate the format instructions for the LLM.
    Moreover, the parser is employed again as the final Runnable of the LangChain execution chain, with its elements connected by the | operator.
    The prompt created for the PromptTemplate is basic and can be further enhanced by the introduction of few-shot examples or other advanced prompt engineering techniques.
    class Answer(BaseModel):
        answerable: bool = Field(
            ..., description="Whether the question can be answered"
        )
        answer: str = Field(
            ..., description="The answer based on the provided knowledge"
        )

    parser = PydanticOutputParser(pydantic_object=Answer)
    format_instructions = parser.get_format_instructions()

    prompt = PromptTemplate(
        input_variables=["knowledge", "topic"],
        partial_variables={"format_instructions": format_instructions},
        template=(
            "You are given the following grounding material:\n\n"
            "{knowledge}\n\n"
            "Question:\n"
            "{topic}\n\n"
            "Please provide a concise answer in full sentences "
            "based solely on the information above.\n"
            "If the answer is not contained within the provided material, "
            "reply with:\n"
            "“There is no answer to this question in the provided material.”\n\n"
            "{format_instructions}"
        ),
    )

    rag_chain = prompt | llm | parser
    Run similarity search
    Once the chain has been set up, the text chunks semantically related to the user’s input query are retrieved.
    results = vector_store.similarity_search_with_score(
        k=TOP_K, query=QUESTION
    )
    Load and assemble grounding texts
    Based on the retrieved text chunks from the vector database, the entire paragraph containing each chunk is loaded.
    This approach is inspired by the small-to-big retrieval technique introduced earlier.
    Image created by the author, designed with canva and graphics generated using gpt-image-1
    The relevant paragraph is pulled from the JSON file associated with the document from which the chunk originates.
    The correct JSON file is identified by the document’s hash value, which appears both in the text chunk’s metadata and in the JSON filename.
    Once the correct JSON file is identified, it’s loaded via the load_from_json method of the DoclingDocument.
    The original text item referenced in the current result chunk’s metadata is then extracted from the JSON file using a regular expression.
    If the referenced text item is found, the full text passage is retrieved to generate the final result.
    This example focuses exclusively on text grounding.
    However, Docling also provides references to previously identified images and tables.
    grounding_texts: list[str] = []

    for res, score in results:
        meta = DocMeta.model_validate(res.metadata["dl_meta"])
        h = meta.origin.binary_hash
        json_file = doc_store.get(h)
        if not json_file:
            continue

        dl_doc = DoclingDocument.load_from_json(json_file)

        for item in meta.doc_items:
            if not item.prov:
                continue
            match = re.search(r"^#/texts/(\d+)$", item.self_ref)
            if not match:
                continue
            idx = int(match.group(1))
            grounding_texts.append(dl_doc.texts[idx].text.strip())

    knowledge = "\n\n".join(grounding_texts)
    print("Assembled grounding material:\n", knowledge)
    Invoke the llm
    Finally, the LLM can be invoked using the previously defined chain.
    Inside the invoke method, the retrieved knowledge passages from the original JSON file and the user’s input question are passed in.
    The returned value conforms to the defined Answer Pydantic schema, allowing a structured evaluation of the results.
    answer_obj = rag_chain.invoke(
        {"knowledge": knowledge, "topic": QUESTION}
    )
    print("Answerable?", answer_obj.answerable)
    print("Answer: ", answer_obj.answer)
    Visual grounding
    The subsequent script visually grounds the used chunks in the corresponding documents only if the question can be answered based on the provided content, meaning the LLM returned answerable as true.
    This grounding process is approached similarly to how original text passages are extracted from documents.
    However, this time, the focus is on the bounding boxes surrounding these text passages instead of the raw text itself.
    These bounding boxes are another outcome of the document analysis and processing performed by the Docling pipeline in the indexing script.
    Now, these boxes can be utilized to anchor the generated answers to the original pages of the document.
    The coordinates of these bounding boxes also come from the metadata of each chunk.
    Moreover, the metadata of a chunk includes the page number, which is required to extract the screenshot of the correct page from the corresponding file.
    This screenshot of the page is the foundation for the visual grounding process, as bounding boxes are added on top of it.
    In the end, these bounding boxes are drawn on top of the image of the correct page, allowing the user to identify precisely which part of the document the information was retrieved from.
    The final visually enhanced images are then stored in the same output directory as the input document and are ready for visual inspection.
    if answer_obj.answerable:
        for i, (res, score) in enumerate(results, start=1):
            meta = DocMeta.model_validate(res.metadata["dl_meta"])
            h = meta.origin.binary_hash
            json_file = doc_store.get(h)
            if not json_file:
                continue

            dl_doc = DoclingDocument.load_from_json(json_file)
            image_by_page: dict[int, "Image.Image"] = {}

            for item in meta.doc_items:
                if not item.prov:
                    continue
                prov = item.prov[0]
                p = prov.page_no

                if p not in image_by_page:
                    image_by_page[p] = dl_doc.pages[p].image.pil_image.copy()
                img = image_by_page[p]

                bbox = prov.bbox.to_top_left_origin(
                    page_height=dl_doc.pages[p].size.height
                ).normalized(dl_doc.pages[p].size)

                left = round(bbox.l * img.width) - 2
                top = round(bbox.t * img.height) - 2
                right = round(bbox.r * img.width) + 2
                bottom = round(bbox.b * img.height) + 2

                draw = ImageDraw.Draw(img)
                draw.rectangle([left, top, right, bottom], outline="blue", width=2)

            for p, img in image_by_page.items():
                out_png = OUTPUT_DIR / f"source_{i}_page_{p}.png"
                img.save(out_png)
                print(f"Saved annotated page {p} → {out_png}")
    Limitation
    However, no solution is without flaws, and this one is no exception.
    The first point to consider is the heavy reliance on Docling.
    This tool is deeply integrated into the RAG framework, as it is used in almost every part of the system, including document parsing, text chunking, and grounding the final answer, which depends on the DoclingDocument class.
    Another point is the processing speed and resource requirements.
    While the advantage of Docling is that it can be deployed entirely on your local device, its processing speed and the maximum manageable file size depend significantly on your hardware.
    If your hardware is not powerful enough, you may only be able to parse one page of a document at a time.
    Additionally, a limitation of the provided example in this blog post is its focus solely on text.
    It ignores images and tables that also contain rich information and leaves space for further enhancements.
    Conclusion
    But despite these limitations, Docling is a neat way to enhance advanced RAG frameworks quickly with additional visual grounding capabilities.
    By connecting retrieved answers directly to their visual origin, this approach not only boosts transparency but also helps users build trust in the system’s output.
    What is your opinion about Docling and this introduced RAG framework?
    Have you already experimented with other grounding approaches for more explainable AI?
    Sources
    Docling Paper: https://arxiv.org/abs/2501.17887
    Docling Documentation: https://docling-project.github.io/docling/
    Qdrant local quickstart: https://qdrant.tech/documentation/quickstart/
    LangChain ChatMistralAI: https://python.langchain.com/docs/integrations/chat/mistralai/
    LangChain structured outputs: https://python.langchain.com/docs/concepts/structured_outputs/
    Published via Towards AI

    Source: https://towardsai.net/p/l/visual-grounding-for-advanced-rag-frameworks
    #visual #grounding #for #advanced #rag #frameworks
    Visual Grounding for Advanced RAG Frameworks
    Author(s): Felix Pappe Originally published on Towards AI. Image created by the author using gpt-image-1 AI chatbots and advanced Retrieval-Augmented Generation (RAG) systems are increasingly adept at providing up-to-date, context-aware answers based on previously retrieved text chunks. However, despite their seemingly reliable responses, the significant issue remains, that users often lack a clear method to verify the source of these answers without having to return to the original, often lengthy, documents themselves. This is particularly cumbersome when the source is a multi-page academic paper, technical manual, or book. Even when a link is provided, users are left searching through dozens of pages, trying to locate the exact section that the chatbot used for the generation of the final response. This manual cross-checking is not only time-consuming, it undermines trust in the LLM’s output.As a result, AI-generated answers may appear correct at first glance but remain opaque, leaving users uncertain about their reliability. This lack of verifiability can lead to misinformation and misinterpretation. In this blog post, I would like to offer a solution to this problem with a visual grounding approach using the Docling parsing tool, Qdrant vector store, and LangChain. This RAG framework doesn’t just retrieve relevant text. It also highlights the exact location of the extracted text directly on the page from the source document.By connecting answers to theirLLM’s output.The result is a transparent, verifiable, and user-friendly RAG framework that builds trust while maintaining accuracy, which is built in this blog post. Docling The foundation of the visual grounding approach introduced in this blog post is the Docling document processing pipeline. Docling is an open-source tool for layout-aware document parsing and grounding, achieving results comparable to paid solutions like Mistral OCR.Moreover, docling provides an additional key feature for visual grounding, which other document-to-markdown solutions don’t have. This feature is the decomposition of the input document into smaller sub-elements, including headings, text chunks, formulas, and tables, using different models in a sophisticated processing pipeline, which is presented in the following image.Docling Pipeline from docling paper The output of this processing pipeline is not a markdown file but a DoclingDocument, which consists of all the detected and extracted elements from the input document. This intermediate DoclingDocumentclass, enhanced with metadata from extracted elements, allows for the transformation of the original document into various file types and supports the visual grounding discussed in this blog post.RAG framework Like all RAG frameworks, this one consists of two phases that can be divided into two scripts. In the offline indexing phases, input documents are split into chunks and encoded into vector representations. These vectors are stored in a specialized vector database for later retrieval.In the online retrieval and generation phase, text chunks related to a user’s input are retrieved and passed to the LLM to generate a final response.These two phases are implemented in two separate scripts in this post.The following image illustrates the final visual grounding result produced by the two scripts explained in this blog post. In the first Python script, the Docling paper is uploaded to a Qdrant vector store during the offline indexing phase. 
In the second script, relevant passages are retrieved based on a given question and are then highlighted directly on the document.Image created by author and designed in canva Indexing phase The indexing script handles data preprocessing and stores the embedded text chunks in a vector database for the second online retrieval phase. I split the script into several parts to explain in detail what happens at each part, providing a deeper understanding of the entire code. Imports Let’s start very gently with the import of all required libraries and packages, including docling, langchain, and qdrant. import osfrom pathlib import Pathfrom uuid import uuid4from dotenv import load_dotenvfrom transformers import AutoTokenizerfrom qdrant_client import QdrantClientfrom qdrant_client.http.models import Distance, VectorParamsfrom langchain_qdrant import QdrantVectorStorefrom langchain_huggingface.embeddings import HuggingFaceEmbeddingsfrom docling.datamodel.base_models import InputFormatfrom docling.datamodel.pipeline_options import PdfPipelineOptionsfrom docling.document_converter import DocumentConverter, PdfFormatOptionfrom docling.chunking import HybridChunkerfrom langchain_docling.loader import DoclingLoader, ExportTypefrom docling_core.types.doc import ImageRefMode Configuration and environment variables In the next step, the environmental variables are loaded from the .env file and read in with getenv(). Inside the .env file, your HF_TOKEN for the embedding model and your MISTRAL_API_KEY for the LLM must be included. Of course, you can also adjust the code to your needs and choose any other embedding model or LLM. But if you change the embedding model, also change the DIM variable, which refers to the final vector dimensions of the embedded chunks.Afterwards, the docling processing pipeline is defined, enabling all functionalities to generate a complete DoclingDocument. This pipeline includes the detection and extraction of code blocks, formulas, tables, pages, and page images of the document. Moreover, the extracted images are scaled by 2.0 for a higher resolution. load_dotenv() HF_TOKEN = os.getenv("HF_TOKEN")MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")SOURCES = ["attention-04.pdf"]OUTPUT_DIR = Path("output")OUTPUT_DIR.mkdir(parents=True, exist_ok=True)EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"COLLECTION_NAME = "demo_collection"DIM = 384pipeline_options = PdfPipelineOptions( do_code_enrichment=True, do_formula_enrichment=True, do_table_structure=True, generate_picture_images=True, generate_page_images=True, images_scale=2.0,)pipeline_options.table_structure_options.do_cell_matching = True Setting up document converter In the next step, the DocumentConverter from Docling is configured using the previously defined pipeline options. Later in the code, this converter will be used to transform an input PDF into a DoclingDocument, which consists of modular components such as headings, text paragraphs, images, and tables. Furthermore, the embedding model is initialized, which is necessary to embed the extracted sentences from the document into a vector representation. converter = DocumentConverter( format_options={ InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options) })embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL) Initialize qdrant collection A local Qdrant vector storage is then set up, allowing you to experiment with the script on your device. 
It either creates a new vector storage if none exists with the current name or connects to an existing one if there is a match. client = QdrantClient(path="langchain_qdrant")collections = [col.name for col in client.get_collections().collections]if COLLECTION_NAME not in collections: client.create_collection( collection_name=COLLECTION_NAME, vectors_config=VectorParams(size=DIM, distance=Distance.COSINE), ) print(f"Created new collection '{COLLECTION_NAME}'.")else: print(f"Using existing collection '{COLLECTION_NAME}'.") vector_store = QdrantVectorStore( client=client, collection_name=COLLECTION_NAME, embedding=embeddings,) Convert pdf to docling and save outputs Subsequently, all PDF files listed in the SOURCE are converted into DoclingDocuments and saved in dl_doc. These documents are then transformed into JSON and markdown files, which are stored on your local device. The markdown file is used to assess the quality of the file transformation process, while the JSON file is necessary for the subsequent visual grounding process in the second script. for source in SOURCES: dl_doc = converter.convert(source=source).document # JSON export out_json = OUTPUT_DIR / f"{dl_doc.origin.binary_hash}.json" dl_doc.save_as_json(out_json) # Markdown export with embedded images out_md = OUTPUT_DIR / f"{dl_doc.origin.binary_hash}.md" dl_doc.save_as_markdown(out_md, image_mode=ImageRefMode.EMBEDDED) Chunking the document Finally, we come to the main part of the script, including the chunking process of the document into smaller texts. This is achieved using the HybridChunker provided by docling. The best feature of this HybridChunker is that it tries to keep related passages together based on the markdown formatting and merges passages with each other if they are too small. A small max_tokens size has been selected to implement a small-to-big retrieval approach. This means that initially, a small chunk of text that closely matches the user's query is retrieved. Following this, a larger context chunk that surrounds the retrieved section is additionally retrieved and provided to the language model for generating the final answer. In this case, the larger context chunk refers to the paragraph containing the smaller chunk. chunker = HybridChunker( tokenizer=EMBED_MODEL, max_tokens=64, merge_peers=True)loader = DoclingLoader( file_path=SOURCES, converter=converter, export_type=ExportType.DOC_CHUNKS, chunker=chunker,)docs = loader.load() Embedding the document In the final step, the generated chunks are stored in the vector database with a unique identifier, from which they can be retrieved during the online phase. ids = [str(uuid4()) for _ in docs]vector_store.add_documents(documents=docs, ids=ids)print("Documents have been embedded into the vector store.") Retrieval and generation part Now, the online retrieval and generation part leverages the previously embedded text chunks to generate an answer for the user’s input based on the knowledge in the vector store. Imports As in the previous indexing script, all the necessary packages are included first. 
import os
import re
from pathlib import Path
from dotenv import load_dotenv
from PIL import ImageDraw, Image
from pydantic import BaseModel, Field
from qdrant_client import QdrantClient
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_mistralai import ChatMistralAI
from langchain_qdrant import QdrantVectorStore
from docling.chunking import DocMeta
from docling.datamodel.document import DoclingDocument

Configuration and environment variables

Then, the necessary environment variables are loaded and the configuration variables are set.

load_dotenv()

MISTRAL_API_KEY = os.environ["MISTRAL_API_KEY"]
OUTPUT_DIR = Path("output")
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
COLLECTION_NAME = "demo_collection"
QUESTION = (
    "How is attention computed in the transformer architecture?"
)
TOP_K = 3
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

Build JSON lookup table

In the next step, all JSON files in the output directory are detected. For each file, its name is extracted with the stem method. It is then verified that the name consists only of digits, i.e. that it is the binary hash of the document, before the numeric name is added as a key and the file's path as a value to the doc_store dictionary (a minimal sketch of this lookup is shown just before the chain code below).

Initializing embedding model, vector store, and LLM

After that, the embedding model and the LLM are initialized. It is important to use the same embedding model and Qdrant collection name as in the indexing script. Optionally, you may want to define these names in a separate configuration module and import it into both the indexing and retrieval scripts. For the LLM, I selected Mistral, but you can also choose GPT-4, Gemini, Llama, or any other model you prefer.

embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
client = QdrantClient(path="langchain_qdrant")

try:
    client.get_collection(collection_name=COLLECTION_NAME)
except Exception:
    print(
        f"Collection {COLLECTION_NAME} not found; create it before indexing."
    )

vector_store = QdrantVectorStore(
    client=client,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
)

llm = ChatMistralAI(
    api_key=MISTRAL_API_KEY,
    model="mistral-large-latest",
    temperature=0,
    max_retries=2,
)

Preparing the chain

Thereafter, all components of the RAG chain are set up, beginning with the Answer Pydantic schema, which consists of two fields. The first field, answerable, is a boolean that indicates whether the question can be answered from the retrieved knowledge. The second field, answer, holds the actual response. This structure is employed to address the hallucination issue that many LLMs encounter, helping to prevent incorrect information from being passed on to the user.

Following that, the PydanticOutputParser and its get_format_instructions() method generate the format instructions for the LLM. The parser is also used as the final Runnable of the LangChain execution chain, whose elements are connected with the | operator. The prompt created for the PromptTemplate is basic and can be further enhanced with few-shot examples or other advanced prompt engineering techniques.
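The JSON lookup table described above is not shown in the article's listing. A minimal sketch consistent with that description, assuming the hash-named JSON exports produced by the indexing script (the original implementation may differ):

# Hedged sketch of the doc_store lookup: map each document's binary hash
# (the purely numeric file stem) to the path of its exported JSON file.
doc_store: dict[int, Path] = {}
for json_path in OUTPUT_DIR.glob("*.json"):
    if json_path.stem.isdigit():
        doc_store[int(json_path.stem)] = json_path

With doc_store in place, the Answer schema, parser, prompt, and chain can be defined.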
class Answer(BaseModel):
    answerable: bool = Field(
        ..., description="Whether the question can be answered"
    )
    answer: str = Field(
        ..., description="The answer based on the provided knowledge"
    )

parser = PydanticOutputParser(pydantic_object=Answer)
format_instructions = parser.get_format_instructions()

prompt = PromptTemplate(
    input_variables=["knowledge", "topic"],
    partial_variables={"format_instructions": format_instructions},
    template=(
        "You are given the following grounding material:\n\n"
        "{knowledge}\n\n"
        "Question:\n"
        "{topic}\n\n"
        "Please provide a concise answer in full sentences "
        "based solely on the information above.\n"
        "If the answer is not contained within the provided material, "
        "reply with:\n"
        "“There is no answer to this question in the provided material.”\n\n"
        "{format_instructions}"
    ),
)

rag_chain = prompt | llm | parser

Run similarity search

Once the chain has been set up, the text chunks semantically related to the user's query are retrieved.

results = vector_store.similarity_search_with_score(
    k=TOP_K, query=QUESTION
)

Load and assemble grounding texts

Based on the text chunks retrieved from the vector database, the entire paragraph containing each chunk is loaded. This approach follows the small-to-big retrieval technique introduced earlier.
Image created by the author, designed with Canva, and graphics generated using gpt-image-1.

The relevant paragraph is pulled from the JSON file associated with the document from which the chunk originates. The correct JSON file is identified by the document's hash value, which appears both in the text chunk's metadata and in the JSON filename. Once the correct JSON file is identified, it is loaded via the load_from_json method of DoclingDocument. The original text item referenced in the current result chunk's metadata is then extracted from the JSON file using a regular expression. If the referenced text item is found, the full text passage is retrieved and used to generate the final result. This example focuses exclusively on text grounding; however, Docling also provides references to previously identified images and tables.

grounding_texts: list[str] = []

for res, score in results:
    meta = DocMeta.model_validate(res.metadata["dl_meta"])
    h = meta.origin.binary_hash
    json_file = doc_store.get(h)
    if not json_file:
        continue

    dl_doc = DoclingDocument.load_from_json(json_file)

    for item in meta.doc_items:
        if not item.prov:
            continue
        match = re.search(r"^#/texts/(\d+)$", item.self_ref)
        if not match:
            continue
        idx = int(match.group(1))
        grounding_texts.append(dl_doc.texts[idx].text.strip())

knowledge = "\n\n".join(grounding_texts)
print("Assembled grounding material:\n", knowledge)

Invoke the LLM

Finally, the LLM can be invoked through the previously defined chain. The invoke method receives the knowledge passages retrieved from the original JSON files and the user's input question. The returned value conforms to the Answer Pydantic schema, allowing a structured evaluation of the results.

answer_obj = rag_chain.invoke(
    {"knowledge": knowledge, "topic": QUESTION}
)
print("Answerable?", answer_obj.answerable)
print("Answer: ", answer_obj.answer)

Visual grounding

The script then visually grounds the used chunks in the corresponding documents, but only if the question can be answered from the provided content, i.e. if the LLM returns answerable equal to true. This grounding process works similarly to the extraction of the original text passages from the documents.
However, this time the focus is on the bounding boxes surrounding these text passages rather than on the raw text itself. These bounding boxes are another outcome of the document analysis and processing performed by the Docling pipeline in the indexing script. Now, these boxes can be used to anchor the generated answers to the original pages of the document. The coordinates of the bounding boxes also come from the metadata of each chunk. Moreover, the metadata includes the page number, which is required to pull the screenshot of the correct page from the corresponding file. This page screenshot is the foundation of the visual grounding process, as the bounding boxes are drawn on top of it. In the end, the boxes sit on top of the rendered image of the correct page, allowing the user to identify precisely which part of the document the information was retrieved from. The final, visually enhanced images are stored in the same output directory as the input document and are ready for visual inspection.

if answer_obj.answerable:
    for i, (res, score) in enumerate(results, start=1):
        meta = DocMeta.model_validate(res.metadata["dl_meta"])
        h = meta.origin.binary_hash
        json_file = doc_store.get(h)
        if not json_file:
            continue

        dl_doc = DoclingDocument.load_from_json(json_file)
        image_by_page: dict[int, "Image.Image"] = {}

        for item in meta.doc_items:
            if not item.prov:
                continue
            prov = item.prov[0]
            p = prov.page_no

            if p not in image_by_page:
                image_by_page[p] = dl_doc.pages[p].image.pil_image.copy()
            img = image_by_page[p]

            bbox = prov.bbox.to_top_left_origin(
                page_height=dl_doc.pages[p].size.height
            ).normalized(dl_doc.pages[p].size)

            left = round(bbox.l * img.width) - 2
            top = round(bbox.t * img.height) - 2
            right = round(bbox.r * img.width) + 2
            bottom = round(bbox.b * img.height) + 2

            draw = ImageDraw.Draw(img)
            draw.rectangle([left, top, right, bottom], outline="blue", width=2)

        for p, img in image_by_page.items():
            out_png = OUTPUT_DIR / f"source_{i}_page_{p}.png"
            img.save(out_png)
            print(f"Saved annotated page {p} → {out_png}")

Limitations

No solution is without flaws, and this one is no exception. The first point to consider is the heavy reliance on Docling. The tool is deeply integrated into the RAG framework: it is used in almost every part of the system, including document parsing, text chunking, and grounding the final answer, which depends on the DoclingDocument class. Another point is processing speed and resource requirements. While Docling has the advantage of running entirely on your local device, its processing speed and the maximum manageable file size depend heavily on your hardware. If your hardware is not powerful enough, you may only be able to parse one page of a document at a time. Finally, the example in this blog post focuses solely on text. It ignores images and tables, which also contain rich information, and thus leaves room for further enhancements.

Conclusion

Despite these limitations, Docling is a neat way to quickly enhance advanced RAG frameworks with additional visual grounding capabilities. By connecting retrieved answers directly to their visual origin, this approach not only boosts transparency but also helps users build trust in the system's output. What is your opinion about Docling and the RAG framework introduced here?
Have you already experimented with other grounding approaches for more explainable AI?

Sources

Docling paper: https://arxiv.org/abs/2501.17887
Docling documentation: https://docling-project.github.io/docling/
Qdrant local quickstart: https://qdrant.tech/documentation/quickstart/
LangChain ChatMistralAI: https://python.langchain.com/docs/integrations/chat/mistralai/
LangChain structured outputs: https://python.langchain.com/docs/concepts/structured_outputs/

Published via Towards AI. Source: https://towardsai.net/p/l/visual-grounding-for-advanced-rag-frameworks
  • Bizarre iPhone bug causes some audio messages to fail. Here’s why
    Macworld
    Super-weird bugs in Messages are nothing new, but this latest one is a real head-scratcher: If you try to send an audio message with the phrase “Dave and Buster’s,” it won’t work.
    Why would that specific phrasing cause a problem? A coding expert has cracked the case.
    I won’t say “and the reason will shock you,” but if you’re anything like me, you’ll find it interesting.
    First, let me explain what happens when the bug triggers.
    At first, the audio message (“I’m off to eat lunch at Dave and Buster’s,” as an example) appears to send normally.
    It shows up in the Messages thread to the recipient, along with a transcript of the content.
    No problem is flagged.
    It’s at the recipient’s end that we spot the issue.
    Initially the recipient sees the ellipsis icon, indicating that something is being typed or sent… but this carries on, and carries on, and eventually disappears.
    And at this point there is no indication that anything has been sent at all: no message, no message transcript, no message failed notification.
    In fact, if the recipient didn’t happen to have the app open, or had it open but was in a different conversation thread, they never would have known something was supposed to be on the way.
    This bug is new to me, and the first time I heard about it was when it was discussed on Monday in the blog run by Guilherme Rambo, a coding and engineering expert.
    Rambo, in turn, heard about the bug on the Search Engine podcast, which devoted its May 9 episode to the subject.
    Rambo reproduced the bug, guessed the problem must be at the recipient end, then plugged that device into his Mac and started looking at logs.
    And from that point it doesn’t appear to have taken long for him to work out what was going on: iOS’s transcription engine was recognizing the name of the U.S. restaurant chain, changing it to the correct corporate branding (“Dave & Buster’s,” with an all-important ampersand), and then passing that into the XHTML code used to send a transcript with the audio message.
    The problem isn’t being caused by the words Dave and Buster’s, but by the ampersand character between them, which has a special purpose in coding and prevents the code from being parsed correctly.
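    To illustrate the failure mode (this is not Apple’s code, just a generic strict XML parser in Python), a bare ampersand makes an XHTML fragment unparseable, while the escaped form goes through:

    import xml.etree.ElementTree as ET
    from xml.sax.saxutils import escape

    raw = "<p>Lunch at Dave & Buster's</p>"
    try:
        ET.fromstring(raw)            # strict parsing fails on the bare "&"
    except ET.ParseError as err:
        print("parse failed:", err)

    safe = "<p>Lunch at " + escape("Dave & Buster's") + "</p>"
    print(ET.fromstring(safe).text)   # parses fine: "&" was sent as "&amp;"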
    The phrase “Dave and Buster’s” doesn’t cause a problem in the U.K. because iOS doesn’t add an ampersand (or even an apostrophe). David Price / Foundry
    As you can see in the image at the top of this story, a seemingly successfully sent audio iMessage ending with the phrase “Dave & Buster’s” appears as sent but never actually appears on the recipient’s phone.
    After a while, the audio message disappeared from the sender’s phone, and the recipient was completely unaware that the message had ever been sent.
    With that in mind, it’s a short leap to recognize that other brands could cause the same issue—they just haven’t been spotted doing so up to now.
    Rambo notes that “M&Ms” will do the same thing.
    For U.K. iPhone owners, in fact, “Dave and Buster’s” doesn’t trigger the bug because that chain is evidently not well enough known here and doesn’t get its ampersand added by autocorrect.

    To reproduce the issue, I had to ask a friend to send me a message about the supermarket chain M&S.
    Sure enough, this caused the hanging ellipsis followed by an unsent message.
    At the time of writing, it seems almost certain that any phrase iOS would recognize as containing an ampersand would cause an audio message to fail, and when I put it like that, it’s surprising the bug hasn’t been more widely reported.
    But here’s what happens when a U.K. user tries to send a message about the supermarket chain M&S, complete with ampersand. Karen Haslam / Foundry
    On the plus side, one would imagine it’s a bug that should be easy to patch in an iOS update.
    The transcription feature in Messages simply needs to be told to “escape” special characters so they don’t mess up the parsing process.
    And as Rambo notes, this isn’t a bug with any security vulnerabilities; indeed, it shows Apple’s BlastDoor mechanism working correctly.
    “Many bad parsers would probably accept the incorrectly-formatted XHTML,” he writes, “but that sort of leniency when parsing data formats is often what ends up causing security issues.
    By being pedantic about the formatting, BlastDoor is protecting the recipient from an exploit that would abuse that type of issue.”
    Source: www.macworld.com