• Agentic RAG Applications: Company Knowledge Slack Agents

    Lessons learnt using LlamaIndex and Modal
    The post Agentic RAG Applications: Company Knowledge Slack Agents appeared first on Towards Data Science.
    #agentic #rag #applications #company #knowledge
    Agentic RAG Applications: Company Knowledge Slack Agents
    Lessons learnt using LlamaIndex and Modal The post Agentic RAG Applications: Company Knowledge Slack Agents appeared first on Towards Data Science. #agentic #rag #applications #company #knowledge
    Agentic RAG Applications: Company Knowledge Slack Agents
    Lessons learnt using LlamaIndex and Modal The post Agentic RAG Applications: Company Knowledge Slack Agents appeared first on Towards Data Science.
    0 Commentarii 0 Distribuiri 0 previzualizare
  • Building AI Applications in Ruby

    This is the second in a multi-part series on creating web applications with generative AI integration. Part 1 focused on explaining the AI stack and why the application layer is the best place in the stack to be. Check it out here.

    Table of Contents

    Introduction

    I thought spas were supposed to be relaxing?

    Microservices are for Macrocompanies

    Ruby and Python: Two Sides of the Same Coin

    Recent AI Based Gems

    Summary

    Introduction

    It’s not often that you hear the Ruby language mentioned when discussing AI.

    Python, of course, is the king in this world, and for good reason. The community has coalesced around the language. Most model training is done in PyTorch or TensorFlow these days. Scikit-learn and Keras are also very popular. RAG frameworks such as LangChain and LlamaIndex cater primarily to Python.

    However, when it comes to building web applications with AI integration, I believe Ruby is the better language.

    As the co-founder of an agency dedicated to building MVPs with generative AI integration, I frequently hear potential clients complaining about two things:

    Applications take too long to build

    Developers are quoting insane prices to build custom web apps

    These complaints have a common source: complexity. Modern web apps have a lot more complexity in them than in the good ol’ days. But why is this? Are the benefits brought by complexity worth the cost?

    I thought spas were supposed to be relaxing?

    One big piece of the puzzle is the recent rise of single-page applications. The most popular stack used today in building modern SPAs is MERN . The stack is popular for a few reasons:

    It is a JavaScript-only stack, across both front-end and back-end. Having to only code in only one language is pretty nice!

    SPAs can offer dynamic designs and a “smooth” user experience. Smooth here means that when some piece of data changes, only a part of the site is updated, as opposed to having to reload the whole page. Of course, if you don’t have a modern smartphone, SPAs won’t feel so smooth, as they tend to be pretty heavy. All that JavaScript starts to drag down the performance.

    There is a large ecosystem of libraries and developers with experience in this stack. This is pretty circular logic: is the stack popular because of the ecosystem, or is there an ecosystem because of the popularity? Either way, this point stands.React was created by Meta.

    Lots of money and effort has been thrown at the library, helping to polish and promote the product.

    Unfortunately, there are some downsides of working in the MERN stack, the most critical being the sheer complexity.

    Traditional web development was done using the Model-View-Controllerparadigm. In MVC, all of the logic managing a user’s session is handled in the backend, on the server. Something like fetching a user’s data was done via function calls and SQL statements in the backend. The backend then serves fully built HTML and CSS to the browser, which just has to display it. Hence the name “server”.

    In a SPA, this logic is handled on the user’s browser, in the frontend. SPAs have to handle UI state, application state, and sometimes even server state all in the browser. API calls have to be made to the backend to fetch user data. There is still quite a bit of logic on the backend, mainly exposing data and functionality through APIs.

    To illustrate the difference, let me use the analogy of a commercial kitchen. The customer will be the frontend and the kitchen will be the backend.

    MVCs vs. SPAs. Image generated by ChatGPT.

    Traditional MVC apps are like dining at a full-service restaurant. Yes, there is a lot of complexityin the backend. But the frontend experience is simple and satisfying: all the customer has to do is pick up a fork and eat their food.

    SPAs are like eating at a buffet-style dining restaurant. There is still quite a bit of complexity in the kitchen. But now the customer also has to decide what food to grab, how to combine them, how to arrange them on the plate, where to put the plate when finished, etc.

    Andrej Karpathy had a tweet recently discussing his frustration with attempting to build web apps in 2025. It can be overwhelming for those new to the space.

    The reality of building web apps in 2025 is that it's a bit like assembling IKEA furniture. There's no "full-stack" product with batteries included, you have to piece together and configure many individual services:– frontend / backend– hosting…— Andrej KarpathyMarch 27, 2025

    In order to build MVPs with AI integration rapidly, our agency has decided to forgo the SPA and instead go with the traditional MVC approach. In particular, we have found Ruby on Railsto be the framework best suited to quickly developing and deploying quality apps with AI integration. Ruby on Rails was developed by David Heinemeier Hansson in 2004 and has long been known as a great web framework, but I would argue it has recently made leaps in its ability to incorporate AI into apps, as we will see.

    Django is the most popular Python web framework, and also has a more traditional pattern of development. Unfortunately, in our testing we found Django was simply not as full-featured or “batteries included” as Rails is. As a simple example, Django has no built-in background job system. Nearly all of our apps incorporate background jobs, so to not include this was disappointing. We also prefer how Rails emphasizes simplicity, with Rails 8 encouraging developers to easily self-host their apps instead of going through a provider like Heroku. They also recently released a stack of tools meant to replace external services like Redis.

    “But what about the smooth user experience?” you might ask. The truth is that modern Rails includes several ways of crafting SPA-like experiences without all of the heavy JavaScript. The primary tool is Hotwire, which bundles tools like Turbo and Stimulus. Turbo lets you dynamically change pieces of HTML on your webpage without writing custom JavaScript. For the times where you do need to include custom JavaScript, Stimulus is a minimal JavaScript framework that lets you do just that. Even if you want to use React, you can do so with the react-rails gem. So you can have your cake, and eat it too!

    SPAs are not the only reason for the increase in complexity, however. Another has to do with the advent of the microservices architecture.

    Microservices are for Macrocompanies

    Once again, we find ourselves comparing the simple past with the complexity of today.

    In the past, software was primarily developed as monoliths. A monolithic application means that all the different parts of your app — such as the user interface, business logic, and data handling — are developed, tested, and deployed as one single unit. The code is all typically housed in a single repo.

    Working with a monolith is simple and satisfying. Running a development setup for testing purposes is easy. You are working with a single database schema containing all of your tables, making queries and joins straightforward. Deployment is simple, since you just have one container to look at and modify.

    However, once your company scales to the size of a Google or Amazon, real problems begin to emerge. With hundreds or thousands of developers contributing simultaneously to a single codebase, coordinating changes and managing merge conflicts becomes increasingly difficult. Deployments also become more complex and risky, since even minor changes can blow up the entire application!

    To manage these issues, large companies began to coalesce around the microservices architecture. This is a style of programming where you design your codebase as a set of small, autonomous services. Each service owns its own codebase, data storage, and deployment pipelines. As a simple example, instead of stuffing all of your logic regarding an OpenAI client into your main app, you can move that logic into its own service. To call that service, you would then typically make REST calls, as opposed to function calls. This ups the complexity, but resolves the merge conflict and deployment issues, since each team in the organization gets to work on their own island of code.

    Another benefit to using microservices is that they allow for a polyglot tech stack. This means that each team can code up their service using whatever language they prefer. If one team prefers JavaScript while another likes Python, this is no issue. When we first began our agency, this idea of a polyglot stack pushed us to use a microservices architecture. Not because we had a large team, but because we each wanted to use the “best” language for each functionality. This meant:

    Using Ruby on Rails for web development. It’s been battle-tested in this area for decades.

    Using Python for the AI integration, perhaps deployed with something like FastAPI. Serious AI work requires Python, I was led to believe.

    Two different languages, each focused on its area of specialty. What could go wrong?

    Unfortunately, we found the process of development frustrating. Just setting up our dev environment was time-consuming. Having to wrangle Docker compose files and manage inter-service communication made us wish we could go back to the beauty and simplicity of the monolith. Having to make a REST call and set up the appropriate routing in FastAPI instead of making a simple function call sucked.

    “Surely we can’t develop AI apps in pure Ruby,” I thought. And then I gave it a try.

    And I’m glad I did.

    I found the process of developing an MVP with AI integration in Ruby very satisfying. We were able to sprint where before we were jogging. I loved the emphasis on beauty, simplicity, and developer happiness in the Ruby community. And I found the state of the AI ecosystem in Ruby to be surprisingly mature and getting better every day.

    If you are a Python programmer and are scared off by learning a new language like I was, let me comfort you by discussing the similarities between the Ruby and Python languages.

    Ruby and Python: Two Sides of the Same Coin

    I consider Python and Ruby to be like cousins. Both languages incorporate:

    High-level Interpretation: This means they abstract away a lot of the complexity of low-level programming details, such as memory management.

    Dynamic Typing: Neither language requires you to specify if a variable is an int, float, string, etc. The types are checked at runtime.

    Object-Oriented Programming: Both languages are object-oriented. Both support classes, inheritance, polymorphism, etc. Ruby is more “pure”, in the sense that literally everything is an object, whereas in Python a few thingsare not objects.

    Readable and Concise Syntax: Both are considered easy to learn. Either is great for a first-time learner.

    Wide Ecosystem of Packages: Packages to do all sorts of cool things are available in both languages. In Python they are called libraries, and in Ruby they are called gems.

    The primary difference between the two languages lies in their philosophy and design principles. Python’s core philosophy can be described as:

    There should be one — and preferably only one — obvious way to do something.

    In theory, this should emphasize simplicity, readability, and clarity. Ruby’s philosophy can be described as:

    There’s always more than one way to do something. Maximize developer happiness.

    This was a shock to me when I switched over from Python. Check out this simple example emphasizing this philosophical difference:

    # A fight over philosophy: iterating over an array
    # Pythonic way
    for i in range:
    print# Ruby way, option 1.each do |i|
    puts i
    end

    # Ruby way, option 2
    for i in 1..5
    puts i
    end

    # Ruby way, option 3
    5.times do |i|
    puts i + 1
    end

    # Ruby way, option 4.each { |i| puts i }

    Another difference between the two is syntax style. Python primarily uses indentation to denote code blocks, while Ruby uses do…end or {…} blocks. Most include indentation inside Ruby blocks, but this is entirely optional. Examples of these syntactic differences can be seen in the code shown above.

    There are a lot of other little differences to learn. For example, in Python string interpolation is done using f-strings: f"Hello, {name}!", while in Ruby they are done using hashtags: "Hello, #{name}!". Within a few months, I think any competent Python programmer can transfer their proficiency over to Ruby.

    Recent AI-based Gems

    Despite not being in the conversation when discussing AI, Ruby has had some recent advancements in the world of gems. I will highlight some of the most impressive recent releases that we have been using in our agency to build AI apps:

    RubyLLM — Any GitHub repo that gets more than 2k stars within a few weeks of release deserves a mention, and RubyLLM is definitely worthy. I have used many clunky implementations of LLM providers from libraries like LangChain and LlamaIndex, so using RubyLLM was like a breath of fresh air. As a simple example, let’s take a look at a tutorial demonstrating multi-turn conversations:

    require 'ruby_llm'

    # Create a model and give it instructions
    chat = RubyLLM.chat
    chat.with_instructions "You are a friendly Ruby expert who loves to help beginners."

    # Multi-turn conversation
    chat.ask "Hi! What does attr_reader do in Ruby?"
    # => "Ruby creates a getter method for each symbol...

    # Stream responses in real time
    chat.ask "Could you give me a short example?" do |chunk|
    print chunk.content
    end
    # => "Sure!
    # ```ruby
    # class Person
    # attr...

    Simply amazing. Multi-turn conversations are handled automatically for you. Streaming is a breeze. Compare this to a similar implementation in LangChain:

    from langchain_openai import ChatOpenAI
    from langchain_core.schema import SystemMessage, HumanMessage, AIMessage
    from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    SYSTEM_PROMPT = "You are a friendly Ruby expert who loves to help beginners."
    chat = ChatOpenAI])

    history =def ask-> None:
    """Stream the answer token-by-token and keep the context in memory."""
    history.append)
    # .stream yields message chunks as they arrive
    for chunk in chat.stream:
    printprint# newline after the answer
    # the final chunk has the full message content
    history.append)

    askaskYikes. And it’s important to note that this is a grug implementation. Want to know how LangChain really expects you to manage memory? Check out these links, but grab a bucket first; you may get sick.

    Neighbors — This is an excellent library to use for nearest-neighbors search in a Rails application. Very useful in a RAG setup. It integrates with Postgres, SQLite, MySQL, MariaDB, and more. It was written by Andrew Kane, the same guy who wrote the pgvector extension that allows Postgres to behave as a vector database.

    Async — This gem had its first official release back in December 2024, and it has been making waves in the Ruby community. Async is a fiber-based framework for Ruby that runs non-blocking I/O tasks concurrently while letting you write simple, sequential code. Fibers are like mini-threads that each have their own mini call stack. While not strictly a gem for AI, it has helped us create features like web scrapers that run blazingly fast across thousands of pages. We have also used it to handle streaming of chunks from LLMs.

    Torch.rb — If you are interested in training deep learning models, then surely you have heard of PyTorch. Well, PyTorch is built on LibTorch, which essentially has a lot of C/C++ code under the hood to perform ML operations quickly. Andrew Kane took LibTorch and made a Ruby adapter over it to create Torch.rb, essentially a Ruby version of PyTorch. Andrew Kane has been a hero in the Ruby AI world, authoring dozens of ML gems for Ruby.

    Summary

    In short: building a web application with AI integration quickly and cheaply requires a monolithic architecture. A monolith demands a monolingual application, which is necessary if your end goal is quality apps delivered with speed. Your main options are either Python or Ruby. If you go with Python, you will probably use Django for your web framework. If you go with Ruby, you will be using Ruby on Rails. At our agency, we found Django’s lack of features disappointing. Rails has impressed us with its feature set and emphasis on simplicity. We were thrilled to find almost no issues on the AI side.

    Of course, there are times where you will not want to use Ruby. If you are conducting research in AI or training machine learning models from scratch, then you will likely want to stick with Python. Research almost never involves building Web Applications. At most you’ll build a simple interface or dashboard in a notebook, but nothing production-ready. You’ll likely want the latest PyTorch updates to ensure your training runs quickly. You may even dive into low-level C/C++ programming to squeeze as much performance as you can out of your hardware. Maybe you’ll even try your hand at Mojo.

    But if your goal is to integrate the latest LLMs — either open or closed source — into web applications, then we believe Ruby to be the far superior option. Give it a shot yourselves!

    In part three of this series, I will dive into a fun experiment: just how simple can we make a web application with AI integration? Stay tuned.

     If you’d like a custom web application with generative AI integration, visit losangelesaiapps.com

    The post Building AI Applications in Ruby appeared first on Towards Data Science.
    #building #applications #ruby
    Building AI Applications in Ruby
    This is the second in a multi-part series on creating web applications with generative AI integration. Part 1 focused on explaining the AI stack and why the application layer is the best place in the stack to be. Check it out here. Table of Contents Introduction I thought spas were supposed to be relaxing? Microservices are for Macrocompanies Ruby and Python: Two Sides of the Same Coin Recent AI Based Gems Summary Introduction It’s not often that you hear the Ruby language mentioned when discussing AI. Python, of course, is the king in this world, and for good reason. The community has coalesced around the language. Most model training is done in PyTorch or TensorFlow these days. Scikit-learn and Keras are also very popular. RAG frameworks such as LangChain and LlamaIndex cater primarily to Python. However, when it comes to building web applications with AI integration, I believe Ruby is the better language. As the co-founder of an agency dedicated to building MVPs with generative AI integration, I frequently hear potential clients complaining about two things: Applications take too long to build Developers are quoting insane prices to build custom web apps These complaints have a common source: complexity. Modern web apps have a lot more complexity in them than in the good ol’ days. But why is this? Are the benefits brought by complexity worth the cost? I thought spas were supposed to be relaxing? One big piece of the puzzle is the recent rise of single-page applications. The most popular stack used today in building modern SPAs is MERN . The stack is popular for a few reasons: It is a JavaScript-only stack, across both front-end and back-end. Having to only code in only one language is pretty nice! SPAs can offer dynamic designs and a “smooth” user experience. Smooth here means that when some piece of data changes, only a part of the site is updated, as opposed to having to reload the whole page. Of course, if you don’t have a modern smartphone, SPAs won’t feel so smooth, as they tend to be pretty heavy. All that JavaScript starts to drag down the performance. There is a large ecosystem of libraries and developers with experience in this stack. This is pretty circular logic: is the stack popular because of the ecosystem, or is there an ecosystem because of the popularity? Either way, this point stands.React was created by Meta. Lots of money and effort has been thrown at the library, helping to polish and promote the product. Unfortunately, there are some downsides of working in the MERN stack, the most critical being the sheer complexity. Traditional web development was done using the Model-View-Controllerparadigm. In MVC, all of the logic managing a user’s session is handled in the backend, on the server. Something like fetching a user’s data was done via function calls and SQL statements in the backend. The backend then serves fully built HTML and CSS to the browser, which just has to display it. Hence the name “server”. In a SPA, this logic is handled on the user’s browser, in the frontend. SPAs have to handle UI state, application state, and sometimes even server state all in the browser. API calls have to be made to the backend to fetch user data. There is still quite a bit of logic on the backend, mainly exposing data and functionality through APIs. To illustrate the difference, let me use the analogy of a commercial kitchen. The customer will be the frontend and the kitchen will be the backend. MVCs vs. SPAs. Image generated by ChatGPT. Traditional MVC apps are like dining at a full-service restaurant. Yes, there is a lot of complexityin the backend. But the frontend experience is simple and satisfying: all the customer has to do is pick up a fork and eat their food. SPAs are like eating at a buffet-style dining restaurant. There is still quite a bit of complexity in the kitchen. But now the customer also has to decide what food to grab, how to combine them, how to arrange them on the plate, where to put the plate when finished, etc. Andrej Karpathy had a tweet recently discussing his frustration with attempting to build web apps in 2025. It can be overwhelming for those new to the space. The reality of building web apps in 2025 is that it's a bit like assembling IKEA furniture. There's no "full-stack" product with batteries included, you have to piece together and configure many individual services:– frontend / backend– hosting…— Andrej KarpathyMarch 27, 2025 In order to build MVPs with AI integration rapidly, our agency has decided to forgo the SPA and instead go with the traditional MVC approach. In particular, we have found Ruby on Railsto be the framework best suited to quickly developing and deploying quality apps with AI integration. Ruby on Rails was developed by David Heinemeier Hansson in 2004 and has long been known as a great web framework, but I would argue it has recently made leaps in its ability to incorporate AI into apps, as we will see. Django is the most popular Python web framework, and also has a more traditional pattern of development. Unfortunately, in our testing we found Django was simply not as full-featured or “batteries included” as Rails is. As a simple example, Django has no built-in background job system. Nearly all of our apps incorporate background jobs, so to not include this was disappointing. We also prefer how Rails emphasizes simplicity, with Rails 8 encouraging developers to easily self-host their apps instead of going through a provider like Heroku. They also recently released a stack of tools meant to replace external services like Redis. “But what about the smooth user experience?” you might ask. The truth is that modern Rails includes several ways of crafting SPA-like experiences without all of the heavy JavaScript. The primary tool is Hotwire, which bundles tools like Turbo and Stimulus. Turbo lets you dynamically change pieces of HTML on your webpage without writing custom JavaScript. For the times where you do need to include custom JavaScript, Stimulus is a minimal JavaScript framework that lets you do just that. Even if you want to use React, you can do so with the react-rails gem. So you can have your cake, and eat it too! SPAs are not the only reason for the increase in complexity, however. Another has to do with the advent of the microservices architecture. Microservices are for Macrocompanies Once again, we find ourselves comparing the simple past with the complexity of today. In the past, software was primarily developed as monoliths. A monolithic application means that all the different parts of your app — such as the user interface, business logic, and data handling — are developed, tested, and deployed as one single unit. The code is all typically housed in a single repo. Working with a monolith is simple and satisfying. Running a development setup for testing purposes is easy. You are working with a single database schema containing all of your tables, making queries and joins straightforward. Deployment is simple, since you just have one container to look at and modify. However, once your company scales to the size of a Google or Amazon, real problems begin to emerge. With hundreds or thousands of developers contributing simultaneously to a single codebase, coordinating changes and managing merge conflicts becomes increasingly difficult. Deployments also become more complex and risky, since even minor changes can blow up the entire application! To manage these issues, large companies began to coalesce around the microservices architecture. This is a style of programming where you design your codebase as a set of small, autonomous services. Each service owns its own codebase, data storage, and deployment pipelines. As a simple example, instead of stuffing all of your logic regarding an OpenAI client into your main app, you can move that logic into its own service. To call that service, you would then typically make REST calls, as opposed to function calls. This ups the complexity, but resolves the merge conflict and deployment issues, since each team in the organization gets to work on their own island of code. Another benefit to using microservices is that they allow for a polyglot tech stack. This means that each team can code up their service using whatever language they prefer. If one team prefers JavaScript while another likes Python, this is no issue. When we first began our agency, this idea of a polyglot stack pushed us to use a microservices architecture. Not because we had a large team, but because we each wanted to use the “best” language for each functionality. This meant: Using Ruby on Rails for web development. It’s been battle-tested in this area for decades. Using Python for the AI integration, perhaps deployed with something like FastAPI. Serious AI work requires Python, I was led to believe. Two different languages, each focused on its area of specialty. What could go wrong? Unfortunately, we found the process of development frustrating. Just setting up our dev environment was time-consuming. Having to wrangle Docker compose files and manage inter-service communication made us wish we could go back to the beauty and simplicity of the monolith. Having to make a REST call and set up the appropriate routing in FastAPI instead of making a simple function call sucked. “Surely we can’t develop AI apps in pure Ruby,” I thought. And then I gave it a try. And I’m glad I did. I found the process of developing an MVP with AI integration in Ruby very satisfying. We were able to sprint where before we were jogging. I loved the emphasis on beauty, simplicity, and developer happiness in the Ruby community. And I found the state of the AI ecosystem in Ruby to be surprisingly mature and getting better every day. If you are a Python programmer and are scared off by learning a new language like I was, let me comfort you by discussing the similarities between the Ruby and Python languages. Ruby and Python: Two Sides of the Same Coin I consider Python and Ruby to be like cousins. Both languages incorporate: High-level Interpretation: This means they abstract away a lot of the complexity of low-level programming details, such as memory management. Dynamic Typing: Neither language requires you to specify if a variable is an int, float, string, etc. The types are checked at runtime. Object-Oriented Programming: Both languages are object-oriented. Both support classes, inheritance, polymorphism, etc. Ruby is more “pure”, in the sense that literally everything is an object, whereas in Python a few thingsare not objects. Readable and Concise Syntax: Both are considered easy to learn. Either is great for a first-time learner. Wide Ecosystem of Packages: Packages to do all sorts of cool things are available in both languages. In Python they are called libraries, and in Ruby they are called gems. The primary difference between the two languages lies in their philosophy and design principles. Python’s core philosophy can be described as: There should be one — and preferably only one — obvious way to do something. In theory, this should emphasize simplicity, readability, and clarity. Ruby’s philosophy can be described as: There’s always more than one way to do something. Maximize developer happiness. This was a shock to me when I switched over from Python. Check out this simple example emphasizing this philosophical difference: # A fight over philosophy: iterating over an array # Pythonic way for i in range: print# Ruby way, option 1.each do |i| puts i end # Ruby way, option 2 for i in 1..5 puts i end # Ruby way, option 3 5.times do |i| puts i + 1 end # Ruby way, option 4.each { |i| puts i } Another difference between the two is syntax style. Python primarily uses indentation to denote code blocks, while Ruby uses do…end or {…} blocks. Most include indentation inside Ruby blocks, but this is entirely optional. Examples of these syntactic differences can be seen in the code shown above. There are a lot of other little differences to learn. For example, in Python string interpolation is done using f-strings: f"Hello, {name}!", while in Ruby they are done using hashtags: "Hello, #{name}!". Within a few months, I think any competent Python programmer can transfer their proficiency over to Ruby. Recent AI-based Gems Despite not being in the conversation when discussing AI, Ruby has had some recent advancements in the world of gems. I will highlight some of the most impressive recent releases that we have been using in our agency to build AI apps: RubyLLM — Any GitHub repo that gets more than 2k stars within a few weeks of release deserves a mention, and RubyLLM is definitely worthy. I have used many clunky implementations of LLM providers from libraries like LangChain and LlamaIndex, so using RubyLLM was like a breath of fresh air. As a simple example, let’s take a look at a tutorial demonstrating multi-turn conversations: require 'ruby_llm' # Create a model and give it instructions chat = RubyLLM.chat chat.with_instructions "You are a friendly Ruby expert who loves to help beginners." # Multi-turn conversation chat.ask "Hi! What does attr_reader do in Ruby?" # => "Ruby creates a getter method for each symbol... # Stream responses in real time chat.ask "Could you give me a short example?" do |chunk| print chunk.content end # => "Sure! # ```ruby # class Person # attr... Simply amazing. Multi-turn conversations are handled automatically for you. Streaming is a breeze. Compare this to a similar implementation in LangChain: from langchain_openai import ChatOpenAI from langchain_core.schema import SystemMessage, HumanMessage, AIMessage from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler SYSTEM_PROMPT = "You are a friendly Ruby expert who loves to help beginners." chat = ChatOpenAI]) history =def ask-> None: """Stream the answer token-by-token and keep the context in memory.""" history.append) # .stream yields message chunks as they arrive for chunk in chat.stream: printprint# newline after the answer # the final chunk has the full message content history.append) askaskYikes. And it’s important to note that this is a grug implementation. Want to know how LangChain really expects you to manage memory? Check out these links, but grab a bucket first; you may get sick. Neighbors — This is an excellent library to use for nearest-neighbors search in a Rails application. Very useful in a RAG setup. It integrates with Postgres, SQLite, MySQL, MariaDB, and more. It was written by Andrew Kane, the same guy who wrote the pgvector extension that allows Postgres to behave as a vector database. Async — This gem had its first official release back in December 2024, and it has been making waves in the Ruby community. Async is a fiber-based framework for Ruby that runs non-blocking I/O tasks concurrently while letting you write simple, sequential code. Fibers are like mini-threads that each have their own mini call stack. While not strictly a gem for AI, it has helped us create features like web scrapers that run blazingly fast across thousands of pages. We have also used it to handle streaming of chunks from LLMs. Torch.rb — If you are interested in training deep learning models, then surely you have heard of PyTorch. Well, PyTorch is built on LibTorch, which essentially has a lot of C/C++ code under the hood to perform ML operations quickly. Andrew Kane took LibTorch and made a Ruby adapter over it to create Torch.rb, essentially a Ruby version of PyTorch. Andrew Kane has been a hero in the Ruby AI world, authoring dozens of ML gems for Ruby. Summary In short: building a web application with AI integration quickly and cheaply requires a monolithic architecture. A monolith demands a monolingual application, which is necessary if your end goal is quality apps delivered with speed. Your main options are either Python or Ruby. If you go with Python, you will probably use Django for your web framework. If you go with Ruby, you will be using Ruby on Rails. At our agency, we found Django’s lack of features disappointing. Rails has impressed us with its feature set and emphasis on simplicity. We were thrilled to find almost no issues on the AI side. Of course, there are times where you will not want to use Ruby. If you are conducting research in AI or training machine learning models from scratch, then you will likely want to stick with Python. Research almost never involves building Web Applications. At most you’ll build a simple interface or dashboard in a notebook, but nothing production-ready. You’ll likely want the latest PyTorch updates to ensure your training runs quickly. You may even dive into low-level C/C++ programming to squeeze as much performance as you can out of your hardware. Maybe you’ll even try your hand at Mojo. But if your goal is to integrate the latest LLMs — either open or closed source — into web applications, then we believe Ruby to be the far superior option. Give it a shot yourselves! In part three of this series, I will dive into a fun experiment: just how simple can we make a web application with AI integration? Stay tuned.  If you’d like a custom web application with generative AI integration, visit losangelesaiapps.com The post Building AI Applications in Ruby appeared first on Towards Data Science. #building #applications #ruby
    TOWARDSDATASCIENCE.COM
    Building AI Applications in Ruby
    This is the second in a multi-part series on creating web applications with generative AI integration. Part 1 focused on explaining the AI stack and why the application layer is the best place in the stack to be. Check it out here. Table of Contents Introduction I thought spas were supposed to be relaxing? Microservices are for Macrocompanies Ruby and Python: Two Sides of the Same Coin Recent AI Based Gems Summary Introduction It’s not often that you hear the Ruby language mentioned when discussing AI. Python, of course, is the king in this world, and for good reason. The community has coalesced around the language. Most model training is done in PyTorch or TensorFlow these days. Scikit-learn and Keras are also very popular. RAG frameworks such as LangChain and LlamaIndex cater primarily to Python. However, when it comes to building web applications with AI integration, I believe Ruby is the better language. As the co-founder of an agency dedicated to building MVPs with generative AI integration, I frequently hear potential clients complaining about two things: Applications take too long to build Developers are quoting insane prices to build custom web apps These complaints have a common source: complexity. Modern web apps have a lot more complexity in them than in the good ol’ days. But why is this? Are the benefits brought by complexity worth the cost? I thought spas were supposed to be relaxing? One big piece of the puzzle is the recent rise of single-page applications (SPAs). The most popular stack used today in building modern SPAs is MERN (MongoDB, Express.js, React.js, Node.js). The stack is popular for a few reasons: It is a JavaScript-only stack, across both front-end and back-end. Having to only code in only one language is pretty nice! SPAs can offer dynamic designs and a “smooth” user experience. Smooth here means that when some piece of data changes, only a part of the site is updated, as opposed to having to reload the whole page. Of course, if you don’t have a modern smartphone, SPAs won’t feel so smooth, as they tend to be pretty heavy. All that JavaScript starts to drag down the performance. There is a large ecosystem of libraries and developers with experience in this stack. This is pretty circular logic: is the stack popular because of the ecosystem, or is there an ecosystem because of the popularity? Either way, this point stands.React was created by Meta. Lots of money and effort has been thrown at the library, helping to polish and promote the product. Unfortunately, there are some downsides of working in the MERN stack, the most critical being the sheer complexity. Traditional web development was done using the Model-View-Controller (MVC) paradigm. In MVC, all of the logic managing a user’s session is handled in the backend, on the server. Something like fetching a user’s data was done via function calls and SQL statements in the backend. The backend then serves fully built HTML and CSS to the browser, which just has to display it. Hence the name “server”. In a SPA, this logic is handled on the user’s browser, in the frontend. SPAs have to handle UI state, application state, and sometimes even server state all in the browser. API calls have to be made to the backend to fetch user data. There is still quite a bit of logic on the backend, mainly exposing data and functionality through APIs. To illustrate the difference, let me use the analogy of a commercial kitchen. The customer will be the frontend and the kitchen will be the backend. MVCs vs. SPAs. Image generated by ChatGPT. Traditional MVC apps are like dining at a full-service restaurant. Yes, there is a lot of complexity (and yelling, if The Bear is to be believed) in the backend. But the frontend experience is simple and satisfying: all the customer has to do is pick up a fork and eat their food. SPAs are like eating at a buffet-style dining restaurant. There is still quite a bit of complexity in the kitchen. But now the customer also has to decide what food to grab, how to combine them, how to arrange them on the plate, where to put the plate when finished, etc. Andrej Karpathy had a tweet recently discussing his frustration with attempting to build web apps in 2025. It can be overwhelming for those new to the space. The reality of building web apps in 2025 is that it's a bit like assembling IKEA furniture. There's no "full-stack" product with batteries included, you have to piece together and configure many individual services:– frontend / backend (e.g. React, Next.js, APIs)– hosting…— Andrej Karpathy (@karpathy) March 27, 2025 In order to build MVPs with AI integration rapidly, our agency has decided to forgo the SPA and instead go with the traditional MVC approach. In particular, we have found Ruby on Rails (often denoted as Rails) to be the framework best suited to quickly developing and deploying quality apps with AI integration. Ruby on Rails was developed by David Heinemeier Hansson in 2004 and has long been known as a great web framework, but I would argue it has recently made leaps in its ability to incorporate AI into apps, as we will see. Django is the most popular Python web framework, and also has a more traditional pattern of development. Unfortunately, in our testing we found Django was simply not as full-featured or “batteries included” as Rails is. As a simple example, Django has no built-in background job system. Nearly all of our apps incorporate background jobs, so to not include this was disappointing. We also prefer how Rails emphasizes simplicity, with Rails 8 encouraging developers to easily self-host their apps instead of going through a provider like Heroku. They also recently released a stack of tools meant to replace external services like Redis. “But what about the smooth user experience?” you might ask. The truth is that modern Rails includes several ways of crafting SPA-like experiences without all of the heavy JavaScript. The primary tool is Hotwire, which bundles tools like Turbo and Stimulus. Turbo lets you dynamically change pieces of HTML on your webpage without writing custom JavaScript. For the times where you do need to include custom JavaScript, Stimulus is a minimal JavaScript framework that lets you do just that. Even if you want to use React, you can do so with the react-rails gem. So you can have your cake, and eat it too! SPAs are not the only reason for the increase in complexity, however. Another has to do with the advent of the microservices architecture. Microservices are for Macrocompanies Once again, we find ourselves comparing the simple past with the complexity of today. In the past, software was primarily developed as monoliths. A monolithic application means that all the different parts of your app — such as the user interface, business logic, and data handling — are developed, tested, and deployed as one single unit. The code is all typically housed in a single repo. Working with a monolith is simple and satisfying. Running a development setup for testing purposes is easy. You are working with a single database schema containing all of your tables, making queries and joins straightforward. Deployment is simple, since you just have one container to look at and modify. However, once your company scales to the size of a Google or Amazon, real problems begin to emerge. With hundreds or thousands of developers contributing simultaneously to a single codebase, coordinating changes and managing merge conflicts becomes increasingly difficult. Deployments also become more complex and risky, since even minor changes can blow up the entire application! To manage these issues, large companies began to coalesce around the microservices architecture. This is a style of programming where you design your codebase as a set of small, autonomous services. Each service owns its own codebase, data storage, and deployment pipelines. As a simple example, instead of stuffing all of your logic regarding an OpenAI client into your main app, you can move that logic into its own service. To call that service, you would then typically make REST calls, as opposed to function calls. This ups the complexity, but resolves the merge conflict and deployment issues, since each team in the organization gets to work on their own island of code. Another benefit to using microservices is that they allow for a polyglot tech stack. This means that each team can code up their service using whatever language they prefer. If one team prefers JavaScript while another likes Python, this is no issue. When we first began our agency, this idea of a polyglot stack pushed us to use a microservices architecture. Not because we had a large team, but because we each wanted to use the “best” language for each functionality. This meant: Using Ruby on Rails for web development. It’s been battle-tested in this area for decades. Using Python for the AI integration, perhaps deployed with something like FastAPI. Serious AI work requires Python, I was led to believe. Two different languages, each focused on its area of specialty. What could go wrong? Unfortunately, we found the process of development frustrating. Just setting up our dev environment was time-consuming. Having to wrangle Docker compose files and manage inter-service communication made us wish we could go back to the beauty and simplicity of the monolith. Having to make a REST call and set up the appropriate routing in FastAPI instead of making a simple function call sucked. “Surely we can’t develop AI apps in pure Ruby,” I thought. And then I gave it a try. And I’m glad I did. I found the process of developing an MVP with AI integration in Ruby very satisfying. We were able to sprint where before we were jogging. I loved the emphasis on beauty, simplicity, and developer happiness in the Ruby community. And I found the state of the AI ecosystem in Ruby to be surprisingly mature and getting better every day. If you are a Python programmer and are scared off by learning a new language like I was, let me comfort you by discussing the similarities between the Ruby and Python languages. Ruby and Python: Two Sides of the Same Coin I consider Python and Ruby to be like cousins. Both languages incorporate: High-level Interpretation: This means they abstract away a lot of the complexity of low-level programming details, such as memory management. Dynamic Typing: Neither language requires you to specify if a variable is an int, float, string, etc. The types are checked at runtime. Object-Oriented Programming: Both languages are object-oriented. Both support classes, inheritance, polymorphism, etc. Ruby is more “pure”, in the sense that literally everything is an object, whereas in Python a few things (such as if and for statements) are not objects. Readable and Concise Syntax: Both are considered easy to learn. Either is great for a first-time learner. Wide Ecosystem of Packages: Packages to do all sorts of cool things are available in both languages. In Python they are called libraries, and in Ruby they are called gems. The primary difference between the two languages lies in their philosophy and design principles. Python’s core philosophy can be described as: There should be one — and preferably only one — obvious way to do something. In theory, this should emphasize simplicity, readability, and clarity. Ruby’s philosophy can be described as: There’s always more than one way to do something. Maximize developer happiness. This was a shock to me when I switched over from Python. Check out this simple example emphasizing this philosophical difference: # A fight over philosophy: iterating over an array # Pythonic way for i in range(1, 6): print(i) # Ruby way, option 1 (1..5).each do |i| puts i end # Ruby way, option 2 for i in 1..5 puts i end # Ruby way, option 3 5.times do |i| puts i + 1 end # Ruby way, option 4 (1..5).each { |i| puts i } Another difference between the two is syntax style. Python primarily uses indentation to denote code blocks, while Ruby uses do…end or {…} blocks. Most include indentation inside Ruby blocks, but this is entirely optional. Examples of these syntactic differences can be seen in the code shown above. There are a lot of other little differences to learn. For example, in Python string interpolation is done using f-strings: f"Hello, {name}!", while in Ruby they are done using hashtags: "Hello, #{name}!". Within a few months, I think any competent Python programmer can transfer their proficiency over to Ruby. Recent AI-based Gems Despite not being in the conversation when discussing AI, Ruby has had some recent advancements in the world of gems. I will highlight some of the most impressive recent releases that we have been using in our agency to build AI apps: RubyLLM (link) — Any GitHub repo that gets more than 2k stars within a few weeks of release deserves a mention, and RubyLLM is definitely worthy. I have used many clunky implementations of LLM providers from libraries like LangChain and LlamaIndex, so using RubyLLM was like a breath of fresh air. As a simple example, let’s take a look at a tutorial demonstrating multi-turn conversations: require 'ruby_llm' # Create a model and give it instructions chat = RubyLLM.chat chat.with_instructions "You are a friendly Ruby expert who loves to help beginners." # Multi-turn conversation chat.ask "Hi! What does attr_reader do in Ruby?" # => "Ruby creates a getter method for each symbol... # Stream responses in real time chat.ask "Could you give me a short example?" do |chunk| print chunk.content end # => "Sure! # ```ruby # class Person # attr... Simply amazing. Multi-turn conversations are handled automatically for you. Streaming is a breeze. Compare this to a similar implementation in LangChain: from langchain_openai import ChatOpenAI from langchain_core.schema import SystemMessage, HumanMessage, AIMessage from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler SYSTEM_PROMPT = "You are a friendly Ruby expert who loves to help beginners." chat = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()]) history = [SystemMessage(content=SYSTEM_PROMPT)] def ask(user_text: str) -> None: """Stream the answer token-by-token and keep the context in memory.""" history.append(HumanMessage(content=user_text)) # .stream yields message chunks as they arrive for chunk in chat.stream(history): print(chunk.content, end="", flush=True) print() # newline after the answer # the final chunk has the full message content history.append(AIMessage(content=chunk.content)) ask("Hi! What does attr_reader do in Ruby?") ask("Great - could you show a short example with attr_accessor?") Yikes. And it’s important to note that this is a grug implementation. Want to know how LangChain really expects you to manage memory? Check out these links, but grab a bucket first; you may get sick. Neighbors (link) — This is an excellent library to use for nearest-neighbors search in a Rails application. Very useful in a RAG setup. It integrates with Postgres, SQLite, MySQL, MariaDB, and more. It was written by Andrew Kane, the same guy who wrote the pgvector extension that allows Postgres to behave as a vector database. Async (link) — This gem had its first official release back in December 2024, and it has been making waves in the Ruby community. Async is a fiber-based framework for Ruby that runs non-blocking I/O tasks concurrently while letting you write simple, sequential code. Fibers are like mini-threads that each have their own mini call stack. While not strictly a gem for AI, it has helped us create features like web scrapers that run blazingly fast across thousands of pages. We have also used it to handle streaming of chunks from LLMs. Torch.rb (link) — If you are interested in training deep learning models, then surely you have heard of PyTorch. Well, PyTorch is built on LibTorch, which essentially has a lot of C/C++ code under the hood to perform ML operations quickly. Andrew Kane took LibTorch and made a Ruby adapter over it to create Torch.rb, essentially a Ruby version of PyTorch. Andrew Kane has been a hero in the Ruby AI world, authoring dozens of ML gems for Ruby. Summary In short: building a web application with AI integration quickly and cheaply requires a monolithic architecture. A monolith demands a monolingual application, which is necessary if your end goal is quality apps delivered with speed. Your main options are either Python or Ruby. If you go with Python, you will probably use Django for your web framework. If you go with Ruby, you will be using Ruby on Rails. At our agency, we found Django’s lack of features disappointing. Rails has impressed us with its feature set and emphasis on simplicity. We were thrilled to find almost no issues on the AI side. Of course, there are times where you will not want to use Ruby. If you are conducting research in AI or training machine learning models from scratch, then you will likely want to stick with Python. Research almost never involves building Web Applications. At most you’ll build a simple interface or dashboard in a notebook, but nothing production-ready. You’ll likely want the latest PyTorch updates to ensure your training runs quickly. You may even dive into low-level C/C++ programming to squeeze as much performance as you can out of your hardware. Maybe you’ll even try your hand at Mojo. But if your goal is to integrate the latest LLMs — either open or closed source — into web applications, then we believe Ruby to be the far superior option. Give it a shot yourselves! In part three of this series, I will dive into a fun experiment: just how simple can we make a web application with AI integration? Stay tuned.  If you’d like a custom web application with generative AI integration, visit losangelesaiapps.com The post Building AI Applications in Ruby appeared first on Towards Data Science.
    0 Commentarii 0 Distribuiri 0 previzualizare
  • Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You

    Latest   Machine Learning
    Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You

    0 like

    May 18, 2025

    Share this post

    Author: Mayank Bohra

    Originally published on Towards AI.

    Image by the author
    Alright, let’s talk about prompt engineering. Every other week, it seems there is a new set of secrets or magical techniques guaranteed to unlock AI perfection. Recently, a whitepaper from Google made the rounds, outlining their take on getting better results from Large Language Models.
    Look, effective prompting is absolutely necessary. It’s the interface layer, how we communicate our intent to these incredibly powerful, yet often frustrating opaque, models. Think of it like giving instructions to a brilliant but slightly eccentric junior engineer who only understands natural language. You need to be clear, specific, and provide context.
    But let’s be pragmatic. The idea that a few prompt tweaks will magically “10x” your results for every task is marketing hype, not engineering reality. These models, for all their capabilities, are fundamentally pattern-matching machines operating within a probabilistic space. They don’t understand in the way a human does. Prompting is about nudging that pattern matching closer to the desired outcome.
    So, what did Google’s advice cover, and what’s the experience builder’s take on it? The techniques generally boil down to principles we’ve known for a while: clarity, structure, providing examples and iteration.
    The Fundamentals: Clarity, Structure, Context
    Much of the advice centers on making your intent unambiguous. This is ground zero for dealing with LLMs. They excel at finding patterns in vast amounts of data, but they stumble on vagueness.

    Being Specific and Detailed: This isn’t a secret; it’s just good communication. If you ask for “information about AI”, you’ll get something generic. If you ask for “a summary of recent advancements in Generative AI model architecture published in research papers since April 2025, focusing on MoE models”, you give the model a much better target.
    Defining Output Format: Models are flexible text generators. If you don’t specify structure, you’ll get whatever feels statistically probable based on the training data, which is often inconsistent. Telling the model “Respond in JSON format with keys ‘summary’ and ‘key_findings’” isn’t magic; it’s setting clear requirements.
    Providing Context: Models have limited context windows. Showing your entire codebase or all user documentation in won’t work. You need to curate teh relevant information. This principle is the entire foundation of Retrieval Augmented Generation, where you retrieve relevant chunks of data and then provide them as context to the prompt. Prompting alone without relevant external knowledge only leverage the model’s internal training data, which might be outdated or insufficient for domain-specific tasks.

    These points are foundational. They’re less about discovering hidden model behaviors and more about mitigating the inherent ambiguity of natural language and the model’s lack of true world understanding.
    Structuring the Conversation: Roles and Delimiters
    Assigning a roleor using delimitersare simple yet effective ways to guide the model’s behavior and separate instructions from input.

    Assigning a Role: This is a trick to prime the model to generate text consistent with a certain persona or knowledge domain it learned during training. It leverage the fact that the model has seen countless examples of different writing styles and knowledge expressions. It works, but it’s a heuristic, not a guarantee of factual accuracy or perfect adherence to the role.
    Using Delimiters: Essential for programmatic prompting. When you’re building an application that feeds user input into a prompt, you must use delimitersto clearly separated the user’s potentially malicious input from your system instructions. This is a critical security measure against prompt injection attacks, not just a formatting tip.

    Nudging the Model’s Reasoning: Few-shot and Step-by-Step
    Some techniques go beyond just structuring the input; they attempt to influence the model’s internal processing.

    Few-shot Prompts: Providing a few examples of input/output pairsif often far more effective than just describing the task. Why? Because the model learns the desired mapping from the examples. It’s pattern recognition again. This is powerful for teaching specific formats or interpreting nuanced instructions that hard to describe purely verbally. It’s basically in-context learning.
    Breaking Down Complex Tasks: Asking the model to think step-by-stepencourages it to show intermediate steps. This often leads to more accurate final results, especially for reasoning-heavy tasks. Why? It mimics hwo humans solve problems and forces the model to allocate computational steps sequentially. It’s less about a secret instruction and more about guiding the model through a multi-step process rather than expecting it to leap to the answer in one go.

    The Engineering Angle: Testing and Iteration
    The advice also includes testing and iteration. Again, this isn’t unique to prompt engineering. It’s fundamental to all software development.

    Test and Iterate: You write a prompt, you test it with various inputs, you see where it fails or is suboptimal, you tweak the prompt, and you test again. This loop is the reality of building anything reliable with LLMs. It highlights that prompting is often empirical; you figure out what works by trying it. This is the opposite of a predictable, documented API.

    The Hard Truth: Where Prompt Engineering Hits a Wall
    Here’s where the pragmatic view really kicks in. Prompt engineering, while crucial, has significant limitations, especially for building robust, production-grade applications:

    Context Window Limits: There’s only so much information you can cram into a prompt. Long documents, complex histories, or large datasets are out. This is why RAG systems are essential — they manage and retrieve relevant context dynamically. Prompting alone doesn’t solve the knowledge bottleneck.
    Factual Accuracy and Hallucinations: No amount of prompting can guarantee a model won’t invent facts or confidently present misinformation. Prompting can sometimes mitigate this by, for examples, telling the model to stick only to the provided context, but it doesn’t fix the underlying issue that the model is a text predictor, not a truth engine.
    Model Bias and Undesired Behavior: Prompts can influence output, but they can’t easily override biases embedded in the training data or prevent the model from generating harmful or inappropriate content in unexpected ways. Guardrails need to be implemented *outside* the prompt layer.
    Complexity Ceiling: For truly complex, multi-step processes requiring external tool use, decision making, and dynamic state, pure prompting breaks down. This is the domain of AI agents, which use LLMs as the controller but rely on external memory, planning modules, and tool interaction to achieve goals. Prompting is just one part of the agent’s loop.
    Maintainability: Try managing dozens or hundreds of complex, multi-line prompts across different features in a large application. Versioning them? Testing changes? This quickly becomes an engineering nightmare. Prompts are code, but often undocumented, untestable code living in strings.
    Prompt Injection: As mentioned with delimiters, allowing external inputinto a prompt opens the door to prompt injection attacks, where malicious input hijacks the model’s instructions. Robust applications need sanitization and architectural safeguards beyond just a delimiter trick.

    What no one tells you in the prompt “secrets” articles is that the difficulty scales non-linearly with the reliability and complexity required. Getting a cool demo output with a clever prompt is one thing. Building a feature that consistently works for thousands of users on diverse inputs while being secure and maintainable? That’s a whole different ballgame.
    The Real “Secret”? It’s Just Good Engineering.
    If there’s any “secret” to building effective applications with LLMs, it’s not a prompt string. It’s integrating the model into a well-architected system.
    This involves:

    Data Pipelines: Getting the right data to the model.
    Orchestration Frameworks: Using tools like LangChain, LlamaIndex, or building custom workflows to sequence model calls, tool use, and data retrieval.
    Evaluation: Developing robust methods to quantitatively measure the quality of LLM output beyond just eyeballing it. This is hard.
    Guardrails: Implementing safety checks, moderation, and input validation *outside* the LLM call itself.
    Fallback Mechanisms: What happens when the model gives a bad answer or fails? Your application needs graceful degradation.
    Version Control and Testing: Treating prompts and the surrounding logic with the same rigor as any other production code.

    Prompt engineering is a critical *skill*, part of the overall toolkit. It’s like knowing how to write effective SQL queries. Essential for database interaction, but it doesn’t mean you can build a scalable web application with just SQL. You need application code, infrastructure, frontend, etc.
    Wrapping Up
    So, Google’s whitepaper and similar resources offer valuable best practices for interacting with LLMs. They formalize common-sense approaches to communication and leverage observed model behaviors like few-shot learning and step-by-step processing. If you’re just starting out, or using LLMs for simple tasks, mastering these techniques will absolutely improve your results.
    But if you’re a developer, an AI practitioner, or a technical founder looking to build robust, reliable applications powered by LLMs, understand this: prompt engineering is table stakes. It’s necessary, but far from sufficient. The real challenge, the actual “secrets” if you want to call them that, lie in the surrounding engineering — the data management, the orchestration, the evaluation, the guardrails, and the sheer hard work of building a system that accounts for the LLM’s inherent unpredictability and limitations.
    Don’t get fixated on finding the perfect prompt string. Focus on building a resilient system around it. That’s where the real progress happens.
    Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

    Published via Towards AI

    Towards AI - Medium

    Share this post
    #beyond #prompt #what #googles #llm
    Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You
    Latest   Machine Learning Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You 0 like May 18, 2025 Share this post Author: Mayank Bohra Originally published on Towards AI. Image by the author Alright, let’s talk about prompt engineering. Every other week, it seems there is a new set of secrets or magical techniques guaranteed to unlock AI perfection. Recently, a whitepaper from Google made the rounds, outlining their take on getting better results from Large Language Models. Look, effective prompting is absolutely necessary. It’s the interface layer, how we communicate our intent to these incredibly powerful, yet often frustrating opaque, models. Think of it like giving instructions to a brilliant but slightly eccentric junior engineer who only understands natural language. You need to be clear, specific, and provide context. But let’s be pragmatic. The idea that a few prompt tweaks will magically “10x” your results for every task is marketing hype, not engineering reality. These models, for all their capabilities, are fundamentally pattern-matching machines operating within a probabilistic space. They don’t understand in the way a human does. Prompting is about nudging that pattern matching closer to the desired outcome. So, what did Google’s advice cover, and what’s the experience builder’s take on it? The techniques generally boil down to principles we’ve known for a while: clarity, structure, providing examples and iteration. The Fundamentals: Clarity, Structure, Context Much of the advice centers on making your intent unambiguous. This is ground zero for dealing with LLMs. They excel at finding patterns in vast amounts of data, but they stumble on vagueness. Being Specific and Detailed: This isn’t a secret; it’s just good communication. If you ask for “information about AI”, you’ll get something generic. If you ask for “a summary of recent advancements in Generative AI model architecture published in research papers since April 2025, focusing on MoE models”, you give the model a much better target. Defining Output Format: Models are flexible text generators. If you don’t specify structure, you’ll get whatever feels statistically probable based on the training data, which is often inconsistent. Telling the model “Respond in JSON format with keys ‘summary’ and ‘key_findings’” isn’t magic; it’s setting clear requirements. Providing Context: Models have limited context windows. Showing your entire codebase or all user documentation in won’t work. You need to curate teh relevant information. This principle is the entire foundation of Retrieval Augmented Generation, where you retrieve relevant chunks of data and then provide them as context to the prompt. Prompting alone without relevant external knowledge only leverage the model’s internal training data, which might be outdated or insufficient for domain-specific tasks. These points are foundational. They’re less about discovering hidden model behaviors and more about mitigating the inherent ambiguity of natural language and the model’s lack of true world understanding. Structuring the Conversation: Roles and Delimiters Assigning a roleor using delimitersare simple yet effective ways to guide the model’s behavior and separate instructions from input. Assigning a Role: This is a trick to prime the model to generate text consistent with a certain persona or knowledge domain it learned during training. It leverage the fact that the model has seen countless examples of different writing styles and knowledge expressions. It works, but it’s a heuristic, not a guarantee of factual accuracy or perfect adherence to the role. Using Delimiters: Essential for programmatic prompting. When you’re building an application that feeds user input into a prompt, you must use delimitersto clearly separated the user’s potentially malicious input from your system instructions. This is a critical security measure against prompt injection attacks, not just a formatting tip. Nudging the Model’s Reasoning: Few-shot and Step-by-Step Some techniques go beyond just structuring the input; they attempt to influence the model’s internal processing. Few-shot Prompts: Providing a few examples of input/output pairsif often far more effective than just describing the task. Why? Because the model learns the desired mapping from the examples. It’s pattern recognition again. This is powerful for teaching specific formats or interpreting nuanced instructions that hard to describe purely verbally. It’s basically in-context learning. Breaking Down Complex Tasks: Asking the model to think step-by-stepencourages it to show intermediate steps. This often leads to more accurate final results, especially for reasoning-heavy tasks. Why? It mimics hwo humans solve problems and forces the model to allocate computational steps sequentially. It’s less about a secret instruction and more about guiding the model through a multi-step process rather than expecting it to leap to the answer in one go. The Engineering Angle: Testing and Iteration The advice also includes testing and iteration. Again, this isn’t unique to prompt engineering. It’s fundamental to all software development. Test and Iterate: You write a prompt, you test it with various inputs, you see where it fails or is suboptimal, you tweak the prompt, and you test again. This loop is the reality of building anything reliable with LLMs. It highlights that prompting is often empirical; you figure out what works by trying it. This is the opposite of a predictable, documented API. The Hard Truth: Where Prompt Engineering Hits a Wall Here’s where the pragmatic view really kicks in. Prompt engineering, while crucial, has significant limitations, especially for building robust, production-grade applications: Context Window Limits: There’s only so much information you can cram into a prompt. Long documents, complex histories, or large datasets are out. This is why RAG systems are essential — they manage and retrieve relevant context dynamically. Prompting alone doesn’t solve the knowledge bottleneck. Factual Accuracy and Hallucinations: No amount of prompting can guarantee a model won’t invent facts or confidently present misinformation. Prompting can sometimes mitigate this by, for examples, telling the model to stick only to the provided context, but it doesn’t fix the underlying issue that the model is a text predictor, not a truth engine. Model Bias and Undesired Behavior: Prompts can influence output, but they can’t easily override biases embedded in the training data or prevent the model from generating harmful or inappropriate content in unexpected ways. Guardrails need to be implemented *outside* the prompt layer. Complexity Ceiling: For truly complex, multi-step processes requiring external tool use, decision making, and dynamic state, pure prompting breaks down. This is the domain of AI agents, which use LLMs as the controller but rely on external memory, planning modules, and tool interaction to achieve goals. Prompting is just one part of the agent’s loop. Maintainability: Try managing dozens or hundreds of complex, multi-line prompts across different features in a large application. Versioning them? Testing changes? This quickly becomes an engineering nightmare. Prompts are code, but often undocumented, untestable code living in strings. Prompt Injection: As mentioned with delimiters, allowing external inputinto a prompt opens the door to prompt injection attacks, where malicious input hijacks the model’s instructions. Robust applications need sanitization and architectural safeguards beyond just a delimiter trick. What no one tells you in the prompt “secrets” articles is that the difficulty scales non-linearly with the reliability and complexity required. Getting a cool demo output with a clever prompt is one thing. Building a feature that consistently works for thousands of users on diverse inputs while being secure and maintainable? That’s a whole different ballgame. The Real “Secret”? It’s Just Good Engineering. If there’s any “secret” to building effective applications with LLMs, it’s not a prompt string. It’s integrating the model into a well-architected system. This involves: Data Pipelines: Getting the right data to the model. Orchestration Frameworks: Using tools like LangChain, LlamaIndex, or building custom workflows to sequence model calls, tool use, and data retrieval. Evaluation: Developing robust methods to quantitatively measure the quality of LLM output beyond just eyeballing it. This is hard. Guardrails: Implementing safety checks, moderation, and input validation *outside* the LLM call itself. Fallback Mechanisms: What happens when the model gives a bad answer or fails? Your application needs graceful degradation. Version Control and Testing: Treating prompts and the surrounding logic with the same rigor as any other production code. Prompt engineering is a critical *skill*, part of the overall toolkit. It’s like knowing how to write effective SQL queries. Essential for database interaction, but it doesn’t mean you can build a scalable web application with just SQL. You need application code, infrastructure, frontend, etc. Wrapping Up So, Google’s whitepaper and similar resources offer valuable best practices for interacting with LLMs. They formalize common-sense approaches to communication and leverage observed model behaviors like few-shot learning and step-by-step processing. If you’re just starting out, or using LLMs for simple tasks, mastering these techniques will absolutely improve your results. But if you’re a developer, an AI practitioner, or a technical founder looking to build robust, reliable applications powered by LLMs, understand this: prompt engineering is table stakes. It’s necessary, but far from sufficient. The real challenge, the actual “secrets” if you want to call them that, lie in the surrounding engineering — the data management, the orchestration, the evaluation, the guardrails, and the sheer hard work of building a system that accounts for the LLM’s inherent unpredictability and limitations. Don’t get fixated on finding the perfect prompt string. Focus on building a resilient system around it. That’s where the real progress happens. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #beyond #prompt #what #googles #llm
    TOWARDSAI.NET
    Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You
    Latest   Machine Learning Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You 0 like May 18, 2025 Share this post Author(s): Mayank Bohra Originally published on Towards AI. Image by the author Alright, let’s talk about prompt engineering. Every other week, it seems there is a new set of secrets or magical techniques guaranteed to unlock AI perfection. Recently, a whitepaper from Google made the rounds, outlining their take on getting better results from Large Language Models. Look, effective prompting is absolutely necessary. It’s the interface layer, how we communicate our intent to these incredibly powerful, yet often frustrating opaque, models. Think of it like giving instructions to a brilliant but slightly eccentric junior engineer who only understands natural language. You need to be clear, specific, and provide context. But let’s be pragmatic. The idea that a few prompt tweaks will magically “10x” your results for every task is marketing hype, not engineering reality. These models, for all their capabilities, are fundamentally pattern-matching machines operating within a probabilistic space. They don’t understand in the way a human does. Prompting is about nudging that pattern matching closer to the desired outcome. So, what did Google’s advice cover, and what’s the experience builder’s take on it? The techniques generally boil down to principles we’ve known for a while: clarity, structure, providing examples and iteration. The Fundamentals: Clarity, Structure, Context Much of the advice centers on making your intent unambiguous. This is ground zero for dealing with LLMs. They excel at finding patterns in vast amounts of data, but they stumble on vagueness. Being Specific and Detailed: This isn’t a secret; it’s just good communication. If you ask for “information about AI”, you’ll get something generic. If you ask for “a summary of recent advancements in Generative AI model architecture published in research papers since April 2025, focusing on MoE models”, you give the model a much better target. Defining Output Format: Models are flexible text generators. If you don’t specify structure (JSON, bullet points, a specific paragraph format), you’ll get whatever feels statistically probable based on the training data, which is often inconsistent. Telling the model “Respond in JSON format with keys ‘summary’ and ‘key_findings’” isn’t magic; it’s setting clear requirements. Providing Context: Models have limited context windows. Showing your entire codebase or all user documentation in won’t work. You need to curate teh relevant information. This principle is the entire foundation of Retrieval Augmented Generation (RAG), where you retrieve relevant chunks of data and then provide them as context to the prompt. Prompting alone without relevant external knowledge only leverage the model’s internal training data, which might be outdated or insufficient for domain-specific tasks. These points are foundational. They’re less about discovering hidden model behaviors and more about mitigating the inherent ambiguity of natural language and the model’s lack of true world understanding. Structuring the Conversation: Roles and Delimiters Assigning a role (“Act as an expert historian…”) or using delimiters (like “` or — -) are simple yet effective ways to guide the model’s behavior and separate instructions from input. Assigning a Role: This is a trick to prime the model to generate text consistent with a certain persona or knowledge domain it learned during training. It leverage the fact that the model has seen countless examples of different writing styles and knowledge expressions. It works, but it’s a heuristic, not a guarantee of factual accuracy or perfect adherence to the role. Using Delimiters: Essential for programmatic prompting. When you’re building an application that feeds user input into a prompt, you must use delimiters (e.g., triple backticks, XML tags) to clearly separated the user’s potentially malicious input from your system instructions. This is a critical security measure against prompt injection attacks, not just a formatting tip. Nudging the Model’s Reasoning: Few-shot and Step-by-Step Some techniques go beyond just structuring the input; they attempt to influence the model’s internal processing. Few-shot Prompts: Providing a few examples of input/output pairs (‘Input X → Output Y’, Input A → Output B, Input C→ ?’) if often far more effective than just describing the task. Why? Because the model learns the desired mapping from the examples. It’s pattern recognition again. This is powerful for teaching specific formats or interpreting nuanced instructions that hard to describe purely verbally. It’s basically in-context learning. Breaking Down Complex Tasks: Asking the model to think step-by-step (or implementing techniques like Chain-of-Thought or Tree-of-Thought prompting outside the model) encourages it to show intermediate steps. This often leads to more accurate final results, especially for reasoning-heavy tasks. Why? It mimics hwo humans solve problems and forces the model to allocate computational steps sequentially. It’s less about a secret instruction and more about guiding the model through a multi-step process rather than expecting it to leap to the answer in one go. The Engineering Angle: Testing and Iteration The advice also includes testing and iteration. Again, this isn’t unique to prompt engineering. It’s fundamental to all software development. Test and Iterate: You write a prompt, you test it with various inputs, you see where it fails or is suboptimal, you tweak the prompt, and you test again. This loop is the reality of building anything reliable with LLMs. It highlights that prompting is often empirical; you figure out what works by trying it. This is the opposite of a predictable, documented API. The Hard Truth: Where Prompt Engineering Hits a Wall Here’s where the pragmatic view really kicks in. Prompt engineering, while crucial, has significant limitations, especially for building robust, production-grade applications: Context Window Limits: There’s only so much information you can cram into a prompt. Long documents, complex histories, or large datasets are out. This is why RAG systems are essential — they manage and retrieve relevant context dynamically. Prompting alone doesn’t solve the knowledge bottleneck. Factual Accuracy and Hallucinations: No amount of prompting can guarantee a model won’t invent facts or confidently present misinformation. Prompting can sometimes mitigate this by, for examples, telling the model to stick only to the provided context (RAG), but it doesn’t fix the underlying issue that the model is a text predictor, not a truth engine. Model Bias and Undesired Behavior: Prompts can influence output, but they can’t easily override biases embedded in the training data or prevent the model from generating harmful or inappropriate content in unexpected ways. Guardrails need to be implemented *outside* the prompt layer. Complexity Ceiling: For truly complex, multi-step processes requiring external tool use, decision making, and dynamic state, pure prompting breaks down. This is the domain of AI agents, which use LLMs as the controller but rely on external memory, planning modules, and tool interaction to achieve goals. Prompting is just one part of the agent’s loop. Maintainability: Try managing dozens or hundreds of complex, multi-line prompts across different features in a large application. Versioning them? Testing changes? This quickly becomes an engineering nightmare. Prompts are code, but often undocumented, untestable code living in strings. Prompt Injection: As mentioned with delimiters, allowing external input (from users, databases, APIs) into a prompt opens the door to prompt injection attacks, where malicious input hijacks the model’s instructions. Robust applications need sanitization and architectural safeguards beyond just a delimiter trick. What no one tells you in the prompt “secrets” articles is that the difficulty scales non-linearly with the reliability and complexity required. Getting a cool demo output with a clever prompt is one thing. Building a feature that consistently works for thousands of users on diverse inputs while being secure and maintainable? That’s a whole different ballgame. The Real “Secret”? It’s Just Good Engineering. If there’s any “secret” to building effective applications with LLMs, it’s not a prompt string. It’s integrating the model into a well-architected system. This involves: Data Pipelines: Getting the right data to the model (for RAG, fine-tuning, etc.). Orchestration Frameworks: Using tools like LangChain, LlamaIndex, or building custom workflows to sequence model calls, tool use, and data retrieval. Evaluation: Developing robust methods to quantitatively measure the quality of LLM output beyond just eyeballing it. This is hard. Guardrails: Implementing safety checks, moderation, and input validation *outside* the LLM call itself. Fallback Mechanisms: What happens when the model gives a bad answer or fails? Your application needs graceful degradation. Version Control and Testing: Treating prompts and the surrounding logic with the same rigor as any other production code. Prompt engineering is a critical *skill*, part of the overall toolkit. It’s like knowing how to write effective SQL queries. Essential for database interaction, but it doesn’t mean you can build a scalable web application with just SQL. You need application code, infrastructure, frontend, etc. Wrapping Up So, Google’s whitepaper and similar resources offer valuable best practices for interacting with LLMs. They formalize common-sense approaches to communication and leverage observed model behaviors like few-shot learning and step-by-step processing. If you’re just starting out, or using LLMs for simple tasks, mastering these techniques will absolutely improve your results. But if you’re a developer, an AI practitioner, or a technical founder looking to build robust, reliable applications powered by LLMs, understand this: prompt engineering is table stakes. It’s necessary, but far from sufficient. The real challenge, the actual “secrets” if you want to call them that, lie in the surrounding engineering — the data management, the orchestration, the evaluation, the guardrails, and the sheer hard work of building a system that accounts for the LLM’s inherent unpredictability and limitations. Don’t get fixated on finding the perfect prompt string. Focus on building a resilient system around it. That’s where the real progress happens. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post
    0 Commentarii 0 Distribuiri 0 previzualizare
  • How to Build an AI Journal with LlamaIndex

    This post will share how to build an AI journal with the LlamaIndex. We will cover one essential function of this AI journal: asking for advice. We will start with the most basic implementation and iterate from there. We can see significant improvements for this function when we apply design patterns like Agentic Rag and multi-agent workflow.You can find the source code of this AI Journal in my GitHub repo here. And about who I am.

    Overview of AI Journal

    I want to build my principles by following Ray Dalio’s practice. An AI journal will help me to self-reflect, track my improvement, and even give me advice. The overall function of such an AI journal looks like this:

    AI Journal Overview. Image by Author.

    Today, we will only cover the implementation of the seek-advise flow, which is represented by multiple purple cycles in the above diagram.

    Simplest Form: LLM with Large Context

    In the most straightforward implementation, we can pass all the relevant content into the context and attach the question we want to ask. We can do that in Llamaindex with a few lines of code.

    import pymupdf
    from llama_index.llms.openai import OpenAI

    path_to_pdf_book = './path/to/pdf/book.pdf'
    def load_book_content:
    text = ""
    with pymupdf.openas pdf:
    for page in pdf:
    text += str.encode)
    return text

    system_prompt_template = """You are an AI assistant that provides thoughtful, practical, and *deeply personalized* suggestions by combining:
    - The user's personal profile and principles
    - Insights retrieved from *Principles* by Ray Dalio
    Book Content:
    ```
    {book_content}
    ```
    User profile:
    ```
    {user_profile}
    ```
    User's question:
    ```
    {user_question}
    ```
    """

    def get_system_prompt:
    system_prompt = system_prompt_template.formatreturn system_prompt

    def chat:
    llm = get_openai_llmuser_profile = inputuser_question = inputuser_profile = user_profile.stripbook_content = load_book_summaryresponse = llm.complete)
    return response

    This approach has downsides:

    Low Precision: Loading all the book context might prompt LLM to lose focus on the user’s question.

    High Cost: Sending over significant-sized content in every LLM call means high cost and poor performance.

    With this approach, if you pass the whole content of Ray Dalio’s Principles book, responses to questions like “How to handle stress?” become very general. Such responses without relating to my question made me feel that the AI was not listening to me. Even though it covers many important concepts like embracing reality, the 5-step process to get what you want, and being radically open-minded. I like the advice I got to be more targeted to the question I raised. Let’s see how we can improve it with RAG.

    Enhanced Form: Agentic RAG

    So, what is Agentic RAG? Agentic RAG is combining dynamic decision-making and data retrieval. In our AI journal, the Agentic RAG flow looks like this:

    Stages of Agentic Rag. Image by Author

    Question Evaluation: Poorly framed questions lead to poor query results. The agent will evaluate the user’s query and clarify the questions if the Agent believes it is necessary.

    Question Re-write: Rewrite the user enquiry to project it to the indexed content in the semantic space. I found these steps essential for improving the precision during the retrieval. Let’s say if your knowledge base is Q/A pair and you are indexing the questions part to search for answers. Rewriting the user’s query statement to a proper question will help you find the most relevant content.

    Query Vector Index: Many parameters can be tuned when building such an index, including chunk size, overlap, or a different index type. For simplicity, we are using VectorStoreIndex here, which has a default chunking strategy.

    Filter & Synthetic: Instead of a complex re-ranking process, I explicitly instruct LLM to filter and find relevant content in the prompt. I see LLM picking up the most relevant content, even though sometimes it has a lower similarity score than others.

    With this Agentic RAG, you can retrieve highly relevant content to the user’s questions, generating more targeted advice.

    Let’s examine the implementation. With the LlamaIndex SDK, creating and persisting an index in your local directory is straightforward.

    from llama_index.core import Document, VectorStoreIndex, StorageContext, load_index_from_storage

    Settings.embed_model = OpenAIEmbeddingPERSISTED_INDEX_PATH = "/path/to/the/directory/persist/index/locally"

    def create_index:
    documents =vector_index = VectorStoreIndex.from_documentsvector_index.storage_context.persistdef load_index:
    storage_context = StorageContext.from_defaultsindex = load_index_from_storagereturn index

    Once we have an index, we can create a query engine on top of that. The query engine is a powerful abstraction that allows you to adjust the parameters during the queryand the synthesis behaviour after the content retrieval. In my implementation, I overwrite the response_mode NO_TEXT because the agent will process the book content returned by the function call and synthesize the final result. Having the query engine to synthesize the result before passing it to the agent would be redundant.

    from llama_index.core.indices.vector_store import VectorIndexRetriever
    from llama_index.core.query_engine import RetrieverQueryEngine
    from llama_index.core.response_synthesizers import ResponseMode
    from llama_index.core import VectorStoreIndex, get_response_synthesizer

    def _create_query_engine_from_index:
    # configure retriever
    retriever = VectorIndexRetriever# return the original content without using LLM to synthesizer. For later evaluation.
    response_synthesizer = get_response_synthesizer# assemble query engine
    query_engine = RetrieverQueryEnginereturn query_engine

    The prompt looks like the following:

    You are an assistant that helps reframe user questions into clear, concept-driven statements that match
    the style and topics of Principles by Ray Dalio, and perform look up principle book for relevant content.

    Background:
    Principles teaches structured thinking about life and work decisions.
    The key ideas are:
    * Radical truth and radical transparency
    * Decision-making frameworks
    * Embracing mistakes as learning

    Task:
    - Task 1: Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent.
    - Task 2: Rewrite a user’s question into a statement that would match how Ray Dalio frames ideas in Principles. Use formal, logical, neutral tone.
    - Task 3: Look up principle book with given re-wrote statements. You should provide at least {REWRITE_FACTOR} rewrote versions.
    - Task 4: Find the most relevant from the book content as your fina answers.

    Finally, we can build the agent with those functions defined.

    def get_principle_rag_agent:
    index = load_persisted_indexquery_engine = _create_query_engine_from_indexdef look_up_principle_book-> List:
    result =for q in rewrote_statement:
    response = query_engine.querycontent =result.extendreturn result

    def clarify_question-> str:
    """
    Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent.
    """
    response = ""
    for q in your_questions_to_user:
    printr = inputresponse += f"Question: {q}\nResponse: {r}\n"
    return response

    tools =agent = FunctionAgentreturn agent

    rag_agent = get_principle_rag_agentresponse = await agent.runThere are a few observations I had during the implementations:

    One interesting fact I found is that providing a non-used parameter, original_question , in the function signature helps. I found that when I do not have such a parameter, LLM sometimes does not follow the rewrite instruction and passes the original question in rewrote_statement the parameter. Having original_question parameters somehow emphasizes the rewriting mission to LLM.

    Different LLMs behave quite differently given the same prompt. I found DeepSeek V3 much more reluctant to trigger function calls than other model providers. This doesn’t necessarily mean it is not usable. If a functional call should be initiated 90% of the time, it should be part of the workflow instead of being registered as a function call. Also, compared to OpenAI’s models, I found Gemini good at citing the source of the book when it synthesizes the results.

    The more content you load into the context window, the more inference capability the model needs. A smaller model with less inference power is more likely to get lost in the large context provided.

    However, to complete the seek-advice function, you’ll need multiple Agents working together instead of a single Agent. Let’s talk about how to chain your Agents together into workflows.

    Final Form: Agent Workflow

    Before we start, I recommend this article by Anthropic, Building Effective Agents. The one-liner summary of the articles is that you should always prioritise building a workflow instead of a dynamic agent when possible. In LlamaIndex, you can do both. It allows you to create an agent workflow with more automatic routing or a customised workflow with more explicit control of the transition of steps. I will provide an example of both implementations.

    Workflow Explain. Image by Author.

    Let’s take a look at how you can build a dynamic workflow. Here is a code example.

    interviewer = FunctionAgentinterviewer = FunctionAgentadvisor = FunctionAgentworkflow = AgentWorkflowhandler = await workflow.runIt is dynamic because the Agent transition is based on the function call of the LLM model. Underlying, LlamaIndex workflow provides agent descriptions as functions for LLM models. When the LLM model triggers such “Agent Function Call”, LlamaIndex will route to your next corresponding agent for the subsequent step processing. Your previous agent’s output has been added to the workflow internal state, and your following agent will pick up the state as part of the context in their call to the LLM model. You also leverage state and memory components to manage the workflow’s internal state or load external data.

    However, as I have suggested, you can explicitly control the steps in your workflow to gain more control. With LlamaIndex, it can be done by extending the workflow object. For example:

    class ReferenceRetrivalEvent:
    question: str

    class Advice:
    principles: Listprofile: dict
    question: str
    book_content: str

    class AdviceWorkFlow:
    def __init__:
    state = get_workflow_stateself.principles = state.load_principle_from_casesself.profile = state.load_profileself.verbose = verbose
    super.__init__@step
    async def interview-> ReferenceRetrivalEvent:
    # Step 1: Interviewer agent asks questions to the user
    interviewer = get_interviewer_agentquestion = await _run_agentreturn ReferenceRetrivalEvent@step
    async def retrieve-> Advice:
    # Step 2: RAG agent retrieves relevant content from the book
    rag_agent = get_principle_rag_agentbook_content = await _run_agentreturn Advice@step
    async def advice-> StopEvent:
    # Step 3: Adviser agent provides advice based on the user's profile, principles, and book content
    advisor = get_adviser_agentadvise = await _run_agentreturn StopEventThe specific event type’s return controls the workflow’s step transition. For instance, retrieve step returns an Advice event that will trigger the execution of the advice step. You can also leverage the Advice event to pass the necessary information you need.

    During the implementation, if you are annoyed by having to start over the workflow to debug some steps in the middle, the context object is essential when you want to failover the workflow execution. You can store your state in a serialised format and recover your workflow by unserialising it to a context object. Your workflow will continue executing based on the state instead of starting over.

    workflow = AgentWorkflowtry:
    handler = w.runresult = await handler
    except Exception as e:
    printawait fail_over# Optional, serialised and save the contexct for debugging
    ctx_dict = ctx.to_dict)
    json_dump_and_save# Resume from the same context
    ctx_dict = load_failed_dictrestored_ctx = Context.from_dict)
    handler = w.runresult = await handler

    Summary

    In this post, we have discussed how to use LlamaIndex to implement an AI journal’s core function. The key learning includes:

    Using Agentic RAG to leverage LLM capability to dynamically rewrite the original query and synthesis result.

    Use a Customized Workflow to gain more explicit control over step transitions. Build dynamic agents when necessary.

    The source code of this AI journal is in my GitHub repo here. I hope you enjoy this article and this small app I built. Cheers!
    The post How to Build an AI Journal with LlamaIndex appeared first on Towards Data Science.
    #how #build #journal #with #llamaindex
    How to Build an AI Journal with LlamaIndex
    This post will share how to build an AI journal with the LlamaIndex. We will cover one essential function of this AI journal: asking for advice. We will start with the most basic implementation and iterate from there. We can see significant improvements for this function when we apply design patterns like Agentic Rag and multi-agent workflow.You can find the source code of this AI Journal in my GitHub repo here. And about who I am. Overview of AI Journal I want to build my principles by following Ray Dalio’s practice. An AI journal will help me to self-reflect, track my improvement, and even give me advice. The overall function of such an AI journal looks like this: AI Journal Overview. Image by Author. Today, we will only cover the implementation of the seek-advise flow, which is represented by multiple purple cycles in the above diagram. Simplest Form: LLM with Large Context In the most straightforward implementation, we can pass all the relevant content into the context and attach the question we want to ask. We can do that in Llamaindex with a few lines of code. import pymupdf from llama_index.llms.openai import OpenAI path_to_pdf_book = './path/to/pdf/book.pdf' def load_book_content: text = "" with pymupdf.openas pdf: for page in pdf: text += str.encode) return text system_prompt_template = """You are an AI assistant that provides thoughtful, practical, and *deeply personalized* suggestions by combining: - The user's personal profile and principles - Insights retrieved from *Principles* by Ray Dalio Book Content: ``` {book_content} ``` User profile: ``` {user_profile} ``` User's question: ``` {user_question} ``` """ def get_system_prompt: system_prompt = system_prompt_template.formatreturn system_prompt def chat: llm = get_openai_llmuser_profile = inputuser_question = inputuser_profile = user_profile.stripbook_content = load_book_summaryresponse = llm.complete) return response This approach has downsides: Low Precision: Loading all the book context might prompt LLM to lose focus on the user’s question. High Cost: Sending over significant-sized content in every LLM call means high cost and poor performance. With this approach, if you pass the whole content of Ray Dalio’s Principles book, responses to questions like “How to handle stress?” become very general. Such responses without relating to my question made me feel that the AI was not listening to me. Even though it covers many important concepts like embracing reality, the 5-step process to get what you want, and being radically open-minded. I like the advice I got to be more targeted to the question I raised. Let’s see how we can improve it with RAG. Enhanced Form: Agentic RAG So, what is Agentic RAG? Agentic RAG is combining dynamic decision-making and data retrieval. In our AI journal, the Agentic RAG flow looks like this: Stages of Agentic Rag. Image by Author Question Evaluation: Poorly framed questions lead to poor query results. The agent will evaluate the user’s query and clarify the questions if the Agent believes it is necessary. Question Re-write: Rewrite the user enquiry to project it to the indexed content in the semantic space. I found these steps essential for improving the precision during the retrieval. Let’s say if your knowledge base is Q/A pair and you are indexing the questions part to search for answers. Rewriting the user’s query statement to a proper question will help you find the most relevant content. Query Vector Index: Many parameters can be tuned when building such an index, including chunk size, overlap, or a different index type. For simplicity, we are using VectorStoreIndex here, which has a default chunking strategy. Filter & Synthetic: Instead of a complex re-ranking process, I explicitly instruct LLM to filter and find relevant content in the prompt. I see LLM picking up the most relevant content, even though sometimes it has a lower similarity score than others. With this Agentic RAG, you can retrieve highly relevant content to the user’s questions, generating more targeted advice. Let’s examine the implementation. With the LlamaIndex SDK, creating and persisting an index in your local directory is straightforward. from llama_index.core import Document, VectorStoreIndex, StorageContext, load_index_from_storage Settings.embed_model = OpenAIEmbeddingPERSISTED_INDEX_PATH = "/path/to/the/directory/persist/index/locally" def create_index: documents =vector_index = VectorStoreIndex.from_documentsvector_index.storage_context.persistdef load_index: storage_context = StorageContext.from_defaultsindex = load_index_from_storagereturn index Once we have an index, we can create a query engine on top of that. The query engine is a powerful abstraction that allows you to adjust the parameters during the queryand the synthesis behaviour after the content retrieval. In my implementation, I overwrite the response_mode NO_TEXT because the agent will process the book content returned by the function call and synthesize the final result. Having the query engine to synthesize the result before passing it to the agent would be redundant. from llama_index.core.indices.vector_store import VectorIndexRetriever from llama_index.core.query_engine import RetrieverQueryEngine from llama_index.core.response_synthesizers import ResponseMode from llama_index.core import VectorStoreIndex, get_response_synthesizer def _create_query_engine_from_index: # configure retriever retriever = VectorIndexRetriever# return the original content without using LLM to synthesizer. For later evaluation. response_synthesizer = get_response_synthesizer# assemble query engine query_engine = RetrieverQueryEnginereturn query_engine The prompt looks like the following: You are an assistant that helps reframe user questions into clear, concept-driven statements that match the style and topics of Principles by Ray Dalio, and perform look up principle book for relevant content. Background: Principles teaches structured thinking about life and work decisions. The key ideas are: * Radical truth and radical transparency * Decision-making frameworks * Embracing mistakes as learning Task: - Task 1: Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent. - Task 2: Rewrite a user’s question into a statement that would match how Ray Dalio frames ideas in Principles. Use formal, logical, neutral tone. - Task 3: Look up principle book with given re-wrote statements. You should provide at least {REWRITE_FACTOR} rewrote versions. - Task 4: Find the most relevant from the book content as your fina answers. Finally, we can build the agent with those functions defined. def get_principle_rag_agent: index = load_persisted_indexquery_engine = _create_query_engine_from_indexdef look_up_principle_book-> List: result =for q in rewrote_statement: response = query_engine.querycontent =result.extendreturn result def clarify_question-> str: """ Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent. """ response = "" for q in your_questions_to_user: printr = inputresponse += f"Question: {q}\nResponse: {r}\n" return response tools =agent = FunctionAgentreturn agent rag_agent = get_principle_rag_agentresponse = await agent.runThere are a few observations I had during the implementations: One interesting fact I found is that providing a non-used parameter, original_question , in the function signature helps. I found that when I do not have such a parameter, LLM sometimes does not follow the rewrite instruction and passes the original question in rewrote_statement the parameter. Having original_question parameters somehow emphasizes the rewriting mission to LLM. Different LLMs behave quite differently given the same prompt. I found DeepSeek V3 much more reluctant to trigger function calls than other model providers. This doesn’t necessarily mean it is not usable. If a functional call should be initiated 90% of the time, it should be part of the workflow instead of being registered as a function call. Also, compared to OpenAI’s models, I found Gemini good at citing the source of the book when it synthesizes the results. The more content you load into the context window, the more inference capability the model needs. A smaller model with less inference power is more likely to get lost in the large context provided. However, to complete the seek-advice function, you’ll need multiple Agents working together instead of a single Agent. Let’s talk about how to chain your Agents together into workflows. Final Form: Agent Workflow Before we start, I recommend this article by Anthropic, Building Effective Agents. The one-liner summary of the articles is that you should always prioritise building a workflow instead of a dynamic agent when possible. In LlamaIndex, you can do both. It allows you to create an agent workflow with more automatic routing or a customised workflow with more explicit control of the transition of steps. I will provide an example of both implementations. Workflow Explain. Image by Author. Let’s take a look at how you can build a dynamic workflow. Here is a code example. interviewer = FunctionAgentinterviewer = FunctionAgentadvisor = FunctionAgentworkflow = AgentWorkflowhandler = await workflow.runIt is dynamic because the Agent transition is based on the function call of the LLM model. Underlying, LlamaIndex workflow provides agent descriptions as functions for LLM models. When the LLM model triggers such “Agent Function Call”, LlamaIndex will route to your next corresponding agent for the subsequent step processing. Your previous agent’s output has been added to the workflow internal state, and your following agent will pick up the state as part of the context in their call to the LLM model. You also leverage state and memory components to manage the workflow’s internal state or load external data. However, as I have suggested, you can explicitly control the steps in your workflow to gain more control. With LlamaIndex, it can be done by extending the workflow object. For example: class ReferenceRetrivalEvent: question: str class Advice: principles: Listprofile: dict question: str book_content: str class AdviceWorkFlow: def __init__: state = get_workflow_stateself.principles = state.load_principle_from_casesself.profile = state.load_profileself.verbose = verbose super.__init__@step async def interview-> ReferenceRetrivalEvent: # Step 1: Interviewer agent asks questions to the user interviewer = get_interviewer_agentquestion = await _run_agentreturn ReferenceRetrivalEvent@step async def retrieve-> Advice: # Step 2: RAG agent retrieves relevant content from the book rag_agent = get_principle_rag_agentbook_content = await _run_agentreturn Advice@step async def advice-> StopEvent: # Step 3: Adviser agent provides advice based on the user's profile, principles, and book content advisor = get_adviser_agentadvise = await _run_agentreturn StopEventThe specific event type’s return controls the workflow’s step transition. For instance, retrieve step returns an Advice event that will trigger the execution of the advice step. You can also leverage the Advice event to pass the necessary information you need. During the implementation, if you are annoyed by having to start over the workflow to debug some steps in the middle, the context object is essential when you want to failover the workflow execution. You can store your state in a serialised format and recover your workflow by unserialising it to a context object. Your workflow will continue executing based on the state instead of starting over. workflow = AgentWorkflowtry: handler = w.runresult = await handler except Exception as e: printawait fail_over# Optional, serialised and save the contexct for debugging ctx_dict = ctx.to_dict) json_dump_and_save# Resume from the same context ctx_dict = load_failed_dictrestored_ctx = Context.from_dict) handler = w.runresult = await handler Summary In this post, we have discussed how to use LlamaIndex to implement an AI journal’s core function. The key learning includes: Using Agentic RAG to leverage LLM capability to dynamically rewrite the original query and synthesis result. Use a Customized Workflow to gain more explicit control over step transitions. Build dynamic agents when necessary. The source code of this AI journal is in my GitHub repo here. I hope you enjoy this article and this small app I built. Cheers! The post How to Build an AI Journal with LlamaIndex appeared first on Towards Data Science. #how #build #journal #with #llamaindex
    TOWARDSDATASCIENCE.COM
    How to Build an AI Journal with LlamaIndex
    This post will share how to build an AI journal with the LlamaIndex. We will cover one essential function of this AI journal: asking for advice. We will start with the most basic implementation and iterate from there. We can see significant improvements for this function when we apply design patterns like Agentic Rag and multi-agent workflow.You can find the source code of this AI Journal in my GitHub repo here. And about who I am. Overview of AI Journal I want to build my principles by following Ray Dalio’s practice. An AI journal will help me to self-reflect, track my improvement, and even give me advice. The overall function of such an AI journal looks like this: AI Journal Overview. Image by Author. Today, we will only cover the implementation of the seek-advise flow, which is represented by multiple purple cycles in the above diagram. Simplest Form: LLM with Large Context In the most straightforward implementation, we can pass all the relevant content into the context and attach the question we want to ask. We can do that in Llamaindex with a few lines of code. import pymupdf from llama_index.llms.openai import OpenAI path_to_pdf_book = './path/to/pdf/book.pdf' def load_book_content(): text = "" with pymupdf.open(path_to_pdf_book) as pdf: for page in pdf: text += str(page.get_text().encode("utf8", errors='ignore')) return text system_prompt_template = """You are an AI assistant that provides thoughtful, practical, and *deeply personalized* suggestions by combining: - The user's personal profile and principles - Insights retrieved from *Principles* by Ray Dalio Book Content: ``` {book_content} ``` User profile: ``` {user_profile} ``` User's question: ``` {user_question} ``` """ def get_system_prompt(book_content: str, user_profile: str, user_question: str): system_prompt = system_prompt_template.format( book_content=book_content, user_profile=user_profile, user_question=user_question ) return system_prompt def chat(): llm = get_openai_llm() user_profile = input(">>Tell me about yourself: ") user_question = input(">>What do you want to ask: ") user_profile = user_profile.strip() book_content = load_book_summary() response = llm.complete(prompt=get_system_prompt(book_content, user_profile, user_question)) return response This approach has downsides: Low Precision: Loading all the book context might prompt LLM to lose focus on the user’s question. High Cost: Sending over significant-sized content in every LLM call means high cost and poor performance. With this approach, if you pass the whole content of Ray Dalio’s Principles book, responses to questions like “How to handle stress?” become very general. Such responses without relating to my question made me feel that the AI was not listening to me. Even though it covers many important concepts like embracing reality, the 5-step process to get what you want, and being radically open-minded. I like the advice I got to be more targeted to the question I raised. Let’s see how we can improve it with RAG. Enhanced Form: Agentic RAG So, what is Agentic RAG? Agentic RAG is combining dynamic decision-making and data retrieval. In our AI journal, the Agentic RAG flow looks like this: Stages of Agentic Rag. Image by Author Question Evaluation: Poorly framed questions lead to poor query results. The agent will evaluate the user’s query and clarify the questions if the Agent believes it is necessary. Question Re-write: Rewrite the user enquiry to project it to the indexed content in the semantic space. I found these steps essential for improving the precision during the retrieval. Let’s say if your knowledge base is Q/A pair and you are indexing the questions part to search for answers. Rewriting the user’s query statement to a proper question will help you find the most relevant content. Query Vector Index: Many parameters can be tuned when building such an index, including chunk size, overlap, or a different index type. For simplicity, we are using VectorStoreIndex here, which has a default chunking strategy. Filter & Synthetic: Instead of a complex re-ranking process, I explicitly instruct LLM to filter and find relevant content in the prompt. I see LLM picking up the most relevant content, even though sometimes it has a lower similarity score than others. With this Agentic RAG, you can retrieve highly relevant content to the user’s questions, generating more targeted advice. Let’s examine the implementation. With the LlamaIndex SDK, creating and persisting an index in your local directory is straightforward. from llama_index.core import Document, VectorStoreIndex, StorageContext, load_index_from_storage Settings.embed_model = OpenAIEmbedding(api_key="ak-xxxx") PERSISTED_INDEX_PATH = "/path/to/the/directory/persist/index/locally" def create_index(content: str): documents = [Document(text=content)] vector_index = VectorStoreIndex.from_documents(documents) vector_index.storage_context.persist(persist_dir=PERSISTED_INDEX_PATH) def load_index(): storage_context = StorageContext.from_defaults(persist_dir=PERSISTED_INDEX_PATH) index = load_index_from_storage(storage_context) return index Once we have an index, we can create a query engine on top of that. The query engine is a powerful abstraction that allows you to adjust the parameters during the query(e.g., TOP K) and the synthesis behaviour after the content retrieval. In my implementation, I overwrite the response_mode NO_TEXT because the agent will process the book content returned by the function call and synthesize the final result. Having the query engine to synthesize the result before passing it to the agent would be redundant. from llama_index.core.indices.vector_store import VectorIndexRetriever from llama_index.core.query_engine import RetrieverQueryEngine from llama_index.core.response_synthesizers import ResponseMode from llama_index.core import VectorStoreIndex, get_response_synthesizer def _create_query_engine_from_index(index: VectorStoreIndex): # configure retriever retriever = VectorIndexRetriever( index=index, similarity_top_k=TOP_K, ) # return the original content without using LLM to synthesizer. For later evaluation. response_synthesizer = get_response_synthesizer(response_mode=ResponseMode.NO_TEXT) # assemble query engine query_engine = RetrieverQueryEngine( retriever=retriever, response_synthesizer=response_synthesizer ) return query_engine The prompt looks like the following: You are an assistant that helps reframe user questions into clear, concept-driven statements that match the style and topics of Principles by Ray Dalio, and perform look up principle book for relevant content. Background: Principles teaches structured thinking about life and work decisions. The key ideas are: * Radical truth and radical transparency * Decision-making frameworks * Embracing mistakes as learning Task: - Task 1: Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent. - Task 2: Rewrite a user’s question into a statement that would match how Ray Dalio frames ideas in Principles. Use formal, logical, neutral tone. - Task 3: Look up principle book with given re-wrote statements. You should provide at least {REWRITE_FACTOR} rewrote versions. - Task 4: Find the most relevant from the book content as your fina answers. Finally, we can build the agent with those functions defined. def get_principle_rag_agent(): index = load_persisted_index() query_engine = _create_query_engine_from_index(index) def look_up_principle_book(original_question: str, rewrote_statement: List[str]) -> List[str]: result = [] for q in rewrote_statement: response = query_engine.query(q) content = [n.get_content() for n in response.source_nodes] result.extend(content) return result def clarify_question(original_question: str, your_questions_to_user: List[str]) -> str: """ Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent. """ response = "" for q in your_questions_to_user: print(f"Question: {q}") r = input("Response:") response += f"Question: {q}\nResponse: {r}\n" return response tools = [ FunctionTool.from_defaults( fn=look_up_principle_book, name="look_up_principle_book", description="Look up principle book with re-wrote queries. Getting the suggestions from the Principle book by Ray Dalio"), FunctionTool.from_defaults( fn=clarify_question, name="clarify_question", description="Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent.", ) ] agent = FunctionAgent( name="principle_reference_loader", description="You are a helpful agent will based on user's question and look up the most relevant content in principle book.\n", system_prompt=QUESTION_REWRITE_PROMPT, tools=tools, ) return agent rag_agent = get_principle_rag_agent() response = await agent.run(chat_history=chat_history) There are a few observations I had during the implementations: One interesting fact I found is that providing a non-used parameter, original_question , in the function signature helps. I found that when I do not have such a parameter, LLM sometimes does not follow the rewrite instruction and passes the original question in rewrote_statement the parameter. Having original_question parameters somehow emphasizes the rewriting mission to LLM. Different LLMs behave quite differently given the same prompt. I found DeepSeek V3 much more reluctant to trigger function calls than other model providers. This doesn’t necessarily mean it is not usable. If a functional call should be initiated 90% of the time, it should be part of the workflow instead of being registered as a function call. Also, compared to OpenAI’s models, I found Gemini good at citing the source of the book when it synthesizes the results. The more content you load into the context window, the more inference capability the model needs. A smaller model with less inference power is more likely to get lost in the large context provided. However, to complete the seek-advice function, you’ll need multiple Agents working together instead of a single Agent. Let’s talk about how to chain your Agents together into workflows. Final Form: Agent Workflow Before we start, I recommend this article by Anthropic, Building Effective Agents. The one-liner summary of the articles is that you should always prioritise building a workflow instead of a dynamic agent when possible. In LlamaIndex, you can do both. It allows you to create an agent workflow with more automatic routing or a customised workflow with more explicit control of the transition of steps. I will provide an example of both implementations. Workflow Explain. Image by Author. Let’s take a look at how you can build a dynamic workflow. Here is a code example. interviewer = FunctionAgent( name="interviewer", description="Useful agent to clarify user's questions", system_prompt=_intervierw_prompt, can_handoff_to = ["retriver"] tools=tools ) interviewer = FunctionAgent( name="retriever", description="Useful agent to retrive principle book's content.", system_prompt=_retriver_prompt, can_handoff_to = ["advisor"] tools=tools ) advisor = FunctionAgent( name="advisor", description="Useful agent to advise user.", system_prompt=_advisor_prompt, can_handoff_to = [] tools=tools ) workflow = AgentWorkflow( agents=[interviewer, advisor, retriever], root_agent="interviewer", ) handler = await workflow.run(user_msg="How to handle stress?") It is dynamic because the Agent transition is based on the function call of the LLM model. Underlying, LlamaIndex workflow provides agent descriptions as functions for LLM models. When the LLM model triggers such “Agent Function Call”, LlamaIndex will route to your next corresponding agent for the subsequent step processing. Your previous agent’s output has been added to the workflow internal state, and your following agent will pick up the state as part of the context in their call to the LLM model. You also leverage state and memory components to manage the workflow’s internal state or load external data(reference the document here). However, as I have suggested, you can explicitly control the steps in your workflow to gain more control. With LlamaIndex, it can be done by extending the workflow object. For example: class ReferenceRetrivalEvent(Event): question: str class Advice(Event): principles: List[str] profile: dict question: str book_content: str class AdviceWorkFlow(Workflow): def __init__(self, verbose: bool = False, session_id: str = None): state = get_workflow_state(session_id) self.principles = state.load_principle_from_cases() self.profile = state.load_profile() self.verbose = verbose super().__init__(timeout=None, verbose=verbose) @step async def interview(self, ctx: Context, ev: StartEvent) -> ReferenceRetrivalEvent: # Step 1: Interviewer agent asks questions to the user interviewer = get_interviewer_agent() question = await _run_agent(interviewer, question=ev.user_msg, verbose=self.verbose) return ReferenceRetrivalEvent(question=question) @step async def retrieve(self, ctx: Context, ev: ReferenceRetrivalEvent) -> Advice: # Step 2: RAG agent retrieves relevant content from the book rag_agent = get_principle_rag_agent() book_content = await _run_agent(rag_agent, question=ev.question, verbose=self.verbose) return Advice(principles=self.principles, profile=self.profile, question=ev.question, book_content=book_content) @step async def advice(self, ctx: Context, ev: Advice) -> StopEvent: # Step 3: Adviser agent provides advice based on the user's profile, principles, and book content advisor = get_adviser_agent(ev.profile, ev.principles, ev.book_content) advise = await _run_agent(advisor, question=ev.question, verbose=self.verbose) return StopEvent(result=advise) The specific event type’s return controls the workflow’s step transition. For instance, retrieve step returns an Advice event that will trigger the execution of the advice step. You can also leverage the Advice event to pass the necessary information you need. During the implementation, if you are annoyed by having to start over the workflow to debug some steps in the middle, the context object is essential when you want to failover the workflow execution. You can store your state in a serialised format and recover your workflow by unserialising it to a context object. Your workflow will continue executing based on the state instead of starting over. workflow = AgentWorkflow( agents=[interviewer, advisor, retriever], root_agent="interviewer", ) try: handler = w.run() result = await handler except Exception as e: print(f"Error during initial run: {e}") await fail_over() # Optional, serialised and save the contexct for debugging ctx_dict = ctx.to_dict(serializer=JsonSerializer()) json_dump_and_save(ctx_dict) # Resume from the same context ctx_dict = load_failed_dict() restored_ctx = Context.from_dict(workflow, ctx_dict,serializer=JsonSerializer()) handler = w.run(ctx=handler.ctx) result = await handler Summary In this post, we have discussed how to use LlamaIndex to implement an AI journal’s core function. The key learning includes: Using Agentic RAG to leverage LLM capability to dynamically rewrite the original query and synthesis result. Use a Customized Workflow to gain more explicit control over step transitions. Build dynamic agents when necessary. The source code of this AI journal is in my GitHub repo here. I hope you enjoy this article and this small app I built. Cheers! The post How to Build an AI Journal with LlamaIndex appeared first on Towards Data Science.
    0 Commentarii 0 Distribuiri 0 previzualizare
CGShares https://cgshares.com