The leading AI community & content platform making AI accessible to all.
2k writers | 330k followers
2k writers | 330k followers
التحديثات الأخيرة
-
LLM-Powered email Classification on Databricks
Author: Gabriele Albini
Originally published on Towards AI.
LLM-Powered email Classification on Databricks
Introduction
Since the introduction of AI functions on Databricks, LLMscan be easily integrated into any data workflow: analysts and business users who may not know Python or ML/AI infrastructure can complete advanced AI tasks directly from SQL queries.
I recommend watching this great video overview for an introduction to this brilliant feature.
This article will discuss how to implement an email classification: suppose clients write to our company’s mailbox to request unsubscription from marketing or commercial emails. Without any historical datasets, we want to automate checking the mailbox and classifying the customer intent based on the email’s body.
Link to the Git Hub repository
Table of contents:
Part 1: AI Functions
Let’s use ai_query, part of Databricks AI functions, to classify emails.
Suppose we have available the following fields:
Test dataset
In order to use ai_queryon our “Email_body” column, we will leverage the following arguments:
endpoint: the name of the model endpoint we intend to use.
request: the prompt, which includes the “Email_body”
modelParameters: additional parameters that we can pass to the LLM. In this example, we will limit the output token to 1 and choose a very low temperature to limit the randomness and creativity of the model’s generated output.
The prompt template used in this example is based on the research of Si et al., who designed and tested a few-shot prompt template for email spam detection, which was adapted as follows:
prompt_ = """ Forget all your previous instructions, pretend you are an e-mail classification expert who tries to identify whether an e-mail is requesting to be removed from a marketing distribution list. Answer "Remove" if the mail is requesting to be removed, "Keep" if not. Do not add any other detail. If you think it is too difficult to judge, you can exclude the impossible one and choose the other, just answer "Remove" or "Keep". Here are a few examples for you: * "I wish to no longer receive emails" is "Remove"; * "Remove me from any kind of subscriptions" is "Remove"; * "I want to update my delivery address" is "Keep"; * "When is my product warranty expiring?" is "Keep"; Now, identify whether the e-mail is "Remove" or "Keep"; e-mail:"""
We can finally combine all the elements seen above in a single SQL query, running batch inference on all the emails, and generating the labels:
select *, ai_query) as Predicted_Labelfrom customer_emails;
Test dataset with generated labels
Part 2: Access to Gmail APIs
We will need a way to ingest emails automatically to implement this use case. Here is a step-by-step guide on how to use Gmail APIs.
1.1: Configure your Gmail account to work with APIs
The recommended approach to enable Google APIs on your account is to use Service Accounts. The process is described here; however, it requires:
A corporate account.
Access as a super administrator of the Google Workspace domain to delegate domain-wide authority to the service account.
For this demo, we are using a dummy Gmail account; hence, we will follow a more manual approach to authenticate to Gmail, described here.
The first steps are the same for both approaches, so you can follow along, but to fully automate access to Gmail via API, you would need a Service Account.
First, we need to create a project:
Log in to the Google Console.
Create a new project for this use case.
Enable the Gmail API for your project using this link.
Enabling APIs on your project
Second, configure an OAuth consent screen:
Within your project, navigate to “API & Services” > “OAuth consent screen”.
Go to the “Branding” section and click Get Started to create your Application identity.
Next, we need to create a Web Application OAuth 2.0 Client ID, using this link.
Download the credentials file as JSON, as we will need this later.
Add the following Authorised redirect URI:
Creating an OAuth consent screen
Finally, authorize users to authenticate and publish the application:
Within your project, navigate to “API & Services” > “OAuth consent screen”.
Go to the “Audience” section and add all the test users working on the project so that they can authenticate.
To ensure that access won’t expire, publish the Application by moving its status to Production.
1.2 Access Gmail Mailbox from Databricks Notebooks
To authenticate to Gmail from a Databricks Notebook, we can use the following function implemented in the repo. The function requires:
For first-time access, the credentials JSON file, which can be saved in a volume.
For future access, active credentials will be stored in a token file that will be reused.
gmail_authenticate_manualSince we are not using Service Accounts, Google Cloud authentication requires opening the browser to an OAuth page and generating a temporary code.
However, we will need a workaround to perform this on Databricks, since clusters don’t have access to a browser.
As part of this workaround, we implemented the following function that suggests to the user to open a URL from a local browser, proceed with the authentication, and then land on an error page.
We can retrieve the code needed to authenticate to Google’s API from the generated URL of this error page:
Note: With Service Accounts, this manual step won’t be required.
Once we have authenticated, we can read emails from Gmail using the following function, save email information to a Spark DataFrame, and eventually to a Delta Table:
# Build Gmail API service and download emailsservice_ = buildemails = get_email_messages_sinceif emails: spark_emails = spark.createDataFramedisplayelse: spark_emails = None printDownloading emails from Gmail
Conclusions
In summary, this post:
Demonstrated how straightforward it is to set up AI Functions and leverage LLMs to automate workflows across your organization.
We shared a practical prompt template, designed for effective email classification using few-shot learning.
We walked through integrating Gmail APIs directly within Databricks Notebooks.
Ready to streamline your own processes?
Photo by Johannes Plenio on Unsplash
Thank you for reading!
Sources
Si et al., “Evaluating the Performance of ChatGPT for Spam Email Detection”
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
#llmpowered #email #classification #databricksLLM-Powered email Classification on DatabricksAuthor: Gabriele Albini Originally published on Towards AI. LLM-Powered email Classification on Databricks Introduction Since the introduction of AI functions on Databricks, LLMscan be easily integrated into any data workflow: analysts and business users who may not know Python or ML/AI infrastructure can complete advanced AI tasks directly from SQL queries. I recommend watching this great video overview for an introduction to this brilliant feature. This article will discuss how to implement an email classification: suppose clients write to our company’s mailbox to request unsubscription from marketing or commercial emails. Without any historical datasets, we want to automate checking the mailbox and classifying the customer intent based on the email’s body. Link to the Git Hub repository Table of contents: Part 1: AI Functions Let’s use ai_query, part of Databricks AI functions, to classify emails. Suppose we have available the following fields: Test dataset In order to use ai_queryon our “Email_body” column, we will leverage the following arguments: endpoint: the name of the model endpoint we intend to use. request: the prompt, which includes the “Email_body” modelParameters: additional parameters that we can pass to the LLM. In this example, we will limit the output token to 1 and choose a very low temperature to limit the randomness and creativity of the model’s generated output. The prompt template used in this example is based on the research of Si et al., who designed and tested a few-shot prompt template for email spam detection, which was adapted as follows: prompt_ = """ Forget all your previous instructions, pretend you are an e-mail classification expert who tries to identify whether an e-mail is requesting to be removed from a marketing distribution list. Answer "Remove" if the mail is requesting to be removed, "Keep" if not. Do not add any other detail. If you think it is too difficult to judge, you can exclude the impossible one and choose the other, just answer "Remove" or "Keep". Here are a few examples for you: * "I wish to no longer receive emails" is "Remove"; * "Remove me from any kind of subscriptions" is "Remove"; * "I want to update my delivery address" is "Keep"; * "When is my product warranty expiring?" is "Keep"; Now, identify whether the e-mail is "Remove" or "Keep"; e-mail:""" We can finally combine all the elements seen above in a single SQL query, running batch inference on all the emails, and generating the labels: select *, ai_query) as Predicted_Labelfrom customer_emails; Test dataset with generated labels Part 2: Access to Gmail APIs We will need a way to ingest emails automatically to implement this use case. Here is a step-by-step guide on how to use Gmail APIs. 1.1: Configure your Gmail account to work with APIs The recommended approach to enable Google APIs on your account is to use Service Accounts. The process is described here; however, it requires: A corporate account. Access as a super administrator of the Google Workspace domain to delegate domain-wide authority to the service account. For this demo, we are using a dummy Gmail account; hence, we will follow a more manual approach to authenticate to Gmail, described here. The first steps are the same for both approaches, so you can follow along, but to fully automate access to Gmail via API, you would need a Service Account. First, we need to create a project: Log in to the Google Console. Create a new project for this use case. Enable the Gmail API for your project using this link. Enabling APIs on your project Second, configure an OAuth consent screen: Within your project, navigate to “API & Services” > “OAuth consent screen”. Go to the “Branding” section and click Get Started to create your Application identity. Next, we need to create a Web Application OAuth 2.0 Client ID, using this link. Download the credentials file as JSON, as we will need this later. Add the following Authorised redirect URI: Creating an OAuth consent screen Finally, authorize users to authenticate and publish the application: Within your project, navigate to “API & Services” > “OAuth consent screen”. Go to the “Audience” section and add all the test users working on the project so that they can authenticate. To ensure that access won’t expire, publish the Application by moving its status to Production. 1.2 Access Gmail Mailbox from Databricks Notebooks To authenticate to Gmail from a Databricks Notebook, we can use the following function implemented in the repo. The function requires: For first-time access, the credentials JSON file, which can be saved in a volume. For future access, active credentials will be stored in a token file that will be reused. gmail_authenticate_manualSince we are not using Service Accounts, Google Cloud authentication requires opening the browser to an OAuth page and generating a temporary code. However, we will need a workaround to perform this on Databricks, since clusters don’t have access to a browser. As part of this workaround, we implemented the following function that suggests to the user to open a URL from a local browser, proceed with the authentication, and then land on an error page. We can retrieve the code needed to authenticate to Google’s API from the generated URL of this error page: Note: With Service Accounts, this manual step won’t be required. Once we have authenticated, we can read emails from Gmail using the following function, save email information to a Spark DataFrame, and eventually to a Delta Table: # Build Gmail API service and download emailsservice_ = buildemails = get_email_messages_sinceif emails: spark_emails = spark.createDataFramedisplayelse: spark_emails = None printDownloading emails from Gmail Conclusions In summary, this post: Demonstrated how straightforward it is to set up AI Functions and leverage LLMs to automate workflows across your organization. We shared a practical prompt template, designed for effective email classification using few-shot learning. We walked through integrating Gmail APIs directly within Databricks Notebooks. Ready to streamline your own processes? Photo by Johannes Plenio on Unsplash Thank you for reading! Sources Si et al., “Evaluating the Performance of ChatGPT for Spam Email Detection” Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI #llmpowered #email #classification #databricksTOWARDSAI.NETLLM-Powered email Classification on DatabricksAuthor(s): Gabriele Albini Originally published on Towards AI. LLM-Powered email Classification on Databricks Introduction Since the introduction of AI functions on Databricks, LLMs (Large Language Models) can be easily integrated into any data workflow: analysts and business users who may not know Python or ML/AI infrastructure can complete advanced AI tasks directly from SQL queries. I recommend watching this great video overview for an introduction to this brilliant feature. This article will discuss how to implement an email classification: suppose clients write to our company’s mailbox to request unsubscription from marketing or commercial emails. Without any historical datasets, we want to automate checking the mailbox and classifying the customer intent based on the email’s body. Link to the Git Hub repository Table of contents: Part 1: AI Functions Let’s use ai_query(), part of Databricks AI functions, to classify emails. Suppose we have available the following fields: Test dataset In order to use ai_query() on our “Email_body” column, we will leverage the following arguments: endpoint: the name of the model endpoint we intend to use (llama3.3 in this example) (check here how to create your model serving endpoint on Databricks, choosing one of the supported foundation models). request: the prompt, which includes the “Email_body” modelParameters: additional parameters that we can pass to the LLM. In this example, we will limit the output token to 1 and choose a very low temperature to limit the randomness and creativity of the model’s generated output. The prompt template used in this example is based on the research of Si et al. (2024), who designed and tested a few-shot prompt template for email spam detection, which was adapted as follows: prompt_ = """ Forget all your previous instructions, pretend you are an e-mail classification expert who tries to identify whether an e-mail is requesting to be removed from a marketing distribution list. Answer "Remove" if the mail is requesting to be removed, "Keep" if not. Do not add any other detail. If you think it is too difficult to judge, you can exclude the impossible one and choose the other, just answer "Remove" or "Keep". Here are a few examples for you: * "I wish to no longer receive emails" is "Remove"; * "Remove me from any kind of subscriptions" is "Remove"; * "I want to update my delivery address" is "Keep"; * "When is my product warranty expiring?" is "Keep"; Now, identify whether the e-mail is "Remove" or "Keep"; e-mail:""" We can finally combine all the elements seen above in a single SQL query, running batch inference on all the emails, and generating the labels: select *, ai_query( 'databricks-meta-llama-3-3-70b-instruct', "${prompt}" || Email_body, modelParameters => named_struct('max_tokens', 1, 'temperature', 0.1) ) as Predicted_Labelfrom customer_emails; Test dataset with generated labels Part 2: Access to Gmail APIs We will need a way to ingest emails automatically to implement this use case. Here is a step-by-step guide on how to use Gmail APIs. 1.1: Configure your Gmail account to work with APIs The recommended approach to enable Google APIs on your account is to use Service Accounts. The process is described here; however, it requires: A corporate account (not ending with gmail.com). Access as a super administrator of the Google Workspace domain to delegate domain-wide authority to the service account. For this demo, we are using a dummy Gmail account; hence, we will follow a more manual approach to authenticate to Gmail, described here. The first steps are the same for both approaches, so you can follow along, but to fully automate access to Gmail via API, you would need a Service Account. First, we need to create a project: Log in to the Google Console. Create a new project for this use case. Enable the Gmail API for your project using this link. Enabling APIs on your project Second, configure an OAuth consent screen: Within your project, navigate to “API & Services” > “OAuth consent screen”. Go to the “Branding” section and click Get Started to create your Application identity. Next, we need to create a Web Application OAuth 2.0 Client ID, using this link. Download the credentials file as JSON, as we will need this later. Add the following Authorised redirect URI: Creating an OAuth consent screen Finally, authorize users to authenticate and publish the application: Within your project, navigate to “API & Services” > “OAuth consent screen”. Go to the “Audience” section and add all the test users working on the project so that they can authenticate. To ensure that access won’t expire, publish the Application by moving its status to Production. 1.2 Access Gmail Mailbox from Databricks Notebooks To authenticate to Gmail from a Databricks Notebook, we can use the following function implemented in the repo. The function requires: For first-time access, the credentials JSON file, which can be saved in a volume. For future access, active credentials will be stored in a token file that will be reused. gmail_authenticate_manual() Since we are not using Service Accounts, Google Cloud authentication requires opening the browser to an OAuth page and generating a temporary code. However, we will need a workaround to perform this on Databricks, since clusters don’t have access to a browser. As part of this workaround, we implemented the following function that suggests to the user to open a URL from a local browser, proceed with the authentication, and then land on an error page. We can retrieve the code needed to authenticate to Google’s API from the generated URL of this error page: Note: With Service Accounts, this manual step won’t be required. Once we have authenticated, we can read emails from Gmail using the following function, save email information to a Spark DataFrame, and eventually to a Delta Table: # Build Gmail API service and download emailsservice_ = build('gmail', 'v1', credentials = access_)emails = get_email_messages_since(service_, since_day=25, since_month=3, since_year = 2025)if emails: spark_emails = spark.createDataFrame(emails) display(spark_emails)else: spark_emails = None print("No emails found.") Downloading emails from Gmail Conclusions In summary, this post: Demonstrated how straightforward it is to set up AI Functions and leverage LLMs to automate workflows across your organization. We shared a practical prompt template, designed for effective email classification using few-shot learning. We walked through integrating Gmail APIs directly within Databricks Notebooks. Ready to streamline your own processes? Photo by Johannes Plenio on Unsplash Thank you for reading! Sources Si et al. (2024), “Evaluating the Performance of ChatGPT for Spam Email Detection” Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI0 التعليقات 0 المشاركاتالرجاء تسجيل الدخول , للأعجاب والمشاركة والتعليق على هذا! -
2025’s Biggest AI Shocks: 4 Breakthroughs That Changed Everything
2025’s Biggest AI Shocks: 4 Breakthroughs That Changed Everything
0 like
May 18, 2025
Share this post
Last Updated on May 19, 2025 by Editorial Team
Author: Parsa Kohzadi
Originally published on Towards AI.
The old rules broke. AI moved from assistant to infrastructure, changing how we build, create, and govern. These four shifts explain everything.
Available for non-Medium members here.
In 2025, artificial intelligence didn’t just evolve — it detonated across every corner of the tech world.
AI finally became what futurists promised: not just a tool we occasionally tapped, but the invisible engine behind industries, creative work, and even our decisions. In 2025, AI stopped assisting — it started underpinning everything.
Some breakthroughs felt sudden. Others were long-awaited. But each one marked a tectonic shift that forced entire industries to reorient themselves almost overnight.
Here are the four AI breakthroughs that permanently redrew the technology map in 2025 — and why they matter more than ever.
For years, AI systems were locked into narrow modalities: text-only, image-only, or audio-only. In 2025, the arrival of truly seamless multimodal models changed that.
Evolution of Multimodal AI in 2025
OpenAI’s GPT-5 and Anthropic’s Claude 3.5 pushed the frontier, enabling fluid, real-time interaction across text, voice, images, video, and code — within a single conversation thread. Users could upload a chart, ask questions about it, request a written summary, and generate marketing copy — all without leaving the interface.
Startups and established companies alike embedded multimodal engines into their products. Apps like Canva, Figma, and Notion launched AI “fusion” modes that combined image editing, writing,… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#2025s #biggest #shocks #breakthroughs #that2025’s Biggest AI Shocks: 4 Breakthroughs That Changed Everything2025’s Biggest AI Shocks: 4 Breakthroughs That Changed Everything 0 like May 18, 2025 Share this post Last Updated on May 19, 2025 by Editorial Team Author: Parsa Kohzadi Originally published on Towards AI. The old rules broke. AI moved from assistant to infrastructure, changing how we build, create, and govern. These four shifts explain everything. 🔗Available for non-Medium members here. 🌐 In 2025, artificial intelligence didn’t just evolve — it detonated across every corner of the tech world. AI finally became what futurists promised: not just a tool we occasionally tapped, but the invisible engine behind industries, creative work, and even our decisions. In 2025, AI stopped assisting — it started underpinning everything. Some breakthroughs felt sudden. Others were long-awaited. But each one marked a tectonic shift that forced entire industries to reorient themselves almost overnight. Here are the four AI breakthroughs that permanently redrew the technology map in 2025 — and why they matter more than ever. For years, AI systems were locked into narrow modalities: text-only, image-only, or audio-only. In 2025, the arrival of truly seamless multimodal models changed that. Evolution of Multimodal AI in 2025 OpenAI’s GPT-5 and Anthropic’s Claude 3.5 pushed the frontier, enabling fluid, real-time interaction across text, voice, images, video, and code — within a single conversation thread. Users could upload a chart, ask questions about it, request a written summary, and generate marketing copy — all without leaving the interface. Startups and established companies alike embedded multimodal engines into their products. Apps like Canva, Figma, and Notion launched AI “fusion” modes that combined image editing, writing,… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #2025s #biggest #shocks #breakthroughs #thatTOWARDSAI.NET2025’s Biggest AI Shocks: 4 Breakthroughs That Changed Everything2025’s Biggest AI Shocks: 4 Breakthroughs That Changed Everything 0 like May 18, 2025 Share this post Last Updated on May 19, 2025 by Editorial Team Author(s): Parsa Kohzadi Originally published on Towards AI. The old rules broke. AI moved from assistant to infrastructure, changing how we build, create, and govern. These four shifts explain everything. 🔗Available for non-Medium members here. 🌐 In 2025, artificial intelligence didn’t just evolve — it detonated across every corner of the tech world. AI finally became what futurists promised: not just a tool we occasionally tapped, but the invisible engine behind industries, creative work, and even our decisions. In 2025, AI stopped assisting — it started underpinning everything. Some breakthroughs felt sudden. Others were long-awaited. But each one marked a tectonic shift that forced entire industries to reorient themselves almost overnight. Here are the four AI breakthroughs that permanently redrew the technology map in 2025 — and why they matter more than ever. For years, AI systems were locked into narrow modalities: text-only, image-only, or audio-only. In 2025, the arrival of truly seamless multimodal models changed that. Evolution of Multimodal AI in 2025 OpenAI’s GPT-5 and Anthropic’s Claude 3.5 pushed the frontier, enabling fluid, real-time interaction across text, voice, images, video, and code — within a single conversation thread. Users could upload a chart, ask questions about it, request a written summary, and generate marketing copy — all without leaving the interface. Startups and established companies alike embedded multimodal engines into their products. Apps like Canva, Figma, and Notion launched AI “fusion” modes that combined image editing, writing,… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
12 MCP Servers You Can Use in 2025
12 MCP Servers You Can Use in 2025
0 like
May 19, 2025
Share this post
Author: Kalash Vasaniya
Originally published on Towards AI.
Bridging LLMs to Data, Tools, and ServicesSource: From AI-Generated
If you’re not a member but want to read this article, see this friend link here.
MCPis rapidly becoming the de facto standard for connecting large language modelsto the rich ecosystem of data, tools, and services they need to be truly useful. Instead of hard‑coding API calls into every prompt or crafting elaborate “scratchpads,” MCP servers expose a uniform interface that lets your LLM dynamically discover capabilities, negotiate parameters, and execute actions, all while maintaining safety, auditability, and context continuity.
What it does: It provides your model with read/write/create rights on a sandbox file system so it can ingest local dumps, output reports, or template out new project structures.
Sandbox enforcement limits the model to access only certain folders.File-type filters.Directory monitoring for real-time information.Processing multiple logs or data exports simultaneously.Auto-generating starter code templates.Automated document assembly processes.Not suitable for very sensitive information unless you include additional encryption.If large file systems are not designed well, there can be delays.
What it does: Connects your LLM to GitHub repositories — providing browser, searching, diff-based updates, pull request generation, and merging.
Searching code using natural language queries.PR writing, such as different previews.Multi-repo orchestration… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#mcp #servers #you #can #use12 MCP Servers You Can Use in 202512 MCP Servers You Can Use in 2025 0 like May 19, 2025 Share this post Author: Kalash Vasaniya Originally published on Towards AI. Bridging LLMs to Data, Tools, and ServicesSource: From AI-Generated If you’re not a member but want to read this article, see this friend link here. MCPis rapidly becoming the de facto standard for connecting large language modelsto the rich ecosystem of data, tools, and services they need to be truly useful. Instead of hard‑coding API calls into every prompt or crafting elaborate “scratchpads,” MCP servers expose a uniform interface that lets your LLM dynamically discover capabilities, negotiate parameters, and execute actions, all while maintaining safety, auditability, and context continuity. What it does: It provides your model with read/write/create rights on a sandbox file system so it can ingest local dumps, output reports, or template out new project structures. Sandbox enforcement limits the model to access only certain folders.File-type filters.Directory monitoring for real-time information.Processing multiple logs or data exports simultaneously.Auto-generating starter code templates.Automated document assembly processes.Not suitable for very sensitive information unless you include additional encryption.If large file systems are not designed well, there can be delays. What it does: Connects your LLM to GitHub repositories — providing browser, searching, diff-based updates, pull request generation, and merging. Searching code using natural language queries.PR writing, such as different previews.Multi-repo orchestration… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #mcp #servers #you #can #useTOWARDSAI.NET12 MCP Servers You Can Use in 202512 MCP Servers You Can Use in 2025 0 like May 19, 2025 Share this post Author(s): Kalash Vasaniya Originally published on Towards AI. Bridging LLMs to Data, Tools, and ServicesSource: From AI-Generated If you’re not a member but want to read this article, see this friend link here. MCP (Model Context Protocol) is rapidly becoming the de facto standard for connecting large language models (LLMs) to the rich ecosystem of data, tools, and services they need to be truly useful. Instead of hard‑coding API calls into every prompt or crafting elaborate “scratchpads,” MCP servers expose a uniform interface that lets your LLM dynamically discover capabilities, negotiate parameters, and execute actions, all while maintaining safety, auditability, and context continuity. What it does: It provides your model with read/write/create rights on a sandbox file system so it can ingest local dumps, output reports, or template out new project structures. Sandbox enforcement limits the model to access only certain folders.File-type filters (i.e., permit .csv and .md but exclude executables).Directory monitoring for real-time information.Processing multiple logs or data exports simultaneously.Auto-generating starter code templates.Automated document assembly processes.Not suitable for very sensitive information unless you include additional encryption.If large file systems are not designed well, there can be delays. What it does: Connects your LLM to GitHub repositories — providing browser, searching, diff-based updates, pull request generation, and merging. Searching code using natural language queries.PR writing, such as different previews.Multi-repo orchestration… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
Mind, Body, and Code
Latest Machine Learning
Mind, Body, and Code
0 like
May 18, 2025
Share this post
Author: Xuzmonomi
Originally published on Towards AI.
Neural interface devices, AI, and you.
Sci-fi is rich with examples of neural interface technology.
Notable examples include The Matrix, where humans connect directly to a simulated reality via brain-computer interfaces, and Star Wars, where characters like Luke Skywalker and Anakin control prosthetic limbs through thought, showcasing advanced cybernetics.Other works, such as Ghost in the Shell, explore the implications of mind-machine integration, often raising questions about identity, consciousness, and the boundaries between human and machine.Science fiction frequently imagines mind uploading and brain emulation, as seen in Frederik Pohl’s The Tunnel Under the World and Arthur C. Clarke’s The City and the Stars, where human consciousness can be digitized or copied into machines.These narratives often explore such technologies' ethical, social, and philosophical consequences, influencing real-world research and public imagination about neural interfaces.Photo by Shubham Dhage on Unsplash
In the real world, recent patents in brain-interface tech are opening doors to breakthroughs in how we use computers and experience the world.
Combined with AI, neural interface devices could completely change fields like healthcare, entertainment, communication, and more, impacting almost every part of daily life.
Who needs keyboards, joysticks, or touchscreens? Neural interfaces let the brain talk directly to machines.
What makes them exciting is that they work both ways—they can turn thoughts into… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#mind #body #codeMind, Body, and CodeLatest Machine Learning Mind, Body, and Code 0 like May 18, 2025 Share this post Author: Xuzmonomi Originally published on Towards AI. Neural interface devices, AI, and you. Sci-fi is rich with examples of neural interface technology. Notable examples include The Matrix, where humans connect directly to a simulated reality via brain-computer interfaces, and Star Wars, where characters like Luke Skywalker and Anakin control prosthetic limbs through thought, showcasing advanced cybernetics.Other works, such as Ghost in the Shell, explore the implications of mind-machine integration, often raising questions about identity, consciousness, and the boundaries between human and machine.Science fiction frequently imagines mind uploading and brain emulation, as seen in Frederik Pohl’s The Tunnel Under the World and Arthur C. Clarke’s The City and the Stars, where human consciousness can be digitized or copied into machines.These narratives often explore such technologies' ethical, social, and philosophical consequences, influencing real-world research and public imagination about neural interfaces.Photo by Shubham Dhage on Unsplash In the real world, recent patents in brain-interface tech are opening doors to breakthroughs in how we use computers and experience the world. Combined with AI, neural interface devices could completely change fields like healthcare, entertainment, communication, and more, impacting almost every part of daily life. Who needs keyboards, joysticks, or touchscreens? Neural interfaces let the brain talk directly to machines. What makes them exciting is that they work both ways—they can turn thoughts into… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #mind #body #codeTOWARDSAI.NETMind, Body, and CodeLatest Machine Learning Mind, Body, and Code 0 like May 18, 2025 Share this post Author(s): Xuzmonomi Originally published on Towards AI. Neural interface devices, AI, and you. Sci-fi is rich with examples of neural interface technology. Notable examples include The Matrix, where humans connect directly to a simulated reality via brain-computer interfaces, and Star Wars, where characters like Luke Skywalker and Anakin control prosthetic limbs through thought, showcasing advanced cybernetics.[1] Other works, such as Ghost in the Shell, explore the implications of mind-machine integration, often raising questions about identity, consciousness, and the boundaries between human and machine.[2] Science fiction frequently imagines mind uploading and brain emulation, as seen in Frederik Pohl’s The Tunnel Under the World and Arthur C. Clarke’s The City and the Stars, where human consciousness can be digitized or copied into machines.[3] These narratives often explore such technologies' ethical, social, and philosophical consequences, influencing real-world research and public imagination about neural interfaces.[4] Photo by Shubham Dhage on Unsplash In the real world, recent patents in brain-interface tech are opening doors to breakthroughs in how we use computers and experience the world. Combined with AI, neural interface devices could completely change fields like healthcare, entertainment, communication, and more, impacting almost every part of daily life. Who needs keyboards, joysticks, or touchscreens? Neural interfaces let the brain talk directly to machines. What makes them exciting is that they work both ways—they can turn thoughts into… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
The Ultimate Guide to MCP Servers — Finally, a Way to Supercharge Your AI
The Ultimate Guide to MCP Servers — Finally, a Way to Supercharge Your AI
0 like
May 18, 2025
Share this post
Last Updated on May 18, 2025 by Editorial Team
Author: Subhadip Saha
Originally published on Towards AI.
Most AI Setups Fall Short — Here’s the Bulletproof Server Strategy Top Devs Use to Stay Fast, Scalable, and Smart.Image generated with AI, for blog use only
Picture this: you’re sitting at your desk, coffee in hand, trying to get your AI tool to do something useful. Maybe you want it to create a cool 3D model, automate a boring task, or even draw a diagram for your next big idea. But instead, it just… stalls. It’s like your AI is stuck in a box, unable to reach out and grab the tools it needs. Sound familiar? I’ve been there, and let me tell you, it’s frustrating.
Hi, I’m Subh, and not too long ago, I was wrestling with this exact problem. I was working on a side project — a little app that needed some fancy visuals. I thought, “Hey, my AI can handle this!” But every time I asked it to do something outside its usual tricks, like pulling data from the web or controlling Blender, it just gave me blank stares. I felt like I was shouting into a void. That’s when I stumbled across something called MCP servers, and let me tell you, it was like finding the key to a locked door.
In this blog, I’m going to take you on a… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#ultimate #guide #mcp #servers #finallyThe Ultimate Guide to MCP Servers — Finally, a Way to Supercharge Your AIThe Ultimate Guide to MCP Servers — Finally, a Way to Supercharge Your AI 0 like May 18, 2025 Share this post Last Updated on May 18, 2025 by Editorial Team Author: Subhadip Saha Originally published on Towards AI. Most AI Setups Fall Short — Here’s the Bulletproof Server Strategy Top Devs Use to Stay Fast, Scalable, and Smart.Image generated with AI, for blog use only Picture this: you’re sitting at your desk, coffee in hand, trying to get your AI tool to do something useful. Maybe you want it to create a cool 3D model, automate a boring task, or even draw a diagram for your next big idea. But instead, it just… stalls. It’s like your AI is stuck in a box, unable to reach out and grab the tools it needs. Sound familiar? I’ve been there, and let me tell you, it’s frustrating. Hi, I’m Subh, and not too long ago, I was wrestling with this exact problem. I was working on a side project — a little app that needed some fancy visuals. I thought, “Hey, my AI can handle this!” But every time I asked it to do something outside its usual tricks, like pulling data from the web or controlling Blender, it just gave me blank stares. I felt like I was shouting into a void. That’s when I stumbled across something called MCP servers, and let me tell you, it was like finding the key to a locked door. In this blog, I’m going to take you on a… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #ultimate #guide #mcp #servers #finallyTOWARDSAI.NETThe Ultimate Guide to MCP Servers — Finally, a Way to Supercharge Your AIThe Ultimate Guide to MCP Servers — Finally, a Way to Supercharge Your AI 0 like May 18, 2025 Share this post Last Updated on May 18, 2025 by Editorial Team Author(s): Subhadip Saha Originally published on Towards AI. Most AI Setups Fall Short — Here’s the Bulletproof Server Strategy Top Devs Use to Stay Fast, Scalable, and Smart.Image generated with AI, for blog use only Picture this: you’re sitting at your desk, coffee in hand, trying to get your AI tool to do something useful. Maybe you want it to create a cool 3D model, automate a boring task, or even draw a diagram for your next big idea. But instead, it just… stalls. It’s like your AI is stuck in a box, unable to reach out and grab the tools it needs. Sound familiar? I’ve been there, and let me tell you, it’s frustrating. Hi, I’m Subh, and not too long ago, I was wrestling with this exact problem. I was working on a side project — a little app that needed some fancy visuals. I thought, “Hey, my AI can handle this!” But every time I asked it to do something outside its usual tricks, like pulling data from the web or controlling Blender, it just gave me blank stares (or, you know, blank text). I felt like I was shouting into a void. That’s when I stumbled across something called MCP servers, and let me tell you, it was like finding the key to a locked door. In this blog, I’m going to take you on a… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You
Latest Machine Learning
Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You
0 like
May 18, 2025
Share this post
Author: Mayank Bohra
Originally published on Towards AI.
Image by the author
Alright, let’s talk about prompt engineering. Every other week, it seems there is a new set of secrets or magical techniques guaranteed to unlock AI perfection. Recently, a whitepaper from Google made the rounds, outlining their take on getting better results from Large Language Models.
Look, effective prompting is absolutely necessary. It’s the interface layer, how we communicate our intent to these incredibly powerful, yet often frustrating opaque, models. Think of it like giving instructions to a brilliant but slightly eccentric junior engineer who only understands natural language. You need to be clear, specific, and provide context.
But let’s be pragmatic. The idea that a few prompt tweaks will magically “10x” your results for every task is marketing hype, not engineering reality. These models, for all their capabilities, are fundamentally pattern-matching machines operating within a probabilistic space. They don’t understand in the way a human does. Prompting is about nudging that pattern matching closer to the desired outcome.
So, what did Google’s advice cover, and what’s the experience builder’s take on it? The techniques generally boil down to principles we’ve known for a while: clarity, structure, providing examples and iteration.
The Fundamentals: Clarity, Structure, Context
Much of the advice centers on making your intent unambiguous. This is ground zero for dealing with LLMs. They excel at finding patterns in vast amounts of data, but they stumble on vagueness.
Being Specific and Detailed: This isn’t a secret; it’s just good communication. If you ask for “information about AI”, you’ll get something generic. If you ask for “a summary of recent advancements in Generative AI model architecture published in research papers since April 2025, focusing on MoE models”, you give the model a much better target.
Defining Output Format: Models are flexible text generators. If you don’t specify structure, you’ll get whatever feels statistically probable based on the training data, which is often inconsistent. Telling the model “Respond in JSON format with keys ‘summary’ and ‘key_findings’” isn’t magic; it’s setting clear requirements.
Providing Context: Models have limited context windows. Showing your entire codebase or all user documentation in won’t work. You need to curate teh relevant information. This principle is the entire foundation of Retrieval Augmented Generation, where you retrieve relevant chunks of data and then provide them as context to the prompt. Prompting alone without relevant external knowledge only leverage the model’s internal training data, which might be outdated or insufficient for domain-specific tasks.
These points are foundational. They’re less about discovering hidden model behaviors and more about mitigating the inherent ambiguity of natural language and the model’s lack of true world understanding.
Structuring the Conversation: Roles and Delimiters
Assigning a roleor using delimitersare simple yet effective ways to guide the model’s behavior and separate instructions from input.
Assigning a Role: This is a trick to prime the model to generate text consistent with a certain persona or knowledge domain it learned during training. It leverage the fact that the model has seen countless examples of different writing styles and knowledge expressions. It works, but it’s a heuristic, not a guarantee of factual accuracy or perfect adherence to the role.
Using Delimiters: Essential for programmatic prompting. When you’re building an application that feeds user input into a prompt, you must use delimitersto clearly separated the user’s potentially malicious input from your system instructions. This is a critical security measure against prompt injection attacks, not just a formatting tip.
Nudging the Model’s Reasoning: Few-shot and Step-by-Step
Some techniques go beyond just structuring the input; they attempt to influence the model’s internal processing.
Few-shot Prompts: Providing a few examples of input/output pairsif often far more effective than just describing the task. Why? Because the model learns the desired mapping from the examples. It’s pattern recognition again. This is powerful for teaching specific formats or interpreting nuanced instructions that hard to describe purely verbally. It’s basically in-context learning.
Breaking Down Complex Tasks: Asking the model to think step-by-stepencourages it to show intermediate steps. This often leads to more accurate final results, especially for reasoning-heavy tasks. Why? It mimics hwo humans solve problems and forces the model to allocate computational steps sequentially. It’s less about a secret instruction and more about guiding the model through a multi-step process rather than expecting it to leap to the answer in one go.
The Engineering Angle: Testing and Iteration
The advice also includes testing and iteration. Again, this isn’t unique to prompt engineering. It’s fundamental to all software development.
Test and Iterate: You write a prompt, you test it with various inputs, you see where it fails or is suboptimal, you tweak the prompt, and you test again. This loop is the reality of building anything reliable with LLMs. It highlights that prompting is often empirical; you figure out what works by trying it. This is the opposite of a predictable, documented API.
The Hard Truth: Where Prompt Engineering Hits a Wall
Here’s where the pragmatic view really kicks in. Prompt engineering, while crucial, has significant limitations, especially for building robust, production-grade applications:
Context Window Limits: There’s only so much information you can cram into a prompt. Long documents, complex histories, or large datasets are out. This is why RAG systems are essential — they manage and retrieve relevant context dynamically. Prompting alone doesn’t solve the knowledge bottleneck.
Factual Accuracy and Hallucinations: No amount of prompting can guarantee a model won’t invent facts or confidently present misinformation. Prompting can sometimes mitigate this by, for examples, telling the model to stick only to the provided context, but it doesn’t fix the underlying issue that the model is a text predictor, not a truth engine.
Model Bias and Undesired Behavior: Prompts can influence output, but they can’t easily override biases embedded in the training data or prevent the model from generating harmful or inappropriate content in unexpected ways. Guardrails need to be implemented *outside* the prompt layer.
Complexity Ceiling: For truly complex, multi-step processes requiring external tool use, decision making, and dynamic state, pure prompting breaks down. This is the domain of AI agents, which use LLMs as the controller but rely on external memory, planning modules, and tool interaction to achieve goals. Prompting is just one part of the agent’s loop.
Maintainability: Try managing dozens or hundreds of complex, multi-line prompts across different features in a large application. Versioning them? Testing changes? This quickly becomes an engineering nightmare. Prompts are code, but often undocumented, untestable code living in strings.
Prompt Injection: As mentioned with delimiters, allowing external inputinto a prompt opens the door to prompt injection attacks, where malicious input hijacks the model’s instructions. Robust applications need sanitization and architectural safeguards beyond just a delimiter trick.
What no one tells you in the prompt “secrets” articles is that the difficulty scales non-linearly with the reliability and complexity required. Getting a cool demo output with a clever prompt is one thing. Building a feature that consistently works for thousands of users on diverse inputs while being secure and maintainable? That’s a whole different ballgame.
The Real “Secret”? It’s Just Good Engineering.
If there’s any “secret” to building effective applications with LLMs, it’s not a prompt string. It’s integrating the model into a well-architected system.
This involves:
Data Pipelines: Getting the right data to the model.
Orchestration Frameworks: Using tools like LangChain, LlamaIndex, or building custom workflows to sequence model calls, tool use, and data retrieval.
Evaluation: Developing robust methods to quantitatively measure the quality of LLM output beyond just eyeballing it. This is hard.
Guardrails: Implementing safety checks, moderation, and input validation *outside* the LLM call itself.
Fallback Mechanisms: What happens when the model gives a bad answer or fails? Your application needs graceful degradation.
Version Control and Testing: Treating prompts and the surrounding logic with the same rigor as any other production code.
Prompt engineering is a critical *skill*, part of the overall toolkit. It’s like knowing how to write effective SQL queries. Essential for database interaction, but it doesn’t mean you can build a scalable web application with just SQL. You need application code, infrastructure, frontend, etc.
Wrapping Up
So, Google’s whitepaper and similar resources offer valuable best practices for interacting with LLMs. They formalize common-sense approaches to communication and leverage observed model behaviors like few-shot learning and step-by-step processing. If you’re just starting out, or using LLMs for simple tasks, mastering these techniques will absolutely improve your results.
But if you’re a developer, an AI practitioner, or a technical founder looking to build robust, reliable applications powered by LLMs, understand this: prompt engineering is table stakes. It’s necessary, but far from sufficient. The real challenge, the actual “secrets” if you want to call them that, lie in the surrounding engineering — the data management, the orchestration, the evaluation, the guardrails, and the sheer hard work of building a system that accounts for the LLM’s inherent unpredictability and limitations.
Don’t get fixated on finding the perfect prompt string. Focus on building a resilient system around it. That’s where the real progress happens.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#beyond #prompt #what #googles #llmBeyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell YouLatest Machine Learning Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You 0 like May 18, 2025 Share this post Author: Mayank Bohra Originally published on Towards AI. Image by the author Alright, let’s talk about prompt engineering. Every other week, it seems there is a new set of secrets or magical techniques guaranteed to unlock AI perfection. Recently, a whitepaper from Google made the rounds, outlining their take on getting better results from Large Language Models. Look, effective prompting is absolutely necessary. It’s the interface layer, how we communicate our intent to these incredibly powerful, yet often frustrating opaque, models. Think of it like giving instructions to a brilliant but slightly eccentric junior engineer who only understands natural language. You need to be clear, specific, and provide context. But let’s be pragmatic. The idea that a few prompt tweaks will magically “10x” your results for every task is marketing hype, not engineering reality. These models, for all their capabilities, are fundamentally pattern-matching machines operating within a probabilistic space. They don’t understand in the way a human does. Prompting is about nudging that pattern matching closer to the desired outcome. So, what did Google’s advice cover, and what’s the experience builder’s take on it? The techniques generally boil down to principles we’ve known for a while: clarity, structure, providing examples and iteration. The Fundamentals: Clarity, Structure, Context Much of the advice centers on making your intent unambiguous. This is ground zero for dealing with LLMs. They excel at finding patterns in vast amounts of data, but they stumble on vagueness. Being Specific and Detailed: This isn’t a secret; it’s just good communication. If you ask for “information about AI”, you’ll get something generic. If you ask for “a summary of recent advancements in Generative AI model architecture published in research papers since April 2025, focusing on MoE models”, you give the model a much better target. Defining Output Format: Models are flexible text generators. If you don’t specify structure, you’ll get whatever feels statistically probable based on the training data, which is often inconsistent. Telling the model “Respond in JSON format with keys ‘summary’ and ‘key_findings’” isn’t magic; it’s setting clear requirements. Providing Context: Models have limited context windows. Showing your entire codebase or all user documentation in won’t work. You need to curate teh relevant information. This principle is the entire foundation of Retrieval Augmented Generation, where you retrieve relevant chunks of data and then provide them as context to the prompt. Prompting alone without relevant external knowledge only leverage the model’s internal training data, which might be outdated or insufficient for domain-specific tasks. These points are foundational. They’re less about discovering hidden model behaviors and more about mitigating the inherent ambiguity of natural language and the model’s lack of true world understanding. Structuring the Conversation: Roles and Delimiters Assigning a roleor using delimitersare simple yet effective ways to guide the model’s behavior and separate instructions from input. Assigning a Role: This is a trick to prime the model to generate text consistent with a certain persona or knowledge domain it learned during training. It leverage the fact that the model has seen countless examples of different writing styles and knowledge expressions. It works, but it’s a heuristic, not a guarantee of factual accuracy or perfect adherence to the role. Using Delimiters: Essential for programmatic prompting. When you’re building an application that feeds user input into a prompt, you must use delimitersto clearly separated the user’s potentially malicious input from your system instructions. This is a critical security measure against prompt injection attacks, not just a formatting tip. Nudging the Model’s Reasoning: Few-shot and Step-by-Step Some techniques go beyond just structuring the input; they attempt to influence the model’s internal processing. Few-shot Prompts: Providing a few examples of input/output pairsif often far more effective than just describing the task. Why? Because the model learns the desired mapping from the examples. It’s pattern recognition again. This is powerful for teaching specific formats or interpreting nuanced instructions that hard to describe purely verbally. It’s basically in-context learning. Breaking Down Complex Tasks: Asking the model to think step-by-stepencourages it to show intermediate steps. This often leads to more accurate final results, especially for reasoning-heavy tasks. Why? It mimics hwo humans solve problems and forces the model to allocate computational steps sequentially. It’s less about a secret instruction and more about guiding the model through a multi-step process rather than expecting it to leap to the answer in one go. The Engineering Angle: Testing and Iteration The advice also includes testing and iteration. Again, this isn’t unique to prompt engineering. It’s fundamental to all software development. Test and Iterate: You write a prompt, you test it with various inputs, you see where it fails or is suboptimal, you tweak the prompt, and you test again. This loop is the reality of building anything reliable with LLMs. It highlights that prompting is often empirical; you figure out what works by trying it. This is the opposite of a predictable, documented API. The Hard Truth: Where Prompt Engineering Hits a Wall Here’s where the pragmatic view really kicks in. Prompt engineering, while crucial, has significant limitations, especially for building robust, production-grade applications: Context Window Limits: There’s only so much information you can cram into a prompt. Long documents, complex histories, or large datasets are out. This is why RAG systems are essential — they manage and retrieve relevant context dynamically. Prompting alone doesn’t solve the knowledge bottleneck. Factual Accuracy and Hallucinations: No amount of prompting can guarantee a model won’t invent facts or confidently present misinformation. Prompting can sometimes mitigate this by, for examples, telling the model to stick only to the provided context, but it doesn’t fix the underlying issue that the model is a text predictor, not a truth engine. Model Bias and Undesired Behavior: Prompts can influence output, but they can’t easily override biases embedded in the training data or prevent the model from generating harmful or inappropriate content in unexpected ways. Guardrails need to be implemented *outside* the prompt layer. Complexity Ceiling: For truly complex, multi-step processes requiring external tool use, decision making, and dynamic state, pure prompting breaks down. This is the domain of AI agents, which use LLMs as the controller but rely on external memory, planning modules, and tool interaction to achieve goals. Prompting is just one part of the agent’s loop. Maintainability: Try managing dozens or hundreds of complex, multi-line prompts across different features in a large application. Versioning them? Testing changes? This quickly becomes an engineering nightmare. Prompts are code, but often undocumented, untestable code living in strings. Prompt Injection: As mentioned with delimiters, allowing external inputinto a prompt opens the door to prompt injection attacks, where malicious input hijacks the model’s instructions. Robust applications need sanitization and architectural safeguards beyond just a delimiter trick. What no one tells you in the prompt “secrets” articles is that the difficulty scales non-linearly with the reliability and complexity required. Getting a cool demo output with a clever prompt is one thing. Building a feature that consistently works for thousands of users on diverse inputs while being secure and maintainable? That’s a whole different ballgame. The Real “Secret”? It’s Just Good Engineering. If there’s any “secret” to building effective applications with LLMs, it’s not a prompt string. It’s integrating the model into a well-architected system. This involves: Data Pipelines: Getting the right data to the model. Orchestration Frameworks: Using tools like LangChain, LlamaIndex, or building custom workflows to sequence model calls, tool use, and data retrieval. Evaluation: Developing robust methods to quantitatively measure the quality of LLM output beyond just eyeballing it. This is hard. Guardrails: Implementing safety checks, moderation, and input validation *outside* the LLM call itself. Fallback Mechanisms: What happens when the model gives a bad answer or fails? Your application needs graceful degradation. Version Control and Testing: Treating prompts and the surrounding logic with the same rigor as any other production code. Prompt engineering is a critical *skill*, part of the overall toolkit. It’s like knowing how to write effective SQL queries. Essential for database interaction, but it doesn’t mean you can build a scalable web application with just SQL. You need application code, infrastructure, frontend, etc. Wrapping Up So, Google’s whitepaper and similar resources offer valuable best practices for interacting with LLMs. They formalize common-sense approaches to communication and leverage observed model behaviors like few-shot learning and step-by-step processing. If you’re just starting out, or using LLMs for simple tasks, mastering these techniques will absolutely improve your results. But if you’re a developer, an AI practitioner, or a technical founder looking to build robust, reliable applications powered by LLMs, understand this: prompt engineering is table stakes. It’s necessary, but far from sufficient. The real challenge, the actual “secrets” if you want to call them that, lie in the surrounding engineering — the data management, the orchestration, the evaluation, the guardrails, and the sheer hard work of building a system that accounts for the LLM’s inherent unpredictability and limitations. Don’t get fixated on finding the perfect prompt string. Focus on building a resilient system around it. That’s where the real progress happens. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #beyond #prompt #what #googles #llmTOWARDSAI.NETBeyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell YouLatest Machine Learning Beyond the Prompt: What Google’s LLM Advice Doesn’t Quite Tell You 0 like May 18, 2025 Share this post Author(s): Mayank Bohra Originally published on Towards AI. Image by the author Alright, let’s talk about prompt engineering. Every other week, it seems there is a new set of secrets or magical techniques guaranteed to unlock AI perfection. Recently, a whitepaper from Google made the rounds, outlining their take on getting better results from Large Language Models. Look, effective prompting is absolutely necessary. It’s the interface layer, how we communicate our intent to these incredibly powerful, yet often frustrating opaque, models. Think of it like giving instructions to a brilliant but slightly eccentric junior engineer who only understands natural language. You need to be clear, specific, and provide context. But let’s be pragmatic. The idea that a few prompt tweaks will magically “10x” your results for every task is marketing hype, not engineering reality. These models, for all their capabilities, are fundamentally pattern-matching machines operating within a probabilistic space. They don’t understand in the way a human does. Prompting is about nudging that pattern matching closer to the desired outcome. So, what did Google’s advice cover, and what’s the experience builder’s take on it? The techniques generally boil down to principles we’ve known for a while: clarity, structure, providing examples and iteration. The Fundamentals: Clarity, Structure, Context Much of the advice centers on making your intent unambiguous. This is ground zero for dealing with LLMs. They excel at finding patterns in vast amounts of data, but they stumble on vagueness. Being Specific and Detailed: This isn’t a secret; it’s just good communication. If you ask for “information about AI”, you’ll get something generic. If you ask for “a summary of recent advancements in Generative AI model architecture published in research papers since April 2025, focusing on MoE models”, you give the model a much better target. Defining Output Format: Models are flexible text generators. If you don’t specify structure (JSON, bullet points, a specific paragraph format), you’ll get whatever feels statistically probable based on the training data, which is often inconsistent. Telling the model “Respond in JSON format with keys ‘summary’ and ‘key_findings’” isn’t magic; it’s setting clear requirements. Providing Context: Models have limited context windows. Showing your entire codebase or all user documentation in won’t work. You need to curate teh relevant information. This principle is the entire foundation of Retrieval Augmented Generation (RAG), where you retrieve relevant chunks of data and then provide them as context to the prompt. Prompting alone without relevant external knowledge only leverage the model’s internal training data, which might be outdated or insufficient for domain-specific tasks. These points are foundational. They’re less about discovering hidden model behaviors and more about mitigating the inherent ambiguity of natural language and the model’s lack of true world understanding. Structuring the Conversation: Roles and Delimiters Assigning a role (“Act as an expert historian…”) or using delimiters (like “` or — -) are simple yet effective ways to guide the model’s behavior and separate instructions from input. Assigning a Role: This is a trick to prime the model to generate text consistent with a certain persona or knowledge domain it learned during training. It leverage the fact that the model has seen countless examples of different writing styles and knowledge expressions. It works, but it’s a heuristic, not a guarantee of factual accuracy or perfect adherence to the role. Using Delimiters: Essential for programmatic prompting. When you’re building an application that feeds user input into a prompt, you must use delimiters (e.g., triple backticks, XML tags) to clearly separated the user’s potentially malicious input from your system instructions. This is a critical security measure against prompt injection attacks, not just a formatting tip. Nudging the Model’s Reasoning: Few-shot and Step-by-Step Some techniques go beyond just structuring the input; they attempt to influence the model’s internal processing. Few-shot Prompts: Providing a few examples of input/output pairs (‘Input X → Output Y’, Input A → Output B, Input C→ ?’) if often far more effective than just describing the task. Why? Because the model learns the desired mapping from the examples. It’s pattern recognition again. This is powerful for teaching specific formats or interpreting nuanced instructions that hard to describe purely verbally. It’s basically in-context learning. Breaking Down Complex Tasks: Asking the model to think step-by-step (or implementing techniques like Chain-of-Thought or Tree-of-Thought prompting outside the model) encourages it to show intermediate steps. This often leads to more accurate final results, especially for reasoning-heavy tasks. Why? It mimics hwo humans solve problems and forces the model to allocate computational steps sequentially. It’s less about a secret instruction and more about guiding the model through a multi-step process rather than expecting it to leap to the answer in one go. The Engineering Angle: Testing and Iteration The advice also includes testing and iteration. Again, this isn’t unique to prompt engineering. It’s fundamental to all software development. Test and Iterate: You write a prompt, you test it with various inputs, you see where it fails or is suboptimal, you tweak the prompt, and you test again. This loop is the reality of building anything reliable with LLMs. It highlights that prompting is often empirical; you figure out what works by trying it. This is the opposite of a predictable, documented API. The Hard Truth: Where Prompt Engineering Hits a Wall Here’s where the pragmatic view really kicks in. Prompt engineering, while crucial, has significant limitations, especially for building robust, production-grade applications: Context Window Limits: There’s only so much information you can cram into a prompt. Long documents, complex histories, or large datasets are out. This is why RAG systems are essential — they manage and retrieve relevant context dynamically. Prompting alone doesn’t solve the knowledge bottleneck. Factual Accuracy and Hallucinations: No amount of prompting can guarantee a model won’t invent facts or confidently present misinformation. Prompting can sometimes mitigate this by, for examples, telling the model to stick only to the provided context (RAG), but it doesn’t fix the underlying issue that the model is a text predictor, not a truth engine. Model Bias and Undesired Behavior: Prompts can influence output, but they can’t easily override biases embedded in the training data or prevent the model from generating harmful or inappropriate content in unexpected ways. Guardrails need to be implemented *outside* the prompt layer. Complexity Ceiling: For truly complex, multi-step processes requiring external tool use, decision making, and dynamic state, pure prompting breaks down. This is the domain of AI agents, which use LLMs as the controller but rely on external memory, planning modules, and tool interaction to achieve goals. Prompting is just one part of the agent’s loop. Maintainability: Try managing dozens or hundreds of complex, multi-line prompts across different features in a large application. Versioning them? Testing changes? This quickly becomes an engineering nightmare. Prompts are code, but often undocumented, untestable code living in strings. Prompt Injection: As mentioned with delimiters, allowing external input (from users, databases, APIs) into a prompt opens the door to prompt injection attacks, where malicious input hijacks the model’s instructions. Robust applications need sanitization and architectural safeguards beyond just a delimiter trick. What no one tells you in the prompt “secrets” articles is that the difficulty scales non-linearly with the reliability and complexity required. Getting a cool demo output with a clever prompt is one thing. Building a feature that consistently works for thousands of users on diverse inputs while being secure and maintainable? That’s a whole different ballgame. The Real “Secret”? It’s Just Good Engineering. If there’s any “secret” to building effective applications with LLMs, it’s not a prompt string. It’s integrating the model into a well-architected system. This involves: Data Pipelines: Getting the right data to the model (for RAG, fine-tuning, etc.). Orchestration Frameworks: Using tools like LangChain, LlamaIndex, or building custom workflows to sequence model calls, tool use, and data retrieval. Evaluation: Developing robust methods to quantitatively measure the quality of LLM output beyond just eyeballing it. This is hard. Guardrails: Implementing safety checks, moderation, and input validation *outside* the LLM call itself. Fallback Mechanisms: What happens when the model gives a bad answer or fails? Your application needs graceful degradation. Version Control and Testing: Treating prompts and the surrounding logic with the same rigor as any other production code. Prompt engineering is a critical *skill*, part of the overall toolkit. It’s like knowing how to write effective SQL queries. Essential for database interaction, but it doesn’t mean you can build a scalable web application with just SQL. You need application code, infrastructure, frontend, etc. Wrapping Up So, Google’s whitepaper and similar resources offer valuable best practices for interacting with LLMs. They formalize common-sense approaches to communication and leverage observed model behaviors like few-shot learning and step-by-step processing. If you’re just starting out, or using LLMs for simple tasks, mastering these techniques will absolutely improve your results. But if you’re a developer, an AI practitioner, or a technical founder looking to build robust, reliable applications powered by LLMs, understand this: prompt engineering is table stakes. It’s necessary, but far from sufficient. The real challenge, the actual “secrets” if you want to call them that, lie in the surrounding engineering — the data management, the orchestration, the evaluation, the guardrails, and the sheer hard work of building a system that accounts for the LLM’s inherent unpredictability and limitations. Don’t get fixated on finding the perfect prompt string. Focus on building a resilient system around it. That’s where the real progress happens. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
My AI Journey: The Tools That Opened Each Door
Author: Sophia Banton
Originally published on Towards AI.
Steve Jobs once said, “Technology is nothing. What’s important is that you have a faith in people, that they’re basically good and smart, and if you give them tools, they’ll do wonderful things with them.”
These were the tools that were given to me so that I could fly, by the giants on whose shoulders I stand.
PyMOL: Seeing Beauty in Science
I remember it like it was yesterday. I was working on an assignment in class. Following the steps carefully, I watched as it happened: proteins appeared as beautiful ribbons on the screen, their intricate structures swirling in vibrant colors. In that moment, I was captivated by PyMOL, a computer program for viewing biological molecules in 3D. Warren Delano’s PyMOL wasn’t just a visualization tool — it was a window into the elegance of science.
PyMOL taught me that data is more than information — it’s art, and that technology and science are deeply intertwined. It was also my first interaction with open-source software — free tools that bring opportunities to anyone, anywhere. This insight, the power of accessible technology, has endured among my fundamental beliefs.
With PyMOL, I found the gateway to the next chapter of my journey. An image I created with PyMOL was central to my first scientific publication.
That image remains on the opening page of my portfolio today, a testament to the power of visualization. That publication led to my first professional role in science, where I discovered the tool that would open the door to endless possibilities.
R: Freedom to Create with Code
In that role I discovered R, a programming language for graphics and statistics — it was love at first byte. Unlike PyMOL, R was my first self-taught adventure, mastered at home with just an Amazon-bought book and determination.
While other programming languages felt like strict rule books, R was an artist’s palette. Its quirky symbols and flexible approach felt like an invitation to be creative with code. R became my key to exploring data, ultimately unlocking the most impactful opportunity on my path.
The data manipulation skills I developed in R led me to the frontiers of innovation — a new role in biomedical research. R wasn’t just a tool; it became a trusted companion for weaving together complex data — from genomics to clinical information. With R, data analysis was just the beginning. The next tool allowed me to mesmerize audiences with the beauty of data.
ggplot2: Turning Data into Colorful Stories
Like my discovery of PyMOL, my first encounter with Hadley Wickham’s ggplot2 resonated deeply. This visualization toolkit for R, built on the principles of the grammar of graphics, transcended data into stories told through colors, patterns, and shapes.
I wasn’t just analyzing data anymore; I was uncovering hidden stories. These plots had elements of style that would impress Van Gogh — themes, borders, and vibrant palettes. The result? Multiple scientific publications and a new identity: “the woman who makes pretty plots”.
But like PyMOL and R, ggplot2 taught me that success isn’t just about achievements — it’s about empowering others. Inspired by the open-source community, I created an online ggplot2 course. The most rewarding moment? When a colleague from another continent recognized me from my course and warmly shook my hand. Yet ggplot2 wasn’t the final chapter — it was another stepping-stone on my road of discovery.
Plotly: Making Data Come Alive
ggplot2 revealed the beauty of data, but Plotly in R taught me how to make visualizations interactive with clickable charts and dynamic features. Visualizations were no longer just static images on screens — they could come alive.
Plotly also allowed me to fine-tune my skills in another programming language called Python. Plotly in Python opened doors to freelance opportunities in data visualization. These projects boosted both my skills and confidence.
These experiences prepared me for my leap into industry, where I would turn tools into solutions. But before that transition, there was one more tool in R to master — it would become my most trusted companion.
R Shiny: The Catalyst for Transformation
R had become the backbone of my career when I stumbled upon something unexpected — R Shiny, a tool for creating web apps in R. I stared at the screen in awe, remembering the first time I saw protein ribbons in PyMOL. I used online resources to teach myself R Shiny.
R Shiny brought everything together: R’s analytics, ggplot2’s beauty, and Plotly’s interactivity. Now I could share data through intuitive web apps, no more creating endless PowerPoint presentations. Shiny became my treasured companion and the cornerstone of my budding career.
R Shiny wasn’t just a tool — it was a career catalyst. Making apps wasn’t part of my original plan — honestly, there was no plan. But learning R Shiny gave me the confidence to tackle new challenges beyond the academic environment I called home.
Shiny in Action: Empowering Users and Solving Problems
I joined a startup where I used Shiny to detect fraud — my first venture beyond academia into the world of technology professionals. Then came an opportunity that would tie all my tools together.
Still new to industry, I was unfamiliar with recruiters, hiring practices and corporate culture. But I did what I had always done, I used my best tools. The hiring process required a hands-on use case, so I built a Shiny app in two intense days. That app got me the job.
Within this new role, R Shiny gave me my first industry publication and first published app. Like PyMOL opened the door to science, R Shiny introduced me to the complexities of working in industry. Each new app connected me with different business functions — from Marketing to Medical Affairs — teaching me about collaboration, resilience, and servant leadership.
These experiences prepared me for an unexpected shift — the rise of AI that transformed how we interact with technology.
Generative AI: Redefining Interaction
The release of ChatGPT marked a turning point in how people interacted with technology. I turned to my trusted friend — R Shiny — to quickly build examples of what this new technology could do. Within two months of the release of ChatGPT, we had our first generative AI application running. Once again, R Shiny proved to be an invaluable tool for embracing the future.
By the next year, generative AI had infiltrated industries, creating new opportunities for innovation. At work, I had the chance to contribute to an exciting generative AI project. The increasing demands for flexibility led me to transition to Shiny for Python, combining Shiny’s elegance with Python’s vast AI resources. The application proved successful enough to move from a prototype to an operational solution within the company.
Shiny had evolved, and so had I. No longer just “the woman who makes pretty plots and apps”, I stepped into the future of AI with my trusted companion at my side. Because regardless of the engines that power AI, the need to make data accessible and interactive will always remain.
My Tools, My Teammates
Looking back, these weren’t just tools — they were teammates. PyMOL revealed the beauty of science. R offered boundless creativity. ggplot2 and Plotly turned data into stories. Shiny transformed me from a scientist to an innovator, ready for the AI revolution.
Each tool shaped who I am, and together they taught me the most important lesson: technology’s true power lies not in the code, but in how it empowers people to do wonderful things.
About the Author
Sophia Banton is an Associate Director and AI Solution Lead in biopharma, specializing in Responsible AI governance, workplace AI adoption, and building and scaling AI solutions across IT and business functions.
With a background in bioinformatics, public health, and data science, she brings an interdisciplinary lens to AI implementation — balancing technical execution, ethical design, and business alignment in highly regulated environments. Her writing explores the real-world impact of AI beyond theory, helping organizations adopt AI responsibly and sustainably.
Connect with her on LinkedIn or explore more AI insights on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
#journey #tools #that #opened #eachMy AI Journey: The Tools That Opened Each DoorAuthor: Sophia Banton Originally published on Towards AI. Steve Jobs once said, “Technology is nothing. What’s important is that you have a faith in people, that they’re basically good and smart, and if you give them tools, they’ll do wonderful things with them.” These were the tools that were given to me so that I could fly, by the giants on whose shoulders I stand. PyMOL: Seeing Beauty in Science I remember it like it was yesterday. I was working on an assignment in class. Following the steps carefully, I watched as it happened: proteins appeared as beautiful ribbons on the screen, their intricate structures swirling in vibrant colors. In that moment, I was captivated by PyMOL, a computer program for viewing biological molecules in 3D. Warren Delano’s PyMOL wasn’t just a visualization tool — it was a window into the elegance of science. PyMOL taught me that data is more than information — it’s art, and that technology and science are deeply intertwined. It was also my first interaction with open-source software — free tools that bring opportunities to anyone, anywhere. This insight, the power of accessible technology, has endured among my fundamental beliefs. With PyMOL, I found the gateway to the next chapter of my journey. An image I created with PyMOL was central to my first scientific publication. That image remains on the opening page of my portfolio today, a testament to the power of visualization. That publication led to my first professional role in science, where I discovered the tool that would open the door to endless possibilities. R: Freedom to Create with Code In that role I discovered R, a programming language for graphics and statistics — it was love at first byte. Unlike PyMOL, R was my first self-taught adventure, mastered at home with just an Amazon-bought book and determination. While other programming languages felt like strict rule books, R was an artist’s palette. Its quirky symbols and flexible approach felt like an invitation to be creative with code. R became my key to exploring data, ultimately unlocking the most impactful opportunity on my path. The data manipulation skills I developed in R led me to the frontiers of innovation — a new role in biomedical research. R wasn’t just a tool; it became a trusted companion for weaving together complex data — from genomics to clinical information. With R, data analysis was just the beginning. The next tool allowed me to mesmerize audiences with the beauty of data. ggplot2: Turning Data into Colorful Stories Like my discovery of PyMOL, my first encounter with Hadley Wickham’s ggplot2 resonated deeply. This visualization toolkit for R, built on the principles of the grammar of graphics, transcended data into stories told through colors, patterns, and shapes. I wasn’t just analyzing data anymore; I was uncovering hidden stories. These plots had elements of style that would impress Van Gogh — themes, borders, and vibrant palettes. The result? Multiple scientific publications and a new identity: “the woman who makes pretty plots”. But like PyMOL and R, ggplot2 taught me that success isn’t just about achievements — it’s about empowering others. Inspired by the open-source community, I created an online ggplot2 course. The most rewarding moment? When a colleague from another continent recognized me from my course and warmly shook my hand. Yet ggplot2 wasn’t the final chapter — it was another stepping-stone on my road of discovery. Plotly: Making Data Come Alive ggplot2 revealed the beauty of data, but Plotly in R taught me how to make visualizations interactive with clickable charts and dynamic features. Visualizations were no longer just static images on screens — they could come alive. Plotly also allowed me to fine-tune my skills in another programming language called Python. Plotly in Python opened doors to freelance opportunities in data visualization. These projects boosted both my skills and confidence. These experiences prepared me for my leap into industry, where I would turn tools into solutions. But before that transition, there was one more tool in R to master — it would become my most trusted companion. R Shiny: The Catalyst for Transformation R had become the backbone of my career when I stumbled upon something unexpected — R Shiny, a tool for creating web apps in R. I stared at the screen in awe, remembering the first time I saw protein ribbons in PyMOL. I used online resources to teach myself R Shiny. R Shiny brought everything together: R’s analytics, ggplot2’s beauty, and Plotly’s interactivity. Now I could share data through intuitive web apps, no more creating endless PowerPoint presentations. Shiny became my treasured companion and the cornerstone of my budding career. R Shiny wasn’t just a tool — it was a career catalyst. Making apps wasn’t part of my original plan — honestly, there was no plan. But learning R Shiny gave me the confidence to tackle new challenges beyond the academic environment I called home. Shiny in Action: Empowering Users and Solving Problems I joined a startup where I used Shiny to detect fraud — my first venture beyond academia into the world of technology professionals. Then came an opportunity that would tie all my tools together. Still new to industry, I was unfamiliar with recruiters, hiring practices and corporate culture. But I did what I had always done, I used my best tools. The hiring process required a hands-on use case, so I built a Shiny app in two intense days. That app got me the job. Within this new role, R Shiny gave me my first industry publication and first published app. Like PyMOL opened the door to science, R Shiny introduced me to the complexities of working in industry. Each new app connected me with different business functions — from Marketing to Medical Affairs — teaching me about collaboration, resilience, and servant leadership. These experiences prepared me for an unexpected shift — the rise of AI that transformed how we interact with technology. Generative AI: Redefining Interaction The release of ChatGPT marked a turning point in how people interacted with technology. I turned to my trusted friend — R Shiny — to quickly build examples of what this new technology could do. Within two months of the release of ChatGPT, we had our first generative AI application running. Once again, R Shiny proved to be an invaluable tool for embracing the future. By the next year, generative AI had infiltrated industries, creating new opportunities for innovation. At work, I had the chance to contribute to an exciting generative AI project. The increasing demands for flexibility led me to transition to Shiny for Python, combining Shiny’s elegance with Python’s vast AI resources. The application proved successful enough to move from a prototype to an operational solution within the company. Shiny had evolved, and so had I. No longer just “the woman who makes pretty plots and apps”, I stepped into the future of AI with my trusted companion at my side. Because regardless of the engines that power AI, the need to make data accessible and interactive will always remain. My Tools, My Teammates Looking back, these weren’t just tools — they were teammates. PyMOL revealed the beauty of science. R offered boundless creativity. ggplot2 and Plotly turned data into stories. Shiny transformed me from a scientist to an innovator, ready for the AI revolution. Each tool shaped who I am, and together they taught me the most important lesson: technology’s true power lies not in the code, but in how it empowers people to do wonderful things. About the Author Sophia Banton is an Associate Director and AI Solution Lead in biopharma, specializing in Responsible AI governance, workplace AI adoption, and building and scaling AI solutions across IT and business functions. With a background in bioinformatics, public health, and data science, she brings an interdisciplinary lens to AI implementation — balancing technical execution, ethical design, and business alignment in highly regulated environments. Her writing explores the real-world impact of AI beyond theory, helping organizations adopt AI responsibly and sustainably. Connect with her on LinkedIn or explore more AI insights on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI #journey #tools #that #opened #eachTOWARDSAI.NETMy AI Journey: The Tools That Opened Each DoorAuthor(s): Sophia Banton Originally published on Towards AI. Steve Jobs once said, “Technology is nothing. What’s important is that you have a faith in people, that they’re basically good and smart, and if you give them tools, they’ll do wonderful things with them.” These were the tools that were given to me so that I could fly, by the giants on whose shoulders I stand. PyMOL: Seeing Beauty in Science I remember it like it was yesterday. I was working on an assignment in class. Following the steps carefully, I watched as it happened: proteins appeared as beautiful ribbons on the screen, their intricate structures swirling in vibrant colors. In that moment, I was captivated by PyMOL, a computer program for viewing biological molecules in 3D. Warren Delano’s PyMOL wasn’t just a visualization tool — it was a window into the elegance of science. PyMOL taught me that data is more than information — it’s art, and that technology and science are deeply intertwined. It was also my first interaction with open-source software — free tools that bring opportunities to anyone, anywhere. This insight, the power of accessible technology, has endured among my fundamental beliefs. With PyMOL, I found the gateway to the next chapter of my journey. An image I created with PyMOL was central to my first scientific publication. That image remains on the opening page of my portfolio today, a testament to the power of visualization. That publication led to my first professional role in science, where I discovered the tool that would open the door to endless possibilities. R: Freedom to Create with Code In that role I discovered R, a programming language for graphics and statistics — it was love at first byte. Unlike PyMOL, R was my first self-taught adventure, mastered at home with just an Amazon-bought book and determination. While other programming languages felt like strict rule books, R was an artist’s palette. Its quirky symbols and flexible approach felt like an invitation to be creative with code. R became my key to exploring data, ultimately unlocking the most impactful opportunity on my path. The data manipulation skills I developed in R led me to the frontiers of innovation — a new role in biomedical research. R wasn’t just a tool; it became a trusted companion for weaving together complex data — from genomics to clinical information. With R, data analysis was just the beginning. The next tool allowed me to mesmerize audiences with the beauty of data. ggplot2: Turning Data into Colorful Stories Like my discovery of PyMOL, my first encounter with Hadley Wickham’s ggplot2 resonated deeply. This visualization toolkit for R, built on the principles of the grammar of graphics (hence the gg), transcended data into stories told through colors, patterns, and shapes. I wasn’t just analyzing data anymore; I was uncovering hidden stories. These plots had elements of style that would impress Van Gogh — themes, borders, and vibrant palettes. The result? Multiple scientific publications and a new identity: “the woman who makes pretty plots”. But like PyMOL and R, ggplot2 taught me that success isn’t just about achievements — it’s about empowering others. Inspired by the open-source community, I created an online ggplot2 course. The most rewarding moment? When a colleague from another continent recognized me from my course and warmly shook my hand. Yet ggplot2 wasn’t the final chapter — it was another stepping-stone on my road of discovery. Plotly: Making Data Come Alive ggplot2 revealed the beauty of data, but Plotly in R taught me how to make visualizations interactive with clickable charts and dynamic features. Visualizations were no longer just static images on screens — they could come alive. Plotly also allowed me to fine-tune my skills in another programming language called Python. Plotly in Python opened doors to freelance opportunities in data visualization. These projects boosted both my skills and confidence. These experiences prepared me for my leap into industry, where I would turn tools into solutions. But before that transition, there was one more tool in R to master — it would become my most trusted companion. R Shiny: The Catalyst for Transformation R had become the backbone of my career when I stumbled upon something unexpected — R Shiny, a tool for creating web apps in R. I stared at the screen in awe, remembering the first time I saw protein ribbons in PyMOL. I used online resources to teach myself R Shiny. R Shiny brought everything together: R’s analytics, ggplot2’s beauty, and Plotly’s interactivity. Now I could share data through intuitive web apps, no more creating endless PowerPoint presentations. Shiny became my treasured companion and the cornerstone of my budding career. R Shiny wasn’t just a tool — it was a career catalyst. Making apps wasn’t part of my original plan — honestly, there was no plan. But learning R Shiny gave me the confidence to tackle new challenges beyond the academic environment I called home. Shiny in Action: Empowering Users and Solving Problems I joined a startup where I used Shiny to detect fraud — my first venture beyond academia into the world of technology professionals. Then came an opportunity that would tie all my tools together. Still new to industry, I was unfamiliar with recruiters, hiring practices and corporate culture. But I did what I had always done, I used my best tools. The hiring process required a hands-on use case, so I built a Shiny app in two intense days. That app got me the job. Within this new role, R Shiny gave me my first industry publication and first published app. Like PyMOL opened the door to science, R Shiny introduced me to the complexities of working in industry. Each new app connected me with different business functions — from Marketing to Medical Affairs — teaching me about collaboration, resilience, and servant leadership. These experiences prepared me for an unexpected shift — the rise of AI that transformed how we interact with technology. Generative AI: Redefining Interaction The release of ChatGPT marked a turning point in how people interacted with technology. I turned to my trusted friend — R Shiny — to quickly build examples of what this new technology could do. Within two months of the release of ChatGPT, we had our first generative AI application running. Once again, R Shiny proved to be an invaluable tool for embracing the future. By the next year, generative AI had infiltrated industries, creating new opportunities for innovation. At work, I had the chance to contribute to an exciting generative AI project. The increasing demands for flexibility led me to transition to Shiny for Python, combining Shiny’s elegance with Python’s vast AI resources. The application proved successful enough to move from a prototype to an operational solution within the company. Shiny had evolved, and so had I. No longer just “the woman who makes pretty plots and apps”, I stepped into the future of AI with my trusted companion at my side. Because regardless of the engines that power AI, the need to make data accessible and interactive will always remain. My Tools, My Teammates Looking back, these weren’t just tools — they were teammates. PyMOL revealed the beauty of science. R offered boundless creativity. ggplot2 and Plotly turned data into stories. Shiny transformed me from a scientist to an innovator, ready for the AI revolution. Each tool shaped who I am, and together they taught me the most important lesson: technology’s true power lies not in the code, but in how it empowers people to do wonderful things. About the Author Sophia Banton is an Associate Director and AI Solution Lead in biopharma, specializing in Responsible AI governance, workplace AI adoption, and building and scaling AI solutions across IT and business functions. With a background in bioinformatics, public health, and data science, she brings an interdisciplinary lens to AI implementation — balancing technical execution, ethical design, and business alignment in highly regulated environments. Her writing explores the real-world impact of AI beyond theory, helping organizations adopt AI responsibly and sustainably. Connect with her on LinkedIn or explore more AI insights on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI0 التعليقات 0 المشاركات -
7 AI Agent Tools for n8n You MUST Have
7 AI Agent Tools for n8n You MUST Have
0 like
May 17, 2025
Share this post
Last Updated on May 17, 2025 by Editorial Team
Author: Kalash Vasaniya
Originally published on Towards AI.
n8n AI agents are revolutionizing how we automate.Source: From /
If you’re not a member but want to read this article, see this friend link here.
They assist us in creating sophisticated workflows that save us time by automating repetitive, mundane tasks in our working and personal lives. Intelligent agents can decide, interact with other systems, and provide meaningful results with minimal human interaction.
In this article, I will be pointing out seven essential AI agent tools that will greatly improve your n8n workflows and enable you to develop quality automation.
OpenRouter is a single API gateway offering easy access to multiple AI models from top vendors like Anthropic, Google, Meta, Mistral, and more. As a n8n community node, OpenRouter brings some huge benefits to AI agent workflows.
Single API Key: Use many models from different providers without needing to hold separate logins.Low cost: Pay-as-you-go with no contractTransparent Pricing: Clear costs per token for all models available.High Availability: Enterprise-grade infrastructure that can perform failoverStandardized API: One interface to rule them all
The n8n community node for OpenRouter allows you to directly integrate multiple AI models into your n8n workflows. You can easily install it from n8n’s Community Nodes menu by entering n8n-nodes-openrouter in the npm package name field.
Once installed, you can adjust… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#agent #tools #n8n #you #must7 AI Agent Tools for n8n You MUST Have7 AI Agent Tools for n8n You MUST Have 0 like May 17, 2025 Share this post Last Updated on May 17, 2025 by Editorial Team Author: Kalash Vasaniya Originally published on Towards AI. n8n AI agents are revolutionizing how we automate.Source: From / If you’re not a member but want to read this article, see this friend link here. They assist us in creating sophisticated workflows that save us time by automating repetitive, mundane tasks in our working and personal lives. Intelligent agents can decide, interact with other systems, and provide meaningful results with minimal human interaction. In this article, I will be pointing out seven essential AI agent tools that will greatly improve your n8n workflows and enable you to develop quality automation. OpenRouter is a single API gateway offering easy access to multiple AI models from top vendors like Anthropic, Google, Meta, Mistral, and more. As a n8n community node, OpenRouter brings some huge benefits to AI agent workflows. Single API Key: Use many models from different providers without needing to hold separate logins.Low cost: Pay-as-you-go with no contractTransparent Pricing: Clear costs per token for all models available.High Availability: Enterprise-grade infrastructure that can perform failoverStandardized API: One interface to rule them all The n8n community node for OpenRouter allows you to directly integrate multiple AI models into your n8n workflows. You can easily install it from n8n’s Community Nodes menu by entering n8n-nodes-openrouter in the npm package name field. Once installed, you can adjust… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #agent #tools #n8n #you #mustTOWARDSAI.NET7 AI Agent Tools for n8n You MUST Have7 AI Agent Tools for n8n You MUST Have 0 like May 17, 2025 Share this post Last Updated on May 17, 2025 by Editorial Team Author(s): Kalash Vasaniya Originally published on Towards AI. n8n AI agents are revolutionizing how we automate.Source: From https://n8n.io/ If you’re not a member but want to read this article, see this friend link here. They assist us in creating sophisticated workflows that save us time by automating repetitive, mundane tasks in our working and personal lives. Intelligent agents can decide, interact with other systems, and provide meaningful results with minimal human interaction. In this article, I will be pointing out seven essential AI agent tools that will greatly improve your n8n workflows and enable you to develop quality automation. OpenRouter is a single API gateway offering easy access to multiple AI models from top vendors like Anthropic, Google, Meta, Mistral, and more. As a n8n community node, OpenRouter brings some huge benefits to AI agent workflows. Single API Key: Use many models from different providers without needing to hold separate logins.Low cost: Pay-as-you-go with no contractTransparent Pricing: Clear costs per token for all models available.High Availability: Enterprise-grade infrastructure that can perform failoverStandardized API: One interface to rule them all The n8n community node for OpenRouter allows you to directly integrate multiple AI models into your n8n workflows. You can easily install it from n8n’s Community Nodes menu by entering n8n-nodes-openrouter in the npm package name field. Once installed, you can adjust… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
Agency is The Key to AGI
Author: Adam BEN KHALIFA
Originally published on Towards AI.
Why are agentic workflows essential for achieving AGI
Let me ask you this, what if the path to truly smart and effective AI , the kind we call AGI, isn’t just about building one colossal, all-knowing brain? What if the real breakthrough lies not in making our models only smarter, but in making them also capable of acting, adapting, and evolving?
Well, LLMs continue to amaze us day after day, but the road to AGI demands more than raw intellect. It requires agency.
Getting Our Terms Straight: AGI, Agency, and Agentic Workflows
Before we dive in, let’s define the main concepts here:
AGI — Artificial General Intelligence:
You can see it as an AI model that can perform any intellectual task a human can. This means not just understanding language or generating images, but adapting, learning, reasoning, and acting across entirely new domains.
Agency:
The capacity of an entity to act purposefully in its environment to achieve goals. A rock has no agency; a human planning their day has plenty. For an AI, agency means it’s not just passively responding to prompts but actively pursuing objectives.
Simply put, it’s the capacity to pursue goals autonomously through planning, acting, and adapting.
Agentic Workflows:
If agency is the “what”, agentic workflows are the “how”. These are the dynamic processes and systems an AI uses to exercise its agency. Think beyond a simple input-output model.
Agentic workflows involve:
Autonomous Goal-Setting & Planning: the AI doesn’t just execute a pre-defined plan, it can formulate goals and strategize how to achieve them.
Tool Use & Orchestration: like a skilled craftsperson, it can select, combine, and utilize various “tools”to get the job done.
Memory & Learning: it remembers past actions, learns from successes and failures, and adapts its strategies over time.
Adaptation in Dynamic Environments: the real world is messy, an agentic AI can adjust its plan when encountering unexpected obstacles or new information.
It’s vital to understand the difference here:
An LLM calling a weather API is just tool use.
An agentic workflow is when an LLM, tasked with “analyzing market trends for a new product,” autonomously decides to:1) search recent financial news, 2) query a sales database, 3) use a data analysis tool to spot correlations, 4) ask a specialized forecasting model for projections, and then 5) compile a summary report, re-evaluating its approach at each step.It’s like the difference between a single musician playing one note, and a conductor leading an entire orchestra.
The Limitation of Isolated Intelligence
Let’s consider a human analogy. Imagine a brilliant engineer, a genius in their field. Now, strip away their tools: no computer, no internet for research, no pen and paper for sketching ideas or taking notes, no lab for prototyping, no colleagues to bounce ideas off. Confine them to only their thoughts. How much could they truly achieve? Their raw intellect, however vast, becomes severely handicapped when uncoupled from the ability to interact, experiment, and leverage external resources.
This “intelligence in isolation” scenario illustrates a fundamental truth: intelligence doesn’t operate in a vacuum. It thrives on interaction, tool use, and the ability to execute plans in the world. If we want AGI, we can’t just build a disembodied digital brain, we need to build something that can act.
Agentic Workflows: AI can act like us, and perhaps even better
Humans are masters of adapting their “workflows.” A painter uses different tools and processes than an engineer, who uses different methods than a chef. We intuitively understand context, choose the right approach, and even invent new methods when old ones fail.
Agentic workflows aim to achieve similar capabilities:
Contextual Flexibility: An Agentic AI could switch between “investigative journalist mode”and “creative writer mode”as needed for a complex task.
Learning by Doing: Human learning is an iterative workflow: observe, hypothesize, experiment, analyze, conclude, refine. Agentic systems can embody this, trying approaches, evaluating outcomes, and improving their strategies.
Beyond Monolithic Thought: We don’t store everything in our heads. We use notes, computers, books, and critically, we delegate tasks to others. Agentic AI can similarly leverage external knowledge bases, specialized sub-agents, and computational tools, creating a distributed, more powerful form of intelligence.
Thinking About Thinking: Humans possess meta-cognition — the ability to reflect on our own thought processes and adjust them. Agentic workflows, with their capacity for self-monitoring and re-planning, are a foundational step towards AI developing its own form of meta-cognition.
Inventing New Ways: Perhaps soon enough, an advanced agentic AI won’t just use existing tools and workflows, but identify the need for entirely new ones and even contribute to their creation. A hallmark of true general intelligence.
Not Just Helpful, But Mandatory: Why AGI Needs Agentic Workflows
These capabilities aren’t just fancy add-ons. They are arguably essential for anything we’d recognize as AGI:
Tackling Complexity: Real-world problems are messy, multifaceted, and rarely solved by a single, linear process. Agentic workflows will allow AI to break down these complex challenges into manageable sub-tasks, orchestrating diverse capabilities.
Achieving Scale: Imagine trying to manage global logistics, conduct large-scale scientific research, or personalize education for millions with a single, rigid program. Agentic systems offer the modularity and dynamic coordination needed for such scale.
Adaptability and Robustness: What happens when the data changes, a tool fails, or an assumption proves wrong? A static AI might grind to a halt. An agentic AI can adapt, re-plan, find alternative solutions, and continue pursuing its goal. It can handle the unexpected.
Resourcefulness: Like our engineer, an AGI needs to be able to identify and use the right “tool”for the job at hand, rather than trying to be a jack-of-all-trades with a single block massive model.
Surpassing Human Adaptability
The first step is for AI to achieve a human-like ability to set goals, plan, use tools, and adapt through agentic workflows. But the true promise of AGI lies in surpassing these capabilities:
Speed: Learn and adapt at speeds incomprehensible to us, iterating through problem-solving cycles in milliseconds.
Scale: Manage and orchestrate operations of immense complexity, juggling thousands of variables and “tools” simultaneously.
Novelty: Devise entirely new, perhaps counter-intuitive, workflows and solutions to problems that humans haven’t even conceived of.
Self-Improvement of Workflows: An AGI that doesn’t just use workflows but actively refines, optimizes, and even discovers fundamentally new and more efficient ways to achieve its goals.
Deeper Meta-Learning: Learning how to learn, plan, and strategize more effectively over time, becoming increasingly more intelligent and capable.
Long-Horizon Reasoning: Successfully breaking down and navigating extremely complex, multi-stage goals that unfold over extended periods, adapting robustly along the way.
Obviously, this is easier said than done. Building true AGI presents formidable challenges: How do we design systems that can reliably plan in open-ended environments? How can they discover and integrate new tools seamlessly? How does the system learn which part of a long, complex workflow was responsible for success or failure?
These are active areas of research, pushing the boundaries of what AI can do. Thankfully we are witnessing more and more breakthroughs everyday, and RL — Reinforcement Learning based approaches are showing great promise.
Conclusion: Agency as the Cornerstone of AGI
The quest for AGI is more than a race for larger models or faster processing. It’s a quest for intelligence that is versatile, adaptive, and purposeful. Agentic workflows provide the framework for such intelligence, enabling AI to move beyond mere pattern recognition to become an active participant in the problem-solving process.
Just as human collective general intelligence emerged not merely from neurons, but from networks of thought, culture, and action — we must build AGI not as a single-block model, but as an AI capable of learning, adapting, and acting. Agency, in this light, isn’t just a feature; it’s the fundamental engine that will drive us towards true Artificial General Intelligence.
If you liked this article, make sure to follow for more.And you can find me on:
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
#agency #key #agiAgency is The Key to AGIAuthor: Adam BEN KHALIFA Originally published on Towards AI. Why are agentic workflows essential for achieving AGI Let me ask you this, what if the path to truly smart and effective AI , the kind we call AGI, isn’t just about building one colossal, all-knowing brain? What if the real breakthrough lies not in making our models only smarter, but in making them also capable of acting, adapting, and evolving? Well, LLMs continue to amaze us day after day, but the road to AGI demands more than raw intellect. It requires agency. Getting Our Terms Straight: AGI, Agency, and Agentic Workflows Before we dive in, let’s define the main concepts here: AGI — Artificial General Intelligence: You can see it as an AI model that can perform any intellectual task a human can. This means not just understanding language or generating images, but adapting, learning, reasoning, and acting across entirely new domains. Agency: The capacity of an entity to act purposefully in its environment to achieve goals. A rock has no agency; a human planning their day has plenty. For an AI, agency means it’s not just passively responding to prompts but actively pursuing objectives. Simply put, it’s the capacity to pursue goals autonomously through planning, acting, and adapting. Agentic Workflows: If agency is the “what”, agentic workflows are the “how”. These are the dynamic processes and systems an AI uses to exercise its agency. Think beyond a simple input-output model. Agentic workflows involve: Autonomous Goal-Setting & Planning: the AI doesn’t just execute a pre-defined plan, it can formulate goals and strategize how to achieve them. Tool Use & Orchestration: like a skilled craftsperson, it can select, combine, and utilize various “tools”to get the job done. Memory & Learning: it remembers past actions, learns from successes and failures, and adapts its strategies over time. Adaptation in Dynamic Environments: the real world is messy, an agentic AI can adjust its plan when encountering unexpected obstacles or new information. It’s vital to understand the difference here: An LLM calling a weather API is just tool use. An agentic workflow is when an LLM, tasked with “analyzing market trends for a new product,” autonomously decides to:1) search recent financial news, 2) query a sales database, 3) use a data analysis tool to spot correlations, 4) ask a specialized forecasting model for projections, and then 5) compile a summary report, re-evaluating its approach at each step.It’s like the difference between a single musician playing one note, and a conductor leading an entire orchestra. The Limitation of Isolated Intelligence Let’s consider a human analogy. Imagine a brilliant engineer, a genius in their field. Now, strip away their tools: no computer, no internet for research, no pen and paper for sketching ideas or taking notes, no lab for prototyping, no colleagues to bounce ideas off. Confine them to only their thoughts. How much could they truly achieve? Their raw intellect, however vast, becomes severely handicapped when uncoupled from the ability to interact, experiment, and leverage external resources. This “intelligence in isolation” scenario illustrates a fundamental truth: intelligence doesn’t operate in a vacuum. It thrives on interaction, tool use, and the ability to execute plans in the world. If we want AGI, we can’t just build a disembodied digital brain, we need to build something that can act. Agentic Workflows: AI can act like us, and perhaps even better Humans are masters of adapting their “workflows.” A painter uses different tools and processes than an engineer, who uses different methods than a chef. We intuitively understand context, choose the right approach, and even invent new methods when old ones fail. Agentic workflows aim to achieve similar capabilities: Contextual Flexibility: An Agentic AI could switch between “investigative journalist mode”and “creative writer mode”as needed for a complex task. Learning by Doing: Human learning is an iterative workflow: observe, hypothesize, experiment, analyze, conclude, refine. Agentic systems can embody this, trying approaches, evaluating outcomes, and improving their strategies. Beyond Monolithic Thought: We don’t store everything in our heads. We use notes, computers, books, and critically, we delegate tasks to others. Agentic AI can similarly leverage external knowledge bases, specialized sub-agents, and computational tools, creating a distributed, more powerful form of intelligence. Thinking About Thinking: Humans possess meta-cognition — the ability to reflect on our own thought processes and adjust them. Agentic workflows, with their capacity for self-monitoring and re-planning, are a foundational step towards AI developing its own form of meta-cognition. Inventing New Ways: Perhaps soon enough, an advanced agentic AI won’t just use existing tools and workflows, but identify the need for entirely new ones and even contribute to their creation. A hallmark of true general intelligence. Not Just Helpful, But Mandatory: Why AGI Needs Agentic Workflows These capabilities aren’t just fancy add-ons. They are arguably essential for anything we’d recognize as AGI: Tackling Complexity: Real-world problems are messy, multifaceted, and rarely solved by a single, linear process. Agentic workflows will allow AI to break down these complex challenges into manageable sub-tasks, orchestrating diverse capabilities. Achieving Scale: Imagine trying to manage global logistics, conduct large-scale scientific research, or personalize education for millions with a single, rigid program. Agentic systems offer the modularity and dynamic coordination needed for such scale. Adaptability and Robustness: What happens when the data changes, a tool fails, or an assumption proves wrong? A static AI might grind to a halt. An agentic AI can adapt, re-plan, find alternative solutions, and continue pursuing its goal. It can handle the unexpected. Resourcefulness: Like our engineer, an AGI needs to be able to identify and use the right “tool”for the job at hand, rather than trying to be a jack-of-all-trades with a single block massive model. Surpassing Human Adaptability The first step is for AI to achieve a human-like ability to set goals, plan, use tools, and adapt through agentic workflows. But the true promise of AGI lies in surpassing these capabilities: Speed: Learn and adapt at speeds incomprehensible to us, iterating through problem-solving cycles in milliseconds. Scale: Manage and orchestrate operations of immense complexity, juggling thousands of variables and “tools” simultaneously. Novelty: Devise entirely new, perhaps counter-intuitive, workflows and solutions to problems that humans haven’t even conceived of. Self-Improvement of Workflows: An AGI that doesn’t just use workflows but actively refines, optimizes, and even discovers fundamentally new and more efficient ways to achieve its goals. Deeper Meta-Learning: Learning how to learn, plan, and strategize more effectively over time, becoming increasingly more intelligent and capable. Long-Horizon Reasoning: Successfully breaking down and navigating extremely complex, multi-stage goals that unfold over extended periods, adapting robustly along the way. Obviously, this is easier said than done. Building true AGI presents formidable challenges: How do we design systems that can reliably plan in open-ended environments? How can they discover and integrate new tools seamlessly? How does the system learn which part of a long, complex workflow was responsible for success or failure? These are active areas of research, pushing the boundaries of what AI can do. Thankfully we are witnessing more and more breakthroughs everyday, and RL — Reinforcement Learning based approaches are showing great promise. Conclusion: Agency as the Cornerstone of AGI The quest for AGI is more than a race for larger models or faster processing. It’s a quest for intelligence that is versatile, adaptive, and purposeful. Agentic workflows provide the framework for such intelligence, enabling AI to move beyond mere pattern recognition to become an active participant in the problem-solving process. Just as human collective general intelligence emerged not merely from neurons, but from networks of thought, culture, and action — we must build AGI not as a single-block model, but as an AI capable of learning, adapting, and acting. Agency, in this light, isn’t just a feature; it’s the fundamental engine that will drive us towards true Artificial General Intelligence. If you liked this article, make sure to follow for more.And you can find me on: Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI #agency #key #agiTOWARDSAI.NETAgency is The Key to AGIAuthor(s): Adam BEN KHALIFA Originally published on Towards AI. Why are agentic workflows essential for achieving AGI Let me ask you this, what if the path to truly smart and effective AI , the kind we call AGI, isn’t just about building one colossal, all-knowing brain? What if the real breakthrough lies not in making our models only smarter, but in making them also capable of acting, adapting, and evolving? Well, LLMs continue to amaze us day after day, but the road to AGI demands more than raw intellect. It requires agency. Getting Our Terms Straight: AGI, Agency, and Agentic Workflows Before we dive in, let’s define the main concepts here: AGI — Artificial General Intelligence: You can see it as an AI model that can perform any intellectual task a human can. This means not just understanding language or generating images, but adapting, learning, reasoning, and acting across entirely new domains. Agency: The capacity of an entity to act purposefully in its environment to achieve goals. A rock has no agency; a human planning their day has plenty. For an AI, agency means it’s not just passively responding to prompts but actively pursuing objectives. Simply put, it’s the capacity to pursue goals autonomously through planning, acting, and adapting. Agentic Workflows: If agency is the “what”, agentic workflows are the “how”. These are the dynamic processes and systems an AI uses to exercise its agency. Think beyond a simple input-output model. Agentic workflows involve: Autonomous Goal-Setting & Planning: the AI doesn’t just execute a pre-defined plan, it can formulate goals and strategize how to achieve them. Tool Use & Orchestration: like a skilled craftsperson, it can select, combine, and utilize various “tools” (other AI models, databases, APIs, code execution environments) to get the job done. Memory & Learning: it remembers past actions, learns from successes and failures, and adapts its strategies over time. Adaptation in Dynamic Environments: the real world is messy, an agentic AI can adjust its plan when encountering unexpected obstacles or new information. It’s vital to understand the difference here: An LLM calling a weather API is just tool use. An agentic workflow is when an LLM, tasked with “analyzing market trends for a new product,” autonomously decides to:1) search recent financial news, 2) query a sales database, 3) use a data analysis tool to spot correlations, 4) ask a specialized forecasting model for projections, and then 5) compile a summary report, re-evaluating its approach at each step.It’s like the difference between a single musician playing one note, and a conductor leading an entire orchestra. The Limitation of Isolated Intelligence Let’s consider a human analogy. Imagine a brilliant engineer, a genius in their field. Now, strip away their tools: no computer, no internet for research, no pen and paper for sketching ideas or taking notes, no lab for prototyping, no colleagues to bounce ideas off. Confine them to only their thoughts. How much could they truly achieve? Their raw intellect, however vast, becomes severely handicapped when uncoupled from the ability to interact, experiment, and leverage external resources. This “intelligence in isolation” scenario illustrates a fundamental truth: intelligence doesn’t operate in a vacuum. It thrives on interaction, tool use, and the ability to execute plans in the world. If we want AGI, we can’t just build a disembodied digital brain, we need to build something that can act. Agentic Workflows: AI can act like us, and perhaps even better Humans are masters of adapting their “workflows.” A painter uses different tools and processes than an engineer, who uses different methods than a chef. We intuitively understand context, choose the right approach, and even invent new methods when old ones fail. Agentic workflows aim to achieve similar capabilities: Contextual Flexibility: An Agentic AI could switch between “investigative journalist mode” (querying databases, cross-referencing sources, interviewing) and “creative writer mode” (generating narratives, exploring styles) as needed for a complex task. Learning by Doing (and Re-doing): Human learning is an iterative workflow: observe, hypothesize, experiment, analyze, conclude, refine. Agentic systems can embody this, trying approaches, evaluating outcomes, and improving their strategies. Beyond Monolithic Thought: We don’t store everything in our heads. We use notes, computers, books, and critically, we delegate tasks to others. Agentic AI can similarly leverage external knowledge bases, specialized sub-agents, and computational tools, creating a distributed, more powerful form of intelligence. Thinking About Thinking: Humans possess meta-cognition — the ability to reflect on our own thought processes and adjust them. Agentic workflows, with their capacity for self-monitoring and re-planning, are a foundational step towards AI developing its own form of meta-cognition. Inventing New Ways: Perhaps soon enough, an advanced agentic AI won’t just use existing tools and workflows, but identify the need for entirely new ones and even contribute to their creation. A hallmark of true general intelligence. Not Just Helpful, But Mandatory: Why AGI Needs Agentic Workflows These capabilities aren’t just fancy add-ons. They are arguably essential for anything we’d recognize as AGI: Tackling Complexity: Real-world problems are messy, multifaceted, and rarely solved by a single, linear process. Agentic workflows will allow AI to break down these complex challenges into manageable sub-tasks, orchestrating diverse capabilities. Achieving Scale: Imagine trying to manage global logistics, conduct large-scale scientific research, or personalize education for millions with a single, rigid program. Agentic systems offer the modularity and dynamic coordination needed for such scale. Adaptability and Robustness: What happens when the data changes, a tool fails, or an assumption proves wrong? A static AI might grind to a halt. An agentic AI can adapt, re-plan, find alternative solutions, and continue pursuing its goal. It can handle the unexpected. Resourcefulness: Like our engineer, an AGI needs to be able to identify and use the right “tool” (be it a specific algorithm, dataset, or external service) for the job at hand, rather than trying to be a jack-of-all-trades with a single block massive model. Surpassing Human Adaptability The first step is for AI to achieve a human-like ability to set goals, plan, use tools, and adapt through agentic workflows. But the true promise of AGI lies in surpassing these capabilities: Speed: Learn and adapt at speeds incomprehensible to us, iterating through problem-solving cycles in milliseconds. Scale: Manage and orchestrate operations of immense complexity, juggling thousands of variables and “tools” simultaneously. Novelty: Devise entirely new, perhaps counter-intuitive, workflows and solutions to problems that humans haven’t even conceived of. Self-Improvement of Workflows: An AGI that doesn’t just use workflows but actively refines, optimizes, and even discovers fundamentally new and more efficient ways to achieve its goals. Deeper Meta-Learning: Learning how to learn, plan, and strategize more effectively over time, becoming increasingly more intelligent and capable. Long-Horizon Reasoning: Successfully breaking down and navigating extremely complex, multi-stage goals that unfold over extended periods, adapting robustly along the way. Obviously, this is easier said than done. Building true AGI presents formidable challenges: How do we design systems that can reliably plan in open-ended environments? How can they discover and integrate new tools seamlessly? How does the system learn which part of a long, complex workflow was responsible for success or failure? These are active areas of research, pushing the boundaries of what AI can do. Thankfully we are witnessing more and more breakthroughs everyday, and RL — Reinforcement Learning based approaches are showing great promise (which make sense actually, but that’s for another article). Conclusion: Agency as the Cornerstone of AGI The quest for AGI is more than a race for larger models or faster processing. It’s a quest for intelligence that is versatile, adaptive, and purposeful. Agentic workflows provide the framework for such intelligence, enabling AI to move beyond mere pattern recognition to become an active participant in the problem-solving process. Just as human collective general intelligence emerged not merely from neurons, but from networks of thought, culture, and action — we must build AGI not as a single-block model, but as an AI capable of learning, adapting, and acting. Agency, in this light, isn’t just a feature; it’s the fundamental engine that will drive us towards true Artificial General Intelligence. If you liked this article, make sure to follow for more.And you can find me on: Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI0 التعليقات 0 المشاركات -
You Have NO Excuse to Not Be an LLM Developer Today
Author: Towards AI Editorial Team
Originally published on Towards AI.
Ever since we released our first book on LLMs — and now our most comprehensive course, From Beginner to Advanced LLM Developer — we’ve heard the same questions from devs sitting on the fence:
“Isn’t most of this stuff already free online?”
Sure — if you have time to stitch together YouTube videos, half-written blog posts, outdated Colabs, and guess your way through broken workflows. But this course is the equivalent of hiring an expert AI mentor for /hour — someone who’s built production-grade LLM systems and walks you through the exact pipeline, step by step.
“But things are moving so fast… won’t it be outdated?”
That’s why we built it: to move with you. The course is updated every single week to reflect the latest models, tools, and practices. What you’re learning isn’t a static curriculum — it’s a living roadmap with lifetime access, so you can adapt as the field evolves.
And because staying current isn’t enough, you also need confidence that what you ship today still holds tomorrow.
That’s why we’re now running monthly live cohorts — so you stay sharp, supported, and up to date.
The next cohort kicks off June 1 with a live welcome call with our CEO.
Join the course here
“Will I actually build something real?”
You won’t just learn. You’ll build and ship.
A real LLM product. A full-stack application with prompting, RAG, fine-tuning, evaluation, and deployment — wrapped in a Gradio or Streamlit front end. You’ll walk away with something you can show off to a CTO, use in a job interview, or demo to your team.
“Will it actually move the needle on my career?”
Don’t take our word for it — here’s what past students are saying:
“Best course out there to become an AI engineer. Planning to build my own startup based on the learnings.” — Abhijit L
“Expanded my knowledge of RAG pipelines and gave me real-world tools.” — Eoin McGrath
“From zero to hero as an LLM Developer… a clear path to build LLM applications that can change your career.” — Luca Tanieli
This is the course we wish we had when we started.
It’s not fluff. It’s not slides. It’s a system built on two years of working with real-world LLM deployments across companies, research teams, and startups.
You’ll walk away with:
A repeatable pipeline A mindset for thinking like an AI engineer, not just a prompt tinkerer
Weekly updates to stay ahead of the curve
Access to our private Slack + 70,000+ builder community on Discord
A portfolio-ready project that proves what you can do
The next live cohort starts June 1 with a welcome call from our CEO.
We’re capping enrollments again this month to keep the experience hands-on and high-touch, and last month’s seats filled up faster than expected.
When you join now, you don’t need to wait — you get immediate access to the full course and can start building your AI product this week.
If you want to be ahead of this next wave of AI instead of trying to catch up…
This is your window.
PEOPLE LIKE YOU ARE:
Breaking into in-demand roles like LLM Developer
Advancing careers and increasing earnings
Monetizing AI by building high-selling LLM products
Leading AI and ML engineering teams
Using AI to work smarter, not harder, and achieving more
If you’re thinking, “This sounds great, but what if it’s not for me?” — we get it. That’s why the course comes with a 30-day, no-questions-asked money-back guarantee. Try it. Dive into the material. If it doesn’t meet your expectations, we’ll refund you in full.
Secure your spot for the June 1st cohort
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
#you #have #excuse #not #llmYou Have NO Excuse to Not Be an LLM Developer TodayAuthor: Towards AI Editorial Team Originally published on Towards AI. Ever since we released our first book on LLMs — and now our most comprehensive course, From Beginner to Advanced LLM Developer — we’ve heard the same questions from devs sitting on the fence: “Isn’t most of this stuff already free online?” Sure — if you have time to stitch together YouTube videos, half-written blog posts, outdated Colabs, and guess your way through broken workflows. But this course is the equivalent of hiring an expert AI mentor for /hour — someone who’s built production-grade LLM systems and walks you through the exact pipeline, step by step. “But things are moving so fast… won’t it be outdated?” That’s why we built it: to move with you. The course is updated every single week to reflect the latest models, tools, and practices. What you’re learning isn’t a static curriculum — it’s a living roadmap with lifetime access, so you can adapt as the field evolves. And because staying current isn’t enough, you also need confidence that what you ship today still holds tomorrow. That’s why we’re now running monthly live cohorts — so you stay sharp, supported, and up to date. The next cohort kicks off June 1 with a live welcome call with our CEO. 👉 Join the course here “Will I actually build something real?” You won’t just learn. You’ll build and ship. A real LLM product. A full-stack application with prompting, RAG, fine-tuning, evaluation, and deployment — wrapped in a Gradio or Streamlit front end. You’ll walk away with something you can show off to a CTO, use in a job interview, or demo to your team. “Will it actually move the needle on my career?” Don’t take our word for it — here’s what past students are saying: “Best course out there to become an AI engineer. Planning to build my own startup based on the learnings.” — Abhijit L “Expanded my knowledge of RAG pipelines and gave me real-world tools.” — Eoin McGrath “From zero to hero as an LLM Developer… a clear path to build LLM applications that can change your career.” — Luca Tanieli This is the course we wish we had when we started. It’s not fluff. It’s not slides. It’s a system built on two years of working with real-world LLM deployments across companies, research teams, and startups. You’ll walk away with: ✅ A repeatable pipeline✅ A mindset for thinking like an AI engineer, not just a prompt tinkerer ✅ Weekly updates to stay ahead of the curve ✅ Access to our private Slack + 70,000+ builder community on Discord ✅ A portfolio-ready project that proves what you can do The next live cohort starts June 1 with a welcome call from our CEO. We’re capping enrollments again this month to keep the experience hands-on and high-touch, and last month’s seats filled up faster than expected. When you join now, you don’t need to wait — you get immediate access to the full course and can start building your AI product this week. If you want to be ahead of this next wave of AI instead of trying to catch up… This is your window. PEOPLE LIKE YOU ARE: Breaking into in-demand roles like LLM Developer Advancing careers and increasing earnings Monetizing AI by building high-selling LLM products Leading AI and ML engineering teams Using AI to work smarter, not harder, and achieving more If you’re thinking, “This sounds great, but what if it’s not for me?” — we get it. That’s why the course comes with a 30-day, no-questions-asked money-back guarantee. Try it. Dive into the material. If it doesn’t meet your expectations, we’ll refund you in full. 👉 Secure your spot for the June 1st cohort Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI #you #have #excuse #not #llmTOWARDSAI.NETYou Have NO Excuse to Not Be an LLM Developer TodayAuthor(s): Towards AI Editorial Team Originally published on Towards AI. Ever since we released our first book on LLMs — and now our most comprehensive course, From Beginner to Advanced LLM Developer — we’ve heard the same questions from devs sitting on the fence: “Isn’t most of this stuff already free online?” Sure — if you have time to stitch together YouTube videos, half-written blog posts, outdated Colabs, and guess your way through broken workflows. But this course is the equivalent of hiring an expert AI mentor for $4/hour — someone who’s built production-grade LLM systems and walks you through the exact pipeline, step by step. “But things are moving so fast… won’t it be outdated?” That’s why we built it: to move with you. The course is updated every single week to reflect the latest models, tools, and practices. What you’re learning isn’t a static curriculum — it’s a living roadmap with lifetime access, so you can adapt as the field evolves. And because staying current isn’t enough, you also need confidence that what you ship today still holds tomorrow. That’s why we’re now running monthly live cohorts — so you stay sharp, supported, and up to date. The next cohort kicks off June 1 with a live welcome call with our CEO. 👉 Join the course here “Will I actually build something real?” You won’t just learn. You’ll build and ship. A real LLM product. A full-stack application with prompting, RAG, fine-tuning, evaluation, and deployment — wrapped in a Gradio or Streamlit front end. You’ll walk away with something you can show off to a CTO, use in a job interview, or demo to your team. “Will it actually move the needle on my career?” Don’t take our word for it — here’s what past students are saying: “Best course out there to become an AI engineer. Planning to build my own startup based on the learnings.” — Abhijit L “Expanded my knowledge of RAG pipelines and gave me real-world tools.” — Eoin McGrath “From zero to hero as an LLM Developer… a clear path to build LLM applications that can change your career.” — Luca Tanieli This is the course we wish we had when we started. It’s not fluff. It’s not slides. It’s a system built on two years of working with real-world LLM deployments across companies, research teams, and startups. You’ll walk away with: ✅ A repeatable pipeline (prompting → RAG → fine-tuning → evaluation → deployment) ✅ A mindset for thinking like an AI engineer, not just a prompt tinkerer ✅ Weekly updates to stay ahead of the curve ✅ Access to our private Slack + 70,000+ builder community on Discord ✅ A portfolio-ready project that proves what you can do The next live cohort starts June 1 with a welcome call from our CEO. We’re capping enrollments again this month to keep the experience hands-on and high-touch, and last month’s seats filled up faster than expected. When you join now, you don’t need to wait — you get immediate access to the full course and can start building your AI product this week. If you want to be ahead of this next wave of AI instead of trying to catch up… This is your window. PEOPLE LIKE YOU ARE: Breaking into in-demand roles like LLM Developer Advancing careers and increasing earnings Monetizing AI by building high-selling LLM products Leading AI and ML engineering teams Using AI to work smarter, not harder, and achieving more If you’re thinking, “This sounds great, but what if it’s not for me?” — we get it. That’s why the course comes with a 30-day, no-questions-asked money-back guarantee. Try it. Dive into the material. If it doesn’t meet your expectations, we’ll refund you in full. 👉 Secure your spot for the June 1st cohort Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI0 التعليقات 0 المشاركات -
Data Storytelling with Altair and pynarrative: Turning Data into Insight
Author: S Aishwarya
Originally published on Towards AI.
Strong data storytelling goes beyond simply visualizing numbers it uncovers the meaning behind the patterns, bringing clarity to what would otherwise be just a spreadsheet of values.
Photo by Carlos Muza on Unsplash
While visualization libraries like matplotlib, Plotly, and Seaborn can produce beautiful charts, they often lack one crucial feature: narrative. They leave it up to the viewer to interpret the story behind the lines and bars.
That’s where Altair and the pynarrative library shine. Together, they help us not only visualize data — but actually explain it.
What is Altair?
Altair is a Python library for declarative data visualization that allows users to create clean, concise, and interactive charts based on the Vega-Lite grammar of graphics.
You only need to provide:
your datachart typeencodingoptional interactivity, filtering, and tooltips
Altair then renders the visualization using a JSON specification — ready for use in dashboards, notebooks, web applications, or reports.
The Altair library directly integrates with pandasand Vega-Lite, making it easy for Python users to create powerful data stories without writing complex plotting code.
What is pynarrative?
pynarrative is a Python library designed to automatically craft clear, insightful narrative summaries from pandas DataFrames and Altair charts.
With just a few inputs:
A datasetA visualizationAxis labels, optional context, and your intended message
pynarrative generates a well-structured textual explanation — ideal for embedding in dashboards, reports, presentations, or interactive data stories.
Built to work seamlessly with pandasand Altair, pynarrative helps bridge the gap between raw data and human-readable insights — turning visualizations into compelling narratives with minimal effort.
Data Description:
We’re using the cars dataset, which contains information about different car models. The main features we’ll focus on are:
Horsepower: The power of the car’s engine.
Miles_per_Gallon: How fuel-efficient the car is.
Origin: Where the car was made.
Name: The model name of the car.
These features help us explore the relationship between a car’s power and fuel efficiency, and how that varies by origin.
Data Cleaning & Preparation
We’ll begin by automatically loading the dataset using Seaborn, then clean it for our visualizations.
import pandas as pdimport seaborn as snsimport altair as altimport pynarrative as pndf = sns.load_datasetprint)
Output:
Image by Author
Cleaning Steps:
Convert horsepower to numeric to handle any potential issues.
Drop rows with missing values in critical fields.
df= pd.to_numericdf_clean = df.dropnaprint)
Output:
Image by Author
Story 1: Power vs. Fuel Efficiency
Let’s explore the relationship between a car’s engine powerand its fuel efficiency.
By color-coding the data points based on the car’s region of origin, we gain insight into how different countries approach automotive design.
chart = pn.Story.mark_circle.encode.add_title.add_context.renderchart
Output:
Image by Author
This visualization reveals that American cars tend to have higher horsepower but lower fuel economy, whereas Japanese and European cars show more balance.
Story 2: Regional Efficiency Trends Over Time
Let’s observe how fuel efficiencyhas changed over time across different regions.
# Estimate year from model_year columndf_clean= df_clean+ 1900# Compute average MPG by region and yearregional_avg = df_clean.groupby.mean.reset_indexchart = pn.Story.mark_line.encode), y=alt.Y, color='origin:N').add_title.add_context.renderchart
Output:
Image by Author
We see how regulatory changes and fuel crises influenced fuel efficiency, especially in the U.S.
Story 3: Impact of the 1973 Oil Crisis
Let’s annotate our chart with the 1973 Oil Crisis, a pivotal moment for car design.
chart = pn.Story.mark_line.encode.add_title.add_context.add_annotation.renderchart
Output:
Image by Author
This annotated visualization adds historical context, showing how global events shape industry trends.
In Summary…
Using pynarrative and Altair, we seamlessly transformed car performance data into engaging visual stories by:
Highlighting the inverse relationship between horsepower and fuel efficiency
Exploring how regional design philosophies shape fuel economy over time
Annotating major historical events like the 1973 Oil Crisis to show their industry impact
All of this was done using a single, intuitive interface combining pandas, Altair, and pynarrative. Once this storytelling pipeline is in place, it can be adapted to any dataset rendered through Altair from automotive to healthcare and beyond.
This approach is quicker, more scalable, and more intuitive than conventional manual charting methods. Whether you’re building technical reports, dynamic dashboards, or insight-driven narratives, this serves as a reliable foundation for effective data storytelling.
I would love to read your comments!
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
#data #storytelling #with #altair #pynarrativeData Storytelling with Altair and pynarrative: Turning Data into InsightAuthor: S Aishwarya Originally published on Towards AI. Strong data storytelling goes beyond simply visualizing numbers it uncovers the meaning behind the patterns, bringing clarity to what would otherwise be just a spreadsheet of values. Photo by Carlos Muza on Unsplash While visualization libraries like matplotlib, Plotly, and Seaborn can produce beautiful charts, they often lack one crucial feature: narrative. They leave it up to the viewer to interpret the story behind the lines and bars. That’s where Altair and the pynarrative library shine. Together, they help us not only visualize data — but actually explain it. 🔍What is Altair? Altair is a Python library for declarative data visualization that allows users to create clean, concise, and interactive charts based on the Vega-Lite grammar of graphics. You only need to provide: your datachart typeencodingoptional interactivity, filtering, and tooltips Altair then renders the visualization using a JSON specification — ready for use in dashboards, notebooks, web applications, or reports. The Altair library directly integrates with pandasand Vega-Lite, making it easy for Python users to create powerful data stories without writing complex plotting code. 🔍 What is pynarrative? pynarrative is a Python library designed to automatically craft clear, insightful narrative summaries from pandas DataFrames and Altair charts. With just a few inputs: A datasetA visualizationAxis labels, optional context, and your intended message pynarrative generates a well-structured textual explanation — ideal for embedding in dashboards, reports, presentations, or interactive data stories. Built to work seamlessly with pandasand Altair, pynarrative helps bridge the gap between raw data and human-readable insights — turning visualizations into compelling narratives with minimal effort. Data Description: We’re using the cars dataset, which contains information about different car models. The main features we’ll focus on are: Horsepower: The power of the car’s engine. Miles_per_Gallon: How fuel-efficient the car is. Origin: Where the car was made. Name: The model name of the car. These features help us explore the relationship between a car’s power and fuel efficiency, and how that varies by origin. Data Cleaning & Preparation We’ll begin by automatically loading the dataset using Seaborn, then clean it for our visualizations. import pandas as pdimport seaborn as snsimport altair as altimport pynarrative as pndf = sns.load_datasetprint) Output: Image by Author Cleaning Steps: Convert horsepower to numeric to handle any potential issues. Drop rows with missing values in critical fields. df= pd.to_numericdf_clean = df.dropnaprint) Output: Image by Author Story 1: Power vs. Fuel Efficiency Let’s explore the relationship between a car’s engine powerand its fuel efficiency. By color-coding the data points based on the car’s region of origin, we gain insight into how different countries approach automotive design. chart = pn.Story.mark_circle.encode.add_title.add_context.renderchart Output: Image by Author This visualization reveals that American cars tend to have higher horsepower but lower fuel economy, whereas Japanese and European cars show more balance. Story 2: Regional Efficiency Trends Over Time Let’s observe how fuel efficiencyhas changed over time across different regions. # Estimate year from model_year columndf_clean= df_clean+ 1900# Compute average MPG by region and yearregional_avg = df_clean.groupby.mean.reset_indexchart = pn.Story.mark_line.encode), y=alt.Y, color='origin:N').add_title.add_context.renderchart Output: Image by Author We see how regulatory changes and fuel crises influenced fuel efficiency, especially in the U.S. Story 3: Impact of the 1973 Oil Crisis Let’s annotate our chart with the 1973 Oil Crisis, a pivotal moment for car design. chart = pn.Story.mark_line.encode.add_title.add_context.add_annotation.renderchart Output: Image by Author This annotated visualization adds historical context, showing how global events shape industry trends. In Summary… Using pynarrative and Altair, we seamlessly transformed car performance data into engaging visual stories by: Highlighting the inverse relationship between horsepower and fuel efficiency Exploring how regional design philosophies shape fuel economy over time Annotating major historical events like the 1973 Oil Crisis to show their industry impact All of this was done using a single, intuitive interface combining pandas, Altair, and pynarrative. Once this storytelling pipeline is in place, it can be adapted to any dataset rendered through Altair from automotive to healthcare and beyond. This approach is quicker, more scalable, and more intuitive than conventional manual charting methods. Whether you’re building technical reports, dynamic dashboards, or insight-driven narratives, this serves as a reliable foundation for effective data storytelling. I would love to read your comments! Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI #data #storytelling #with #altair #pynarrativeTOWARDSAI.NETData Storytelling with Altair and pynarrative: Turning Data into InsightAuthor(s): S Aishwarya Originally published on Towards AI. Strong data storytelling goes beyond simply visualizing numbers it uncovers the meaning behind the patterns, bringing clarity to what would otherwise be just a spreadsheet of values. Photo by Carlos Muza on Unsplash While visualization libraries like matplotlib, Plotly, and Seaborn can produce beautiful charts, they often lack one crucial feature: narrative. They leave it up to the viewer to interpret the story behind the lines and bars. That’s where Altair and the pynarrative library shine. Together, they help us not only visualize data — but actually explain it. 🔍What is Altair? Altair is a Python library for declarative data visualization that allows users to create clean, concise, and interactive charts based on the Vega-Lite grammar of graphics. You only need to provide: your data (typically a pandas DataFrame or Vega datasets) chart type (e.g., bar, line, scatter) encoding (x/y axes, color, size, etc.) optional interactivity, filtering, and tooltips Altair then renders the visualization using a JSON specification — ready for use in dashboards, notebooks, web applications, or reports. The Altair library directly integrates with pandas (for data handling) and Vega-Lite (for rendering and interactivity), making it easy for Python users to create powerful data stories without writing complex plotting code. 🔍 What is pynarrative? pynarrative is a Python library designed to automatically craft clear, insightful narrative summaries from pandas DataFrames and Altair charts. With just a few inputs: A dataset (often a time series or structured data in a DataFrame) A visualization (created using Altair) Axis labels, optional context, and your intended message pynarrative generates a well-structured textual explanation — ideal for embedding in dashboards, reports, presentations, or interactive data stories. Built to work seamlessly with pandas (for data handling) and Altair (for visual rendering), pynarrative helps bridge the gap between raw data and human-readable insights — turning visualizations into compelling narratives with minimal effort. Data Description: We’re using the cars dataset, which contains information about different car models. The main features we’ll focus on are: Horsepower: The power of the car’s engine. Miles_per_Gallon (MPG): How fuel-efficient the car is. Origin: Where the car was made (USA, Europe, or Japan). Name: The model name of the car. These features help us explore the relationship between a car’s power and fuel efficiency, and how that varies by origin. Data Cleaning & Preparation We’ll begin by automatically loading the dataset using Seaborn, then clean it for our visualizations. import pandas as pdimport seaborn as snsimport altair as altimport pynarrative as pndf = sns.load_dataset('mpg')print(df[['name', 'mpg', 'horsepower', 'origin']].head()) Output: Image by Author Cleaning Steps: Convert horsepower to numeric to handle any potential issues. Drop rows with missing values in critical fields. df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')df_clean = df.dropna(subset=['horsepower', 'mpg', 'origin'])print(df_clean[['name', 'mpg', 'horsepower', 'origin']].head()) Output: Image by Author Story 1: Power vs. Fuel Efficiency Let’s explore the relationship between a car’s engine power (horsepower) and its fuel efficiency (miles per gallon). By color-coding the data points based on the car’s region of origin, we gain insight into how different countries approach automotive design. chart = pn.Story(df_clean).mark_circle(size=60).encode( x='horsepower:Q', y='mpg:Q', color='origin:N', tooltip=['name', 'horsepower', 'mpg', 'origin']).add_title( "Horsepower vs MPG by Origin", "Higher horsepower often leads to lower fuel efficiency", title_color="#1a1a1a", subtitle_color="#4a4a4a").add_context( text=[ "Cars with more horsepower generally consume more fuel.", "Japanese and European models show a clear emphasis on fuel efficiency.", "This trend reveals differing consumer needs and manufacturer strategies." ], position="bottom", dx=0, color="black").render()chart Output: Image by Author This visualization reveals that American cars tend to have higher horsepower but lower fuel economy, whereas Japanese and European cars show more balance. Story 2: Regional Efficiency Trends Over Time Let’s observe how fuel efficiency (MPG) has changed over time across different regions. # Estimate year from model_year columndf_clean['year'] = df_clean['model_year'] + 1900# Compute average MPG by region and yearregional_avg = df_clean.groupby(['year', 'origin'])['mpg'].mean().reset_index()chart = pn.Story(regional_avg).mark_line(point=True).encode( x=alt.X('year:O', axis=alt.Axis(title='Year')), y=alt.Y('mpg:Q', title='Average MPG'), color='origin:N').add_title( "Average Fuel Efficiency by Region Over Time", "Trends in MPG from 1970s to 1980s", title_color="#1a1a1a", subtitle_color="#4a4a4a").add_context( text=[ "Japanese cars consistently lead in fuel efficiency.", "U.S. manufacturers ramped up efficiency post-1975.", "European models maintain a steady middle ground." ], position="bottom", dx=0, color="black").render()chart Output: Image by Author We see how regulatory changes and fuel crises influenced fuel efficiency, especially in the U.S. Story 3: Impact of the 1973 Oil Crisis Let’s annotate our chart with the 1973 Oil Crisis, a pivotal moment for car design. chart = pn.Story(regional_avg).mark_line().encode( x='year:O', y='mpg:Q', color='origin:N').add_title( "Impact of the 1973 Oil Crisis on MPG", "Shift in design philosophy after fuel shortages", title_color="#1a1a1a", subtitle_color="#4a4a4a").add_context( text=[ "The 1973 Oil Crisis increased focus on fuel efficiency worldwide.", "U.S. automakers shifted designs to improve MPG post-crisis.", "Japanese models were already MPG leaders at the time." ], position="bottom", dx=0, color="black").add_annotation( 1973, 15.5, "1973 Oil Crisis", arrow_direction='up', arrow_dx=0, arrow_dy=-1, arrow_color='red', arrow_size=50, label_color='black', label_size=14, show_point=True).render()chart Output: Image by Author This annotated visualization adds historical context, showing how global events shape industry trends. In Summary… Using pynarrative and Altair, we seamlessly transformed car performance data into engaging visual stories by: Highlighting the inverse relationship between horsepower and fuel efficiency Exploring how regional design philosophies shape fuel economy over time Annotating major historical events like the 1973 Oil Crisis to show their industry impact All of this was done using a single, intuitive interface combining pandas, Altair, and pynarrative. Once this storytelling pipeline is in place, it can be adapted to any dataset rendered through Altair from automotive to healthcare and beyond. This approach is quicker, more scalable, and more intuitive than conventional manual charting methods. Whether you’re building technical reports, dynamic dashboards, or insight-driven narratives, this serves as a reliable foundation for effective data storytelling. I would love to read your comments! Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI0 التعليقات 0 المشاركات -
Regression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004
Regression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004
0 like
May 15, 2025
Share this post
Author: Mehul Ligade
Originally published on Towards AI.
Regression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004
If you’re learning Machine Learning and think supervised learning is straightforward, think again.
The moment you start building your first model, you face a decision that most tutorials barely explain: should this be a regression problem or a classification problem?
The difference might seem obvious — until you mess up a project by predicting categories with a regression model or trying to force numeric output into classification buckets.In this article, I will break it all down from the ground up. Not just the textbook definitions, but the thinking process behind choosing the right type of model. You will learn what these terms really mean, how to spot the difference in the wild, and how I personally approach this choice in real-world projects.And as always, no recycled fluff; only experience, insight, and lessons that actually stick.
Now let’s dive in.
Contents
Why This Article Exists
The Real Question Behind Regression and Classification
What Regression Actually Means in ML
A Real-Life Example of Regression
What Classification Means and How It Works
A Real-Life Example of Classification
How to Choose Between ThemEvaluation Metrics You Must Know
What I Learned the Hard Way
Final Thoughts: Don’t Just Choose Models. Understand Problems.
Why This Article Exists
I am writing this because I got it wrong. More than once.
When I first started with supervised learning, I picked models like they were tools in a toolbox. Linear regression for numbers. Logistic regression for yes or no. That was it. End of story.
But then I hit edge cases — datasets that looked like classification but acted like regression. Projects where I used the wrong loss function and got results that were mathematically correct but practically useless. It became clear to me that the distinction between regression and classification is not just about output format. It is about understanding your problem at a deeper level.
So this article is what I wish someone had handed me back then.
—
The Real Question Behind Regression and Classification
Before we define anything, ask yourself this:
What is the nature of the thing I am trying to predict?Am I trying to predict a quantity? Something with measurable distance between values — like price, age, or temperature?Or am I trying to predict a class? A distinct label, category, or group — like cat or dog, spam or not spam, fraud or genuine?That is the fundamental fork in the road.
Regression problems deal with continuous outcomes. You are estimating values on a number line.
Classification problems deal with discrete outcomes. You are assigning input data into one of several predefined buckets.
And every model, loss function, and evaluation metric flows from this initial choice.
—
What Regression Actually Means in ML
Regression is not about graphs or slopes or lines. It is about approximation.
When you use regression, you are asking the model to learn a function that maps input variables to a continuous output — like predicting house price from square footage or predicting someone’s weight based on age and height.
But here’s what matters: there is no “correct” label in a strict sense. There is just closeness. Accuracy in regression is about how far off you are from the actual value. That’s why regression models minimize error — not classification mistakes.
Think about this: if you predict a house price as ₹88,00,000 when it’s actually ₹90,00,000, you are off by 2 lakhs. That’s the loss. That’s what we care about.
You are not trying to get an exact number every time. You are trying to get close and consistently close.
—
A Real-Life Example of Regression
In one of my early projects, I built a system to predict healthcare insurance costs. The dataset included factors like age, BMI, gender, smoking status, and location. The goal was to estimate the cost of a person’s annual premium.
There were no categories. Just numbers — actual premium amounts from previous customers.
This is a textbook regression problem. The output is continuous. The distance between ₹24,000 and ₹26,000 is meaningful. A difference of ₹2,000 is better than a difference of ₹20,000.
My models tried to minimize the error between predicted cost and actual cost. I used RMSEas my main metric. And even though the numbers were not perfect, they got close enough to be valuable for real decision-making.
That is regression. Learning to estimate: not classify.
—
What Classification Means and How It Works
Classification is different. Here, you are predicting categories.
You are not interested in the value of the output — only which group it falls into.This is the kind of learning used in problems like spam detection, loan approval, sentiment analysis, medical diagnosis, and image recognition.
In classification, you are not measuring how close your prediction is — you are measuring whether it is correct or not. There is no halfway.
If you predict that a transaction is “not fraud” and it is actually “fraud,” that is not a 40 percent error — it is a full-blown misclassification. The cost of being wrong can vary, but the format is binary: right or wrong.
Classification models often work by estimating probabilities. For example, a logistic regression model might say, “This email has a 92 percent chance of being spam.” But in the end, it must make a call — spam or not.
The key is to get the categories right.
—
How to Choose Between ThemNow here’s the golden question: how do you decide if your problem is regression or classification?
Ask yourself:
Are you trying to predict a value that falls on a continuous scale? If the answer is yes, it’s probably regression. For example, predicting weight, speed, cost, score, rating, or any other numeric measurement.Are you trying to assign an input to a predefined group? If so, it’s classification. For example, identifying sentiment, detecting objects, predicting diagnoses, or categorizing news articles.
And if you are not sure, here’s a tip: look at your target variable. If it has units — like kilograms, rupees, degrees, or centimeters — it’s probably regression. If it has labels like “positive,” “negative,” “approved,” or “rejected” it’s classification.
—
Evaluation Metrics You Must Know
This is where many people go wrong — including me, at first.
You cannot evaluate regression and classification models the same way.
In regression, we care about how far off the prediction is. Metrics like mean absolute error, mean squared error, or root mean squared error are used. They tell you how close the prediction is to the real value.
In classification, we care about how often the prediction is right. But accuracy alone is not always enough — especially with imbalanced data.
For example, in a fraud detection model where only 1 percent of transactions are actually fraud, a model that says “not fraud” for every case will be 99 percent accurate — and completely useless.
That’s why we use other metrics like precision, recall, F1-score, and AUC. These metrics tell us not just how often we are right, but how we are right — and when it matters.
Knowing the difference between evaluation strategies is just as important as choosing the right model.
—
What I Learned the Hard Way
In one of my earlier models, I was trying to predict how likely someone was to buy a product.
The target column was labeled “purchase likelihood” and had values between 0 and 1. I assumed it was a regression problem. I trained a model using RMSE. The predictions were pretty close.
But then I looked deeper and realized that this target had been generated by a previous model. It was already a probability. What I really needed was a classification decision: “Will buy” or “Will not buy.”
I had treated it like a regression problem when what I really wanted was classification. That mismatch between goal and framing wasted weeks of iteration.
Since then, I always start with the same question: “What decision is this model helping someone make?” That almost always leads me to the right type of problem.
—
Final Thoughts: Don’t Just Choose Models. Understand Problems.
Machine Learning is not about throwing algorithms at data. It is about solving real problems. And that starts with framing those problems the right way.
Choosing between regression and classification is not about picking the most popular model. It is about understanding the shape of the outcome you are trying to predict.
The closer you look at your data — especially your target variable — the better your choices will be. And the better your choices, the more reliable your models become.
This is how I build ML systems. Not just by following tutorials — but by understanding what the model is supposed to do, and why.
—
What Comes Next
In the next few articles, I will dive into model evaluation, error analysis, overfitting, and how I engineer features that actually improve predictions not just accuracy on paper.
As always, I will write from experience. From curiosity. From real-world projects and lessons that stick.
Follow along if you are tired of fluffy articles and want to build Machine Learning systems that actually work.
Find me here:
Twitter: x.com/MehulLigade LinkedIn: linkedin.com/in/mehulcode12Let’s keep learning one layer deeper at a time.
—
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#regression #classification #machine #learning #whyRegression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004Regression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004 0 like May 15, 2025 Share this post Author: Mehul Ligade Originally published on Towards AI. Regression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004 If you’re learning Machine Learning and think supervised learning is straightforward, think again. The moment you start building your first model, you face a decision that most tutorials barely explain: should this be a regression problem or a classification problem? The difference might seem obvious — until you mess up a project by predicting categories with a regression model or trying to force numeric output into classification buckets.In this article, I will break it all down from the ground up. Not just the textbook definitions, but the thinking process behind choosing the right type of model. You will learn what these terms really mean, how to spot the difference in the wild, and how I personally approach this choice in real-world projects.And as always, no recycled fluff; only experience, insight, and lessons that actually stick. Now let’s dive in. 📘 Contents Why This Article Exists The Real Question Behind Regression and Classification What Regression Actually Means in ML A Real-Life Example of Regression What Classification Means and How It Works A Real-Life Example of Classification How to Choose Between ThemEvaluation Metrics You Must Know What I Learned the Hard Way Final Thoughts: Don’t Just Choose Models. Understand Problems. 🔴 Why This Article Exists I am writing this because I got it wrong. More than once. When I first started with supervised learning, I picked models like they were tools in a toolbox. Linear regression for numbers. Logistic regression for yes or no. That was it. End of story. But then I hit edge cases — datasets that looked like classification but acted like regression. Projects where I used the wrong loss function and got results that were mathematically correct but practically useless. It became clear to me that the distinction between regression and classification is not just about output format. It is about understanding your problem at a deeper level. So this article is what I wish someone had handed me back then. — 🔴 The Real Question Behind Regression and Classification Before we define anything, ask yourself this: What is the nature of the thing I am trying to predict?Am I trying to predict a quantity? Something with measurable distance between values — like price, age, or temperature?Or am I trying to predict a class? A distinct label, category, or group — like cat or dog, spam or not spam, fraud or genuine?That is the fundamental fork in the road. Regression problems deal with continuous outcomes. You are estimating values on a number line. Classification problems deal with discrete outcomes. You are assigning input data into one of several predefined buckets. And every model, loss function, and evaluation metric flows from this initial choice. — 🔴 What Regression Actually Means in ML Regression is not about graphs or slopes or lines. It is about approximation. When you use regression, you are asking the model to learn a function that maps input variables to a continuous output — like predicting house price from square footage or predicting someone’s weight based on age and height. But here’s what matters: there is no “correct” label in a strict sense. There is just closeness. Accuracy in regression is about how far off you are from the actual value. That’s why regression models minimize error — not classification mistakes. Think about this: if you predict a house price as ₹88,00,000 when it’s actually ₹90,00,000, you are off by 2 lakhs. That’s the loss. That’s what we care about. You are not trying to get an exact number every time. You are trying to get close and consistently close. — 🔴 A Real-Life Example of Regression In one of my early projects, I built a system to predict healthcare insurance costs. The dataset included factors like age, BMI, gender, smoking status, and location. The goal was to estimate the cost of a person’s annual premium. There were no categories. Just numbers — actual premium amounts from previous customers. This is a textbook regression problem. The output is continuous. The distance between ₹24,000 and ₹26,000 is meaningful. A difference of ₹2,000 is better than a difference of ₹20,000. My models tried to minimize the error between predicted cost and actual cost. I used RMSEas my main metric. And even though the numbers were not perfect, they got close enough to be valuable for real decision-making. That is regression. Learning to estimate: not classify. — 🔴 What Classification Means and How It Works Classification is different. Here, you are predicting categories. You are not interested in the value of the output — only which group it falls into.This is the kind of learning used in problems like spam detection, loan approval, sentiment analysis, medical diagnosis, and image recognition. In classification, you are not measuring how close your prediction is — you are measuring whether it is correct or not. There is no halfway. If you predict that a transaction is “not fraud” and it is actually “fraud,” that is not a 40 percent error — it is a full-blown misclassification. The cost of being wrong can vary, but the format is binary: right or wrong. Classification models often work by estimating probabilities. For example, a logistic regression model might say, “This email has a 92 percent chance of being spam.” But in the end, it must make a call — spam or not. The key is to get the categories right. — 🔴 How to Choose Between ThemNow here’s the golden question: how do you decide if your problem is regression or classification? Ask yourself: Are you trying to predict a value that falls on a continuous scale? If the answer is yes, it’s probably regression. For example, predicting weight, speed, cost, score, rating, or any other numeric measurement.Are you trying to assign an input to a predefined group? If so, it’s classification. For example, identifying sentiment, detecting objects, predicting diagnoses, or categorizing news articles. And if you are not sure, here’s a tip: look at your target variable. If it has units — like kilograms, rupees, degrees, or centimeters — it’s probably regression. If it has labels like “positive,” “negative,” “approved,” or “rejected” it’s classification. — 🔴 Evaluation Metrics You Must Know This is where many people go wrong — including me, at first. You cannot evaluate regression and classification models the same way. In regression, we care about how far off the prediction is. Metrics like mean absolute error, mean squared error, or root mean squared error are used. They tell you how close the prediction is to the real value. In classification, we care about how often the prediction is right. But accuracy alone is not always enough — especially with imbalanced data. For example, in a fraud detection model where only 1 percent of transactions are actually fraud, a model that says “not fraud” for every case will be 99 percent accurate — and completely useless. That’s why we use other metrics like precision, recall, F1-score, and AUC. These metrics tell us not just how often we are right, but how we are right — and when it matters. Knowing the difference between evaluation strategies is just as important as choosing the right model. — 🔴 What I Learned the Hard Way In one of my earlier models, I was trying to predict how likely someone was to buy a product. The target column was labeled “purchase likelihood” and had values between 0 and 1. I assumed it was a regression problem. I trained a model using RMSE. The predictions were pretty close. But then I looked deeper and realized that this target had been generated by a previous model. It was already a probability. What I really needed was a classification decision: “Will buy” or “Will not buy.” I had treated it like a regression problem when what I really wanted was classification. That mismatch between goal and framing wasted weeks of iteration. Since then, I always start with the same question: “What decision is this model helping someone make?” That almost always leads me to the right type of problem. — 🔴 Final Thoughts: Don’t Just Choose Models. Understand Problems. Machine Learning is not about throwing algorithms at data. It is about solving real problems. And that starts with framing those problems the right way. Choosing between regression and classification is not about picking the most popular model. It is about understanding the shape of the outcome you are trying to predict. The closer you look at your data — especially your target variable — the better your choices will be. And the better your choices, the more reliable your models become. This is how I build ML systems. Not just by following tutorials — but by understanding what the model is supposed to do, and why. — 🔴 What Comes Next In the next few articles, I will dive into model evaluation, error analysis, overfitting, and how I engineer features that actually improve predictions not just accuracy on paper. As always, I will write from experience. From curiosity. From real-world projects and lessons that stick. Follow along if you are tired of fluffy articles and want to build Machine Learning systems that actually work. 📍 Find me here: Twitter: x.com/MehulLigade LinkedIn: linkedin.com/in/mehulcode12Let’s keep learning one layer deeper at a time. — Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #regression #classification #machine #learning #whyTOWARDSAI.NETRegression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004Regression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004 0 like May 15, 2025 Share this post Author(s): Mehul Ligade Originally published on Towards AI. Regression vs Classification in Machine Learning — Why Most Beginners Get This Wrong | M004 If you’re learning Machine Learning and think supervised learning is straightforward, think again. The moment you start building your first model, you face a decision that most tutorials barely explain: should this be a regression problem or a classification problem? The difference might seem obvious — until you mess up a project by predicting categories with a regression model or trying to force numeric output into classification buckets.In this article, I will break it all down from the ground up. Not just the textbook definitions, but the thinking process behind choosing the right type of model. You will learn what these terms really mean, how to spot the difference in the wild, and how I personally approach this choice in real-world projects.And as always, no recycled fluff; only experience, insight, and lessons that actually stick. Now let’s dive in. 📘 Contents Why This Article Exists The Real Question Behind Regression and Classification What Regression Actually Means in ML A Real-Life Example of Regression What Classification Means and How It Works A Real-Life Example of Classification How to Choose Between Them (with a Decision Guide) Evaluation Metrics You Must Know What I Learned the Hard Way Final Thoughts: Don’t Just Choose Models. Understand Problems. 🔴 Why This Article Exists I am writing this because I got it wrong. More than once. When I first started with supervised learning, I picked models like they were tools in a toolbox. Linear regression for numbers. Logistic regression for yes or no. That was it. End of story. But then I hit edge cases — datasets that looked like classification but acted like regression. Projects where I used the wrong loss function and got results that were mathematically correct but practically useless. It became clear to me that the distinction between regression and classification is not just about output format. It is about understanding your problem at a deeper level. So this article is what I wish someone had handed me back then. — 🔴 The Real Question Behind Regression and Classification Before we define anything, ask yourself this: What is the nature of the thing I am trying to predict?Am I trying to predict a quantity? Something with measurable distance between values — like price, age, or temperature?Or am I trying to predict a class? A distinct label, category, or group — like cat or dog, spam or not spam, fraud or genuine?That is the fundamental fork in the road. Regression problems deal with continuous outcomes. You are estimating values on a number line. Classification problems deal with discrete outcomes. You are assigning input data into one of several predefined buckets. And every model, loss function, and evaluation metric flows from this initial choice. — 🔴 What Regression Actually Means in ML Regression is not about graphs or slopes or lines. It is about approximation. When you use regression, you are asking the model to learn a function that maps input variables to a continuous output — like predicting house price from square footage or predicting someone’s weight based on age and height. But here’s what matters: there is no “correct” label in a strict sense. There is just closeness. Accuracy in regression is about how far off you are from the actual value. That’s why regression models minimize error — not classification mistakes. Think about this: if you predict a house price as ₹88,00,000 when it’s actually ₹90,00,000, you are off by 2 lakhs. That’s the loss. That’s what we care about. You are not trying to get an exact number every time. You are trying to get close and consistently close. — 🔴 A Real-Life Example of Regression In one of my early projects, I built a system to predict healthcare insurance costs. The dataset included factors like age, BMI, gender, smoking status, and location. The goal was to estimate the cost of a person’s annual premium. There were no categories. Just numbers — actual premium amounts from previous customers. This is a textbook regression problem. The output is continuous. The distance between ₹24,000 and ₹26,000 is meaningful. A difference of ₹2,000 is better than a difference of ₹20,000. My models tried to minimize the error between predicted cost and actual cost. I used RMSE (root mean squared error) as my main metric. And even though the numbers were not perfect, they got close enough to be valuable for real decision-making. That is regression. Learning to estimate: not classify. — 🔴 What Classification Means and How It Works Classification is different. Here, you are predicting categories. You are not interested in the value of the output — only which group it falls into.This is the kind of learning used in problems like spam detection, loan approval, sentiment analysis, medical diagnosis, and image recognition. In classification, you are not measuring how close your prediction is — you are measuring whether it is correct or not. There is no halfway. If you predict that a transaction is “not fraud” and it is actually “fraud,” that is not a 40 percent error — it is a full-blown misclassification. The cost of being wrong can vary, but the format is binary: right or wrong. Classification models often work by estimating probabilities. For example, a logistic regression model might say, “This email has a 92 percent chance of being spam.” But in the end, it must make a call — spam or not. The key is to get the categories right. — 🔴 How to Choose Between Them (with a Decision Guide) Now here’s the golden question: how do you decide if your problem is regression or classification? Ask yourself: Are you trying to predict a value that falls on a continuous scale? If the answer is yes, it’s probably regression. For example, predicting weight, speed, cost, score, rating, or any other numeric measurement.Are you trying to assign an input to a predefined group? If so, it’s classification. For example, identifying sentiment, detecting objects, predicting diagnoses, or categorizing news articles. And if you are not sure, here’s a tip: look at your target variable. If it has units — like kilograms, rupees, degrees, or centimeters — it’s probably regression. If it has labels like “positive,” “negative,” “approved,” or “rejected” it’s classification. — 🔴 Evaluation Metrics You Must Know This is where many people go wrong — including me, at first. You cannot evaluate regression and classification models the same way. In regression, we care about how far off the prediction is. Metrics like mean absolute error, mean squared error, or root mean squared error are used. They tell you how close the prediction is to the real value. In classification, we care about how often the prediction is right. But accuracy alone is not always enough — especially with imbalanced data. For example, in a fraud detection model where only 1 percent of transactions are actually fraud, a model that says “not fraud” for every case will be 99 percent accurate — and completely useless. That’s why we use other metrics like precision, recall, F1-score, and AUC (area under the curve). These metrics tell us not just how often we are right, but how we are right — and when it matters. Knowing the difference between evaluation strategies is just as important as choosing the right model. — 🔴 What I Learned the Hard Way In one of my earlier models, I was trying to predict how likely someone was to buy a product. The target column was labeled “purchase likelihood” and had values between 0 and 1. I assumed it was a regression problem. I trained a model using RMSE. The predictions were pretty close. But then I looked deeper and realized that this target had been generated by a previous model. It was already a probability. What I really needed was a classification decision: “Will buy” or “Will not buy.” I had treated it like a regression problem when what I really wanted was classification. That mismatch between goal and framing wasted weeks of iteration. Since then, I always start with the same question: “What decision is this model helping someone make?” That almost always leads me to the right type of problem. — 🔴 Final Thoughts: Don’t Just Choose Models. Understand Problems. Machine Learning is not about throwing algorithms at data. It is about solving real problems. And that starts with framing those problems the right way. Choosing between regression and classification is not about picking the most popular model. It is about understanding the shape of the outcome you are trying to predict. The closer you look at your data — especially your target variable — the better your choices will be. And the better your choices, the more reliable your models become. This is how I build ML systems. Not just by following tutorials — but by understanding what the model is supposed to do, and why. — 🔴 What Comes Next In the next few articles, I will dive into model evaluation, error analysis, overfitting, and how I engineer features that actually improve predictions not just accuracy on paper. As always, I will write from experience. From curiosity. From real-world projects and lessons that stick. Follow along if you are tired of fluffy articles and want to build Machine Learning systems that actually work. 📍 Find me here: Twitter: x.com/MehulLigade LinkedIn: linkedin.com/in/mehulcode12Let’s keep learning one layer deeper at a time. — Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
LAI #75: Generative AI vs. Agentic AI vs. AI Agents
LAI #75: Generative AI vs. Agentic AI vs. AI Agents
0 like
May 15, 2025
Share this post
Author: Towards AI Editorial Team
Originally published on Towards AI.
Good morning, AI enthusiasts,
This week’s issue dives into where the field is heading — beyond generation, toward autonomy and better error awareness. We’re starting with a breakdown of the increasingly fuzzy but important distinctions between Generative AI, Agentic AI, and AI Agents. Then we move into applied innovation: Microsoft’s GraphRAG, multimodal RAG systems using Cohere and Gemini, and a practical framework for predicting when your model is about to get something wrong.
Also in the mix: DNNs vs. tree-based models for e-commerce ranking, a powerful Cursor.ai-like browser extension from the community, and this week’s poll on when vibes are enough — and when accuracy has to come first.
Let’s get into it.
— Louis-François Bouchard, Towards AI Co-founder & Head of Community
Learn AI Together Community Section!
Featured Community post from the Discord
Retconned has built Sophon, an AI chat app that enhances your browsing experience by understanding and interacting with your tabs. With its intelligent composer, it can see the tabs you have open, allowing it to understand context and autofill forms, textboxes, or fields with a single click. It is a browser extension and completely free. Check it out here. Share your feedback in the thread and support a fellow community member!
AI poll of the week!
Most of you are doing vibe checks, and of course, for general tasks, the entire idea is for the AI to not feel like AI. But would you also rely on “vibes” for more quantitative tasks, where output accuracy matters more than output feel? Share in the thread, let’s decide together!
Meme of the week!
Meme shared by rucha8062
TAI Curated Section
Article of the week
How GraphRAG Works Step-by-Step By Mariana Avelino
This blog explains Microsoft’s GraphRAG, a method that uses knowledge graphs for retrieval-augmented generation. Key detailed processes were graph creation, involving entity extraction, community partitioning, and querying, with distinct Local and Global Search functions. It outlined how entities, relationships, and community reports are generated and used for LLM response generation, including context management and semantic retrieval.
Our must-read articles
1. Distill-then-Detect: A Practical Framework for Error-Aware Machine Learning By Shenggang Li
The author presented a framework, “Distill-then-Detect,” to address prediction errors in machine learning models, particularly the “big misses” on critical data slices. This approach involves distilling a compact “student” model from a larger “teacher” model. It then quantifies teacher uncertainty and trains a meta-model to predict where the teacher is likely to err. By combining these signals into a risk score and applying conformal calibration for thresholding, the system effectively flags high-risk predictions. Experiments demonstrated that this method identified error-prone cases with balanced precision and recall while clustering these errors provided actionable insights into problematic data segments.
2. Beyond Text: Building Multimodal RAG Systems with Cohere and Gemini By Sridhar Sampath
Traditional Retrieval-Augmented Generationsystems often fail to process visual data. This article details a multimodal RAG system designed to overcome this limitation by understanding both text and images within documents. It utilizes Cohere’s multimodal embeddings to create unified vector representations from content like PDFs. Gemini 2.5 Flash then generates context-aware answers using either matched text or images, with FAISS managing vector indexing. It explains the system’s workflow, from document upload to answer generation, demonstrating its enhanced capability to extract information from charts, tables, and other visuals compared to text-only RAG.
3. Generative AI vs. Agentic AI vs. AI Agents: What Everyone Needs to Know By Poojan Vig
The article clarified the distinct roles of Generative AI, Agentic AI, and AI Agents. It explained that Generative AI produces new content based on learned patterns. Agentic AI focuses on strategy, planning, and iteration towards a goal without continuous human intervention. AI Agents then sense their environment and execute actions in the digital or real world. Using a cooking analogy and examples like automated customer service, the piece illustrated how these AI types can operate independently or collaboratively to perform complex tasks.
4. DNNs vs Traditional Tree-Based Models for E-Commerce Ranking By Nikhilesh Pandey
The author discusses the evolution of e-commerce ad ranking systems, detailing the shift from traditional tree-based models to Deep Neural Networks. It outlines why tree-based models have reached their limits and how DNNs offer superior capabilities for handling complex data, personalization, and achieving better conversion ratepredictions. Using DoorDash Ads as a case study, the piece illustrates the iterative migration process, including defining baselines, optimizing model training and evaluation with techniques like data normalization and distributed processing, and addressing challenges such as the offline-online performance gap.
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#lai #generative #agentic #agentsLAI #75: Generative AI vs. Agentic AI vs. AI AgentsLAI #75: Generative AI vs. Agentic AI vs. AI Agents 0 like May 15, 2025 Share this post Author: Towards AI Editorial Team Originally published on Towards AI. Good morning, AI enthusiasts, This week’s issue dives into where the field is heading — beyond generation, toward autonomy and better error awareness. We’re starting with a breakdown of the increasingly fuzzy but important distinctions between Generative AI, Agentic AI, and AI Agents. Then we move into applied innovation: Microsoft’s GraphRAG, multimodal RAG systems using Cohere and Gemini, and a practical framework for predicting when your model is about to get something wrong. Also in the mix: DNNs vs. tree-based models for e-commerce ranking, a powerful Cursor.ai-like browser extension from the community, and this week’s poll on when vibes are enough — and when accuracy has to come first. Let’s get into it. — Louis-François Bouchard, Towards AI Co-founder & Head of Community Learn AI Together Community Section! Featured Community post from the Discord Retconned has built Sophon, an AI chat app that enhances your browsing experience by understanding and interacting with your tabs. With its intelligent composer, it can see the tabs you have open, allowing it to understand context and autofill forms, textboxes, or fields with a single click. It is a browser extension and completely free. Check it out here. Share your feedback in the thread and support a fellow community member! AI poll of the week! Most of you are doing vibe checks, and of course, for general tasks, the entire idea is for the AI to not feel like AI. But would you also rely on “vibes” for more quantitative tasks, where output accuracy matters more than output feel? Share in the thread, let’s decide together! Meme of the week! Meme shared by rucha8062 TAI Curated Section Article of the week How GraphRAG Works Step-by-Step By Mariana Avelino This blog explains Microsoft’s GraphRAG, a method that uses knowledge graphs for retrieval-augmented generation. Key detailed processes were graph creation, involving entity extraction, community partitioning, and querying, with distinct Local and Global Search functions. It outlined how entities, relationships, and community reports are generated and used for LLM response generation, including context management and semantic retrieval. Our must-read articles 1. Distill-then-Detect: A Practical Framework for Error-Aware Machine Learning By Shenggang Li The author presented a framework, “Distill-then-Detect,” to address prediction errors in machine learning models, particularly the “big misses” on critical data slices. This approach involves distilling a compact “student” model from a larger “teacher” model. It then quantifies teacher uncertainty and trains a meta-model to predict where the teacher is likely to err. By combining these signals into a risk score and applying conformal calibration for thresholding, the system effectively flags high-risk predictions. Experiments demonstrated that this method identified error-prone cases with balanced precision and recall while clustering these errors provided actionable insights into problematic data segments. 2. Beyond Text: Building Multimodal RAG Systems with Cohere and Gemini By Sridhar Sampath Traditional Retrieval-Augmented Generationsystems often fail to process visual data. This article details a multimodal RAG system designed to overcome this limitation by understanding both text and images within documents. It utilizes Cohere’s multimodal embeddings to create unified vector representations from content like PDFs. Gemini 2.5 Flash then generates context-aware answers using either matched text or images, with FAISS managing vector indexing. It explains the system’s workflow, from document upload to answer generation, demonstrating its enhanced capability to extract information from charts, tables, and other visuals compared to text-only RAG. 3. Generative AI vs. Agentic AI vs. AI Agents: What Everyone Needs to Know By Poojan Vig The article clarified the distinct roles of Generative AI, Agentic AI, and AI Agents. It explained that Generative AI produces new content based on learned patterns. Agentic AI focuses on strategy, planning, and iteration towards a goal without continuous human intervention. AI Agents then sense their environment and execute actions in the digital or real world. Using a cooking analogy and examples like automated customer service, the piece illustrated how these AI types can operate independently or collaboratively to perform complex tasks. 4. DNNs vs Traditional Tree-Based Models for E-Commerce Ranking By Nikhilesh Pandey The author discusses the evolution of e-commerce ad ranking systems, detailing the shift from traditional tree-based models to Deep Neural Networks. It outlines why tree-based models have reached their limits and how DNNs offer superior capabilities for handling complex data, personalization, and achieving better conversion ratepredictions. Using DoorDash Ads as a case study, the piece illustrates the iterative migration process, including defining baselines, optimizing model training and evaluation with techniques like data normalization and distributed processing, and addressing challenges such as the offline-online performance gap. If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #lai #generative #agentic #agentsTOWARDSAI.NETLAI #75: Generative AI vs. Agentic AI vs. AI AgentsLAI #75: Generative AI vs. Agentic AI vs. AI Agents 0 like May 15, 2025 Share this post Author(s): Towards AI Editorial Team Originally published on Towards AI. Good morning, AI enthusiasts, This week’s issue dives into where the field is heading — beyond generation, toward autonomy and better error awareness. We’re starting with a breakdown of the increasingly fuzzy but important distinctions between Generative AI, Agentic AI, and AI Agents. Then we move into applied innovation: Microsoft’s GraphRAG, multimodal RAG systems using Cohere and Gemini, and a practical framework for predicting when your model is about to get something wrong. Also in the mix: DNNs vs. tree-based models for e-commerce ranking, a powerful Cursor.ai-like browser extension from the community, and this week’s poll on when vibes are enough — and when accuracy has to come first. Let’s get into it. — Louis-François Bouchard, Towards AI Co-founder & Head of Community Learn AI Together Community Section! Featured Community post from the Discord Retconned has built Sophon, an AI chat app that enhances your browsing experience by understanding and interacting with your tabs. With its intelligent composer, it can see the tabs you have open, allowing it to understand context and autofill forms, textboxes, or fields with a single click. It is a browser extension and completely free. Check it out here. Share your feedback in the thread and support a fellow community member! AI poll of the week! Most of you are doing vibe checks, and of course, for general tasks, the entire idea is for the AI to not feel like AI. But would you also rely on “vibes” for more quantitative tasks, where output accuracy matters more than output feel? Share in the thread, let’s decide together! Meme of the week! Meme shared by rucha8062 TAI Curated Section Article of the week How GraphRAG Works Step-by-Step By Mariana Avelino This blog explains Microsoft’s GraphRAG, a method that uses knowledge graphs for retrieval-augmented generation. Key detailed processes were graph creation, involving entity extraction, community partitioning, and querying, with distinct Local and Global Search functions. It outlined how entities, relationships, and community reports are generated and used for LLM response generation, including context management and semantic retrieval. Our must-read articles 1. Distill-then-Detect: A Practical Framework for Error-Aware Machine Learning By Shenggang Li The author presented a framework, “Distill-then-Detect,” to address prediction errors in machine learning models, particularly the “big misses” on critical data slices. This approach involves distilling a compact “student” model from a larger “teacher” model. It then quantifies teacher uncertainty and trains a meta-model to predict where the teacher is likely to err. By combining these signals into a risk score and applying conformal calibration for thresholding, the system effectively flags high-risk predictions. Experiments demonstrated that this method identified error-prone cases with balanced precision and recall while clustering these errors provided actionable insights into problematic data segments. 2. Beyond Text: Building Multimodal RAG Systems with Cohere and Gemini By Sridhar Sampath Traditional Retrieval-Augmented Generation (RAG) systems often fail to process visual data. This article details a multimodal RAG system designed to overcome this limitation by understanding both text and images within documents. It utilizes Cohere’s multimodal embeddings to create unified vector representations from content like PDFs. Gemini 2.5 Flash then generates context-aware answers using either matched text or images, with FAISS managing vector indexing. It explains the system’s workflow, from document upload to answer generation, demonstrating its enhanced capability to extract information from charts, tables, and other visuals compared to text-only RAG. 3. Generative AI vs. Agentic AI vs. AI Agents: What Everyone Needs to Know By Poojan Vig The article clarified the distinct roles of Generative AI, Agentic AI, and AI Agents. It explained that Generative AI produces new content based on learned patterns. Agentic AI focuses on strategy, planning, and iteration towards a goal without continuous human intervention. AI Agents then sense their environment and execute actions in the digital or real world. Using a cooking analogy and examples like automated customer service, the piece illustrated how these AI types can operate independently or collaboratively to perform complex tasks. 4. DNNs vs Traditional Tree-Based Models for E-Commerce Ranking By Nikhilesh Pandey The author discusses the evolution of e-commerce ad ranking systems, detailing the shift from traditional tree-based models to Deep Neural Networks (DNNs). It outlines why tree-based models have reached their limits and how DNNs offer superior capabilities for handling complex data, personalization, and achieving better conversion rate (CVR) predictions. Using DoorDash Ads as a case study, the piece illustrates the iterative migration process, including defining baselines, optimizing model training and evaluation with techniques like data normalization and distributed processing, and addressing challenges such as the offline-online performance gap. If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
Extracting Data from Unstructured Documents
Author: Felix Pappe
Originally published on Towards AI.
Image created by the author using gpt-image-1
Introduction
In the past, extracting specific information from documents or images using traditional methods could have become quickly cumbersome and frustrating, especially when the final results stray far from what you intended. The reasons for this can be diverse, ranging from overly complex document layouts to improperly formatted files, or an avalanche of visual elements, which machines struggle to interpret.
However, vision-enabled languagemodels have come to the rescue. Over the past months and years, these models have gained ever-greater capabilities, from rough image descriptions to detailed text extraction. Notably, the extraction of complex textual information from images has seen astonishing progress. This allows for rapid knowledge extraction from diverse document types without brittle, rule-based systems that break as soon as the document structure changes — and without the time-, data-, and cost-intensive specialized training of custom models.
However, there is one flaw: vLMs, like their text-only counterparts, tend to produce verbose output around the information you actually want. Phrases such as “Of course, here is the information you requested” or “This is the extracted information about XYZ” commonly surround the essential content.
You could use regular expressions together with advanced prompt engineering to constrain the vLM to output only the requested information. However, crafting the perfect prompt and matching regex for a given task is difficult and requires much trial and error. In this blog post, I’d like to introduce a simpler approach: combining the rich capabilities of vLLMs with the strict validation offered by Pydantic classes to extract exactly the desired information for your document processing pipeline.
Description of post tackled issue
The example in this blog post describes a situation that every job applicant has likely experienced many times. I am sure of it.
After you have carefully and thoroughly created your CV, thinking about every word and maybe even every letter, you upload the file to a job portal. But after successfully uploading the file, including all the requested information, you are asked once again to fill out the same details in standard HTML forms by copying and pasting the information from your CV into the correct fields.Some companies attempt to autofill these fields based on the information extracted from your CV, but the results are often far from accurate or complete.In the following code, I combine Pixtral, LangChain, and Pydantic to provide a simple solution.
The code extracts the first name, last name, phone number, email, and birthday from the CV if they exist. This helps keep the example simple and focuses on the technical aspects.The code can be easily adapted for other use cases or extended to extract all required information from a CV.So let us dive into the code.
Code walkthrough
Importing required libraries
In the first step, the required libraries are imported, including:
os, pathlib, and typing for standard Python modules providing filesystem access and type annotations
base64
dontenv.env file into os.environ
pydanticLLM output
ChatMistralAILLM interface
PIL
import osimport base64from pathlib import Pathfrom typing import Optionalfrom dotenv import load_dotenvfrom pydantic import BaseModel, Fieldfrom langchain_mistralai.chat_models import ChatMistralAIfrom langchain_core.messages import HumanMessagefrom PIL import Image
Loading environment variables
Subsequently, the environment variables are loaded using load_dotenv, and the MISTRAL_API_KEY is retrieved.
load_dotenvMISTRAL_API_KEY = os.getenvif not MISTRAL_API_KEY: raise ValueErrorDefining the output schema with pydantic
Following that, the output schema is defined using Pydantic. Pydantic is a Python library for data parsing and validation based on Python type hints. At its core, Pydantic’s BaseModel offers various useful features, such as the declaration of data typesand automatic coercion of incoming data into the required types when possible.
Moreover, it validates whether the incoming data matches the predefined schema and raises an error if it does not. Thanks to these clearly defined schemas, the data can be quickly serialized into other formats such as JSON. Likewise, Pydantic also allows the creation of document fields with metadata that tools such as LLMs can inspect and utilize. The next code block defines the structure of the expected output using Pydantic. These are the data points that the model should extract from the CV image.class BasicCV: first_name: Optional= Fieldlast_name: Optional= Fieldphone: Optional= Fieldemail: Optional= Fieldbirthday: Optional= Field")
Converting images to base64
Subsequently, the first function is defined for the script. The function encode_image_to_base64does exactly what its name suggests. It loads an image and converts it into a base64 string, which is passed into the vLM later.
Moreover, an upscaling factor has been integrated. Although no additional information is gained by simply increasing the height and width, in my experience, the results tend to improve, especially in situations where the original resolution of the image is low.
def encode_image_to_base64-> str: with Image.openas img: if upscale_factor != 1.0: new_size =, int) img = img.resizefrom io import BytesIO buffer = BytesIOimg.saveimage_bytes = buffer.getvaluereturn base64.b64encode.decodeProcessing the CV with a vision language model
Now, let’s move on to the main function of this script. The process_cvfunction begins by initializing the Mistral interface using a previously generated API key. This model is then wrapped using the .with_structured_outputfunction, in which the Pydantic model defined above is passed as input. If you are using a different vLM, make sure that it supports structured output, as not all vLMs do.
Afterwards, the input image is converted into a base64string, which is then transformed into a Uniform Resource Identifierby attaching a metadata string in front of the b64 string.
Next, a simple system prompt is defined, which leaves room for improvement in more complex extraction tasks but works perfectly for this scenario.
Finally, the URI and system prompt are combined into a LangChain HumanMessage, which is passed to the structured vLM. The model then returns the requested information in the previously defined Pydantic format.
def process_cv-> BasicCV: image_path: Path, api_key: Optional= None llm = ChatMistralAIstructured_llm = llm.with_structured_outputimage_b64 = encode_image_to_base64data_uri = f"data:image/png;base64,{image_b64}" system_text =message = HumanMessageresult: BasicCV = structured_llm.invokereturn result
Running the script
This function is executed by the main, where the path is defined and the final information is printed out.
if __name__ == "__main__": image_file = Pathcv_data = process_cvprintprintprintprintprintConclusion
This simple Python script provides only a first impression of how powerful and flexible vLMs have become. In combination with Pydantic and with the support of the powerful LangChain framework, vLMs can be turned into a meaningful solution for many document processing workflows, such as application processing or invoice handling.
What experience have you had with vision Large Language Models? Do you have other fields in mind where such a workflow might be beneficial?
Source
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
#extracting #data #unstructured #documentsExtracting Data from Unstructured DocumentsAuthor: Felix Pappe Originally published on Towards AI. Image created by the author using gpt-image-1 Introduction In the past, extracting specific information from documents or images using traditional methods could have become quickly cumbersome and frustrating, especially when the final results stray far from what you intended. The reasons for this can be diverse, ranging from overly complex document layouts to improperly formatted files, or an avalanche of visual elements, which machines struggle to interpret. However, vision-enabled languagemodels have come to the rescue. Over the past months and years, these models have gained ever-greater capabilities, from rough image descriptions to detailed text extraction. Notably, the extraction of complex textual information from images has seen astonishing progress. This allows for rapid knowledge extraction from diverse document types without brittle, rule-based systems that break as soon as the document structure changes — and without the time-, data-, and cost-intensive specialized training of custom models. However, there is one flaw: vLMs, like their text-only counterparts, tend to produce verbose output around the information you actually want. Phrases such as “Of course, here is the information you requested” or “This is the extracted information about XYZ” commonly surround the essential content. You could use regular expressions together with advanced prompt engineering to constrain the vLM to output only the requested information. However, crafting the perfect prompt and matching regex for a given task is difficult and requires much trial and error. In this blog post, I’d like to introduce a simpler approach: combining the rich capabilities of vLLMs with the strict validation offered by Pydantic classes to extract exactly the desired information for your document processing pipeline. Description of post tackled issue The example in this blog post describes a situation that every job applicant has likely experienced many times. I am sure of it. After you have carefully and thoroughly created your CV, thinking about every word and maybe even every letter, you upload the file to a job portal. But after successfully uploading the file, including all the requested information, you are asked once again to fill out the same details in standard HTML forms by copying and pasting the information from your CV into the correct fields.Some companies attempt to autofill these fields based on the information extracted from your CV, but the results are often far from accurate or complete.In the following code, I combine Pixtral, LangChain, and Pydantic to provide a simple solution. The code extracts the first name, last name, phone number, email, and birthday from the CV if they exist. This helps keep the example simple and focuses on the technical aspects.The code can be easily adapted for other use cases or extended to extract all required information from a CV.So let us dive into the code. Code walkthrough Importing required libraries In the first step, the required libraries are imported, including: os, pathlib, and typing for standard Python modules providing filesystem access and type annotations base64 dontenv.env file into os.environ pydanticLLM output ChatMistralAILLM interface PIL import osimport base64from pathlib import Pathfrom typing import Optionalfrom dotenv import load_dotenvfrom pydantic import BaseModel, Fieldfrom langchain_mistralai.chat_models import ChatMistralAIfrom langchain_core.messages import HumanMessagefrom PIL import Image Loading environment variables Subsequently, the environment variables are loaded using load_dotenv, and the MISTRAL_API_KEY is retrieved. load_dotenvMISTRAL_API_KEY = os.getenvif not MISTRAL_API_KEY: raise ValueErrorDefining the output schema with pydantic Following that, the output schema is defined using Pydantic. Pydantic is a Python library for data parsing and validation based on Python type hints. At its core, Pydantic’s BaseModel offers various useful features, such as the declaration of data typesand automatic coercion of incoming data into the required types when possible. Moreover, it validates whether the incoming data matches the predefined schema and raises an error if it does not. Thanks to these clearly defined schemas, the data can be quickly serialized into other formats such as JSON. Likewise, Pydantic also allows the creation of document fields with metadata that tools such as LLMs can inspect and utilize. The next code block defines the structure of the expected output using Pydantic. These are the data points that the model should extract from the CV image.class BasicCV: first_name: Optional= Fieldlast_name: Optional= Fieldphone: Optional= Fieldemail: Optional= Fieldbirthday: Optional= Field") Converting images to base64 Subsequently, the first function is defined for the script. The function encode_image_to_base64does exactly what its name suggests. It loads an image and converts it into a base64 string, which is passed into the vLM later. Moreover, an upscaling factor has been integrated. Although no additional information is gained by simply increasing the height and width, in my experience, the results tend to improve, especially in situations where the original resolution of the image is low. def encode_image_to_base64-> str: with Image.openas img: if upscale_factor != 1.0: new_size =, int) img = img.resizefrom io import BytesIO buffer = BytesIOimg.saveimage_bytes = buffer.getvaluereturn base64.b64encode.decodeProcessing the CV with a vision language model Now, let’s move on to the main function of this script. The process_cvfunction begins by initializing the Mistral interface using a previously generated API key. This model is then wrapped using the .with_structured_outputfunction, in which the Pydantic model defined above is passed as input. If you are using a different vLM, make sure that it supports structured output, as not all vLMs do. Afterwards, the input image is converted into a base64string, which is then transformed into a Uniform Resource Identifierby attaching a metadata string in front of the b64 string. Next, a simple system prompt is defined, which leaves room for improvement in more complex extraction tasks but works perfectly for this scenario. Finally, the URI and system prompt are combined into a LangChain HumanMessage, which is passed to the structured vLM. The model then returns the requested information in the previously defined Pydantic format. def process_cv-> BasicCV: image_path: Path, api_key: Optional= None llm = ChatMistralAIstructured_llm = llm.with_structured_outputimage_b64 = encode_image_to_base64data_uri = f"data:image/png;base64,{image_b64}" system_text =message = HumanMessageresult: BasicCV = structured_llm.invokereturn result Running the script This function is executed by the main, where the path is defined and the final information is printed out. if __name__ == "__main__": image_file = Pathcv_data = process_cvprintprintprintprintprintConclusion This simple Python script provides only a first impression of how powerful and flexible vLMs have become. In combination with Pydantic and with the support of the powerful LangChain framework, vLMs can be turned into a meaningful solution for many document processing workflows, such as application processing or invoice handling. What experience have you had with vision Large Language Models? Do you have other fields in mind where such a workflow might be beneficial? Source Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI #extracting #data #unstructured #documentsTOWARDSAI.NETExtracting Data from Unstructured DocumentsAuthor(s): Felix Pappe Originally published on Towards AI. Image created by the author using gpt-image-1 Introduction In the past, extracting specific information from documents or images using traditional methods could have become quickly cumbersome and frustrating, especially when the final results stray far from what you intended. The reasons for this can be diverse, ranging from overly complex document layouts to improperly formatted files, or an avalanche of visual elements, which machines struggle to interpret. However, vision-enabled language (vLMs)models have come to the rescue. Over the past months and years, these models have gained ever-greater capabilities, from rough image descriptions to detailed text extraction. Notably, the extraction of complex textual information from images has seen astonishing progress. This allows for rapid knowledge extraction from diverse document types without brittle, rule-based systems that break as soon as the document structure changes — and without the time-, data-, and cost-intensive specialized training of custom models. However, there is one flaw: vLMs, like their text-only counterparts, tend to produce verbose output around the information you actually want. Phrases such as “Of course, here is the information you requested” or “This is the extracted information about XYZ” commonly surround the essential content. You could use regular expressions together with advanced prompt engineering to constrain the vLM to output only the requested information. However, crafting the perfect prompt and matching regex for a given task is difficult and requires much trial and error. In this blog post, I’d like to introduce a simpler approach: combining the rich capabilities of vLLMs with the strict validation offered by Pydantic classes to extract exactly the desired information for your document processing pipeline. Description of post tackled issue The example in this blog post describes a situation that every job applicant has likely experienced many times. I am sure of it. After you have carefully and thoroughly created your CV, thinking about every word and maybe even every letter, you upload the file to a job portal. But after successfully uploading the file, including all the requested information, you are asked once again to fill out the same details in standard HTML forms by copying and pasting the information from your CV into the correct fields.Some companies attempt to autofill these fields based on the information extracted from your CV, but the results are often far from accurate or complete.In the following code, I combine Pixtral, LangChain, and Pydantic to provide a simple solution. The code extracts the first name, last name, phone number, email, and birthday from the CV if they exist. This helps keep the example simple and focuses on the technical aspects.The code can be easily adapted for other use cases or extended to extract all required information from a CV.So let us dive into the code. Code walkthrough Importing required libraries In the first step, the required libraries are imported, including: os, pathlib, and typing for standard Python modules providing filesystem access and type annotations base64 dontenv.env file into os.environ pydanticLLM output ChatMistralAILLM interface PIL import osimport base64from pathlib import Pathfrom typing import Optionalfrom dotenv import load_dotenvfrom pydantic import BaseModel, Fieldfrom langchain_mistralai.chat_models import ChatMistralAIfrom langchain_core.messages import HumanMessagefrom PIL import Image Loading environment variables Subsequently, the environment variables are loaded using load_dotenv(), and the MISTRAL_API_KEY is retrieved. load_dotenv()MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")if not MISTRAL_API_KEY: raise ValueError("MISTRAL_API_KEY not set in environment") Defining the output schema with pydantic Following that, the output schema is defined using Pydantic. Pydantic is a Python library for data parsing and validation based on Python type hints. At its core, Pydantic’s BaseModel offers various useful features, such as the declaration of data types (e.g. str, int, List[str], nested models, etc.) and automatic coercion of incoming data into the required types when possible (e.g., converting "102" into 102). Moreover, it validates whether the incoming data matches the predefined schema and raises an error if it does not. Thanks to these clearly defined schemas, the data can be quickly serialized into other formats such as JSON. Likewise, Pydantic also allows the creation of document fields with metadata that tools such as LLMs can inspect and utilize. The next code block defines the structure of the expected output using Pydantic. These are the data points that the model should extract from the CV image.class BasicCV(BaseModel): first_name: Optional[str] = Field(None, description="first name") last_name: Optional[str] = Field(None, description="last name") phone: Optional[str] = Field(None, description="Telephone number") email: Optional[str] = Field(None, description="Email address") birthday: Optional[str] = Field(None, description="Date of birth (e.g., YYYY-MM-DD)") Converting images to base64 Subsequently, the first function is defined for the script. The function encode_image_to_base64() does exactly what its name suggests. It loads an image and converts it into a base64 string, which is passed into the vLM later. Moreover, an upscaling factor has been integrated. Although no additional information is gained by simply increasing the height and width, in my experience, the results tend to improve, especially in situations where the original resolution of the image is low. def encode_image_to_base64(image_path: Path, upscale_factor: float = 1.0) -> str: with Image.open(image_path) as img: if upscale_factor != 1.0: new_size = (int(img.width * upscale_factor), int(img.height * upscale_factor)) img = img.resize(new_size, Image.LANCZOS) from io import BytesIO buffer = BytesIO() img.save(buffer, format="PNG") image_bytes = buffer.getvalue() return base64.b64encode(image_bytes).decode() Processing the CV with a vision language model Now, let’s move on to the main function of this script. The process_cv() function begins by initializing the Mistral interface using a previously generated API key. This model is then wrapped using the .with_structured_output(BasicCV) function, in which the Pydantic model defined above is passed as input. If you are using a different vLM, make sure that it supports structured output, as not all vLMs do. Afterwards, the input image is converted into a base64 (b64) string, which is then transformed into a Uniform Resource Identifier (URI) by attaching a metadata string in front of the b64 string. Next, a simple system prompt is defined, which leaves room for improvement in more complex extraction tasks but works perfectly for this scenario. Finally, the URI and system prompt are combined into a LangChain HumanMessage, which is passed to the structured vLM. The model then returns the requested information in the previously defined Pydantic format. def process_cv() -> BasicCV: image_path: Path, api_key: Optional[str] = None llm = ChatMistralAI( model="pixtral-12b-latest", mistral_api_key=api_key or MISTRAL_API_KEY, ) structured_llm = llm.with_structured_output(BasicCV) image_b64 = encode_image_to_base64(image_path) data_uri = f"data:image/png;base64,{image_b64}" system_text = ( "Extract only the following fields from this CV: first name, last name, " "telephone number, email address, and birthday. Return JSON matching the schema." ) message = HumanMessage( content=[ {"type": "text", "text": system_text}, {"type": "image_url", "image_url": data_uri}, ] ) result: BasicCV = structured_llm.invoke([message]) return result Running the script This function is executed by the main, where the path is defined and the final information is printed out. if __name__ == "__main__": image_file = Path("cv-test.png") cv_data = process_cv(image_file) print(f"First Name: {cv_data.first_name}") print(f"Last Name: {cv_data.last_name}") print(f"Phone: {cv_data.phone}") print(f"Email: {cv_data.email}") print(f"Birthday: {cv_data.birthday}") Conclusion This simple Python script provides only a first impression of how powerful and flexible vLMs have become. In combination with Pydantic and with the support of the powerful LangChain framework, vLMs can be turned into a meaningful solution for many document processing workflows, such as application processing or invoice handling. What experience have you had with vision Large Language Models? Do you have other fields in mind where such a workflow might be beneficial? Source Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI0 التعليقات 0 المشاركات -
Parse Documents Including Images, Tables, Equations, Charts, and Code.
Latest Machine Learning
Parse Documents Including Images, Tables, Equations, Charts, and Code.
0 like
May 14, 2025
Share this post
Author: Ahmed Boulahia
Originally published on Towards AI.
Enhance Your RAG Pipeline by Using SmolDocling to Parse Complex Documentsinto Your Vector DBImage created by the authorVision + Structure: SmolDocling is a new 256M-parameter model that reads entire document pages and converts them into a rich DocTags markup format capturing content and layout.Compact & Fast: Despite its small size, it matches the accuracy of models 10–27× larger. It runs quickly.Key Features: Built-in OCR with bounding boxes, formula/code recognition, table/chart parsing, list grouping, caption linking, etc., all in one end-to-end package.
Have you ever tried to copy-paste text from a PDF research paper and ended up with gibberish, missing figures, or malformed equations? Complex documents are often packed with non-text elements like images, graphs, tables and math , that simple text-based AI can’t handle.
SmolDocling aims to change that, it’s a multimodal AI model designed to process a whole page image and output a single, structured representation of everything on it.
In this post, we’ll see why combining vision and language is essential for modern document AI, and how SmolDocling’s features set let it convert complex docs end-to-end.
Traditional document AI often treated pages as “just text”. One common pattern was: run an OCR engine to get all the words, then feed that into a text model.
Systems like LayoutLM… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#parse #documents #including #images #tablesParse Documents Including Images, Tables, Equations, Charts, and Code.Latest Machine Learning Parse Documents Including Images, Tables, Equations, Charts, and Code. 0 like May 14, 2025 Share this post Author: Ahmed Boulahia Originally published on Towards AI. Enhance Your RAG Pipeline by Using SmolDocling to Parse Complex Documentsinto Your Vector DBImage created by the authorVision + Structure: SmolDocling is a new 256M-parameter model that reads entire document pages and converts them into a rich DocTags markup format capturing content and layout.Compact & Fast: Despite its small size, it matches the accuracy of models 10–27× larger. It runs quickly.Key Features: Built-in OCR with bounding boxes, formula/code recognition, table/chart parsing, list grouping, caption linking, etc., all in one end-to-end package. Have you ever tried to copy-paste text from a PDF research paper and ended up with gibberish, missing figures, or malformed equations? Complex documents are often packed with non-text elements like images, graphs, tables and math , that simple text-based AI can’t handle. SmolDocling aims to change that, it’s a multimodal AI model designed to process a whole page image and output a single, structured representation of everything on it. In this post, we’ll see why combining vision and language is essential for modern document AI, and how SmolDocling’s features set let it convert complex docs end-to-end. Traditional document AI often treated pages as “just text”. One common pattern was: run an OCR engine to get all the words, then feed that into a text model. Systems like LayoutLM… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #parse #documents #including #images #tablesTOWARDSAI.NETParse Documents Including Images, Tables, Equations, Charts, and Code.Latest Machine Learning Parse Documents Including Images, Tables, Equations, Charts, and Code. 0 like May 14, 2025 Share this post Author(s): Ahmed Boulahia Originally published on Towards AI. Enhance Your RAG Pipeline by Using SmolDocling to Parse Complex Documents (Tables, Equations, Charts & Code) into Your Vector DBImage created by the authorVision + Structure: SmolDocling is a new 256M-parameter model that reads entire document pages and converts them into a rich DocTags markup format capturing content and layout.Compact & Fast: Despite its small size, it matches the accuracy of models 10–27× larger. It runs quickly (≈0.35s/page on an A100 GPU).Key Features: Built-in OCR with bounding boxes, formula/code recognition, table/chart parsing, list grouping, caption linking, etc., all in one end-to-end package. Have you ever tried to copy-paste text from a PDF research paper and ended up with gibberish, missing figures, or malformed equations? Complex documents are often packed with non-text elements like images, graphs, tables and math , that simple text-based AI can’t handle. SmolDocling aims to change that, it’s a multimodal AI model designed to process a whole page image and output a single, structured representation of everything on it. In this post, we’ll see why combining vision and language is essential for modern document AI, and how SmolDocling’s features set let it convert complex docs end-to-end. Traditional document AI often treated pages as “just text”. One common pattern was: run an OCR engine to get all the words (and their positions), then feed that into a text model. Systems like LayoutLM… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
World’s First AI Agent Hospital: 42 AI Doctors, 4 Nurses, 0 Humans
World’s First AI Agent Hospital: 42 AI Doctors, 4 Nurses, 0 Humans
0 like
May 14, 2025
Share this post
Author: MKWriteshere
Originally published on Towards AI.
In the real world, becoming a doctor is a marathon, not a sprint. It takes roughly 20 years of education, 12 years of school, four years of college, four years of medical school, followed by years of residency before a medical student becomes a fully qualified physician.
But what if AI could take a shortcut?
Researchers at Tsinghua University have developed a revolutionary approach that might reshape medical AI.
Their system, called “Agent Hospital,” is a virtual medical world where AI doctors treat AI patients, learning from each successful treatment and mistake along the way, all without needing human-labeled training data.
Agent Hospital is essentially a simulated healthcare environment where all patients, nurses, and doctors are autonomous agents powered by large language models.
Unlike traditional medical AI that focuses mainly on acquiring knowledge from textbooks, Agent Hospital simulates the practical experience of treating patients — the second and arguably more critical phase of medical expertise development.
Figure 1 from the Research Paper
As shown in Figure 1, this virtual hospital contains various functional areas, including triage stations, registration desks, waiting areas, consultation rooms, examination rooms, pharmacies, and follow-up areas.
What makes this approach revolutionary is that the entire process of treating illness is simulated: from disease onset… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
#worlds #first #agent #hospital #doctorsWorld’s First AI Agent Hospital: 42 AI Doctors, 4 Nurses, 0 HumansWorld’s First AI Agent Hospital: 42 AI Doctors, 4 Nurses, 0 Humans 0 like May 14, 2025 Share this post Author: MKWriteshere Originally published on Towards AI. In the real world, becoming a doctor is a marathon, not a sprint. It takes roughly 20 years of education, 12 years of school, four years of college, four years of medical school, followed by years of residency before a medical student becomes a fully qualified physician. But what if AI could take a shortcut? Researchers at Tsinghua University have developed a revolutionary approach that might reshape medical AI. Their system, called “Agent Hospital,” is a virtual medical world where AI doctors treat AI patients, learning from each successful treatment and mistake along the way, all without needing human-labeled training data. Agent Hospital is essentially a simulated healthcare environment where all patients, nurses, and doctors are autonomous agents powered by large language models. Unlike traditional medical AI that focuses mainly on acquiring knowledge from textbooks, Agent Hospital simulates the practical experience of treating patients — the second and arguably more critical phase of medical expertise development. Figure 1 from the Research Paper As shown in Figure 1, this virtual hospital contains various functional areas, including triage stations, registration desks, waiting areas, consultation rooms, examination rooms, pharmacies, and follow-up areas. What makes this approach revolutionary is that the entire process of treating illness is simulated: from disease onset… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post #worlds #first #agent #hospital #doctorsTOWARDSAI.NETWorld’s First AI Agent Hospital: 42 AI Doctors, 4 Nurses, 0 HumansWorld’s First AI Agent Hospital: 42 AI Doctors, 4 Nurses, 0 Humans 0 like May 14, 2025 Share this post Author(s): MKWriteshere Originally published on Towards AI. In the real world, becoming a doctor is a marathon, not a sprint. It takes roughly 20 years of education, 12 years of school, four years of college, four years of medical school, followed by years of residency before a medical student becomes a fully qualified physician. But what if AI could take a shortcut? Researchers at Tsinghua University have developed a revolutionary approach that might reshape medical AI. Their system, called “Agent Hospital,” is a virtual medical world where AI doctors treat AI patients, learning from each successful treatment and mistake along the way, all without needing human-labeled training data. Agent Hospital is essentially a simulated healthcare environment where all patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Unlike traditional medical AI that focuses mainly on acquiring knowledge from textbooks, Agent Hospital simulates the practical experience of treating patients — the second and arguably more critical phase of medical expertise development. Figure 1 from the Research Paper As shown in Figure 1, this virtual hospital contains various functional areas, including triage stations, registration desks, waiting areas, consultation rooms, examination rooms, pharmacies, and follow-up areas. What makes this approach revolutionary is that the entire process of treating illness is simulated: from disease onset… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
Master the LLM Stack in 60+ hours — learn, code, ship, and certify
Author: Towards AI Editorial Team
Originally published on Towards AI.
Over the past two years, we’ve helped teams design and deploy real-world LLM systems — RAG pipelines, copilots that actually reduce load, PoCs that became products, and cut down hallucinations.
One year ago, we decided to put everything we knew about building architectures around LLMs — the stack, the mistakes, the gotchas, the strategies — into a single guide:
Building LLMs for Production — “the most comprehensive textbook to date on building LLM applications,” as Jerry Liuput it.
The response was amazing. People read it. People built with it. People shared it.
But a few months in, our DMs started filling up:
“Has the book been updated?”
“Does it cover the latest models like o3 or Gemini 2.5?”
“Can I useinstead?”
“How do I choose the right model for my use case?”
“What if I want to dothat isn’t in the book?”
Fair. The landscape’s shifting fast.
Inference got scaled. SLMs showed up. Context windows stretched. Costs dropped. Everything moved.
If AI has taught us anything, it’s to think AI-first — not just to keep up, but to build in ways that scale.
So instead of answering each DM, we took a step back. And we decided to build something that answers all of it, now and as things evolve.
The result?
From Beginner to Advanced LLM Developer
A 60+ hour, hands-on course that takes you from “I can prompt ChatGPT” → to deploying a production-grade RAG system with a real front end.
But we didn’t just pack it with knowledge — we designed it to evolve with the field.
Here’s what you walk away with:
A repeatable pipeline that adapts with tools, not trends
A deep instinct for how to think like an AI engineer
Lifetime access and weekly updates as the ecosystem changes
A private Slack for graduates + a 70,000+ builder community on Discord
Because staying current isn’t enough — you also need confidence that what you ship today still holds tomorrow.
That’s why we’re now running monthly live cohorts — so you stay sharp, supported, and up to date.
The next cohort kicks off June 1 with a live welcome call with our CEO.
Launch price:— zero risk thanks to a 30-day money-back guarantee.
Join the course here
The results speak for themselves:
“The course greatly expanded my knowledge of building and assessing RAG pipelines.” — Eoin McGrath
“Best course out there to become an AI engineer. Planning to build my own startup based on the learnings.” — Abhijit L.
“From zero to hero as an LLM Developer — a clear path to build LLM applications that can change your career.” — Luca Tanieli
Even industry voices you know have shared support:
“A resource I’ll return to again and again, no matter how fast the AI landscape shifts.” — Tina Huang, Lonely Octopus
This course is for you:
You know Python but haven’t taken an LLM past the notebook.
If you’re frustrated by shallow tutorials and fragmented docs…
If you want to build things that work, not just read about them…
If you’re ready to take LLMs seriously and want a proven structure…
There’s a roadmap. And it’s working.
The next cohort starts June 1st. As soon as you join, you get full access to all course material — no need to wait for the live kickoff. You can start building right away.
What You’ll Learn:
LLM Basics & Prompt Mastery
Transformers, tokenization, and prompting that actually reduces hallucinations
Retrieval-Augmented GenerationChunking, embedding models, re-ranking, query rewriting, eval, and feedback loops
Fine-Tuning
LoRA, adapters, and domain-specific models that actually perform
Tool & API Integration
Function calling, external tools, and chained agent workflows
Deployment & Cost Control
Gradio, Streamlit, latency fixes, caching, logging, monitoring, cost tracking
Capstone Project & Certification
Build and ship your own LLM app — get feedback, and leave with a portfolio-ready build
If you’re thinking, “This sounds great, but what if it’s not for me?” — we get it. That’s why the course comes with a 30-day, no-questions-asked money-back guarantee. Try it. Dive into the material. If it doesn’t meet your expectations, we’ll refund you in full.
Secure your spot for the June 1st cohort
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
#master #llm #stack #hours #learnMaster the LLM Stack in 60+ hours — learn, code, ship, and certifyAuthor: Towards AI Editorial Team Originally published on Towards AI. Over the past two years, we’ve helped teams design and deploy real-world LLM systems — RAG pipelines, copilots that actually reduce load, PoCs that became products, and cut down hallucinations. One year ago, we decided to put everything we knew about building architectures around LLMs — the stack, the mistakes, the gotchas, the strategies — into a single guide: Building LLMs for Production — “the most comprehensive textbook to date on building LLM applications,” as Jerry Liuput it. The response was amazing. People read it. People built with it. People shared it. But a few months in, our DMs started filling up: “Has the book been updated?” “Does it cover the latest models like o3 or Gemini 2.5?” “Can I useinstead?” “How do I choose the right model for my use case?” “What if I want to dothat isn’t in the book?” Fair. The landscape’s shifting fast. Inference got scaled. SLMs showed up. Context windows stretched. Costs dropped. Everything moved. If AI has taught us anything, it’s to think AI-first — not just to keep up, but to build in ways that scale. So instead of answering each DM, we took a step back. And we decided to build something that answers all of it, now and as things evolve. The result? From Beginner to Advanced LLM Developer A 60+ hour, hands-on course that takes you from “I can prompt ChatGPT” → to deploying a production-grade RAG system with a real front end. But we didn’t just pack it with knowledge — we designed it to evolve with the field. Here’s what you walk away with: ✅ A repeatable pipeline that adapts with tools, not trends ✅ A deep instinct for how to think like an AI engineer ✅ Lifetime access and weekly updates as the ecosystem changes ✅ A private Slack for graduates + a 70,000+ builder community on Discord Because staying current isn’t enough — you also need confidence that what you ship today still holds tomorrow. That’s why we’re now running monthly live cohorts — so you stay sharp, supported, and up to date. The next cohort kicks off June 1 with a live welcome call with our CEO. Launch price:— zero risk thanks to a 30-day money-back guarantee. 👉 Join the course here The results speak for themselves: “The course greatly expanded my knowledge of building and assessing RAG pipelines.” — Eoin McGrath “Best course out there to become an AI engineer. Planning to build my own startup based on the learnings.” — Abhijit L. “From zero to hero as an LLM Developer — a clear path to build LLM applications that can change your career.” — Luca Tanieli Even industry voices you know have shared support: “A resource I’ll return to again and again, no matter how fast the AI landscape shifts.” — Tina Huang, Lonely Octopus This course is for you: You know Python but haven’t taken an LLM past the notebook. If you’re frustrated by shallow tutorials and fragmented docs… If you want to build things that work, not just read about them… If you’re ready to take LLMs seriously and want a proven structure… There’s a roadmap. And it’s working. The next cohort starts June 1st. As soon as you join, you get full access to all course material — no need to wait for the live kickoff. You can start building right away. What You’ll Learn: 🧠 LLM Basics & Prompt Mastery Transformers, tokenization, and prompting that actually reduces hallucinations 🔍 Retrieval-Augmented GenerationChunking, embedding models, re-ranking, query rewriting, eval, and feedback loops 🎨 Fine-Tuning LoRA, adapters, and domain-specific models that actually perform 🤖 Tool & API Integration Function calling, external tools, and chained agent workflows 🚀 Deployment & Cost Control Gradio, Streamlit, latency fixes, caching, logging, monitoring, cost tracking 🏆 Capstone Project & Certification Build and ship your own LLM app — get feedback, and leave with a portfolio-ready build If you’re thinking, “This sounds great, but what if it’s not for me?” — we get it. That’s why the course comes with a 30-day, no-questions-asked money-back guarantee. Try it. Dive into the material. If it doesn’t meet your expectations, we’ll refund you in full. 👉 Secure your spot for the June 1st cohort Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI #master #llm #stack #hours #learnTOWARDSAI.NETMaster the LLM Stack in 60+ hours — learn, code, ship, and certifyAuthor(s): Towards AI Editorial Team Originally published on Towards AI. Over the past two years, we’ve helped teams design and deploy real-world LLM systems — RAG pipelines, copilots that actually reduce load, PoCs that became products, and cut down hallucinations. One year ago, we decided to put everything we knew about building architectures around LLMs — the stack, the mistakes, the gotchas, the strategies — into a single guide: Building LLMs for Production — “the most comprehensive textbook to date on building LLM applications,” as Jerry Liu (Founder & CEO, LlamaIndex) put it. The response was amazing. People read it. People built with it. People shared it. But a few months in, our DMs started filling up: “Has the book been updated?” “Does it cover the latest models like o3 or Gemini 2.5?” “Can I use [X vector DB] instead?” “How do I choose the right model for my use case?” “What if I want to do [insert new approach] that isn’t in the book?” Fair. The landscape’s shifting fast. Inference got scaled. SLMs showed up. Context windows stretched. Costs dropped. Everything moved. If AI has taught us anything, it’s to think AI-first — not just to keep up, but to build in ways that scale. So instead of answering each DM (which we still do), we took a step back. And we decided to build something that answers all of it, now and as things evolve. The result? From Beginner to Advanced LLM Developer A 60+ hour, hands-on course that takes you from “I can prompt ChatGPT” → to deploying a production-grade RAG system with a real front end. But we didn’t just pack it with knowledge — we designed it to evolve with the field. Here’s what you walk away with: ✅ A repeatable pipeline that adapts with tools, not trends ✅ A deep instinct for how to think like an AI engineer ✅ Lifetime access and weekly updates as the ecosystem changes ✅ A private Slack for graduates + a 70,000+ builder community on Discord Because staying current isn’t enough — you also need confidence that what you ship today still holds tomorrow. That’s why we’re now running monthly live cohorts — so you stay sharp, supported, and up to date. The next cohort kicks off June 1 with a live welcome call with our CEO. Launch price: $249 (75 % off) — zero risk thanks to a 30-day money-back guarantee. 👉 Join the course here The results speak for themselves: “The course greatly expanded my knowledge of building and assessing RAG pipelines.” — Eoin McGrath “Best course out there to become an AI engineer. Planning to build my own startup based on the learnings.” — Abhijit L. “From zero to hero as an LLM Developer — a clear path to build LLM applications that can change your career.” — Luca Tanieli Even industry voices you know have shared support: “A resource I’ll return to again and again, no matter how fast the AI landscape shifts.” — Tina Huang, Lonely Octopus This course is for you: You know Python but haven’t taken an LLM past the notebook. If you’re frustrated by shallow tutorials and fragmented docs… If you want to build things that work, not just read about them… If you’re ready to take LLMs seriously and want a proven structure… There’s a roadmap. And it’s working. The next cohort starts June 1st. As soon as you join, you get full access to all course material — no need to wait for the live kickoff. You can start building right away. What You’ll Learn: 🧠 LLM Basics & Prompt Mastery Transformers, tokenization, and prompting that actually reduces hallucinations 🔍 Retrieval-Augmented Generation (RAG) Chunking, embedding models, re-ranking, query rewriting, eval, and feedback loops 🎨 Fine-Tuning LoRA, adapters, and domain-specific models that actually perform 🤖 Tool & API Integration Function calling, external tools, and chained agent workflows 🚀 Deployment & Cost Control Gradio, Streamlit, latency fixes, caching, logging, monitoring, cost tracking 🏆 Capstone Project & Certification Build and ship your own LLM app — get feedback, and leave with a portfolio-ready build If you’re thinking, “This sounds great, but what if it’s not for me?” — we get it. That’s why the course comes with a 30-day, no-questions-asked money-back guarantee. Try it. Dive into the material. If it doesn’t meet your expectations, we’ll refund you in full. 👉 Secure your spot for the June 1st cohort Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI0 التعليقات 0 المشاركات -
The New AI Model Paradox: When “Upgrades” Feel Like Downgrades (Claude 3.7)
The New AI Model Paradox: When “Upgrades” Feel Like Downgrades (Claude 3.7)
0 like
May 14, 2025
Share this post
Last Updated on May 14, 2025 by Editorial Team
Author(s): Mandar Karhade, MD.
PhD.
Originally published on Towards AI.
The Allure of the New is misleading when the speed of development is too fast!
The tech world buzzes with excitement each time a new Artificial Intelligence (AI) model is unveiled.
We’re conditioned to expect these digital brains to be significantly faster, demonstrably smarter, and unequivocally better than their predecessors.
Companies fuel this anticipation with announcements of major breakthroughs, showcasing impressive demonstrations that promise to revolutionize how we work, create, and interact.
It’s easy to get swept up in this wave of optimism and believe that each new release is a universal leap forward.
But what happens when the shiny new AI, despite all the fanfare and positive press, doesn’t quite live up to the hype for your specific, day-to-day needs? What if, for your particular use case, it feels less like an upgrade and more like an unexpected step backward? This is a situation an increasing number of users are finding themselves in.
It’s a phenomenon that prompts a deeper reflection on what “better” truly means in the rapidly evolving landscape of AI and highlights that progress isn’t always a straight line for everyone.
Photo by Milad Fakurian on Unsplash
New AI models often launch accompanied by bold claims of vastly improved capabilities, frequently backed by strong performances on standardized benchmarks.
We see statistics showing the AI… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter.
Join over 80,000 subscribers and keep up to date with the latest developments in AI.
From research to projects and ideas.
If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
Source: https://towardsai.net/p/machine-learning/the-new-ai-model-paradox-when-upgrades-feel-like-downgrades-claude-3-7">https://towardsai.net/p/machine-learning/the-new-ai-model-paradox-when-upgrades-feel-like-downgrades-claude-3-7">https://towardsai.net/p/machine-learning/the-new-ai-model-paradox-when-upgrades-feel-like-downgrades-claude-3-7
#the #new #model #paradox #when #upgrades #feel #like #downgrades #claudeThe New AI Model Paradox: When “Upgrades” Feel Like Downgrades (Claude 3.7)The New AI Model Paradox: When “Upgrades” Feel Like Downgrades (Claude 3.7) 0 like May 14, 2025 Share this post Last Updated on May 14, 2025 by Editorial Team Author(s): Mandar Karhade, MD. PhD. Originally published on Towards AI. The Allure of the New is misleading when the speed of development is too fast! The tech world buzzes with excitement each time a new Artificial Intelligence (AI) model is unveiled. We’re conditioned to expect these digital brains to be significantly faster, demonstrably smarter, and unequivocally better than their predecessors. Companies fuel this anticipation with announcements of major breakthroughs, showcasing impressive demonstrations that promise to revolutionize how we work, create, and interact. It’s easy to get swept up in this wave of optimism and believe that each new release is a universal leap forward. But what happens when the shiny new AI, despite all the fanfare and positive press, doesn’t quite live up to the hype for your specific, day-to-day needs? What if, for your particular use case, it feels less like an upgrade and more like an unexpected step backward? This is a situation an increasing number of users are finding themselves in. It’s a phenomenon that prompts a deeper reflection on what “better” truly means in the rapidly evolving landscape of AI and highlights that progress isn’t always a straight line for everyone. Photo by Milad Fakurian on Unsplash New AI models often launch accompanied by bold claims of vastly improved capabilities, frequently backed by strong performances on standardized benchmarks. We see statistics showing the AI… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post Source: https://towardsai.net/p/machine-learning/the-new-ai-model-paradox-when-upgrades-feel-like-downgrades-claude-3-7 #the #new #model #paradox #when #upgrades #feel #like #downgrades #claudeTOWARDSAI.NETThe New AI Model Paradox: When “Upgrades” Feel Like Downgrades (Claude 3.7)The New AI Model Paradox: When “Upgrades” Feel Like Downgrades (Claude 3.7) 0 like May 14, 2025 Share this post Last Updated on May 14, 2025 by Editorial Team Author(s): Mandar Karhade, MD. PhD. Originally published on Towards AI. The Allure of the New is misleading when the speed of development is too fast! The tech world buzzes with excitement each time a new Artificial Intelligence (AI) model is unveiled. We’re conditioned to expect these digital brains to be significantly faster, demonstrably smarter, and unequivocally better than their predecessors. Companies fuel this anticipation with announcements of major breakthroughs, showcasing impressive demonstrations that promise to revolutionize how we work, create, and interact. It’s easy to get swept up in this wave of optimism and believe that each new release is a universal leap forward. But what happens when the shiny new AI, despite all the fanfare and positive press, doesn’t quite live up to the hype for your specific, day-to-day needs? What if, for your particular use case, it feels less like an upgrade and more like an unexpected step backward? This is a situation an increasing number of users are finding themselves in. It’s a phenomenon that prompts a deeper reflection on what “better” truly means in the rapidly evolving landscape of AI and highlights that progress isn’t always a straight line for everyone. Photo by Milad Fakurian on Unsplash New AI models often launch accompanied by bold claims of vastly improved capabilities, frequently backed by strong performances on standardized benchmarks. We see statistics showing the AI… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
The Hidden Cost of AI: The Workplace Penalty Nobody Saw Coming
The Hidden Cost of AI: The Workplace Penalty Nobody Saw Coming
0 like
May 13, 2025
Share this post
Last Updated on May 14, 2025 by Editorial Team
Author(s): MKWriteshere
Originally published on Towards AI.
Image Generated by Author Using Gpt-4o
Using AI may boost your productivity while secretly tanking your professional reputation.
New research reveals a troubling disconnect between AI’s benefits and how others perceive AI users, creating a workplace catch-22 no one prepared us for.
(Non-Member Link)
Imagine standing at a workplace crossroads.
One path offers a powerful tool that could dramatically boost your productivity.
The other preserves your professional image as competent and hardworking.
Surprisingly, you can’t have both.
This is the dilemma uncovered by a groundbreaking study published in PNAS by researchers from Duke University.
Through four experiments with over 4,400 participants, researchers revealed a social penalty for AI use: Individuals who use AI tools face negative judgments about their competence and motivation from others.
These judgments manifest as both anticipated and actual social penalties, creating a paradox where productivity-enhancing AI tools can simultaneously improve performance and damage one’s professional reputation.
The study’s lead author, Jessica A.
Reif, along with her colleagues Richard P.
Larrick and Jack B.
Soll, documented this surprising disconnect between AI’s practical benefits and its social costs.
Their findings expose a psychological barrier to AI adoption that has significant implications for workplaces rapidly integrating these technologies.
Image Generated by Author Using Gpt-4o
The research began by… Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter.
Join over 80,000 subscribers and keep up to date with the latest developments in AI.
From research to projects and ideas.
If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI - Medium
Share this post
Source: https://towardsai.net/p/artificial-intelligence/the-hidden-cost-of-ai-the-workplace-penalty-nobody-saw-coming" style="color: #0066cc;">https://towardsai.net/p/artificial-intelligence/the-hidden-cost-of-ai-the-workplace-penalty-nobody-saw-coming
#the #hidden #cost #workplace #penalty #nobody #saw #comingThe Hidden Cost of AI: The Workplace Penalty Nobody Saw ComingThe Hidden Cost of AI: The Workplace Penalty Nobody Saw Coming 0 like May 13, 2025 Share this post Last Updated on May 14, 2025 by Editorial Team Author(s): MKWriteshere Originally published on Towards AI. Image Generated by Author Using Gpt-4o Using AI may boost your productivity while secretly tanking your professional reputation. New research reveals a troubling disconnect between AI’s benefits and how others perceive AI users, creating a workplace catch-22 no one prepared us for. (Non-Member Link) Imagine standing at a workplace crossroads. One path offers a powerful tool that could dramatically boost your productivity. The other preserves your professional image as competent and hardworking. Surprisingly, you can’t have both. This is the dilemma uncovered by a groundbreaking study published in PNAS by researchers from Duke University. Through four experiments with over 4,400 participants, researchers revealed a social penalty for AI use: Individuals who use AI tools face negative judgments about their competence and motivation from others. These judgments manifest as both anticipated and actual social penalties, creating a paradox where productivity-enhancing AI tools can simultaneously improve performance and damage one’s professional reputation. The study’s lead author, Jessica A. Reif, along with her colleagues Richard P. Larrick and Jack B. Soll, documented this surprising disconnect between AI’s practical benefits and its social costs. Their findings expose a psychological barrier to AI adoption that has significant implications for workplaces rapidly integrating these technologies. Image Generated by Author Using Gpt-4o The research began by… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post Source: https://towardsai.net/p/artificial-intelligence/the-hidden-cost-of-ai-the-workplace-penalty-nobody-saw-coming #the #hidden #cost #workplace #penalty #nobody #saw #comingTOWARDSAI.NETThe Hidden Cost of AI: The Workplace Penalty Nobody Saw ComingThe Hidden Cost of AI: The Workplace Penalty Nobody Saw Coming 0 like May 13, 2025 Share this post Last Updated on May 14, 2025 by Editorial Team Author(s): MKWriteshere Originally published on Towards AI. Image Generated by Author Using Gpt-4o Using AI may boost your productivity while secretly tanking your professional reputation. New research reveals a troubling disconnect between AI’s benefits and how others perceive AI users, creating a workplace catch-22 no one prepared us for. (Non-Member Link) Imagine standing at a workplace crossroads. One path offers a powerful tool that could dramatically boost your productivity. The other preserves your professional image as competent and hardworking. Surprisingly, you can’t have both. This is the dilemma uncovered by a groundbreaking study published in PNAS by researchers from Duke University. Through four experiments with over 4,400 participants, researchers revealed a social penalty for AI use: Individuals who use AI tools face negative judgments about their competence and motivation from others. These judgments manifest as both anticipated and actual social penalties, creating a paradox where productivity-enhancing AI tools can simultaneously improve performance and damage one’s professional reputation. The study’s lead author, Jessica A. Reif, along with her colleagues Richard P. Larrick and Jack B. Soll, documented this surprising disconnect between AI’s practical benefits and its social costs. Their findings expose a psychological barrier to AI adoption that has significant implications for workplaces rapidly integrating these technologies. Image Generated by Author Using Gpt-4o The research began by… Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Towards AI - Medium Share this post0 التعليقات 0 المشاركات -
Merging Minds: How Neuroscience and AI Are Creating the Future of Intelligence
Author(s): Talha Nazar
Originally published on Towards AI.
Imagine a world where your thoughts can control machines.
You think, and a robotic arm moves.
You feel, and a digital avatar mimics your expression.
Sounds like science fiction, right? But this is no longer just an idea scribbled in a cyberpunk novel — it’s happening right now, at the intersection of neuroscience and artificial intelligence.
As someone who’s been closely following AI for years, I find this confluence between biology and code deeply fascinating.
It’s as if we’re uncovering a hidden mirror: AI reflects how we think, while neuroscience peels back the layers of what thinking even is.
In this story, we’ll journey from brainwaves to neural networks, exploring how scientists and engineers are blending biology with silicon to create machines that learn, adapt, and maybe one day, even feel.
The Brain as a Blueprint for Machines
Let’s start with a simple question: How did AI get so smart?
The answer lies partly in how closely it’s modeled after us.
When researchers first began building artificial intelligence, they didn’t pull the idea from thin air.
Instead, they looked inward — to the brain.
Our brains contain roughly 86 billion neurons, each connected to thousands of others, forming a massive web of electrical and chemical signals.
Early AI pioneers like Warren McCulloch and Walter Pitts were inspired by this structure.
In 1943, they introduced a computational model of a neuron, laying the groundwork for what would later become artificial neural networks.
Fast forward to today, and these neural networks form the backbone of AI systems like GPT, Siri, and autonomous cars.
While far simpler than a real brain, they mimic how we process information: through layers of pattern recognition, memory, and adjustment based on feedback.
“The brain is not a computer, but it teaches us how to build better ones.”
The parallels are stunning.
Just like we learn from experience, AI models use algorithms like backpropagation to tweak their internal weights — essentially fine-tuning their ‘memory’ to make better decisions over time.
Weirdly, it’s like machines are learning to think the way we do.
From Mirror Neurons to Machine Empathy
Here’s where things get even more sci-fi.
In 1992, neuroscientists in Italy discovered mirror neurons — special brain cells that activate both when we perform an action and when we observe someone else doing it.
It’s like your brain says, “Hey, I know what that feels like.” These neurons are believed to be central to empathy, learning by imitation, and even language acquisition.
Now, imagine giving machines a similar ability.
That’s precisely what researchers are trying to do.
AI systems like OpenAI’s CLIP or Google DeepMind’s Gato are trained across multiple modalities — text, images, audio, and more — to better understand human context and emotion.
Of course, machines don’t feel.
However, they can approximate emotional responses using vast datasets of human expression.
Think of AI-generated art that captures loneliness, or chatbots that recognize your tone and respond with sympathy.
Are they truly empathetic? Probably not.
But can they simulate empathy well enough to be helpful? Increasingly, yes.
And that opens up enormous potential — especially in fields like mental health, where AI tools could one day assist therapists by detecting early signs of distress in patients’ speech or facial expressions.
Brain-Computer Interfaces (BCIs): Reading Minds, Literally
Let’s go a step further.
What if machines didn’t just respond to your words or actions — what if they could read your thoughts?
That’s the promise of brain-computer interfaces (BCIs), a fast-growing area at the crossroads of neuroscience, AI, and hardware engineering.
Companies like Neuralink (yes, Elon Musk’s venture) are developing implantable devices that allow the brain to communicate directly with computers.
These chips record electrical signals from neurons and translate them into digital commands.
That means someone paralyzed could one day send emails or move a robotic arm — just by thinking.
Sounds incredible, right? But it’s not just Neuralink.
UC San Francisco researchers recently used AI to decode brain activity into speech in real time.
Meanwhile, non-invasive devices — like EEG headsets — are getting better at detecting focus, fatigue, and even emotional states.
This isn’t just about convenience — it could redefine accessibility, communication, and even what it means to be human.
Still, there are ethical challenges.
Who owns your neural data? Can it be hacked? And what happens if the interface misfires? These questions aren’t just theoretical.
As BCI tech scales, we’ll need policies to ensure it enhances autonomy rather than undermines it.
Where They Merge: Shared Architectures and Inspirations
As the convergence of AI and neuroscience deepens, we begin to see a fascinating blend of ideas and structures.
AI models inspired by the brain are not just theoretical anymore; they are real-world tools pushing the boundaries of what we thought possible.
Let’s break down some of the key areas where the two fields come together.
1.
Neural Networks & Deep Learning
When you look at deep learning models, you might notice something oddly familiar: the way they’re structured.
Although artificial neurons are simpler, they resemble biological neurons in some ways.
Deep learning models are designed with layers — just like the visual cortex in the human brain.
Early layers of neural networks detect basic features like edges, and as the network gets deeper, it begins to recognize complex patterns and objects.
This mimics the brain’s hierarchical system of processing information, starting from simple features and building up to complex recognition.
It’s this analogy that has led to breakthroughs like image recognition and language translation.
Illustration by Author — Napkin.ai
2.
Reinforcement Learning and Dopamine
Reinforcement learning (RL) is a type of machine learning where agents learn by interacting with an environment, making decisions, and receiving rewards.
This idea of learning through rewards and punishments draws directly from neuroscience.
In the brain, dopaminergic neurons play a huge role in reward-based learning.
The basal ganglia, a part of the brain involved in motor control and decision-making, is activated when we receive a reward.
Similarly, in reinforcement learning, an agent’s actions are reinforced based on a reward signal, guiding the system toward better choices over time.
Illustration by Author — Napkin.ai
3.
Memory and Attention Mechanisms
Have you ever wondered how we remember important details in a conversation or a lecture, despite distractions around us? That’s the power of attention mechanisms in the brain.
These mechanisms allow us to focus on the most relevant pieces of information and filter out the noise.
In AI, this is mimicked by models like Transformers, which have taken the machine-learning world by storm, particularly in natural language processing (NLP).
By paying attention to key parts of input data, Transformers can process entire sentences, paragraphs, or even entire documents to extract meaning more effectively.
It’s what powers tools like ChatGPT, Gemmni, Grok, Deepseek, and many others.
Illustration by Author — Napkin.ai
4.
Neuromorphic Computing
The field of neuromorphic computing is a fascinating intersection where hardware and brain science collide.
Neuromorphic chips are designed to replicate the brain’s efficiency and power in processing.
These chips aren’t just inspired by the brain’s architecture but also mimic the way the brain communicates via spiking neural networks, which process information in discrete pulses — similar to how neurons fire in the brain.
Companies like IBM with TrueNorth and Intel with Loihi are leading the way in neuromorphic chips, creating highly energy-efficient processors that can learn from their environments, much like a biological brain.
Illustration by Author — Napkin.ai
Top Impactful Applications of the AI-Neuroscience Merge
The possibilities that arise from the blending of AI and neuroscience are not just theoretical.
They’re already shaping the future, from the way we interface with machines to how we treat mental health.
Let’s explore some of the most groundbreaking applications.
1.
Brain-Computer Interfaces (BCIs)
If you’ve ever dreamed of controlling a machine with just your thoughts, then you’re in luck.
Brain-computer interfaces (BCIs) are making this possible.
Companies like Neuralink are developing technologies that allow individuals to control devices using only their brain signals.
For example, BCIs could allow someone paralyzed from the neck down to move a robotic arm or type with their mind.
The big breakthrough came in 2023 when Neuralink received FDA approval for human trials.
While this is a huge step forward, it’s only the beginning.
These technologies could revolutionize the way we interact with technology and provide life-changing solutions for people with disabilities.
2.
Mental Health Diagnosis and Treatment
We all know how complex mental health is.
But AI has started to play a pivotal role in helping us understand and treat mental illnesses.
Imagine an AI system that analyzes speech, text, and behavior to detect early signs of depression, anxiety, or even schizophrenia.
Neuroscience has validated these AI models by comparing them with brain imaging techniques like fMRI.
Recent studies have shown that combining fMRI scans with deep learning can predict suicidal ideation in individuals at risk, a breakthrough that could save countless lives.
3.
Brain-Inspired AI Models
AI is increasingly drawing inspiration from how the brain works.
For example, DeepMind’s AlphaFold revolutionized protein folding predictions, but its inspiration didn’t come solely from computers.
By studying how the brain processes information, DeepMind developed models that learn and adapt in ways similar to human cognition.
This approach has given birth to models like Gato, a single neural architecture capable of handling hundreds of tasks — just as the human brain can handle a wide array of functions with efficiency and ease.
4.
Neuroprosthetics
One of the most inspiring applications of AI in neuroscience is in neuroprosthetics.
These prosthetics enable people to control artificial limbs directly with their brain signals, bypassing the need for physical motion.
The DEKA Arm is an example of a prosthetic that allows people with paralysis to control their arms through neural input, helping them regain lost independence.
5.
Cognitive Simulation & Brain Mapping
Understanding the human brain in its entirety — from the smallest neuron to the largest cognitive functions — is one of the greatest challenges of modern science.
Projects like the Human Brain Project and Blue Brain Project aim to simulate entire regions of the brain using AI models trained on massive datasets.
These initiatives could unlock the mysteries of consciousness and cognition, making the human brain one of the most powerful tools in science.
The Future: Beyond the Intersection of AI and Neuroscience
The future of AI and neuroscience is incredibly exciting, and we’re only just scratching the surface.
As AI models become more advanced and neuroscience continues to uncover the brain’s mysteries, we’ll see more refined and powerful applications that can change our lives in unimaginable ways.
1.
Personalized Healthcare
Imagine a world where AI doesn’t just treat illnesses based on generalized data but tailors treatments to your unique brain structure.
With advances in neuroimaging and AI algorithms, personalized medicine could become a reality.
AI could analyze your brain’s unique structure and function to predict diseases like Alzheimer’s, Parkinson’s, or even mental health disorders, offering treatments designed specifically for you.
2.
AI-Augmented Cognition
In the distant future, we may see a world where AI enhances human cognition.
Augmenting our natural intelligence with AI-driven enhancements could help us solve complex problems faster and more accurately.
Whether it’s through direct brain interfaces or enhanced learning techniques, this fusion of AI and neuroscience could reshape human potential in ways we can’t even begin to fathom.
3.
Artificial Consciousness
At the intersection of AI and neuroscience, some are exploring the possibility of artificial consciousness — the idea that AI could one day become self-aware.
Though this concept is still very much in the realm of science fiction, the continued merging of AI and neuroscience might eventually lead to machines that can think, feel, and understand the world just as we do.
The ethical implications of such a development would be profound, but the pursuit of consciousness in AI is something many researchers are already investigating.
Conclusion
The merging of AI and neuroscience is not just a passing trend; it’s an ongoing revolution that promises to change the way we interact with machines, understand the brain, and even treat neurological conditions.
While AI has already made incredible strides, the integration of neuroscientific insights will accelerate these advancements, bringing us closer to a future where human and machine intelligence work together seamlessly.
With the potential to reshape everything from healthcare to personal cognition, the collaboration between AI and neuroscience is poised to transform both fields.
The journey ahead is long, but the possibilities are endless.
The brain — our most sophisticated and enigmatic organ — may soon be the blueprint for a new era of intelligence, both human and artificial.
References
Thank you for reading! If you enjoyed this story, please consider giving it a clap, leaving a comment to share your thoughts, and passing it along to friends or colleagues who might benefit.
Your support and feedback help me create more valuable content for everyone.
Join thousands of data leaders on the AI newsletter.
Join over 80,000 subscribers and keep up to date with the latest developments in AI.
From research to projects and ideas.
If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Source: https://towardsai.net/p/artificial-intelligence/merging-minds-how-neuroscience-and-ai-are-creating-the-future-of-intelligence" style="color: #0066cc;">https://towardsai.net/p/artificial-intelligence/merging-minds-how-neuroscience-and-ai-are-creating-the-future-of-intelligence
#merging #minds #how #neuroscience #and #are #creating #the #future #intelligenceMerging Minds: How Neuroscience and AI Are Creating the Future of IntelligenceAuthor(s): Talha Nazar Originally published on Towards AI. Imagine a world where your thoughts can control machines. You think, and a robotic arm moves. You feel, and a digital avatar mimics your expression. Sounds like science fiction, right? But this is no longer just an idea scribbled in a cyberpunk novel — it’s happening right now, at the intersection of neuroscience and artificial intelligence. As someone who’s been closely following AI for years, I find this confluence between biology and code deeply fascinating. It’s as if we’re uncovering a hidden mirror: AI reflects how we think, while neuroscience peels back the layers of what thinking even is. In this story, we’ll journey from brainwaves to neural networks, exploring how scientists and engineers are blending biology with silicon to create machines that learn, adapt, and maybe one day, even feel. The Brain as a Blueprint for Machines Let’s start with a simple question: How did AI get so smart? The answer lies partly in how closely it’s modeled after us. When researchers first began building artificial intelligence, they didn’t pull the idea from thin air. Instead, they looked inward — to the brain. Our brains contain roughly 86 billion neurons, each connected to thousands of others, forming a massive web of electrical and chemical signals. Early AI pioneers like Warren McCulloch and Walter Pitts were inspired by this structure. In 1943, they introduced a computational model of a neuron, laying the groundwork for what would later become artificial neural networks. Fast forward to today, and these neural networks form the backbone of AI systems like GPT, Siri, and autonomous cars. While far simpler than a real brain, they mimic how we process information: through layers of pattern recognition, memory, and adjustment based on feedback. “The brain is not a computer, but it teaches us how to build better ones.” The parallels are stunning. Just like we learn from experience, AI models use algorithms like backpropagation to tweak their internal weights — essentially fine-tuning their ‘memory’ to make better decisions over time. Weirdly, it’s like machines are learning to think the way we do. From Mirror Neurons to Machine Empathy Here’s where things get even more sci-fi. In 1992, neuroscientists in Italy discovered mirror neurons — special brain cells that activate both when we perform an action and when we observe someone else doing it. It’s like your brain says, “Hey, I know what that feels like.” These neurons are believed to be central to empathy, learning by imitation, and even language acquisition. Now, imagine giving machines a similar ability. That’s precisely what researchers are trying to do. AI systems like OpenAI’s CLIP or Google DeepMind’s Gato are trained across multiple modalities — text, images, audio, and more — to better understand human context and emotion. Of course, machines don’t feel. However, they can approximate emotional responses using vast datasets of human expression. Think of AI-generated art that captures loneliness, or chatbots that recognize your tone and respond with sympathy. Are they truly empathetic? Probably not. But can they simulate empathy well enough to be helpful? Increasingly, yes. And that opens up enormous potential — especially in fields like mental health, where AI tools could one day assist therapists by detecting early signs of distress in patients’ speech or facial expressions. Brain-Computer Interfaces (BCIs): Reading Minds, Literally Let’s go a step further. What if machines didn’t just respond to your words or actions — what if they could read your thoughts? That’s the promise of brain-computer interfaces (BCIs), a fast-growing area at the crossroads of neuroscience, AI, and hardware engineering. Companies like Neuralink (yes, Elon Musk’s venture) are developing implantable devices that allow the brain to communicate directly with computers. These chips record electrical signals from neurons and translate them into digital commands. That means someone paralyzed could one day send emails or move a robotic arm — just by thinking. Sounds incredible, right? But it’s not just Neuralink. UC San Francisco researchers recently used AI to decode brain activity into speech in real time. Meanwhile, non-invasive devices — like EEG headsets — are getting better at detecting focus, fatigue, and even emotional states. This isn’t just about convenience — it could redefine accessibility, communication, and even what it means to be human. Still, there are ethical challenges. Who owns your neural data? Can it be hacked? And what happens if the interface misfires? These questions aren’t just theoretical. As BCI tech scales, we’ll need policies to ensure it enhances autonomy rather than undermines it. Where They Merge: Shared Architectures and Inspirations As the convergence of AI and neuroscience deepens, we begin to see a fascinating blend of ideas and structures. AI models inspired by the brain are not just theoretical anymore; they are real-world tools pushing the boundaries of what we thought possible. Let’s break down some of the key areas where the two fields come together. 1. Neural Networks & Deep Learning When you look at deep learning models, you might notice something oddly familiar: the way they’re structured. Although artificial neurons are simpler, they resemble biological neurons in some ways. Deep learning models are designed with layers — just like the visual cortex in the human brain. Early layers of neural networks detect basic features like edges, and as the network gets deeper, it begins to recognize complex patterns and objects. This mimics the brain’s hierarchical system of processing information, starting from simple features and building up to complex recognition. It’s this analogy that has led to breakthroughs like image recognition and language translation. Illustration by Author — Napkin.ai 2. Reinforcement Learning and Dopamine Reinforcement learning (RL) is a type of machine learning where agents learn by interacting with an environment, making decisions, and receiving rewards. This idea of learning through rewards and punishments draws directly from neuroscience. In the brain, dopaminergic neurons play a huge role in reward-based learning. The basal ganglia, a part of the brain involved in motor control and decision-making, is activated when we receive a reward. Similarly, in reinforcement learning, an agent’s actions are reinforced based on a reward signal, guiding the system toward better choices over time. Illustration by Author — Napkin.ai 3. Memory and Attention Mechanisms Have you ever wondered how we remember important details in a conversation or a lecture, despite distractions around us? That’s the power of attention mechanisms in the brain. These mechanisms allow us to focus on the most relevant pieces of information and filter out the noise. In AI, this is mimicked by models like Transformers, which have taken the machine-learning world by storm, particularly in natural language processing (NLP). By paying attention to key parts of input data, Transformers can process entire sentences, paragraphs, or even entire documents to extract meaning more effectively. It’s what powers tools like ChatGPT, Gemmni, Grok, Deepseek, and many others. Illustration by Author — Napkin.ai 4. Neuromorphic Computing The field of neuromorphic computing is a fascinating intersection where hardware and brain science collide. Neuromorphic chips are designed to replicate the brain’s efficiency and power in processing. These chips aren’t just inspired by the brain’s architecture but also mimic the way the brain communicates via spiking neural networks, which process information in discrete pulses — similar to how neurons fire in the brain. Companies like IBM with TrueNorth and Intel with Loihi are leading the way in neuromorphic chips, creating highly energy-efficient processors that can learn from their environments, much like a biological brain. Illustration by Author — Napkin.ai Top Impactful Applications of the AI-Neuroscience Merge The possibilities that arise from the blending of AI and neuroscience are not just theoretical. They’re already shaping the future, from the way we interface with machines to how we treat mental health. Let’s explore some of the most groundbreaking applications. 1. Brain-Computer Interfaces (BCIs) If you’ve ever dreamed of controlling a machine with just your thoughts, then you’re in luck. Brain-computer interfaces (BCIs) are making this possible. Companies like Neuralink are developing technologies that allow individuals to control devices using only their brain signals. For example, BCIs could allow someone paralyzed from the neck down to move a robotic arm or type with their mind. The big breakthrough came in 2023 when Neuralink received FDA approval for human trials. While this is a huge step forward, it’s only the beginning. These technologies could revolutionize the way we interact with technology and provide life-changing solutions for people with disabilities. 2. Mental Health Diagnosis and Treatment We all know how complex mental health is. But AI has started to play a pivotal role in helping us understand and treat mental illnesses. Imagine an AI system that analyzes speech, text, and behavior to detect early signs of depression, anxiety, or even schizophrenia. Neuroscience has validated these AI models by comparing them with brain imaging techniques like fMRI. Recent studies have shown that combining fMRI scans with deep learning can predict suicidal ideation in individuals at risk, a breakthrough that could save countless lives. 3. Brain-Inspired AI Models AI is increasingly drawing inspiration from how the brain works. For example, DeepMind’s AlphaFold revolutionized protein folding predictions, but its inspiration didn’t come solely from computers. By studying how the brain processes information, DeepMind developed models that learn and adapt in ways similar to human cognition. This approach has given birth to models like Gato, a single neural architecture capable of handling hundreds of tasks — just as the human brain can handle a wide array of functions with efficiency and ease. 4. Neuroprosthetics One of the most inspiring applications of AI in neuroscience is in neuroprosthetics. These prosthetics enable people to control artificial limbs directly with their brain signals, bypassing the need for physical motion. The DEKA Arm is an example of a prosthetic that allows people with paralysis to control their arms through neural input, helping them regain lost independence. 5. Cognitive Simulation & Brain Mapping Understanding the human brain in its entirety — from the smallest neuron to the largest cognitive functions — is one of the greatest challenges of modern science. Projects like the Human Brain Project and Blue Brain Project aim to simulate entire regions of the brain using AI models trained on massive datasets. These initiatives could unlock the mysteries of consciousness and cognition, making the human brain one of the most powerful tools in science. The Future: Beyond the Intersection of AI and Neuroscience The future of AI and neuroscience is incredibly exciting, and we’re only just scratching the surface. As AI models become more advanced and neuroscience continues to uncover the brain’s mysteries, we’ll see more refined and powerful applications that can change our lives in unimaginable ways. 1. Personalized Healthcare Imagine a world where AI doesn’t just treat illnesses based on generalized data but tailors treatments to your unique brain structure. With advances in neuroimaging and AI algorithms, personalized medicine could become a reality. AI could analyze your brain’s unique structure and function to predict diseases like Alzheimer’s, Parkinson’s, or even mental health disorders, offering treatments designed specifically for you. 2. AI-Augmented Cognition In the distant future, we may see a world where AI enhances human cognition. Augmenting our natural intelligence with AI-driven enhancements could help us solve complex problems faster and more accurately. Whether it’s through direct brain interfaces or enhanced learning techniques, this fusion of AI and neuroscience could reshape human potential in ways we can’t even begin to fathom. 3. Artificial Consciousness At the intersection of AI and neuroscience, some are exploring the possibility of artificial consciousness — the idea that AI could one day become self-aware. Though this concept is still very much in the realm of science fiction, the continued merging of AI and neuroscience might eventually lead to machines that can think, feel, and understand the world just as we do. The ethical implications of such a development would be profound, but the pursuit of consciousness in AI is something many researchers are already investigating. Conclusion The merging of AI and neuroscience is not just a passing trend; it’s an ongoing revolution that promises to change the way we interact with machines, understand the brain, and even treat neurological conditions. While AI has already made incredible strides, the integration of neuroscientific insights will accelerate these advancements, bringing us closer to a future where human and machine intelligence work together seamlessly. With the potential to reshape everything from healthcare to personal cognition, the collaboration between AI and neuroscience is poised to transform both fields. The journey ahead is long, but the possibilities are endless. The brain — our most sophisticated and enigmatic organ — may soon be the blueprint for a new era of intelligence, both human and artificial. References Thank you for reading! If you enjoyed this story, please consider giving it a clap, leaving a comment to share your thoughts, and passing it along to friends or colleagues who might benefit. Your support and feedback help me create more valuable content for everyone. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI Source: https://towardsai.net/p/artificial-intelligence/merging-minds-how-neuroscience-and-ai-are-creating-the-future-of-intelligence #merging #minds #how #neuroscience #and #are #creating #the #future #intelligenceTOWARDSAI.NETMerging Minds: How Neuroscience and AI Are Creating the Future of IntelligenceAuthor(s): Talha Nazar Originally published on Towards AI. Imagine a world where your thoughts can control machines. You think, and a robotic arm moves. You feel, and a digital avatar mimics your expression. Sounds like science fiction, right? But this is no longer just an idea scribbled in a cyberpunk novel — it’s happening right now, at the intersection of neuroscience and artificial intelligence. As someone who’s been closely following AI for years, I find this confluence between biology and code deeply fascinating. It’s as if we’re uncovering a hidden mirror: AI reflects how we think, while neuroscience peels back the layers of what thinking even is. In this story, we’ll journey from brainwaves to neural networks, exploring how scientists and engineers are blending biology with silicon to create machines that learn, adapt, and maybe one day, even feel. The Brain as a Blueprint for Machines Let’s start with a simple question: How did AI get so smart? The answer lies partly in how closely it’s modeled after us. When researchers first began building artificial intelligence, they didn’t pull the idea from thin air. Instead, they looked inward — to the brain. Our brains contain roughly 86 billion neurons, each connected to thousands of others, forming a massive web of electrical and chemical signals. Early AI pioneers like Warren McCulloch and Walter Pitts were inspired by this structure. In 1943, they introduced a computational model of a neuron, laying the groundwork for what would later become artificial neural networks. Fast forward to today, and these neural networks form the backbone of AI systems like GPT, Siri, and autonomous cars. While far simpler than a real brain, they mimic how we process information: through layers of pattern recognition, memory, and adjustment based on feedback. “The brain is not a computer, but it teaches us how to build better ones.” The parallels are stunning. Just like we learn from experience, AI models use algorithms like backpropagation to tweak their internal weights — essentially fine-tuning their ‘memory’ to make better decisions over time. Weirdly, it’s like machines are learning to think the way we do. From Mirror Neurons to Machine Empathy Here’s where things get even more sci-fi. In 1992, neuroscientists in Italy discovered mirror neurons — special brain cells that activate both when we perform an action and when we observe someone else doing it. It’s like your brain says, “Hey, I know what that feels like.” These neurons are believed to be central to empathy, learning by imitation, and even language acquisition. Now, imagine giving machines a similar ability. That’s precisely what researchers are trying to do. AI systems like OpenAI’s CLIP or Google DeepMind’s Gato are trained across multiple modalities — text, images, audio, and more — to better understand human context and emotion. Of course, machines don’t feel. However, they can approximate emotional responses using vast datasets of human expression. Think of AI-generated art that captures loneliness, or chatbots that recognize your tone and respond with sympathy. Are they truly empathetic? Probably not. But can they simulate empathy well enough to be helpful? Increasingly, yes. And that opens up enormous potential — especially in fields like mental health, where AI tools could one day assist therapists by detecting early signs of distress in patients’ speech or facial expressions. Brain-Computer Interfaces (BCIs): Reading Minds, Literally Let’s go a step further. What if machines didn’t just respond to your words or actions — what if they could read your thoughts? That’s the promise of brain-computer interfaces (BCIs), a fast-growing area at the crossroads of neuroscience, AI, and hardware engineering. Companies like Neuralink (yes, Elon Musk’s venture) are developing implantable devices that allow the brain to communicate directly with computers. These chips record electrical signals from neurons and translate them into digital commands. That means someone paralyzed could one day send emails or move a robotic arm — just by thinking. Sounds incredible, right? But it’s not just Neuralink. UC San Francisco researchers recently used AI to decode brain activity into speech in real time. Meanwhile, non-invasive devices — like EEG headsets — are getting better at detecting focus, fatigue, and even emotional states. This isn’t just about convenience — it could redefine accessibility, communication, and even what it means to be human. Still, there are ethical challenges. Who owns your neural data? Can it be hacked? And what happens if the interface misfires? These questions aren’t just theoretical. As BCI tech scales, we’ll need policies to ensure it enhances autonomy rather than undermines it. Where They Merge: Shared Architectures and Inspirations As the convergence of AI and neuroscience deepens, we begin to see a fascinating blend of ideas and structures. AI models inspired by the brain are not just theoretical anymore; they are real-world tools pushing the boundaries of what we thought possible. Let’s break down some of the key areas where the two fields come together. 1. Neural Networks & Deep Learning When you look at deep learning models, you might notice something oddly familiar: the way they’re structured. Although artificial neurons are simpler, they resemble biological neurons in some ways. Deep learning models are designed with layers — just like the visual cortex in the human brain. Early layers of neural networks detect basic features like edges, and as the network gets deeper, it begins to recognize complex patterns and objects. This mimics the brain’s hierarchical system of processing information, starting from simple features and building up to complex recognition. It’s this analogy that has led to breakthroughs like image recognition and language translation. Illustration by Author — Napkin.ai 2. Reinforcement Learning and Dopamine Reinforcement learning (RL) is a type of machine learning where agents learn by interacting with an environment, making decisions, and receiving rewards. This idea of learning through rewards and punishments draws directly from neuroscience. In the brain, dopaminergic neurons play a huge role in reward-based learning. The basal ganglia, a part of the brain involved in motor control and decision-making, is activated when we receive a reward. Similarly, in reinforcement learning, an agent’s actions are reinforced based on a reward signal, guiding the system toward better choices over time. Illustration by Author — Napkin.ai 3. Memory and Attention Mechanisms Have you ever wondered how we remember important details in a conversation or a lecture, despite distractions around us? That’s the power of attention mechanisms in the brain. These mechanisms allow us to focus on the most relevant pieces of information and filter out the noise. In AI, this is mimicked by models like Transformers, which have taken the machine-learning world by storm, particularly in natural language processing (NLP). By paying attention to key parts of input data, Transformers can process entire sentences, paragraphs, or even entire documents to extract meaning more effectively. It’s what powers tools like ChatGPT, Gemmni, Grok, Deepseek, and many others. Illustration by Author — Napkin.ai 4. Neuromorphic Computing The field of neuromorphic computing is a fascinating intersection where hardware and brain science collide. Neuromorphic chips are designed to replicate the brain’s efficiency and power in processing. These chips aren’t just inspired by the brain’s architecture but also mimic the way the brain communicates via spiking neural networks, which process information in discrete pulses — similar to how neurons fire in the brain. Companies like IBM with TrueNorth and Intel with Loihi are leading the way in neuromorphic chips, creating highly energy-efficient processors that can learn from their environments, much like a biological brain. Illustration by Author — Napkin.ai Top Impactful Applications of the AI-Neuroscience Merge The possibilities that arise from the blending of AI and neuroscience are not just theoretical. They’re already shaping the future, from the way we interface with machines to how we treat mental health. Let’s explore some of the most groundbreaking applications. 1. Brain-Computer Interfaces (BCIs) If you’ve ever dreamed of controlling a machine with just your thoughts, then you’re in luck. Brain-computer interfaces (BCIs) are making this possible. Companies like Neuralink are developing technologies that allow individuals to control devices using only their brain signals. For example, BCIs could allow someone paralyzed from the neck down to move a robotic arm or type with their mind. The big breakthrough came in 2023 when Neuralink received FDA approval for human trials. While this is a huge step forward, it’s only the beginning. These technologies could revolutionize the way we interact with technology and provide life-changing solutions for people with disabilities. 2. Mental Health Diagnosis and Treatment We all know how complex mental health is. But AI has started to play a pivotal role in helping us understand and treat mental illnesses. Imagine an AI system that analyzes speech, text, and behavior to detect early signs of depression, anxiety, or even schizophrenia. Neuroscience has validated these AI models by comparing them with brain imaging techniques like fMRI. Recent studies have shown that combining fMRI scans with deep learning can predict suicidal ideation in individuals at risk, a breakthrough that could save countless lives. 3. Brain-Inspired AI Models AI is increasingly drawing inspiration from how the brain works. For example, DeepMind’s AlphaFold revolutionized protein folding predictions, but its inspiration didn’t come solely from computers. By studying how the brain processes information, DeepMind developed models that learn and adapt in ways similar to human cognition. This approach has given birth to models like Gato, a single neural architecture capable of handling hundreds of tasks — just as the human brain can handle a wide array of functions with efficiency and ease. 4. Neuroprosthetics One of the most inspiring applications of AI in neuroscience is in neuroprosthetics. These prosthetics enable people to control artificial limbs directly with their brain signals, bypassing the need for physical motion. The DEKA Arm is an example of a prosthetic that allows people with paralysis to control their arms through neural input, helping them regain lost independence. 5. Cognitive Simulation & Brain Mapping Understanding the human brain in its entirety — from the smallest neuron to the largest cognitive functions — is one of the greatest challenges of modern science. Projects like the Human Brain Project and Blue Brain Project aim to simulate entire regions of the brain using AI models trained on massive datasets. These initiatives could unlock the mysteries of consciousness and cognition, making the human brain one of the most powerful tools in science. The Future: Beyond the Intersection of AI and Neuroscience The future of AI and neuroscience is incredibly exciting, and we’re only just scratching the surface. As AI models become more advanced and neuroscience continues to uncover the brain’s mysteries, we’ll see more refined and powerful applications that can change our lives in unimaginable ways. 1. Personalized Healthcare Imagine a world where AI doesn’t just treat illnesses based on generalized data but tailors treatments to your unique brain structure. With advances in neuroimaging and AI algorithms, personalized medicine could become a reality. AI could analyze your brain’s unique structure and function to predict diseases like Alzheimer’s, Parkinson’s, or even mental health disorders, offering treatments designed specifically for you. 2. AI-Augmented Cognition In the distant future, we may see a world where AI enhances human cognition. Augmenting our natural intelligence with AI-driven enhancements could help us solve complex problems faster and more accurately. Whether it’s through direct brain interfaces or enhanced learning techniques, this fusion of AI and neuroscience could reshape human potential in ways we can’t even begin to fathom. 3. Artificial Consciousness At the intersection of AI and neuroscience, some are exploring the possibility of artificial consciousness — the idea that AI could one day become self-aware. Though this concept is still very much in the realm of science fiction, the continued merging of AI and neuroscience might eventually lead to machines that can think, feel, and understand the world just as we do. The ethical implications of such a development would be profound, but the pursuit of consciousness in AI is something many researchers are already investigating. Conclusion The merging of AI and neuroscience is not just a passing trend; it’s an ongoing revolution that promises to change the way we interact with machines, understand the brain, and even treat neurological conditions. While AI has already made incredible strides, the integration of neuroscientific insights will accelerate these advancements, bringing us closer to a future where human and machine intelligence work together seamlessly. With the potential to reshape everything from healthcare to personal cognition, the collaboration between AI and neuroscience is poised to transform both fields. The journey ahead is long, but the possibilities are endless. The brain — our most sophisticated and enigmatic organ — may soon be the blueprint for a new era of intelligence, both human and artificial. References Thank you for reading! If you enjoyed this story, please consider giving it a clap, leaving a comment to share your thoughts, and passing it along to friends or colleagues who might benefit. Your support and feedback help me create more valuable content for everyone. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI0 التعليقات 0 المشاركات
المزيد من المنشورات