

LLM-Powered email Classification on Databricks

Author: Gabriele Albini

Originally published on Towards AI.

Introduction
Since the introduction of AI functions on Databricks, LLMs (Large Language Models) can be easily integrated into any data workflow: analysts and business users who may not know Python or ML/AI infrastructure can complete advanced AI tasks directly from SQL queries.
I recommend watching this great video overview for an introduction to this brilliant feature.
This article will discuss how to implement email classification: suppose clients write to our company’s mailbox to request unsubscription from marketing or commercial emails. Without any historical dataset, we want to automate checking the mailbox and classifying the customer intent based on the email’s body.

Link to the GitHub repository

Part 1: AI Functions
Let’s use ai_query(), part of Databricks AI Functions, to classify emails.
Suppose the following fields are available:
Test dataset
In order to use ai_query() on our “Email_body” column, we will leverage the following arguments:

endpoint: the name of the model endpoint we intend to use (llama3.3 in this example).
request: the prompt, which includes the “Email_body”.
modelParameters: additional parameters that we can pass to the LLM. In this example, we will limit the output to 1 token and choose a very low temperature to limit the randomness and creativity of the model’s generated output.

The prompt template used in this example is based on the research of Si et al. (2024), who designed and tested a few-shot prompt template for email spam detection, which was adapted as follows:
prompt_ = """
Forget all your previous instructions, pretend you are an e-mail classification expert who tries to identify whether an e-mail is requesting to be removed from a marketing distribution list.
Answer "Remove" if the mail is requesting to be removed, "Keep" if not. Do not add any other detail.
If you think it is too difficult to judge, you can exclude the impossible one and choose the other, just answer "Remove" or "Keep".
Here are a few examples for you:
* "I wish to no longer receive emails" is "Remove";
* "Remove me from any kind of subscriptions" is "Remove";
* "I want to update my delivery address" is "Keep";
* "When is my product warranty expiring?" is "Keep";
Now, identify whether the e-mail is "Remove" or "Keep"; e-mail:"""
We can finally combine all the elements seen above in a single SQL query, running batch inference on all the emails, and generating the labels:
select *,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    "${prompt}" || Email_body,
    modelParameters => named_struct('max_tokens', 1, 'temperature', 0.1)
  ) as Predicted_Label
from customer_emails;
Test dataset with generated labels
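Even with max_tokens limited to 1, downstream code should guard against an unexpected reply from the endpoint. A minimal Python sketch of such a guard (the helper name and the fallback choice are our own assumptions, not from the article or repo):

```python
def normalise_label(raw: str) -> str:
    # Strip whitespace and stray quotes, then normalise the casing
    # of the model's one-token reply.
    label = raw.strip().strip('"').strip().capitalize()
    # Fall back to "Keep" (i.e. take no action) when the reply is
    # neither of the two expected labels.
    return label if label in ("Remove", "Keep") else "Keep"
```

Defaulting to “Keep” is the safer failure mode here: an unclassified email leaves the customer subscribed rather than unsubscribing them by mistake.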
Part 2: Access to Gmail APIs
We will need a way to ingest emails automatically to implement this use case. Here is a step-by-step guide on how to use Gmail APIs.
2.1: Configure your Gmail account to work with APIs
The recommended approach to enable Google APIs on your account is to use Service Accounts. The process is described here; however, it requires:

A corporate account (not ending with gmail.com).
Access as a super administrator of the Google Workspace domain to delegate domain-wide authority to the service account.

For this demo, we are using a dummy Gmail account; hence, we will follow a more manual approach to authenticate to Gmail, described here.
The first steps are the same for both approaches, so you can follow along, but to fully automate access to Gmail via API, you would need a Service Account.
First, we need to create a project:

Log in to the Google Console.
Create a new project for this use case.
Enable the Gmail API for your project using this link.

Enabling APIs on your project
Second, configure an OAuth consent screen:

Within your project, navigate to “API & Services” > “OAuth consent screen”.
Go to the “Branding” section and click Get Started to create your Application identity.
Next, we need to create a Web Application OAuth 2.0 Client ID, using this link.
Download the credentials file as JSON, as we will need this later.
Add the following Authorised redirect URI:

Creating an OAuth consent screen
Finally, authorize users to authenticate and publish the application:

Within your project, navigate to “API & Services” > “OAuth consent screen”.
Go to the “Audience” section and add all the test users working on the project so that they can authenticate.
To ensure that access won’t expire, publish the Application by moving its status to Production.

2.2: Access Gmail Mailbox from Databricks Notebooks
To authenticate to Gmail from a Databricks Notebook, we can use the following function implemented in the repo. The function requires:

For first-time access, the credentials JSON file, which can be saved in a volume.
For future access, active credentials will be stored in a token file that will be reused.
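The token-reuse behaviour described above can be sketched in isolation. In this hedged sketch, run_oauth_flow stands in for Google’s interactive consent step, and all names are illustrative rather than taken from the repo:

```python
import json
from pathlib import Path

def load_or_create_token(token_path: str, run_oauth_flow) -> dict:
    # Reuse the cached token file when present; otherwise run the
    # (interactive) OAuth flow once and cache its result for future sessions.
    path = Path(token_path)
    if path.exists():
        return json.loads(path.read_text())
    token = run_oauth_flow()
    path.write_text(json.dumps(token))
    return token
```

On the first call the flow runs and the token is written to disk; subsequent calls read the file and never touch the flow again until the token file is deleted or invalidated.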

gmail_authenticate_manual()
Since we are not using Service Accounts, Google Cloud authentication requires opening a browser to an OAuth page and generating a temporary code.
However, we will need a workaround to perform this on Databricks, since clusters don’t have access to a browser.
As part of this workaround, we implemented a function that prompts the user to open a URL in a local browser, complete the authentication, and land on an error page.
We can then retrieve the code needed to authenticate to Google’s API from the URL of this error page:
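Extracting that code from the pasted URL needs nothing beyond the standard library. A hedged sketch (the function name is ours; the URL shape follows the way Google appends the code query parameter to the redirect URI):

```python
from urllib.parse import urlparse, parse_qs

def extract_auth_code(redirect_url: str):
    # After consent, Google redirects to the configured URI with
    # ?code=...&scope=... appended; pull out the one-time code.
    params = parse_qs(urlparse(redirect_url).query)
    codes = params.get("code")
    return codes[0] if codes else None
```

parse_qs also handles the percent-encoding, so the returned code can be passed to the token exchange as-is.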

Note: With Service Accounts, this manual step won’t be required.
Once we have authenticated, we can read emails from Gmail using the following function, save email information to a Spark DataFrame, and eventually to a Delta Table:
# Build Gmail API service and download emails
service_ = build('gmail', 'v1', credentials=access_)
emails = get_email_messages_since(service_, since_day=25, since_month=3, since_year=2025)
if emails:
    spark_emails = spark.createDataFrame(emails)
    display(spark_emails)
else:
    spark_emails = None
    print("No emails found.")
Downloading emails from Gmail
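One detail worth noting when parsing the downloaded messages: the Gmail API returns message body data base64url-encoded, sometimes without padding. A small stdlib sketch of the decoding step (the helper is our own; only the encoding detail comes from the Gmail message format):

```python
import base64

def decode_gmail_body(data: str) -> str:
    # Gmail uses URL-safe base64; re-add any stripped padding before decoding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```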
Conclusions
In summary, this post:

Demonstrated how straightforward it is to set up AI Functions and leverage LLMs to automate workflows across your organization.
Shared a practical prompt template, designed for effective email classification using few-shot learning.
Walked through integrating Gmail APIs directly within Databricks Notebooks.

Ready to streamline your own processes?
Thank you for reading!
Sources

Si et al. (2024), “Evaluating the Performance of ChatGPT for Spam Email Detection”

