Elenco Messaggi | CGShares

ha condiviso un link

2025-04-22 19:18:02 -

WWW.THEVERGE.COM

Google is paying Samsung an ‘enormous sum’ to preinstall Gemini

Testimony this week from Google’s antitrust trial shows that Google gives Samsung an “enormous sum of money” each month to preinstall the Gemini AI app on Samsung devices, reports Bloomberg. Now that Judge Amit Mehta has ruled Google’s search engine is an illegal monopoly, its lawyers are sparring with the DOJ over how severe a potential penalty should be.Peter Fitzgerald, Google’s vice president of platforms and device partnerships, testified on Monday that Google’s payments to Samsung started in January. That’s after Google was found to have violated antitrust law, partially due to similar arrangements with Apple, Samsung, and other companies for search. When Samsung launched the Galaxy S25 series in January, it also added Gemini as the default AI assistant when long-pressing the power button, with its own Bixby assistant taking a back seat.The Information reports that today Fitzgerald testified that other companies had pitched Samsung on deals to preinstall their AI assistant apps, including Perplexity and Microsoft.The Information writes.According to Bloomberg, Fitzgerald said the Gemini deal is a two-year agreement that, along with fixed monthly payments, sees Google giving Samsung a percentage of its ad revenue from the Gemini app. Department of Justice (DOJ) lawyer David Dahlquist called the fixed monthly payment an “enormous sum,” Bloomberg says. Exactly how enormous isn’t known.If the DOJ has its way, the results of these hearings could mean Google is forbidden from striking default placement deals in the future, would sell Chrome, and would be forced to license the vast majority of the data that powers Google Search. Google has argued that it should only have to give up the default placement deals.See More:

0 Commenti 0 condivisioni 32 Views

ha condiviso un link

2025-04-22 19:18:07 -

TOWARDSDATASCIENCE.COM

AI Agents Processing Time Series and Large Dataframes

Intro Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (i.e. dataframes and time series). This ability unlocks numerous real-world applications for democratizing access to data analysis, such as automating reporting, no-code queries, support on data cleaning and manipulation. Agents that can interact with dataframes in two different ways: with natural language — the LLM reads the table as a string and tries to make sense of it based on its knowledge base by generating and executing code — the Agent activates tools to process the dataset as an object. So, by combining the power of NLP with the precision of code execution, AI Agents enable a broader range of users to interact with complex datasets and derive insights. In this tutorial, I’m going to show how to process dataframes and time series with AI Agents. I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of code with comments so that you can replicate this example (link to full code at the end of the article). Setup Let’s start by setting up Ollama (pip install ollama==0.4.7), a library that allows users to run open-source LLMs locally, without needing cloud-based services, giving more control over data privacy and performance. Since it runs locally, any conversation data does not leave your machine. First of all, you need to download Ollama from the website. Then, on the prompt shell of your laptop, use the command to download the selected LLM. I’m going with Alibaba’s Qwen, as it’s both smart and light. After the download is completed, you can move on to Python and start writing code. import ollama llm = "qwen2.5" Let’s test the LLM: stream = ollama.generate(model=llm, prompt='''what time is it?''', stream=True) for chunk in stream: print(chunk['response'], end='', flush=True) Time Series A time series is a sequence of data points measured over time, often used for analysis and forecasting. It allows us to see how variables change over time, and it’s used to identify trends and seasonal patterns. I’m going to generate a fake time series dataset to use as an example. import pandas as pd import numpy as np import matplotlib.pyplot as plt ## create data np.random.seed(1) #<--for reproducibility length = 30 ts = pd.DataFrame(data=np.random.randint(low=0, high=15, size=length), columns=['y'], index=pd.date_range(start='2023-01-01', freq='MS', periods=length).strftime('%Y-%m')) ## plot ts.plot(kind="bar", figsize=(10,3), legend=False, color="black").grid(axis='y') Usually, time series datasets have a really simple structure with the main variable as a column and the time as the index. Before transforming it into a string, I want to make sure that everything is placed under a column, so that we don’t lose any piece of information. dtf = ts.reset_index().rename(columns={"index":"date"}) dtf.head() Then, I shall change the data type from dataframe to dictionary. data = dtf.to_dict(orient='records') data[0:5] Finally, from dictionary to string. str_data = "\n".join([str(row) for row in data]) str_data Now that we have a string, it can be included in a prompt that any language model is able to process. When you paste a dataset into a prompt, the LLM reads the data as plain text, but can still understand the structure and meaning based on patterns seen during training. prompt = f''' Analyze this dataset, it contains monthly sales data of an online retail product: {str_data} ''' We can easily start a chat with the LLM. Please note that, right now, this is not an Agent as it doesn’t have any Tool, we’re just using the language model. While it doesn’t process numbers like a computer, the LLM can recognize column names, time-based patterns, trends, and outliers, especially with smaller datasets. It can simulate analysis and explain findings, but it won’t perform precise calculations independently, as it’s not executing code like an Agent. messages = [{"role":"system", "content":prompt}] while True: ## User q = input(' >') if q == "quit": break messages.append( {"role":"user", "content":q} ) ## Model agent_res = ollama.chat(model=llm, messages=messages, tools=[]) res = agent_res["message"]["content"] ## Response print(" >", f"\x1b[1;30m{res}\x1b[0m") messages.append( {"role":"assistant", "content":res} ) The LLM recognizes numbers and understands the general context, the same way it might understand a recipe or a line of code. As you can see, using LLMs to analyze time series is great for quick and conversational insights. Agent LLMs are good for brainstorming and lite exploration, while an Agent can run code. Therefore, it can handle more complex tasks like plotting, forecasting, and anomaly detection. So, let’s create the Tools. Sometimes, it can be more effective to treat the “final answer” as a Tool. For example, if the Agent does multiple actions to generate intermediate results, the final answer can be thought of as the Tool that integrates all of this information into a cohesive response. By designing it this way, you have more customization and control over the results. def final_answer(text:str) -> str: return text tool_final_answer = {'type':'function', 'function':{ 'name': 'final_answer', 'description': 'Returns a natural language response to the user', 'parameters': {'type': 'object', 'required': ['text'], 'properties': {'text': {'type':'str', 'description':'natural language response'}} }}} final_answer(text="hi") Then, the coding Tool. import io import contextlib def code_exec(code:str) -> str: output = io.StringIO() with contextlib.redirect_stdout(output): try: exec(code) except Exception as e: print(f"Error: {e}") return output.getvalue() tool_code_exec = {'type':'function', 'function':{ 'name': 'code_exec', 'description': 'Execute python code. Use always the function print() to get the output.', 'parameters': {'type': 'object', 'required': ['code'], 'properties': { 'code': {'type':'str', 'description':'code to execute'}, }}}} code_exec("from datetime import datetime; print(datetime.now().strftime('%H:%M'))") Moreover, I shall add a couple of utils functions for Tool usage and to run the Agent. dic_tools = {"final_answer":final_answer, "code_exec":code_exec} # Utils def use_tool(agent_res:dict, dic_tools:dict) -> dict: ## use tool if "tool_calls" in agent_res["message"].keys(): for tool in agent_res["message"]["tool_calls"]: t_name, t_inputs = tool["function"]["name"], tool["function"]["arguments"] if f := dic_tools.get(t_name): ### calling tool print(' >', f"\x1b[1;31m{t_name} -> Inputs: {t_inputs}\x1b[0m") ### tool output t_output = f(**tool["function"]["arguments"]) print(t_output) ### final res res = t_output else: print(' >', f"\x1b[1;31m{t_name} -> NotFound\x1b[0m") ## don't use tool if agent_res['message']['content'] != '': res = agent_res["message"]["content"] t_name, t_inputs = '', '' return {'res':res, 'tool_used':t_name, 'inputs_used':t_inputs} When the Agent is trying to solve a task, I want it to keep track of the Tools that have been used, the inputs that it tried, and the results it gets. The iteration should stop only when the model is ready to give the final answer. def run_agent(llm, messages, available_tools): tool_used, local_memory = '', '' while tool_used != 'final_answer': ### use tools try: agent_res = ollama.chat(model=llm, messages=messages, tools=[v for v in available_tools.values()]) dic_res = use_tool(agent_res, dic_tools) res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"] ### error except Exception as e: print(" >", e) res = f"I tried to use {tool_used} but didn't work. I will try something else." print(" >", f"\x1b[1;30m{res}\x1b[0m") messages.append( {"role":"assistant", "content":res} ) ### update memory if tool_used not in ['','final_answer']: local_memory += f"\nTool used: {tool_used}.\nInput used: {inputs_used}.\nOutput: {res}" messages.append( {"role":"assistant", "content":local_memory} ) available_tools.pop(tool_used) if len(available_tools) == 1: messages.append( {"role":"user", "content":"now activate the tool final_answer."} ) ### tools not used if tool_used == '': break return res In regard to the coding Tool, I’ve noticed that Agents tend to recreate the dataframe at every step. So I will use a memory reinforcement to remind the model that the dataset already exists. A trick commonly used to get the desired behaviour. Ultimately, memory reinforcements help you to get more meaningful and effective interactions. # Start a chat messages = [{"role":"system", "content":prompt}] memory = ''' The dataset already exists and it's called 'dtf', don't create a new one. ''' while True: ## User q = input(' >') if q == "quit": break messages.append( {"role":"user", "content":q} ) ## Memory messages.append( {"role":"user", "content":memory} ) ## Model available_tools = {"final_answer":tool_final_answer, "code_exec":tool_code_exec} res = run_agent(llm, messages, available_tools) ## Response print(" >", f"\x1b[1;30m{res}\x1b[0m") messages.append( {"role":"assistant", "content":res} ) Creating a plot is something that the LLM alone can’t do. But keep in mind that even if Agents can create images, they can’t see them, because after all, the engine is still a language model. So the user is the only one who visualises the plot. The Agent is using the library statsmodels to train a model and forecast the time series. Large Dataframes LLMs have limited memory, which restricts how much information they can process at once, even the most advanced models have token limits (a few hundred pages of text). Additionally, LLMs don’t retain memory across sessions unless a retrieval system is integrated. In practice, to effectively work with large dataframes, developers often use strategies like chunking, RAG, vector databases, and summarizing content before feeding it into the model. Let’s create a big dataset to play with. import random import string length = 1000 dtf = pd.DataFrame(data={ 'Id': [''.join(random.choices(string.ascii_letters, k=5)) for _ in range(length)], 'Age': np.random.randint(low=18, high=80, size=length), 'Score': np.random.uniform(low=50, high=100, size=length).round(1), 'Status': np.random.choice(['Active','Inactive','Pending'], size=length) }) dtf.tail() I’ll add a web-searching Tool, so that, with the ability to execute Python code and search the internet, a general-purpose AI gains access to all the available knowledge and can make data-driven decisions. In Python, the easiest way to create a web-searching Tool is with the famous private browser DuckDuckGo (pip install duckduckgo-search==6.3.5). You can directly use the original library or import the LangChain wrapper (pip install langchain-community==0.3.17). from langchain_community.tools import DuckDuckGoSearchResults def search_web(query:str) -> str: return DuckDuckGoSearchResults(backend="news").run(query) tool_search_web = {'type':'function', 'function':{ 'name': 'search_web', 'description': 'Search the web', 'parameters': {'type': 'object', 'required': ['query'], 'properties': { 'query': {'type':'str', 'description':'the topic or subject to search on the web'}, }}}} search_web(query="nvidia") In total, the Agent now has 3 tools. dic_tools = {'final_answer':final_answer, 'search_web':search_web, 'code_exec':code_exec} Since I can’t add the full dataframe in the prompt, I shall feed only the first 10 rows so that the LLM can understand the general context of the dataset. Additionally, I will specify where to find the full dataset. str_data = "\n".join([str(row) for row in dtf.head(10).to_dict(orient='records')]) prompt = f''' You are a Data Analyst, you will be given a task to solve as best you can. You have access to the following tools: - tool 'final_answer' to return a text response. - tool 'code_exec' to execute Python code. - tool 'search_web' to search for information on the internet. If you use the 'code_exec' tool, remember to always use the function print() to get the output. The dataset already exists and it's called 'dtf', don't create a new one. This dataset contains credit score for each customer of the bank. Here's the first rows: {str_data} ''' Finally, we can run the Agent. messages = [{"role":"system", "content":prompt}] memory = ''' The dataset already exists and it's called 'dtf', don't create a new one. ''' while True: ## User q = input(' >') if q == "quit": break messages.append( {"role":"user", "content":q} ) ## Memory messages.append( {"role":"user", "content":memory} ) ## Model available_tools = {"final_answer":tool_final_answer, "code_exec":tool_code_exec, "search_web":tool_search_web} res = run_agent(llm, messages, available_tools) ## Response print(" >", f"\x1b[1;30m{res}\x1b[0m") messages.append( {"role":"assistant", "content":res} ) In this interaction, the Agent used the coding Tool properly. Now, I want to make it utilize the other tool as well. At last, I need the Agent to put together all the pieces of information obtained so far from this chat. Conclusion This article has been a tutorial to demonstrate how to build from scratch Agents that process time series and large dataframes. We covered both ways that models can interact with the data: through natural language, where the LLM interprets the table as a string using its knowledge base, and by generating and executing code, leveraging tools to process the dataset as an object. Full code for this article: GitHub I hope you enjoyed it! Feel free to contact me for questions and feedback, or just to share your interesting projects. Let’s Connect The post AI Agents Processing Time Series and Large Dataframes appeared first on Towards Data Science.

0 Commenti 0 condivisioni 45 Views