TAI #139: LLM Adoption; Anthropic Measures Use Cases; OpenAI API Traffic Up 7x in 2024
Author(s): Towards AI Editorial Team

Originally published on Towards AI.

What happened this week in AI by Louie

This week, Google DeepMind expanded access to Gemini 2.0, OpenAI increased transparency into ChatGPT's reasoning and thinking steps, and Mistral launched its rapid AI assistant app. OpenAI was also subject to a surprise $97.4bn hostile takeover offer from Elon Musk to acquire OpenAI's assets from its parent charity (competing with Sam Altman's own proposal to acquire the charity's assets). Meanwhile, we learned ChatGPT's weekly active users tripled in 2024, jumping from 110 million to 350 million, while paid subscribers nearly tripled as well, reaching 15.8 million. OpenAI's API traffic also surged from 200 million to 1.4 billion tokens per minute (a 7x increase), reflecting the growing demand for AI-powered workflows. Against this backdrop of accelerating adoption, Anthropic's latest study provides the first large-scale empirical measurement of how AI is actually being used across the economy.

Anthropic analyzed four million Claude conversations using an LLM agent to directly track how AI is used across different jobs and tasks (a minimal sketch of this kind of classification pipeline appears below). The data makes it clear that Claude usage is highly concentrated in software-related professions, which account for 37.2% of all Claude interactions. The next largest category, writing and media occupations, accounts for just 10.3%, with other professional domains trailing further behind. AI use is almost nonexistent in jobs requiring physical labor, such as construction, transportation, and healthcare support. The extent of use within any given job also remains highly uneven: about 36% of occupations show AI being used for at least 25% of their associated tasks, but only 4% have AI usage covering 75% or more of their tasks.

The types of tasks AI is being used for reflect its current strengths. Cognitive skills such as reading comprehension, writing, and critical thinking dominate AI-assisted work, whereas physical and managerial skills like installation, equipment maintenance, and negotiation see minimal AI involvement. Writing-heavy professions, including technical writing, copywriting, and even archival work, account for nearly 50% of AI usage.

Across all analyzed conversations, 43% of AI interactions involved full automation, where the model performed a task with little or no human involvement (though we note these responses may still have been edited by users outside the chat window). The remaining 57% of interactions suggested augmentation, where AI was used to refine, iterate, or enhance human work rather than replace it outright. This suggests that, at least for now, AI is complementing work rather than fully automating job roles. Automation was most prevalent in directive tasks, such as formatting documents or generating marketing copy, whereas augmentation dominated coding and debugging workflows, where users iterated with AI to resolve errors and refine solutions.

Why should you care?

For AI developers and businesses, this study provides a roadmap for where the next wave of adoption is likely to occur. Software and technical writing tasks are already AI-heavy, but structured professions with lower AI adoption, such as finance, legal work, and research, are well-positioned for further integration. Meanwhile, the steady increase in automation suggests that customer support, routine business writing, and structured data analysis will likely see further AI-driven efficiency gains.
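Anthropic has not published the study's exact pipeline, but the core idea, using an LLM as a judge to map each conversation onto an occupational task taxonomy plus an automation-versus-augmentation label, can be sketched in a few lines. Below is a minimal, hypothetical Python sketch using the Anthropic SDK; the task list, prompt wording, and model alias are our own illustrative assumptions, not the study's actual code.

```python
# Illustrative sketch only: Anthropic has not released the study's pipeline.
# The taxonomy snippet, prompt, and model alias here are assumptions.
import json

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Tiny stand-in for the O*NET-style task taxonomy the study maps usage onto.
TASKS = [
    "Software development: write, debug, or review code",
    "Writing and media: draft or edit articles, copy, or scripts",
    "Business analysis: analyze data or draft reports",
    "Other / not classifiable",
]

def classify_conversation(transcript: str) -> dict:
    """Label one conversation with a task category and a collaboration
    pattern (full automation vs. augmentation), returned as a dict."""
    prompt = (
        "Classify the conversation below.\n"
        f"Task categories: {json.dumps(TASKS)}\n"
        'Collaboration patterns: ["automation", "augmentation"]\n'
        'Reply with only a JSON object: {"task": "...", "pattern": "..."}\n\n'
        f"Conversation:\n{transcript}"
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model alias
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model followed the JSON-only instruction; production code
    # would validate the output and retry on malformed responses.
    return json.loads(response.content[0].text)

print(classify_conversation(
    "user: my pytest fixture keeps failing...\nassistant: try scope='module'..."
))
```

Aggregated over millions of transcripts, labels like these are what produce the occupation-level usage shares and the 43%/57% automation-augmentation split reported above.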
We note that OpenAI's API usage grew even faster than its ChatGPT user base in 2024, and in part, this reflects the need for extra development and customization of LLMs (by external LLM developers!) to achieve the necessary performance and reliability for many task categories. As AI's role in economic workflows continues to evolve, empirical studies like this one will become increasingly valuable for tracking its real-world impact.

Hottest News

1. Gemini 2.0 Is Now Available to Everyone

Google DeepMind has launched the updated Gemini 2.0 Flash and an experimental Gemini 2.0 Pro, offering improved performance for developers via Google AI Studio and Vertex AI. The release also includes 2.0 Flash-Lite for cost efficiency and reinforces safety with new reinforcement learning techniques, underscoring Gemini's enhanced capabilities for multimodal reasoning and coding.

2. Elon Musk's Consortium Makes $97.4B Bid to Acquire OpenAI, Disrupting Altman's For-Profit Transition Plan

A consortium led by Elon Musk made a surprise $97.4 billion offer to purchase all assets of OpenAI, Inc. from the OpenAI parent charity, with funds to be used exclusively to further OpenAI, Inc.'s original charitable mission. Recent reports suggest Sam Altman was offering the OpenAI charity a 25% equity stake in a new for-profit OpenAI entity to simplify its complicated structure. OpenAI was then planning to raise $40bn at a $260bn pre-money valuation as a regular for-profit company. The OpenAI charity's assets consist of complex rights to OpenAI's business profits, which become much more material after its investors have been repaid and AGI has been achieved. Elon's new, higher bid complicates the current plan to transform OpenAI into a for-profit entity.

3. OpenAI Now Reveals More of Its o3-mini Model's Thought Process

OpenAI announced that free and paid users of ChatGPT, the company's AI-powered chatbot platform, will see an updated chain of thought that shows more of the model's reasoning steps and how it arrived at answers to questions. According to OpenAI, subscribers to premium ChatGPT plans who use o3-mini in the high reasoning configuration will also see this updated readout.

4. Mistral Releases Its AI Assistant on iOS and Android

Mistral is releasing several updates to its AI assistant, Le Chat. In addition to a major web interface upgrade, the company is releasing a mobile app on iOS and Android. The mobile app features the usual chatbot interface: you can query Mistral's AI model and ask follow-up questions in a simple conversation-like interface.

5. GitHub Introduced Agent Mode for GitHub Copilot

GitHub introduced an agent mode for GitHub Copilot in VS Code. This new agent mode can iterate on its own code, recognize errors, and fix them automatically. It can suggest terminal commands and ask you to execute them. It can also analyze run-time errors with self-healing capabilities.

6. AI-Designed Proteins Take On Deadly Snake Venom

AI-driven research at the University of Washington created synthetic proteins that neutralize deadly snake venoms. Using NVIDIA GPUs, the team's AI models swiftly identified effective antitoxin designs. These proteins may offer a low-cost, quickly produced treatment, potentially transforming snakebite care worldwide and offering hope for other medical conditions.

Five 5-minute reads/videos to keep you learning

1. Open-Source DeepResearch

This article walks you through building an open-source alternative to OpenAI's Deep Research.
The resulting system browses the web to summarize content and answers questions based on the summaries.

2. Achieve OpenAI o1-mini-Level Reasoning with Open-Source Models

The article explores using open-source models to achieve OpenAI o1-mini-level reasoning. It examines benchmarking results and techniques like distillation, supervised fine-tuning, and retrieval-augmented generation (RAG), and shows how, with the proper optimizations, open-source LLMs can approach o1-mini-level performance.

3. The DeepSeek Effect: Why Your Company Needs an AI Usage Policy and How to Create One

The emergence of free and readily available AI models, like DeepSeek, offers impressive capabilities. However, their aggressive user data collection practices and security test failures are a stark warning. This article explores why this is a major threat to your company's data, intellectual property, and security compliance, and how to navigate it.

4. Constitutional Classifiers: Defending Against Universal Jailbreaks

Anthropic's new Constitutional Classifiers method robustly defends AI models against universal jailbreaks, reducing jailbreak success rates from 86% to 4.4% with minimal additional refusals and compute overhead. Participants testing the system's robustness failed to achieve universal jailbreaks, showing the classifiers' effectiveness in safeguarding AI while maintaining practicality.

5. How to Scale Your Model: A Systems View of LLMs on TPUs

Authors from Google DeepMind discuss scaling language models on TPUs, explaining model optimization, hardware communication, and parallelization techniques. They also explore the Transformer architecture, TPU internals, and parallelism methods for achieving strong scaling without becoming communication-bound.

Repositories & Tools

1. GPT Researcher is an autonomous agent designed for comprehensive web and local research on any given task.

2. Chatbox is a desktop client for ChatGPT, Claude, and other LLMs, available on Windows, Mac, and Linux.

3. Hugging Face open-sourced DABStep, a multi-step reasoning benchmark consisting of 450 tasks. It is designed to evaluate the capabilities of state-of-the-art LLMs and AI agents.

Top Papers of The Week

1. DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

DeepRAG enhances retrieval-augmented reasoning by modeling it as a Markov Decision Process, improving accuracy. By iteratively decomposing queries, it strategically decides between retrieving external knowledge and relying on parametric reasoning, optimizing retrieval efficiency and minimizing noise. DeepRAG effectively addresses challenges in integrating reasoning and retrieval for large language models.

2. 1.58-bit FLUX

Researchers introduced 1.58-bit FLUX, a method for quantizing the FLUX.1-dev text-to-image model using 1.58-bit weights, improving efficiency without sacrificing quality for 1024 x 1024 image generation. Their approach, relying on model self-supervision, achieves a 7.7x storage reduction, a 5.1x inference memory reduction, and improved latency, as verified on the GenEval and T2I-CompBench benchmarks.

3. LIMO: Less Is More for Reasoning

LIMO challenges conventional beliefs about reasoning in language models by achieving high accuracy on mathematical tasks with minimal training data. Using only 817 samples, it achieves 57.1% on AIME and 94.8% on MATH, outperforming previous models. The Less-Is-More Reasoning Hypothesis suggests complex reasoning emerges from well-pre-trained models and effective cognitive templates.
4. Efficient Memory Management for Large Language Model Serving with PagedAttention

This paper proposes PagedAttention, an attention algorithm inspired by the classical virtual memory and paging techniques in operating systems. On top of it, the researchers build vLLM, an LLM serving system that achieves near-zero waste in KV cache memory and flexible sharing of the KV cache within and across requests to reduce memory usage further.

5. SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model

SmolLM2, a new small language model with 1.7 billion parameters, achieves superior performance by overtraining on 11 trillion tokens of diverse data, including specialized datasets like FineMath and Stack-Edu. Using an iterative dataset refinement process, SmolLM2 surpasses recent models such as Qwen2.5-1.5B and Llama3.2-1B.

Quick Links

1. Reuters reported that Ilya Sutskever's Safe Superintelligence Inc. is in talks to raise funding at a valuation of at least $20 billion. This is a significant jump from its $1bn raise at a $5bn valuation only 5 months ago. The company has not publicly disclosed any of its research or progress, and it does not plan to release any products before superintelligence.

2. The European Union has published guidance on what constitutes an AI system under its new AI Act. Determining whether a particular software system falls within the act's scope will be a key consideration for AI developers, with the risk of fines of up to 7% of global annual turnover for breaches.

3. Meta allegedly downloaded over 81.7TB of pirated books to train its AI models, according to authors in a lawsuit. The plaintiffs claim Meta used torrent sites to acquire copyrighted material without permission. This adds to growing concerns about AI companies exploiting copyrighted content for model training.

4. Sam Altman mentioned that an internal model now ranks as the 50th best in the world on Codeforces, vs. 175th for o3 in December. This suggests its Codeforces Elo has climbed to ~3045, vs. 2727 for o3 (which used parallel samples) and 808 for 4o less than a year ago. It is unclear whether this result is a later checkpoint of o3 after more RL steps or a coding agent (a Deep Research-style specialist built via further reinforcement learning), but either way, there is no dead end yet for the new LLM RL paradigm.

Who's Hiring in AI

AI Workflow Specialist @Power Digital (US/Remote)
AI Test Engineer @Noblis (Washington, DC, USA)
AI Fullstack Engineer @Insight Global (Portland, OR, USA)
R&D GenAI Product Owner @Sanofi Group (Paris, France/Hybrid)
Data Scientist (NLP) @Binance (Remote)
Contract Fullstack Developer (Python/React) @The Motley Fool (US/Remote)
Research Data Analyst @Stanford University (USA/Remote)

Interested in sharing a job opportunity here? Contact [emailprotected].

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI