
Anthropic launches a new AI model that thinks as long as you want
Anthropic is releasing a new frontier AI model called Claude 3.7 Sonnet, which the company designed to "think" about questions for as long as users want it to.

Anthropic calls Claude 3.7 Sonnet the industry's first "hybrid AI reasoning model," because it's a single model that can give both real-time answers and more considered, "thought-out" answers to questions. Users can choose whether to activate the AI model's reasoning abilities, which prompt Claude 3.7 Sonnet to "think" for a short or long period of time.

The model represents Anthropic's broader effort to simplify the user experience around its AI products. Most AI chatbots today have a daunting model picker that forces users to choose from several different options that vary in cost and capability. Labs like Anthropic would rather you not have to think about it; ideally, one model does all the work.

Claude 3.7 Sonnet is rolling out to all users and developers on Monday, Anthropic said, but only users paying for Anthropic's premium Claude chatbot plans will get access to the model's reasoning features. Free Claude users will get the standard, non-reasoning version of Claude 3.7 Sonnet, which Anthropic claims outperforms its previous frontier AI model, Claude 3.5 Sonnet. (Yes, the company skipped a number.)

Claude 3.7 Sonnet costs $3 per million input tokens (meaning you could enter roughly 750,000 words, more words than the entire "Lord of the Rings" series, into Claude for $3) and $15 per million output tokens. That makes it more expensive than OpenAI's o3-mini ($1.10 per 1M input tokens / $4.40 per 1M output tokens) and DeepSeek's R1 ($0.55 per 1M input tokens / $2.19 per 1M output tokens), but keep in mind that o3-mini and R1 are strictly reasoning models, not hybrids like Claude 3.7 Sonnet.

Anthropic's new thinking modes (Image Credits: Anthropic)

Claude 3.7 Sonnet is Anthropic's first AI model that can "reason," a technique many AI labs have turned to as traditional methods of improving AI performance taper off.

Reasoning models like o3-mini, R1, Google's Gemini 2.0 Flash Thinking, and xAI's Grok 3 (Think) use more time and computing power before answering questions. The models break problems down into smaller steps, which tends to improve the accuracy of the final answer. Reasoning models aren't thinking or reasoning like a human would, necessarily, but their process is modeled after deduction.

Eventually, Anthropic would like Claude to figure out how long it should "think" about questions on its own, without needing users to select controls in advance, Anthropic's product and research lead, Diane Penn, told TechCrunch in an interview.

"Similar to how humans don't have two separate brains for questions that can be answered immediately versus those that require thought," Anthropic wrote in a blog post shared with TechCrunch, "we regard reasoning as simply one of the capabilities a frontier model should have, to be smoothly integrated with other capabilities, rather than something to be provided in a separate model."

Anthropic says it's allowing Claude 3.7 Sonnet to show its internal planning phase through a "visible scratch pad." Penn told TechCrunch users will see Claude's full thinking process for most prompts, but that some portions may be redacted for trust and safety purposes.

Claude's thinking process in the Claude app (Credit: Anthropic)

Anthropic says it optimized Claude 3.7 Sonnet's thinking modes for real-world tasks, such as difficult coding problems or agentic tasks. Developers tapping Anthropic's API can control the "budget" for thinking, trading speed and cost for quality of answer.
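For developers curious what that budget control looks like in practice, here is a minimal sketch against Anthropic's Messages API. The model ID and the `thinking` / `budget_tokens` parameters follow the extended-thinking documentation Anthropic published alongside the launch, and the prompt is a placeholder; treat the specific values as illustrative rather than authoritative.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,  # must be larger than the thinking budget below
    # Extended thinking: budget_tokens caps how many tokens Claude may
    # spend reasoning before it starts writing the final answer. Thinking
    # tokens bill as output, so at the $15-per-million-output-tokens rate
    # above, this 8,000-token budget adds at most about $0.12 per request.
    thinking={"type": "enabled", "budget_tokens": 8_000},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

# The response interleaves "thinking" blocks (the visible scratch pad)
# with ordinary "text" blocks containing the final answer.
for block in response.content:
    if block.type == "thinking":
        print("[scratch pad]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)
```

Raising the budget buys more deliberation at the cost of latency and output-token spend; leaving the `thinking` parameter out entirely yields the real-time, non-reasoning behavior that free users get.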
On one test to measure real-world coding tasks, SWE-Bench, Claude 3.7 Sonnet was 62.3% accurate, compared to OpenAI's o3-mini model, which scored 49.3%. On another test to measure an AI model's ability to interact with simulated users and external APIs in a retail setting, TAU-Bench, Claude 3.7 Sonnet scored 81.2%, compared to OpenAI's o1 model, which scored 73.5%.

Anthropic also says Claude 3.7 Sonnet will refuse to answer questions less often than its previous models, claiming the model is capable of making more nuanced distinctions between harmful and benign prompts. Anthropic says it reduced unnecessary refusals by 45% compared to Claude 3.5 Sonnet. This comes at a time when some other AI labs are rethinking their approach to restricting their AI chatbots' answers.

In addition to Claude 3.7 Sonnet, Anthropic is also releasing an agentic coding tool called Claude Code. Launching as a research preview, the tool lets developers run specific tasks through Claude directly from their terminal.

In a demo, Anthropic employees showed how Claude Code can analyze a coding project with a simple command such as, "Explain this project structure." Using plain English in the command line, a developer can modify a codebase. Claude Code will describe its edits as it makes changes, and even test a project for errors or push it to a GitHub repository.

Claude Code will initially be available to a limited number of users on a first-come, first-served basis, an Anthropic spokesperson told TechCrunch.

Anthropic is releasing Claude 3.7 Sonnet at a time when AI labs are shipping new AI models at a breakneck pace. Anthropic has historically taken a more methodical, safety-focused approach. But this time, the company's looking to lead the pack.

For how long is the question. OpenAI may be close to releasing a hybrid AI model of its own; the company's CEO, Sam Altman, has said it'll arrive in "months."