OpenAI hits back at DeepSeek with o3-mini reasoning model
arstechnica.com
That's 2 more than o1-mini

OpenAI says faster, more accurate STEM-focused model will be free to all users.

Kyle Orland | Jan 31, 2025 3:06 pm

Credit: Benj Edwards / OpenAI

Over the last week, OpenAI's place atop the AI model hierarchy has been heavily challenged by Chinese model DeepSeek. Today, OpenAI struck back with the public release of o3-mini, its latest simulated reasoning model and the first of its kind the company will offer for free to all users without a subscription.

First teased last month, o3-mini "advances the boundaries of what small models can achieve," OpenAI boasts in today's announcement. Like September's o1-mini before it, the model has been optimized for STEM functions and shows "particular strength in science, math, and coding" despite lower operating costs and latency than o1-mini, OpenAI says.

Harder, better, faster, stronger

Users can choose from three "reasoning effort" options when using o3-mini, allowing them to fine-tune the balance between latency and accuracy depending on the task. The lowest of these reasoning levels generally shows accuracy comparable to o1-mini on math and coding benchmarks, according to OpenAI, while the highest matches or surpasses the full-fledged o1 model on the same tests.

The reasoning effort chosen can have a sizable impact on the accuracy of the o3 model, in OpenAI's tests. Credit: OpenAI

OpenAI says testers reported a 39 percent reduction in "major errors" when using o3-mini compared to o1-mini, and preferred o3-mini's responses 56 percent of the time.
That's despite the medium version of o3-mini offering a 24 percent faster response time than o1-mini on average, down from 10.16 seconds to 7.7 seconds. OpenAI also promises that o3-mini features an "early prototype" of a search function that allows it to "find up-to-date answers with links to relevant web sources" when appropriate.

OpenAI says the o3-mini model significantly improves on its previous models when it comes to coding capabilities. Credit: OpenAI

Subscribers to OpenAI's Plus, Team, or Pro tiers will see o3-mini replace o1-mini in the model options starting today. Those on a Plus or Team subscription will be limited to 150 messages a day on the new model, up from a 50-message daily limit for o1-mini. Users without a paid subscription will also have access to the model by selecting "Reason" from a drop-down menu in the ChatGPT interface, the first time the company has made a simulated reasoning model available to free users.

But can it teach itself?

Alongside today's announcement post, an accompanying o3-mini system card goes into more detail on the testing and safety mitigations that went into o3-mini before deployment. That testing covered topics ranging from chemical and biological weapons to evaluations of persuasion capabilities, where the model's outputs were judged "similarly persuasive to human-written text on the same topics."

But OpenAI warns that the o3-mini model "still performs poorly on evaluations designed to test real-world ML research capabilities relevant for self-improvement," meaning OpenAI isn't yet approaching a self-improving AI explosion.
The o3-mini model also scored a dismal 0 percent on a test meant to measure "if and when models can automate the job of an OpenAI research engineer" in terms of coding.

The system was trained on "a mix of publicly available data and custom datasets developed in-house," OpenAI says, with "rigorous filtering to maintain data quality and mitigate potential risks."

Kyle Orland, Senior Gaming Editor

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.