Researchers created an open rival to OpenAI's o1 reasoning model for under $50
techcrunch.com
AI researchers at Stanford and the University of Washington were able to train an AI "reasoning" model for under $50 in cloud compute credits, according to a new research paper released last Friday.

The model, known as s1, performs similarly to cutting-edge reasoning models, such as OpenAI's o1 and DeepSeek's R1, on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.

The team behind s1 said they started with an off-the-shelf base model, then fine-tuned it through distillation, a process to extract the reasoning capabilities from another AI model by training on its answers. The researchers said s1 is distilled from one of Google's reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used to create an AI reasoning model for around $450 last month.

To some, the idea that a few researchers without millions of dollars behind them can still innovate in the AI space is exciting. But s1 raises real questions about the commoditization of AI models. Where's the moat if someone can closely replicate a multi-million-dollar model with relative pocket change?

Unsurprisingly, big AI labs aren't happy. OpenAI has accused DeepSeek of improperly harvesting data from its API for the purposes of model distillation.

The researchers behind s1 were looking to find the simplest approach to achieve strong reasoning performance and "test-time scaling," or allowing an AI model to think more before it answers a question. These were a few of the breakthroughs in OpenAI's o1, which DeepSeek and other AI labs have tried to replicate through various techniques.

The s1 paper suggests that reasoning models can be distilled with a relatively small dataset using a process called supervised fine-tuning (SFT), in which an AI model is explicitly instructed to mimic certain behaviors in a dataset.
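As a rough illustration of what SFT-based distillation data looks like (the field names and serialization format below are illustrative assumptions, not the paper's actual schema), each training example pairs a question with the teacher model's reasoning trace and final answer, flattened into a single string the student model learns to reproduce:

```python
# Sketch: assembling SFT training examples for distillation.
# Each record pairs a question with a teacher model's reasoning
# trace and final answer. Field names and the <think> delimiters
# are illustrative, not the s1 paper's actual format.

def format_sft_example(question: str, reasoning: str, answer: str) -> str:
    """Serialize one distillation record into a single training string."""
    return (
        f"Question: {question}\n"
        f"<think>\n{reasoning}\n</think>\n"
        f"Answer: {answer}"
    )

records = [
    {
        "question": "What is 12 * 13?",
        "reasoning": "12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
        "answer": "156",
    },
]

dataset = [format_sft_example(**r) for r in records]
```

The student model is then fine-tuned on strings like these with an ordinary next-token-prediction loss, so it imitates both the teacher's reasoning style and its answers.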
SFT tends to be cheaper than the large-scale reinforcement learning method that DeepSeek employed to train its competitor to OpenAI's o1 model, R1.

Google offers free access to Gemini 2.0 Flash Thinking Experimental, albeit with daily rate limits, via its Google AI Studio platform. Google's terms forbid reverse-engineering its models to develop services that compete with the company's own AI offerings, however. We've reached out to Google for comment.

s1 is based on a small, off-the-shelf AI model from Alibaba-owned Chinese AI lab Qwen, which is available to download for free. To train s1, the researchers created a dataset of just 1,000 carefully curated questions, paired with answers to those questions, as well as the "thinking" process behind each answer from Google's Gemini 2.0 Flash Thinking Experimental.

Training s1 took less than 30 minutes using 16 Nvidia H100 GPUs, and the resulting model achieved strong performance on certain AI benchmarks, according to the researchers. Niklas Muennighoff, a Stanford researcher who worked on the project, told TechCrunch he could rent the necessary compute today for about $20.

The researchers used a nifty trick to get s1 to double-check its work and extend its "thinking" time: they told it to wait. Adding the word "wait" during s1's reasoning helped the model arrive at slightly more accurate answers, per the paper.

In 2025, Meta, Google, and Microsoft plan to invest hundreds of billions of dollars in AI infrastructure, which will partially go toward training next-generation AI models.

That level of investment may still be necessary to push the envelope of AI innovation. Distillation has been shown to be a good method for cheaply re-creating an AI model's capabilities, but it doesn't create new AI models vastly better than what's available today.
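The "wait" trick described above can be sketched as a simple control loop: when the model tries to end its reasoning before a minimum thinking budget is spent, the stop marker is suppressed and "Wait" is appended so generation continues. Everything here is a hedged, simplified illustration; `generate_step`, the `</think>` marker, and the word-count budget are stand-ins for a real decoding loop, not the s1 implementation:

```python
# Sketch of the "wait" trick: if the model tries to stop reasoning
# before a minimum budget is reached, strip the stop marker, append
# "Wait," and let it keep generating. `generate_step` is a stand-in
# for a real model's decoding call; the token budget is counted in
# whitespace-split words for simplicity.

def think_with_budget(generate_step, prompt: str, min_thinking_tokens: int) -> str:
    trace = prompt
    tokens_used = 0
    while True:
        chunk = generate_step(trace)  # model produces more reasoning text
        trace += chunk
        tokens_used += len(chunk.split())
        if "</think>" in chunk:  # model tried to stop reasoning
            if tokens_used < min_thinking_tokens:
                # Suppress the stop marker and nudge the model onward.
                trace = trace.replace("</think>", "") + " Wait,"
            else:
                break
    return trace
```

With a stub model that stops early, the loop forces one extra round of reasoning before accepting the final `</think>`; the extra pass is what gives the model a chance to double-check its work.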