
AI firms follow DeepSeek's lead, create cheaper models with distillation
arstechnica.com
Technique uses a "teacher" LLM to train smaller AI systems.

Cristina Criddle and Melissa Heikkilä, Financial Times, Mar 3, 2025 9:36 am

The technique caught attention after DeepSeek used it to build AI models based on open source systems released by competitors Meta and Alibaba. Credit: FT montage/Getty

Leading artificial intelligence firms including OpenAI, Microsoft, and Meta are turning to a process called distillation in the global race to create AI models that are cheaper for consumers and businesses to adopt.

The technique caught widespread attention after China's DeepSeek used it to build powerful and efficient AI models based on open-source systems released by competitors Meta and Alibaba. The breakthrough rocked confidence in Silicon Valley's AI leadership, leading Wall Street investors to wipe billions of dollars of value from US Big Tech stocks.

Through distillation, companies take a large language model, dubbed a "teacher" model, which generates the next likely word in a sentence. The teacher model generates data that is then used to train a smaller "student" model, quickly transferring the knowledge and predictions of the bigger model to the smaller one.

While distillation has been widely used for years, recent advances have led industry experts to believe the process will increasingly be a boon for start-ups seeking cost-effective ways to build applications based on the technology.

"Distillation is quite magical," said Olivier Godement, head of product for OpenAI's platform.
"It's the process of essentially taking a very large smart frontier model and using that model to teach a smaller model... very capable in specific tasks that is super cheap and super fast to execute."

Large language models such as OpenAI's GPT-4, Google's Gemini, and Meta's Llama require massive amounts of data and computing power to develop and maintain. While the companies have not revealed precise figures for how much it costs to train large models, it is likely to be hundreds of millions of dollars.

Thanks to distillation, developers and businesses can access these models' capabilities at a fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.

Developers can use OpenAI's platform for distillation, learning from the large language models that underpin products like ChatGPT. OpenAI's largest backer, Microsoft, used GPT-4 to distill its Phi family of small language models as part of a commercial partnership after investing nearly $14 billion in the company.

However, the San Francisco-based start-up has said it believes DeepSeek distilled OpenAI's models to train its competitor, a move that would be against its terms of service.
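As a concrete illustration of the teacher-student mechanism described above, the sketch below distills a fixed "teacher" network into a much smaller linear "student" by training the student to match the teacher's softened output distribution. It is a minimal toy in NumPy, not how any of the companies named here actually train their models: the networks, sizes, temperature, and learning rate are all illustrative assumptions, and real LLM distillation operates on generated text rather than random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, T=1.0):
    # Temperature T > 1 softens the distribution, exposing more of the
    # teacher's relative preferences to the student.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-ins: the "teacher" is a fixed two-layer network,
# the "student" a single linear layer with far fewer parameters.
W1 = rng.normal(size=(16, 64))
W2 = rng.normal(size=(64, 10))
Ws = np.zeros((16, 10))  # student weights, trained from scratch

x = rng.normal(size=(256, 16))  # stand-in for unlabeled inputs
teacher_probs = softmax(np.maximum(x @ W1, 0) @ W2, T=2.0)  # soft targets

# Train the student to match the teacher's output distribution by
# minimising cross-entropy against the soft targets; for a softmax
# output the gradient w.r.t. the logits is simply (probs - targets).
for step in range(500):
    student_probs = softmax(x @ Ws)
    grad = x.T @ (student_probs - teacher_probs) / len(x)
    Ws -= 0.5 * grad

# Fraction of inputs where the student's top prediction matches the teacher's
agreement = (softmax(x @ Ws).argmax(-1) == teacher_probs.argmax(-1)).mean()
```

The student never sees ground-truth labels, only the teacher's outputs, which is why the technique lets a cheap model inherit much of an expensive model's behaviour.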
DeepSeek has not commented on the claims.

While distillation can be used to create high-performing models, experts add that they are more limited.

"Distillation presents an interesting trade-off; if you make the models smaller, you inevitably reduce their capability," said Ahmed Awadallah of Microsoft Research, who said a distilled model can be designed to be very good at summarising emails, for example, "but it really would not be good at anything else."

David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and distilled ones are powerful enough for purposes such as customer service chatbots or running on smaller devices like phones.

"Any time you can [make it less expensive] and it gives you the right performance you want, there is very little reason not to do it," he added.

That presents a challenge to many of the business models of leading AI firms. Even if developers use distilled models from companies like OpenAI, they cost far less to run and are less expensive to create, and therefore generate less revenue. Model-makers like OpenAI often charge less for the use of distilled models as they require less computational load.

Yet OpenAI's Godement argued that large language models will still be required for "high intelligence and high stakes tasks" where "businesses are willing to pay more for a high level of accuracy and reliability." He added that large models will also be needed to discover new capabilities that can then be distilled into smaller ones.

Still, the company aims to prevent its large models from being distilled to train a competitor. OpenAI has teams monitoring usage and can remove access from users it suspects are generating vast amounts of data to export and train a rival, as it has apparently done with accounts it believes were linked to DeepSeek.
Yet much of this action happens retroactively. "OpenAI has been trying to protect against distillation for a long time, but it is very hard to avoid it altogether," said Douwe Kiela, chief executive of Contextual AI, a start-up building information retrieval tools for enterprises.

Distillation is also a victory for advocates of open models, where the technology is made freely available for developers to build upon. DeepSeek has also made its recent models open for developers.

"We're going to use [distillation] and put it in our products right away," said Yann LeCun, Meta's chief AI scientist. "That's the whole idea of open source. You profit from everyone and everyone else's progress as long as those processes are open."

Distillation also means that model-makers can spend billions of dollars to advance the capabilities of AI systems but still face competitors that often catch up quickly, as DeepSeek's recent releases demonstrate. This raises questions about the first-mover advantage in building LLMs when their capabilities can be replicated in a matter of months.

"In a world where things are moving so fast... you could actually spend a lot of money, doing it the hard way, and then the rest of the field is right on your heels," IBM's Cox said. "So it is an interesting and tricky business landscape."

Additional reporting by Michael Acton in San Francisco.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.