Why DeepSeek's AI Model Just Became the Top-Rated App in the U.S.
www.scientificamerican.com
January 27, 2025 | 3 min read

A Chinese start-up has stunned the technology industry, and financial markets, with a cheaper, lower-tech AI assistant that matches the state of the art.

By Stephanie Pappas, edited by Jeanna Bryner
Weiquan Lin/Getty Images

DeepSeek's artificial intelligence assistant made big waves on Monday, becoming the top-rated app in Apple's App Store and sending tech stocks into a downward tumble. What's all the fuss about?

The Chinese start-up, DeepSeek, surprised the tech industry with a new model that rivals the abilities of OpenAI's most recent model, with far less investment and using reduced-capacity chips. The U.S. bans exports of state-of-the-art computer chips to China and limits sales of chipmaking equipment. DeepSeek, based in the eastern Chinese city of Hangzhou, reportedly had a stockpile of high-performance Nvidia A100 chips from before the ban, so its engineers could have used those to develop the model. But in a key breakthrough, the start-up says it instead used much lower-powered Nvidia H800 chips to train the new model, dubbed DeepSeek-R1.

"We've seen up to now that the success of large tech companies working in AI was measured in how much money they raised, not necessarily in what the technology actually was," says Ashlesha Nesarikar, CEO of the AI company Plano Intelligence, Inc. "I think we'll be paying a lot more attention to what tech is underpinning these companies' different products."

On common AI tests in mathematics and coding, DeepSeek-R1 matched the scores of OpenAI's o1 model, according to VentureBeat. U.S. companies don't disclose the cost of training their own large language models (LLMs), the systems that undergird popular chatbots such as ChatGPT. But OpenAI CEO Sam Altman told an audience at MIT in 2023 that training GPT-4 cost more than $100 million. DeepSeek-R1 is free for users to download, whereas the comparable version of ChatGPT costs $200 a month.

DeepSeek's $6-million figure doesn't necessarily reflect the cost of building an LLM from scratch, Nesarikar says; that cost may represent only the fine-tuning of this latest version. Nevertheless, she says, the model's improved energy efficiency would make AI more accessible to more people in more industries. The increase in efficiency could be good news for AI's environmental impact, because the computational cost of generating new data with an LLM is four to five times higher than that of a typical search engine query.

Because it requires less computational power, the cost of running DeepSeek-R1 is a tenth of the cost of similar competitors, says Hanchang Cao, an incoming assistant professor in information systems and operations management at Emory University. "For academic researchers or start-ups, this difference in the cost really means a lot," Cao says.

DeepSeek achieved its efficiency in several ways, says Anil Ananthaswamy, author of Why Machines Learn: The Elegant Math Behind Modern AI. The model has 670 billion parameters, or variables it learns from during training, making it the largest open-source large language model yet, Ananthaswamy explains. But the model uses an architecture called "mixture of experts" so that only a relevant fraction of these parameters (tens of billions instead of hundreds of billions) is activated for any given query. This cuts down on computing costs. The DeepSeek LLM also uses a method called multi-head latent attention; instead of predicting an answer word by word, it generates multiple words at once.
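To make the mixture-of-experts idea concrete, here is a minimal sketch in PyTorch: a small "router" scores the experts for each token and only the top-scoring few are run, so just a fraction of the layer's parameters is used per query. The dimensions, expert count, and routing details are simplified assumptions for illustration, not DeepSeek's actual implementation.

```python
# Illustrative mixture-of-experts layer: only top_k of n_experts run per token.
# Sizes and routing rule are toy assumptions, not DeepSeek-R1's real configuration.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                                   # (tokens, n_experts)
        weights, picked = scores.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):            # only the chosen experts are evaluated
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(5, 64)        # 5 token embeddings
print(layer(tokens).shape)         # torch.Size([5, 64])
```

With 8 experts and top-2 routing, each token touches roughly a quarter of the layer's feed-forward parameters; scaled up to hundreds of experts, the same principle is what lets a very large model activate only tens of billions of its parameters per query.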
The model further differs from others such as o1 in how it reinforces learning during training. While many LLMs have an external "critic" model that runs alongside them, correcting errors and nudging the LLM toward verified answers, DeepSeek-R1 uses a set of rules that are internal to the model to teach it which of the possible answers it generates is best. "DeepSeek has streamlined that process," Ananthaswamy says.

Another important aspect of DeepSeek-R1 is that the company has made the code behind the product open-source, Ananthaswamy says. (The training data remain proprietary.) This means that the company's claims can be checked. If the model is as computationally efficient as DeepSeek claims, he says, it will probably open up new avenues for researchers who use AI in their work to do so more quickly and cheaply. It will also enable more research into the inner workings of LLMs themselves.

"One of the big things has been this divide that has opened up between academia and industry, because academia has been unable to work with these really large models or do research in any meaningful way," Ananthaswamy says. "But something like this, it's within the reach of academia now, because you have the code."
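As an illustration of what that access looks like in practice, here is a minimal sketch of loading one of the openly released DeepSeek-R1 model variants with the Hugging Face transformers library. The repository name, model size, and generation settings below are assumptions for illustration, not details from this article.

```python
# Minimal sketch of running an openly released DeepSeek-R1 variant locally.
# The repo ID is illustrative; smaller distilled variants are more practical on
# a single GPU than the full-size model. Requires: transformers, torch, accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed/illustrative repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread weights across available devices
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights and code are published, researchers can run this kind of experiment themselves, inspect intermediate activations, and test the efficiency claims rather than taking them on faith.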