After a week of DeepSeek freakout, doubts and mysteries remain
www.fastcompany.com
Welcome to AI Decoded, Fast Company's weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

After a week of DeepSeek freakout, doubts and mysteries remain

The Chinese company DeepSeek sent shockwaves through the AI and investment communities this week as people learned that it created state-of-the-art AI models using far less computing power and capital than anyone thought possible. The company then showed its work in published research papers and by making its models available to other developers. This raised two burning questions: Has the U.S. lost its edge in the AI race? And will we really need as many expensive AI chips as we've been told?

How much computing power did DeepSeek really use?

DeepSeek claimed it trained its most recent model for about $5.6 million, and without the most powerful AI chips (the U.S. barred Nvidia from selling its powerful H100 graphics processing units in China, so DeepSeek made do with 2,048 H800s). But the information it provided in research papers about its costs and methods is incomplete. "The $5 million refers to the final training run of the system," points out Oregon State University AI/robotics professor Alan Fern in a statement to Fast Company. "In order to experiment with and identify a system configuration and mix of tricks that would result in a $5M training run, they very likely spent orders of magnitude more." He adds that based on the available information it's impossible to replicate DeepSeek's $5.6 million training run.

How exactly did DeepSeek do so much with so little?

DeepSeek appears to have pulled off some legitimate engineering innovations to make its models less expensive to train and run. But the techniques it used, such as mixture-of-experts architecture and chain-of-thought reasoning, are well known in the AI world and generally used by all the major AI research labs. The innovations are described only at a high level in the research papers, so it's not easy to see how DeepSeek put its own spin on them. "Maybe there was one main trick or maybe there were lots of things that were just very well engineered all over," says Robert Nishihara, cofounder of the AI run-time platform Anyscale. Many of DeepSeek's innovations grew from having to use less powerful GPUs (Nvidia H800s instead of H100s) because of the Biden Administration's chip bans. "Being resource limited forces you to come up with new innovative efficient methods," Nishihara says. "That's why grad students come up with a lot of interesting stuff with far less resources; it's just a different mindset."

What innovation is likely to influence other AI labs the most?

As Anthropic's Jack Clark points out in a recent blog post, DeepSeek was able to use a large model, DeepSeek-V3 (roughly 700 billion parameters), to teach a smaller R1 model to be a reasoning model (like OpenAI's o1) with a surprisingly small amount of training data and no human supervision. "V3 generated 800,000 annotated text samples showing questions and the chains of thought it followed to answer them," Clark writes.
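For a concrete sense of how that kind of teach-a-smaller-model-with-generated-samples setup (often called distillation) can work, here is a minimal sketch: a large "teacher" model answers questions while writing out its chain of thought, and each trace is saved as a supervised fine-tuning example for a smaller "student" model. The `teacher_generate` stub, the example questions, and the JSONL format are illustrative placeholders, not DeepSeek's actual tooling or data.

```python
# Minimal sketch of a distillation data pipeline: a large "teacher" model
# answers questions with its chain of thought, and the traces are written out
# as supervised fine-tuning data for a smaller "student" model.
import json

def teacher_generate(question: str) -> dict:
    """Placeholder for a call to the large teacher model.
    A real implementation would return the model's reasoning and final answer."""
    return {
        "question": question,
        "chain_of_thought": "<reasoning steps produced by the teacher>",
        "answer": "<final answer produced by the teacher>",
    }

questions = [
    "What is 17 * 24?",
    "A train leaves at 3 p.m. traveling 60 mph. How far has it gone by 5:30 p.m.?",
]

# Each record becomes one training example for the smaller student model.
with open("distillation_data.jsonl", "w") as f:
    for q in questions:
        f.write(json.dumps(teacher_generate(q)) + "\n")

# The resulting file is then used to fine-tune the student so it imitates the
# teacher's reasoning style rather than learning to reason from scratch.
```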
"DeepSeek showed that after processing the samples for a time the smaller R1 model spontaneously began to 'think' about its answers," explains Andrew Jardine, head of go-to-market at Adaptive ML. "You just say, 'here's my problem, create some answers to that problem,' and then based on the answers that are correct or incorrect, you give it a reward [a binary code that means 'good'] and say 'try again,' and eventually it starts going, 'I'm not sure; let me try this new angle or approach,' or 'that approach wasn't the right one, let me try this other one,' and it just starts happening on its own. There's some real magic there." DeepSeek's researchers called it an "aha moment."
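The toy sketch below shows how a bare correct-or-incorrect reward can steer behavior on its own: a "learner" repeatedly picks an answering strategy, earns a reward of 1 only when its answer is right, and gradually shifts toward whatever gets rewarded. It is a deliberately simplified stand-in, not DeepSeek's training recipe (the R1 paper applies a policy-gradient method, GRPO, to a full language model); the strategy names and success rates here are invented for illustration.

```python
# Toy illustration of reinforcement learning from a binary correctness reward.
# A simple "learner" chooses between answering strategies; correct answers earn
# a reward of 1, incorrect ones 0, and choice probabilities drift toward what works.
import random

strategies = ["guess_quickly", "work_step_by_step"]
# How often each strategy actually gets the right answer (hidden from the learner).
true_success_rate = {"guess_quickly": 0.2, "work_step_by_step": 0.9}

weights = {s: 1.0 for s in strategies}  # start with no preference
learning_rate = 0.1

def pick_strategy() -> str:
    total = sum(weights.values())
    return random.choices(list(weights), [w / total for w in weights.values()])[0]

for _ in range(2000):
    strategy = pick_strategy()
    answered_correctly = random.random() < true_success_rate[strategy]
    reward = 1.0 if answered_correctly else 0.0   # binary reward: right or wrong, nothing else
    weights[strategy] += learning_rate * reward   # reinforce whatever earned the reward

total = sum(weights.values())
print({s: round(w / total, 2) for s, w in weights.items()})
# Nearly all of the probability mass ends up on "work_step_by_step": the learner
# was never told how to reason, only whether its answers were correct.
```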
Why haven't U.S. AI companies already been doing what DeepSeek did?

"How do you know they haven't?" asks Jardine. "We don't have visibility into exactly the techniques that are being used by Google and OpenAI; we don't know exactly how efficient the training approaches are." That's because those U.S. AI labs don't describe their techniques in research papers and release the weights of their models, as DeepSeek did. There's a lot of reason to believe they do have at least some of these efficiency methods already. It should come as no surprise if OpenAI's next reasoning model, o3, is less compute-intensive, more cost-effective, and faster than DeepSeek's models.

Is Nvidia stock still worth 50X earnings?

Nvidia provides up to 95% of the advanced AI chips used to research, train, and run frontier AI models. The company's stock lost 17% of its value on Monday when investors interpreted DeepSeek's research results as a signal that fewer expensive Nvidia chips would be needed in the future than previously anticipated. Meta's Yann LeCun says Monday's sell-off grew from a major misunderstanding about AI infrastructure investments. The Turing Award winner says that while DeepSeek showed that frontier models could be trained with fewer GPUs, the main job of the chips in the future will be during inference: the reasoning work the model does when it's responding to a user's question or problem. (Actually, DeepSeek did find a novel way of compressing context window data so that less compute is needed during inference.) He says that as AI systems process more data, and more kinds of data, during inference, the computing costs will continue to increase. As of Wednesday night, the stock had not recovered.

Did DeepSeek use OpenAI models to help train its own models?

Nobody knows for sure, and disagreement remains among AI experts on the question. The Financial Times reported Wednesday that OpenAI believes it has seen evidence that DeepSeek used content generated by OpenAI models to train its own models, which would violate OpenAI's terms. That practice, known as distillation, saves time and money by feeding the outputs of larger, smarter models into smaller models to teach them how to handle specific tasks.

We've just experienced a moment when the open-source world produced some models that equaled the current closed-source offerings in performance. The real cost of developing the DeepSeek models remains an open question. But in the long run, the AI companies that can marshal the most cutting-edge chips and infrastructure will very likely have the advantage, as fewer performance gains can be wrung from pretraining and more computing power is applied at inference, when the AI must reason toward its answers. So the answers to the two burning questions raised above are "probably not" and "likely yes."

The DeepSeek breakthroughs could be good news for Apple

The problem of finding truly useful ways of using AI in real life is becoming more pressing as the cost of developing models and building infrastructure mounts. One big hope is that powerful AI models will become so small and efficient that they can run on devices like smartphones and AR glasses. DeepSeek's engineering breakthroughs to create cheaper and less compute-hungry models may breathe new life into research on small models that live on edge devices.

"Dramatically decreased memory requirements for inference make edge inference much more viable, and Apple has the best hardware for exactly that," says tech analyst Ben Thompson in a recent Stratechery newsletter. "Apple Silicon uses unified memory, which means that the CPU, GPU, and NPU (neural processing unit) have access to a shared pool of memory; this means that Apple's high-end hardware actually has the best consumer chip for inference."

Stability AI founder Emad Mostaque says that reasoning models like OpenAI's o1 and DeepSeek's R1 will run on smartphones by next year, performing PhD-level tasks with only 20 watts of electricity, equivalent to the human brain.

OpenAI releases an AI agent for government workers

OpenAI this week announced a new AI tool called ChatGPT Gov that's designed specifically for use by U.S. government agencies. Since sending sensitive government data out through an API to an OpenAI server presents obvious privacy and security problems, ChatGPT Gov can be hosted within an agency's own private cloud environment.

"[W]e see enormous potential for these tools to support the public sector in tackling complex challenges, from improving public health and infrastructure to strengthening national security," OpenAI writes in a blog post. The Biden Administration in 2023 directed government agencies to find productive and safe ways to use new generative AI technology (Trump recently revoked the executive order). The Department of Homeland Security, for example, built its own AI chatbot, which is now used by thousands of DHS workers. OpenAI says 90,000 users within federal, state, and local government offices have already used the company's ChatGPT Enterprise product.

More AI coverage from Fast Company:

Microsoft posts 10% growth for Q4 as it plans to spend $80 billion on AI
AI assistants for lawyers are a booming business, with big risks
Why we need to leverage AI to address global food insecurity
Alibaba rolls out AI model, claiming it's better than DeepSeek-V3

Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium.