DeepSeek triggers shock waves for AI giants, but the disruption won't last
www.computerworld.com
Chinese start-up DeepSeek rocked the tech industry Monday after the company's new generative AI (genAI) bot hit Apple's App Store and Google Play Store and downloads almost immediately exceeded those of OpenAI's ChatGPT. US AI model and chipmaker stock prices were hit hard by the newcomer's arrival; Google, Meta, and OpenAI all initially suffered, and chipmaker Nvidia's stock closed the day down 17%. (The tech-heavy Nasdaq exchange lost more than 600 points.)

DeepSeek's open-source AI models' impact lies in matching US models' performance at a fraction of the cost by using compute and memory resources more efficiently. But industry analysts believe investor reaction to DeepSeek's impact on US tech firms and others is being dramatically exaggerated.

"The market is incorrectly presuming this is a zero-sum game," said Chirag Dekate, a vice president analyst at Gartner Research. "They're basically saying, 'Maybe we don't need to build data centers anymore, maybe we're not as energy starved, because DeepSeek showed us we can do more with less.'"

Giuseppe Sette, president of AI tech firm Reflexivity, agreed, stressing that DeepSeek took the market by storm by doing more with less.

"In layman's terms, they activate only the most relevant portions of their model for each query, and that saves money and computation power," Sette said. "This shows that with AI, the surprises will keep on coming in the next few years. And even though that might be a bit of a shocker today, it's extremely bullish in the long term because it opens the way for deeper and broader adoption of AI at all scales."

In essence, the markets have overlooked that companies such as Google, Meta, and OpenAI can replicate DeepSeek's efficiencies with more mature, scalable AI models that offer better security and privacy.

"This is not a 'the sky is falling' moment for markets," Dekate said.
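Sette's description of activating "only the most relevant portions" of the model is the mixture-of-experts idea: a gating step scores a set of sub-networks ("experts") and runs only the top few per input. The toy NumPy sketch below is a hypothetical illustration of that routing idea, not DeepSeek's actual architecture; the expert shapes and gating scheme are assumptions for demonstration.

```python
import numpy as np

def sparse_forward(x, experts, gate_weights, top_k=2):
    """Route the input through only the top_k highest-scoring experts.

    experts: list of (W, b) weight/bias pairs (toy linear 'experts').
    gate_weights: (d, n_experts) matrix scoring each expert's relevance.
    """
    scores = x @ gate_weights                    # relevance score per expert
    top = np.argsort(scores)[-top_k:]            # indices of the top_k experts
    w = np.exp(scores[top] - scores[top].max())  # softmax over the chosen few
    w /= w.sum()
    # Only top_k experts execute; the rest are skipped entirely, saving
    # compute roughly proportional to (1 - top_k / n_experts).
    return sum(p * (x @ experts[i][0] + experts[i][1]) for p, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(n_experts)]
gate = rng.normal(size=(d, n_experts))
y = sparse_forward(rng.normal(size=d), experts, gate, top_k=2)
print(y.shape)  # (8,)
```

With `top_k=2` of 4 experts, only half the expert parameters are touched per query, which is the cost saving Sette points to.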
"I think they should take a close look at what this actually is: there are techniques you can implement to more effectively scale your AI models," he said.

Another looming problem for the newcomer is that DeepSeek is purported to filter out content that could be viewed as critical of the Chinese Communist government.

DeepSeek's release of its R1 and R1-Zero reasoning models on Jan. 20 quickly drew attention for two key aspects:

- DeepSeek eliminates human feedback in training, speeding up model development, according to AI developer Ben Thompson.
- DeepSeek requires less memory and compute power, needing fewer GPUs to perform the same tasks as other models.

DeepSeek claims its breakthroughs in AI efficiency cost less than $6 million and took less than two months to develop.

John Belton, a portfolio manager at Gabelli Funds, an asset management firm whose funds include shares of Nvidia, Microsoft, Amazon, and others, said DeepSeek's achievements are real, but some of the company's claims are misleading.

"No, you cannot recreate DeepSeek with $6 million, and the extent to which they distilled existing models (took shortcuts, potentially without license) is an unknown," Belton said via email to Computerworld. "However, they have made key breakthroughs that show how to reduce training and inference costs."

Belton also pointed out that DeepSeek isn't new.
Its creator, Liang Wenfeng, a hedge fund manager and AI enthusiast, published a paper on the performance breakthroughs more than a month ago and released a model with similar methods a year ago.

Dekate said DeepSeek's rollout was particularly timely because just last month news outlets were publishing stories about AI scaling limitations at leading providers. As organizations continue to embrace genAI tools and platforms and explore how they can create efficiencies and boost worker productivity, they're also grappling with the high costs and complexity of the technology.

DeepSeek improved memory bandwidth efficiency with two key innovations: using a lower-position memory algorithm and switching from FP32 (32-bit) to FP8 (8-bit) precision for model training. "They're using the same amount of memory to store and move more data," Dekate said.

One analogy would be to consider the onramp to a major city highway, the highway being the data path. If the onramp only has one lane, there are only two ways to address traffic congestion:

- Increase the width of the roadway to fit more traffic.
- Reduce the size of the vehicles so more fit on the roadway.

DeepSeek did both. It created smaller vehicles; that is, it used smaller data packets (8-bit) and was therefore able to pack more data into the same footprint.

The second key innovation was optimizing and compressing the key-value cache. DeepSeek used compression algorithms to reduce memory by processing prompts in two phases, decomposing and generating responses, both relying on efficient key-value cache use.

"They utilized underlying compute and memory resources incredibly efficiently," Dekate said. "That is an amazing accomplishment, because they're utilizing the underlying GPU resources more productively."
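The "smaller vehicles" half of the analogy is just fewer bits per value. NumPy has no FP8 type, so the sketch below uses float32 versus float16 as a stand-in to show the same packing effect: halving the bits per weight moves twice as many values through the same memory footprint, at a small precision cost.

```python
import numpy as np

# One million model weights stored at two precisions. Halving the bits per
# value lets the same memory bandwidth carry twice as many values.
rng = np.random.default_rng(1)
weights32 = rng.normal(size=1_000_000).astype(np.float32)
weights16 = weights32.astype(np.float16)  # stand-in for FP8, which NumPy lacks

print(weights32.nbytes)  # 4000000 bytes (4 per value)
print(weights16.nbytes)  # 2000000 bytes (2 per value)

# The precision given up is tiny relative to the values themselves:
err = np.abs(weights32 - weights16.astype(np.float32)).max()
print(err < 0.01)  # True
```

Real FP8 training adds per-tensor scaling and mixed-precision accumulation to keep the error in check; this only illustrates the bandwidth arithmetic behind the analogy.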
"Their models are able to perform at leadership-class levels while using a relatively lower scale of resources," he said.

Enterprises can benefit as well by adopting the techniques DeepSeek introduced, because doing so reduces the cost of adoption by using fewer compute resources for inferencing and training. Lower model costs should benefit innovators such as OpenAI and reduce the cost of applying AI across industries. By using resources more efficiently, DeepSeek enables faster, broader AI adoption by other companies, driving growth in AI development, demand, and infrastructure.

And in the end, DeepSeek's algorithm still needs AI accelerator technology to work, meaning GPUs and ASICs.

"It's not the case that DeepSeek just woke up one day and had an amazing breakthrough. No, they're using sound engineering techniques, and some of the leading AI accelerators and GPUs happen to be table stakes," Dekate said. "And they use thousands of them. It's not like they discovered a new technique that blew this whole space wide open. No. You still need AI accelerators to perform model training."

Even in the most pessimistic view, if AI costs drop to 5% of those of other leading AI models, that efficiency eventually benefits those other models by reducing their costs, allowing for faster model adoption.

For enterprises, Dekate said, it's worth exploring DeepSeek and similar models internally and in private settings. "Your legal team evaluates the terms and conditions of your ecosystem quite extensively. They'll ask if privacy is protected. Are the data sources filtered? Are AI model responses filtered in any sense?" he said.

Before jumping in, enterprises should carefully consider these details.
Models like Gemini and GPT offer reliable, secure responses with enterprise-level protections, unlike many open models that lack these controls, Dekate argued.

"Once everything settles, the net-net is that DeepSeek has developed very specific capabilities that are quantitative, and that's something to learn from, just as they did from Llama 3," Dekate said.
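The key-value cache optimization described earlier can also be sketched. During generation, a model caches a key and value vector per past token, and that cache grows with every token produced. The toy NumPy example below projects cached entries into a smaller latent dimension before attending over them; the projection and dimensions are assumptions for illustration, a rough stand-in for the compression idea rather than DeepSeek's actual algorithm.

```python
import numpy as np

def attend(q, K, V):
    """Single-head attention over a cache of keys K and values V."""
    s = q @ K.T
    w = np.exp(s - s.max())  # softmax over cached positions
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(2)
d, d_latent, n_tokens = 64, 16, 512
K = rng.normal(size=(n_tokens, d))               # cached keys, full width
V = rng.normal(size=(n_tokens, d))               # cached values, full width
P = rng.normal(size=(d, d_latent)) / np.sqrt(d)  # hypothetical projection

K_c, V_c = K @ P, V @ P                          # compressed cache entries
out = attend(rng.normal(size=d_latent), K_c, V_c)

print(K.nbytes // K_c.nbytes)  # 4 -- cache is 4x smaller per entry
print(out.shape)               # (16,)
```

Because the cache is read on every generated token, shrinking each entry cuts the memory traffic of inference, which is the efficiency Dekate credits to DeepSeek's key-value cache work.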