Rethinking AI: DeepSeek's playbook shakes up the high-spend, high-compute paradigm
When DeepSeek released its R1 model this January, it wasn't just another AI announcement. It was a watershed moment that sent shockwaves through the tech industry, forcing industry leaders to reconsider their fundamental approaches to AI development.
What makes DeepSeek's accomplishment remarkable isn't that the company developed novel capabilities; rather, it's how it achieved results comparable to those delivered by tech heavyweights at a fraction of the cost. In reality, DeepSeek didn't do anything that hadn't been done before; its innovation stemmed from pursuing different priorities. As a result, we are now experiencing rapid-fire development along two parallel tracks: efficiency and compute.
As DeepSeek prepares to release its R2 model, and as it concurrently faces the potential of even greater chip restrictions from the U.S., it's important to look at how it captured so much attention.
Engineering around constraints
DeepSeek's arrival, as sudden and dramatic as it was, captivated us all because it showcased the capacity for innovation to thrive even under significant constraints. Faced with U.S. export controls limiting access to cutting-edge AI chips, DeepSeek was forced to find alternative pathways to AI advancement.
While U.S. companies pursued performance gains through more powerful hardware, bigger models and better data, DeepSeek focused on optimizing what was available. It implemented known ideas with remarkable execution, and there is novelty in executing what's known and doing it well.
This efficiency-first mindset yielded incredibly impressive results. DeepSeek's R1 model reportedly matches OpenAI's capabilities at just 5 to 10% of the operating cost. According to reports, the final training run for DeepSeek's V3 predecessor cost a mere $5.6 million, which former Tesla AI scientist Andrej Karpathy described as "a joke of a budget" compared to the tens or hundreds of millions spent by U.S. competitors. More strikingly, while OpenAI reportedly spent around $500 million training its recent "Orion" model, DeepSeek achieved superior benchmark results for just $5.6 million, less than 1.2% of OpenAI's investment.
If you are starry-eyed at the thought that these results were achieved while DeepSeek labored under a severe disadvantage, unable to access advanced AI chips, I hate to tell you, but that narrative isn't entirely accurate. Initial U.S. export controls focused primarily on compute capabilities, not on memory and networking, two crucial components for AI development.
That means the chips DeepSeek had access to were not poor-quality chips; their networking and memory capabilities allowed DeepSeek to parallelize operations across many units, a key strategy for running its large models efficiently.
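To make that concrete, here is a minimal sketch of the core trick behind this kind of parallelism: a column-parallel matrix multiply, where each device holds only a shard of a large weight matrix and the partial outputs are stitched back together over the interconnect. It is illustrative only, assumes nothing about DeepSeek's actual sharding scheme, and uses arbitrary device counts and matrix sizes.

```python
# Illustrative only: column-parallel matrix multiply, the core idea behind
# tensor parallelism. Each "device" holds one shard of the weight matrix and
# computes a slice of the output; in a real cluster the final concatenation
# is an all-gather over the interconnect, which is why memory capacity and
# network bandwidth matter as much as raw per-chip compute.
import numpy as np

num_devices = 4                       # hypothetical accelerator count
x = np.random.randn(8, 1024)          # activations (batch, hidden)
W = np.random.randn(1024, 4096)       # a large weight matrix

# Shard the weight matrix column-wise, one shard per device.
shards = np.split(W, num_devices, axis=1)

# Each device multiplies the same activations by its own shard.
partial_outputs = [x @ shard for shard in shards]

# "All-gather": stitch the partial outputs back together.
y_parallel = np.concatenate(partial_outputs, axis=1)

# Sanity check against the single-device result.
assert np.allclose(y_parallel, x @ W)
```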
This, combined with China's national push toward controlling the entire vertical stack of AI infrastructure, resulted in accelerated innovation that many Western observers didn't anticipate. DeepSeek's advancements were an inevitable part of AI development, but they brought known advancements forward a few years earlier than would have been possible otherwise, and that's pretty amazing.
Pragmatism over process
Beyond hardware optimization, DeepSeek's approach to training data represents another departure from conventional Western practices. Rather than relying solely on web-scraped content, DeepSeek reportedly leveraged significant amounts of synthetic data and outputs from other proprietary models. This is a classic example of model distillation, or the ability to learn from really powerful models. Such an approach, however, raises questions about data privacy and governance that might concern Western enterprise customers. Still, it underscores DeepSeek's overall pragmatic focus on results over process.
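For readers unfamiliar with the mechanics, distillation in its most generic form looks like the sketch below: a small student network is trained to match the output distribution of a larger teacher, so the teacher's behavior becomes the training signal rather than raw data. This is a toy illustration with assumed shapes and hyperparameters, not a description of DeepSeek's actual training pipeline.

```python
# A minimal, generic sketch of model distillation (not DeepSeek's pipeline):
# a small "student" is trained to match the softened output distribution of
# a larger "teacher" via a KL-divergence loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution

for step in range(100):
    x = torch.randn(64, 32)                      # stand-in for real or synthetic inputs
    with torch.no_grad():
        teacher_logits = teacher(x)              # teacher's "knowledge"
    student_logits = student(x)
    # KL divergence between softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```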
The effective use of synthetic data is a key differentiator. Synthetic data can be very effective when it comes to training large models, but you have to be careful; some model architectures handle synthetic data better than others. For instance, transformer-based models with mixture-of-experts (MoE) architectures like DeepSeek's tend to be more robust when incorporating synthetic data, while more traditional dense architectures like those used in early Llama models can experience performance degradation or even "model collapse" when trained on too much synthetic content.
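The routing idea behind mixture-of-experts is easy to show in miniature: a small router picks the top-k experts for each token, so only a fraction of the parameters are active at any time. The toy layer below illustrates only that mechanism; DeepSeek's production architecture (fine-grained and shared experts, load-balancing objectives) is considerably more elaborate, and the sizes here are arbitrary.

```python
# A toy mixture-of-experts (MoE) layer, to illustrate routing only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, dim)
        gate_logits = self.router(x)
        weights, indices = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token, so most parameters stay idle.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

layer = TinyMoE()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```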
This architectural sensitivity matters because synthetic data introduces different patterns and distributions compared to real-world data. When a model architecture doesn't handle synthetic data well, it may learn shortcuts or biases present in the synthetic data generation process rather than generalizable knowledge. This can lead to reduced performance on real-world tasks, increased hallucinations or brittleness when facing novel situations.
Still, DeepSeek's engineering teams reportedly designed their model architecture specifically with synthetic data integration in mind from the earliest planning stages. This allowed the company to leverage the cost benefits of synthetic data without sacrificing performance.
Market reverberations
Why does all of this matter? Stock market aside, DeepSeek's emergence has triggered substantive strategic shifts among industry leaders.
Case in point: OpenAI. Sam Altman recently announced plans to release the company's first "open-weight" language model since 2019. This is a pretty notable pivot for a company that built its business on proprietary systems. It seems DeepSeek's rise, on top of Llama's success, has hit OpenAI's leader hard. Just a month after DeepSeek arrived on the scene, Altman admitted that OpenAI had been "on the wrong side of history" regarding open-source AI.
With OpenAI reportedly spending $7 billion to $8 billion annually on operations, the economic pressure from efficient alternatives like DeepSeek has become impossible to ignore. As AI scholar Kai-Fu Lee bluntly put it: "You're spending $7 billion or $8 billion a year, making a massive loss, and here you have a competitor coming in with an open-source model that's for free." This necessitates change.
This economic reality prompted OpenAI to pursue a massive $40 billion funding round that valued the company at an unprecedented $300 billion. But even with a war chest of funds at its disposal, the fundamental challenge remains: OpenAI's approach is dramatically more resource-intensive than DeepSeek's.
Beyond model training
Another significant trend accelerated by DeepSeek is the shift toward "test-time compute." As major AI labs have now trained their models on much of the available public data on the internet, data scarcity is slowing further improvements in pre-training.
To get around this, DeepSeek announced a collaboration with Tsinghua University to enable "self-principled critique tuning" (SPCT). This approach trains AI to develop its own rules for judging content and then uses those rules to provide detailed critiques. The system includes a built-in "judge" that evaluates the AI's answers in real time, comparing responses against core rules and quality standards.
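Based only on that public description, the loop might look something like the sketch below: the model writes its own judging principles, a built-in judge critiques each candidate answer against them, and the best-scoring answer wins. The call_llm function here is a hypothetical stand-in for whatever model endpoint is used; none of this reflects DeepSeek's actual GRM implementation.

```python
# Rough sketch of a self-principled critique loop, per the public description.
# `call_llm` is a hypothetical stand-in for a real model call, not DeepSeek's API.
import re

def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real model call (local model or hosted API).
    return "stub response for: " + prompt[:40]

def parse_score(critique: str) -> float:
    # Pull the number after 'SCORE:', defaulting to 0 if the judge omits it.
    match = re.search(r"SCORE:\s*([0-9.]+)", critique)
    return float(match.group(1)) if match else 0.0

def self_principled_critique(question: str, candidates: list[str]) -> str:
    # Step 1: the model proposes its own evaluation principles for this question.
    principles = call_llm(
        f"List the principles a good answer to this question must satisfy:\n{question}"
    )
    # Step 2: a built-in "judge" critiques and scores each candidate against them.
    scored = []
    for answer in candidates:
        critique = call_llm(
            f"Principles:\n{principles}\n\nQuestion: {question}\nAnswer: {answer}\n"
            "Critique the answer against the principles and end with 'SCORE: <1-10>'."
        )
        scored.append((parse_score(critique), answer))
    # Step 3: return the highest-scoring candidate; extra inference, no extra training.
    return max(scored)[1]

# Usage (with the stub above, scores are meaningless; shown for shape only).
best = self_principled_critique(
    "Explain why the sky is blue.",
    ["Because of Rayleigh scattering.", "Because the ocean reflects onto it."],
)
```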
The development is part of a movement toward autonomous self-evaluation and improvement in AI systems, in which models use inference-time compute to improve results rather than simply growing larger during training. DeepSeek calls its system "DeepSeek-GRM." But, as with its model distillation approach, this could be considered a mix of promise and risk.
For example, if the AI develops its own judging criteria, there's a risk those principles diverge from human values, ethics or context. The rules could end up overly rigid or biased, optimize for style over substance or reinforce incorrect assumptions and hallucinations. Additionally, without a human in the loop, issues could arise if the "judge" itself is flawed or misaligned. It's a kind of AI talking to itself, without robust external grounding. On top of this, users and developers may not understand why the AI reached a certain conclusion, which feeds into a bigger concern: Should an AI be allowed to decide what is "good" or "correct" based solely on its own logic? These risks shouldn't be discounted.
At the same time, this approach is gaining traction, as DeepSeek again builds on the body of work of others to create what is likely the first full-stack application of SPCT in a commercial effort.
This could mark a powerful shift in AI autonomy, but there is still a need for rigorous auditing, transparency and safeguards. It's not just about models getting smarter, but about ensuring they remain aligned, interpretable and trustworthy as they begin critiquing themselves without human guardrails.
Moving into the future
So, taking all of this into account, the rise of DeepSeek signals a broader shift in the AI industry toward parallel innovation tracks. While companies continue building more powerful compute clusters for next-generation capabilities, there will also be intense focus on finding efficiency gains through software engineering and model architecture improvements to offset the challenges of AI energy consumption, which is growing far faster than power generation capacity.
Companies are taking note. Microsoft, for example, has halted data center development in multiple regions globally, recalibrating toward a more distributed, efficient infrastructure approach. While still planning to invest approximately $80 billion in AI infrastructure this fiscal year, the company is reallocating resources in response to the efficiency gains DeepSeek introduced to the market.
Meta has also responded, reportedly assembling teams to study how DeepSeek achieved its efficiency gains and to apply those lessons to its own open models.
With so much movement in such a short time, it becomes somewhat ironic that the U.S. sanctions designed to maintain American AI dominance may have instead accelerated the very innovation they sought to contain. By constraining DeepSeek's access to cutting-edge hardware, they forced it to blaze a new trail.
Moving forward, as the industry continues to evolve globally, adaptability for all players will be key. Policies, people and market reactions will continue to shift the ground rules, whether it's eliminating the AI diffusion rule, a new ban on technology purchases or something else entirely. It's what we learn from one another and how we respond that will be worth watching.
Jae Lee is CEO and co-founder of TwelveLabs.