An Interview with Nvidia CEO Jensen Huang About Chip Controls, AI Factories, and Enterprise Pragmatism
Monday, May 19, 2025
Listen to Podcast
Good morning,
This week’s Stratechery Interview is running early, as I had the chance to speak in person with Nvidia CEO Jensen Huang at the conclusion of his Computex 2025 keynote, which occurred this morning in Taiwan. I do plan on touching on some of the topics in this interview later this week, so, in the spirit of sharing my conversations with you — which undergirds this interview series — I wanted to post this as soon as possible.
I have spoken to Huang three times previously, in March 2022, September 2022, and March 2023. What was notable about those interviews was the extent to which Huang was trying to make the world understand the potential of GPU computing; now that the potential is being realized, Huang and Nvidia are facing an entirely new set of problems, even as they continue to push computing forward.
This interview starts out discussing some of those new challenges that are related to politics in particular: we discuss last week’s deals with Saudi Arabia and the United Arab Emirates, the ban on H20 sales to China, and why the U.S. approach to chip controls risks America’s — and Nvidia’s — long-term control. Huang also makes the case for why AI will drive GDP growth in the near future, and maybe even reduce the trade deficit.
After that we get into today’s keynote and Huang’s keynote last month at GTC. As I note in this interview, I was surprised at how different they were, perhaps because they had different audiences: Taiwan OEMs and component makers and their enterprise customers today, versus American hyperscalers last month; the key thing to understand about Nvidia is that they want to sell to both. To that end, we discuss why a full-stack Nvidia solution maximizes utility, including how Dynamo improves inference performance, even as Nvidia’s approach to software and systems-building lets them sell you only the parts you want. And — perhaps appropriately given the question — we briefly touch on gaming at the end.
As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player.
On to the Interview:
An Interview with Nvidia CEO Jensen Huang About Chip Controls, AI Factories, and Enterprise Pragmatism
This interview is lightly edited for clarity.
Arab AI and the Chip Diffusion Rule
Jensen Huang, welcome back to Stratechery.
Jensen Huang: Great to see you, Ben.
It is great to actually meet you in person, our previous talks have been over Zoom, and you’re here in Taiwan. You just announced a new building that’s pretty close to my house, so that’s exciting. When we talked before, I felt like you wanted the world to understand what GPUs could be. It was pre-ChatGPT when we first started talking, and now the world’s entire market rests on a knife’s edge when you announce earnings. Now, I think we’re in a quiet period, I’m not asking about earnings, but how does it feel to be thrust into that position, the center of the world in that regard?
JH: Well, you asked me a question that I now have no interesting answer to. The answer is I have no feelings about it, but I do recognize this: we’re in the process of reinventing Nvidia, which is always really central to what we’re doing at the office, we’re trying to reinvent Nvidia so that we could be ahead of the puck, so that we could be where the industry will go, and we want to solve problems that are hard and contribute to the industry. But very importantly now, not only have we created a computing platform, we reinvented our company, we’re much more of a data center-scale company, and we offer technology that is for the very first time wholly integrated to work together, but disintegrated so that the whole ecosystem could work with it.
But the thing that I said at the keynote, which is really important, is that for the very first time we’re building computers not just for the technology industry, we’re building computers for a new industry called AI. Now, AI is partly technology, but it’s also partly labor, and it augments labor as we know, and as we go into robotics it’ll be very, very clear. This new technology called AI is actually a wholly new industry, and this whole industry is going to be powered by factories, which are going to need a lot of computers, and people are just coming to terms with the fact that this future of computing, what people call data centers, but they’re really AI factories, is likely to be quite large.
I noticed you referenced Satya Nadella, who on the Microsoft earnings call reported the number of tokens that they processed, I think that was last quarter. Was that your favorite bit of earnings from this quarter? I latched onto it right away too, what a great metric.
JH: In fact, the number of tokens that are actually being generated is way, way, way higher than that. That’s just the part that Microsoft was generating for third parties, but their own consumption is much, much higher and doesn’t include OpenAI either, so you could just imagine how much it is.
From what I understand, that is a very, very large amount relative to the number that was reported. You’ve been on quite the world tour — you and I know Taiwan is beautiful, I mentioned the new office park — I do have to ask what’s the Middle East like at this time of the year?
JH: Hot, but not humid.
It’s a dry heat, right?
JH: Yeah, it’s dry heat. I sort of really enjoyed it because the buildings were cold and I would walk out and just bask in the sun, and I actually felt really great. But the nights are just incredible. The nights are incredible. Eating outside, having a cup of tea outside, it’s really incredible.
I’m also of course asking about these AI deals that have been announced with Saudi Arabia and UAE. Why from your perspective is that important and why was it important for you to be there?
JH: Well, because they asked me to be there, and we were there to announce two quite ambitious AI infrastructure build-outs, one in Saudi Arabia and one in Abu Dhabi, and the leaders of both countries were very out in front recognizing the importance of their nations participating in the AI revolution, recognizing that they have an extraordinary opportunity: they have an abundance of energy and a shortage of labor, and the potential of their countries is limited by the amount of labor that they have, the amount of people that they have. So for the first time, they could transform, if you will, from energy to digital labor and robotics labor, agents, robots. They’re super focused on that and very articulate about it.
His Royal Highness in Saudi Arabia was very articulate about it, very passionate about it, and even understands the technology. And Sheikh Tahnoun in Abu Dhabi, very passionate about it, very forward-thinking about it, understands very deeply the implications of the technology and the opportunities for them, and so I was delighted to be there; we’re partnering with both of them.
We helped launch a new company called HUMAIN in Saudi Arabia and their hope is to be on the world stage building these AI factories, hosting international companies, companies like OpenAI who was also there, and so a very big initiative.
This is a big shift. Part and parcel of this is a step back from the AI diffusion rules, which I think were pretty harsh on those countries in particular: they had a regulated number of chips, which had to be controlled by US companies, gated in some respects by what’s built in the US. Nvidia, I think contrary to your previous actions, had come out very strongly against those, and from your perspective — there’s a bit where you’ve had to grow up, I feel like. Tae Kim said in his book that Nvidia is like an F1 car built around you, and you’re the driver. Is there a bit where you never wanted to think about this government stuff, and so Nvidia never really thought about this government stuff, and then suddenly you’re the most important company in the world and you had to learn about this very, very quickly?
JH: Well, it wasn’t that I never wanted to, I never had to. For the vast majority of Nvidia’s life we’ve been dealing with building the technology, building the company, building the industry, competing.
Yeah, in an industry that’s pure competition.
JH: Every single day, every single moment. Building our supply chain, building our ecosystem. Notice I just described a bunch of things that are gigantic in scale and scope, plenty hard in themselves, and all of a sudden the diffusion rule came out, and I think we said it at the time, but I think it’s become apparent to everybody now: it is exactly wrong, it’s exactly wrong for America. If the goal of the diffusion rule is to ensure that America leads, the diffusion rule as it was written will exactly cause us to lose our lead.
AI is not just the layer of software called a model, AI is a full stack thing, that’s the reason why everybody’s always talking about Nvidia systems and infrastructure and factories and so on and so forth. AI is full stack. If America wants to lead in AI, it has to start by leading full stack at the chip level, at the factory level, infrastructure level, at the model level as well as the application level — AI is all of that.
You can’t just say, “Let’s go write a diffusion rule, protect one layer at the expense of everything else”, it’s nonsensical. The idea that we would limit American AI technology right at the time when international competitors have caught up, and we pretty much predicted it.
And by international competitors, you mean other models?
JH: China’s doing fantastic, 50% of the world’s AI researchers are Chinese and you’re not going to hold them back, you’re not going to stop them from advancing AI. Let’s face it, DeepSeek is deeply excellent work. To give them anything short of that is a lack of confidence so deep that I just can’t even tolerate it.
Did we spur that work to be even better by virtue of the restrictions that were placed on them, particularly in terms of memory management and bandwidth?
JH: Everybody loves competition. Companies need competition to inspire themselves, nations need that, and there’s no question we spur them. However, I fully expected China to be there every step of the way. Huawei is a formidable company, they’re a world-class technology company. The researchers, the AI scientists in China, they’re world-class. These are not Chinese AI researchers, they’re world-class AI researchers. You walk up and down the aisles of Anthropic or OpenAI or DeepMind, there’s a whole bunch of AI researchers there, and they’re from China. Of course it’s sensible, and they’re extraordinary and so the fact that they do extraordinary work is not surprising to me.
The idea of AI diffusion limiting other countries’ access to American technology is a mission expressed exactly wrong; it should be about accelerating the adoption of American technology everywhere before it’s too late. If the goal is for America to lead, then AI diffusion did exactly the opposite of that.
I think AI diffusion also misses the big idea about how the AI stack works. The AI stack works like a computing platform, it’s a platform. The larger and more capable your platform, the larger the install base, and the more developers run and develop on it. When more developers develop on it, it makes the results, the applications, that run on your computing platform better. As a result, you sell more, and more of your computing platform is adopted, which increases your install base, which increases developers using it to develop AI models, which increases — that positive feedback system can’t be overstated for any computing platform, it’s the reason why Nvidia is successful today.
The idea that we would have America not compete in the Chinese market, where 50% of the developers are, makes absolutely no sense from a computing infrastructure, computing architecture perspective. We ought to go and give American companies the opportunity to compete in China, offset the trade deficit, generate tax income for the American people, build, hire, create more jobs.
Nvidia and China
Is it fair to say we’re halfway there? Because we started out with the Gulf deal and the AI diffusion rule and certainly, I think you can see from a nation-state competition perspective, having these countries—
JH: These two ideas go hand in hand and what I mean by that is this: if we don’t compete in China, and we allow the Chinese ecosystem to build a rich ecosystem because we’re not there to compete for it, and new platforms are developed and they’re not American at a time when the world is diffusing AI technology, their leadership and their technology will diffuse all around the world.
That’s my point, where from your perspective, we’re halfway there. At least we’re not cutting ourselves off in other countries.
JH: That’s right.
But we should go all the way and let Nvidia back in China.
JH: Yeah, but I would argue that, in fact, not going into China is about 90% of the way there. It’s actually not 50/50, it’s 90%.
So we got 10% done.
JH: Yeah, that’s right. Exactly.
For the record, I agree with you. My view is this attempt to limit chip sales and then give them all the chip-making equipment they want is precisely backwards — it’s a lot harder to track chips than it is chip-making equipment anyway. One of the theories that people in Washington DC have put forward is, “The chip-making companies or the semiconductor equipment manufacturing companies, they’ve been in Washington for years, they’re very good at lobbying, and Nvidia’s not here, and so they’re behind the eight ball”. Does that ring true to you? Do you just have a hard time getting people in Washington to understand this point of view?
JH: We had to work really hard in the last several years to build a presence in DC. We have a handful of people; most companies our size have hundreds of people, we have a handful. Our handful of people are amazing, they’re telling our story. They’re helping people understand not just how chips work, but how ecosystems work, and how AI ecosystems work, and what are some of the unintended consequences of the policies.
We want America to win. Every company should want their country to win, and every country should want their companies to win, those are not terrible things to feel, those are good things to feel, and it is also good that people love to win. Competition is a good thing, aspiring to be great is a good thing. When some country aspires to be great, we shouldn’t begrudge them. When some company aspires to be great, I don’t begrudge them. It causes us to all rise above and do even better than we could, and so I love watching people who aspire to be great.
There’s no question China aspires to be great, good for them! They should expect absolutely nothing less, and for all of the AI researchers and AI scientists that I know around the world, they got to where they are because they all aspire to be great, and they are great. I think the idea that somehow that—
To win, you have to put the other one down.
JH: That’s right, it makes no sense to me. We ought to go faster. The reason why Nvidia is here today, the reason why we have our position today, we had absolutely zero support from anybody to get here, just let us keep running hard. I think the idea that we would hold other people back, as you mentioned, it just spurs them to be even greater, because these are amazing people.
I agree. I find it, as an American, deeply frustrating. I feel we should want to win by out-innovating, by going faster and this idea we’re going to win by pulling up the ladder and cutting people off, and putting bureaucratic red tape on everyone and trying to track everything just seems deeply, frustratingly un-American to me.
JH: Yeah. Anyhow, I think the President really sees it, he wants America to win.
Well here’s a question on this, because this is the same administration that cut off the H20, a chip that you basically designed to the previous administration’s specs, and suddenly, “It’s not okay”, and now they’re doing this deal. The critics are there, “Oh, this is going to open it up to China, potentially, XYZ”. It does feel like a shift in administration, maybe they’d argue it’s still the same thing. But we’ve also had a lot of shifts between the US and China over the last six weeks, I think is one way to put it.
Do you get a sense that maybe there’s been a real realization that this world is so interconnected and related, and what goes on one side happens on the other, and maybe it’s not going to be so easy to peel apart, and there’s going to be a return of pragmatism, and how do we manage this? Are you optimistic in that regard or are you preparing for the worst?
JH: The President has a vision of what he wants to achieve, I support the President, I believe in the President, and I think that he’ll create a great outcome for America, and he’ll do it with respect and with an attitude of wanting to compete, but also looking for opportunities to cooperate. I sense that, I see all that. Obviously, I’m not in the White House and I don’t know exactly how they feel, but that’s what I sense.
First of all, the ban on H20s, that’s the limit of what we can do to Hopper, and we’ve cut it down to where there’s not much left to cut. We’ve written off — I think it’s $5.5 billion — no company in history has ever written off that much inventory, so this additional ban on Nvidia’s H20 is deeply painful. Its costs are enormously costly: not only am I losing $5.5 billion, we wrote off $5.5 billion, we walked away from $15 billion of sales and probably — what is it? — $3 billion worth of taxes. The China market is about $50 billion a year, and it’s not $50 million, it’s $50 billion. $50 billion is like Boeing, not the plane, the whole company. To leave that behind, and the profits that go with that, the scale that goes with that, the ecosystem building that goes with that—
That’s the real threat to CUDA in the long run—
JH: That’s right.
China builds an alternative.
JH: Exactly. Anybody who thought that one chess move to somehow ban China from H20s would somehow cut off their ability to do AI is deeply uninformed.
AI GDP Growth
There’s an angle on this in the power stuff that I want to get to in a moment, but this is going to be more fun. Let’s leave aside all the government stuff, we’ll circle back around. A third way to get to my question about financial markets and governments: in today’s keynote you started out by saying, “We’re an infrastructure company, you need five-year roadmaps”. You mentioned in passing that your original TAM estimate when you started Nvidia was $300 million. When did you actually see this coming, “We’re going to be infrastructure”? Again, I go back to our conversations previously, and my sense from those is you just wanted people to see this possibility. You saw the possibility of GPU computing, but the scale, has it blown your mind just a little bit?
JH: If you watch my keynotes, as you do, pretty consistently the things that are happening today I spoke about five years ago. At the time when I was speaking about it five years ago, the words weren’t as clear and the vocabulary I was using wasn’t as precise, but where we were going is consistent.
So basically right now when you talk a lot about robotics at the end of every keynote, which you have been doing, that is our five-year preview that we should really be paying attention to.
JH: Yeah. And in fact, I’ve been talking about it for about three years.
Yeah, so a couple years from now.
JH: It’s a couple years from now, I think it’s going to happen.
The thing that is fairly deep and fairly profound for this industry is that for all of the last 60 years we’ve been the IT industry, which is a technology and tool, it’s a technology and tool used by people — for the very first time, what we sell, which today goes into the IT budget, is about to leave the IT budget and move into the manufacturing or the OpEx budget.
The manufacturing budget is because we’re building robots, or because robotic systems are being used to build products, and then the OpEx is because of digital workers. The world’s OpEx and CapEx is what? Combined, $100 trillion? It’s a giant number. So the IT industry is about a trillion; we’re about to bring all of us, because of AI, into about a $100 trillion industry.
Of course my first hope, and I think it will happen this way: although jobs will be changed and some jobs will be lost, a lot of jobs will be created. It is very likely that robotic systems, whether they’re agents or physical robots, will expand the world’s GDP. The reason for that is we have a shortage of labor, that’s why everybody’s employed. If you go around the United States, unemployment is at all-time lows, and it’s because we just don’t have enough labor. Restaurants are having a hard time filling staff, many factories are obviously having a very hard time filling staff. The idea that you would hire a robot for $100,000 a year, I think people will do that in a heartbeat, and the reason for that is because it just increases their ability to generate more revenues, and so I think that in the next five, ten years we’re likely to experience that expansion of GDP and a whole new industry of these token manufacturing systems that people now will understand.
What I thought was also interesting about today’s keynote is I prepped for this interview before I came and I’m like, “Well, it’s probably going to be a bit of a rehash of GTC”, and I thought it was actually pretty starkly different. Here’s my interpretation, you have to let me know if it’s correct. It felt like GTC was for the hyperscalers and today’s presentation was for enterprise IT, it was like two different markets.
JH: Yeah.
Do I have that correct in terms of the target?
JH: Enterprise IT, or agents and robots: agents for enterprise IT and robots for manufacturing. The reason for that is very clear, this is the beginning of the ecosystem.
You made a beautiful video, by the way, of the Taiwan ecosystem and all that goes into making the pieces, that was really great.
Dynamo and Full-Stack Nvidia
Let’s go to the GTC keynote, that was one of my favorite keynotes of yours; I do watch them all, and have watched them for years. Some real Professor Jensen energy, as you explained the limitations of data centers and why Nvidia was the answer, and I interpreted that as kind of an anti-ASIC message. You had a combination of, number one, you showed your roadmap, it’s like, “Try to keep up with this”, and then number two, you introduced the Pareto curve of latency versus bandwidth, and because they’re programmable, you can use the same GPUs all over this curve, and of course, hyperscalers are the ones that are going to make ASICs.
Do I have the right understanding of your presentation there?
JH: I think the teaching was right; the reason why I did it wasn’t exactly that. I was simply trying to help people understand how to build a new data center. We’ve been thinking about it, and so here’s the challenge. There’s only so much energy in the data center. 100 megawatts is 100 megawatts, 250 megawatts is 250 megawatts, and so your fundamental job, if it’s a factory, is to make sure that the overall throughput-per-watt is the highest, because that overall throughput is in tokens, whether it’s cheap, inexpensive tokens, meaning free-to-use tokens, or the high-quality tokens that somebody might actually pay, say, a thousand dollars a month for.
Well, you just mentioned an AI assistant.
JH: Exactly. Would I hire a $100,000/year AI agent? In a heartbeat. And the reason for that is we hire people way more expensive than that all day long, and if I can just simply amplify somebody who I’m paying far more than that a year, that’d be incredible for a hundred thousand bucks, so of course I would.
The quality of tokens that you generate in a factory is quite varied. You need some that are free-to-use, you need some that are high quality, and so you’re across that Pareto curve. You can’t design a chip or a system that is only good at one, because it’ll be underutilized, and so now the question is, how do you create a system that can simultaneously be used, some of it for free token generation, some of it for high-quality tokens?
If you cause the architecture to be too fragmented, then your ability to move workload back and forth is difficult, and so I think when people go through the thinking of it: if you design a system that’s very, very good at high token rate, it naturally has very low overall throughput. If you design something for very high throughput, it tends to have very low interactivity, its tokens-per-second per user is low. It’s easy to hug the X-axis, it’s easy to hug the Y-axis, it’s hard to fill out that area, and so that’s the invention of the overall combination of what we did with the Blackwell architecture and FP4 and NVLink 72, the ratio, the balance between HBM memory and its capacity, the balance between the amount of floating-point and the memory capacity and bandwidth, and then very importantly, the Dynamo disaggregated streaming serving ecosystem, the hardware system.
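That area is easier to picture with a toy model. Here is a minimal sketch of the tradeoff, with every constant an assumption of mine rather than anything Nvidia has published: batching amortizes weight loading, so aggregate throughput rises with batch size, while each individual user’s token rate falls.

```python
# Toy model of the interactivity-vs-throughput tradeoff Huang describes.
# Every constant here is an illustrative assumption, not an Nvidia spec.

PEAK_TOKENS_PER_SEC = 40_000   # hypothetical aggregate ceiling for one system
BATCH_HALF_POINT = 50          # hypothetical batch size at half of peak efficiency

def serve(batch_size: int) -> tuple[float, float]:
    """Return (tokens/sec per user, aggregate tokens/sec) at a batch size."""
    # Larger batches amortize weight loading, so aggregate throughput rises,
    # but every user shares the machine, so per-user token rate falls.
    efficiency = batch_size / (batch_size + BATCH_HALF_POINT)
    aggregate = PEAK_TOKENS_PER_SEC * efficiency
    return aggregate / batch_size, aggregate

for b in (1, 8, 64, 512):
    per_user, total = serve(b)
    print(f"batch={b:>3}  {per_user:7.0f} tok/s/user  {total:7.0f} tok/s total")
```

A fixed-function chip tuned for one operating point sits at a single spot on this curve; the argument for a programmable system is that the same hardware can be re-batched anywhere along it as the mix of free and premium tokens shifts.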
I wanted to ask you about Dynamo, which did not come up today, but I think is super interesting.
JH: Super important.
Give me the pitch, I think you called it the operating system for data centers.
JH: The pitch basically is that the inference workload, the transformer, has different stages, and different stages could be used differently depending on the user, depending on the model, and depending on the context of that model, and so we disaggregated the processing of the large language model into pre-fill, which is the context processing, thinking about what you’re about to ask me. It has to do with my memories of Ben and the type of deep and conversational podcasts you like to do, and they tend to have — if I start talking deeply about the industry and the technology, I don’t feel uncomfortable doing so.
Right, you’re not doing a sound bite right now for the evening news or something like that.
JH: That’s right. I feel like I can lean in and because you’ll understand it, I don’t feel like I’m talking to the wall, and so I feel very comfortable talking about these things.
Well, when a prompt comes to a chatbot, the chatbot needs to have some of that context, and so chatbots have memory, they process context, and they might even have to read a PDF or two, and so that’s called the pre-fill part; that pre-fill part is very floating-point intensive.
Then there’s the decode part. The decode part is about now generating the thoughts, it’s about reasoning through what you’re about to say, predicting the next token, and so a chain of thought basically generates a lot more tokens, which get fed back into the context, which generates more tokens, and so it’s reasoning through a problem step-by-step, maybe it has to go off and read some stuff. With the modern versions of AI, these agentic AIs, reasoning AIs, the amount of floating-point, the amount of bandwidth — decode requires a lot of bandwidth — is high in all cases, but it could be higher.
It varies.
JH: That’s right, it varies depending on things.
You don’t need a high floating-point precision in the decode stage.
JH: That’s right. So for example, if it’s one-shot and it’s got a strong KV cache already, you don’t need much floating-point. However, the moment you load it with context, you need a lot of floating-point. Dynamo disaggregates all the processing and disperses it in the data center, smartly metering the workload and metering the load on the processors, really complex stuff.
Well, and it ties into, if the entire data center is one GPU, you’re talking about a software layer that treats it that way.
JH: That’s right, it’s essentially the operating system of an AI factory.
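For the curious, here is a minimal sketch of what disaggregated serving means in practice. The class names and the least-loaded routing policy are my own illustration, not Dynamo’s actual API:

```python
# Sketch of disaggregated prefill/decode serving in the spirit of Dynamo.
# Class names and the routing policy are hypothetical, not the real API.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # context to pre-fill: floating-point heavy
    output_tokens: int   # tokens to decode: memory-bandwidth heavy

class Pool:
    """A pool of workers specialized for one phase of the workload."""
    def __init__(self, name: str, workers: int):
        self.name = name
        self.load = [0] * workers

    def assign(self, cost: int) -> int:
        i = self.load.index(min(self.load))  # route to least-loaded worker
        self.load[i] += cost
        return i

prefill = Pool("prefill", workers=4)    # compute-oriented GPUs
decode = Pool("decode", workers=12)     # bandwidth-oriented GPUs

def schedule(req: Request) -> tuple[int, int]:
    # The two phases land on different pools; in a real system the KV cache
    # produced by pre-fill is streamed across to the decode worker.
    return prefill.assign(req.prompt_tokens), decode.assign(req.output_tokens)

print(schedule(Request(prompt_tokens=8_000, output_tokens=500)))  # e.g. (0, 0)
```

The point of the separation is that the floating-point-heavy pre-fill phase and the bandwidth-heavy decode phase no longer contend for the same worker, and each pool can be sized and metered independently.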
When you think about these thinking models, these reasoning models, looking forward — you’re someone who, like you said, makes great predictions — do you see these being used mostly in agentic workflows, where the downside is you’re sitting around waiting for them, or maybe you’re setting up a bunch of agents that are acting in parallel, so that works out well? Or could they actually end up being most important in generating data for training to get better one-shot results, which is how people would interact more frequently?
JH: I think depending on cost, and my prediction is that it’s likely that reasoning models will just be the baseline, because we’re going to process this so lightning fast. Basically when you turn on Grace Blackwell, it’s 40 times faster, and let’s say the next click is another 40 times faster and the models are getting better. So the idea that between now and five years from now that we could be 100,000 times faster for agentic models, very sensible to me.
That’s the history of computing.
JH: That’s right. So it just thought about a mountain of things, you just didn’t see it. It’s a fast thinker now, even slow thinking is fast.
What was that book? Thinking, Fast and Slow, now apply that to AI. I guess it could read the whole thing in a second, so it might defeat the purpose.
JH: That’s right.
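The arithmetic behind that prediction is worth spelling out; the generation count and the hardware/software split below are my assumptions, not Huang’s:

```python
# How a 100,000x speedup could compound; the split is an illustrative guess.
hw_gain_per_gen = 40              # the Grace Blackwell leap cited above
hw_generations = 2                # assume two more such "clicks" in five years
hw_total = hw_gain_per_gen ** hw_generations
software_needed = 100_000 / hw_total
print(f"{hw_total}x from hardware, ~{software_needed:.0f}x from models/software")
# -> 1600x from hardware, ~62x from models/software
```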
Enterprise AI and Pragmatism
To go back, just a quick little touch on politics. Is there a bit where your emphasis on performance-per-watt is really a US-centric thing, in a world where we have a hard time building power and power is the chief constraint? You look at something like these Gulf countries, where power is more accessible and easier to build for various reasons, and you go to China, guess what? If power is not the chief constraint, you can work through a lot of problems that Nvidia solves for you. Is that a reason GTC is in the US, that’s the message for the US?
JH: Oh, I didn’t think of it that way. I think that no matter what happens, your factory will always be a certain size and even though your country has a lot more energy, your data center doesn’t and so I think perf-per-watt is important, always.
It’s always important, but the degree of importance may vary.
JH: That’s right, yeah. It’s just that if you’re planning for it, on the other hand you say, “Okay, well I have an architecture that has half the performance-per-watt, and so maybe I’ll just get twice as much land and twice as much power, and just start building that from the get-go”. When you put all that stuff together, though, this is the problem.
Remember, even the infrastructure itself and the power delivery, let’s say for a gigawatt, let’s just do some simple math. Some number of billions of it could be shell, power, land, operating it, all of that; let’s say the compute and the networking and the storage, everything all in, is the rest. Well, if it turns out that you have to build twice as many, you just multiply that infrastructure number by 2, and so you’re going to have to get some really cheap compute to make up for it. That’s why I always feel that in the world of AI factories, the math would suggest that if an architecture is not as good, sometimes depending on how poor it is, even free is not cheap enough.
If it’s your only choice, you’ll make it work.
JH: That’s right.
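The shape of that math is easy to check with placeholder numbers. These costs are mine for illustration, not Huang’s figures:

```python
# Why "even free is not cheap enough": placeholder numbers, $B, per gigawatt.
site_b = 10.0      # shell, power, land, operations (assumed)
compute_b = 20.0   # chips, networking, storage (assumed)

leader_total = site_b + compute_b        # architecture with 1x perf-per-watt
rival_sites = 2 * site_b                 # half the perf-per-watt -> 2x the sites
rival_compute_budget = leader_total - rival_sites  # what the rival's compute may cost to tie

print(f"leader total:         ${leader_total:.0f}B")
print(f"rival compute budget: ${rival_compute_budget:.0f}B (vs ${compute_b:.0f}B)")
if rival_compute_budget <= 0:
    print("even free compute cannot close the gap")
```

Push the site share of the cost higher, as in a power-constrained market, and the rival’s break-even compute budget goes to zero or below, which is the “even free is not cheap enough” case.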
Well, let’s contrast that to today. You said a couple of times today, “I love it if you buy everything from me, but I’m happy you buy anything from me”. It was funny: before it fully crystallized for me that this feels like the enterprise keynote, which is again my words not yours, there was this sense of pragmatism where I thought, “He’s sounding like an enterprise CEO right now, they’re very pragmatic”. Of course, if you buy the whole stack, it works better, and it’s kind of interesting: if you’re talking about building a full-up AI factory, to use your words, of course using all Nvidia will maximize your returns from it, but there are a lot of customers out there that are just buying bits and pieces, and those customers, maybe you’d like them to buy the whole thing, but if they buy anything from you, they’re probably going to buy it from you forever. So it seems like, just strategically, it’s a very useful base to go for.
JH: Serving a customer is just smart. If you look at the way Nvidia goes to market, we’ve always built things in a fully integrated way because software needs to be integrated with hardware somehow, but we do it with enough discipline that we can then disaggregate the software from the hardware, and you could decide not to use our software, you can choose not to. And if you look at the way we design our systems, we’ve actually disaggregated the systems in a sufficiently disciplined way, that if you wanted to replace some of it, you could. Right now, Grace Blackwell is being integrated, and stood up all over the world in different clouds and they’re all based on our standards, but they’re all a little bit different, and yet we fit into them.
That’s, I think, the real challenge of Nvidia’s business model, and it goes hand-in-hand with wanting to be a computing platform company. The most important thing is that one of Nvidia’s stacks gets adopted: if it’s our compute stack, that’s great; our networking stack I feel as deeply and as strongly about as my computing stack. If my computing stack gets adopted all in all, terrific. If my networking stack gets adopted, terrific. If both of them get adopted, incredible.
Well, I mean, for a lot of people, with your NVLink Fusion, you can get just NVLink, you can integrate it with your own ASIC — again, a total contrast to how I interpreted the GTC messaging — but again, I can see the view here. I mean, who’s the customer?
JH: I still have deep beliefs that Nvidia is building a better system overall, I still believe that. And if I don’t believe that, then obviously we must be doing something wrong, and we’ve got to go get ourselves to believe that, and so I completely believe that Nvidia is the largest-scale accelerated computing company in the world, we’re the largest-scale AI computing company in the world. No body of 36,000 to 38,000 people has ever been more united around this one job, any place, ever, and so the idea that a small team of 14 people could do a better job than us would be quite painful to internalize, and so we strive to do better.
However, you also believe in scale, and a great way to get scale in everything that you’re selling is to sell it however the customer wants it.
JH: That’s right, exactly. That’s exactly right. So I have preferences, but we want to make sure that we’re able to serve every customer however they’d like to be served.
Whither Gaming
Along these lines, and maybe this is related: I was asking a friend of mine about this interview and he said his son insisted that I ask this question. Some people in gaming feel, and you mentioned it today, that only 10% of the keynote is about GeForce, but that it’s still important to us. Is that a, “It’s still important to us because this all scales and we’re making GPUs” thing, or what should I tell my friend’s son about Nvidia and gaming?
JH: See, I wish I had said it: RTX PRO wouldn’t be possible without GeForce, Omniverse wouldn’t be possible without GeForce, not one of those pixels that we saw in any of those videos would’ve been possible without GeForce, robots wouldn’t be possible without GeForce, Newton is not possible without GeForce. GeForce itself isn’t as deeply part of the GTC event because GTC tends to be about high-performance computing, and enterprise, and AI, and things like that. We have a separate conference for game developers and things like that, and so it’s just that when I do GTC, I’ve got a group of people that I always feel a little badly about, that their product launch isn’t as central, but it’s just not the right audience; they also know that GeForce plays such an integral role in everything that we do.
I mean, is there a bit where maybe the gamers just don’t fully appreciate the extent to which their GeForces are much more than just graphics rendering engines at this point?
JH: Yeah, right. Exactly. And I said it today, we’re only rendering 1 out of 10 pixels, it’s a shocking number. Suppose I gave you a puzzle, and I gave you 1 out of 10 pieces, and the other 9 pieces I’m not even going to give you, you’ve just got to make them up.
I’ve got another pitch for you to connect gaming to your other things. You just talked about how you’re disciplined about keeping things separate, being able to disaggregate, and having software manage all that. Kind of sounds like the driver problem on Windows, to be totally honest; that’s just a core skill set that you have.
JH: Yeah, it’s just that a driver is too low-level, and it’s got too many things, too many registers, and the driver abstraction was actually a revolution that Microsoft really played a very large role in. Windows wouldn’t be where Windows is if not for this concept of a driver, and it created an abstraction of an API while underneath the hardware can change fairly significantly.
We’ve open-sourced our driver now, and quite frankly, I don’t see that many people contributing to it, and the reason for that is because the moment I come up with another GPU, all the work that they did on the last driver is kind of thrown away, and so without a large body of engineers like Nvidia has, it’s hard to do. But if we optimize every GPU with its associated driver, then there’s a wonderful isolation layer, an abstraction layer, whether it’s CUDA or DirectX, that people could build on top of.
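That isolation-layer idea is simple to sketch. A minimal illustration with names of my own invention, not the actual driver model:

```python
# The driver-abstraction idea in miniature: a stable API on top, swappable
# hardware generations underneath. Purely illustrative names.
from typing import Protocol

class Driver(Protocol):
    def draw_triangles(self, count: int) -> None: ...

class GenOneGPU:
    def draw_triangles(self, count: int) -> None:
        print(f"gen-1 path: rasterizing {count} triangles")

class GenTwoGPU:
    def draw_triangles(self, count: int) -> None:
        print(f"gen-2 path: same API, new silicon, {count} triangles")

def application(driver: Driver) -> None:
    # The application codes against the API and never changes when the
    # hardware underneath does; that's the isolation layer.
    driver.draw_triangles(1_000)

application(GenOneGPU())   # same application runs on either generation
application(GenTwoGPU())
```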
Look, here’s my answer to my friend’s son. I had to ask you about the government stuff, and you gave a good and passionate defense of your view, but you truly got excited and your eyes lit up when I asked about gaming drivers.
JH: Oh, is that right?
So I think everything’s still good.
JH: Oh good. Yeah, I love GeForce actually.
There you go, that’s why it’s good to speak in-person. Jensen Huang, thank you very much.
This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.
The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount, please contact me directly.
Thanks for being a supporter, and have a great day!
An Interview with Nvidia CEO Jensen Huang About Chip Controls, AI Factories, and Enterprise Pragmatism
An Interview with Nvidia CEO Jensen Huang About Chip Controls, AI Factories, and Enterprise Pragmatism
Monday, May 19, 2025
Listen to Podcast
Listen to this post:
Good morning,
This week’s Stratechery Interview is running early this week, as I had the chance to speak in person with Nvidia CEO Jensen Huang at the conclusion of his Computex 2025 keynote, which occurred this morning in Taiwan. I do plan on touching on some of the topics in this interview later this week, so, in the spirit of sharing my conversations with you — which undergirds this interview series — I wanted to post this as soon as possible.
I have spoken to Huang three times previously, in March 2022, September 2022, and March 2023. What was notable about those interviews was the extent to which Huang was trying to make the world understand the potential of GPU computing; now that the potential is being realized, Huang and Nvidia are facing an entirely new set of problems, even as they continue to push computing forward.
This interview starts out discussing some of those new challenges that are related to politics in particular: we discuss last week’s deals with Saudi Arabia and the United Arab Emirates, the ban on H20 sales to China, and why the U.S. approach to chip controls risks America’s — and Nvidia’s — long term control. Huang also makes the case for why AI will drive GDP growth in the near future, and maybe even reduce the trade deficit.
After that we get into today’s keynote and Huang’s keynote last month at GTC. As I note in this interview, I was surprised at how different they were, perhaps because they had different audiences: Taiwan OEMs and component makers and their enterprise customers today, versus American hyperscalers last month; the key thing to understand about Nvidia is that they want to sell to both. To that end, we discuss why a full-stack Nvidia solution maximizes utility, including how Dynamo improves inference performance, even as Nvidia’s approach to software and systems-building lets them sell you only the parts you want. And — perhaps appropriately given the question — we briefly touch on gaming at the end.
As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player.
On to the Interview:
An Interview with Nvidia CEO Jensen Huang About Chip Controls, AI Factories, and Enterprise Pragmatism
This interview is lightly edited for clarity.
Arab AI and the Chip Diffusion Rule
Jensen Huang, welcome back to Stratechery
Jensen Huang: Great to see you, Ben.
It is great to actually meet you in person, our previous talks have been over Zoom, and you’re here in Taiwan. You just announced a new building that’s pretty close to my house, so that’s exciting. When we talked before, I felt like you wanted the world to understand what GPUs could be. It was a pre-ChatGPT when we first started talking and now the world’s entire market rests on a knife’s edge when you announce earnings. Now, I think we’re in a quiet period, I’m not asking about earnings, but how does it feel to be thrust in that position, the center of the world in that regard?
JH: Well, you asked me a question that I now have no interesting answer. The answer is I have no feelings about it, but I do do recognize this, that while we’re in the process of reinventing Nvidia, which it is always really central to what we’re doing at the office, we’re trying to reinvent Nvidia so that we could be ahead of the puck so that we could be where the industry will go and we want to solve problems that are hard and contribute to the industry. But very importantly now, not only have we created a computing platform, we reinvented our company, we’re much more of a data center scale company, and we offer technology that is for the very first time wholly integrated to work together, but disintegrated so that the whole ecosystem could work with it.
But the thing that I said at the keynote, which is really important is that for the very first time that we’re building computers, not just for the technology industry, we’re building computers for a new industry called AI. Now, AI is partly technology, but it’s also partly labor and it augments labor as we know, and as we go into robotics it’ll be very, very clear. This new technology called AI actually is a new industry wholly, and this whole industry is going to be powered by factories, which is going to need a lot of computers, and people are just coming to terms with the fact that we are about to go into a future where we’re computing, what people call data centers, but they’re really AI factories, is likely to be quite large.
I noticed you referenced Satya Nadella on the Microsoft earnings call reported the number of tokens that they processed, I think that was last quarter. Was that your favorite bit of earnings from this quarter? I latched onto it right away too, what a great metric.
JH: In fact, the number of tokens that are actually being generated is way, way, way higher than that. That’s just the part that Microsoft was generating for third parties, but their own consumption is much, much higher and doesn’t include OpenAI either, so you could just imagine how much it is.
From what I understand, that is a very, very large amount relative to the number that was reported. You’ve been on quite the world tour — you and I know Taiwan is beautiful, I mentioned the new office park — I do have to ask what’s the Middle East like at this time of the year?
JH: Hot, but not humid.
It’s a dry heat, right?
JH: Yeah, it’s dry heat. I sort of really enjoyed it because the buildings were cold and I would walk out and just bask in the sun and I actually felt really great. But the nights are just incredible. The nights are incredible. Eating outside, having a cup of tea outside, it’s real incredible.
I’m also of course asking about these AI deals that have been announced with Saudi Arabia and UAE. Why from your perspective is that important and why was it important for you to be there?
JH: Well, because they asked me to be there, and we were there to announce two quite ambitious AI infrastructure build outs, one in Saudi Arabia and one in Abu Dhabi, and the leaders of both countries were very out in front recognizing the importance of their nations participating in the AI revolution, recognizing that they have an extraordinary opportunity, they have an abundance of energy and a shortage of labor, and the potential of their countries are limited by the amount of labor that they have, the amount of people that they have. So for the first time, they could transform, if you will, from energy to digital labor and robotics labor, agents, robots. They’re super focused on that and very articulate about it.
His Royal Highness in Saudi Arabia was very articulate about it and very passionate about and understand the technology even. And Sheik Tahnoun in Abu Dhabi, very passionate about it, very forward thinking about it, understands very deeply the implications of the technology and the opportunities for them and so I was delighted to be there, we’re partnering with both of them.
We helped launch a new company called HUMAIN in Saudi Arabia and their hope is to be on the world stage building these AI factories, hosting international companies, companies like OpenAI who was also there, and so a very big initiative.
This is a big shift. Part and parcel of this is a step back from the AI diffusion rules, which I think was pretty harsh on those countries in particular, having a regulated number, has to be controlled by US companies, gated in some respects by what’s built in the US. Nvidia, I think contrary to your previous actions, had come out very strongly against those and from your perspective — there’s a bit where you’ve had to grow up, I feel like. Tae Kim said in his book that Nvidia is like an F1 car built around you, and you’re the driver and is there a bit where you never wanted to think about this government stuff, and so Nvidia never really thought about this government stuff, and then suddenly you’re the most important company of the world and you had to learn about this very, very quickly?
JH: Well, it wasn’t that I never wanted to, I never had to. For the vast majority of Nvidia’s life we’ve been dealing with building the technology, building the company, building the industry, competing.
Yeah, in an industry that’s pure competition.
JH: Every single day, every single moment. Building our supply chain, building our ecosystem. Notice I just described a bunch of things that are gigantic in scale and scope, plenty hard in itself, and all of a sudden the diffusion rule came out, and I think we said it at the time, but I think it’s become apparent to everybody now, it is exactly wrong, it’s exactly wrong for America. If the goal of the diffusion rule is to ensure that America has to lead, the diffusion rule as it was written will exactly cause us to lose our lead.
AI is not just the layer of software called a model, AI is a full stack thing, that’s the reason why everybody’s always talking about Nvidia systems and infrastructure and factories and so on and so forth. AI is full stack. If America wants to lead in AI, it has to start by leading full stack at the chip level, at the factory level, infrastructure level, at the model level as well as the application level — AI is all of that.
You can’t just say, “Let’s go write a diffusion rule, protect one layer at the expense of everything else”, it’s nonsensical. The idea that we would limit American AI technology right at the time when international competitors have caught up, and we pretty much predicted it.
And by international competitors, you mean other models?
JH: China’s doing fantastic, 50% of the world’s AI researchers are Chinese and you’re not going to hold them back, you’re not going to stop them from advancing AI. Let’s face it, DeepSeek is deeply excellent work. To give them anything short of that is a lack of confidence so deep that I just can’t even tolerate it.
Did we spur that work to be even better by virtue of the restrictions that were placed on them, particularly in terms of memory management and bandwidth?
JH: Everybody loves competition. Companies need competition to inspire themselves, nations need that, and there’s no question we spur them. However, I fully expected China to be there every step of the way. Huawei is a formidable company, they’re a world-class technology company. The researchers, the AI scientists in China, they’re world-class. These are not Chinese AI researchers, they’re world-class AI researchers. You walk up and down the aisles of Anthropic or OpenAI or DeepMind, there’s a whole bunch of AI researchers there, and they’re from China. Of course it’s sensible, and they’re extraordinary and so the fact that they do extraordinary work is not surprising to me.
The idea of AI diffusion limiting other countries access American technology is a mission expressed exactly wrong, it should be about accelerating the adoption of American technology everywhere before it’s too late. If the goal is for America to lead, then AI diffusion did exactly the opposite of that.
I think AI diffusion also misses the big idea about how the AI stack works. The AI stack works like a computing platform, it’s a platform. The larger, the more capable your platform, the larger the install base, more developers run and develop on it. When more developers develop on it, it makes the results, the applications, that run on your computing platform better. As a result, you sell more, and more of your computing platform is adopted, which increases your install base, which increases developers using it to develop AI models, which increases — that positive feedback system can’t be understated for any computing platform, it’s the reason why Nvidia is successful today.
The idea that we would have America not compete in the Chinese market, where 50% of the developers are, makes absolutely no sense from a computing infrastructure, computing architectural perspective. We ought to go and give American companies the opportunity to compete in China, offset the trade deficit, generate tax income for the American people, build, hire jobs, create more jobs.
Nvidia and China
Is it fair to say we’re halfway there? Because we started out with the Gulf deal and the AI diffusion rule and certainly, I think you can see from a nation-state competition perspective, having these countries—
JH: These two ideas go hand in hand and what I mean by that is this: if we don’t compete in China, and we allow the Chinese ecosystem to build a rich ecosystem because we’re not there to compete for it, and new platforms are developed and they’re not American at a time when the world is diffusing AI technology, their leadership and their technology will diffuse all around the world.
That’s my point, where from your perspective, we’re halfway there. At least we’re not cutting us off in other countries.
JH: That’s right.
But we should go all the way and let Nvidia back in China.
JH: Yeah, but I would argue that, in fact, not going into China is about 90% of the way there. It’s actually not 50/50, it’s 90%.
So we got 10% done.
JH: Yeah, that’s right. Exactly.
For the record, I agree with you. My view is this attempt to limit chip cells and then give them all the chip-making equipment they want is precisely backwards — it’s a lot harder to track chips than it is chip-making equipment anyway. One of the theories that people in Washington DC have put forward is, “The chip-making companies or the semiconductor equipment manufacturing companies, they’ve been in Washington for years, they’re very good at lobbying and Nvidia’s not here, and so they’re behind the eight ball”. Does that ring true to you? Do you just say have a hard time having people in Washington understand this point of view?
JH: We had to work really hard in the last several years to build a presence in DC. We have a handful of people, most companies our size have hundreds of people, we have a handful. Our handful of people are amazing, they’re telling our story. They’re helping people explain, understand not just how chips work, but how ecosystems work, and how AI ecosystems work, and what are some of the unintended consequences of the policies.
We want America to win. Every company should want their country to win, and every country should want their companies to win, those are not terrible things to feel, those are good things to feel, and it is also good that people love to win. Competition is a good thing, aspiring to be great is a good thing. When some country aspires to be great, we shouldn’t begrudge them. When some company aspires to be great, I don’t begrudge them. It causes us to all rise above and do even better than we could, and so I love watching people who aspire to be great.
There’s no question China aspires to be great, good for them! They should expect absolutely nothing less, and for all of the AI researchers and AI scientists that I know around the world, they got to where they are because they all aspire to be great, and they are great. I think the idea that somehow that—
To win, you have to put the other one down.
JH: That’s right, it makes no sense to me. We ought to go faster. The reason why Nvidia is here today, the reason why we have our position today, we had absolutely zero support from anybody to get here, just let us keep running hard. I think the idea that we would hold other people back, as you mentioned, it just spurs them to be even greater, because these are amazing people.
I agree. I find it, as an American, deeply frustrating. I feel we should want to win by out-innovating, by going faster and this idea we’re going to win by pulling up the ladder and cutting people off, and putting bureaucratic red tape on everyone and trying to track everything just seems deeply, frustratingly un-American to me.
JH: Yeah. Anyhow, I think the President really sees it, he wants America to win.
Well here’s a question on this, because this is the same administration that cut off the H20, a chip that you basically designed to the previous administration’s specs, and suddenly, “It’s not okay”, and now they’re doing this deal. The critics are there, “Oh, this is going to open it up to China, potentially, XYZ”. It does feel like a shift in administration, maybe they’d argue it’s still the same thing. But we’ve also had a lot of shifts between the US and China over the last six weeks, I think is one way to put it.
Do you get a sense that maybe there’s been a real realization that this world is so interconnected and related, and what goes on one side happens on the other, and maybe it’s not going to be so easy to peel apart, and there’s going to be a return of pragmatism, and how do we manage this? Are you optimistic in that regard or are you preparing for the worst?
JH: The President has a vision of what he wants to achieve, I support the President, I believe in the President, and I think that he’ll create a great outcome for America, and he’ll do it with respect and with an attitude of wanting to compete, but also looking for opportunities to cooperate. I sense that, I see all that. Obviously, I’m not in the White House and I don’t know exactly how they feel, but that’s what I sense.
First of all, the ban on H20s, that’s the limit of what we can do to Hopper, and we’ve cut it down to there’s not much left to cut. We’ve written off — I think it’s billion — no company in history has ever written off that much inventory, so this additional ban on Nvidia’s H20 is deeply painful. Its costs are enormously costly, not only am I losing billion, we wrote off billion, we walked away from billion of sales and probably — what is it? — billion worth of taxes. The China market is about billion a year and it’s not million, it’s billion. billion is like Boeing, not the plane, the whole company. To leave that behind so that the profits that go with that, the scale that goes with that, the ecosystem building that goes with that—
That’s the real threat to CUDA in the long run—
JH: That’s right.
China builds an alternative.
JH: Exactly. Anybody who thought that one chess move to ban China from H20s would somehow cut off their ability to do AI is deeply uninformed.
AI GDP Growth
There’s an angle on this in the power stuff that I want to get to in a moment, but this is going to be more fun. Let’s leave aside all the government stuff; we’ll circle back around. A third way to get at my question about financial markets and governments: in today’s keynote you started out by saying, “We’re an infrastructure company, you need five-year roadmaps”. You mentioned in passing that your original TAM estimate when you started Nvidia was $300 million. When did you actually see this coming, “We’re going to be infrastructure”? Again, I go back to our previous conversations; my sense from those is you just wanted people to see this possibility. You saw the possibility of GPU computing, but the scale, has it blown your mind just a little bit?
JH: If you watch my keynotes, as you do, it’s pretty consistent: things that are happening today, I spoke about five years ago. At the time, five years ago, the words weren’t as clear and the vocabulary I was using wasn’t as precise, but where we were going is consistent.
So when you talk a lot about robotics at the end of every keynote, which you have been doing, that is the five-year preview we should really be paying attention to.
JH: Yeah. And in fact, I’ve been talking about it for about three years.
Yeah, so a couple years from now.
JH: It’s a couple years from now, I think it’s going to happen.
The thing that is fairly deep and fairly profound for this industry is that for the last 60 years we’ve been the IT industry, a technology and tool used by people. What we sell goes into the IT budget, and for the very first time, we’re about to leave the IT budget and move into the manufacturing budget or the OpEx budget.
The manufacturing budget is because we’re building robots, or because robotic systems are being used to build products, and the OpEx is because of digital workers. The world’s OpEx and CapEx combined is what, something like $100 trillion? It’s a giant number. The IT industry is about a trillion dollars; because of AI, we’re about to bring all of us into that hundred-trillion-dollar industry.
Of course my first hope, and I think it will happen this way: although jobs will be changed and some jobs will be lost, a lot of jobs will be created. It is very likely that robotic systems, where the agents are physical robots, will expand the world’s GDP. The reason for that is we have a shortage of labor; that’s why everybody’s employed. If you go around the United States, unemployment is at all-time lows, because we just don’t have enough labor. Restaurants are having a hard time filling staff; many factories are obviously having a very hard time filling staff. I think people would hire a robot for $100,000 a year in a heartbeat, because it just increases their ability to generate more revenue, and so over the next five, ten years we’re likely to experience that expansion of GDP, and a whole new industry of these token manufacturing systems that people will now understand.
What I thought was also interesting about today’s keynote is I prepped for this interview before I came and I’m like, “Well, it’s probably going to be a bit of a rehash of GTC”, and I thought it was actually pretty starkly different. Here’s my interpretation, you have to let me know if it’s correct. It felt like GTC was for the hyperscalers and today’s presentation was for enterprise IT, it was like two different markets.
JH: Yeah.
Do I have that correct in terms of the target?
JH: Enterprise IT, or rather agents and robots: agents for enterprise IT and robots for manufacturing. The reason for that is very clear: this is the beginning of the ecosystem.
You made a beautiful video, by the way, of the Taiwan ecosystem that goes into making all the pieces; that was really great.
Dynamo and Full-Stack Nvidia
Let’s go to the GTC keynote; that was one of my favorite keynotes of yours, and I do watch them all, have watched them for years. Some real Professor Jensen energy, as you explained the limitations of data centers and why Nvidia was the answer, and I interpreted that as kind of an anti-ASIC message. You had a combination of, number one, showing your roadmap, as if to say, “Try to keep up with this”, and then number two, you introduced the Pareto curve of latency versus bandwidth, and because they’re programmable, you can use the same GPUs all over this curve; and of course, hyperscalers are the ones that are going to make ASICs.
Do I have the right understanding of your presentation there?
JH: I think the teaching was right, but the reason why I did it wasn’t exactly that. I was simply trying to help people understand how to build a new data center. We’ve been thinking about it, and here’s the challenge: there’s only so much energy in the data center. 100 megawatts is 100 megawatts, 250 megawatts is 250 megawatts, and so your fundamental job, if it’s a factory, is to make sure that the overall throughput-per-watt is the highest, because that overall throughput is tokens, whether they’re cheap, inexpensive tokens, meaning free-to-use tokens, or the high quality tokens that somebody might actually pay, say, a thousand dollars a month for.
Well, you just mentioned a thousand-dollar-a-month AI assistant.
JH: Exactly. Would I hire a $100,000-a-year AI agent? In a heartbeat. We hire people way more expensive than that all day long, and if I can simply amplify somebody I’m paying much more than that a year, for a hundred thousand bucks, that’d be incredible. Of course I would.
The quality of tokens that you generate in a factory is quite varied: you need some that are free-to-use, and you need some that are high quality, so you’re across that Pareto curve. You can’t design a chip or a system that is only good at one, because it’ll be underutilized, and so the question is, how do you create a system that at any given time could be used partly for free token generation and partly for high quality tokens?
If you cause the architecture to be too fragmented, then your ability to move workload back and forth is difficult. I think when people go through the thinking of it: if you design a system that’s very, very good at a high token rate, it naturally has very low overall throughput; if you design something with very high throughput, it tends to have very low interactivity, its tokens-per-second per user is low. It’s easy to hug the X-axis, it’s easy to hug the Y-axis; it’s hard to fill out that area. That’s the invention: the combination of what we did with the Blackwell architecture and FP4 and NVLink 72, the balance between HBM memory capacity and bandwidth, the balance between the amount of floating-point and the memory, and then, very importantly, Dynamo, the disaggregated serving system.
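To make that tradeoff concrete, here is a toy roofline model of decode serving; every number in it is an illustrative assumption rather than an Nvidia spec. Small batches hug the interactivity axis, large batches hug the throughput axis:

```python
# Toy roofline model of the throughput-vs-interactivity Pareto tradeoff.
# All hardware numbers below are illustrative assumptions, not Nvidia specs.

WEIGHT_BYTES = 2 * 70e9      # assumed 70B-parameter model held at 2 bytes/param
MEM_BW = 8e12                # assumed aggregate HBM bandwidth, bytes/second
FLOPS = 2e15                 # assumed aggregate compute, FLOP/second
FLOP_PER_TOKEN = 2 * 70e9    # ~2 FLOPs per parameter per decoded token

def decode_step_time(batch: int) -> float:
    """Seconds to decode one token for every user in the batch: the weights
    stream once per step (bandwidth-bound), compute scales with batch size."""
    mem_time = WEIGHT_BYTES / MEM_BW
    compute_time = batch * FLOP_PER_TOKEN / FLOPS
    return max(mem_time, compute_time)

for batch in (1, 8, 64, 512):
    step = decode_step_time(batch)
    per_user = 1 / step    # interactivity: tokens/second each user sees
    total = batch / step   # factory output: tokens/second overall
    print(f"batch={batch:4d}  {per_user:7.1f} tok/s/user  {total:9.0f} tok/s total")
```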
I wanted to ask you about Dynamo, which did not come up today, but I think is super interesting.
JH: Super important.
Give me the pitch, I think you called it the operating system for data centers.
JH: The pitch basically is that the inference workload, the transformer, has different stages, and different stages could be handled differently depending on the user, the model, and the context of that model. So we disaggregated the processing of the large language model into pre-fill, which is the context processing, thinking about what you’re about to ask me. It has to do with my memories of Ben, and the type of deep and conversational podcast you like to do; if I start talking deeply about the industry and the technology, I don’t feel uncomfortable doing so.
Right, you’re not doing a sound bite right now for the evening news or something like that.
JH: That’s right. I feel like I can lean in and because you’ll understand it, I don’t feel like I’m talking to the wall, and so I feel very comfortable talking about these things.
Well, when a query comes to a chatbot, the chatbot needs to have some of that context, and so chatbots have memory; they process context, and they might even have to read a PDF or two. That’s called the pre-fill part, and the pre-fill part is very floating-point intensive.
Then there’s the decode part. The decode part is about generating the thoughts: reasoning through what you’re about to say, predicting the next token. A chain of thought basically generates a lot more tokens, which get fed back into the context, which generates more tokens, and so it’s reasoning through a problem step-by-step; maybe it has to go off and read some stuff. For the modern versions of AI, these agentic AIs, reasoning AIs, the amount of floating-point and the amount of bandwidth (decode requires a lot of bandwidth) is high in all cases, but it could be higher.
It varies.
JH: That’s right, it varies depending on things.
You don’t need high floating-point precision in the decode stage.
JH: That’s right. So for example, if it’s one-shot and it’s got a strong KV cache already, you don’t need much floating-point. However, the moment you load it with context, you need a lot of floating-point. Dynamo disaggregates all of that processing and disperses it across the data center, smartly metering the workload and the load on the processors; really complex stuff.
Well, and it ties into, if the entire data center is one GPU, you’re talking about a software layer that treats it that way.
JH: That’s right, it’s essentially the operating system of an AI factory.
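As a rough mental model of that disaggregation, here is a minimal sketch of a prefill/decode split with a KV-cache handoff; the class names and pool structure are invented for illustration and are not Dynamo’s actual API:

```python
# Minimal sketch of disaggregated prefill/decode serving in the spirit of
# what Huang describes; class names are invented, not Dynamo's actual API.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Request:
    req_id: str
    prompt_tokens: List[int]
    kv_cache: Dict = field(default_factory=dict)  # handed off between pools

class PrefillPool:
    """Compute-heavy workers: one FLOP-bound pass over the whole context."""
    def run(self, req: Request) -> Request:
        req.kv_cache = {"len": len(req.prompt_tokens)}  # stand-in for real tensors
        return req

class DecodePool:
    """Bandwidth-heavy workers: generate one token at a time from the cache."""
    def run(self, req: Request, max_new: int = 4) -> List[int]:
        # Each step streams weights plus KV cache, so bandwidth dominates.
        return [req.kv_cache["len"] + i for i in range(max_new)]  # fake token ids

class Router:
    """Sends each stage to the pool whose hardware balance suits it."""
    def __init__(self):
        self.prefill, self.decode = PrefillPool(), DecodePool()
    def serve(self, req: Request) -> List[int]:
        req = self.prefill.run(req)   # stage 1: context processing
        return self.decode.run(req)   # stage 2: token generation

print(Router().serve(Request("r1", [101, 7592, 2088])))
```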
When you think about these thinking models, these reasoning models, looking forward — you’re someone, like you said, who has great predictions — do you see them being used mostly in agentic workflows, where the downside is you’re sitting around waiting for them, or maybe you’re setting up a bunch of agents that act in parallel, so that works out well? Or do they actually end up being most important for generating data for training, to get better one-shot results, which is how people would interact more frequently?
JH: I think it depends on cost, and my prediction is that reasoning models will likely just be the baseline, because we’re going to process them so lightning fast. Basically, when you turn on Grace Blackwell, it’s 40 times faster; let’s say the next click is another 40 times faster, and the models are getting better. So the idea that between now and five years from now we could be 100,000 times faster for agentic models is very sensible to me.
That’s the history of computing.
JH: That’s right. So it just thought about a mountain of things; you just didn’t see it. It’s a fast thinker now; even slow thinking is fast.
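The arithmetic behind that projection is simple compounding; in the sketch below, the 40x figure is Huang’s, while the generation count and the implied model-side gain are illustrative assumptions:

```python
# Back-of-envelope compounding behind "100,000 times faster in five years."
# The 40x-per-generation figure is from the conversation; the generation
# count and the software share are assumptions for illustration.
hw_gain_per_generation = 40
generations = 2                                    # Grace Blackwell plus one more "click"
hw_gain = hw_gain_per_generation ** generations    # 1,600x from hardware alone
model_gain_needed = 100_000 / hw_gain              # ~62x from better models/software
print(f"hardware: {hw_gain}x, models/software: {model_gain_needed:.1f}x")
```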
What was that book? Thinking, Fast and Slow; now apply that to AI. I guess it could read the whole thing in a second, so it might defeat the purpose.
JH: That’s right.
Enterprise AI and Pragmatism
To go back, just a quick little touch on politics. Is there a bit where your emphasis on performance-per-watt is really a US-centric thing, in a world where we have a hard time building power and power is the chief constraint? You look at something like these Gulf countries, where power is more accessible and easier to build for various reasons, and you go to China: guess what, if power is not the chief constraint, you can work through a lot of the problems that Nvidia solves for you. Is that a reason GTC is in the US, that that’s the message for the US?
JH: Oh, I didn’t think of it that way. I think that no matter what happens, your factory will always be a certain size and even though your country has a lot more energy, your data center doesn’t and so I think perf-per-watt is important, always.
It’s always important, but the degree of importance may vary.
JH: That’s right, yeah. It’s just that if you’re planning for it, on the other hand, you say, “Okay, well, I have an architecture that has half the performance-per-watt, so maybe I’ll just get twice as much land and twice as much power, and start building that from the get-go”. When you put all that stuff together, though, this is the problem.
Remember, there’s the infrastructure itself and the power delivery. Let’s say for a gigawatt, let’s just do some simple math: some billions of it is shell, power, land, and operating it, and then the compute and the networking and the storage, all in, is billions more. Well, if it turns out that you have to build twice as many, you multiply that infrastructure number by two, and so you’re going to have to get some really cheap compute to make up for it. That’s why I always feel that in the world of AI factories, the math would suggest that if an architecture is not as good, then sometimes, depending on how poor it is, even free is not cheap enough.
If it’s your only choice, you’ll make it work.
JH: That’s right.
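A toy model makes the “even free is not cheap enough” math explicit. All of the dollar figures below are invented assumptions; the point is only that when shell, power, and land dominate the all-in cost, halving performance-per-watt doubles the part of the bill the chips cannot discount:

```python
# Toy AI-factory economics behind "even free is not cheap enough."
# Every dollar figure is an invented assumption for illustration.
def factory_cost(infra_per_gw: float, compute_per_gw: float, gigawatts: float) -> float:
    """All-in cost: infrastructure (shell, power, land, ops) plus compute."""
    return gigawatts * (infra_per_gw + compute_per_gw)

INFRA = 20e9    # assumed infrastructure cost per gigawatt
COMPUTE = 15e9  # assumed compute/networking/storage cost per gigawatt

# Architecture A: baseline perf-per-watt, 1 GW yields the target token output.
a = factory_cost(INFRA, COMPUTE, gigawatts=1.0)

# Architecture B: half the perf-per-watt, so matching A's output needs 2 GW
# of footprint, even if B's chips are literally free.
b = factory_cost(INFRA, compute_per_gw=0, gigawatts=2.0)

print(f"A all-in:          ${a / 1e9:.0f}B")  # $35B
print(f"B with free chips: ${b / 1e9:.0f}B")  # $40B: free wasn't cheap enough
# Free only wins when compute is the majority of the all-in cost per gigawatt.
```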
Well, let’s contrast that to today. You said a couple of times today, “I love it if you buy everything from me, but I’m happy if you buy anything from me”. It was funny: before it fully crystallized for me that this was the enterprise keynote, which again is my words, not yours, there was this sense of pragmatism, and I thought, “He’s sounding like an enterprise CEO right now, they’re very pragmatic”. Of course, if you buy the whole stack, it works better, and if you’re building a full-up AI factory, to use your words, using all Nvidia will of course maximize your returns. But there are a lot of customers out there that are just buying bits and pieces, and maybe you’d like those customers to buy the whole thing, but if they buy anything from you, they’re probably going to buy it from you forever. So strategically, it seems like a very useful base to go for.
JH: Serving a customer is just smart. If you look at the way Nvidia goes to market, we’ve always built things in a fully integrated way, because software needs to be integrated with hardware somehow, but we do it with enough discipline that we can then disaggregate the software from the hardware, and you can choose not to use our software. And if you look at the way we design our systems, we’ve disaggregated them in a sufficiently disciplined way that if you wanted to replace some of it, you could. Right now, Grace Blackwell is being integrated and stood up all over the world in different clouds; they’re all based on our standards, but they’re all a little bit different, and yet we fit into them.
That’s, I think, the real challenge of Nvidia’s business model, and it goes hand-in-hand with wanting to be a computing platform company. The most important thing is that people adopt one of Nvidia’s stacks. If it’s our compute stack, that’s great; and I feel as deeply and as strongly about our networking stack as I do about our computing stack. If my computing stack gets adopted all in all, terrific. If my networking stack gets adopted, terrific. If both of them get adopted, incredible.
Well, I mean, a lot of people noted NVLink Fusion: you can get just NVLink, you can integrate it with your own ASIC, which again is a total contrast to what I interpreted the GTC messaging to be, but again, I can see the view here. I mean, who’s the customer?
JH: I still have deep beliefs that Nvidia builds a better system overall, I still believe that. And if I don’t believe that, then obviously we must be doing something wrong, and we’ve got to go get ourselves to believe it. I completely believe that Nvidia is the largest-scale accelerated computing company in the world, the largest-scale AI computing company in the world. No body of 36,000 to 38,000 people has ever been as united around this one job, any place, ever, and so the idea that a small team of 14 people could do a better job than us would be quite painful to internalize, and so we strive to do better.
However, you also believe in scale, and a great way to get scale in everything that you’re selling is to sell it however the customer wants it.
JH: That’s right, exactly. So I have preferences, but we want to make sure that we’re able to serve every customer however they’d like to be served.
Whither Gaming
Along these lines, and maybe this is related: I was asking a friend of mine about this interview and he said his son insisted that I ask this question. Some people in gaming feel, and you mentioned it today, that only 10% of the keynote is about GeForce, even though it’s still important to you. Is it a case of, “It’s still important to us because this all scales and we’re making GPUs”, or what should I tell my friend’s son about Nvidia and gaming?
JH: See, I wish I had said this: RTX PRO wouldn’t be possible without GeForce, Omniverse wouldn’t be possible without GeForce, not one of the pixels that we saw in any of those videos would’ve been possible without GeForce, robots wouldn’t be possible without GeForce, Newton is not possible without GeForce. GeForce itself just isn’t as deeply part of the GTC event, because GTC tends to be about high-performance computing and enterprise and AI and things like that; we have a separate conference for game developers. It’s just that when I do GTC, there’s a group of people I always feel a little badly for, because their product launch isn’t as central, but it’s just not the right audience. They also know that GeForce plays an integral role in everything that we do.
I mean, is there a bit where the gamers just don’t fully appreciate the extent to which their GeForces are much more than just graphics rendering engines at this point?
JH: Yeah, right. Exactly. And I said it today: we’re only rendering 1 out of 10 pixels. It’s a shocking number. Suppose I gave you a puzzle, but only 1 out of every 10 pieces, and the other 9 pieces I’m not even going to give you; you’ve just got to make them up.
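The “1 out of 10” figure is roughly upscaling multiplied by frame generation; the factors in the sketch below are illustrative assumptions rather than a specific DLSS configuration:

```python
# Rough arithmetic behind "we're only rendering 1 out of 10 pixels."
# Both factors are illustrative assumptions, not a specific DLSS mode.
upscale_factor = 4             # e.g., render 1080p, output 4K: 4x the pixels
gen_frames_per_rendered = 1.5  # assumed AI-generated frames per rendered frame

rendered_share = 1 / (upscale_factor * (1 + gen_frames_per_rendered))
print(f"{rendered_share:.0%} of displayed pixels are traditionally rendered")
```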
I’ve got another pitch for you to connect gaming to your other things. You just talked about how disciplined you are about keeping things separate, being able to separate components with software managing all of that. It kind of sounds like the driver problem on Windows, to be totally honest; that’s just a core skill set that you have.
JH: Yeah. A driver is too low-level, and it’s got too many things, too many registers, and the driver abstraction was actually a revolution that Microsoft played a very large role in. Windows wouldn’t be where Windows is if not for this concept of a driver; it created the abstraction of an API while, underneath, the hardware can change fairly significantly.
Our driver is open source now, and quite frankly, I don’t see that many people contributing to it. The reason is that the moment I come up with another GPU, all the work they did on the last driver is kind of thrown away, so without a large body of engineers like Nvidia has, it’s hard to do. But if we optimize every GPU with its associated driver, then there’s a wonderful isolation layer, an abstraction layer, whether it’s CUDA or DirectX, that people can build on top of.
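The layering Huang describes is easy to sketch. The interface below stands in for DirectX- or CUDA-style abstraction, with invented names rather than any real driver code:

```python
# Sketch of the stable-API-over-changing-hardware idea Huang credits to the
# driver model. Interface and class names are invented for illustration.
from abc import ABC, abstractmethod

class GPUDriver(ABC):
    """The abstraction layer: applications target this, never the registers."""
    @abstractmethod
    def launch_kernel(self, name: str, grid: int) -> None: ...

class GenerationA(GPUDriver):
    def launch_kernel(self, name: str, grid: int) -> None:
        print(f"[gen A] {name} on {grid} blocks via ring-buffer submission")

class GenerationB(GPUDriver):
    def launch_kernel(self, name: str, grid: int) -> None:
        print(f"[gen B] {name} on {grid} blocks via hardware queues")

def application(driver: GPUDriver) -> None:
    driver.launch_kernel("matmul", grid=128)  # unchanged across generations

application(GenerationA())
application(GenerationB())  # the hardware changed; the application did not
```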
Look, here’s my answer to my friend’s son. I had to ask you about the government stuff, and you gave a good and passionate defense of your view, but you truly got excited and your eyes lit up when I asked about gaming drivers.
JH: Oh, is that right?
So I think everything’s still good.
JH: Oh good. Yeah, I love GeForce actually.
There you go, that’s why it’s good to speak in-person. Jensen Huang, thank you very much.
This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.
The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount, please contact me directly.
Thanks for being a supporter, and have a great day!