Buscar | CGShares

Buscar

Entradas

Blogs

Usuarios

Páginas

Grupos

MoEngage @MoEngage Compartió un vínculo
2025-06-15 07:28:48 ·

Alec Haase Q&A: Customer Engagement Book Interview

Reading Time: 6 minutes
What is marketing without data? Assumptions. Guesses. Fluff.
For Chapter 6 of our book, “The Customer Engagement Book: Adapt or Die,” we spoke with Alec Haase, Product GTM Lead, Commerce and AI at Hightouch, to explore how engagement data can truly inform critical business decisions.
Alec discusses the different types of customer behaviors that matter most, how to separate meaningful information from the rest, and the role of systems that learn over time to create tailored customer experiences.
This interview provides insights into using data for real-time actions and shaping the future of marketing. Prepare to learn about AI decision-making and how a focus on data is changing how we engage with customers.

Alec Haase Q&A Interview
1. What types of customer engagement data are most valuable for making strategic business decisions?
It’s a culmination of everything.
Behavioral signals — the actual conversions and micro-conversions that users take within your product or website.
Obviously, that’s things like purchases. But there are also other behavioral signals marketers should be using and thinking about. Things like micro-conversions — maybe that’s shopping for a product, clicking to learn more about a product, or visiting a certain page on your website.
Behind that, you also need to have all your user data to tie that to.

So I know someone took said action; I can follow up with them in email or out on paid social. I need the user identifiers to do that.

2. How do you distinguish between data that is actionable versus data that is just noise?
Data that’s actionable includes the conversions and micro-conversions — very clear instances of “someone did this.” I can react to or measure those.
What’s becoming a bit of a challenge for marketers is understanding that there’s other data that is valuable for machine learning or reinforcement learning models, things like tags on the types of products customers are interacting with.
Maybe there’s category information about that product, or color information. That would otherwise look like noise to the average marketer. But behind the scenes, it can be used for reinforcement learning.

There is definitely the “clear-cut” actionable data, but marketers shouldn’t be quick to classify things as noise because the rise in machine learning and reinforcement learning will make that data more valuable.

3. How can customer engagement data be used to identify and prioritize new business opportunities?
At Hightouch, we don’t necessarily think about retroactive analysis. We have a system where we have customer engagement data firing in that we then have real-time scores reacting to.
An interesting example is when you have machine learning and reinforcement learning models running. In the pet retailer example I gave you, the system is able to figure out what to prioritize.
The concept of reinforcement learning is not a marketer making rules to say, “I know this type of thing works well on this type of audience.”

It’s the machine itself using the data to determine what attribute responds well to which offer, recommendation, or marketing campaign.

4. How can marketers ensure their use of customer engagement data aligns with the broader business objectives?
It starts with the objectives. It’s starting with the desired outcome and working your way back. That whole flip of the paradigm is starting with outcomes and letting the system optimize. What are you trying to drive, and then back into the types of experiences that can make that happen?
There’s personalization.
When we talk about data-driven experiences and personalization, Spotify Wrapped is the North Star. For Spotify Wrapped, you want to drive customer stickiness and create a brand. To make that happen, you want to send a personalized email. What components do you want in that email?

Maybe it’s top five songs, top five artists, and then you can back into the actual event data you need to make that happen.

5. What role does engagement data play in influencing cross-functional decisions such as those in product development, sales, or customer service?
For product development, it’s product analytics — knowing what features users are using, or seeing in heat maps where users are clicking.
Sales is similar. We’re using behavioral signals like what types of content they’re reading on the site to help inform what they would be interested in — the types of products or the types of use cases.

For customer service, you can look at errors they’ve run into in the past or specific purchases they’ve made, so that when you’re helping them the next time they engage with you, you know exactly what their past behaviors were and what products they could be calling about.

6. What are some challenges marketers face when trying to translate customer engagement data into actionable insights?
Access to data is one challenge. You might not know what data you have because marketers historically may not have been used to the systems where data is stored.
Historically, that’s been pretty siloed away from them. Rich behavioral data and other data across the business was stored somewhere else.
Now, as more companies embrace the data warehouse at the center of their business, it gives everyone a true single place where data can be stored.

Marketers are working more with data teams, understanding more about the data they have, and using that data to power downstream use cases, personalization, reinforcement learning, or general business insights.

7. How do you address skepticism or resistance from stakeholders when presenting data-driven recommendations?
As a marketer, I think proof is key. The best thing is if you’ve actually run a test. “I think we should do this. I ran a small test, and it’s showing that this is actually proving out.” Being able to clearly explain and justify your reasoning with data is super important.

8. What technology or tools have you found most effective for gathering and analyzing customer engagement data?
Any type of behavioral event collection, specifically ones that write to the cloud data warehouse, is the critical component. Your data team is operating off the data warehouse.
Having an event collection product that stores data in that central spot is really important if you want to use the other data when making recommendations.
You want to get everything into the data warehouse where it can be used both for insights and for putting into action.

For Spotify Wrapped, you want to collect behavioral event signals like songs listened to or concerts attended, writing to the warehouse so that you can get insights back — how many songs were played this year, projections for next month — but then you can also use those behavioral events in downstream platforms to fire off personalized emails with product recommendations or Spotify Wrapped-style experiences.

9. How do you see the role of customer engagement data evolving in shaping business strategies over the next five years?

What we’re excited about is the concept of AI Decisioning — having AI agents actually using customer data to train their own models and decision-making to create personalized experiences.
We’re sitting on top of all this behavioral data, engagement data, and user attributes, and our system is learning from all of that to make the best decisions across downstream systems.
Whether that’s as simple as driving a loyalty program and figuring out what emails to send or what on-site experiences to show, or exposing insights that might lead you to completely change your business strategy, we see engagement data as the fuel to the engine of reinforcement learning, machine learning, AI agents, this whole next wave of Martech that’s just now coming.
But it all starts with having the data to train those systems.

I think that behavioral data is the fuel of modern Martech, and that only holds more true as Martech platforms adopt these decisioning and AI capabilities, because they’re only as good as the data that’s training the models.

This interview Q&A was hosted with Alec Haase, Product GTM Lead, Commerce and AI at Hightouch, for Chapter 6 of The Customer Engagement Book: Adapt or Die.
Download the PDF or request a physical copy of the book here.
The post Alec Haase Q&A: Customer Engagement Book Interview appeared first on MoEngage.
#alec #haase #qampampa #customer #engagement

Alec Haase Q&A: Customer Engagement Book Interview
Reading Time: 6 minutes What is marketing without data? Assumptions. Guesses. Fluff. For Chapter 6 of our book, “The Customer Engagement Book: Adapt or Die,” we spoke with Alec Haase, Product GTM Lead, Commerce and AI at Hightouch, to explore how engagement data can truly inform critical business decisions. Alec discusses the different types of customer behaviors that matter most, how to separate meaningful information from the rest, and the role of systems that learn over time to create tailored customer experiences. This interview provides insights into using data for real-time actions and shaping the future of marketing. Prepare to learn about AI decision-making and how a focus on data is changing how we engage with customers. Alec Haase Q&A Interview 1. What types of customer engagement data are most valuable for making strategic business decisions? It’s a culmination of everything. Behavioral signals — the actual conversions and micro-conversions that users take within your product or website. Obviously, that’s things like purchases. But there are also other behavioral signals marketers should be using and thinking about. Things like micro-conversions — maybe that’s shopping for a product, clicking to learn more about a product, or visiting a certain page on your website. Behind that, you also need to have all your user data to tie that to. So I know someone took said action; I can follow up with them in email or out on paid social. I need the user identifiers to do that. 2. How do you distinguish between data that is actionable versus data that is just noise? Data that’s actionable includes the conversions and micro-conversions — very clear instances of “someone did this.” I can react to or measure those. What’s becoming a bit of a challenge for marketers is understanding that there’s other data that is valuable for machine learning or reinforcement learning models, things like tags on the types of products customers are interacting with. Maybe there’s category information about that product, or color information. That would otherwise look like noise to the average marketer. But behind the scenes, it can be used for reinforcement learning. There is definitely the “clear-cut” actionable data, but marketers shouldn’t be quick to classify things as noise because the rise in machine learning and reinforcement learning will make that data more valuable. 3. How can customer engagement data be used to identify and prioritize new business opportunities? At Hightouch, we don’t necessarily think about retroactive analysis. We have a system where we have customer engagement data firing in that we then have real-time scores reacting to. An interesting example is when you have machine learning and reinforcement learning models running. In the pet retailer example I gave you, the system is able to figure out what to prioritize. The concept of reinforcement learning is not a marketer making rules to say, “I know this type of thing works well on this type of audience.” It’s the machine itself using the data to determine what attribute responds well to which offer, recommendation, or marketing campaign. 4. How can marketers ensure their use of customer engagement data aligns with the broader business objectives? It starts with the objectives. It’s starting with the desired outcome and working your way back. That whole flip of the paradigm is starting with outcomes and letting the system optimize. What are you trying to drive, and then back into the types of experiences that can make that happen? There’s personalization. When we talk about data-driven experiences and personalization, Spotify Wrapped is the North Star. For Spotify Wrapped, you want to drive customer stickiness and create a brand. To make that happen, you want to send a personalized email. What components do you want in that email? Maybe it’s top five songs, top five artists, and then you can back into the actual event data you need to make that happen. 5. What role does engagement data play in influencing cross-functional decisions such as those in product development, sales, or customer service? For product development, it’s product analytics — knowing what features users are using, or seeing in heat maps where users are clicking. Sales is similar. We’re using behavioral signals like what types of content they’re reading on the site to help inform what they would be interested in — the types of products or the types of use cases. For customer service, you can look at errors they’ve run into in the past or specific purchases they’ve made, so that when you’re helping them the next time they engage with you, you know exactly what their past behaviors were and what products they could be calling about. 6. What are some challenges marketers face when trying to translate customer engagement data into actionable insights? Access to data is one challenge. You might not know what data you have because marketers historically may not have been used to the systems where data is stored. Historically, that’s been pretty siloed away from them. Rich behavioral data and other data across the business was stored somewhere else. Now, as more companies embrace the data warehouse at the center of their business, it gives everyone a true single place where data can be stored. Marketers are working more with data teams, understanding more about the data they have, and using that data to power downstream use cases, personalization, reinforcement learning, or general business insights. 7. How do you address skepticism or resistance from stakeholders when presenting data-driven recommendations? As a marketer, I think proof is key. The best thing is if you’ve actually run a test. “I think we should do this. I ran a small test, and it’s showing that this is actually proving out.” Being able to clearly explain and justify your reasoning with data is super important. 8. What technology or tools have you found most effective for gathering and analyzing customer engagement data? Any type of behavioral event collection, specifically ones that write to the cloud data warehouse, is the critical component. Your data team is operating off the data warehouse. Having an event collection product that stores data in that central spot is really important if you want to use the other data when making recommendations. You want to get everything into the data warehouse where it can be used both for insights and for putting into action. For Spotify Wrapped, you want to collect behavioral event signals like songs listened to or concerts attended, writing to the warehouse so that you can get insights back — how many songs were played this year, projections for next month — but then you can also use those behavioral events in downstream platforms to fire off personalized emails with product recommendations or Spotify Wrapped-style experiences. 9. How do you see the role of customer engagement data evolving in shaping business strategies over the next five years? What we’re excited about is the concept of AI Decisioning — having AI agents actually using customer data to train their own models and decision-making to create personalized experiences. We’re sitting on top of all this behavioral data, engagement data, and user attributes, and our system is learning from all of that to make the best decisions across downstream systems. Whether that’s as simple as driving a loyalty program and figuring out what emails to send or what on-site experiences to show, or exposing insights that might lead you to completely change your business strategy, we see engagement data as the fuel to the engine of reinforcement learning, machine learning, AI agents, this whole next wave of Martech that’s just now coming. But it all starts with having the data to train those systems. I think that behavioral data is the fuel of modern Martech, and that only holds more true as Martech platforms adopt these decisioning and AI capabilities, because they’re only as good as the data that’s training the models. This interview Q&A was hosted with Alec Haase, Product GTM Lead, Commerce and AI at Hightouch, for Chapter 6 of The Customer Engagement Book: Adapt or Die. Download the PDF or request a physical copy of the book here. The post Alec Haase Q&A: Customer Engagement Book Interview appeared first on MoEngage. #alec #haase #qampampa #customer #engagement

Alec Haase Q&A: Customer Engagement Book Interview

www.moengage.com
Reading Time: 6 minutes What is marketing without data? Assumptions. Guesses. Fluff. For Chapter 6 of our book, “The Customer Engagement Book: Adapt or Die,” we spoke with Alec Haase, Product GTM Lead, Commerce and AI at Hightouch, to explore how engagement data can truly inform critical business decisions. Alec discusses the different types of customer behaviors that matter most, how to separate meaningful information from the rest, and the role of systems that learn over time to create tailored customer experiences. This interview provides insights into using data for real-time actions and shaping the future of marketing. Prepare to learn about AI decision-making and how a focus on data is changing how we engage with customers. Alec Haase Q&A Interview 1. What types of customer engagement data are most valuable for making strategic business decisions? It’s a culmination of everything. Behavioral signals — the actual conversions and micro-conversions that users take within your product or website. Obviously, that’s things like purchases. But there are also other behavioral signals marketers should be using and thinking about. Things like micro-conversions — maybe that’s shopping for a product, clicking to learn more about a product, or visiting a certain page on your website. Behind that, you also need to have all your user data to tie that to. So I know someone took said action; I can follow up with them in email or out on paid social. I need the user identifiers to do that. 2. How do you distinguish between data that is actionable versus data that is just noise? Data that’s actionable includes the conversions and micro-conversions — very clear instances of “someone did this.” I can react to or measure those. What’s becoming a bit of a challenge for marketers is understanding that there’s other data that is valuable for machine learning or reinforcement learning models, things like tags on the types of products customers are interacting with. Maybe there’s category information about that product, or color information. That would otherwise look like noise to the average marketer. But behind the scenes, it can be used for reinforcement learning. There is definitely the “clear-cut” actionable data, but marketers shouldn’t be quick to classify things as noise because the rise in machine learning and reinforcement learning will make that data more valuable. 3. How can customer engagement data be used to identify and prioritize new business opportunities? At Hightouch, we don’t necessarily think about retroactive analysis. We have a system where we have customer engagement data firing in that we then have real-time scores reacting to. An interesting example is when you have machine learning and reinforcement learning models running. In the pet retailer example I gave you, the system is able to figure out what to prioritize. The concept of reinforcement learning is not a marketer making rules to say, “I know this type of thing works well on this type of audience.” It’s the machine itself using the data to determine what attribute responds well to which offer, recommendation, or marketing campaign. 4. How can marketers ensure their use of customer engagement data aligns with the broader business objectives? It starts with the objectives. It’s starting with the desired outcome and working your way back. That whole flip of the paradigm is starting with outcomes and letting the system optimize. What are you trying to drive, and then back into the types of experiences that can make that happen? There’s personalization. When we talk about data-driven experiences and personalization, Spotify Wrapped is the North Star. For Spotify Wrapped, you want to drive customer stickiness and create a brand. To make that happen, you want to send a personalized email. What components do you want in that email? Maybe it’s top five songs, top five artists, and then you can back into the actual event data you need to make that happen. 5. What role does engagement data play in influencing cross-functional decisions such as those in product development, sales, or customer service? For product development, it’s product analytics — knowing what features users are using, or seeing in heat maps where users are clicking. Sales is similar. We’re using behavioral signals like what types of content they’re reading on the site to help inform what they would be interested in — the types of products or the types of use cases. For customer service, you can look at errors they’ve run into in the past or specific purchases they’ve made, so that when you’re helping them the next time they engage with you, you know exactly what their past behaviors were and what products they could be calling about. 6. What are some challenges marketers face when trying to translate customer engagement data into actionable insights? Access to data is one challenge. You might not know what data you have because marketers historically may not have been used to the systems where data is stored. Historically, that’s been pretty siloed away from them. Rich behavioral data and other data across the business was stored somewhere else. Now, as more companies embrace the data warehouse at the center of their business, it gives everyone a true single place where data can be stored. Marketers are working more with data teams, understanding more about the data they have, and using that data to power downstream use cases, personalization, reinforcement learning, or general business insights. 7. How do you address skepticism or resistance from stakeholders when presenting data-driven recommendations? As a marketer, I think proof is key. The best thing is if you’ve actually run a test. “I think we should do this. I ran a small test, and it’s showing that this is actually proving out.” Being able to clearly explain and justify your reasoning with data is super important. 8. What technology or tools have you found most effective for gathering and analyzing customer engagement data? Any type of behavioral event collection, specifically ones that write to the cloud data warehouse, is the critical component. Your data team is operating off the data warehouse. Having an event collection product that stores data in that central spot is really important if you want to use the other data when making recommendations. You want to get everything into the data warehouse where it can be used both for insights and for putting into action. For Spotify Wrapped, you want to collect behavioral event signals like songs listened to or concerts attended, writing to the warehouse so that you can get insights back — how many songs were played this year, projections for next month — but then you can also use those behavioral events in downstream platforms to fire off personalized emails with product recommendations or Spotify Wrapped-style experiences. 9. How do you see the role of customer engagement data evolving in shaping business strategies over the next five years? What we’re excited about is the concept of AI Decisioning — having AI agents actually using customer data to train their own models and decision-making to create personalized experiences. We’re sitting on top of all this behavioral data, engagement data, and user attributes, and our system is learning from all of that to make the best decisions across downstream systems. Whether that’s as simple as driving a loyalty program and figuring out what emails to send or what on-site experiences to show, or exposing insights that might lead you to completely change your business strategy, we see engagement data as the fuel to the engine of reinforcement learning, machine learning, AI agents, this whole next wave of Martech that’s just now coming. But it all starts with having the data to train those systems. I think that behavioral data is the fuel of modern Martech, and that only holds more true as Martech platforms adopt these decisioning and AI capabilities, because they’re only as good as the data that’s training the models. This interview Q&A was hosted with Alec Haase, Product GTM Lead, Commerce and AI at Hightouch, for Chapter 6 of The Customer Engagement Book: Adapt or Die. Download the PDF or request a physical copy of the book here. The post Alec Haase Q&A: Customer Engagement Book Interview appeared first on MoEngage.

0 Commentarios ·0 Acciones ·0 Vista previa

Please log in to like, share and comment!
Microsoft Academic @MicrosoftAcademic Compartió un vínculo
2025-06-15 06:55:50 ·

How AI is reshaping the future of healthcare and medical research

Transcript      
PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”         
This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.  
Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?   
In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.  The book passage I read at the top is from “Chapter 10: The Big Black Bag.”
In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.
In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open.
As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.
Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home.
Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.
Here’s my conversation with Bill Gates and Sébastien Bubeck.
LEE: Bill, welcome.
BILL GATES: Thank you.
LEE: Seb …
SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here.
LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening?
And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?
GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines.
And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.
And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weaknessthat, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning.
LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that?
GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, …
LEE: Right.
GATES: … that is a bit weird.
LEE: Yeah.
GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training.
LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent.
BUBECK: Yes.
LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSRto join and start investigating this thing seriously. And the first person I pulled in was you.
BUBECK: Yeah.
LEE: And so what were your first encounters? Because I actually don’t remember what happened then.
BUBECK: Oh, I remember it very well.My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3.
I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1.
So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair.And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts.
So this was really, to me, the first moment where I saw some understanding in those models.
LEE: So this was, just to get the timing right, that was before I pulled you into the tent.
BUBECK: That was before. That was like a year before.
LEE: Right.
BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4.
So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.
So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x.
And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?
LEE: Yeah.
BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.
LEE:One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine.
And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.
And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.
I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book.
But the main purpose of this conversation isn’t to reminisce aboutor indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements.
But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today?
You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.
Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork?
GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.
It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision.
But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view.
LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients.Does that make sense to you?
BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong?
Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.
Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them.
And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT. And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.
Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way.
It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine.
LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all?
GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that.
The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa,
So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.
LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking?
GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.
The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.
LEE: Right.
GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.
LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication.
BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE, for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI.
It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for.
LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes.
I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?
That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential.What’s up with that?
BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back thatversion of GPT-4o, so now we don’t have the sycophant version out there.
Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF, where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad.
But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model.
So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model.
LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and …
BUBECK: It’s a very difficult, very difficult balance.
LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models?
GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there.
Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?
Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there.
LEE: Yeah.
GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake.
LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on.
BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGIthat kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything.
That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects.So it’s … I think it’s an important example to have in mind.
LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two?
BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it.
LEE: So we have about three hours of stuff to talk about, but our time is actually running low.
BUBECK: Yes, yes, yes.
LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now?
GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.
The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities.
And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period.
LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers?
GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them.
LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.
I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why.
BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and seeproduced what you wanted. So I absolutely agree with that.
And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini. So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.
LEE: Yeah.
BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.
Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not.
Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision.
LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist …
BUBECK: Yeah.
LEE: … or an endocrinologist might not.
BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know.
LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today?
BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later.
And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …
LEE: Will AI prescribe your medicines? Write your prescriptions?
BUBECK: I think yes. I think yes.
LEE: OK. Bill?
GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate?
And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelectedjust on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries.
You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that.
LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.
I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you.
GATES: Yeah. Thanks, you guys.
BUBECK: Thank you, Peter. Thanks, Bill.
LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.
With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.
And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.
One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.
HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings.
You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.
If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real.
I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.
Until next time.
#how #reshaping #future #healthcare #medical

How AI is reshaping the future of healthcare and medical research
Transcript       PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”          This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.   Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?    In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.  The book passage I read at the top is from “Chapter 10: The Big Black Bag.” In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.    In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open. As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.   Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home. Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.      Here’s my conversation with Bill Gates and Sébastien Bubeck. LEE: Bill, welcome. BILL GATES: Thank you. LEE: Seb … SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here. LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening? And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?   GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines. And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.   And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weaknessthat, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning. LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that? GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, … LEE: Right.   GATES: … that is a bit weird.   LEE: Yeah. GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training. LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent. BUBECK: Yes.   LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSRto join and start investigating this thing seriously. And the first person I pulled in was you. BUBECK: Yeah. LEE: And so what were your first encounters? Because I actually don’t remember what happened then. BUBECK: Oh, I remember it very well.My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3. I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1. So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair.And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts. So this was really, to me, the first moment where I saw some understanding in those models.   LEE: So this was, just to get the timing right, that was before I pulled you into the tent. BUBECK: That was before. That was like a year before. LEE: Right.   BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4. So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.   So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x. And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?   LEE: Yeah. BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.   LEE:One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine. And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.   And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.   I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book. But the main purpose of this conversation isn’t to reminisce aboutor indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements. But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today? You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.   Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork? GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.   It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision. But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view. LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients.Does that make sense to you? BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong? Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.   Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them. And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT. And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.   Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way. It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine. LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all? GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that. The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa, So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.   LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking? GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.   The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.   LEE: Right.   GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.   LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication. BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE, for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI. It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for. LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes. I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?   That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential.What’s up with that? BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back thatversion of GPT-4o, so now we don’t have the sycophant version out there. Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF, where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad. But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model. So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model. LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and … BUBECK: It’s a very difficult, very difficult balance. LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models? GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there. Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?   Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there. LEE: Yeah. GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake. LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on. BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGIthat kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything. That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects.So it’s … I think it’s an important example to have in mind. LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two? BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it. LEE: So we have about three hours of stuff to talk about, but our time is actually running low. BUBECK: Yes, yes, yes.   LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now? GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.   The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities. And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period. LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers? GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them. LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.   I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why. BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and seeproduced what you wanted. So I absolutely agree with that.   And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini. So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.   LEE: Yeah. BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.   Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not. Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision. LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist … BUBECK: Yeah. LEE: … or an endocrinologist might not. BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know. LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today? BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later. And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …   LEE: Will AI prescribe your medicines? Write your prescriptions? BUBECK: I think yes. I think yes. LEE: OK. Bill? GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate? And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelectedjust on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries. You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that. LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.   I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you.   GATES: Yeah. Thanks, you guys. BUBECK: Thank you, Peter. Thanks, Bill. LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.    With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.   And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.   One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.   HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings. You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.   If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real.   I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.   Until next time.   #how #reshaping #future #healthcare #medical

How AI is reshaping the future of healthcare and medical research

www.microsoft.com
Transcript [MUSIC]      [BOOK PASSAGE]  PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”   [END OF BOOK PASSAGE]    [THEME MUSIC]    This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.   Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?    In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.  [THEME MUSIC FADES] The book passage I read at the top is from “Chapter 10: The Big Black Bag.” In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.    In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open. As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.   Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home. Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.    [TRANSITION MUSIC]   Here’s my conversation with Bill Gates and Sébastien Bubeck. LEE: Bill, welcome. BILL GATES: Thank you. LEE: Seb … SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here. LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening? And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?   GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines. And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.   And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weakness [LAUGHTER] that, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning. LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that? GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, … LEE: Right.   GATES: … that is a bit weird.   LEE: Yeah. GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training. LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent. [LAUGHS] BUBECK: Yes.   LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSR [Microsoft Research] to join and start investigating this thing seriously. And the first person I pulled in was you. BUBECK: Yeah. LEE: And so what were your first encounters? Because I actually don’t remember what happened then. BUBECK: Oh, I remember it very well. [LAUGHS] My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3. I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1. So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair. [LAUGHTER] And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts. So this was really, to me, the first moment where I saw some understanding in those models.   LEE: So this was, just to get the timing right, that was before I pulled you into the tent. BUBECK: That was before. That was like a year before. LEE: Right.   BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4. So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.   So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x. And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?   LEE: Yeah. BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.   LEE: [LAUGHS] One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine. And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.   And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.   I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book. But the main purpose of this conversation isn’t to reminisce about [LAUGHS] or indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements. But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today? You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.   Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork? GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.   It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision. But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view. LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients. [LAUGHTER] Does that make sense to you? BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong? Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.   Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them. And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT (opens in new tab). And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.   Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way. It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine. LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all? GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that. The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa, So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.   LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking? GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.   The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.   LEE: Right.   GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.   LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication. BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE [United States Medical Licensing Examination], for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI. It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for. LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes. I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?   That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential. [LAUGHTER] What’s up with that? BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back that [LAUGHS] version of GPT-4o, so now we don’t have the sycophant version out there. Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF [reinforcement learning from human feedback], where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad. But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model. So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model. LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and … BUBECK: It’s a very difficult, very difficult balance. LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models? GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there. Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?   Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there. LEE: Yeah. GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake. LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on. BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGI [artificial general intelligence] that kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything. That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects. [LAUGHTER] So it’s … I think it’s an important example to have in mind. LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two? BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it. LEE: So we have about three hours of stuff to talk about, but our time is actually running low. BUBECK: Yes, yes, yes.   LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now? GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.   The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities. And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period. LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers? GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them. LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.   I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why. BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and see [if you have] produced what you wanted. So I absolutely agree with that.   And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini (opens in new tab). So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.   LEE: Yeah. BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.   Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not. Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision. LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist … BUBECK: Yeah. LEE: … or an endocrinologist might not. BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know. LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today? BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later. And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …   LEE: Will AI prescribe your medicines? Write your prescriptions? BUBECK: I think yes. I think yes. LEE: OK. Bill? GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate? And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelected [LAUGHTER] just on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries. You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that. LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.   I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you. [TRANSITION MUSIC] GATES: Yeah. Thanks, you guys. BUBECK: Thank you, Peter. Thanks, Bill. LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.    With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.   And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.   One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.   HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings. You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.   If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real. [THEME MUSIC] I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.   Until next time.   [MUSIC FADES]

0 Commentarios ·0 Acciones ·0 Vista previa

Please log in to like, share and comment!
3D Printing Industry @3DPrintingIndustry Compartió un vínculo
2025-06-15 06:42:56 ·

UMass and MIT Test Cold Spray 3D Printing to Repair Aging Massachusetts Bridge

Researchers from the US-based University of Massachusetts Amherst, in collaboration with the Massachusetts Institute of TechnologyDepartment of Mechanical Engineering, have applied cold spray to repair the deteriorating “Brown Bridge” in Great Barrington, built in 1949. The project marks the first known use of this method on bridge infrastructure and aims to evaluate its effectiveness as a faster, more cost-effective, and less disruptive alternative to conventional repair techniques.
“Now that we’ve completed this proof-of-concept repair, we see a clear path to a solution that is much faster, less costly, easier, and less invasive,” said Simos Gerasimidis, associate professor of civil and environmental engineering at the University of Massachusetts Amherst. “To our knowledge, this is a first. Of course, there is some R&D that needs to be developed, but this is a huge milestone to that,” he added.
The pilot project is also a collaboration with the Massachusetts Department of Transportation, the Massachusetts Technology Collaborative, the U.S. Department of Transportation, and the Federal Highway Administration. It was supported by the Massachusetts Manufacturing Innovation Initiative, which provided essential equipment for the demonstration.
Members of the UMass Amherst and MIT Department of Mechanical Engineering research team, led by Simos Gerasimidis. Photo via UMass Amherst.
Tackling America’s Bridge Crisis with Cold Spray Technology
Nearly half of the bridges across the United States are in “fair” condition, while 6.8% are classified as “poor,” according to the 2025 Report Card for America’s Infrastructure. In Massachusetts, about 9% of the state’s 5,295 bridges are considered structurally deficient. The costs of restoring this infrastructure are projected to exceed billion—well beyond current funding levels.
The cold spray method consists of propelling metal powder particles at high velocity onto the beam’s surface. Successive applications build up additional layers, helping restore its thickness and structural integrity. This method has successfully been used to repair large structures such as submarines, airplanes, and ships, but this marks the first instance of its application to a bridge.
One of cold spray’s key advantages is its ability to be deployed with minimal traffic disruption. “Every time you do repairs on a bridge you have to block traffic, you have to make traffic controls for substantial amounts of time,” explained Gerasimidis. “This will allow us toon this actual bridge while cars are going.”
To enhance precision, the research team integrated 3D LiDAR scanning technology into the process. Unlike visual inspections, which can be subjective and time-consuming, LiDAR creates high-resolution digital models that pinpoint areas of corrosion. This allows teams to develop targeted repair plans and deposit materials only where needed—reducing waste and potentially extending a bridge’s lifespan.
Next steps: Testing Cold-Sprayed Repairs
The bridge is scheduled for demolition in the coming years. When that happens, researchers will retrieve the repaired sections for further analysis. They plan to assess the durability, corrosion resistance, and mechanical performance of the cold-sprayed steel in real-world conditions, comparing it to results from laboratory tests.
“This is a tremendous collaboration where cutting-edge technology is brought to address a critical need for infrastructure in the commonwealth and across the United States,” said John Hart, Class of 1922 Professor in the Department of Mechanical Engineering at MIT. “I think we’re just at the beginning of a digital transformation of bridge inspection, repair and maintenance, among many other important use cases.”
3D Printing for Infrastructure Repairs
Beyond cold spray techniques, other innovative 3D printing methods are emerging to address construction repair challenges. For example, researchers at University College Londonhave developed an asphalt 3D printer specifically designed to repair road cracks and potholes. “The material properties of 3D printed asphalt are tunable, and combined with the flexibility and efficiency of the printing platform, this technique offers a compelling new design approach to the maintenance of infrastructure,” the UCL team explained.
Similarly, in 2018, Cintec, a Wales-based international structural engineering firm, contributed to restoring the historic Government building known as the Red House in the Republic of Trinidad and Tobago. This project, managed by Cintec’s North American branch, marked the first use of additive manufacturing within sacrificial structures. It also featured the installation of what are claimed to be the longest reinforcement anchors ever inserted into a structure—measuring an impressive 36.52 meters.
Join our Additive Manufacturing Advantageevent on July 10th, where AM leaders from Aerospace, Space, and Defense come together to share mission-critical insights. Online and free to attend.Secure your spot now.
Who won the2024 3D Printing Industry Awards?
Subscribe to the 3D Printing Industry newsletterto keep up with the latest 3D printing news.
You can also follow us onLinkedIn, and subscribe to the 3D Printing Industry Youtube channel to access more exclusive content.
Featured image shows members of the UMass Amherst and MIT Department of Mechanical Engineering research team, led by Simos Gerasimidis. Photo via UMass Amherst.
#umass #mit #test #cold #spray

UMass and MIT Test Cold Spray 3D Printing to Repair Aging Massachusetts Bridge
Researchers from the US-based University of Massachusetts Amherst, in collaboration with the Massachusetts Institute of TechnologyDepartment of Mechanical Engineering, have applied cold spray to repair the deteriorating “Brown Bridge” in Great Barrington, built in 1949. The project marks the first known use of this method on bridge infrastructure and aims to evaluate its effectiveness as a faster, more cost-effective, and less disruptive alternative to conventional repair techniques. “Now that we’ve completed this proof-of-concept repair, we see a clear path to a solution that is much faster, less costly, easier, and less invasive,” said Simos Gerasimidis, associate professor of civil and environmental engineering at the University of Massachusetts Amherst. “To our knowledge, this is a first. Of course, there is some R&D that needs to be developed, but this is a huge milestone to that,” he added. The pilot project is also a collaboration with the Massachusetts Department of Transportation, the Massachusetts Technology Collaborative, the U.S. Department of Transportation, and the Federal Highway Administration. It was supported by the Massachusetts Manufacturing Innovation Initiative, which provided essential equipment for the demonstration. Members of the UMass Amherst and MIT Department of Mechanical Engineering research team, led by Simos Gerasimidis. Photo via UMass Amherst. Tackling America’s Bridge Crisis with Cold Spray Technology Nearly half of the bridges across the United States are in “fair” condition, while 6.8% are classified as “poor,” according to the 2025 Report Card for America’s Infrastructure. In Massachusetts, about 9% of the state’s 5,295 bridges are considered structurally deficient. The costs of restoring this infrastructure are projected to exceed billion—well beyond current funding levels. The cold spray method consists of propelling metal powder particles at high velocity onto the beam’s surface. Successive applications build up additional layers, helping restore its thickness and structural integrity. This method has successfully been used to repair large structures such as submarines, airplanes, and ships, but this marks the first instance of its application to a bridge. One of cold spray’s key advantages is its ability to be deployed with minimal traffic disruption. “Every time you do repairs on a bridge you have to block traffic, you have to make traffic controls for substantial amounts of time,” explained Gerasimidis. “This will allow us toon this actual bridge while cars are going.” To enhance precision, the research team integrated 3D LiDAR scanning technology into the process. Unlike visual inspections, which can be subjective and time-consuming, LiDAR creates high-resolution digital models that pinpoint areas of corrosion. This allows teams to develop targeted repair plans and deposit materials only where needed—reducing waste and potentially extending a bridge’s lifespan. Next steps: Testing Cold-Sprayed Repairs The bridge is scheduled for demolition in the coming years. When that happens, researchers will retrieve the repaired sections for further analysis. They plan to assess the durability, corrosion resistance, and mechanical performance of the cold-sprayed steel in real-world conditions, comparing it to results from laboratory tests. “This is a tremendous collaboration where cutting-edge technology is brought to address a critical need for infrastructure in the commonwealth and across the United States,” said John Hart, Class of 1922 Professor in the Department of Mechanical Engineering at MIT. “I think we’re just at the beginning of a digital transformation of bridge inspection, repair and maintenance, among many other important use cases.” 3D Printing for Infrastructure Repairs Beyond cold spray techniques, other innovative 3D printing methods are emerging to address construction repair challenges. For example, researchers at University College Londonhave developed an asphalt 3D printer specifically designed to repair road cracks and potholes. “The material properties of 3D printed asphalt are tunable, and combined with the flexibility and efficiency of the printing platform, this technique offers a compelling new design approach to the maintenance of infrastructure,” the UCL team explained. Similarly, in 2018, Cintec, a Wales-based international structural engineering firm, contributed to restoring the historic Government building known as the Red House in the Republic of Trinidad and Tobago. This project, managed by Cintec’s North American branch, marked the first use of additive manufacturing within sacrificial structures. It also featured the installation of what are claimed to be the longest reinforcement anchors ever inserted into a structure—measuring an impressive 36.52 meters. Join our Additive Manufacturing Advantageevent on July 10th, where AM leaders from Aerospace, Space, and Defense come together to share mission-critical insights. Online and free to attend.Secure your spot now. Who won the2024 3D Printing Industry Awards? Subscribe to the 3D Printing Industry newsletterto keep up with the latest 3D printing news. You can also follow us onLinkedIn, and subscribe to the 3D Printing Industry Youtube channel to access more exclusive content. Featured image shows members of the UMass Amherst and MIT Department of Mechanical Engineering research team, led by Simos Gerasimidis. Photo via UMass Amherst. #umass #mit #test #cold #spray

UMass and MIT Test Cold Spray 3D Printing to Repair Aging Massachusetts Bridge

3dprintingindustry.com
Researchers from the US-based University of Massachusetts Amherst (UMass), in collaboration with the Massachusetts Institute of Technology (MIT) Department of Mechanical Engineering, have applied cold spray to repair the deteriorating “Brown Bridge” in Great Barrington, built in 1949. The project marks the first known use of this method on bridge infrastructure and aims to evaluate its effectiveness as a faster, more cost-effective, and less disruptive alternative to conventional repair techniques. “Now that we’ve completed this proof-of-concept repair, we see a clear path to a solution that is much faster, less costly, easier, and less invasive,” said Simos Gerasimidis, associate professor of civil and environmental engineering at the University of Massachusetts Amherst. “To our knowledge, this is a first. Of course, there is some R&D that needs to be developed, but this is a huge milestone to that,” he added. The pilot project is also a collaboration with the Massachusetts Department of Transportation (MassDOT), the Massachusetts Technology Collaborative (MassTech), the U.S. Department of Transportation, and the Federal Highway Administration. It was supported by the Massachusetts Manufacturing Innovation Initiative, which provided essential equipment for the demonstration. Members of the UMass Amherst and MIT Department of Mechanical Engineering research team, led by Simos Gerasimidis (left, standing). Photo via UMass Amherst. Tackling America’s Bridge Crisis with Cold Spray Technology Nearly half of the bridges across the United States are in “fair” condition, while 6.8% are classified as “poor,” according to the 2025 Report Card for America’s Infrastructure. In Massachusetts, about 9% of the state’s 5,295 bridges are considered structurally deficient. The costs of restoring this infrastructure are projected to exceed $190 billion—well beyond current funding levels. The cold spray method consists of propelling metal powder particles at high velocity onto the beam’s surface. Successive applications build up additional layers, helping restore its thickness and structural integrity. This method has successfully been used to repair large structures such as submarines, airplanes, and ships, but this marks the first instance of its application to a bridge. One of cold spray’s key advantages is its ability to be deployed with minimal traffic disruption. “Every time you do repairs on a bridge you have to block traffic, you have to make traffic controls for substantial amounts of time,” explained Gerasimidis. “This will allow us to [apply the technique] on this actual bridge while cars are going [across].” To enhance precision, the research team integrated 3D LiDAR scanning technology into the process. Unlike visual inspections, which can be subjective and time-consuming, LiDAR creates high-resolution digital models that pinpoint areas of corrosion. This allows teams to develop targeted repair plans and deposit materials only where needed—reducing waste and potentially extending a bridge’s lifespan. Next steps: Testing Cold-Sprayed Repairs The bridge is scheduled for demolition in the coming years. When that happens, researchers will retrieve the repaired sections for further analysis. They plan to assess the durability, corrosion resistance, and mechanical performance of the cold-sprayed steel in real-world conditions, comparing it to results from laboratory tests. “This is a tremendous collaboration where cutting-edge technology is brought to address a critical need for infrastructure in the commonwealth and across the United States,” said John Hart, Class of 1922 Professor in the Department of Mechanical Engineering at MIT. “I think we’re just at the beginning of a digital transformation of bridge inspection, repair and maintenance, among many other important use cases.” 3D Printing for Infrastructure Repairs Beyond cold spray techniques, other innovative 3D printing methods are emerging to address construction repair challenges. For example, researchers at University College London (UCL) have developed an asphalt 3D printer specifically designed to repair road cracks and potholes. “The material properties of 3D printed asphalt are tunable, and combined with the flexibility and efficiency of the printing platform, this technique offers a compelling new design approach to the maintenance of infrastructure,” the UCL team explained. Similarly, in 2018, Cintec, a Wales-based international structural engineering firm, contributed to restoring the historic Government building known as the Red House in the Republic of Trinidad and Tobago. This project, managed by Cintec’s North American branch, marked the first use of additive manufacturing within sacrificial structures. It also featured the installation of what are claimed to be the longest reinforcement anchors ever inserted into a structure—measuring an impressive 36.52 meters. Join our Additive Manufacturing Advantage (AMAA) event on July 10th, where AM leaders from Aerospace, Space, and Defense come together to share mission-critical insights. Online and free to attend.Secure your spot now. Who won the2024 3D Printing Industry Awards? Subscribe to the 3D Printing Industry newsletterto keep up with the latest 3D printing news. You can also follow us onLinkedIn, and subscribe to the 3D Printing Industry Youtube channel to access more exclusive content. Featured image shows members of the UMass Amherst and MIT Department of Mechanical Engineering research team, led by Simos Gerasimidis (left, standing). Photo via UMass Amherst.

0 Commentarios ·0 Acciones ·0 Vista previa

Please log in to like, share and comment!
Marktechpost AI @MarktechpostAI Compartió un vínculo
2025-06-15 06:37:21 ·

OThink-R1: A Dual-Mode Reasoning Framework to Cut Redundant Computation in LLMs

The Inefficiency of Static Chain-of-Thought Reasoning in LRMs
Recent LRMs achieve top performance by using detailed CoT reasoning to solve complex tasks. However, many simple tasks they handle could be solved by smaller models with fewer tokens, making such elaborate reasoning unnecessary. This echoes human thinking, where we use fast, intuitive responses for easy problems and slower, analytical thinking for complex ones. While LRMs mimic slow, logical reasoning, they generate significantly longer outputs, thereby increasing computational cost. Current methods for reducing reasoning steps lack flexibility, limiting models to a single fixed reasoning style. There is a growing need for adaptive reasoning that adjusts effort according to task difficulty.
Limitations of Existing Training-Based and Training-Free Approaches
Recent research on improving reasoning efficiency in LRMs can be categorized into two main areas: training-based and training-free methods. Training strategies often use reinforcement learning or fine-tuning to limit token usage or adjust reasoning depth, but they tend to follow fixed patterns without flexibility. Training-free approaches utilize prompt engineering or pattern detection to shorten outputs during inference; however, they also lack adaptability. More recent work focuses on variable-length reasoning, where models adjust reasoning depth based on task complexity. Others study “overthinking,” where models over-reason unnecessarily. However, few methods enable dynamic switching between quick and thorough reasoning—something this paper addresses directly.
Introducing OThink-R1: Dynamic Fast/Slow Reasoning Framework
Researchers from Zhejiang University and OPPO have developed OThink-R1, a new approach that enables LRMs to switch between fast and slow thinking smartly, much like humans do. By analyzing reasoning patterns, they identified which steps are essential and which are redundant. With help from another model acting as a judge, they trained LRMs to adapt their reasoning style based on task complexity. Their method reduces unnecessary reasoning by over 23% without losing accuracy. Using a loss function and fine-tuned datasets, OThink-R1 outperforms previous models in both efficiency and performance on various math and question-answering tasks.
System Architecture: Reasoning Pruning and Dual-Reference Optimization
The OThink-R1 framework helps LRMs dynamically switch between fast and slow thinking. First, it identifies when LRMs include unnecessary reasoning, like overexplaining or double-checking, versus when detailed steps are truly essential. Using this, it builds a curated training dataset by pruning redundant reasoning and retaining valuable logic. Then, during fine-tuning, a special loss function balances both reasoning styles. This dual-reference loss compares the model’s outputs with both fast and slow thinking variants, encouraging flexibility. As a result, OThink-R1 can adaptively choose the most efficient reasoning path for each problem while preserving accuracy and logical depth.

Empirical Evaluation and Comparative Performance
The OThink-R1 model was tested on simpler QA and math tasks to evaluate its ability to switch between fast and slow reasoning. Using datasets like OpenBookQA, CommonsenseQA, ASDIV, and GSM8K, the model demonstrated strong performance, generating fewer tokens while maintaining or improving accuracy. Compared to baselines such as NoThinking and DualFormer, OThink-R1 demonstrated a better balance between efficiency and effectiveness. Ablation studies confirmed the importance of pruning, KL constraints, and LLM-Judge in achieving optimal results. A case study illustrated that unnecessary reasoning can lead to overthinking and reduced accuracy, highlighting OThink-R1’s strength in adaptive reasoning.

Conclusion: Towards Scalable and Efficient Hybrid Reasoning Systems
In conclusion, OThink-R1 is a large reasoning model that adaptively switches between fast and slow thinking modes to improve both efficiency and performance. It addresses the issue of unnecessarily complex reasoning in large models by analyzing and classifying reasoning steps as either essential or redundant. By pruning the redundant ones while maintaining logical accuracy, OThink-R1 reduces unnecessary computation. It also introduces a dual-reference KL-divergence loss to strengthen hybrid reasoning. Tested on math and QA tasks, it cuts down reasoning redundancy by 23% without sacrificing accuracy, showing promise for building more adaptive, scalable, and efficient AI reasoning systems in the future.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
Sana HassanSana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.Sana Hassanhttps://www.marktechpost.com/author/sana-hassan/Building AI-Powered Applications Using the Plan → Files → Code Workflow in TinyDevSana Hassanhttps://www.marktechpost.com/author/sana-hassan/MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language ModelsSana Hassanhttps://www.marktechpost.com/author/sana-hassan/Google AI Unveils a Hybrid AI-Physics Model for Accurate Regional Climate Risk Forecasts with Better Uncertainty AssessmentSana Hassanhttps://www.marktechpost.com/author/sana-hassan/Run Multiple AI Coding Agents in Parallel with Container-Use from Dagger
#othinkr1 #dualmode #reasoning #framework #cut

OThink-R1: A Dual-Mode Reasoning Framework to Cut Redundant Computation in LLMs
The Inefficiency of Static Chain-of-Thought Reasoning in LRMs Recent LRMs achieve top performance by using detailed CoT reasoning to solve complex tasks. However, many simple tasks they handle could be solved by smaller models with fewer tokens, making such elaborate reasoning unnecessary. This echoes human thinking, where we use fast, intuitive responses for easy problems and slower, analytical thinking for complex ones. While LRMs mimic slow, logical reasoning, they generate significantly longer outputs, thereby increasing computational cost. Current methods for reducing reasoning steps lack flexibility, limiting models to a single fixed reasoning style. There is a growing need for adaptive reasoning that adjusts effort according to task difficulty. Limitations of Existing Training-Based and Training-Free Approaches Recent research on improving reasoning efficiency in LRMs can be categorized into two main areas: training-based and training-free methods. Training strategies often use reinforcement learning or fine-tuning to limit token usage or adjust reasoning depth, but they tend to follow fixed patterns without flexibility. Training-free approaches utilize prompt engineering or pattern detection to shorten outputs during inference; however, they also lack adaptability. More recent work focuses on variable-length reasoning, where models adjust reasoning depth based on task complexity. Others study “overthinking,” where models over-reason unnecessarily. However, few methods enable dynamic switching between quick and thorough reasoning—something this paper addresses directly. Introducing OThink-R1: Dynamic Fast/Slow Reasoning Framework Researchers from Zhejiang University and OPPO have developed OThink-R1, a new approach that enables LRMs to switch between fast and slow thinking smartly, much like humans do. By analyzing reasoning patterns, they identified which steps are essential and which are redundant. With help from another model acting as a judge, they trained LRMs to adapt their reasoning style based on task complexity. Their method reduces unnecessary reasoning by over 23% without losing accuracy. Using a loss function and fine-tuned datasets, OThink-R1 outperforms previous models in both efficiency and performance on various math and question-answering tasks. System Architecture: Reasoning Pruning and Dual-Reference Optimization The OThink-R1 framework helps LRMs dynamically switch between fast and slow thinking. First, it identifies when LRMs include unnecessary reasoning, like overexplaining or double-checking, versus when detailed steps are truly essential. Using this, it builds a curated training dataset by pruning redundant reasoning and retaining valuable logic. Then, during fine-tuning, a special loss function balances both reasoning styles. This dual-reference loss compares the model’s outputs with both fast and slow thinking variants, encouraging flexibility. As a result, OThink-R1 can adaptively choose the most efficient reasoning path for each problem while preserving accuracy and logical depth. Empirical Evaluation and Comparative Performance The OThink-R1 model was tested on simpler QA and math tasks to evaluate its ability to switch between fast and slow reasoning. Using datasets like OpenBookQA, CommonsenseQA, ASDIV, and GSM8K, the model demonstrated strong performance, generating fewer tokens while maintaining or improving accuracy. Compared to baselines such as NoThinking and DualFormer, OThink-R1 demonstrated a better balance between efficiency and effectiveness. Ablation studies confirmed the importance of pruning, KL constraints, and LLM-Judge in achieving optimal results. A case study illustrated that unnecessary reasoning can lead to overthinking and reduced accuracy, highlighting OThink-R1’s strength in adaptive reasoning. Conclusion: Towards Scalable and Efficient Hybrid Reasoning Systems In conclusion, OThink-R1 is a large reasoning model that adaptively switches between fast and slow thinking modes to improve both efficiency and performance. It addresses the issue of unnecessarily complex reasoning in large models by analyzing and classifying reasoning steps as either essential or redundant. By pruning the redundant ones while maintaining logical accuracy, OThink-R1 reduces unnecessary computation. It also introduces a dual-reference KL-divergence loss to strengthen hybrid reasoning. Tested on math and QA tasks, it cuts down reasoning redundancy by 23% without sacrificing accuracy, showing promise for building more adaptive, scalable, and efficient AI reasoning systems in the future. Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Sana HassanSana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.Sana Hassanhttps://www.marktechpost.com/author/sana-hassan/Building AI-Powered Applications Using the Plan → Files → Code Workflow in TinyDevSana Hassanhttps://www.marktechpost.com/author/sana-hassan/MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language ModelsSana Hassanhttps://www.marktechpost.com/author/sana-hassan/Google AI Unveils a Hybrid AI-Physics Model for Accurate Regional Climate Risk Forecasts with Better Uncertainty AssessmentSana Hassanhttps://www.marktechpost.com/author/sana-hassan/Run Multiple AI Coding Agents in Parallel with Container-Use from Dagger #othinkr1 #dualmode #reasoning #framework #cut

OThink-R1: A Dual-Mode Reasoning Framework to Cut Redundant Computation in LLMs

www.marktechpost.com
The Inefficiency of Static Chain-of-Thought Reasoning in LRMs Recent LRMs achieve top performance by using detailed CoT reasoning to solve complex tasks. However, many simple tasks they handle could be solved by smaller models with fewer tokens, making such elaborate reasoning unnecessary. This echoes human thinking, where we use fast, intuitive responses for easy problems and slower, analytical thinking for complex ones. While LRMs mimic slow, logical reasoning, they generate significantly longer outputs, thereby increasing computational cost. Current methods for reducing reasoning steps lack flexibility, limiting models to a single fixed reasoning style. There is a growing need for adaptive reasoning that adjusts effort according to task difficulty. Limitations of Existing Training-Based and Training-Free Approaches Recent research on improving reasoning efficiency in LRMs can be categorized into two main areas: training-based and training-free methods. Training strategies often use reinforcement learning or fine-tuning to limit token usage or adjust reasoning depth, but they tend to follow fixed patterns without flexibility. Training-free approaches utilize prompt engineering or pattern detection to shorten outputs during inference; however, they also lack adaptability. More recent work focuses on variable-length reasoning, where models adjust reasoning depth based on task complexity. Others study “overthinking,” where models over-reason unnecessarily. However, few methods enable dynamic switching between quick and thorough reasoning—something this paper addresses directly. Introducing OThink-R1: Dynamic Fast/Slow Reasoning Framework Researchers from Zhejiang University and OPPO have developed OThink-R1, a new approach that enables LRMs to switch between fast and slow thinking smartly, much like humans do. By analyzing reasoning patterns, they identified which steps are essential and which are redundant. With help from another model acting as a judge, they trained LRMs to adapt their reasoning style based on task complexity. Their method reduces unnecessary reasoning by over 23% without losing accuracy. Using a loss function and fine-tuned datasets, OThink-R1 outperforms previous models in both efficiency and performance on various math and question-answering tasks. System Architecture: Reasoning Pruning and Dual-Reference Optimization The OThink-R1 framework helps LRMs dynamically switch between fast and slow thinking. First, it identifies when LRMs include unnecessary reasoning, like overexplaining or double-checking, versus when detailed steps are truly essential. Using this, it builds a curated training dataset by pruning redundant reasoning and retaining valuable logic. Then, during fine-tuning, a special loss function balances both reasoning styles. This dual-reference loss compares the model’s outputs with both fast and slow thinking variants, encouraging flexibility. As a result, OThink-R1 can adaptively choose the most efficient reasoning path for each problem while preserving accuracy and logical depth. Empirical Evaluation and Comparative Performance The OThink-R1 model was tested on simpler QA and math tasks to evaluate its ability to switch between fast and slow reasoning. Using datasets like OpenBookQA, CommonsenseQA, ASDIV, and GSM8K, the model demonstrated strong performance, generating fewer tokens while maintaining or improving accuracy. Compared to baselines such as NoThinking and DualFormer, OThink-R1 demonstrated a better balance between efficiency and effectiveness. Ablation studies confirmed the importance of pruning, KL constraints, and LLM-Judge in achieving optimal results. A case study illustrated that unnecessary reasoning can lead to overthinking and reduced accuracy, highlighting OThink-R1’s strength in adaptive reasoning. Conclusion: Towards Scalable and Efficient Hybrid Reasoning Systems In conclusion, OThink-R1 is a large reasoning model that adaptively switches between fast and slow thinking modes to improve both efficiency and performance. It addresses the issue of unnecessarily complex reasoning in large models by analyzing and classifying reasoning steps as either essential or redundant. By pruning the redundant ones while maintaining logical accuracy, OThink-R1 reduces unnecessary computation. It also introduces a dual-reference KL-divergence loss to strengthen hybrid reasoning. Tested on math and QA tasks, it cuts down reasoning redundancy by 23% without sacrificing accuracy, showing promise for building more adaptive, scalable, and efficient AI reasoning systems in the future. Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Sana HassanSana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.Sana Hassanhttps://www.marktechpost.com/author/sana-hassan/Building AI-Powered Applications Using the Plan → Files → Code Workflow in TinyDevSana Hassanhttps://www.marktechpost.com/author/sana-hassan/MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language ModelsSana Hassanhttps://www.marktechpost.com/author/sana-hassan/Google AI Unveils a Hybrid AI-Physics Model for Accurate Regional Climate Risk Forecasts with Better Uncertainty AssessmentSana Hassanhttps://www.marktechpost.com/author/sana-hassan/Run Multiple AI Coding Agents in Parallel with Container-Use from Dagger

0 Commentarios ·0 Acciones ·0 Vista previa

Please log in to like, share and comment!
Marktechpost AI @MarktechpostAI Compartió un vínculo
2025-06-07 20:19:33 ·

ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation

Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively.
Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to more extensive datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge.

Efforts to mitigate token inflation have led to innovations like next-scale prediction seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok’s gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows.
Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner.

The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity.
The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models.

By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method’s coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter.
NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement FinetuningNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces LLaDA-V: A Purely Diffusion-Based Multimodal Large Language Model for Visual Instruction Tuning and Multimodal ReasoningNikhilhttps://www.marktechpost.com/author/nikhil0980/NVIDIA AI Introduces Fast-dLLM: A Training-Free Framework That Brings KV Caching and Parallel Decoding to Diffusion LLMsNikhilhttps://www.marktechpost.com/author/nikhil0980/Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation
#bytedance #researchers #introduce #detailflow #coarsetofine

ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation
Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively. Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to more extensive datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge. Efforts to mitigate token inflation have led to innovations like next-scale prediction seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok’s gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows. Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner. The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity. The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models. By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method’s coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research. Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter. NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement FinetuningNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces LLaDA-V: A Purely Diffusion-Based Multimodal Large Language Model for Visual Instruction Tuning and Multimodal ReasoningNikhilhttps://www.marktechpost.com/author/nikhil0980/NVIDIA AI Introduces Fast-dLLM: A Training-Free Framework That Brings KV Caching and Parallel Decoding to Diffusion LLMsNikhilhttps://www.marktechpost.com/author/nikhil0980/Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation #bytedance #researchers #introduce #detailflow #coarsetofine

ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation

www.marktechpost.com
Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively. Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to more extensive datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge. Efforts to mitigate token inflation have led to innovations like next-scale prediction seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok’s gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows. Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner. The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity. The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models. By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method’s coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research. Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter. NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement FinetuningNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces LLaDA-V: A Purely Diffusion-Based Multimodal Large Language Model for Visual Instruction Tuning and Multimodal ReasoningNikhilhttps://www.marktechpost.com/author/nikhil0980/NVIDIA AI Introduces Fast-dLLM: A Training-Free Framework That Brings KV Caching and Parallel Decoding to Diffusion LLMsNikhilhttps://www.marktechpost.com/author/nikhil0980/Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation

821

· 0 Commentarios ·0 Acciones ·0 Vista previa

Please log in to like, share and comment!
3D Printing Industry @3DPrintingIndustry Compartió un vínculo
2025-06-06 07:31:47 ·

New Multi-Axis Tool from Virginia Tech Boosts Fiber-Reinforced 3D Printing

Researchers from the Department of Mechanical Engineering at Virginia Tech have introduced a continuous fiber reinforcementdeposition tool designed for multi-axis 3D printing, significantly enhancing mechanical performance in composite structures. Led by Kieran D. Beaumont, Joseph R. Kubalak, and Christopher B. Williams, and published in Springer Nature Link, the study demonstrates an 820% improvement in maximum load capacity compared to conventional planar short carbon fiber3D printing methods. This tool integrates three key functions: reliable fiber cutting and re-feeding, in situ fiber volume fraction control, and a slender collision volume to support complex multi-axis toolpaths.
The newly developed deposition tool addresses critical challenges in CFR additive manufacturing. It is capable of cutting and re-feeding continuous fibers during travel movements, a function required to create complex geometries without material tearing or print failure. In situ control of fiber volume fraction is also achieved by adjusting the polymer extrusion rate. A slender geometry minimizes collisions between the tool and the printed part during multi-axis movements.
The researchers designed the tool to co-extrude a thermoplastic polymer matrix with a continuous carbon fibertowpreg. This approach allowed reliable fiber re-feeding after each cut and enabled printing with variable fiber content within a single part. The tool’s slender collision volume supports increased range of motion for the robotic arm used in the experiments, allowing alignment of fibers with three-dimensional load paths in complex structures.
The six Degree-of-Freedom Robotic Arm printing a multi-axis geometry from a CFR polymer composite. Photo via Springer Nature Link.
Mechanical Testing Confirms Load-Bearing Improvements
Mechanical tests evaluated the impact of continuous fiber reinforcement on polylactic acidparts. In tensile tests, samples reinforced with continuous carbon fibers achieved a tensile strength of 190.76 MPa and a tensile modulus of 9.98 GPa in the fiber direction. These values compare to 60.31 MPa and 3.01 GPa for neat PLA, and 56.92 MPa and 4.30 GPa for parts containing short carbon fibers. Additional tests assessed intra-layer and inter-layer performance, revealing that the continuous fiber–reinforced material had reduced mechanical properties in these orientations. Compared to neat PLA, intra-layer tensile strength and modulus dropped by 66% and 63%, respectively, and inter-layer strength and modulus decreased by 86% and 60%.
Researchers printed curved tensile bar geometries using three methods to evaluate performance in parts with three-dimensional load paths: planar short carbon fiber–reinforced PLA, multi-axis short fiber–reinforced samples, and multi-axis continuous fiber–reinforced composites. The multi-axis short fiber–reinforced parts showed a 41.6% increase in maximum load compared to their planar counterparts. Meanwhile, multi-axis continuous fiber–reinforced parts absorbed loads 8.2 times higher than the planar short fiber–reinforced specimens. Scanning electron microscopyimages of fracture surfaces revealed fiber pull-out and limited fiber-matrix bonding, particularly in samples with continuous fibers.
Schematic illustration of common continuous fiber reinforcement–material extrusionmodalities: in situ impregnation, towpreg extrusion, and co-extrusion with towpreg. Photo via Springer Nature Link.
To verify the tool’s fiber cutting and re-feeding capability, the researchers printed a 100 × 150 × 3 mm rectangular plaque that required 426 cutting and re-feeding operations across six layers. The deposition tool achieved a 100% success rate, demonstrating reliable cutting and re-feeding without fiber clogging. This reliability is critical for manufacturing complex structures that require frequent travel movements between deposition paths.
In situ fiber volume fraction control was validated through printing a rectangular prism sample with varying polymer feed rates, road widths, and layer heights. The fiber volume fractions achieved in different sections of the part were 6.51%, 8.00%, and 9.86%, as measured by cross-sectional microscopy and image analysis. Although lower than some literature reports, the researchers attributed this to the specific combination of tool geometry, polymer-fiber interaction time, and print speed.
The tool uses Anisoprint’s CCF towpreg, a pre-impregnated continuous carbon fiber product with a fiber volume fraction of 57% and a diameter of 0.35 mm. 3DXTECH’s black PLA and SCF-PLA filaments were selected to ensure consistent matrix properties and avoid the influence of pigment variations on mechanical testing. The experiments were conducted using an ABB IRB 4600–40/2.55 robotic arm equipped with a tool changer for switching between the CFR-MEX deposition tool and a standard MEX tool with an elongated nozzle for planar prints.
Deposition Tool CAD and Assembly. Photo via Springer Nature Link.
Context Within Existing Research and Future Directions
Continuous fiber reinforcement in additive manufacturing has previously demonstrated significant improvements in part performance, with some studies reporting tensile strengths of up to 650 MPa for PLA composites reinforced with continuous carbon fibers. However, traditional three-axis printing methods restrict fiber orientation to planar directions, limiting these gains to within the XY-plane. Multi-axis 3D printing approaches have demonstrated improved load-bearing capacity in short-fiber reinforced parts. For example, multi-axis printed samples have shown failure loads several times higher than planar-printed counterparts in pressure cap and curved geometry applications.
Virginia Tech’s tool integrates multiple functionalities that previous tools in literature could not achieve simultaneously. It combines a polymer feeder based on a dual drive extruder, a fiber cutter and re-feeder assembly, and a co-extrusion hotend with adjustable interaction time for fiber-polymer bonding. A needle-like geometry and external pneumatic cooling pipes reduce the risk of collision with the printed part during multi-axis reorientation. Measured collision volume angles were 56.2° for the full tool and 41.6° for the hotend assembly.
Load-extension performance graphs for curved tensile bars. Photo via Springer Nature Link.
Despite these advances, the researchers identified challenges related to weak bonding between the fiber and the polymer matrix. SEM images showed limited impregnation of the polymer into the fiber towpreg, with the fiber-matrix interface remaining a key area for future work. The study highlights that optimizing fiber tow sizing and improving the fiber-polymer interaction time during printing could enhance inter-layer and intra-layer performance. The results also suggest that advanced toolpath planning algorithms could further leverage the tool’s ability to align fiber deposition along three-dimensional load paths, improving mechanical performance in functional parts.
The publication in Springer Nature Link documents the full design, validation experiments, and mechanical characterization of the CFR-MEX tool. The work adds to a growing body of research on multi-axis additive manufacturing, particularly in combining continuous fiber reinforcement with complex geometries.
Take the 3DPI Reader Survey — shape the future of AM reporting in under 5 minutes.
Ready to discover who won the 20243D Printing Industry Awards?
Subscribe to the 3D Printing Industry newsletter to stay updated with the latest news and insights.
Featured photo shows the six Degree-of-Freedom Robotic Arm printing a multi-axis geometry. Photo via Springer Nature Link.

Anyer Tenorio Lara
Anyer Tenorio Lara is an emerging tech journalist passionate about uncovering the latest advances in technology and innovation. With a sharp eye for detail and a talent for storytelling, Anyer has quickly made a name for himself in the tech community. Anyer's articles aim to make complex subjects accessible and engaging for a broad audience. In addition to his writing, Anyer enjoys participating in industry events and discussions, eager to learn and share knowledge in the dynamic world of technology.
#new #multiaxis #tool #virginia #tech

New Multi-Axis Tool from Virginia Tech Boosts Fiber-Reinforced 3D Printing
Researchers from the Department of Mechanical Engineering at Virginia Tech have introduced a continuous fiber reinforcementdeposition tool designed for multi-axis 3D printing, significantly enhancing mechanical performance in composite structures. Led by Kieran D. Beaumont, Joseph R. Kubalak, and Christopher B. Williams, and published in Springer Nature Link, the study demonstrates an 820% improvement in maximum load capacity compared to conventional planar short carbon fiber3D printing methods. This tool integrates three key functions: reliable fiber cutting and re-feeding, in situ fiber volume fraction control, and a slender collision volume to support complex multi-axis toolpaths. The newly developed deposition tool addresses critical challenges in CFR additive manufacturing. It is capable of cutting and re-feeding continuous fibers during travel movements, a function required to create complex geometries without material tearing or print failure. In situ control of fiber volume fraction is also achieved by adjusting the polymer extrusion rate. A slender geometry minimizes collisions between the tool and the printed part during multi-axis movements. The researchers designed the tool to co-extrude a thermoplastic polymer matrix with a continuous carbon fibertowpreg. This approach allowed reliable fiber re-feeding after each cut and enabled printing with variable fiber content within a single part. The tool’s slender collision volume supports increased range of motion for the robotic arm used in the experiments, allowing alignment of fibers with three-dimensional load paths in complex structures. The six Degree-of-Freedom Robotic Arm printing a multi-axis geometry from a CFR polymer composite. Photo via Springer Nature Link. Mechanical Testing Confirms Load-Bearing Improvements Mechanical tests evaluated the impact of continuous fiber reinforcement on polylactic acidparts. In tensile tests, samples reinforced with continuous carbon fibers achieved a tensile strength of 190.76 MPa and a tensile modulus of 9.98 GPa in the fiber direction. These values compare to 60.31 MPa and 3.01 GPa for neat PLA, and 56.92 MPa and 4.30 GPa for parts containing short carbon fibers. Additional tests assessed intra-layer and inter-layer performance, revealing that the continuous fiber–reinforced material had reduced mechanical properties in these orientations. Compared to neat PLA, intra-layer tensile strength and modulus dropped by 66% and 63%, respectively, and inter-layer strength and modulus decreased by 86% and 60%. Researchers printed curved tensile bar geometries using three methods to evaluate performance in parts with three-dimensional load paths: planar short carbon fiber–reinforced PLA, multi-axis short fiber–reinforced samples, and multi-axis continuous fiber–reinforced composites. The multi-axis short fiber–reinforced parts showed a 41.6% increase in maximum load compared to their planar counterparts. Meanwhile, multi-axis continuous fiber–reinforced parts absorbed loads 8.2 times higher than the planar short fiber–reinforced specimens. Scanning electron microscopyimages of fracture surfaces revealed fiber pull-out and limited fiber-matrix bonding, particularly in samples with continuous fibers. Schematic illustration of common continuous fiber reinforcement–material extrusionmodalities: in situ impregnation, towpreg extrusion, and co-extrusion with towpreg. Photo via Springer Nature Link. To verify the tool’s fiber cutting and re-feeding capability, the researchers printed a 100 × 150 × 3 mm rectangular plaque that required 426 cutting and re-feeding operations across six layers. The deposition tool achieved a 100% success rate, demonstrating reliable cutting and re-feeding without fiber clogging. This reliability is critical for manufacturing complex structures that require frequent travel movements between deposition paths. In situ fiber volume fraction control was validated through printing a rectangular prism sample with varying polymer feed rates, road widths, and layer heights. The fiber volume fractions achieved in different sections of the part were 6.51%, 8.00%, and 9.86%, as measured by cross-sectional microscopy and image analysis. Although lower than some literature reports, the researchers attributed this to the specific combination of tool geometry, polymer-fiber interaction time, and print speed. The tool uses Anisoprint’s CCF towpreg, a pre-impregnated continuous carbon fiber product with a fiber volume fraction of 57% and a diameter of 0.35 mm. 3DXTECH’s black PLA and SCF-PLA filaments were selected to ensure consistent matrix properties and avoid the influence of pigment variations on mechanical testing. The experiments were conducted using an ABB IRB 4600–40/2.55 robotic arm equipped with a tool changer for switching between the CFR-MEX deposition tool and a standard MEX tool with an elongated nozzle for planar prints. Deposition Tool CAD and Assembly. Photo via Springer Nature Link. Context Within Existing Research and Future Directions Continuous fiber reinforcement in additive manufacturing has previously demonstrated significant improvements in part performance, with some studies reporting tensile strengths of up to 650 MPa for PLA composites reinforced with continuous carbon fibers. However, traditional three-axis printing methods restrict fiber orientation to planar directions, limiting these gains to within the XY-plane. Multi-axis 3D printing approaches have demonstrated improved load-bearing capacity in short-fiber reinforced parts. For example, multi-axis printed samples have shown failure loads several times higher than planar-printed counterparts in pressure cap and curved geometry applications. Virginia Tech’s tool integrates multiple functionalities that previous tools in literature could not achieve simultaneously. It combines a polymer feeder based on a dual drive extruder, a fiber cutter and re-feeder assembly, and a co-extrusion hotend with adjustable interaction time for fiber-polymer bonding. A needle-like geometry and external pneumatic cooling pipes reduce the risk of collision with the printed part during multi-axis reorientation. Measured collision volume angles were 56.2° for the full tool and 41.6° for the hotend assembly. Load-extension performance graphs for curved tensile bars. Photo via Springer Nature Link. Despite these advances, the researchers identified challenges related to weak bonding between the fiber and the polymer matrix. SEM images showed limited impregnation of the polymer into the fiber towpreg, with the fiber-matrix interface remaining a key area for future work. The study highlights that optimizing fiber tow sizing and improving the fiber-polymer interaction time during printing could enhance inter-layer and intra-layer performance. The results also suggest that advanced toolpath planning algorithms could further leverage the tool’s ability to align fiber deposition along three-dimensional load paths, improving mechanical performance in functional parts. The publication in Springer Nature Link documents the full design, validation experiments, and mechanical characterization of the CFR-MEX tool. The work adds to a growing body of research on multi-axis additive manufacturing, particularly in combining continuous fiber reinforcement with complex geometries. Take the 3DPI Reader Survey — shape the future of AM reporting in under 5 minutes. Ready to discover who won the 20243D Printing Industry Awards? Subscribe to the 3D Printing Industry newsletter to stay updated with the latest news and insights. Featured photo shows the six Degree-of-Freedom Robotic Arm printing a multi-axis geometry. Photo via Springer Nature Link. Anyer Tenorio Lara Anyer Tenorio Lara is an emerging tech journalist passionate about uncovering the latest advances in technology and innovation. With a sharp eye for detail and a talent for storytelling, Anyer has quickly made a name for himself in the tech community. Anyer's articles aim to make complex subjects accessible and engaging for a broad audience. In addition to his writing, Anyer enjoys participating in industry events and discussions, eager to learn and share knowledge in the dynamic world of technology. #new #multiaxis #tool #virginia #tech

New Multi-Axis Tool from Virginia Tech Boosts Fiber-Reinforced 3D Printing

3dprintingindustry.com
Researchers from the Department of Mechanical Engineering at Virginia Tech have introduced a continuous fiber reinforcement (CFR) deposition tool designed for multi-axis 3D printing, significantly enhancing mechanical performance in composite structures. Led by Kieran D. Beaumont, Joseph R. Kubalak, and Christopher B. Williams, and published in Springer Nature Link, the study demonstrates an 820% improvement in maximum load capacity compared to conventional planar short carbon fiber (SCF) 3D printing methods. This tool integrates three key functions: reliable fiber cutting and re-feeding, in situ fiber volume fraction control, and a slender collision volume to support complex multi-axis toolpaths. The newly developed deposition tool addresses critical challenges in CFR additive manufacturing. It is capable of cutting and re-feeding continuous fibers during travel movements, a function required to create complex geometries without material tearing or print failure. In situ control of fiber volume fraction is also achieved by adjusting the polymer extrusion rate. A slender geometry minimizes collisions between the tool and the printed part during multi-axis movements. The researchers designed the tool to co-extrude a thermoplastic polymer matrix with a continuous carbon fiber (CCF) towpreg. This approach allowed reliable fiber re-feeding after each cut and enabled printing with variable fiber content within a single part. The tool’s slender collision volume supports increased range of motion for the robotic arm used in the experiments, allowing alignment of fibers with three-dimensional load paths in complex structures. The six Degree-of-Freedom Robotic Arm printing a multi-axis geometry from a CFR polymer composite. Photo via Springer Nature Link. Mechanical Testing Confirms Load-Bearing Improvements Mechanical tests evaluated the impact of continuous fiber reinforcement on polylactic acid (PLA) parts. In tensile tests, samples reinforced with continuous carbon fibers achieved a tensile strength of 190.76 MPa and a tensile modulus of 9.98 GPa in the fiber direction. These values compare to 60.31 MPa and 3.01 GPa for neat PLA, and 56.92 MPa and 4.30 GPa for parts containing short carbon fibers. Additional tests assessed intra-layer and inter-layer performance, revealing that the continuous fiber–reinforced material had reduced mechanical properties in these orientations. Compared to neat PLA, intra-layer tensile strength and modulus dropped by 66% and 63%, respectively, and inter-layer strength and modulus decreased by 86% and 60%. Researchers printed curved tensile bar geometries using three methods to evaluate performance in parts with three-dimensional load paths: planar short carbon fiber–reinforced PLA, multi-axis short fiber–reinforced samples, and multi-axis continuous fiber–reinforced composites. The multi-axis short fiber–reinforced parts showed a 41.6% increase in maximum load compared to their planar counterparts. Meanwhile, multi-axis continuous fiber–reinforced parts absorbed loads 8.2 times higher than the planar short fiber–reinforced specimens. Scanning electron microscopy (SEM) images of fracture surfaces revealed fiber pull-out and limited fiber-matrix bonding, particularly in samples with continuous fibers. Schematic illustration of common continuous fiber reinforcement–material extrusion (CFR-MEX) modalities: in situ impregnation, towpreg extrusion, and co-extrusion with towpreg. Photo via Springer Nature Link. To verify the tool’s fiber cutting and re-feeding capability, the researchers printed a 100 × 150 × 3 mm rectangular plaque that required 426 cutting and re-feeding operations across six layers. The deposition tool achieved a 100% success rate, demonstrating reliable cutting and re-feeding without fiber clogging. This reliability is critical for manufacturing complex structures that require frequent travel movements between deposition paths. In situ fiber volume fraction control was validated through printing a rectangular prism sample with varying polymer feed rates, road widths, and layer heights. The fiber volume fractions achieved in different sections of the part were 6.51%, 8.00%, and 9.86%, as measured by cross-sectional microscopy and image analysis. Although lower than some literature reports, the researchers attributed this to the specific combination of tool geometry, polymer-fiber interaction time, and print speed. The tool uses Anisoprint’s CCF towpreg, a pre-impregnated continuous carbon fiber product with a fiber volume fraction of 57% and a diameter of 0.35 mm. 3DXTECH’s black PLA and SCF-PLA filaments were selected to ensure consistent matrix properties and avoid the influence of pigment variations on mechanical testing. The experiments were conducted using an ABB IRB 4600–40/2.55 robotic arm equipped with a tool changer for switching between the CFR-MEX deposition tool and a standard MEX tool with an elongated nozzle for planar prints. Deposition Tool CAD and Assembly. Photo via Springer Nature Link. Context Within Existing Research and Future Directions Continuous fiber reinforcement in additive manufacturing has previously demonstrated significant improvements in part performance, with some studies reporting tensile strengths of up to 650 MPa for PLA composites reinforced with continuous carbon fibers. However, traditional three-axis printing methods restrict fiber orientation to planar directions, limiting these gains to within the XY-plane. Multi-axis 3D printing approaches have demonstrated improved load-bearing capacity in short-fiber reinforced parts. For example, multi-axis printed samples have shown failure loads several times higher than planar-printed counterparts in pressure cap and curved geometry applications. Virginia Tech’s tool integrates multiple functionalities that previous tools in literature could not achieve simultaneously. It combines a polymer feeder based on a dual drive extruder, a fiber cutter and re-feeder assembly, and a co-extrusion hotend with adjustable interaction time for fiber-polymer bonding. A needle-like geometry and external pneumatic cooling pipes reduce the risk of collision with the printed part during multi-axis reorientation. Measured collision volume angles were 56.2° for the full tool and 41.6° for the hotend assembly. Load-extension performance graphs for curved tensile bars. Photo via Springer Nature Link. Despite these advances, the researchers identified challenges related to weak bonding between the fiber and the polymer matrix. SEM images showed limited impregnation of the polymer into the fiber towpreg, with the fiber-matrix interface remaining a key area for future work. The study highlights that optimizing fiber tow sizing and improving the fiber-polymer interaction time during printing could enhance inter-layer and intra-layer performance. The results also suggest that advanced toolpath planning algorithms could further leverage the tool’s ability to align fiber deposition along three-dimensional load paths, improving mechanical performance in functional parts. The publication in Springer Nature Link documents the full design, validation experiments, and mechanical characterization of the CFR-MEX tool. The work adds to a growing body of research on multi-axis additive manufacturing, particularly in combining continuous fiber reinforcement with complex geometries. Take the 3DPI Reader Survey — shape the future of AM reporting in under 5 minutes. Ready to discover who won the 20243D Printing Industry Awards? Subscribe to the 3D Printing Industry newsletter to stay updated with the latest news and insights. Featured photo shows the six Degree-of-Freedom Robotic Arm printing a multi-axis geometry. Photo via Springer Nature Link. Anyer Tenorio Lara Anyer Tenorio Lara is an emerging tech journalist passionate about uncovering the latest advances in technology and innovation. With a sharp eye for detail and a talent for storytelling, Anyer has quickly made a name for himself in the tech community. Anyer's articles aim to make complex subjects accessible and engaging for a broad audience. In addition to his writing, Anyer enjoys participating in industry events and discussions, eager to learn and share knowledge in the dynamic world of technology.

323

· 0 Commentarios ·0 Acciones ·0 Vista previa

Please log in to like, share and comment!

Upgrade to Pro