• How AI is reshaping the future of healthcare and medical research

    Transcript       
    PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”          
    This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.   
    Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?    
    In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.  The book passage I read at the top is from “Chapter 10: The Big Black Bag.” 
    In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.   
    In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open. 
    As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.  
    Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home. 
    Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.     
    Here’s my conversation with Bill Gates and Sébastien Bubeck. 
    LEE: Bill, welcome. 
    BILL GATES: Thank you. 
    LEE: Seb … 
    SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here. 
    LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening? 
    And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?  
    GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines. 
    And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.  
    And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weaknessthat, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning. 
    LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that? 
    GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, … 
    LEE: Right.  
    GATES: … that is a bit weird.  
    LEE: Yeah. 
    GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training. 
    LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent. 
    BUBECK: Yes.  
    LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSRto join and start investigating this thing seriously. And the first person I pulled in was you. 
    BUBECK: Yeah. 
    LEE: And so what were your first encounters? Because I actually don’t remember what happened then. 
    BUBECK: Oh, I remember it very well.My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3. 
    I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1. 
    So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair.And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts. 
    So this was really, to me, the first moment where I saw some understanding in those models.  
    LEE: So this was, just to get the timing right, that was before I pulled you into the tent. 
    BUBECK: That was before. That was like a year before. 
    LEE: Right.  
    BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4. 
    So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.  
    So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x. 
    And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?  
    LEE: Yeah.
    BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.  
    LEE:One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine. 
    And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.  
    And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.  
    I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book. 
    But the main purpose of this conversation isn’t to reminisce aboutor indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements. 
    But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today? 
    You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.  
    Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork? 
    GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.  
    It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision. 
    But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view. 
    LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients.Does that make sense to you? 
    BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong? 
    Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.  
    Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them. 
    And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT. And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.  
    Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way. 
    It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine. 
    LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all? 
    GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that. 
    The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa,
    So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.  
    LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking? 
    GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.  
    The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.  
    LEE: Right.  
    GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.  
    LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication. 
    BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE, for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI. 
    It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for. 
    LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes. 
    I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?  
    That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential.What’s up with that? 
    BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back thatversion of GPT-4o, so now we don’t have the sycophant version out there. 
    Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF, where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad. 
    But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model. 
    So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model. 
    LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and … 
    BUBECK: It’s a very difficult, very difficult balance. 
    LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models? 
    GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there. 
    Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?  
    Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there.
    LEE: Yeah.
    GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake. 
    LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on. 
    BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGIthat kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything. 
    That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects.So it’s … I think it’s an important example to have in mind. 
    LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two? 
    BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it. 
    LEE: So we have about three hours of stuff to talk about, but our time is actually running low.
    BUBECK: Yes, yes, yes.  
    LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now? 
    GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.  
    The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities. 
    And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period. 
    LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers? 
    GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them. 
    LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.  
    I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why. 
    BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and seeproduced what you wanted. So I absolutely agree with that.  
    And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini. So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.  
    LEE: Yeah. 
    BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.  
    Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not. 
    Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision. 
    LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist … 
    BUBECK: Yeah.
    LEE: … or an endocrinologist might not.
    BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know.
    LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today? 
    BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later. 
    And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …  
    LEE: Will AI prescribe your medicines? Write your prescriptions? 
    BUBECK: I think yes. I think yes. 
    LEE: OK. Bill? 
    GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate?
    And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelectedjust on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries. 
    You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that. 
    LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.  
    I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you.  
    GATES: Yeah. Thanks, you guys. 
    BUBECK: Thank you, Peter. Thanks, Bill. 
    LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.   
    With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.  
    And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.  
    One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.  
    HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings. 
    You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.  
    If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real.  
    I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.  
    Until next time.  
    #how #reshaping #future #healthcare #medical
    How AI is reshaping the future of healthcare and medical research
    Transcript        PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”           This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.    Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?     In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.  The book passage I read at the top is from “Chapter 10: The Big Black Bag.”  In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.    In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open.  As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.   Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home.  Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.      Here’s my conversation with Bill Gates and Sébastien Bubeck.  LEE: Bill, welcome.  BILL GATES: Thank you.  LEE: Seb …  SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here.  LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening?  And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?   GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines.  And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.   And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weaknessthat, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning.  LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that?  GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, …  LEE: Right.   GATES: … that is a bit weird.   LEE: Yeah.  GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training.  LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent.  BUBECK: Yes.   LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSRto join and start investigating this thing seriously. And the first person I pulled in was you.  BUBECK: Yeah.  LEE: And so what were your first encounters? Because I actually don’t remember what happened then.  BUBECK: Oh, I remember it very well.My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3.  I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1.  So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair.And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts.  So this was really, to me, the first moment where I saw some understanding in those models.   LEE: So this was, just to get the timing right, that was before I pulled you into the tent.  BUBECK: That was before. That was like a year before.  LEE: Right.   BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4.  So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.   So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x.  And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?   LEE: Yeah. BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.   LEE:One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine.  And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.   And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.   I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book.  But the main purpose of this conversation isn’t to reminisce aboutor indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements.  But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today?  You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.   Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork?  GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.   It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision.  But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view.  LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients.Does that make sense to you?  BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong?  Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.   Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them.  And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT. And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.   Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way.  It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine.  LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all?  GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that.  The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa, So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.   LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking?  GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.   The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.   LEE: Right.   GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.   LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication.  BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE, for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI.  It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for.  LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes.  I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?   That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential.What’s up with that?  BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back thatversion of GPT-4o, so now we don’t have the sycophant version out there.  Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF, where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad.  But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model.  So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model.  LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and …  BUBECK: It’s a very difficult, very difficult balance.  LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models?  GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there.  Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?   Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there. LEE: Yeah. GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake.  LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on.  BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGIthat kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything.  That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects.So it’s … I think it’s an important example to have in mind.  LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two?  BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it.  LEE: So we have about three hours of stuff to talk about, but our time is actually running low. BUBECK: Yes, yes, yes.   LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now?  GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.   The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities.  And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period.  LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers?  GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them.  LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.   I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why.  BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and seeproduced what you wanted. So I absolutely agree with that.   And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini. So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.   LEE: Yeah.  BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.   Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not.  Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision.  LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist …  BUBECK: Yeah. LEE: … or an endocrinologist might not. BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know. LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today?  BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later.  And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …   LEE: Will AI prescribe your medicines? Write your prescriptions?  BUBECK: I think yes. I think yes.  LEE: OK. Bill?  GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate? And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelectedjust on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries.  You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that.  LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.   I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you.   GATES: Yeah. Thanks, you guys.  BUBECK: Thank you, Peter. Thanks, Bill.  LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.    With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.   And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.   One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.   HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings.  You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.   If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real.   I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.   Until next time.   #how #reshaping #future #healthcare #medical
    WWW.MICROSOFT.COM
    How AI is reshaping the future of healthcare and medical research
    Transcript [MUSIC]      [BOOK PASSAGE]   PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”   [END OF BOOK PASSAGE]     [THEME MUSIC]     This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.    Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?     In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.   [THEME MUSIC FADES] The book passage I read at the top is from “Chapter 10: The Big Black Bag.”  In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.    In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open.  As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.   Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home.  Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.    [TRANSITION MUSIC]   Here’s my conversation with Bill Gates and Sébastien Bubeck.  LEE: Bill, welcome.  BILL GATES: Thank you.  LEE: Seb …  SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here.  LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening?  And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?   GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines.  And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.   And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weakness [LAUGHTER] that, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning.  LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that?  GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, …  LEE: Right.   GATES: … that is a bit weird.   LEE: Yeah.  GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training.  LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent. [LAUGHS]  BUBECK: Yes.   LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSR [Microsoft Research] to join and start investigating this thing seriously. And the first person I pulled in was you.  BUBECK: Yeah.  LEE: And so what were your first encounters? Because I actually don’t remember what happened then.  BUBECK: Oh, I remember it very well. [LAUGHS] My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3.  I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1.  So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair. [LAUGHTER] And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts.  So this was really, to me, the first moment where I saw some understanding in those models.   LEE: So this was, just to get the timing right, that was before I pulled you into the tent.  BUBECK: That was before. That was like a year before.  LEE: Right.   BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4.  So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.   So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x.  And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?   LEE: Yeah. BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.   LEE: [LAUGHS] One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine.  And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.   And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.   I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book.  But the main purpose of this conversation isn’t to reminisce about [LAUGHS] or indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements.  But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today?  You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.   Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork?  GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.   It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision.  But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view.  LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients. [LAUGHTER] Does that make sense to you?  BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong?  Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.   Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them.  And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT (opens in new tab). And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.   Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way.  It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine.  LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all?  GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that.  The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa, So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.   LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking?  GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.   The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.   LEE: Right.   GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.   LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication.  BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE [United States Medical Licensing Examination], for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI.  It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for.  LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes.  I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?   That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential. [LAUGHTER] What’s up with that?  BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back that [LAUGHS] version of GPT-4o, so now we don’t have the sycophant version out there.  Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF [reinforcement learning from human feedback], where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad.  But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model.  So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model.  LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and …  BUBECK: It’s a very difficult, very difficult balance.  LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models?  GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there.  Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?   Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there. LEE: Yeah. GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake.  LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on.  BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGI [artificial general intelligence] that kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything.  That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects. [LAUGHTER] So it’s … I think it’s an important example to have in mind.  LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two?  BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it.  LEE: So we have about three hours of stuff to talk about, but our time is actually running low. BUBECK: Yes, yes, yes.   LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now?  GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.   The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities.  And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period.  LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers?  GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them.  LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.   I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why.  BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and see [if you have] produced what you wanted. So I absolutely agree with that.   And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini (opens in new tab). So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.   LEE: Yeah.  BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.   Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not.  Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision.  LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist …  BUBECK: Yeah. LEE: … or an endocrinologist might not. BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know. LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today?  BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later.  And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …   LEE: Will AI prescribe your medicines? Write your prescriptions?  BUBECK: I think yes. I think yes.  LEE: OK. Bill?  GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate? And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelected [LAUGHTER] just on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries.  You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that.  LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.   I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you.  [TRANSITION MUSIC]  GATES: Yeah. Thanks, you guys.  BUBECK: Thank you, Peter. Thanks, Bill.  LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.    With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.   And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.   One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.   HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings.  You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.   If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real.  [THEME MUSIC]  I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.   Until next time.   [MUSIC FADES]
    0 Comments 0 Shares
  • Biofuels policy has been a failure for the climate, new report claims

    Fewer food crops

    Biofuels policy has been a failure for the climate, new report claims

    Report: An expansion of biofuels policy under Trump would lead to more greenhouse gas emissions.

    Georgina Gustin, Inside Climate News



    Jun 14, 2025 7:10 am

    |

    24

    An ethanol production plant on March 20, 2024 near Ravenna, Nebraska.

    Credit:

    David Madison/Getty Images

    An ethanol production plant on March 20, 2024 near Ravenna, Nebraska.

    Credit:

    David Madison/Getty Images

    Story text

    Size

    Small
    Standard
    Large

    Width
    *

    Standard
    Wide

    Links

    Standard
    Orange

    * Subscribers only
      Learn more

    This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy, and the environment. Sign up for their newsletter here.
    The American Midwest is home to some of the richest, most productive farmland in the world, enabling its transformation into a vast corn- and soy-producing machine—a conversion spurred largely by decades-long policies that support the production of biofuels.
    But a new report takes a big swing at the ethanol orthodoxy of American agriculture, criticizing the industry for causing economic and social imbalances across rural communities and saying that the expansion of biofuels will increase greenhouse gas emissions, despite their purported climate benefits.
    The report, from the World Resources Institute, which has been critical of US biofuel policy in the past, draws from 100 academic studies on biofuel impacts. It concludes that ethanol policy has been largely a failure and ought to be reconsidered, especially as the world needs more land to produce food to meet growing demand.
    “Multiple studies show that US biofuel policies have reshaped crop production, displacing food crops and driving up emissions from land conversion, tillage, and fertilizer use,” said the report’s lead author, Haley Leslie-Bole. “Corn-based ethanol, in particular, has contributed to nutrient runoff, degraded water quality and harmed wildlife habitat. As climate pressures grow, increasing irrigation and refining for first-gen biofuels could deepen water scarcity in already drought-prone parts of the Midwest.”
    The conversion of Midwestern agricultural land has been sweeping. Between 2004 and 2024, ethanol production increased by nearly 500 percent. Corn and soybeans are now grown on 92 and 86 million acres of land respectively—and roughly a third of those crops go to produce ethanol. That means about 30 million acres of land that could be used to grow food crops are instead being used to produce ethanol, despite ethanol only accounting for 6 percent of the country’s transportation fuel.

    The biofuels industry—which includes refiners, corn and soy growers and the influential agriculture lobby writ large—has long insisted that corn- and soy-based biofuels provide an energy-efficient alternative to fossil-based fuels. Congress and the US Department of Agriculture have agreed.
    The country’s primary biofuels policy, the Renewable Fuel Standard, requires that biofuels provide a greenhouse gas reduction over fossil fuels: The law says that ethanol from new plants must deliver a 20 percent reduction in greenhouse gas emissions compared to gasoline.
    In addition to greenhouse gas reductions, the industry and its allies in Congress have also continued to say that ethanol is a primary mainstay of the rural economy, benefiting communities across the Midwest.
    But a growing body of research—much of which the industry has tried to debunk and deride—suggests that ethanol actually may not provide the benefits that policies require. It may, in fact, produce more greenhouse gases than the fossil fuels it was intended to replace. Recent research says that biofuel refiners also emit significant amounts of carcinogenic and dangerous substances, including hexane and formaldehyde, in greater amounts than petroleum refineries.
    The new report points to research saying that increased production of biofuels from corn and soy could actually raise greenhouse gas emissions, largely from carbon emissions linked to clearing land in other countries to compensate for the use of land in the Midwest.
    On top of that, corn is an especially fertilizer-hungry crop requiring large amounts of nitrogen-based fertilizer, which releases huge amounts of nitrous oxide when it interacts with the soil. American farming is, by far, the largest source of domestic nitrous oxide emissions already—about 50 percent. If biofuel policies lead to expanded production, emissions of this enormously powerful greenhouse gas will likely increase, too.

    The new report concludes that not only will the expansion of ethanol increase greenhouse gas emissions, but it has also failed to provide the social and financial benefits to Midwestern communities that lawmakers and the industry say it has.“The benefits from biofuels remain concentrated in the hands of a few,” Leslie-Bole said. “As subsidies flow, so may the trend of farmland consolidation, increasing inaccessibility of farmland in the Midwest, and locking out emerging or low-resource farmers. This means the benefits of biofuels production are flowing to fewer people, while more are left bearing the costs.”
    New policies being considered in state legislatures and Congress, including additional tax credits and support for biofuel-based aviation fuel, could expand production, potentially causing more land conversion and greenhouse gas emissions, widening the gap between the rural communities and rich agribusinesses at a time when food demand is climbing and, critics say, land should be used to grow food instead.
    President Donald Trump’s tax cut bill, passed by the House and currently being negotiated in the Senate, would not only extend tax credits for biofuels producers, it specifically excludes calculations of emissions from land conversion when determining what qualifies as a low-emission fuel.
    The primary biofuels industry trade groups, including Growth Energy and the Renewable Fuels Association, did not respond to Inside Climate News requests for comment or interviews.
    An employee with the Clean Fuels Alliance America, which represents biodiesel and sustainable aviation fuel producers, not ethanol, said the report vastly overstates the carbon emissions from crop-based fuels by comparing the farmed land to natural landscapes, which no longer exist.
    They also noted that the impact of soy-based fuels in 2024 was more than billion, providing over 100,000 jobs.
    “Ten percent of the value of every bushel of soybeans is linked to biomass-based fuel,” they said.

    Georgina Gustin, Inside Climate News

    24 Comments
    #biofuels #policy #has #been #failure
    Biofuels policy has been a failure for the climate, new report claims
    Fewer food crops Biofuels policy has been a failure for the climate, new report claims Report: An expansion of biofuels policy under Trump would lead to more greenhouse gas emissions. Georgina Gustin, Inside Climate News – Jun 14, 2025 7:10 am | 24 An ethanol production plant on March 20, 2024 near Ravenna, Nebraska. Credit: David Madison/Getty Images An ethanol production plant on March 20, 2024 near Ravenna, Nebraska. Credit: David Madison/Getty Images Story text Size Small Standard Large Width * Standard Wide Links Standard Orange * Subscribers only   Learn more This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy, and the environment. Sign up for their newsletter here. The American Midwest is home to some of the richest, most productive farmland in the world, enabling its transformation into a vast corn- and soy-producing machine—a conversion spurred largely by decades-long policies that support the production of biofuels. But a new report takes a big swing at the ethanol orthodoxy of American agriculture, criticizing the industry for causing economic and social imbalances across rural communities and saying that the expansion of biofuels will increase greenhouse gas emissions, despite their purported climate benefits. The report, from the World Resources Institute, which has been critical of US biofuel policy in the past, draws from 100 academic studies on biofuel impacts. It concludes that ethanol policy has been largely a failure and ought to be reconsidered, especially as the world needs more land to produce food to meet growing demand. “Multiple studies show that US biofuel policies have reshaped crop production, displacing food crops and driving up emissions from land conversion, tillage, and fertilizer use,” said the report’s lead author, Haley Leslie-Bole. “Corn-based ethanol, in particular, has contributed to nutrient runoff, degraded water quality and harmed wildlife habitat. As climate pressures grow, increasing irrigation and refining for first-gen biofuels could deepen water scarcity in already drought-prone parts of the Midwest.” The conversion of Midwestern agricultural land has been sweeping. Between 2004 and 2024, ethanol production increased by nearly 500 percent. Corn and soybeans are now grown on 92 and 86 million acres of land respectively—and roughly a third of those crops go to produce ethanol. That means about 30 million acres of land that could be used to grow food crops are instead being used to produce ethanol, despite ethanol only accounting for 6 percent of the country’s transportation fuel. The biofuels industry—which includes refiners, corn and soy growers and the influential agriculture lobby writ large—has long insisted that corn- and soy-based biofuels provide an energy-efficient alternative to fossil-based fuels. Congress and the US Department of Agriculture have agreed. The country’s primary biofuels policy, the Renewable Fuel Standard, requires that biofuels provide a greenhouse gas reduction over fossil fuels: The law says that ethanol from new plants must deliver a 20 percent reduction in greenhouse gas emissions compared to gasoline. In addition to greenhouse gas reductions, the industry and its allies in Congress have also continued to say that ethanol is a primary mainstay of the rural economy, benefiting communities across the Midwest. But a growing body of research—much of which the industry has tried to debunk and deride—suggests that ethanol actually may not provide the benefits that policies require. It may, in fact, produce more greenhouse gases than the fossil fuels it was intended to replace. Recent research says that biofuel refiners also emit significant amounts of carcinogenic and dangerous substances, including hexane and formaldehyde, in greater amounts than petroleum refineries. The new report points to research saying that increased production of biofuels from corn and soy could actually raise greenhouse gas emissions, largely from carbon emissions linked to clearing land in other countries to compensate for the use of land in the Midwest. On top of that, corn is an especially fertilizer-hungry crop requiring large amounts of nitrogen-based fertilizer, which releases huge amounts of nitrous oxide when it interacts with the soil. American farming is, by far, the largest source of domestic nitrous oxide emissions already—about 50 percent. If biofuel policies lead to expanded production, emissions of this enormously powerful greenhouse gas will likely increase, too. The new report concludes that not only will the expansion of ethanol increase greenhouse gas emissions, but it has also failed to provide the social and financial benefits to Midwestern communities that lawmakers and the industry say it has.“The benefits from biofuels remain concentrated in the hands of a few,” Leslie-Bole said. “As subsidies flow, so may the trend of farmland consolidation, increasing inaccessibility of farmland in the Midwest, and locking out emerging or low-resource farmers. This means the benefits of biofuels production are flowing to fewer people, while more are left bearing the costs.” New policies being considered in state legislatures and Congress, including additional tax credits and support for biofuel-based aviation fuel, could expand production, potentially causing more land conversion and greenhouse gas emissions, widening the gap between the rural communities and rich agribusinesses at a time when food demand is climbing and, critics say, land should be used to grow food instead. President Donald Trump’s tax cut bill, passed by the House and currently being negotiated in the Senate, would not only extend tax credits for biofuels producers, it specifically excludes calculations of emissions from land conversion when determining what qualifies as a low-emission fuel. The primary biofuels industry trade groups, including Growth Energy and the Renewable Fuels Association, did not respond to Inside Climate News requests for comment or interviews. An employee with the Clean Fuels Alliance America, which represents biodiesel and sustainable aviation fuel producers, not ethanol, said the report vastly overstates the carbon emissions from crop-based fuels by comparing the farmed land to natural landscapes, which no longer exist. They also noted that the impact of soy-based fuels in 2024 was more than billion, providing over 100,000 jobs. “Ten percent of the value of every bushel of soybeans is linked to biomass-based fuel,” they said. Georgina Gustin, Inside Climate News 24 Comments #biofuels #policy #has #been #failure
    ARSTECHNICA.COM
    Biofuels policy has been a failure for the climate, new report claims
    Fewer food crops Biofuels policy has been a failure for the climate, new report claims Report: An expansion of biofuels policy under Trump would lead to more greenhouse gas emissions. Georgina Gustin, Inside Climate News – Jun 14, 2025 7:10 am | 24 An ethanol production plant on March 20, 2024 near Ravenna, Nebraska. Credit: David Madison/Getty Images An ethanol production plant on March 20, 2024 near Ravenna, Nebraska. Credit: David Madison/Getty Images Story text Size Small Standard Large Width * Standard Wide Links Standard Orange * Subscribers only   Learn more This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy, and the environment. Sign up for their newsletter here. The American Midwest is home to some of the richest, most productive farmland in the world, enabling its transformation into a vast corn- and soy-producing machine—a conversion spurred largely by decades-long policies that support the production of biofuels. But a new report takes a big swing at the ethanol orthodoxy of American agriculture, criticizing the industry for causing economic and social imbalances across rural communities and saying that the expansion of biofuels will increase greenhouse gas emissions, despite their purported climate benefits. The report, from the World Resources Institute, which has been critical of US biofuel policy in the past, draws from 100 academic studies on biofuel impacts. It concludes that ethanol policy has been largely a failure and ought to be reconsidered, especially as the world needs more land to produce food to meet growing demand. “Multiple studies show that US biofuel policies have reshaped crop production, displacing food crops and driving up emissions from land conversion, tillage, and fertilizer use,” said the report’s lead author, Haley Leslie-Bole. “Corn-based ethanol, in particular, has contributed to nutrient runoff, degraded water quality and harmed wildlife habitat. As climate pressures grow, increasing irrigation and refining for first-gen biofuels could deepen water scarcity in already drought-prone parts of the Midwest.” The conversion of Midwestern agricultural land has been sweeping. Between 2004 and 2024, ethanol production increased by nearly 500 percent. Corn and soybeans are now grown on 92 and 86 million acres of land respectively—and roughly a third of those crops go to produce ethanol. That means about 30 million acres of land that could be used to grow food crops are instead being used to produce ethanol, despite ethanol only accounting for 6 percent of the country’s transportation fuel. The biofuels industry—which includes refiners, corn and soy growers and the influential agriculture lobby writ large—has long insisted that corn- and soy-based biofuels provide an energy-efficient alternative to fossil-based fuels. Congress and the US Department of Agriculture have agreed. The country’s primary biofuels policy, the Renewable Fuel Standard, requires that biofuels provide a greenhouse gas reduction over fossil fuels: The law says that ethanol from new plants must deliver a 20 percent reduction in greenhouse gas emissions compared to gasoline. In addition to greenhouse gas reductions, the industry and its allies in Congress have also continued to say that ethanol is a primary mainstay of the rural economy, benefiting communities across the Midwest. But a growing body of research—much of which the industry has tried to debunk and deride—suggests that ethanol actually may not provide the benefits that policies require. It may, in fact, produce more greenhouse gases than the fossil fuels it was intended to replace. Recent research says that biofuel refiners also emit significant amounts of carcinogenic and dangerous substances, including hexane and formaldehyde, in greater amounts than petroleum refineries. The new report points to research saying that increased production of biofuels from corn and soy could actually raise greenhouse gas emissions, largely from carbon emissions linked to clearing land in other countries to compensate for the use of land in the Midwest. On top of that, corn is an especially fertilizer-hungry crop requiring large amounts of nitrogen-based fertilizer, which releases huge amounts of nitrous oxide when it interacts with the soil. American farming is, by far, the largest source of domestic nitrous oxide emissions already—about 50 percent. If biofuel policies lead to expanded production, emissions of this enormously powerful greenhouse gas will likely increase, too. The new report concludes that not only will the expansion of ethanol increase greenhouse gas emissions, but it has also failed to provide the social and financial benefits to Midwestern communities that lawmakers and the industry say it has. (The report defines the Midwest as Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin.) “The benefits from biofuels remain concentrated in the hands of a few,” Leslie-Bole said. “As subsidies flow, so may the trend of farmland consolidation, increasing inaccessibility of farmland in the Midwest, and locking out emerging or low-resource farmers. This means the benefits of biofuels production are flowing to fewer people, while more are left bearing the costs.” New policies being considered in state legislatures and Congress, including additional tax credits and support for biofuel-based aviation fuel, could expand production, potentially causing more land conversion and greenhouse gas emissions, widening the gap between the rural communities and rich agribusinesses at a time when food demand is climbing and, critics say, land should be used to grow food instead. President Donald Trump’s tax cut bill, passed by the House and currently being negotiated in the Senate, would not only extend tax credits for biofuels producers, it specifically excludes calculations of emissions from land conversion when determining what qualifies as a low-emission fuel. The primary biofuels industry trade groups, including Growth Energy and the Renewable Fuels Association, did not respond to Inside Climate News requests for comment or interviews. An employee with the Clean Fuels Alliance America, which represents biodiesel and sustainable aviation fuel producers, not ethanol, said the report vastly overstates the carbon emissions from crop-based fuels by comparing the farmed land to natural landscapes, which no longer exist. They also noted that the impact of soy-based fuels in 2024 was more than $42 billion, providing over 100,000 jobs. “Ten percent of the value of every bushel of soybeans is linked to biomass-based fuel,” they said. Georgina Gustin, Inside Climate News 24 Comments
    0 Comments 0 Shares
  • For June’s Patch Tuesday, 68 fixes — and two zero-day flaws

    Microsoft offered up a fairly light Patch Tuesday release this month, with 68 patches to Microsoft Windows and Microsoft Office. There were no updates for Exchange or SQL server and just two minor patches for Microsoft Edge. That said, two zero-day vulnerabilitieshave led to a “Patch Now” recommendation for both Windows and Office.To help navigate these changes, the team from Readiness has provided auseful  infographic detailing the risks involved when deploying the latest updates.Known issues

    Microsoft released a limited number of known issues for June, with a product-focused issue and a very minor display concern:

    Microsoft Excel: This a rare product level entry in the “known issues” category — an advisory that “square brackets” orare not supported in Excel filenames. An error is generated, advising the user to remove the offending characters.

    Windows 10: There are reports of blurry or unclear CJKtext when displayed at 96 DPIin Chromium-based browsers such as Microsoft Edge and Google Chrome. This is a limited resource issue, as the font resolution in Windows 10 does not fully match the high-level resolution of the Noto font. Microsoft recommends changing the display scaling to 125% or 150% to improve clarity.

    Major revisions and mitigations

    Microsoft might have won an award for the shortest time between releasing an update and a revision with:

    CVE-2025-33073: Windows SMB Client Elevation of Privilege. Microsoft worked to address a vulnerability where improper access control in Windows SMB allows an attacker to elevate privileges over a network. This patch was revised on the same day as its initial release.

    Windows lifecycle and enforcement updates

    Microsoft did not release any enforcement updates for June.

    Each month, the Readiness team analyzes Microsoft’s latest updates and provides technically sound, actionable testing plans. While June’s release includes no stated functional changes, many foundational components across authentication, storage, networking, and user experience have been updated.

    For this testing guide, we grouped Microsoft’s updates by Windows feature and then accompanied the section with prescriptive test actions and rationale to help prioritize enterprise efforts.

    Core OS and UI compatibility

    Microsoft updated several core kernel drivers affecting Windows as a whole. This is a low-level system change and carries a high risk of compatibility and system issues. In addition, core Microsoft print libraries have been included in the update, requiring additional print testing in addition to the following recommendations:

    Run print operations from 32-bit applications on 64-bit Windows environments.

    Use different print drivers and configurations.

    Observe printing from older productivity apps and virtual environments.

    Remote desktop and network connectivity

    This update could impact the reliability of remote access while broken DHCP-to-DNS integration can block device onboarding, and NAT misbehavior disrupts VPNs or site-to-site routing configurations. We recommend the following tests be performed:

    Create and reconnect Remote Desktopsessions under varying network conditions.

    Confirm that DHCP-assigned IP addresses are correctly registered with DNS in AD-integrated environments.

    Test modifying NAT and routing settings in RRAS configurations and ensure that changes persist across reboots.

    Filesystem, SMB and storage

    Updates to the core Windows storage libraries affect nearly every command related to Microsoft Storage Spaces. A minor misalignment here can result in degraded clusters, orphaned volumes, or data loss in a failover scenario. These are high-priority components in modern data center and hybrid cloud infrastructure, with the following storage-related testing recommendations:

    Access file shares using server names, FQDNs, and IP addresses.

    Enable and validate encrypted and compressed file-share operations between clients and servers.

    Run tests that create, open, and read from system log files using various file and storage configurations.

    Validate core cluster storage management tasks, including creating and managing storage pools, tiers, and volumes.

    Test disk addition/removal, failover behaviors, and resiliency settings.

    Run system-level storage diagnostics across active and passive nodes in the cluster.

    Windows installer and recovery

    Microsoft delivered another update to the Windows Installerapplication infrastructure. Broken or regressed Installer package MSI handling disrupts app deployment pipelines while putting core business applications at risk. We suggest the following tests for the latest changes to MSI Installer, Windows Recovery and Microsoft’s Virtualization Based Security:

    Perform installation, repair, and uninstallation of MSI Installer packages using standard enterprise deployment tools.

    Validate restore point behavior for points older than 60 days under varying virtualization-based securitysettings.

    Check both client and server behaviors for allowed or blocked restores.

    We highly recommend prioritizing printer testing this month, then remote desktop deployment testing to ensure your core business applications install and uninstall as expected.

    Each month, we break down the update cycle into product familieswith the following basic groupings: 

    Browsers;

    Microsoft Windows;

    Microsoft Office;

    Microsoft Exchange and SQL Server; 

    Microsoft Developer Tools;

    And Adobe.

    Browsers

    Microsoft delivered a very minor series of updates to Microsoft Edge. The  browser receives two Chrome patcheswhere both updates are rated important. These low-profile changes can be added to your standard release calendar.

    Microsoft Windows

    Microsoft released five critical patches and40 patches rated important. This month the five critical Windows patches cover the following desktop and server vulnerabilities:

    Missing release of memory after effective lifetime in Windows Cryptographic Servicesallows an unauthorized attacker to execute code over a network.

    Use after free in Windows Remote Desktop Services allows an unauthorized attacker to execute code over a network.

    Use after free in Windows KDC Proxy Serviceallows an unauthorized attacker to execute code over a network.

    Use of uninitialized resources in Windows Netlogon allows an unauthorized attacker to elevate privileges over a network.

    Unfortunately, CVE-2025-33073 has been reported as publicly disclosed while CVE-2025-33053 has been reported as exploited. Given these two zero-days, the Readiness recommends a “Patch Now” release schedule for your Windows updates.

    Microsoft Office

    Microsoft released five critical updates and a further 13 rated important for Office. The critical patches deal with memory related and “use after free” memory allocation issues affecting the entire platform. Due to the number and severity of these issues, we recommend a “Patch Now” schedule for Office for this Patch Tuesday release.

    Microsoft Exchange and SQL Server

    There are no updates for either Microsoft Exchange or SQL Server this month. 

    Developer tools

    There were only three low-level updatesreleased, affecting .NET and Visual Studio. Add these updates to your standard developer release schedule.

    AdobeAdobe has releaseda single update to Adobe Acrobat. There were two other non-Microsoft updated releases affecting the Chromium platform, which were covered in the Browser section above.
    #junes #patch #tuesday #fixes #two
    For June’s Patch Tuesday, 68 fixes — and two zero-day flaws
    Microsoft offered up a fairly light Patch Tuesday release this month, with 68 patches to Microsoft Windows and Microsoft Office. There were no updates for Exchange or SQL server and just two minor patches for Microsoft Edge. That said, two zero-day vulnerabilitieshave led to a “Patch Now” recommendation for both Windows and Office.To help navigate these changes, the team from Readiness has provided auseful  infographic detailing the risks involved when deploying the latest updates.Known issues Microsoft released a limited number of known issues for June, with a product-focused issue and a very minor display concern: Microsoft Excel: This a rare product level entry in the “known issues” category — an advisory that “square brackets” orare not supported in Excel filenames. An error is generated, advising the user to remove the offending characters. Windows 10: There are reports of blurry or unclear CJKtext when displayed at 96 DPIin Chromium-based browsers such as Microsoft Edge and Google Chrome. This is a limited resource issue, as the font resolution in Windows 10 does not fully match the high-level resolution of the Noto font. Microsoft recommends changing the display scaling to 125% or 150% to improve clarity. Major revisions and mitigations Microsoft might have won an award for the shortest time between releasing an update and a revision with: CVE-2025-33073: Windows SMB Client Elevation of Privilege. Microsoft worked to address a vulnerability where improper access control in Windows SMB allows an attacker to elevate privileges over a network. This patch was revised on the same day as its initial release. Windows lifecycle and enforcement updates Microsoft did not release any enforcement updates for June. Each month, the Readiness team analyzes Microsoft’s latest updates and provides technically sound, actionable testing plans. While June’s release includes no stated functional changes, many foundational components across authentication, storage, networking, and user experience have been updated. For this testing guide, we grouped Microsoft’s updates by Windows feature and then accompanied the section with prescriptive test actions and rationale to help prioritize enterprise efforts. Core OS and UI compatibility Microsoft updated several core kernel drivers affecting Windows as a whole. This is a low-level system change and carries a high risk of compatibility and system issues. In addition, core Microsoft print libraries have been included in the update, requiring additional print testing in addition to the following recommendations: Run print operations from 32-bit applications on 64-bit Windows environments. Use different print drivers and configurations. Observe printing from older productivity apps and virtual environments. Remote desktop and network connectivity This update could impact the reliability of remote access while broken DHCP-to-DNS integration can block device onboarding, and NAT misbehavior disrupts VPNs or site-to-site routing configurations. We recommend the following tests be performed: Create and reconnect Remote Desktopsessions under varying network conditions. Confirm that DHCP-assigned IP addresses are correctly registered with DNS in AD-integrated environments. Test modifying NAT and routing settings in RRAS configurations and ensure that changes persist across reboots. Filesystem, SMB and storage Updates to the core Windows storage libraries affect nearly every command related to Microsoft Storage Spaces. A minor misalignment here can result in degraded clusters, orphaned volumes, or data loss in a failover scenario. These are high-priority components in modern data center and hybrid cloud infrastructure, with the following storage-related testing recommendations: Access file shares using server names, FQDNs, and IP addresses. Enable and validate encrypted and compressed file-share operations between clients and servers. Run tests that create, open, and read from system log files using various file and storage configurations. Validate core cluster storage management tasks, including creating and managing storage pools, tiers, and volumes. Test disk addition/removal, failover behaviors, and resiliency settings. Run system-level storage diagnostics across active and passive nodes in the cluster. Windows installer and recovery Microsoft delivered another update to the Windows Installerapplication infrastructure. Broken or regressed Installer package MSI handling disrupts app deployment pipelines while putting core business applications at risk. We suggest the following tests for the latest changes to MSI Installer, Windows Recovery and Microsoft’s Virtualization Based Security: Perform installation, repair, and uninstallation of MSI Installer packages using standard enterprise deployment tools. Validate restore point behavior for points older than 60 days under varying virtualization-based securitysettings. Check both client and server behaviors for allowed or blocked restores. We highly recommend prioritizing printer testing this month, then remote desktop deployment testing to ensure your core business applications install and uninstall as expected. Each month, we break down the update cycle into product familieswith the following basic groupings:  Browsers; Microsoft Windows; Microsoft Office; Microsoft Exchange and SQL Server;  Microsoft Developer Tools; And Adobe. Browsers Microsoft delivered a very minor series of updates to Microsoft Edge. The  browser receives two Chrome patcheswhere both updates are rated important. These low-profile changes can be added to your standard release calendar. Microsoft Windows Microsoft released five critical patches and40 patches rated important. This month the five critical Windows patches cover the following desktop and server vulnerabilities: Missing release of memory after effective lifetime in Windows Cryptographic Servicesallows an unauthorized attacker to execute code over a network. Use after free in Windows Remote Desktop Services allows an unauthorized attacker to execute code over a network. Use after free in Windows KDC Proxy Serviceallows an unauthorized attacker to execute code over a network. Use of uninitialized resources in Windows Netlogon allows an unauthorized attacker to elevate privileges over a network. Unfortunately, CVE-2025-33073 has been reported as publicly disclosed while CVE-2025-33053 has been reported as exploited. Given these two zero-days, the Readiness recommends a “Patch Now” release schedule for your Windows updates. Microsoft Office Microsoft released five critical updates and a further 13 rated important for Office. The critical patches deal with memory related and “use after free” memory allocation issues affecting the entire platform. Due to the number and severity of these issues, we recommend a “Patch Now” schedule for Office for this Patch Tuesday release. Microsoft Exchange and SQL Server There are no updates for either Microsoft Exchange or SQL Server this month.  Developer tools There were only three low-level updatesreleased, affecting .NET and Visual Studio. Add these updates to your standard developer release schedule. AdobeAdobe has releaseda single update to Adobe Acrobat. There were two other non-Microsoft updated releases affecting the Chromium platform, which were covered in the Browser section above. #junes #patch #tuesday #fixes #two
    WWW.COMPUTERWORLD.COM
    For June’s Patch Tuesday, 68 fixes — and two zero-day flaws
    Microsoft offered up a fairly light Patch Tuesday release this month, with 68 patches to Microsoft Windows and Microsoft Office. There were no updates for Exchange or SQL server and just two minor patches for Microsoft Edge. That said, two zero-day vulnerabilities (CVE-2025-33073 and CVE-2025-33053) have led to a “Patch Now” recommendation for both Windows and Office. (Developers can follow their usual release cadence with updates to Microsoft .NET and Visual Studio.) To help navigate these changes, the team from Readiness has provided auseful  infographic detailing the risks involved when deploying the latest updates. (More information about recent Patch Tuesday releases is available here.) Known issues Microsoft released a limited number of known issues for June, with a product-focused issue and a very minor display concern: Microsoft Excel: This a rare product level entry in the “known issues” category — an advisory that “square brackets” or [] are not supported in Excel filenames. An error is generated, advising the user to remove the offending characters. Windows 10: There are reports of blurry or unclear CJK (Chinese, Japanese, Korean) text when displayed at 96 DPI (100% scaling) in Chromium-based browsers such as Microsoft Edge and Google Chrome. This is a limited resource issue, as the font resolution in Windows 10 does not fully match the high-level resolution of the Noto font. Microsoft recommends changing the display scaling to 125% or 150% to improve clarity. Major revisions and mitigations Microsoft might have won an award for the shortest time between releasing an update and a revision with: CVE-2025-33073: Windows SMB Client Elevation of Privilege. Microsoft worked to address a vulnerability where improper access control in Windows SMB allows an attacker to elevate privileges over a network. This patch was revised on the same day as its initial release (and has been revised again for documentation purposes). Windows lifecycle and enforcement updates Microsoft did not release any enforcement updates for June. Each month, the Readiness team analyzes Microsoft’s latest updates and provides technically sound, actionable testing plans. While June’s release includes no stated functional changes, many foundational components across authentication, storage, networking, and user experience have been updated. For this testing guide, we grouped Microsoft’s updates by Windows feature and then accompanied the section with prescriptive test actions and rationale to help prioritize enterprise efforts. Core OS and UI compatibility Microsoft updated several core kernel drivers affecting Windows as a whole. This is a low-level system change and carries a high risk of compatibility and system issues. In addition, core Microsoft print libraries have been included in the update, requiring additional print testing in addition to the following recommendations: Run print operations from 32-bit applications on 64-bit Windows environments. Use different print drivers and configurations (e.g., local, networked). Observe printing from older productivity apps and virtual environments. Remote desktop and network connectivity This update could impact the reliability of remote access while broken DHCP-to-DNS integration can block device onboarding, and NAT misbehavior disrupts VPNs or site-to-site routing configurations. We recommend the following tests be performed: Create and reconnect Remote Desktop (RDP) sessions under varying network conditions. Confirm that DHCP-assigned IP addresses are correctly registered with DNS in AD-integrated environments. Test modifying NAT and routing settings in RRAS configurations and ensure that changes persist across reboots. Filesystem, SMB and storage Updates to the core Windows storage libraries affect nearly every command related to Microsoft Storage Spaces. A minor misalignment here can result in degraded clusters, orphaned volumes, or data loss in a failover scenario. These are high-priority components in modern data center and hybrid cloud infrastructure, with the following storage-related testing recommendations: Access file shares using server names, FQDNs, and IP addresses. Enable and validate encrypted and compressed file-share operations between clients and servers. Run tests that create, open, and read from system log files using various file and storage configurations. Validate core cluster storage management tasks, including creating and managing storage pools, tiers, and volumes. Test disk addition/removal, failover behaviors, and resiliency settings. Run system-level storage diagnostics across active and passive nodes in the cluster. Windows installer and recovery Microsoft delivered another update to the Windows Installer (MSI) application infrastructure. Broken or regressed Installer package MSI handling disrupts app deployment pipelines while putting core business applications at risk. We suggest the following tests for the latest changes to MSI Installer, Windows Recovery and Microsoft’s Virtualization Based Security (VBS): Perform installation, repair, and uninstallation of MSI Installer packages using standard enterprise deployment tools (e.g. Intune). Validate restore point behavior for points older than 60 days under varying virtualization-based security (VBS) settings. Check both client and server behaviors for allowed or blocked restores. We highly recommend prioritizing printer testing this month, then remote desktop deployment testing to ensure your core business applications install and uninstall as expected. Each month, we break down the update cycle into product families (as defined by Microsoft) with the following basic groupings:  Browsers (Microsoft IE and Edge); Microsoft Windows (both desktop and server); Microsoft Office; Microsoft Exchange and SQL Server;  Microsoft Developer Tools (Visual Studio and .NET); And Adobe (if you get this far). Browsers Microsoft delivered a very minor series of updates to Microsoft Edge. The  browser receives two Chrome patches (CVE-2025-5068 and CVE-2025-5419) where both updates are rated important. These low-profile changes can be added to your standard release calendar. Microsoft Windows Microsoft released five critical patches and (a smaller than usual) 40 patches rated important. This month the five critical Windows patches cover the following desktop and server vulnerabilities: Missing release of memory after effective lifetime in Windows Cryptographic Services (WCS) allows an unauthorized attacker to execute code over a network. Use after free in Windows Remote Desktop Services allows an unauthorized attacker to execute code over a network. Use after free in Windows KDC Proxy Service (KPSSVC) allows an unauthorized attacker to execute code over a network. Use of uninitialized resources in Windows Netlogon allows an unauthorized attacker to elevate privileges over a network. Unfortunately, CVE-2025-33073 has been reported as publicly disclosed while CVE-2025-33053 has been reported as exploited. Given these two zero-days, the Readiness recommends a “Patch Now” release schedule for your Windows updates. Microsoft Office Microsoft released five critical updates and a further 13 rated important for Office. The critical patches deal with memory related and “use after free” memory allocation issues affecting the entire platform. Due to the number and severity of these issues, we recommend a “Patch Now” schedule for Office for this Patch Tuesday release. Microsoft Exchange and SQL Server There are no updates for either Microsoft Exchange or SQL Server this month.  Developer tools There were only three low-level updates (product focused and rated important) released, affecting .NET and Visual Studio. Add these updates to your standard developer release schedule. Adobe (and 3rd party updates) Adobe has released (but Microsoft has not co-published) a single update to Adobe Acrobat (APSB25-57). There were two other non-Microsoft updated releases affecting the Chromium platform, which were covered in the Browser section above.
    0 Comments 0 Shares
  • ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation

    Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively.
    Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to more extensive datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge.

    Efforts to mitigate token inflation have led to innovations like next-scale prediction seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok’s gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows.
    Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner.

    The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity.
    The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models.

    By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method’s coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research.

    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter.
    NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement FinetuningNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces LLaDA-V: A Purely Diffusion-Based Multimodal Large Language Model for Visual Instruction Tuning and Multimodal ReasoningNikhilhttps://www.marktechpost.com/author/nikhil0980/NVIDIA AI Introduces Fast-dLLM: A Training-Free Framework That Brings KV Caching and Parallel Decoding to Diffusion LLMsNikhilhttps://www.marktechpost.com/author/nikhil0980/Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation
    #bytedance #researchers #introduce #detailflow #coarsetofine
    ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation
    Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively. Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to more extensive datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge. Efforts to mitigate token inflation have led to innovations like next-scale prediction seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok’s gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows. Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner. The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity. The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models. By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method’s coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research. Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter. NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement FinetuningNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces LLaDA-V: A Purely Diffusion-Based Multimodal Large Language Model for Visual Instruction Tuning and Multimodal ReasoningNikhilhttps://www.marktechpost.com/author/nikhil0980/NVIDIA AI Introduces Fast-dLLM: A Training-Free Framework That Brings KV Caching and Parallel Decoding to Diffusion LLMsNikhilhttps://www.marktechpost.com/author/nikhil0980/Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation #bytedance #researchers #introduce #detailflow #coarsetofine
    WWW.MARKTECHPOST.COM
    ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation
    Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively. Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to more extensive datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge. Efforts to mitigate token inflation have led to innovations like next-scale prediction seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok’s gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows. Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner. The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity. The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models. By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method’s coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research. Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter. NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement FinetuningNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces LLaDA-V: A Purely Diffusion-Based Multimodal Large Language Model for Visual Instruction Tuning and Multimodal ReasoningNikhilhttps://www.marktechpost.com/author/nikhil0980/NVIDIA AI Introduces Fast-dLLM: A Training-Free Framework That Brings KV Caching and Parallel Decoding to Diffusion LLMsNikhilhttps://www.marktechpost.com/author/nikhil0980/Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation
    Like
    Love
    Wow
    Sad
    Angry
    821
    0 Comments 0 Shares
  • How Old Is Too Old When Buying an Apple Watch?

    We may earn a commission from links on this page.In 2023, I decided to update my Apple Watch after consistently failing to wear my Series 4 for a number of years. I sold that one on Poshmark and began looking at newer models to find one with enough features to convince me to actually wear it. I opted to get a Series 8, although the Series 9 had just been released, as I was buying two: one for my mom and one for myself. As it turns out, that was a great decision.If you're searching for a new wearable or considering upgrading yours, you might also be wondering which of the older Apple Watch models is still useful today. My Series 8 is holding up beautifully three years after it was introduced, so I'm a big proponent of using older devices as long as possible. But not all Apple Watches will work as well as the Series 8 does in 2025. Don’t buy a watch Apple doesn’t support anymoreWe have to draw the line somewhere: Seven of Apple's watches are no longer supported, meaning they won't receive any software or security updates anymore. In addition, you run the risk that the watch will no longer be compatible with your iPhone or certain apps. In short, you shouldn't buy a watch that Apple doesn't support. That includes the following:Apple Watch Series 0Apple Watch Series 1Apple Watch Series 2Apple Watch Series 3Apple Watch Series 4Apple Watch Series 5Apple Watch SEWhile the company does currently support the Series 6, it is next in line to join this list. It's not clear when that will happen, but you can be sure it will. We'll see next week—when Apple reveals watchOS 26—whether the watch will be supported another year. If not, it'll be stuck on watchOS 11 for good.Performance and other generational Watch improvementsThere are considerations for older Apple Watch models that extend beyond their ability to simply run the latest operating system. With each generation, improvements are made in some form or another. For instance, the Series 4 introduced the ECG sensor, while the Series 6 introduced the blood oxygen sensor. The Series 7 charges faster than its predecessors, and Apple has included fast charging on most watch models since.In general, each Apple Watch is faster than the last. Apple tends to put its newest S-Chip—the Apple Watch's processor—in its latest watch series. Simply put, a newer S-chip gives you a faster, more productive product. The Series 6 has an S6 chip, Series 7 has S7, and so on until you hit the Ultras.While there are some core features all currently supported watches share—like workout and swim tracking, sleep tracking, Apple Pay, ECG scanning, and the ability to read and respond to messages—newer models also each have some of their own special advancements and upgrades. Here's a brief list:The Series 7 introduced faster charging, a larger display, and more durable screen.The Series 8 brought temperature sensing, crash detection, and a low-power mode for conserving battery.The Series 9 debuted new gesture controls, on-device Siri access, more precise location tracking in Find My, and a display with double the brightness of the Series 8.The first-gen Apple Watch Ultra introduced a more durable titanium casing, custom shortcuts to apps and modes via the Action button, a depth gauge and water temperature sensor, more accurate GPS, a 36-hour battery life, and an emergency siren.The Apple Watch Ultra 2 introduced a display with a maximum brightness of 3,000 nits and on-device media playback. The Series 10 introduced the largest display available on a standard Apple Watch and faster charging. If you see a feature you absolutely need in a particular watch model, you'll have to spring for it. But if you just want something for core Apple Watch tasks, you can start to consider older options. Apple's watch comparison site can be a helpful tool for identifying different features among models. Battery degradation All tech degrades to some extent and the Apple Watch is no different—particularly when it comes to the battery. While there are ways to mitigate the problem, over time, the lithium-ion battery powering your wrist computer won't last as long as it used to. That might be a bigger issue than your watch's ability to download and support a new operating system. Apple's warranty doesn't cover batteries that wear down from normal use, and charges for the repair, which you could instead put towards the purchase of a new watch. There is one exception: Battery service is free if you have AppleCare+ and your watch's battery holds less than 80% of its original capacity. You need to take your watch in to an Apple Store or service provider to have it tested. My watch was pre-owned, and while I have no way of knowing if it has its original battery, my battery life has not declined substantially in the two years I've been using it daily. I primarily use mine to track my workouts, vitals, and sleep, which means it's always running. I charge it while I'm in the shower and occasionally for a few minutes before bed, and that's about it. On an average day of constant notifications, mine lasts me a bit longer than the advertised 18-hour mark. Because I have little interest in the small improvements offered by the Series 9 and Series 10—like extra brightness, larger screen size, performance bumps, and advanced cycle tracking—the battery life is what wouldcompel me to upgrade in the future, but for now, I have not noticed any problems. I asked my mom if she's noticed any battery degradation on hers, since I bought it at the same time and place as mine, and she said no. She uses hers to track walking workouts, talk on the phone, and monitor her sleep and vitals, too.Stick with the Series 7 or newerThoroughly consider which of the features on newer models are actually important to you before making any buying decision and, if you can, stay above a Series 7. The Series 6 is still functional, but, again, it's a matter of time until the company stops acknowledging that one completely. For now, I have been pleasantly surprised by how well my Series 8 has held up for two years. Its touchscreen has never faltered, the external buttons function perfectly, it syncs to all of my apps and devices with no problem, and it does exactly what I need it to do—which is to tell me how many steps I'm taking and how hard I'm exerting myself at the gym. If you're in the market for a smart watch, I see no reason that an older version shouldn't be considered, as long as it still runs the latest operating system. You can save a chunk of change by sourcing an older model from the resale or refurbished markets and put that money away for when Apple drops something super revolutionary in the wearable space. Apple doesn't sell anything below a Series 10 or SE directly anymore, so if you want a 6, 7, 8, or 9, you'll have to check the resale and refurbished markets. You'll definitely save some money that way.

    Apple Watch Series 8Learn More

    Learn More
    #how #old #too #when #buying
    How Old Is Too Old When Buying an Apple Watch?
    We may earn a commission from links on this page.In 2023, I decided to update my Apple Watch after consistently failing to wear my Series 4 for a number of years. I sold that one on Poshmark and began looking at newer models to find one with enough features to convince me to actually wear it. I opted to get a Series 8, although the Series 9 had just been released, as I was buying two: one for my mom and one for myself. As it turns out, that was a great decision.If you're searching for a new wearable or considering upgrading yours, you might also be wondering which of the older Apple Watch models is still useful today. My Series 8 is holding up beautifully three years after it was introduced, so I'm a big proponent of using older devices as long as possible. But not all Apple Watches will work as well as the Series 8 does in 2025. Don’t buy a watch Apple doesn’t support anymoreWe have to draw the line somewhere: Seven of Apple's watches are no longer supported, meaning they won't receive any software or security updates anymore. In addition, you run the risk that the watch will no longer be compatible with your iPhone or certain apps. In short, you shouldn't buy a watch that Apple doesn't support. That includes the following:Apple Watch Series 0Apple Watch Series 1Apple Watch Series 2Apple Watch Series 3Apple Watch Series 4Apple Watch Series 5Apple Watch SEWhile the company does currently support the Series 6, it is next in line to join this list. It's not clear when that will happen, but you can be sure it will. We'll see next week—when Apple reveals watchOS 26—whether the watch will be supported another year. If not, it'll be stuck on watchOS 11 for good.Performance and other generational Watch improvementsThere are considerations for older Apple Watch models that extend beyond their ability to simply run the latest operating system. With each generation, improvements are made in some form or another. For instance, the Series 4 introduced the ECG sensor, while the Series 6 introduced the blood oxygen sensor. The Series 7 charges faster than its predecessors, and Apple has included fast charging on most watch models since.In general, each Apple Watch is faster than the last. Apple tends to put its newest S-Chip—the Apple Watch's processor—in its latest watch series. Simply put, a newer S-chip gives you a faster, more productive product. The Series 6 has an S6 chip, Series 7 has S7, and so on until you hit the Ultras.While there are some core features all currently supported watches share—like workout and swim tracking, sleep tracking, Apple Pay, ECG scanning, and the ability to read and respond to messages—newer models also each have some of their own special advancements and upgrades. Here's a brief list:The Series 7 introduced faster charging, a larger display, and more durable screen.The Series 8 brought temperature sensing, crash detection, and a low-power mode for conserving battery.The Series 9 debuted new gesture controls, on-device Siri access, more precise location tracking in Find My, and a display with double the brightness of the Series 8.The first-gen Apple Watch Ultra introduced a more durable titanium casing, custom shortcuts to apps and modes via the Action button, a depth gauge and water temperature sensor, more accurate GPS, a 36-hour battery life, and an emergency siren.The Apple Watch Ultra 2 introduced a display with a maximum brightness of 3,000 nits and on-device media playback. The Series 10 introduced the largest display available on a standard Apple Watch and faster charging. If you see a feature you absolutely need in a particular watch model, you'll have to spring for it. But if you just want something for core Apple Watch tasks, you can start to consider older options. Apple's watch comparison site can be a helpful tool for identifying different features among models. Battery degradation All tech degrades to some extent and the Apple Watch is no different—particularly when it comes to the battery. While there are ways to mitigate the problem, over time, the lithium-ion battery powering your wrist computer won't last as long as it used to. That might be a bigger issue than your watch's ability to download and support a new operating system. Apple's warranty doesn't cover batteries that wear down from normal use, and charges for the repair, which you could instead put towards the purchase of a new watch. There is one exception: Battery service is free if you have AppleCare+ and your watch's battery holds less than 80% of its original capacity. You need to take your watch in to an Apple Store or service provider to have it tested. My watch was pre-owned, and while I have no way of knowing if it has its original battery, my battery life has not declined substantially in the two years I've been using it daily. I primarily use mine to track my workouts, vitals, and sleep, which means it's always running. I charge it while I'm in the shower and occasionally for a few minutes before bed, and that's about it. On an average day of constant notifications, mine lasts me a bit longer than the advertised 18-hour mark. Because I have little interest in the small improvements offered by the Series 9 and Series 10—like extra brightness, larger screen size, performance bumps, and advanced cycle tracking—the battery life is what wouldcompel me to upgrade in the future, but for now, I have not noticed any problems. I asked my mom if she's noticed any battery degradation on hers, since I bought it at the same time and place as mine, and she said no. She uses hers to track walking workouts, talk on the phone, and monitor her sleep and vitals, too.Stick with the Series 7 or newerThoroughly consider which of the features on newer models are actually important to you before making any buying decision and, if you can, stay above a Series 7. The Series 6 is still functional, but, again, it's a matter of time until the company stops acknowledging that one completely. For now, I have been pleasantly surprised by how well my Series 8 has held up for two years. Its touchscreen has never faltered, the external buttons function perfectly, it syncs to all of my apps and devices with no problem, and it does exactly what I need it to do—which is to tell me how many steps I'm taking and how hard I'm exerting myself at the gym. If you're in the market for a smart watch, I see no reason that an older version shouldn't be considered, as long as it still runs the latest operating system. You can save a chunk of change by sourcing an older model from the resale or refurbished markets and put that money away for when Apple drops something super revolutionary in the wearable space. Apple doesn't sell anything below a Series 10 or SE directly anymore, so if you want a 6, 7, 8, or 9, you'll have to check the resale and refurbished markets. You'll definitely save some money that way. Apple Watch Series 8Learn More Learn More #how #old #too #when #buying
    LIFEHACKER.COM
    How Old Is Too Old When Buying an Apple Watch?
    We may earn a commission from links on this page.In 2023, I decided to update my Apple Watch after consistently failing to wear my Series 4 for a number of years. I sold that one on Poshmark and began looking at newer models to find one with enough features to convince me to actually wear it. I opted to get a Series 8, although the Series 9 had just been released, as I was buying two: one for my mom and one for myself. As it turns out, that was a great decision.If you're searching for a new wearable or considering upgrading yours, you might also be wondering which of the older Apple Watch models is still useful today. My Series 8 is holding up beautifully three years after it was introduced, so I'm a big proponent of using older devices as long as possible. But not all Apple Watches will work as well as the Series 8 does in 2025. Don’t buy a watch Apple doesn’t support anymoreWe have to draw the line somewhere: Seven of Apple's watches are no longer supported, meaning they won't receive any software or security updates anymore. In addition, you run the risk that the watch will no longer be compatible with your iPhone or certain apps. In short, you shouldn't buy a watch that Apple doesn't support. That includes the following:Apple Watch Series 0Apple Watch Series 1Apple Watch Series 2Apple Watch Series 3Apple Watch Series 4Apple Watch Series 5Apple Watch SE (first-gen) While the company does currently support the Series 6, it is next in line to join this list. It's not clear when that will happen, but you can be sure it will. We'll see next week—when Apple reveals watchOS 26—whether the watch will be supported another year. If not, it'll be stuck on watchOS 11 for good.Performance and other generational Watch improvementsThere are considerations for older Apple Watch models that extend beyond their ability to simply run the latest operating system. With each generation, improvements are made in some form or another. For instance, the Series 4 introduced the ECG sensor, while the Series 6 introduced the blood oxygen sensor (though Apple had to disable the feature for the Series 9 and Ultra 2 in the U.S. due to a lawsuit). The Series 7 charges faster than its predecessors, and Apple has included fast charging on most watch models since (sorry, Apple Watch SE users).In general, each Apple Watch is faster than the last. Apple tends to put its newest S-Chip—the Apple Watch's processor—in its latest watch series. Simply put, a newer S-chip gives you a faster, more productive product. The Series 6 has an S6 chip, Series 7 has S7, and so on until you hit the Ultras. (The first-generation Ultra has an S8 chip like the Series 8, while the Ultra 2 has an S9 chip like the Series 9.)While there are some core features all currently supported watches share—like workout and swim tracking, sleep tracking, Apple Pay, ECG scanning, and the ability to read and respond to messages—newer models also each have some of their own special advancements and upgrades. Here's a brief list:The Series 7 introduced faster charging, a larger display, and more durable screen.The Series 8 brought temperature sensing, crash detection, and a low-power mode for conserving battery (as did the second-gen Apple Watch SE).The Series 9 debuted new gesture controls, on-device Siri access, more precise location tracking in Find My, and a display with double the brightness of the Series 8.The first-gen Apple Watch Ultra introduced a more durable titanium casing, custom shortcuts to apps and modes via the Action button, a depth gauge and water temperature sensor, more accurate GPS, a 36-hour battery life, and an emergency siren.The Apple Watch Ultra 2 introduced a display with a maximum brightness of 3,000 nits and on-device media playback. The Series 10 introduced the largest display available on a standard Apple Watch and faster charging. If you see a feature you absolutely need in a particular watch model, you'll have to spring for it. But if you just want something for core Apple Watch tasks, you can start to consider older options. Apple's watch comparison site can be a helpful tool for identifying different features among models. Battery degradation All tech degrades to some extent and the Apple Watch is no different—particularly when it comes to the battery. While there are ways to mitigate the problem, over time, the lithium-ion battery powering your wrist computer won't last as long as it used to. That might be a bigger issue than your watch's ability to download and support a new operating system. Apple's warranty doesn't cover batteries that wear down from normal use, and charges $99 for the repair, which you could instead put towards the purchase of a new watch. There is one exception: Battery service is free if you have AppleCare+ and your watch's battery holds less than 80% of its original capacity. You need to take your watch in to an Apple Store or service provider to have it tested. My watch was pre-owned, and while I have no way of knowing if it has its original battery, my battery life has not declined substantially in the two years I've been using it daily. I primarily use mine to track my workouts, vitals, and sleep, which means it's always running. I charge it while I'm in the shower and occasionally for a few minutes before bed, and that's about it. On an average day of constant notifications, mine lasts me a bit longer than the advertised 18-hour mark. Because I have little interest in the small improvements offered by the Series 9 and Series 10—like extra brightness, larger screen size, performance bumps, and advanced cycle tracking—the battery life is what would (or will) compel me to upgrade in the future, but for now, I have not noticed any problems. I asked my mom if she's noticed any battery degradation on hers, since I bought it at the same time and place as mine, and she said no. She uses hers to track walking workouts, talk on the phone, and monitor her sleep and vitals, too.Stick with the Series 7 or newerThoroughly consider which of the features on newer models are actually important to you before making any buying decision and, if you can, stay above a Series 7. The Series 6 is still functional, but, again, it's a matter of time until the company stops acknowledging that one completely. For now, I have been pleasantly surprised by how well my Series 8 has held up for two years. Its touchscreen has never faltered, the external buttons function perfectly, it syncs to all of my apps and devices with no problem, and it does exactly what I need it to do—which is to tell me how many steps I'm taking and how hard I'm exerting myself at the gym. If you're in the market for a smart watch, I see no reason that an older version shouldn't be considered, as long as it still runs the latest operating system. You can save a chunk of change by sourcing an older model from the resale or refurbished markets and put that money away for when Apple drops something super revolutionary in the wearable space. Apple doesn't sell anything below a Series 10 or SE directly anymore, so if you want a 6, 7, 8, or 9, you'll have to check the resale and refurbished markets. You'll definitely save some money that way (a new Series 10 starts at $399, though it can be found on sale, and the refurbished Series 8 I got is selling right now for $219). Apple Watch Series 8 (Renewed) $209.00 at Amazon $220.00 Save $11.00 Learn More Learn More $209.00 at Amazon $220.00 Save $11.00
    Like
    Love
    Wow
    Sad
    Angry
    166
    0 Comments 0 Shares
  • Lawmakers Form First Extreme Heat Caucus, Citing ‘Deadly Risk’

    May 30, 20252 min readLawmakers Form First Extreme Heat Caucus, Citing ‘Deadly Risk’The House of Representatives’ first caucus to address extreme heat is being launched by a Democrat from the Southwest and a Republican from the NortheastBy Ariel Wittenberg & E&E News A construction worker in Folsom, Calif., during a July 2024 heatwave that brought daytime highs of 110 degrees Fahrenheit. David Paul Morris/Bloomberg via Getty ImagesCLIMATEWIRE | An Arizona Democrat and a New York Republican are teaming up to form the Congressional Extreme Heat Caucus in an attempt to find bipartisan solutions for deadly temperatures.“We hope this caucus can make sure the United States is better prepared for the inevitable increase in temperatures, not just in Arizona and the Southwest but all across the country,” Arizona Rep. Greg Stantonsaid in an interview.He's creating the caucus with New York Rep. Mike Lawler, a moderate Republican who bucked his party last year by expressing support for the nation's first proposed regulation to protect workers from heat by the Occupational Safety and Health Administration.On supporting science journalismIf you're enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.“Extreme heat kills more Americans each year than any other weather event — over 1,300 lives lost, including 570 in New York alone — and it’s a growing threat to the Hudson Valley,” Lawler said in a statement. “That’s why I’m co-chairing the Heat Caucus to drive real solutions, raise awareness, and protect our communities from this deadly risk.”Stanton said he was excited to team up with Lawler, who understands that heat jeopardizes health even in northern climates.“He is from New York and I’m proud he recognizes how heat is important for workers,” he said.The caucus will be open to House lawmakers who have bipartisan ideas for addressing extreme heat. Noting that many Republicans have slammed OSHA's proposed heat rule, Stanton said the caucus doesn't have to find consensus on every policy, but members should be willing to search for common ground."It is important to have that conversation on what we can come together and agree on because that's how we get legislation passed in this town, even if we don't agree on how far to go," he said.Lawler and Stanton teamed up earlier this spring to protest workforce reductions at the Department of Health and Human Services that could degrade heat-related programs.In April, the pair wrote a letter to Health Secretary Robert F. Kennedy Jr., protesting layoffs that purged the entire staff of the Centers for Disease Control and Prevention’s Division of Environmental Health Science and Practice as well as the Low-Income Home Energy Assistance Program, which helps families pay for heating and cooling.“As we head into another summer — with projections suggesting 2025 will rank again among the warmest years on record, we cannot afford to limit our ability to counter the impacts of extreme heat,” they wrote in April with nine other lawmakers.Among the caucus' priorities is making LIHEAP funding more evenly distributed to southern states to help pay for cooling assistance. The program was initially created to help low-income families pay their heating bills during winter, and the majority of its funding still goes toward cold-weather states.“We have had too many deaths of people in their homes because they are unable to access programs that would help them access air conditioning,” Stanton said.Reprinted from E&E News with permission from POLITICO, LLC. Copyright 2025. E&E News provides essential news for energy and environment professionals.
    #lawmakers #form #first #extreme #heat
    Lawmakers Form First Extreme Heat Caucus, Citing ‘Deadly Risk’
    May 30, 20252 min readLawmakers Form First Extreme Heat Caucus, Citing ‘Deadly Risk’The House of Representatives’ first caucus to address extreme heat is being launched by a Democrat from the Southwest and a Republican from the NortheastBy Ariel Wittenberg & E&E News A construction worker in Folsom, Calif., during a July 2024 heatwave that brought daytime highs of 110 degrees Fahrenheit. David Paul Morris/Bloomberg via Getty ImagesCLIMATEWIRE | An Arizona Democrat and a New York Republican are teaming up to form the Congressional Extreme Heat Caucus in an attempt to find bipartisan solutions for deadly temperatures.“We hope this caucus can make sure the United States is better prepared for the inevitable increase in temperatures, not just in Arizona and the Southwest but all across the country,” Arizona Rep. Greg Stantonsaid in an interview.He's creating the caucus with New York Rep. Mike Lawler, a moderate Republican who bucked his party last year by expressing support for the nation's first proposed regulation to protect workers from heat by the Occupational Safety and Health Administration.On supporting science journalismIf you're enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.“Extreme heat kills more Americans each year than any other weather event — over 1,300 lives lost, including 570 in New York alone — and it’s a growing threat to the Hudson Valley,” Lawler said in a statement. “That’s why I’m co-chairing the Heat Caucus to drive real solutions, raise awareness, and protect our communities from this deadly risk.”Stanton said he was excited to team up with Lawler, who understands that heat jeopardizes health even in northern climates.“He is from New York and I’m proud he recognizes how heat is important for workers,” he said.The caucus will be open to House lawmakers who have bipartisan ideas for addressing extreme heat. Noting that many Republicans have slammed OSHA's proposed heat rule, Stanton said the caucus doesn't have to find consensus on every policy, but members should be willing to search for common ground."It is important to have that conversation on what we can come together and agree on because that's how we get legislation passed in this town, even if we don't agree on how far to go," he said.Lawler and Stanton teamed up earlier this spring to protest workforce reductions at the Department of Health and Human Services that could degrade heat-related programs.In April, the pair wrote a letter to Health Secretary Robert F. Kennedy Jr., protesting layoffs that purged the entire staff of the Centers for Disease Control and Prevention’s Division of Environmental Health Science and Practice as well as the Low-Income Home Energy Assistance Program, which helps families pay for heating and cooling.“As we head into another summer — with projections suggesting 2025 will rank again among the warmest years on record, we cannot afford to limit our ability to counter the impacts of extreme heat,” they wrote in April with nine other lawmakers.Among the caucus' priorities is making LIHEAP funding more evenly distributed to southern states to help pay for cooling assistance. The program was initially created to help low-income families pay their heating bills during winter, and the majority of its funding still goes toward cold-weather states.“We have had too many deaths of people in their homes because they are unable to access programs that would help them access air conditioning,” Stanton said.Reprinted from E&E News with permission from POLITICO, LLC. Copyright 2025. E&E News provides essential news for energy and environment professionals. #lawmakers #form #first #extreme #heat
    WWW.SCIENTIFICAMERICAN.COM
    Lawmakers Form First Extreme Heat Caucus, Citing ‘Deadly Risk’
    May 30, 20252 min readLawmakers Form First Extreme Heat Caucus, Citing ‘Deadly Risk’The House of Representatives’ first caucus to address extreme heat is being launched by a Democrat from the Southwest and a Republican from the NortheastBy Ariel Wittenberg & E&E News A construction worker in Folsom, Calif., during a July 2024 heatwave that brought daytime highs of 110 degrees Fahrenheit. David Paul Morris/Bloomberg via Getty ImagesCLIMATEWIRE | An Arizona Democrat and a New York Republican are teaming up to form the Congressional Extreme Heat Caucus in an attempt to find bipartisan solutions for deadly temperatures.“We hope this caucus can make sure the United States is better prepared for the inevitable increase in temperatures, not just in Arizona and the Southwest but all across the country,” Arizona Rep. Greg Stanton (D) said in an interview.He's creating the caucus with New York Rep. Mike Lawler, a moderate Republican who bucked his party last year by expressing support for the nation's first proposed regulation to protect workers from heat by the Occupational Safety and Health Administration.On supporting science journalismIf you're enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.“Extreme heat kills more Americans each year than any other weather event — over 1,300 lives lost, including 570 in New York alone — and it’s a growing threat to the Hudson Valley,” Lawler said in a statement. “That’s why I’m co-chairing the Heat Caucus to drive real solutions, raise awareness, and protect our communities from this deadly risk.”Stanton said he was excited to team up with Lawler, who understands that heat jeopardizes health even in northern climates.“He is from New York and I’m proud he recognizes how heat is important for workers,” he said.The caucus will be open to House lawmakers who have bipartisan ideas for addressing extreme heat. Noting that many Republicans have slammed OSHA's proposed heat rule, Stanton said the caucus doesn't have to find consensus on every policy, but members should be willing to search for common ground."It is important to have that conversation on what we can come together and agree on because that's how we get legislation passed in this town, even if we don't agree on how far to go," he said.Lawler and Stanton teamed up earlier this spring to protest workforce reductions at the Department of Health and Human Services that could degrade heat-related programs.In April, the pair wrote a letter to Health Secretary Robert F. Kennedy Jr., protesting layoffs that purged the entire staff of the Centers for Disease Control and Prevention’s Division of Environmental Health Science and Practice as well as the Low-Income Home Energy Assistance Program, which helps families pay for heating and cooling.“As we head into another summer — with projections suggesting 2025 will rank again among the warmest years on record, we cannot afford to limit our ability to counter the impacts of extreme heat,” they wrote in April with nine other lawmakers.Among the caucus' priorities is making LIHEAP funding more evenly distributed to southern states to help pay for cooling assistance. The program was initially created to help low-income families pay their heating bills during winter, and the majority of its funding still goes toward cold-weather states.“We have had too many deaths of people in their homes because they are unable to access programs that would help them access air conditioning,” Stanton said.Reprinted from E&E News with permission from POLITICO, LLC. Copyright 2025. E&E News provides essential news for energy and environment professionals.
    0 Comments 0 Shares
  • Yep, X was down again

    Elon Musk’s X experienced an outage on Friday, according to user reports and crowdsourced data from sites like Downdetector. From what TechCrunch staff has witnessed firsthand, the X website and app still loaded, but various features were not functioning properly, and entire feeds — like X’s algorithmic For You feed — didn’t load any content.
    Others reported that photos weren’t loading, X’s banking service XMoney wasn’t working, the timeline was bare, posts weren’t publishing, and search wasn’t returning results, among other things.
    Based on the timing of the user reports, the outage and errors seem to have begun just before 4 p.m. ET on Friday and seem to be resolving as of the time of writing.
    During the outage, thousands of users submitted reports to Downdetector, and others more directly complained in X posts and on other social networks, like Bluesky and Threads.
    This was the second outage for X in just over a week, as the service also failed on May 22, when X experienced a number of issues over a 24-hour period. During that time, messages wouldn’t load, timelines wouldn’t update, and some posts couldn’t be seen without refreshing the webpage.
    The current outage may be short-lived. We noticed that some features seemed to either be intermittently working, or worked on X’s mobile app but not on the desktop, for instance. Then, people began reporting that the service seemed to have come back up.
    The X Developer API v2 is still showing degraded status, however.

    Techcrunch event

    now through June 4 for TechCrunch Sessions: AI
    on your ticket to TC Sessions: AI—and get 50% off a second. Hear from leaders at OpenAI, Anthropic, Khosla Ventures, and more during a full day of expert insights, hands-on workshops, and high-impact networking. These low-rate deals disappear when the doors open on June 5.

    Exhibit at TechCrunch Sessions: AI
    Secure your spot at TC Sessions: AI and show 1,200+ decision-makers what you’ve built — without the big spend. Available through May 9 or while tables last.

    Berkeley, CA
    |
    June 5

    REGISTER NOW

    Just ahead of the outage, X began rolling out its new DM feature, XChat, into beta after pausing encrypted DMs as it worked to make improvements. It’s not clear if any of these changes impacted X’s stability, and the company no longer responds to press inquiries since Musk took ownership.
    #yep #was #down #again
    Yep, X was down again
    Elon Musk’s X experienced an outage on Friday, according to user reports and crowdsourced data from sites like Downdetector. From what TechCrunch staff has witnessed firsthand, the X website and app still loaded, but various features were not functioning properly, and entire feeds — like X’s algorithmic For You feed — didn’t load any content. Others reported that photos weren’t loading, X’s banking service XMoney wasn’t working, the timeline was bare, posts weren’t publishing, and search wasn’t returning results, among other things. Based on the timing of the user reports, the outage and errors seem to have begun just before 4 p.m. ET on Friday and seem to be resolving as of the time of writing. During the outage, thousands of users submitted reports to Downdetector, and others more directly complained in X posts and on other social networks, like Bluesky and Threads. This was the second outage for X in just over a week, as the service also failed on May 22, when X experienced a number of issues over a 24-hour period. During that time, messages wouldn’t load, timelines wouldn’t update, and some posts couldn’t be seen without refreshing the webpage. The current outage may be short-lived. We noticed that some features seemed to either be intermittently working, or worked on X’s mobile app but not on the desktop, for instance. Then, people began reporting that the service seemed to have come back up. The X Developer API v2 is still showing degraded status, however. Techcrunch event now through June 4 for TechCrunch Sessions: AI on your ticket to TC Sessions: AI—and get 50% off a second. Hear from leaders at OpenAI, Anthropic, Khosla Ventures, and more during a full day of expert insights, hands-on workshops, and high-impact networking. These low-rate deals disappear when the doors open on June 5. Exhibit at TechCrunch Sessions: AI Secure your spot at TC Sessions: AI and show 1,200+ decision-makers what you’ve built — without the big spend. Available through May 9 or while tables last. Berkeley, CA | June 5 REGISTER NOW Just ahead of the outage, X began rolling out its new DM feature, XChat, into beta after pausing encrypted DMs as it worked to make improvements. It’s not clear if any of these changes impacted X’s stability, and the company no longer responds to press inquiries since Musk took ownership. #yep #was #down #again
    TECHCRUNCH.COM
    Yep, X was down again
    Elon Musk’s X experienced an outage on Friday, according to user reports and crowdsourced data from sites like Downdetector. From what TechCrunch staff has witnessed firsthand, the X website and app still loaded, but various features were not functioning properly, and entire feeds — like X’s algorithmic For You feed — didn’t load any content. Others reported that photos weren’t loading, X’s banking service XMoney wasn’t working, the timeline was bare, posts weren’t publishing, and search wasn’t returning results, among other things. Based on the timing of the user reports, the outage and errors seem to have begun just before 4 p.m. ET on Friday and seem to be resolving as of the time of writing. During the outage, thousands of users submitted reports to Downdetector, and others more directly complained in X posts and on other social networks, like Bluesky and Threads. This was the second outage for X in just over a week, as the service also failed on May 22, when X experienced a number of issues over a 24-hour period. During that time, messages wouldn’t load, timelines wouldn’t update, and some posts couldn’t be seen without refreshing the webpage. The current outage may be short-lived. We noticed that some features seemed to either be intermittently working, or worked on X’s mobile app but not on the desktop, for instance. Then, people began reporting that the service seemed to have come back up. The X Developer API v2 is still showing degraded status, however. Techcrunch event Save now through June 4 for TechCrunch Sessions: AI Save $300 on your ticket to TC Sessions: AI—and get 50% off a second. Hear from leaders at OpenAI, Anthropic, Khosla Ventures, and more during a full day of expert insights, hands-on workshops, and high-impact networking. These low-rate deals disappear when the doors open on June 5. Exhibit at TechCrunch Sessions: AI Secure your spot at TC Sessions: AI and show 1,200+ decision-makers what you’ve built — without the big spend. Available through May 9 or while tables last. Berkeley, CA | June 5 REGISTER NOW Just ahead of the outage, X began rolling out its new DM feature, XChat, into beta after pausing encrypted DMs as it worked to make improvements. It’s not clear if any of these changes impacted X’s stability, and the company no longer responds to press inquiries since Musk took ownership.
    0 Comments 0 Shares
  • Scientists Investigate 2.2-Million-Year-Old Tooth Enamel to Unravel the Mysteries of Ancient Human Relatives

    Scientists Investigate 2.2-Million-Year-Old Tooth Enamel to Unravel the Mysteries of Ancient Human Relatives
    By studying proteins preserved in teeth, researchers determined the sex of four Paranthropus robustus individuals that lived in southern Africa

    This skull of a 1.8-million-year-old Paranthropus robustus individual was unearthed in South Africa, but it was not one of the fossils included in the study.
    José Braga and Didier Descouens via Wikimedia Commons under CC BY-SA 4.0

    Paranthropus robustus was a prehistoric, two-legged human relative that lived in southern Africa roughly two million years ago. Scientists have unearthed various P. robustus fossils, but because of the specimens’ age, they haven’t been able to glean much from them.
    Now, using a novel method, researchers say they’ve determined the sex of four P. robustus individuals by studying their teeth. The new analysis, published Thursday in the journal Science, also reveals insights into the genetic diversity of the broader Paranthropus genus.
    For the study, the team analyzed teeth discovered in a cave at the Swartkrans paleoanthropological site in South Africa, reports Live Science’s Kristina Killgrove. Because the fossils were so old—dating to between 1.8 million and 2.2 million years ago—researchers could not recover ancient DNA from them. So, instead, they turned to the relatively new field of paleoproteomics, or the study of preserved proteins.
    Ancient DNA degrades over time, particularly in hot places like southern Africa. So far, scientists studying hominin remains on the continent have only been able to successfully sequence DNA from material that’s less than 20,000 years old. But proteins can survive much longer than DNA, particularly in hard tooth enamel.
    When they analyzed the fossilized enamel of the P. robustus teeth, the researchers were able to identify specific protein sequences found only in males. This allowed them to determine that two of the P. robustus individuals were male and two were female.
    They were surprised to learn that one individual, named SK 835, was male. Based on the comparatively small size of that individual’s teeth, researchers had previously thought SK 835 was female, since male hominins tend to be larger than females, on average.
    This marks an important finding, as it supports the idea that dental measurements are not the most reliable way to determine the sex of ancient hominins.
    “Paleoanthropologists have long known that our use of tooth size to estimate sex was fraught with uncertainty, but it was the best we had,” says Paul Constantino, a paleoanthropologist at Saint Michael’s College who was not involved with the research, to ScienceNews’ Bruce Bower. “Being able to accurately identify the sex of fossils using proteins will be hugely impactful.”
    Further analyses of the fossils’ amino acid sequences revealed that SK 835 was less closely related to the other three individuals than they were to each other. That means it’s possible SK 835 represents a different species altogether—maybe the newly proposed Paranthropus capensis. After all, the team writes in the paper, the recent description of that species shows Paranthropus diversity “is currently underestimated and needs to be investigated further.”
    Or, perhaps the small size of SK 835’s teeth can be explained by microevolution—variations between P. robustus groups living at different sites. Scientists say they will need to get their hands on more Paranthropus fossils from multiple places to know for certain, per Science News.
    Moving forward, researchers hope they can one day use paleoproteomic methods to map the entire human family tree. Right now, however, their “ability to distinguish between different species is limited by the small number of different proteins present in enamel,” three of the study authors tell Live Science in an email.
    Scientists are also exploring other protein-sequencing techniques that are less destructive to fossil samples than the current methods. In the meantime, they’re excited about the potential of paleoproteomics to help them learn even more about humans’ ancient ancestors.
    “It opens entirely new avenues for understanding our evolutionary history,” study co-author Marc Dickinson, a chemist at the University of York in England, says in a statement.

    Get the latest stories in your inbox every weekday.
    #scientists #investigate #22millionyearold #tooth #enamel
    Scientists Investigate 2.2-Million-Year-Old Tooth Enamel to Unravel the Mysteries of Ancient Human Relatives
    Scientists Investigate 2.2-Million-Year-Old Tooth Enamel to Unravel the Mysteries of Ancient Human Relatives By studying proteins preserved in teeth, researchers determined the sex of four Paranthropus robustus individuals that lived in southern Africa This skull of a 1.8-million-year-old Paranthropus robustus individual was unearthed in South Africa, but it was not one of the fossils included in the study. José Braga and Didier Descouens via Wikimedia Commons under CC BY-SA 4.0 Paranthropus robustus was a prehistoric, two-legged human relative that lived in southern Africa roughly two million years ago. Scientists have unearthed various P. robustus fossils, but because of the specimens’ age, they haven’t been able to glean much from them. Now, using a novel method, researchers say they’ve determined the sex of four P. robustus individuals by studying their teeth. The new analysis, published Thursday in the journal Science, also reveals insights into the genetic diversity of the broader Paranthropus genus. For the study, the team analyzed teeth discovered in a cave at the Swartkrans paleoanthropological site in South Africa, reports Live Science’s Kristina Killgrove. Because the fossils were so old—dating to between 1.8 million and 2.2 million years ago—researchers could not recover ancient DNA from them. So, instead, they turned to the relatively new field of paleoproteomics, or the study of preserved proteins. Ancient DNA degrades over time, particularly in hot places like southern Africa. So far, scientists studying hominin remains on the continent have only been able to successfully sequence DNA from material that’s less than 20,000 years old. But proteins can survive much longer than DNA, particularly in hard tooth enamel. When they analyzed the fossilized enamel of the P. robustus teeth, the researchers were able to identify specific protein sequences found only in males. This allowed them to determine that two of the P. robustus individuals were male and two were female. They were surprised to learn that one individual, named SK 835, was male. Based on the comparatively small size of that individual’s teeth, researchers had previously thought SK 835 was female, since male hominins tend to be larger than females, on average. This marks an important finding, as it supports the idea that dental measurements are not the most reliable way to determine the sex of ancient hominins. “Paleoanthropologists have long known that our use of tooth size to estimate sex was fraught with uncertainty, but it was the best we had,” says Paul Constantino, a paleoanthropologist at Saint Michael’s College who was not involved with the research, to ScienceNews’ Bruce Bower. “Being able to accurately identify the sex of fossils using proteins will be hugely impactful.” Further analyses of the fossils’ amino acid sequences revealed that SK 835 was less closely related to the other three individuals than they were to each other. That means it’s possible SK 835 represents a different species altogether—maybe the newly proposed Paranthropus capensis. After all, the team writes in the paper, the recent description of that species shows Paranthropus diversity “is currently underestimated and needs to be investigated further.” Or, perhaps the small size of SK 835’s teeth can be explained by microevolution—variations between P. robustus groups living at different sites. Scientists say they will need to get their hands on more Paranthropus fossils from multiple places to know for certain, per Science News. Moving forward, researchers hope they can one day use paleoproteomic methods to map the entire human family tree. Right now, however, their “ability to distinguish between different species is limited by the small number of different proteins present in enamel,” three of the study authors tell Live Science in an email. Scientists are also exploring other protein-sequencing techniques that are less destructive to fossil samples than the current methods. In the meantime, they’re excited about the potential of paleoproteomics to help them learn even more about humans’ ancient ancestors. “It opens entirely new avenues for understanding our evolutionary history,” study co-author Marc Dickinson, a chemist at the University of York in England, says in a statement. Get the latest stories in your inbox every weekday. #scientists #investigate #22millionyearold #tooth #enamel
    WWW.SMITHSONIANMAG.COM
    Scientists Investigate 2.2-Million-Year-Old Tooth Enamel to Unravel the Mysteries of Ancient Human Relatives
    Scientists Investigate 2.2-Million-Year-Old Tooth Enamel to Unravel the Mysteries of Ancient Human Relatives By studying proteins preserved in teeth, researchers determined the sex of four Paranthropus robustus individuals that lived in southern Africa This skull of a 1.8-million-year-old Paranthropus robustus individual was unearthed in South Africa, but it was not one of the fossils included in the study. José Braga and Didier Descouens via Wikimedia Commons under CC BY-SA 4.0 Paranthropus robustus was a prehistoric, two-legged human relative that lived in southern Africa roughly two million years ago. Scientists have unearthed various P. robustus fossils, but because of the specimens’ age, they haven’t been able to glean much from them. Now, using a novel method, researchers say they’ve determined the sex of four P. robustus individuals by studying their teeth. The new analysis, published Thursday in the journal Science, also reveals insights into the genetic diversity of the broader Paranthropus genus. For the study, the team analyzed teeth discovered in a cave at the Swartkrans paleoanthropological site in South Africa, reports Live Science’s Kristina Killgrove. Because the fossils were so old—dating to between 1.8 million and 2.2 million years ago—researchers could not recover ancient DNA from them. So, instead, they turned to the relatively new field of paleoproteomics, or the study of preserved proteins. Ancient DNA degrades over time, particularly in hot places like southern Africa. So far, scientists studying hominin remains on the continent have only been able to successfully sequence DNA from material that’s less than 20,000 years old. But proteins can survive much longer than DNA, particularly in hard tooth enamel. When they analyzed the fossilized enamel of the P. robustus teeth, the researchers were able to identify specific protein sequences found only in males. This allowed them to determine that two of the P. robustus individuals were male and two were female. They were surprised to learn that one individual, named SK 835, was male. Based on the comparatively small size of that individual’s teeth, researchers had previously thought SK 835 was female, since male hominins tend to be larger than females, on average. This marks an important finding, as it supports the idea that dental measurements are not the most reliable way to determine the sex of ancient hominins. “Paleoanthropologists have long known that our use of tooth size to estimate sex was fraught with uncertainty, but it was the best we had,” says Paul Constantino, a paleoanthropologist at Saint Michael’s College who was not involved with the research, to ScienceNews’ Bruce Bower. “Being able to accurately identify the sex of fossils using proteins will be hugely impactful.” Further analyses of the fossils’ amino acid sequences revealed that SK 835 was less closely related to the other three individuals than they were to each other. That means it’s possible SK 835 represents a different species altogether—maybe the newly proposed Paranthropus capensis. After all, the team writes in the paper, the recent description of that species shows Paranthropus diversity “is currently underestimated and needs to be investigated further.” Or, perhaps the small size of SK 835’s teeth can be explained by microevolution—variations between P. robustus groups living at different sites. Scientists say they will need to get their hands on more Paranthropus fossils from multiple places to know for certain, per Science News. Moving forward, researchers hope they can one day use paleoproteomic methods to map the entire human family tree. Right now, however, their “ability to distinguish between different species is limited by the small number of different proteins present in enamel,” three of the study authors tell Live Science in an email. Scientists are also exploring other protein-sequencing techniques that are less destructive to fossil samples than the current methods. In the meantime, they’re excited about the potential of paleoproteomics to help them learn even more about humans’ ancient ancestors. “It opens entirely new avenues for understanding our evolutionary history [in Africa],” study co-author Marc Dickinson, a chemist at the University of York in England, says in a statement. Get the latest stories in your inbox every weekday.
    0 Comments 0 Shares
  • This AI Paper Introduces ARM and Ada-GRPO: Adaptive Reasoning Models for Efficient and Scalable Problem-Solving

    Reasoning tasks are a fundamental aspect of artificial intelligence, encompassing areas like commonsense understanding, mathematical problem-solving, and symbolic reasoning. These tasks often involve multiple steps of logical inference, which large language modelsattempt to mimic through structured approaches such as chain-of-thoughtprompting. However, as LLMs grow in size and complexity, they tend to produce longer outputs across all tasks, regardless of difficulty, leading to significant inefficiencies. The field has been striving to balance the depth of reasoning with computational cost while also ensuring that models can adapt their reasoning strategies to meet the unique needs of each problem.
    A key issue with current reasoning models is the inability to tailor the reasoning process to different task complexities. Most models, including well-known ones like OpenAI’s o1 and DeepSeek-R1, apply a uniform strategy—typically relying on Long CoT across all tasks. This causes the “overthinking” problem, where models generate unnecessarily verbose explanations for simpler tasks. Not only does this waste resources, but it also degrades accuracy, as excessive reasoning can introduce irrelevant information. Approaches such as prompt-guided generation or token budget estimation have attempted to mitigate this issue. Still, these methods are limited by their dependence on predefined assumptions, which are not always reliable for diverse tasks.

    Attempts to address these issues include methods like GRPO, length-penalty mechanisms, and rule-based prompt controls. While GRPO enables models to learn different reasoning strategies by rewarding correct answers, it leads to a “format collapse,” where models increasingly rely on Long CoT, crowding out more efficient formats, such as Short CoT or Direct Answer. Length-penalty techniques, such as those applied in methods like THINKPRUNE, control output length during training or inference, but often at the cost of reduced accuracy, especially in complex problem-solving tasks. These solutions struggle to achieve a consistent trade-off between reasoning effectiveness and efficiency, highlighting the need for an adaptive approach.
    A team of researchers from Fudan University and Ohio State University introduced the Adaptive Reasoning Model, which dynamically adjusts reasoning formats based on task difficulty. ARM supports four distinct reasoning styles: Direct Answer for simple tasks, Short CoT for concise reasoning, Code for structured problem-solving, and Long CoT for deep multi-step reasoning. It operates in an Adaptive Mode by default, automatically selecting the appropriate format, and also provides Instruction-Guided and Consensus-Guided Modes for explicit control or aggregation across formats. The key innovation lies in its training process, which utilizes Ada-GRPO, an extension of GRPO that introduces a format diversity reward mechanism. This prevents the dominance of Long CoT and ensures that ARM continues to explore and use simpler reasoning formats when appropriate.

    The ARM methodology is built on a two-stage framework. First, the model undergoes Supervised Fine-Tuningwith 10.8K questions, each annotated across four reasoning formats, sourced from datasets like AQuA-Rat and generated with tools such as GPT-4o and DeepSeek-R1. This stage teaches the model the structure of each reasoning format but does not instill adaptiveness. The second stage applies Ada-GRPO, where the model receives scaled rewards for using less frequent formats, such as Direct Answer or Short CoT. A decaying factor ensures that this reward gradually shifts back to accuracy as training progresses, preventing long-term bias toward inefficient exploration. This structure enables ARM to avoid format collapse and dynamically match reasoning strategies to task difficulty, achieving a balance of efficiency and performance.

    ARM demonstrated impressive results across various benchmarks, including commonsense, mathematical, and symbolic reasoning tasks. It reduced token usage by an average of 30%, with reductions as high as 70% for simpler tasks, compared to models relying solely on Long CoT. ARM achieved a 2x training speedup over GRPO-based models, accelerating model development without sacrificing accuracy. For example, ARM-7B achieved 75.9% accuracy on the challenging AIME’25 task while using 32.5% fewer tokens. ARM-14B achieved 85.6% accuracy on OpenBookQA and 86.4% accuracy on the MATH dataset, with a token usage reduction of over 30% compared to Qwen2.5SFT+GRPO models. These numbers demonstrate ARM’s ability to maintain competitive performance while delivering significant efficiency gains.
    Overall, the Adaptive Reasoning Model addresses the persistent inefficiency of reasoning models by enabling the adaptive selection of reasoning formats based on task difficulty. The introduction of Ada-GRPO and the multi-format training framework ensures that models no longer waste resources on overthinking. Instead, ARM provides a flexible and practical solution for balancing accuracy and computational cost in reasoning tasks, making it a promising approach for scalable and efficient large language models.

    Check out the Paper, Models on Hugging Face and Project Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter.
    NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces WEB-SHEPHERD: A Process Reward Model for Web Agents with 40K Dataset and 10× Cost EfficiencyNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces MMaDA: A Unified Multimodal Diffusion Model for Textual Reasoning, Visual Understanding, and Image GenerationNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces Differentiable MCMC Layers: A New AI Framework for Learning with Inexact Combinatorial Solvers in Neural NetworksNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces GRIT: A Method for Teaching MLLMs to Reason with Images by Interleaving Text and Visual Grounding
    #this #paper #introduces #arm #adagrpo
    This AI Paper Introduces ARM and Ada-GRPO: Adaptive Reasoning Models for Efficient and Scalable Problem-Solving
    Reasoning tasks are a fundamental aspect of artificial intelligence, encompassing areas like commonsense understanding, mathematical problem-solving, and symbolic reasoning. These tasks often involve multiple steps of logical inference, which large language modelsattempt to mimic through structured approaches such as chain-of-thoughtprompting. However, as LLMs grow in size and complexity, they tend to produce longer outputs across all tasks, regardless of difficulty, leading to significant inefficiencies. The field has been striving to balance the depth of reasoning with computational cost while also ensuring that models can adapt their reasoning strategies to meet the unique needs of each problem. A key issue with current reasoning models is the inability to tailor the reasoning process to different task complexities. Most models, including well-known ones like OpenAI’s o1 and DeepSeek-R1, apply a uniform strategy—typically relying on Long CoT across all tasks. This causes the “overthinking” problem, where models generate unnecessarily verbose explanations for simpler tasks. Not only does this waste resources, but it also degrades accuracy, as excessive reasoning can introduce irrelevant information. Approaches such as prompt-guided generation or token budget estimation have attempted to mitigate this issue. Still, these methods are limited by their dependence on predefined assumptions, which are not always reliable for diverse tasks. Attempts to address these issues include methods like GRPO, length-penalty mechanisms, and rule-based prompt controls. While GRPO enables models to learn different reasoning strategies by rewarding correct answers, it leads to a “format collapse,” where models increasingly rely on Long CoT, crowding out more efficient formats, such as Short CoT or Direct Answer. Length-penalty techniques, such as those applied in methods like THINKPRUNE, control output length during training or inference, but often at the cost of reduced accuracy, especially in complex problem-solving tasks. These solutions struggle to achieve a consistent trade-off between reasoning effectiveness and efficiency, highlighting the need for an adaptive approach. A team of researchers from Fudan University and Ohio State University introduced the Adaptive Reasoning Model, which dynamically adjusts reasoning formats based on task difficulty. ARM supports four distinct reasoning styles: Direct Answer for simple tasks, Short CoT for concise reasoning, Code for structured problem-solving, and Long CoT for deep multi-step reasoning. It operates in an Adaptive Mode by default, automatically selecting the appropriate format, and also provides Instruction-Guided and Consensus-Guided Modes for explicit control or aggregation across formats. The key innovation lies in its training process, which utilizes Ada-GRPO, an extension of GRPO that introduces a format diversity reward mechanism. This prevents the dominance of Long CoT and ensures that ARM continues to explore and use simpler reasoning formats when appropriate. The ARM methodology is built on a two-stage framework. First, the model undergoes Supervised Fine-Tuningwith 10.8K questions, each annotated across four reasoning formats, sourced from datasets like AQuA-Rat and generated with tools such as GPT-4o and DeepSeek-R1. This stage teaches the model the structure of each reasoning format but does not instill adaptiveness. The second stage applies Ada-GRPO, where the model receives scaled rewards for using less frequent formats, such as Direct Answer or Short CoT. A decaying factor ensures that this reward gradually shifts back to accuracy as training progresses, preventing long-term bias toward inefficient exploration. This structure enables ARM to avoid format collapse and dynamically match reasoning strategies to task difficulty, achieving a balance of efficiency and performance. ARM demonstrated impressive results across various benchmarks, including commonsense, mathematical, and symbolic reasoning tasks. It reduced token usage by an average of 30%, with reductions as high as 70% for simpler tasks, compared to models relying solely on Long CoT. ARM achieved a 2x training speedup over GRPO-based models, accelerating model development without sacrificing accuracy. For example, ARM-7B achieved 75.9% accuracy on the challenging AIME’25 task while using 32.5% fewer tokens. ARM-14B achieved 85.6% accuracy on OpenBookQA and 86.4% accuracy on the MATH dataset, with a token usage reduction of over 30% compared to Qwen2.5SFT+GRPO models. These numbers demonstrate ARM’s ability to maintain competitive performance while delivering significant efficiency gains. Overall, the Adaptive Reasoning Model addresses the persistent inefficiency of reasoning models by enabling the adaptive selection of reasoning formats based on task difficulty. The introduction of Ada-GRPO and the multi-format training framework ensures that models no longer waste resources on overthinking. Instead, ARM provides a flexible and practical solution for balancing accuracy and computational cost in reasoning tasks, making it a promising approach for scalable and efficient large language models. Check out the Paper, Models on Hugging Face and Project Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter. NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces WEB-SHEPHERD: A Process Reward Model for Web Agents with 40K Dataset and 10× Cost EfficiencyNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces MMaDA: A Unified Multimodal Diffusion Model for Textual Reasoning, Visual Understanding, and Image GenerationNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces Differentiable MCMC Layers: A New AI Framework for Learning with Inexact Combinatorial Solvers in Neural NetworksNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces GRIT: A Method for Teaching MLLMs to Reason with Images by Interleaving Text and Visual Grounding #this #paper #introduces #arm #adagrpo
    WWW.MARKTECHPOST.COM
    This AI Paper Introduces ARM and Ada-GRPO: Adaptive Reasoning Models for Efficient and Scalable Problem-Solving
    Reasoning tasks are a fundamental aspect of artificial intelligence, encompassing areas like commonsense understanding, mathematical problem-solving, and symbolic reasoning. These tasks often involve multiple steps of logical inference, which large language models (LLMs) attempt to mimic through structured approaches such as chain-of-thought (CoT) prompting. However, as LLMs grow in size and complexity, they tend to produce longer outputs across all tasks, regardless of difficulty, leading to significant inefficiencies. The field has been striving to balance the depth of reasoning with computational cost while also ensuring that models can adapt their reasoning strategies to meet the unique needs of each problem. A key issue with current reasoning models is the inability to tailor the reasoning process to different task complexities. Most models, including well-known ones like OpenAI’s o1 and DeepSeek-R1, apply a uniform strategy—typically relying on Long CoT across all tasks. This causes the “overthinking” problem, where models generate unnecessarily verbose explanations for simpler tasks. Not only does this waste resources, but it also degrades accuracy, as excessive reasoning can introduce irrelevant information. Approaches such as prompt-guided generation or token budget estimation have attempted to mitigate this issue. Still, these methods are limited by their dependence on predefined assumptions, which are not always reliable for diverse tasks. Attempts to address these issues include methods like GRPO (Group Relative Policy Optimization), length-penalty mechanisms, and rule-based prompt controls. While GRPO enables models to learn different reasoning strategies by rewarding correct answers, it leads to a “format collapse,” where models increasingly rely on Long CoT, crowding out more efficient formats, such as Short CoT or Direct Answer. Length-penalty techniques, such as those applied in methods like THINKPRUNE, control output length during training or inference, but often at the cost of reduced accuracy, especially in complex problem-solving tasks. These solutions struggle to achieve a consistent trade-off between reasoning effectiveness and efficiency, highlighting the need for an adaptive approach. A team of researchers from Fudan University and Ohio State University introduced the Adaptive Reasoning Model (ARM), which dynamically adjusts reasoning formats based on task difficulty. ARM supports four distinct reasoning styles: Direct Answer for simple tasks, Short CoT for concise reasoning, Code for structured problem-solving, and Long CoT for deep multi-step reasoning. It operates in an Adaptive Mode by default, automatically selecting the appropriate format, and also provides Instruction-Guided and Consensus-Guided Modes for explicit control or aggregation across formats. The key innovation lies in its training process, which utilizes Ada-GRPO, an extension of GRPO that introduces a format diversity reward mechanism. This prevents the dominance of Long CoT and ensures that ARM continues to explore and use simpler reasoning formats when appropriate. The ARM methodology is built on a two-stage framework. First, the model undergoes Supervised Fine-Tuning (SFT) with 10.8K questions, each annotated across four reasoning formats, sourced from datasets like AQuA-Rat and generated with tools such as GPT-4o and DeepSeek-R1. This stage teaches the model the structure of each reasoning format but does not instill adaptiveness. The second stage applies Ada-GRPO, where the model receives scaled rewards for using less frequent formats, such as Direct Answer or Short CoT. A decaying factor ensures that this reward gradually shifts back to accuracy as training progresses, preventing long-term bias toward inefficient exploration. This structure enables ARM to avoid format collapse and dynamically match reasoning strategies to task difficulty, achieving a balance of efficiency and performance. ARM demonstrated impressive results across various benchmarks, including commonsense, mathematical, and symbolic reasoning tasks. It reduced token usage by an average of 30%, with reductions as high as 70% for simpler tasks, compared to models relying solely on Long CoT. ARM achieved a 2x training speedup over GRPO-based models, accelerating model development without sacrificing accuracy. For example, ARM-7B achieved 75.9% accuracy on the challenging AIME’25 task while using 32.5% fewer tokens. ARM-14B achieved 85.6% accuracy on OpenBookQA and 86.4% accuracy on the MATH dataset, with a token usage reduction of over 30% compared to Qwen2.5SFT+GRPO models. These numbers demonstrate ARM’s ability to maintain competitive performance while delivering significant efficiency gains. Overall, the Adaptive Reasoning Model addresses the persistent inefficiency of reasoning models by enabling the adaptive selection of reasoning formats based on task difficulty. The introduction of Ada-GRPO and the multi-format training framework ensures that models no longer waste resources on overthinking. Instead, ARM provides a flexible and practical solution for balancing accuracy and computational cost in reasoning tasks, making it a promising approach for scalable and efficient large language models. Check out the Paper, Models on Hugging Face and Project Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter. NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces WEB-SHEPHERD: A Process Reward Model for Web Agents with 40K Dataset and 10× Cost EfficiencyNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces MMaDA: A Unified Multimodal Diffusion Model for Textual Reasoning, Visual Understanding, and Image GenerationNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces Differentiable MCMC Layers: A New AI Framework for Learning with Inexact Combinatorial Solvers in Neural NetworksNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces GRIT: A Method for Teaching MLLMs to Reason with Images by Interleaving Text and Visual Grounding
    5 Comments 0 Shares
  • Google Is Burying the Web Alive

    screen time

    Google Is Burying the Web Alive

    5:00 A.M.

    saved

    this article to read it later.

    Find this story in your account’s ‘Saved for Later’ section.

    Photo-Illustration: Intelligencer

    By now, there’s a good chance you’ve encountered Google’s AI Overviews, possibly thousands of times. Appearing as blurbs at the top of search results, they attempt to settle your queries before you scroll — to offer answers, or relevant information, gleaned from websites that you no longer need to click on. The feature was officially rolled out at Google’s developer conference last year and had been in testing for quite some time before that; on the occasion of this year’s conference, the company characterized it as “one of the most successful launches in Search in the past decade,” a strangely narrow claim that is almost certainly true: Google put AI summaries on top of everything else, for everyone, as if to say, “Before you use our main product, see if this works instead.”
    This year’s conference included another change to search, this one more profound but less aggressively deployed. “AI Mode,” which has similarly been in beta testing for a while, will appear as an option for all users. It’s not like AI Overviews; that is, it’s not an extra module taking up space on a familiar search-results page but rather a complete replacement for conventional search. It’s Google’s “most powerful AI search, with more advanced reasoning and multimodality, and the ability to go deeper through follow-up questions and helpful links to the web,” the company says, “breaking down your question into subtopics and issuing a multitude of queries simultaneously on your behalf.” It’s available to everyone. It’s a lot like using AI-first chatbots that have search functions, like those from OpenAI, Anthropic, and Perplexity, and Google says it’s destined for greater things than a small tab. “As we get feedback, we’ll graduate many features and capabilities from AI Mode right into the core Search experience,” the company says.
    I’ve been testing AI Mode for a few months now, and in some ways it’s less radical than it sounds andfeels. It resembles the initial demos of AI search tools, including those by Google, meaning it responds to many questions with clean, ad-free answers. Sometimes it answers in extended plain language, but it also makes a lot of lists and pulls in familiar little gridded modules — especially when you ask about things you can buy — resulting in a product that, despite its chatty interface, feels an awful lot like … search.
    Again, now you can try it yourself, and your mileage may vary; it hasn’t drawn me away from Google proper for a lot of thoughtless rote tasks, but it’s competitive with ChatGPT for the expanding range of searchish tasks you might attempt with a chatbot.
    From the very first use, however, AI Mode crystallized something about Google’s priorities and in particular its relationship to the web from which the company has drawn, and returned, many hundreds of billions of dollars of value. AI Overviews demoted links, quite literally pushing content from the web down on the page, and summarizing its contents for digestion without clicking:

    Photo-Illustration: Intelligencer; Screenshot: Google

    Meanwhile, AI Mode all but buries them, not just summarizing their content for reading within Google’s product but inviting you to explore and expand on those summaries by asking more questions, rather than clicking out. In many cases, links are retained merely to provide backup and sourcing, included as footnotes and appendices rather than destinations:

    Photo-Illustration: Intelligencer; Screenshot: Google

    This is typical with AI search tools and all but inevitable now that such things are possible. In terms of automation, this means companies like OpenAI and Google are mechanizing some of the “work” that goes into using tools like Google search, removing, when possible, the step where users leave their platforms and reducing, in theory, the time and effort it takes to navigate to somewhere else when necessary. In even broader terms — contra Google’s effort to brand this as “going beyond information to intelligence” — this is an example of how LLMs offer different ways to interact with much of the same information: summarization rather than retrieval, regeneration rather than fact-finding, and vibe-y reconstruction over deterministic reproduction.
    This is interesting to think about and often compelling to use but leaves unresolved one of the first questions posed by chatbots-as-search: Where will they get all the data they need to continue to work well? When Microsoft and Google showed off their first neo-search mockups in 2023, which are pretty close to today’s AI mode, it revealed a dilemma:
    Search engines still provide the de facto gateway to the broader web, and have a deeply codependent relationship with the people and companies whose content they crawl, index, and rank; a Google that instantly but sometimes unreliably summarizes the websites to which it used to send people would destroy that relationship, and probably a lot of websites, including the ones on which its models were trained.
    And, well, yep! Now, both AI Overviews and AI Mode, when they aren’t occasionally hallucinating, produce relatively clean answers that benefit in contrast to increasingly degraded regular search results on Google, which are full of hyperoptimized and duplicative spamlike content designed first and foremost with the demands of Google’s ranking algorithms and advertising in mind. AI Mode feels one step further removed from that ecosystem and once again looks good in contrast, a placid textual escape from Google’s own mountain of links that look like ads and ads that look like links. In its drive to embrace AI, Google is further concealing the raw material that fuels it, demoting links as it continues to ingest them for abstraction. Google may still retain plenty of attention to monetize and perhaps keep even more of it for itself, now that it doesn’t need to send people elsewhere; in the process, however, it really is starving the web that supplies it with data on which to train and
    Two years later, Google has become more explicit about the extent to which it’s moving on from the “you provide us results to rank, and we send you visitors to monetize” bargain, with the head of search telling The Verge, “I think the search results page was a construct.” Which is true, as far as it goes, but also a remarkable thing to hear from a company that’s communicated carefully and voluminously to website operators about small updates to its search algorithms for years.
    I don’t doubt that Google has been thinking about this stuff for a while and that there are people at the company who deem it strategically irrelevant or at least of secondary importance to winning the AI race — the fate of the web might not sound terribly important when your bosses are talking nonstop about cashing out its accumulated data and expertise for AGI. I also don’t want to be precious about the web as it actually exists in 2025, nor do I suggest that websites working with or near companies like Meta and Google should have expected anything but temporary, incidental alignment with their businesses. If I had to guess, the future of Google search looks more like AI Overviews than AI mode — a jumble of widgets and modules including and united by AI-generated content, rather than a clean break — if only for purposes of sustaining Google’s multi-hundred-billion-dollar advertising business.
    But I also don’t want to assume Google knows exactly how this stuff will play out for Google, much less what it will actually mean for millions of websites, and their visitors, if Google stops sending as many people beyond its results pages. Google’s push into productizing generative AI is substantially fear-driven, faith-based, and informed by the actions of competitors that are far less invested in and dependent on the vast collection of behaviors — websites full of content authentic and inauthentic, volunteer and commercial, social and antisocial, archival and up-to-date — that make up what’s left of the web and have far less to lose. Maybe, in a few years, a fresh economy will grow around the new behaviors produced by searchlike AI tools; perhaps companies like OpenAI and Google will sign a bunch more licensing deals; conceivably, this style of search automation simply collapses the marketplace supported by search, leveraging training based on years of scraped data to do more with less. In any case, the signals from Google — despite its unconvincing suggestions to the contrary — are clear: It’ll do anything to win the AI race. If that means burying the web, then so be it.

    Sign Up for the Intelligencer Newsletter
    Daily news about the politics, business, and technology shaping our world.

    This site is protected by reCAPTCHA and the Google
    Privacy Policy and
    Terms of Service apply.

    By submitting your email, you agree to our Terms and Privacy Notice and to receive email correspondence from us.

    Tags:

    Google Is Burying the Web Alive
    #google #burying #web #alive
    Google Is Burying the Web Alive
    screen time Google Is Burying the Web Alive 5:00 A.M. saved this article to read it later. Find this story in your account’s ‘Saved for Later’ section. Photo-Illustration: Intelligencer By now, there’s a good chance you’ve encountered Google’s AI Overviews, possibly thousands of times. Appearing as blurbs at the top of search results, they attempt to settle your queries before you scroll — to offer answers, or relevant information, gleaned from websites that you no longer need to click on. The feature was officially rolled out at Google’s developer conference last year and had been in testing for quite some time before that; on the occasion of this year’s conference, the company characterized it as “one of the most successful launches in Search in the past decade,” a strangely narrow claim that is almost certainly true: Google put AI summaries on top of everything else, for everyone, as if to say, “Before you use our main product, see if this works instead.” This year’s conference included another change to search, this one more profound but less aggressively deployed. “AI Mode,” which has similarly been in beta testing for a while, will appear as an option for all users. It’s not like AI Overviews; that is, it’s not an extra module taking up space on a familiar search-results page but rather a complete replacement for conventional search. It’s Google’s “most powerful AI search, with more advanced reasoning and multimodality, and the ability to go deeper through follow-up questions and helpful links to the web,” the company says, “breaking down your question into subtopics and issuing a multitude of queries simultaneously on your behalf.” It’s available to everyone. It’s a lot like using AI-first chatbots that have search functions, like those from OpenAI, Anthropic, and Perplexity, and Google says it’s destined for greater things than a small tab. “As we get feedback, we’ll graduate many features and capabilities from AI Mode right into the core Search experience,” the company says. I’ve been testing AI Mode for a few months now, and in some ways it’s less radical than it sounds andfeels. It resembles the initial demos of AI search tools, including those by Google, meaning it responds to many questions with clean, ad-free answers. Sometimes it answers in extended plain language, but it also makes a lot of lists and pulls in familiar little gridded modules — especially when you ask about things you can buy — resulting in a product that, despite its chatty interface, feels an awful lot like … search. Again, now you can try it yourself, and your mileage may vary; it hasn’t drawn me away from Google proper for a lot of thoughtless rote tasks, but it’s competitive with ChatGPT for the expanding range of searchish tasks you might attempt with a chatbot. From the very first use, however, AI Mode crystallized something about Google’s priorities and in particular its relationship to the web from which the company has drawn, and returned, many hundreds of billions of dollars of value. AI Overviews demoted links, quite literally pushing content from the web down on the page, and summarizing its contents for digestion without clicking: Photo-Illustration: Intelligencer; Screenshot: Google Meanwhile, AI Mode all but buries them, not just summarizing their content for reading within Google’s product but inviting you to explore and expand on those summaries by asking more questions, rather than clicking out. In many cases, links are retained merely to provide backup and sourcing, included as footnotes and appendices rather than destinations: Photo-Illustration: Intelligencer; Screenshot: Google This is typical with AI search tools and all but inevitable now that such things are possible. In terms of automation, this means companies like OpenAI and Google are mechanizing some of the “work” that goes into using tools like Google search, removing, when possible, the step where users leave their platforms and reducing, in theory, the time and effort it takes to navigate to somewhere else when necessary. In even broader terms — contra Google’s effort to brand this as “going beyond information to intelligence” — this is an example of how LLMs offer different ways to interact with much of the same information: summarization rather than retrieval, regeneration rather than fact-finding, and vibe-y reconstruction over deterministic reproduction. This is interesting to think about and often compelling to use but leaves unresolved one of the first questions posed by chatbots-as-search: Where will they get all the data they need to continue to work well? When Microsoft and Google showed off their first neo-search mockups in 2023, which are pretty close to today’s AI mode, it revealed a dilemma: Search engines still provide the de facto gateway to the broader web, and have a deeply codependent relationship with the people and companies whose content they crawl, index, and rank; a Google that instantly but sometimes unreliably summarizes the websites to which it used to send people would destroy that relationship, and probably a lot of websites, including the ones on which its models were trained. And, well, yep! Now, both AI Overviews and AI Mode, when they aren’t occasionally hallucinating, produce relatively clean answers that benefit in contrast to increasingly degraded regular search results on Google, which are full of hyperoptimized and duplicative spamlike content designed first and foremost with the demands of Google’s ranking algorithms and advertising in mind. AI Mode feels one step further removed from that ecosystem and once again looks good in contrast, a placid textual escape from Google’s own mountain of links that look like ads and ads that look like links. In its drive to embrace AI, Google is further concealing the raw material that fuels it, demoting links as it continues to ingest them for abstraction. Google may still retain plenty of attention to monetize and perhaps keep even more of it for itself, now that it doesn’t need to send people elsewhere; in the process, however, it really is starving the web that supplies it with data on which to train and Two years later, Google has become more explicit about the extent to which it’s moving on from the “you provide us results to rank, and we send you visitors to monetize” bargain, with the head of search telling The Verge, “I think the search results page was a construct.” Which is true, as far as it goes, but also a remarkable thing to hear from a company that’s communicated carefully and voluminously to website operators about small updates to its search algorithms for years. I don’t doubt that Google has been thinking about this stuff for a while and that there are people at the company who deem it strategically irrelevant or at least of secondary importance to winning the AI race — the fate of the web might not sound terribly important when your bosses are talking nonstop about cashing out its accumulated data and expertise for AGI. I also don’t want to be precious about the web as it actually exists in 2025, nor do I suggest that websites working with or near companies like Meta and Google should have expected anything but temporary, incidental alignment with their businesses. If I had to guess, the future of Google search looks more like AI Overviews than AI mode — a jumble of widgets and modules including and united by AI-generated content, rather than a clean break — if only for purposes of sustaining Google’s multi-hundred-billion-dollar advertising business. But I also don’t want to assume Google knows exactly how this stuff will play out for Google, much less what it will actually mean for millions of websites, and their visitors, if Google stops sending as many people beyond its results pages. Google’s push into productizing generative AI is substantially fear-driven, faith-based, and informed by the actions of competitors that are far less invested in and dependent on the vast collection of behaviors — websites full of content authentic and inauthentic, volunteer and commercial, social and antisocial, archival and up-to-date — that make up what’s left of the web and have far less to lose. Maybe, in a few years, a fresh economy will grow around the new behaviors produced by searchlike AI tools; perhaps companies like OpenAI and Google will sign a bunch more licensing deals; conceivably, this style of search automation simply collapses the marketplace supported by search, leveraging training based on years of scraped data to do more with less. In any case, the signals from Google — despite its unconvincing suggestions to the contrary — are clear: It’ll do anything to win the AI race. If that means burying the web, then so be it. Sign Up for the Intelligencer Newsletter Daily news about the politics, business, and technology shaping our world. This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply. By submitting your email, you agree to our Terms and Privacy Notice and to receive email correspondence from us. Tags: Google Is Burying the Web Alive #google #burying #web #alive
    NYMAG.COM
    Google Is Burying the Web Alive
    screen time Google Is Burying the Web Alive 5:00 A.M. saved Save this article to read it later. Find this story in your account’s ‘Saved for Later’ section. Photo-Illustration: Intelligencer By now, there’s a good chance you’ve encountered Google’s AI Overviews, possibly thousands of times. Appearing as blurbs at the top of search results, they attempt to settle your queries before you scroll — to offer answers, or relevant information, gleaned from websites that you no longer need to click on. The feature was officially rolled out at Google’s developer conference last year and had been in testing for quite some time before that; on the occasion of this year’s conference, the company characterized it as “one of the most successful launches in Search in the past decade,” a strangely narrow claim that is almost certainly true: Google put AI summaries on top of everything else, for everyone, as if to say, “Before you use our main product, see if this works instead.” This year’s conference included another change to search, this one more profound but less aggressively deployed. “AI Mode,” which has similarly been in beta testing for a while, will appear as an option for all users. It’s not like AI Overviews; that is, it’s not an extra module taking up space on a familiar search-results page but rather a complete replacement for conventional search. It’s Google’s “most powerful AI search, with more advanced reasoning and multimodality, and the ability to go deeper through follow-up questions and helpful links to the web,” the company says, “breaking down your question into subtopics and issuing a multitude of queries simultaneously on your behalf.” It’s available to everyone. It’s a lot like using AI-first chatbots that have search functions, like those from OpenAI, Anthropic, and Perplexity, and Google says it’s destined for greater things than a small tab. “As we get feedback, we’ll graduate many features and capabilities from AI Mode right into the core Search experience,” the company says. I’ve been testing AI Mode for a few months now, and in some ways it’s less radical than it sounds and (at first) feels. It resembles the initial demos of AI search tools, including those by Google, meaning it responds to many questions with clean, ad-free answers. Sometimes it answers in extended plain language, but it also makes a lot of lists and pulls in familiar little gridded modules — especially when you ask about things you can buy — resulting in a product that, despite its chatty interface, feels an awful lot like … search. Again, now you can try it yourself, and your mileage may vary; it hasn’t drawn me away from Google proper for a lot of thoughtless rote tasks, but it’s competitive with ChatGPT for the expanding range of searchish tasks you might attempt with a chatbot. From the very first use, however, AI Mode crystallized something about Google’s priorities and in particular its relationship to the web from which the company has drawn, and returned, many hundreds of billions of dollars of value. AI Overviews demoted links, quite literally pushing content from the web down on the page, and summarizing its contents for digestion without clicking: Photo-Illustration: Intelligencer; Screenshot: Google Meanwhile, AI Mode all but buries them, not just summarizing their content for reading within Google’s product but inviting you to explore and expand on those summaries by asking more questions, rather than clicking out. In many cases, links are retained merely to provide backup and sourcing, included as footnotes and appendices rather than destinations: Photo-Illustration: Intelligencer; Screenshot: Google This is typical with AI search tools and all but inevitable now that such things are possible. In terms of automation, this means companies like OpenAI and Google are mechanizing some of the “work” that goes into using tools like Google search, removing, when possible, the step where users leave their platforms and reducing, in theory, the time and effort it takes to navigate to somewhere else when necessary. In even broader terms — contra Google’s effort to brand this as “going beyond information to intelligence” — this is an example of how LLMs offer different ways to interact with much of the same information: summarization rather than retrieval, regeneration rather than fact-finding, and vibe-y reconstruction over deterministic reproduction. This is interesting to think about and often compelling to use but leaves unresolved one of the first questions posed by chatbots-as-search: Where will they get all the data they need to continue to work well? When Microsoft and Google showed off their first neo-search mockups in 2023, which are pretty close to today’s AI mode, it revealed a dilemma: Search engines still provide the de facto gateway to the broader web, and have a deeply codependent relationship with the people and companies whose content they crawl, index, and rank; a Google that instantly but sometimes unreliably summarizes the websites to which it used to send people would destroy that relationship, and probably a lot of websites, including the ones on which its models were trained. And, well, yep! Now, both AI Overviews and AI Mode, when they aren’t occasionally hallucinating, produce relatively clean answers that benefit in contrast to increasingly degraded regular search results on Google, which are full of hyperoptimized and duplicative spamlike content designed first and foremost with the demands of Google’s ranking algorithms and advertising in mind. AI Mode feels one step further removed from that ecosystem and once again looks good in contrast, a placid textual escape from Google’s own mountain of links that look like ads and ads that look like links (of course, Google is already working on ads for both Overviews and AI Mode). In its drive to embrace AI, Google is further concealing the raw material that fuels it, demoting links as it continues to ingest them for abstraction. Google may still retain plenty of attention to monetize and perhaps keep even more of it for itself, now that it doesn’t need to send people elsewhere; in the process, however, it really is starving the web that supplies it with data on which to train and Two years later, Google has become more explicit about the extent to which it’s moving on from the “you provide us results to rank, and we send you visitors to monetize” bargain, with the head of search telling The Verge, “I think the search results page was a construct.” Which is true, as far as it goes, but also a remarkable thing to hear from a company that’s communicated carefully and voluminously to website operators about small updates to its search algorithms for years. I don’t doubt that Google has been thinking about this stuff for a while and that there are people at the company who deem it strategically irrelevant or at least of secondary importance to winning the AI race — the fate of the web might not sound terribly important when your bosses are talking nonstop about cashing out its accumulated data and expertise for AGI. I also don’t want to be precious about the web as it actually exists in 2025, nor do I suggest that websites working with or near companies like Meta and Google should have expected anything but temporary, incidental alignment with their businesses. If I had to guess, the future of Google search looks more like AI Overviews than AI mode — a jumble of widgets and modules including and united by AI-generated content, rather than a clean break — if only for purposes of sustaining Google’s multi-hundred-billion-dollar advertising business. But I also don’t want to assume Google knows exactly how this stuff will play out for Google, much less what it will actually mean for millions of websites, and their visitors, if Google stops sending as many people beyond its results pages. Google’s push into productizing generative AI is substantially fear-driven, faith-based, and informed by the actions of competitors that are far less invested in and dependent on the vast collection of behaviors — websites full of content authentic and inauthentic, volunteer and commercial, social and antisocial, archival and up-to-date — that make up what’s left of the web and have far less to lose. Maybe, in a few years, a fresh economy will grow around the new behaviors produced by searchlike AI tools; perhaps companies like OpenAI and Google will sign a bunch more licensing deals; conceivably, this style of search automation simply collapses the marketplace supported by search, leveraging training based on years of scraped data to do more with less. In any case, the signals from Google — despite its unconvincing suggestions to the contrary — are clear: It’ll do anything to win the AI race. If that means burying the web, then so be it. Sign Up for the Intelligencer Newsletter Daily news about the politics, business, and technology shaping our world. This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply. By submitting your email, you agree to our Terms and Privacy Notice and to receive email correspondence from us. Tags: Google Is Burying the Web Alive
    0 Comments 0 Shares
More Results