• How AI is reshaping the future of healthcare and medical research

    Transcript       
    PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”          
    This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.   
    Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?    
    In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.  The book passage I read at the top is from “Chapter 10: The Big Black Bag.” 
    In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.   
    In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open. 
    As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.  
    Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home. 
    Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.     
    Here’s my conversation with Bill Gates and Sébastien Bubeck. 
    LEE: Bill, welcome. 
    BILL GATES: Thank you. 
    LEE: Seb … 
    SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here. 
    LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening? 
    And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?  
    GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines. 
    And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.  
    And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weaknessthat, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning. 
    LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that? 
    GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, … 
    LEE: Right.  
    GATES: … that is a bit weird.  
    LEE: Yeah. 
    GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training. 
    LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent. 
    BUBECK: Yes.  
    LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSRto join and start investigating this thing seriously. And the first person I pulled in was you. 
    BUBECK: Yeah. 
    LEE: And so what were your first encounters? Because I actually don’t remember what happened then. 
    BUBECK: Oh, I remember it very well.My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3. 
    I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1. 
    So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair.And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts. 
    So this was really, to me, the first moment where I saw some understanding in those models.  
    LEE: So this was, just to get the timing right, that was before I pulled you into the tent. 
    BUBECK: That was before. That was like a year before. 
    LEE: Right.  
    BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4. 
    So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.  
    So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x. 
    And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?  
    LEE: Yeah.
    BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.  
    LEE:One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine. 
    And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.  
    And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.  
    I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book. 
    But the main purpose of this conversation isn’t to reminisce aboutor indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements. 
    But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today? 
    You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.  
    Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork? 
    GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.  
    It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision. 
    But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view. 
    LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients.Does that make sense to you? 
    BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong? 
    Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.  
    Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them. 
    And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT. And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.  
    Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way. 
    It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine. 
    LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all? 
    GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that. 
    The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa,
    So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.  
    LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking? 
    GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.  
    The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.  
    LEE: Right.  
    GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.  
    LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication. 
    BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE, for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI. 
    It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for. 
    LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes. 
    I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?  
    That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential.What’s up with that? 
    BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back thatversion of GPT-4o, so now we don’t have the sycophant version out there. 
    Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF, where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad. 
    But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model. 
    So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model. 
    LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and … 
    BUBECK: It’s a very difficult, very difficult balance. 
    LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models? 
    GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there. 
    Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?  
    Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there.
    LEE: Yeah.
    GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake. 
    LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on. 
    BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGIthat kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything. 
    That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects.So it’s … I think it’s an important example to have in mind. 
    LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two? 
    BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it. 
    LEE: So we have about three hours of stuff to talk about, but our time is actually running low.
    BUBECK: Yes, yes, yes.  
    LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now? 
    GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.  
    The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities. 
    And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period. 
    LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers? 
    GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them. 
    LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.  
    I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why. 
    BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and seeproduced what you wanted. So I absolutely agree with that.  
    And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini. So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.  
    LEE: Yeah. 
    BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.  
    Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not. 
    Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision. 
    LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist … 
    BUBECK: Yeah.
    LEE: … or an endocrinologist might not.
    BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know.
    LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today? 
    BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later. 
    And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …  
    LEE: Will AI prescribe your medicines? Write your prescriptions? 
    BUBECK: I think yes. I think yes. 
    LEE: OK. Bill? 
    GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate?
    And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelectedjust on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries. 
    You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that. 
    LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.  
    I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you.  
    GATES: Yeah. Thanks, you guys. 
    BUBECK: Thank you, Peter. Thanks, Bill. 
    LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.   
    With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.  
    And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.  
    One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.  
    HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings. 
    You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.  
    If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real.  
    I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.  
    Until next time.  
    #how #reshaping #future #healthcare #medical
    How AI is reshaping the future of healthcare and medical research
    Transcript        PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”           This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.    Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?     In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.  The book passage I read at the top is from “Chapter 10: The Big Black Bag.”  In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.    In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open.  As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.   Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home.  Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.      Here’s my conversation with Bill Gates and Sébastien Bubeck.  LEE: Bill, welcome.  BILL GATES: Thank you.  LEE: Seb …  SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here.  LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening?  And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?   GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines.  And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.   And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weaknessthat, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning.  LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that?  GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, …  LEE: Right.   GATES: … that is a bit weird.   LEE: Yeah.  GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training.  LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent.  BUBECK: Yes.   LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSRto join and start investigating this thing seriously. And the first person I pulled in was you.  BUBECK: Yeah.  LEE: And so what were your first encounters? Because I actually don’t remember what happened then.  BUBECK: Oh, I remember it very well.My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3.  I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1.  So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair.And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts.  So this was really, to me, the first moment where I saw some understanding in those models.   LEE: So this was, just to get the timing right, that was before I pulled you into the tent.  BUBECK: That was before. That was like a year before.  LEE: Right.   BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4.  So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.   So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x.  And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?   LEE: Yeah. BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.   LEE:One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine.  And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.   And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.   I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book.  But the main purpose of this conversation isn’t to reminisce aboutor indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements.  But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today?  You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.   Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork?  GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.   It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision.  But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view.  LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients.Does that make sense to you?  BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong?  Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.   Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them.  And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT. And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.   Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way.  It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine.  LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all?  GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that.  The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa, So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.   LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking?  GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.   The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.   LEE: Right.   GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.   LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication.  BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE, for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI.  It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for.  LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes.  I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?   That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential.What’s up with that?  BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back thatversion of GPT-4o, so now we don’t have the sycophant version out there.  Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF, where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad.  But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model.  So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model.  LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and …  BUBECK: It’s a very difficult, very difficult balance.  LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models?  GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there.  Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?   Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there. LEE: Yeah. GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake.  LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on.  BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGIthat kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything.  That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects.So it’s … I think it’s an important example to have in mind.  LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two?  BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it.  LEE: So we have about three hours of stuff to talk about, but our time is actually running low. BUBECK: Yes, yes, yes.   LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now?  GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.   The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities.  And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period.  LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers?  GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them.  LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.   I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why.  BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and seeproduced what you wanted. So I absolutely agree with that.   And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini. So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.   LEE: Yeah.  BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.   Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not.  Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision.  LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist …  BUBECK: Yeah. LEE: … or an endocrinologist might not. BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know. LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today?  BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later.  And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …   LEE: Will AI prescribe your medicines? Write your prescriptions?  BUBECK: I think yes. I think yes.  LEE: OK. Bill?  GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate? And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelectedjust on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries.  You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that.  LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.   I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you.   GATES: Yeah. Thanks, you guys.  BUBECK: Thank you, Peter. Thanks, Bill.  LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.    With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.   And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.   One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.   HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings.  You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.   If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real.   I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.   Until next time.   #how #reshaping #future #healthcare #medical
    WWW.MICROSOFT.COM
    How AI is reshaping the future of healthcare and medical research
    Transcript [MUSIC]      [BOOK PASSAGE]   PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the Al equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”   [END OF BOOK PASSAGE]     [THEME MUSIC]     This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.    Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?     In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.   [THEME MUSIC FADES] The book passage I read at the top is from “Chapter 10: The Big Black Bag.”  In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.    In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open.  As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.   Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home.  Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.    [TRANSITION MUSIC]   Here’s my conversation with Bill Gates and Sébastien Bubeck.  LEE: Bill, welcome.  BILL GATES: Thank you.  LEE: Seb …  SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here.  LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening?  And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?   GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines.  And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.   And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weakness [LAUGHTER] that, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning.  LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that?  GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, …  LEE: Right.   GATES: … that is a bit weird.   LEE: Yeah.  GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training.  LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent. [LAUGHS]  BUBECK: Yes.   LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSR [Microsoft Research] to join and start investigating this thing seriously. And the first person I pulled in was you.  BUBECK: Yeah.  LEE: And so what were your first encounters? Because I actually don’t remember what happened then.  BUBECK: Oh, I remember it very well. [LAUGHS] My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3.  I though that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1.  So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair. [LAUGHTER] And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts.  So this was really, to me, the first moment where I saw some understanding in those models.   LEE: So this was, just to get the timing right, that was before I pulled you into the tent.  BUBECK: That was before. That was like a year before.  LEE: Right.   BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4.  So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.   So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x.  And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?   LEE: Yeah. BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.   LEE: [LAUGHS] One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine.  And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.   And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.   I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book.  But the main purpose of this conversation isn’t to reminisce about [LAUGHS] or indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements.  But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today?  You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.   Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork?  GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.   It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision.  But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view.  LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients. [LAUGHTER] Does that make sense to you?  BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happened if you have this AI listening in and then it transcribes, you know, something wrong?  Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.   Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them.  And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT (opens in new tab). And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.   Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way.  It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine.  LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all?  GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that.  The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa, So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.   LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking?  GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.   The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.   LEE: Right.   GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.   LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication.  BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE [United States Medical Licensing Examination], for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI.  It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for.  LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient, the results—a mythical patient—the results of my physical exam, my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes.  I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?   That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential. [LAUGHTER] What’s up with that?  BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back that [LAUGHS] version of GPT-4o, so now we don’t have the sycophant version out there.  Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF [reinforcement learning from human feedback], where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad.  But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model.  So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model.  LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and …  BUBECK: It’s a very difficult, very difficult balance.  LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models?  GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there.  Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?   Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there. LEE: Yeah. GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake.  LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on.  BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGI [artificial general intelligence] that kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything.  That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects. [LAUGHTER] So it’s … I think it’s an important example to have in mind.  LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two?  BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it.  LEE: So we have about three hours of stuff to talk about, but our time is actually running low. BUBECK: Yes, yes, yes.   LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now?  GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.   The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities.  And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period.  LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers?  GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them.  LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.   I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why.  BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and see [if you have] produced what you wanted. So I absolutely agree with that.   And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini (opens in new tab). So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.   LEE: Yeah.  BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.   Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not.  Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision.  LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist …  BUBECK: Yeah. LEE: … or an endocrinologist might not. BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know. LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today?  BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later.  And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …   LEE: Will AI prescribe your medicines? Write your prescriptions?  BUBECK: I think yes. I think yes.  LEE: OK. Bill?  GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate? And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelected [LAUGHTER] just on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries.  You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that.  LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.   I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you.  [TRANSITION MUSIC]  GATES: Yeah. Thanks, you guys.  BUBECK: Thank you, Peter. Thanks, Bill.  LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.    With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.   And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.   One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation metric that is directly relevant to medical applications, and that is something called HealthBench. And Microsoft Research also released a new evaluation approach or process called ADeLe.   HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings.  You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.   If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery in healthcare and medical practice through data and intelligence, I actually now don’t see any barriers to that future becoming real.  [THEME MUSIC]  I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.   Until next time.   [MUSIC FADES]
    0 Comentários 0 Compartilhamentos
  • Clean up your code: How to create your own C# code style

    While there’s more than one way to format Unity C# code, agreeing on a consistent code style for your project enables your team to develop a clean, readable, and scalable codebase. In this blog, we provide some guidelines and examples you can use to develop and maintain your own code style guide.Please note that these are only recommendations based on those provided by Microsoft. This is your chance to get inspired and decide what works best for your team.Ideally, a Unity project should feel like it’s been developed by a single author, no matter how many developers actually work on it. A style guide can help unify your approach for creating a more cohesive codebase.It’s a good idea to follow industry standards wherever possible and browse through existing style guides as a starting point for creating your own. In partnership with internal and external Unity experts, we released a new e-book, Create a C# style guide: Write cleaner code that scales for inspiration, based on Microsoft’s comprehensive C# style.The Google C# style guide is another great resource for defining guidelines around naming, formatting, and commenting conventions. Again, there is no right or wrong method, but we chose to follow Microsoft standards for our own guide.Our e-book, along with an example C# file, are available for free. Both resources focus on the most common coding conventions you’ll encounter while developing in Unity. These are all, essentially, a subset of the Microsoft Framework Design guidelines, which include an extensive number of best practices beyond what we cover in this post.We recommend customizing the guidelines provided in our style guide to suit your team’s preferences. These preferences should be prioritized over our suggestions and the Microsoft Framework Design guidelines if they’re in conflict.The development of a style guide requires an upfront investment but will pay dividends later. For example, managing a single set of standards can reduce the time developers spend on ramping up if they move onto another project.Of course, consistency is key. If you follow these suggestions and need to modify your style guide in the future, a few find-and-replace operations can quickly migrate your codebase.Concentrate on creating a pragmatic style guide that fits your needs by covering the majority of day-to-day use cases. Don’t overengineer it by attempting to account for every single edge case from the start. The guide will evolve organically over time as your team iterates on it from project to project.Most style guides include basic formatting rules. Meanwhile, specific naming conventions, policy on use of namespaces, and strategies for classes are somewhat abstract areas that can be refined over time.Let’s look at some common formatting and naming conventions you might consider for your style guide.The two common indentation styles in C# are the Allman style, which places the opening curly braces on a new line, and the K&R style, or “one true brace style,” which keeps the opening brace on the same line as the previous header.In an effort to improve readability, we picked the Allman style for our guide, based on the Microsoft Framework Design guidelines:
    Whatever style you choose, ensure that every programmer on your team follows it.A guide should also indicate whether braces from nested multiline statements should be included. While removing braces in the following example won’t throw an error, it can be confusing to read. That’s why our guide recommends applying braces for clarity, even if they are optional.Something as simple as horizontal spacing can enhance your code’s appearance onscreen. While your personal formatting preferences can vary, here are a few recommendations from our style guide to improve overall readability:Add spaces to decrease code density:The extra whitespace can give a sense of visual separation between parts of a lineUse a single space after a comma, between function arguments.Don’t add a space after the parenthesis and function arguments.Don’t use spaces between a function name and parenthesis.Avoid spaces inside brackets.Use a single space before flow control conditions: Add a space between the flow comparison operator and the parentheses.Use a single space before and after comparison operators.Variables typically represent a state, so try to attribute clear and descriptive nouns to their names. You can then prefix booleans with a verbfor variables that must indicate a true or false value. Often they are the answer to a question such as, is the player running? Is the game over? Prefix them with a verb to clarify their meaning. This is often paired with a description or condition, e.g., isPlayerDead, isWalking, hasDamageMultiplier, etc.Since methods perform actions, a good rule of thumb is to start their names with a verb and add context as needed, e.g., GetDirection, FindTarget, and so on, based on the return type. If the method has a bool return type, it can also be framed as a question.Much like boolean variables themselves, prefix methods with a verb if they return a true-false condition. This phrases them in the form of a question, e.g., IsGameOver, HasStartedTurn.Several conventions exist for naming events and event handles. In our style guide, we name the event with a verb phrase,similar to a method. Choose a name that communicates the state change accurately.Use the present or past participle to indicate events “before” or “after.” For instance, specify OpeningDoor for an event before opening a door and DoorOpened for an event afterward.We also recommend that you don’t abbreviate names. While saving a few characters can feel like a productivity gain in the short term, what is obvious to you now might not be in a year’s time to another teammate. Your variable names should reveal their intent and be easy to pronounce. Single letter variables are fine for loops and math expressions, but otherwise, you should avoid abbreviations. Clarity is more important than any time saved from omitting a few vowels.At the same time, use one variable declaration per line; it’s less compact, but also less error prone and enhances readability. Avoid redundant names. If your class is called Player, you don’t need to create member variables called PlayerScore or PlayerTarget. Trim them down to Score or Target.In addition, avoid too many prefixes or special encoding.A practice highlighted in our guide is to prefix private member variables with an underscoreto differentiate them from local variables. Some style guides use prefixes for private member variables, constants, or static variables, so the name reveals more about the variable.However, it’s good practice to prefix interface names with a capital “I” and follow this with an adjective that describes the functionality. You can even prefix the event raising methodwith “On”: The subject that invokes the event usually does so from a method prefixed with “On,” e.g., OnOpeningDoor or OnDoorOpened.Camel case and Pascal case are common standards in use, compared to Snake or Kebab case, or Hungarian notations. Our guide recommends Pascal case for public fields, enums, classes, and methods, and Camel case for private variables, as this is common practice in Unity.There are many additional rules to consider outside of what’s covered here. The example guide and our new e-book, Create a C# style guide: Write cleaner code that scales, provide many more tips for better organization.The concept of clean code aims to make development more scalable by conforming to a set of production standards. A style guide should remove most of the guesswork developers would otherwise have regarding the conventions they should follow. Ultimately, this guide should help your team establish a consensus around your codebase to grow your project into a commercial-scale production.Just how comprehensive your style guide should be depends on your situation. It’s up to your team to decide if they want their guide to set rules for more abstract, intangible concepts. This could include rules for using namespaces, breaking down classes, or implementing directives like the #region directive. While #region can help you collapse and hide sections of code in C# files, making large files more manageable, it’s also an example of something that many developers consider to be code smells or anti-patterns. Therefore, you might want to avoid setting strict standards for these aspects of code styling. Not everything needs to be outlined in the guide – sometimes it’s enough to simply discuss and make decisions as a team.When we talked to the experts who helped create our guide, their main piece of advice was code readability above all else. Here are some pointers on how to achieve that:Use fewer arguments: Arguments can increase the complexity of your method. By reducing their number, you make methods easier to read and test.Avoid excessive overloading: You can generate an endless permutation of method overloads. Select the few that reflect how you’ll call the method, and then implement those. If you do overload a method, prevent confusion by making sure that each method signature has a distinct number of arguments.Avoid side effects: A method only needs to do what its name advertises. Avoid modifying anything outside of its scope. Pass in arguments by value instead of reference when possible. So when sending back results via the out or ref keyword, verify that’s the one thing you intend the method to accomplish. Though side effects are useful for certain tasks, they can lead to unintended consequences. Write a method without side effects to cut down on unexpected behavior.We hope that this blog helps you kick off the development of your own style guide. Learn more from our example C# file and brand new e-book where you can review our suggested rules and customize them to your team’s preferences.The specifics of individual rules are less important than having everyone agree to follow them consistently. When in doubt, rely on your team’s own evolving guide to settle any style disagreements. After all, this is a group effort.
    #clean #your #code #how #create
    Clean up your code: How to create your own C# code style
    While there’s more than one way to format Unity C# code, agreeing on a consistent code style for your project enables your team to develop a clean, readable, and scalable codebase. In this blog, we provide some guidelines and examples you can use to develop and maintain your own code style guide.Please note that these are only recommendations based on those provided by Microsoft. This is your chance to get inspired and decide what works best for your team.Ideally, a Unity project should feel like it’s been developed by a single author, no matter how many developers actually work on it. A style guide can help unify your approach for creating a more cohesive codebase.It’s a good idea to follow industry standards wherever possible and browse through existing style guides as a starting point for creating your own. In partnership with internal and external Unity experts, we released a new e-book, Create a C# style guide: Write cleaner code that scales for inspiration, based on Microsoft’s comprehensive C# style.The Google C# style guide is another great resource for defining guidelines around naming, formatting, and commenting conventions. Again, there is no right or wrong method, but we chose to follow Microsoft standards for our own guide.Our e-book, along with an example C# file, are available for free. Both resources focus on the most common coding conventions you’ll encounter while developing in Unity. These are all, essentially, a subset of the Microsoft Framework Design guidelines, which include an extensive number of best practices beyond what we cover in this post.We recommend customizing the guidelines provided in our style guide to suit your team’s preferences. These preferences should be prioritized over our suggestions and the Microsoft Framework Design guidelines if they’re in conflict.The development of a style guide requires an upfront investment but will pay dividends later. For example, managing a single set of standards can reduce the time developers spend on ramping up if they move onto another project.Of course, consistency is key. If you follow these suggestions and need to modify your style guide in the future, a few find-and-replace operations can quickly migrate your codebase.Concentrate on creating a pragmatic style guide that fits your needs by covering the majority of day-to-day use cases. Don’t overengineer it by attempting to account for every single edge case from the start. The guide will evolve organically over time as your team iterates on it from project to project.Most style guides include basic formatting rules. Meanwhile, specific naming conventions, policy on use of namespaces, and strategies for classes are somewhat abstract areas that can be refined over time.Let’s look at some common formatting and naming conventions you might consider for your style guide.The two common indentation styles in C# are the Allman style, which places the opening curly braces on a new line, and the K&R style, or “one true brace style,” which keeps the opening brace on the same line as the previous header.In an effort to improve readability, we picked the Allman style for our guide, based on the Microsoft Framework Design guidelines: Whatever style you choose, ensure that every programmer on your team follows it.A guide should also indicate whether braces from nested multiline statements should be included. While removing braces in the following example won’t throw an error, it can be confusing to read. That’s why our guide recommends applying braces for clarity, even if they are optional.Something as simple as horizontal spacing can enhance your code’s appearance onscreen. While your personal formatting preferences can vary, here are a few recommendations from our style guide to improve overall readability:Add spaces to decrease code density:The extra whitespace can give a sense of visual separation between parts of a lineUse a single space after a comma, between function arguments.Don’t add a space after the parenthesis and function arguments.Don’t use spaces between a function name and parenthesis.Avoid spaces inside brackets.Use a single space before flow control conditions: Add a space between the flow comparison operator and the parentheses.Use a single space before and after comparison operators.Variables typically represent a state, so try to attribute clear and descriptive nouns to their names. You can then prefix booleans with a verbfor variables that must indicate a true or false value. Often they are the answer to a question such as, is the player running? Is the game over? Prefix them with a verb to clarify their meaning. This is often paired with a description or condition, e.g., isPlayerDead, isWalking, hasDamageMultiplier, etc.Since methods perform actions, a good rule of thumb is to start their names with a verb and add context as needed, e.g., GetDirection, FindTarget, and so on, based on the return type. If the method has a bool return type, it can also be framed as a question.Much like boolean variables themselves, prefix methods with a verb if they return a true-false condition. This phrases them in the form of a question, e.g., IsGameOver, HasStartedTurn.Several conventions exist for naming events and event handles. In our style guide, we name the event with a verb phrase,similar to a method. Choose a name that communicates the state change accurately.Use the present or past participle to indicate events “before” or “after.” For instance, specify OpeningDoor for an event before opening a door and DoorOpened for an event afterward.We also recommend that you don’t abbreviate names. While saving a few characters can feel like a productivity gain in the short term, what is obvious to you now might not be in a year’s time to another teammate. Your variable names should reveal their intent and be easy to pronounce. Single letter variables are fine for loops and math expressions, but otherwise, you should avoid abbreviations. Clarity is more important than any time saved from omitting a few vowels.At the same time, use one variable declaration per line; it’s less compact, but also less error prone and enhances readability. Avoid redundant names. If your class is called Player, you don’t need to create member variables called PlayerScore or PlayerTarget. Trim them down to Score or Target.In addition, avoid too many prefixes or special encoding.A practice highlighted in our guide is to prefix private member variables with an underscoreto differentiate them from local variables. Some style guides use prefixes for private member variables, constants, or static variables, so the name reveals more about the variable.However, it’s good practice to prefix interface names with a capital “I” and follow this with an adjective that describes the functionality. You can even prefix the event raising methodwith “On”: The subject that invokes the event usually does so from a method prefixed with “On,” e.g., OnOpeningDoor or OnDoorOpened.Camel case and Pascal case are common standards in use, compared to Snake or Kebab case, or Hungarian notations. Our guide recommends Pascal case for public fields, enums, classes, and methods, and Camel case for private variables, as this is common practice in Unity.There are many additional rules to consider outside of what’s covered here. The example guide and our new e-book, Create a C# style guide: Write cleaner code that scales, provide many more tips for better organization.The concept of clean code aims to make development more scalable by conforming to a set of production standards. A style guide should remove most of the guesswork developers would otherwise have regarding the conventions they should follow. Ultimately, this guide should help your team establish a consensus around your codebase to grow your project into a commercial-scale production.Just how comprehensive your style guide should be depends on your situation. It’s up to your team to decide if they want their guide to set rules for more abstract, intangible concepts. This could include rules for using namespaces, breaking down classes, or implementing directives like the #region directive. While #region can help you collapse and hide sections of code in C# files, making large files more manageable, it’s also an example of something that many developers consider to be code smells or anti-patterns. Therefore, you might want to avoid setting strict standards for these aspects of code styling. Not everything needs to be outlined in the guide – sometimes it’s enough to simply discuss and make decisions as a team.When we talked to the experts who helped create our guide, their main piece of advice was code readability above all else. Here are some pointers on how to achieve that:Use fewer arguments: Arguments can increase the complexity of your method. By reducing their number, you make methods easier to read and test.Avoid excessive overloading: You can generate an endless permutation of method overloads. Select the few that reflect how you’ll call the method, and then implement those. If you do overload a method, prevent confusion by making sure that each method signature has a distinct number of arguments.Avoid side effects: A method only needs to do what its name advertises. Avoid modifying anything outside of its scope. Pass in arguments by value instead of reference when possible. So when sending back results via the out or ref keyword, verify that’s the one thing you intend the method to accomplish. Though side effects are useful for certain tasks, they can lead to unintended consequences. Write a method without side effects to cut down on unexpected behavior.We hope that this blog helps you kick off the development of your own style guide. Learn more from our example C# file and brand new e-book where you can review our suggested rules and customize them to your team’s preferences.The specifics of individual rules are less important than having everyone agree to follow them consistently. When in doubt, rely on your team’s own evolving guide to settle any style disagreements. After all, this is a group effort. #clean #your #code #how #create
    UNITY.COM
    Clean up your code: How to create your own C# code style
    While there’s more than one way to format Unity C# code, agreeing on a consistent code style for your project enables your team to develop a clean, readable, and scalable codebase. In this blog, we provide some guidelines and examples you can use to develop and maintain your own code style guide.Please note that these are only recommendations based on those provided by Microsoft. This is your chance to get inspired and decide what works best for your team.Ideally, a Unity project should feel like it’s been developed by a single author, no matter how many developers actually work on it. A style guide can help unify your approach for creating a more cohesive codebase.It’s a good idea to follow industry standards wherever possible and browse through existing style guides as a starting point for creating your own. In partnership with internal and external Unity experts, we released a new e-book, Create a C# style guide: Write cleaner code that scales for inspiration, based on Microsoft’s comprehensive C# style.The Google C# style guide is another great resource for defining guidelines around naming, formatting, and commenting conventions. Again, there is no right or wrong method, but we chose to follow Microsoft standards for our own guide.Our e-book, along with an example C# file, are available for free. Both resources focus on the most common coding conventions you’ll encounter while developing in Unity. These are all, essentially, a subset of the Microsoft Framework Design guidelines, which include an extensive number of best practices beyond what we cover in this post.We recommend customizing the guidelines provided in our style guide to suit your team’s preferences. These preferences should be prioritized over our suggestions and the Microsoft Framework Design guidelines if they’re in conflict.The development of a style guide requires an upfront investment but will pay dividends later. For example, managing a single set of standards can reduce the time developers spend on ramping up if they move onto another project.Of course, consistency is key. If you follow these suggestions and need to modify your style guide in the future, a few find-and-replace operations can quickly migrate your codebase.Concentrate on creating a pragmatic style guide that fits your needs by covering the majority of day-to-day use cases. Don’t overengineer it by attempting to account for every single edge case from the start. The guide will evolve organically over time as your team iterates on it from project to project.Most style guides include basic formatting rules. Meanwhile, specific naming conventions, policy on use of namespaces, and strategies for classes are somewhat abstract areas that can be refined over time.Let’s look at some common formatting and naming conventions you might consider for your style guide.The two common indentation styles in C# are the Allman style, which places the opening curly braces on a new line (also known as the BSD style from BSD Unix), and the K&R style, or “one true brace style,” which keeps the opening brace on the same line as the previous header.In an effort to improve readability, we picked the Allman style for our guide, based on the Microsoft Framework Design guidelines: Whatever style you choose, ensure that every programmer on your team follows it.A guide should also indicate whether braces from nested multiline statements should be included. While removing braces in the following example won’t throw an error, it can be confusing to read. That’s why our guide recommends applying braces for clarity, even if they are optional.Something as simple as horizontal spacing can enhance your code’s appearance onscreen. While your personal formatting preferences can vary, here are a few recommendations from our style guide to improve overall readability:Add spaces to decrease code density:The extra whitespace can give a sense of visual separation between parts of a lineUse a single space after a comma, between function arguments.Don’t add a space after the parenthesis and function arguments.Don’t use spaces between a function name and parenthesis.Avoid spaces inside brackets.Use a single space before flow control conditions: Add a space between the flow comparison operator and the parentheses.Use a single space before and after comparison operators.Variables typically represent a state, so try to attribute clear and descriptive nouns to their names. You can then prefix booleans with a verbfor variables that must indicate a true or false value. Often they are the answer to a question such as, is the player running? Is the game over? Prefix them with a verb to clarify their meaning. This is often paired with a description or condition, e.g., isPlayerDead, isWalking, hasDamageMultiplier, etc.Since methods perform actions, a good rule of thumb is to start their names with a verb and add context as needed, e.g., GetDirection, FindTarget, and so on, based on the return type. If the method has a bool return type, it can also be framed as a question.Much like boolean variables themselves, prefix methods with a verb if they return a true-false condition. This phrases them in the form of a question, e.g., IsGameOver, HasStartedTurn.Several conventions exist for naming events and event handles. In our style guide, we name the event with a verb phrase,similar to a method. Choose a name that communicates the state change accurately.Use the present or past participle to indicate events “before” or “after.” For instance, specify OpeningDoor for an event before opening a door and DoorOpened for an event afterward.We also recommend that you don’t abbreviate names. While saving a few characters can feel like a productivity gain in the short term, what is obvious to you now might not be in a year’s time to another teammate. Your variable names should reveal their intent and be easy to pronounce. Single letter variables are fine for loops and math expressions, but otherwise, you should avoid abbreviations. Clarity is more important than any time saved from omitting a few vowels.At the same time, use one variable declaration per line; it’s less compact, but also less error prone and enhances readability. Avoid redundant names. If your class is called Player, you don’t need to create member variables called PlayerScore or PlayerTarget. Trim them down to Score or Target.In addition, avoid too many prefixes or special encoding.A practice highlighted in our guide is to prefix private member variables with an underscore (_) to differentiate them from local variables. Some style guides use prefixes for private member variables (m_), constants (k_), or static variables (s_), so the name reveals more about the variable.However, it’s good practice to prefix interface names with a capital “I” and follow this with an adjective that describes the functionality. You can even prefix the event raising method (in the subject) with “On”: The subject that invokes the event usually does so from a method prefixed with “On,” e.g., OnOpeningDoor or OnDoorOpened.Camel case and Pascal case are common standards in use, compared to Snake or Kebab case, or Hungarian notations. Our guide recommends Pascal case for public fields, enums, classes, and methods, and Camel case for private variables, as this is common practice in Unity.There are many additional rules to consider outside of what’s covered here. The example guide and our new e-book, Create a C# style guide: Write cleaner code that scales, provide many more tips for better organization.The concept of clean code aims to make development more scalable by conforming to a set of production standards. A style guide should remove most of the guesswork developers would otherwise have regarding the conventions they should follow. Ultimately, this guide should help your team establish a consensus around your codebase to grow your project into a commercial-scale production.Just how comprehensive your style guide should be depends on your situation. It’s up to your team to decide if they want their guide to set rules for more abstract, intangible concepts. This could include rules for using namespaces, breaking down classes, or implementing directives like the #region directive (or not). While #region can help you collapse and hide sections of code in C# files, making large files more manageable, it’s also an example of something that many developers consider to be code smells or anti-patterns. Therefore, you might want to avoid setting strict standards for these aspects of code styling. Not everything needs to be outlined in the guide – sometimes it’s enough to simply discuss and make decisions as a team.When we talked to the experts who helped create our guide, their main piece of advice was code readability above all else. Here are some pointers on how to achieve that:Use fewer arguments: Arguments can increase the complexity of your method. By reducing their number, you make methods easier to read and test.Avoid excessive overloading: You can generate an endless permutation of method overloads. Select the few that reflect how you’ll call the method, and then implement those. If you do overload a method, prevent confusion by making sure that each method signature has a distinct number of arguments.Avoid side effects: A method only needs to do what its name advertises. Avoid modifying anything outside of its scope. Pass in arguments by value instead of reference when possible. So when sending back results via the out or ref keyword, verify that’s the one thing you intend the method to accomplish. Though side effects are useful for certain tasks, they can lead to unintended consequences. Write a method without side effects to cut down on unexpected behavior.We hope that this blog helps you kick off the development of your own style guide. Learn more from our example C# file and brand new e-book where you can review our suggested rules and customize them to your team’s preferences.The specifics of individual rules are less important than having everyone agree to follow them consistently. When in doubt, rely on your team’s own evolving guide to settle any style disagreements. After all, this is a group effort.
    0 Comentários 0 Compartilhamentos
  • A Coding Guide to Building a Scalable Multi-Agent Communication Systems Using Agent Communication Protocol (ACP)

    In this tutorial, we implement the Agent Communication Protocolthrough building a flexible, ACP-compliant messaging system in Python, leveraging Google’s Gemini API for natural language processing. Beginning with the installation and configuration of the google-generativeai library, the tutorial introduces core abstractions, message types, performatives, and the ACPMessage data class, which standardizes inter-agent communication. By defining ACPAgent and ACPMessageBroker classes, the guide demonstrates how to create, send, route, and process structured messages among multiple autonomous agents. Through clear code examples, users learn to implement querying, requesting actions, and broadcasting information, while maintaining conversation threads, acknowledgments, and error handling.
    import google.generativeai as genai
    import json
    import time
    import uuid
    from enum import Enum
    from typing import Dict, List, Any, Optional
    from dataclasses import dataclass, asdict

    GEMINI_API_KEY = "Use Your Gemini API Key"
    genai.configureWe import essential Python modules, ranging from JSON handling and timing to unique identifier generation and type annotations, to support a structured ACP implementation. It then retrieves the user’s Gemini API key placeholder and configures the google-generativeai client for subsequent calls to the Gemini language model.
    class ACPMessageType:
    """Standard ACP message types"""
    REQUEST = "request"
    RESPONSE = "response"
    INFORM = "inform"
    QUERY = "query"
    SUBSCRIBE = "subscribe"
    UNSUBSCRIBE = "unsubscribe"
    ERROR = "error"
    ACK = "acknowledge"
    The ACPMessageType enumeration defines the core message categories used in the Agent Communication Protocol, including requests, responses, informational broadcasts, queries, and control actions like subscription management, error signaling, and acknowledgments. By centralizing these message types, the protocol ensures consistent handling and routing of inter-agent communications throughout the system.
    class ACPPerformative:
    """ACP speech acts"""
    TELL = "tell"
    ASK = "ask"
    REPLY = "reply"
    REQUEST_ACTION = "request-action"
    AGREE = "agree"
    REFUSE = "refuse"
    PROPOSE = "propose"
    ACCEPT = "accept"
    REJECT = "reject"
    The ACPPerformative enumeration captures the variety of speech acts agents can use when interacting under the ACP framework, mapping high-level intentions, such as making requests, posing questions, giving commands, or negotiating agreements, onto standardized labels. This clear taxonomy enables agents to interpret and respond to messages in contextually appropriate ways, ensuring robust and semantically rich communication.

    @dataclass
    class ACPMessage:
    """Agent Communication Protocol Message Structure"""
    message_id: str
    sender: str
    receiver: str
    performative: str
    content: Dictprotocol: str = "ACP-1.0"
    conversation_id: str = None
    reply_to: str = None
    language: str = "english"
    encoding: str = "json"
    timestamp: float = None

    def __post_init__:
    if self.timestamp is None:
    self.timestamp = time.timeif self.conversation_id is None:
    self.conversation_id = str)

    def to_acp_format-> str:
    """Convert to standard ACP message format"""
    acp_msg = {
    "message-id": self.message_id,
    "sender": self.sender,
    "receiver": self.receiver,
    "performative": self.performative,
    "content": self.content,
    "protocol": self.protocol,
    "conversation-id": self.conversation_id,
    "reply-to": self.reply_to,
    "language": self.language,
    "encoding": self.encoding,
    "timestamp": self.timestamp
    }
    return json.dumps@classmethod
    def from_acp_format-> 'ACPMessage':
    """Parse ACP message from string format"""
    data = json.loadsreturn cls,
    conversation_id=data.get,
    reply_to=data.get,
    language=data.get,
    encoding=data.get,
    timestamp=data.get)
    )

    The ACPMessage data class encapsulates all the fields required for a structured ACP exchange, including identifiers, participants, performative, payload, and metadata such as protocol version, language, and timestamps. Its __post_init__ method auto-populates missing timestamp and conversation_id values, ensuring every message is uniquely tracked. Utility methods to_acp_format and from_acp_format handle serialization to and from the standardized JSON representation for seamless transmission and parsing.
    class ACPAgent:
    """Agent implementing Agent Communication Protocol"""

    def __init__:
    self.agent_id = agent_id
    self.name = name
    self.capabilities = capabilities
    self.model = genai.GenerativeModelself.message_queue: List=self.subscriptions: Dict] = {}
    self.conversations: Dict] = {}

    def create_message-> ACPMessage:
    """Create a new ACP-compliant message"""
    return ACPMessage),
    sender=self.agent_id,
    receiver=receiver,
    performative=performative,
    content=content,
    conversation_id=conversation_id,
    reply_to=reply_to
    )

    def send_inform-> ACPMessage:
    """Send an INFORM message"""
    content = {"fact": fact, "data": data}
    return self.create_messagedef send_query-> ACPMessage:
    """Send a QUERY message"""
    content = {"question": question, "query-type": query_type}
    return self.create_messagedef send_request-> ACPMessage:
    """Send a REQUEST message"""
    content = {"action": action, "parameters": parameters or {}}
    return self.create_messagedef send_reply-> ACPMessage:
    """Send a REPLY message in response to another message"""
    content = {"response": response_data, "original-question": original_msg.content}
    return self.create_messagedef process_message-> Optional:
    """Process incoming ACP message and generate appropriate response"""
    self.message_queue.appendconv_id = message.conversation_id
    if conv_id not in self.conversations:
    self.conversations=self.conversations.appendif message.performative == ACPPerformative.ASK.value:
    return self._handle_queryelif message.performative == ACPPerformative.REQUEST_ACTION.value:
    return self._handle_requestelif message.performative == ACPPerformative.TELL.value:
    return self._handle_informreturn None

    def _handle_query-> ACPMessage:
    """Handle incoming query messages"""
    question = message.content.getprompt = f"As agent {self.name} with capabilities {self.capabilities}, answer: {question}"
    try:
    response = self.model.generate_contentanswer = response.text.stripexcept:
    answer = "Unable to process query at this time"

    return self.send_replydef _handle_request-> ACPMessage:
    """Handle incoming action requests"""
    action = message.content.getparameters = message.content.getif anyfor capability in self.capabilities):
    result = f"Executing {action} with parameters {parameters}"
    status = "agreed"
    else:
    result = f"Cannot perform {action} - not in my capabilities"
    status = "refused"

    return self.send_replydef _handle_inform-> Optional:
    """Handle incoming information messages"""
    fact = message.content.getprintack_content = {"status": "received", "fact": fact}
    return self.create_messageThe ACPAgent class encapsulates an autonomous entity capable of sending, receiving, and processing ACP-compliant messages using Gemini’s language model. It manages its own message queue, conversation history, and subscriptions, and provides helper methodsto construct correctly formatted ACPMessage instances. Incoming messages are routed through process_message, which delegates to specialized handlers for queries, action requests, and informational messages.
    class ACPMessageBroker:
    """Message broker implementing ACP routing and delivery"""

    def __init__:
    self.agents: Dict= {}
    self.message_log: List=self.routing_table: Dict= {}

    def register_agent:
    """Register an agent with the message broker"""
    self.agents= agent
    self.routing_table= "local"
    print")

    def route_message-> bool:
    """Route ACP message to appropriate recipient"""
    if message.receiver not in self.agents:
    printreturn False

    printprintprintprint}")

    receiver_agent = self.agentsresponse = receiver_agent.process_messageself.message_log.appendif response:
    printprintprint}")

    if response.receiver in self.agents:
    self.agents.process_messageself.message_log.appendreturn True

    def broadcast_message:
    """Broadcast message to multiple recipients"""
    for recipient in recipients:
    msg_copy = ACPMessage),
    sender=message.sender,
    receiver=recipient,
    performative=message.performative,
    content=message.content.copy,
    conversation_id=message.conversation_id
    )
    self.route_messageThe ACPMessageBroker serves as the central router for ACP messages, maintaining a registry of agents and a message log. It provides methods to register agents, deliver individual messages via route_message, which handles lookup, logging, and response chaining, and to send the same message to multiple recipients with broadcast_message.
    def demonstrate_acp:
    """Comprehensive demonstration of Agent Communication Protocol"""

    printDEMONSTRATION")
    printbroker = ACPMessageBrokerresearcher = ACPAgentassistant = ACPAgentcalculator = ACPAgentbroker.register_agentbroker.register_agentbroker.register_agentprintfor agent_id, agent in broker.agents.items:
    print: {', '.join}")

    print")
    query_msg = assistant.send_querybroker.route_messageprint")
    calc_request = researcher.send_request+ 10"})
    broker.route_messageprint")
    info_msg = researcher.send_informbroker.route_messageprintprint}")
    print)}")
    print)}")

    printsample_msg = assistant.send_queryprint)
    The demonstrate_acp function orchestrates a hands-on walkthrough of the entire ACP framework: it initializes a broker and three distinct agents, registers them, and illustrates three key interaction scenarios, querying for information, requesting a computation, and sharing an update. After routing each message and handling responses, it prints summary statistics on the message flow. It showcases a formatted ACP message, providing users with a clear, end-to-end example of how agents communicate under the protocol.
    def setup_guide:
    print ACP PROTOCOL FEATURES:

    • Standardized message format with required fields
    • Speech act performatives• Conversation tracking and message threading
    • Error handling and acknowledgments
    • Message routing and delivery confirmation

    EXTEND THE PROTOCOL:
    ```python
    # Create custom agent
    my_agent = ACPAgentbroker.register_agent# Send custom message
    msg = my_agent.send_querybroker.route_message```
    """)

    if __name__ == "__main__":
    setup_guidedemonstrate_acpFinally, the setup_guide function provides a quick-start reference for running the ACP demo in Google Colab, outlining how to obtain and configure your Gemini API key and invoke the demonstrate_acp routine. It also summarizes key protocol features, such as standardized message formats, performatives, and message routing. It provides a concise code snippet illustrating how to register custom agents and send tailored messages.
    In conclusion, this tutorial implements ACP-based multi-agent systems capable of research, computation, and collaboration tasks. The provided sample scenarios illustrate common use cases, information queries, computational requests, and fact sharing, while the broker ensures reliable message delivery and logging. Readers are encouraged to extend the framework by adding new agent capabilities, integrating domain-specific actions, or incorporating more sophisticated subscription and notification mechanisms.

    Download the Notebook on GitHub. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter.
    Asif RazzaqWebsite |  + postsBioAsif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.Asif Razzaqhttps://www.marktechpost.com/author/6flvq/Yandex Releases Yambda: The World’s Largest Event Dataset to Accelerate Recommender SystemsAsif Razzaqhttps://www.marktechpost.com/author/6flvq/Stanford Researchers Introduced Biomni: A Biomedical AI Agent for Automation Across Diverse Tasks and Data TypesAsif Razzaqhttps://www.marktechpost.com/author/6flvq/DeepSeek Releases R1-0528: An Open-Source Reasoning AI Model Delivering Enhanced Math and Code Performance with Single-GPU EfficiencyAsif Razzaqhttps://www.marktechpost.com/author/6flvq/A Coding Guide for Building a Self-Improving AI Agent Using Google’s Gemini API with Intelligent Adaptation Features
    #coding #guide #building #scalable #multiagent
    A Coding Guide to Building a Scalable Multi-Agent Communication Systems Using Agent Communication Protocol (ACP)
    In this tutorial, we implement the Agent Communication Protocolthrough building a flexible, ACP-compliant messaging system in Python, leveraging Google’s Gemini API for natural language processing. Beginning with the installation and configuration of the google-generativeai library, the tutorial introduces core abstractions, message types, performatives, and the ACPMessage data class, which standardizes inter-agent communication. By defining ACPAgent and ACPMessageBroker classes, the guide demonstrates how to create, send, route, and process structured messages among multiple autonomous agents. Through clear code examples, users learn to implement querying, requesting actions, and broadcasting information, while maintaining conversation threads, acknowledgments, and error handling. import google.generativeai as genai import json import time import uuid from enum import Enum from typing import Dict, List, Any, Optional from dataclasses import dataclass, asdict GEMINI_API_KEY = "Use Your Gemini API Key" genai.configureWe import essential Python modules, ranging from JSON handling and timing to unique identifier generation and type annotations, to support a structured ACP implementation. It then retrieves the user’s Gemini API key placeholder and configures the google-generativeai client for subsequent calls to the Gemini language model. class ACPMessageType: """Standard ACP message types""" REQUEST = "request" RESPONSE = "response" INFORM = "inform" QUERY = "query" SUBSCRIBE = "subscribe" UNSUBSCRIBE = "unsubscribe" ERROR = "error" ACK = "acknowledge" The ACPMessageType enumeration defines the core message categories used in the Agent Communication Protocol, including requests, responses, informational broadcasts, queries, and control actions like subscription management, error signaling, and acknowledgments. By centralizing these message types, the protocol ensures consistent handling and routing of inter-agent communications throughout the system. class ACPPerformative: """ACP speech acts""" TELL = "tell" ASK = "ask" REPLY = "reply" REQUEST_ACTION = "request-action" AGREE = "agree" REFUSE = "refuse" PROPOSE = "propose" ACCEPT = "accept" REJECT = "reject" The ACPPerformative enumeration captures the variety of speech acts agents can use when interacting under the ACP framework, mapping high-level intentions, such as making requests, posing questions, giving commands, or negotiating agreements, onto standardized labels. This clear taxonomy enables agents to interpret and respond to messages in contextually appropriate ways, ensuring robust and semantically rich communication. @dataclass class ACPMessage: """Agent Communication Protocol Message Structure""" message_id: str sender: str receiver: str performative: str content: Dictprotocol: str = "ACP-1.0" conversation_id: str = None reply_to: str = None language: str = "english" encoding: str = "json" timestamp: float = None def __post_init__: if self.timestamp is None: self.timestamp = time.timeif self.conversation_id is None: self.conversation_id = str) def to_acp_format-> str: """Convert to standard ACP message format""" acp_msg = { "message-id": self.message_id, "sender": self.sender, "receiver": self.receiver, "performative": self.performative, "content": self.content, "protocol": self.protocol, "conversation-id": self.conversation_id, "reply-to": self.reply_to, "language": self.language, "encoding": self.encoding, "timestamp": self.timestamp } return json.dumps@classmethod def from_acp_format-> 'ACPMessage': """Parse ACP message from string format""" data = json.loadsreturn cls, conversation_id=data.get, reply_to=data.get, language=data.get, encoding=data.get, timestamp=data.get) ) The ACPMessage data class encapsulates all the fields required for a structured ACP exchange, including identifiers, participants, performative, payload, and metadata such as protocol version, language, and timestamps. Its __post_init__ method auto-populates missing timestamp and conversation_id values, ensuring every message is uniquely tracked. Utility methods to_acp_format and from_acp_format handle serialization to and from the standardized JSON representation for seamless transmission and parsing. class ACPAgent: """Agent implementing Agent Communication Protocol""" def __init__: self.agent_id = agent_id self.name = name self.capabilities = capabilities self.model = genai.GenerativeModelself.message_queue: List=self.subscriptions: Dict] = {} self.conversations: Dict] = {} def create_message-> ACPMessage: """Create a new ACP-compliant message""" return ACPMessage), sender=self.agent_id, receiver=receiver, performative=performative, content=content, conversation_id=conversation_id, reply_to=reply_to ) def send_inform-> ACPMessage: """Send an INFORM message""" content = {"fact": fact, "data": data} return self.create_messagedef send_query-> ACPMessage: """Send a QUERY message""" content = {"question": question, "query-type": query_type} return self.create_messagedef send_request-> ACPMessage: """Send a REQUEST message""" content = {"action": action, "parameters": parameters or {}} return self.create_messagedef send_reply-> ACPMessage: """Send a REPLY message in response to another message""" content = {"response": response_data, "original-question": original_msg.content} return self.create_messagedef process_message-> Optional: """Process incoming ACP message and generate appropriate response""" self.message_queue.appendconv_id = message.conversation_id if conv_id not in self.conversations: self.conversations=self.conversations.appendif message.performative == ACPPerformative.ASK.value: return self._handle_queryelif message.performative == ACPPerformative.REQUEST_ACTION.value: return self._handle_requestelif message.performative == ACPPerformative.TELL.value: return self._handle_informreturn None def _handle_query-> ACPMessage: """Handle incoming query messages""" question = message.content.getprompt = f"As agent {self.name} with capabilities {self.capabilities}, answer: {question}" try: response = self.model.generate_contentanswer = response.text.stripexcept: answer = "Unable to process query at this time" return self.send_replydef _handle_request-> ACPMessage: """Handle incoming action requests""" action = message.content.getparameters = message.content.getif anyfor capability in self.capabilities): result = f"Executing {action} with parameters {parameters}" status = "agreed" else: result = f"Cannot perform {action} - not in my capabilities" status = "refused" return self.send_replydef _handle_inform-> Optional: """Handle incoming information messages""" fact = message.content.getprintack_content = {"status": "received", "fact": fact} return self.create_messageThe ACPAgent class encapsulates an autonomous entity capable of sending, receiving, and processing ACP-compliant messages using Gemini’s language model. It manages its own message queue, conversation history, and subscriptions, and provides helper methodsto construct correctly formatted ACPMessage instances. Incoming messages are routed through process_message, which delegates to specialized handlers for queries, action requests, and informational messages. class ACPMessageBroker: """Message broker implementing ACP routing and delivery""" def __init__: self.agents: Dict= {} self.message_log: List=self.routing_table: Dict= {} def register_agent: """Register an agent with the message broker""" self.agents= agent self.routing_table= "local" print") def route_message-> bool: """Route ACP message to appropriate recipient""" if message.receiver not in self.agents: printreturn False printprintprintprint}") receiver_agent = self.agentsresponse = receiver_agent.process_messageself.message_log.appendif response: printprintprint}") if response.receiver in self.agents: self.agents.process_messageself.message_log.appendreturn True def broadcast_message: """Broadcast message to multiple recipients""" for recipient in recipients: msg_copy = ACPMessage), sender=message.sender, receiver=recipient, performative=message.performative, content=message.content.copy, conversation_id=message.conversation_id ) self.route_messageThe ACPMessageBroker serves as the central router for ACP messages, maintaining a registry of agents and a message log. It provides methods to register agents, deliver individual messages via route_message, which handles lookup, logging, and response chaining, and to send the same message to multiple recipients with broadcast_message. def demonstrate_acp: """Comprehensive demonstration of Agent Communication Protocol""" printDEMONSTRATION") printbroker = ACPMessageBrokerresearcher = ACPAgentassistant = ACPAgentcalculator = ACPAgentbroker.register_agentbroker.register_agentbroker.register_agentprintfor agent_id, agent in broker.agents.items: print: {', '.join}") print") query_msg = assistant.send_querybroker.route_messageprint") calc_request = researcher.send_request+ 10"}) broker.route_messageprint") info_msg = researcher.send_informbroker.route_messageprintprint}") print)}") print)}") printsample_msg = assistant.send_queryprint) The demonstrate_acp function orchestrates a hands-on walkthrough of the entire ACP framework: it initializes a broker and three distinct agents, registers them, and illustrates three key interaction scenarios, querying for information, requesting a computation, and sharing an update. After routing each message and handling responses, it prints summary statistics on the message flow. It showcases a formatted ACP message, providing users with a clear, end-to-end example of how agents communicate under the protocol. def setup_guide: print🔧 ACP PROTOCOL FEATURES: • Standardized message format with required fields • Speech act performatives• Conversation tracking and message threading • Error handling and acknowledgments • Message routing and delivery confirmation 📝 EXTEND THE PROTOCOL: ```python # Create custom agent my_agent = ACPAgentbroker.register_agent# Send custom message msg = my_agent.send_querybroker.route_message``` """) if __name__ == "__main__": setup_guidedemonstrate_acpFinally, the setup_guide function provides a quick-start reference for running the ACP demo in Google Colab, outlining how to obtain and configure your Gemini API key and invoke the demonstrate_acp routine. It also summarizes key protocol features, such as standardized message formats, performatives, and message routing. It provides a concise code snippet illustrating how to register custom agents and send tailored messages. In conclusion, this tutorial implements ACP-based multi-agent systems capable of research, computation, and collaboration tasks. The provided sample scenarios illustrate common use cases, information queries, computational requests, and fact sharing, while the broker ensures reliable message delivery and logging. Readers are encouraged to extend the framework by adding new agent capabilities, integrating domain-specific actions, or incorporating more sophisticated subscription and notification mechanisms. Download the Notebook on GitHub. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter. Asif RazzaqWebsite |  + postsBioAsif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.Asif Razzaqhttps://www.marktechpost.com/author/6flvq/Yandex Releases Yambda: The World’s Largest Event Dataset to Accelerate Recommender SystemsAsif Razzaqhttps://www.marktechpost.com/author/6flvq/Stanford Researchers Introduced Biomni: A Biomedical AI Agent for Automation Across Diverse Tasks and Data TypesAsif Razzaqhttps://www.marktechpost.com/author/6flvq/DeepSeek Releases R1-0528: An Open-Source Reasoning AI Model Delivering Enhanced Math and Code Performance with Single-GPU EfficiencyAsif Razzaqhttps://www.marktechpost.com/author/6flvq/A Coding Guide for Building a Self-Improving AI Agent Using Google’s Gemini API with Intelligent Adaptation Features #coding #guide #building #scalable #multiagent
    WWW.MARKTECHPOST.COM
    A Coding Guide to Building a Scalable Multi-Agent Communication Systems Using Agent Communication Protocol (ACP)
    In this tutorial, we implement the Agent Communication Protocol (ACP) through building a flexible, ACP-compliant messaging system in Python, leveraging Google’s Gemini API for natural language processing. Beginning with the installation and configuration of the google-generativeai library, the tutorial introduces core abstractions, message types, performatives, and the ACPMessage data class, which standardizes inter-agent communication. By defining ACPAgent and ACPMessageBroker classes, the guide demonstrates how to create, send, route, and process structured messages among multiple autonomous agents. Through clear code examples, users learn to implement querying, requesting actions, and broadcasting information, while maintaining conversation threads, acknowledgments, and error handling. import google.generativeai as genai import json import time import uuid from enum import Enum from typing import Dict, List, Any, Optional from dataclasses import dataclass, asdict GEMINI_API_KEY = "Use Your Gemini API Key" genai.configure(api_key=GEMINI_API_KEY) We import essential Python modules, ranging from JSON handling and timing to unique identifier generation and type annotations, to support a structured ACP implementation. It then retrieves the user’s Gemini API key placeholder and configures the google-generativeai client for subsequent calls to the Gemini language model. class ACPMessageType(Enum): """Standard ACP message types""" REQUEST = "request" RESPONSE = "response" INFORM = "inform" QUERY = "query" SUBSCRIBE = "subscribe" UNSUBSCRIBE = "unsubscribe" ERROR = "error" ACK = "acknowledge" The ACPMessageType enumeration defines the core message categories used in the Agent Communication Protocol, including requests, responses, informational broadcasts, queries, and control actions like subscription management, error signaling, and acknowledgments. By centralizing these message types, the protocol ensures consistent handling and routing of inter-agent communications throughout the system. class ACPPerformative(Enum): """ACP speech acts (performatives)""" TELL = "tell" ASK = "ask" REPLY = "reply" REQUEST_ACTION = "request-action" AGREE = "agree" REFUSE = "refuse" PROPOSE = "propose" ACCEPT = "accept" REJECT = "reject" The ACPPerformative enumeration captures the variety of speech acts agents can use when interacting under the ACP framework, mapping high-level intentions, such as making requests, posing questions, giving commands, or negotiating agreements, onto standardized labels. This clear taxonomy enables agents to interpret and respond to messages in contextually appropriate ways, ensuring robust and semantically rich communication. @dataclass class ACPMessage: """Agent Communication Protocol Message Structure""" message_id: str sender: str receiver: str performative: str content: Dict[str, Any] protocol: str = "ACP-1.0" conversation_id: str = None reply_to: str = None language: str = "english" encoding: str = "json" timestamp: float = None def __post_init__(self): if self.timestamp is None: self.timestamp = time.time() if self.conversation_id is None: self.conversation_id = str(uuid.uuid4()) def to_acp_format(self) -> str: """Convert to standard ACP message format""" acp_msg = { "message-id": self.message_id, "sender": self.sender, "receiver": self.receiver, "performative": self.performative, "content": self.content, "protocol": self.protocol, "conversation-id": self.conversation_id, "reply-to": self.reply_to, "language": self.language, "encoding": self.encoding, "timestamp": self.timestamp } return json.dumps(acp_msg, indent=2) @classmethod def from_acp_format(cls, acp_string: str) -> 'ACPMessage': """Parse ACP message from string format""" data = json.loads(acp_string) return cls( message_id=data["message-id"], sender=data["sender"], receiver=data["receiver"], performative=data["performative"], content=data["content"], protocol=data.get("protocol", "ACP-1.0"), conversation_id=data.get("conversation-id"), reply_to=data.get("reply-to"), language=data.get("language", "english"), encoding=data.get("encoding", "json"), timestamp=data.get("timestamp", time.time()) ) The ACPMessage data class encapsulates all the fields required for a structured ACP exchange, including identifiers, participants, performative, payload, and metadata such as protocol version, language, and timestamps. Its __post_init__ method auto-populates missing timestamp and conversation_id values, ensuring every message is uniquely tracked. Utility methods to_acp_format and from_acp_format handle serialization to and from the standardized JSON representation for seamless transmission and parsing. class ACPAgent: """Agent implementing Agent Communication Protocol""" def __init__(self, agent_id: str, name: str, capabilities: List[str]): self.agent_id = agent_id self.name = name self.capabilities = capabilities self.model = genai.GenerativeModel("gemini-1.5-flash") self.message_queue: List[ACPMessage] = [] self.subscriptions: Dict[str, List[str]] = {} self.conversations: Dict[str, List[ACPMessage]] = {} def create_message(self, receiver: str, performative: str, content: Dict[str, Any], conversation_id: str = None, reply_to: str = None) -> ACPMessage: """Create a new ACP-compliant message""" return ACPMessage( message_id=str(uuid.uuid4()), sender=self.agent_id, receiver=receiver, performative=performative, content=content, conversation_id=conversation_id, reply_to=reply_to ) def send_inform(self, receiver: str, fact: str, data: Any = None) -> ACPMessage: """Send an INFORM message (telling someone a fact)""" content = {"fact": fact, "data": data} return self.create_message(receiver, ACPPerformative.TELL.value, content) def send_query(self, receiver: str, question: str, query_type: str = "yes-no") -> ACPMessage: """Send a QUERY message (asking for information)""" content = {"question": question, "query-type": query_type} return self.create_message(receiver, ACPPerformative.ASK.value, content) def send_request(self, receiver: str, action: str, parameters: Dict = None) -> ACPMessage: """Send a REQUEST message (asking someone to perform an action)""" content = {"action": action, "parameters": parameters or {}} return self.create_message(receiver, ACPPerformative.REQUEST_ACTION.value, content) def send_reply(self, original_msg: ACPMessage, response_data: Any) -> ACPMessage: """Send a REPLY message in response to another message""" content = {"response": response_data, "original-question": original_msg.content} return self.create_message( original_msg.sender, ACPPerformative.REPLY.value, content, conversation_id=original_msg.conversation_id, reply_to=original_msg.message_id ) def process_message(self, message: ACPMessage) -> Optional[ACPMessage]: """Process incoming ACP message and generate appropriate response""" self.message_queue.append(message) conv_id = message.conversation_id if conv_id not in self.conversations: self.conversations[conv_id] = [] self.conversations[conv_id].append(message) if message.performative == ACPPerformative.ASK.value: return self._handle_query(message) elif message.performative == ACPPerformative.REQUEST_ACTION.value: return self._handle_request(message) elif message.performative == ACPPerformative.TELL.value: return self._handle_inform(message) return None def _handle_query(self, message: ACPMessage) -> ACPMessage: """Handle incoming query messages""" question = message.content.get("question", "") prompt = f"As agent {self.name} with capabilities {self.capabilities}, answer: {question}" try: response = self.model.generate_content(prompt) answer = response.text.strip() except: answer = "Unable to process query at this time" return self.send_reply(message, {"answer": answer, "confidence": 0.8}) def _handle_request(self, message: ACPMessage) -> ACPMessage: """Handle incoming action requests""" action = message.content.get("action", "") parameters = message.content.get("parameters", {}) if any(capability in action.lower() for capability in self.capabilities): result = f"Executing {action} with parameters {parameters}" status = "agreed" else: result = f"Cannot perform {action} - not in my capabilities" status = "refused" return self.send_reply(message, {"status": status, "result": result}) def _handle_inform(self, message: ACPMessage) -> Optional[ACPMessage]: """Handle incoming information messages""" fact = message.content.get("fact", "") print(f"[{self.name}] Received information: {fact}") ack_content = {"status": "received", "fact": fact} return self.create_message(message.sender, "acknowledge", ack_content, conversation_id=message.conversation_id) The ACPAgent class encapsulates an autonomous entity capable of sending, receiving, and processing ACP-compliant messages using Gemini’s language model. It manages its own message queue, conversation history, and subscriptions, and provides helper methods (send_inform, send_query, send_request, send_reply) to construct correctly formatted ACPMessage instances. Incoming messages are routed through process_message, which delegates to specialized handlers for queries, action requests, and informational messages. class ACPMessageBroker: """Message broker implementing ACP routing and delivery""" def __init__(self): self.agents: Dict[str, ACPAgent] = {} self.message_log: List[ACPMessage] = [] self.routing_table: Dict[str, str] = {} def register_agent(self, agent: ACPAgent): """Register an agent with the message broker""" self.agents[agent.agent_id] = agent self.routing_table[agent.agent_id] = "local" print(f"✓ Registered agent: {agent.name} ({agent.agent_id})") def route_message(self, message: ACPMessage) -> bool: """Route ACP message to appropriate recipient""" if message.receiver not in self.agents: print(f"✗ Receiver {message.receiver} not found") return False print(f"\n📨 ACP MESSAGE ROUTING:") print(f"From: {message.sender} → To: {message.receiver}") print(f"Performative: {message.performative}") print(f"Content: {json.dumps(message.content, indent=2)}") receiver_agent = self.agents[message.receiver] response = receiver_agent.process_message(message) self.message_log.append(message) if response: print(f"\n📤 GENERATED RESPONSE:") print(f"From: {response.sender} → To: {response.receiver}") print(f"Content: {json.dumps(response.content, indent=2)}") if response.receiver in self.agents: self.agents[response.receiver].process_message(response) self.message_log.append(response) return True def broadcast_message(self, message: ACPMessage, recipients: List[str]): """Broadcast message to multiple recipients""" for recipient in recipients: msg_copy = ACPMessage( message_id=str(uuid.uuid4()), sender=message.sender, receiver=recipient, performative=message.performative, content=message.content.copy(), conversation_id=message.conversation_id ) self.route_message(msg_copy) The ACPMessageBroker serves as the central router for ACP messages, maintaining a registry of agents and a message log. It provides methods to register agents, deliver individual messages via route_message, which handles lookup, logging, and response chaining, and to send the same message to multiple recipients with broadcast_message. def demonstrate_acp(): """Comprehensive demonstration of Agent Communication Protocol""" print("🤖 AGENT COMMUNICATION PROTOCOL (ACP) DEMONSTRATION") print("=" * 60) broker = ACPMessageBroker() researcher = ACPAgent("agent-001", "Dr. Research", ["analysis", "research", "data-processing"]) assistant = ACPAgent("agent-002", "AI Assistant", ["information", "scheduling", "communication"]) calculator = ACPAgent("agent-003", "MathBot", ["calculation", "mathematics", "computation"]) broker.register_agent(researcher) broker.register_agent(assistant) broker.register_agent(calculator) print(f"\n📋 REGISTERED AGENTS:") for agent_id, agent in broker.agents.items(): print(f" • {agent.name} ({agent_id}): {', '.join(agent.capabilities)}") print(f"\n🔬 SCENARIO 1: Information Query (ASK performative)") query_msg = assistant.send_query("agent-001", "What are the key factors in AI research?") broker.route_message(query_msg) print(f"\n🔢 SCENARIO 2: Action Request (REQUEST-ACTION performative)") calc_request = researcher.send_request("agent-003", "calculate", {"expression": "sqrt(144) + 10"}) broker.route_message(calc_request) print(f"\n📢 SCENARIO 3: Information Sharing (TELL performative)") info_msg = researcher.send_inform("agent-002", "New research paper published on quantum computing") broker.route_message(info_msg) print(f"\n📊 PROTOCOL STATISTICS:") print(f" • Total messages processed: {len(broker.message_log)}") print(f" • Active conversations: {len(set(msg.conversation_id for msg in broker.message_log))}") print(f" • Message types used: {len(set(msg.performative for msg in broker.message_log))}") print(f"\n📋 SAMPLE ACP MESSAGE FORMAT:") sample_msg = assistant.send_query("agent-001", "Sample question for format demonstration") print(sample_msg.to_acp_format()) The demonstrate_acp function orchestrates a hands-on walkthrough of the entire ACP framework: it initializes a broker and three distinct agents (Researcher, AI Assistant, and MathBot), registers them, and illustrates three key interaction scenarios, querying for information, requesting a computation, and sharing an update. After routing each message and handling responses, it prints summary statistics on the message flow. It showcases a formatted ACP message, providing users with a clear, end-to-end example of how agents communicate under the protocol. def setup_guide(): print(""" 🚀 GOOGLE COLAB SETUP GUIDE: 1. Get Gemini API Key: https://makersuite.google.com/app/apikey 2. Replace: GEMINI_API_KEY = "YOUR_ACTUAL_API_KEY" 3. Run: demonstrate_acp() 🔧 ACP PROTOCOL FEATURES: • Standardized message format with required fields • Speech act performatives (TELL, ASK, REQUEST-ACTION, etc.) • Conversation tracking and message threading • Error handling and acknowledgments • Message routing and delivery confirmation 📝 EXTEND THE PROTOCOL: ```python # Create custom agent my_agent = ACPAgent("my-001", "CustomBot", ["custom-capability"]) broker.register_agent(my_agent) # Send custom message msg = my_agent.send_query("agent-001", "Your question here") broker.route_message(msg) ``` """) if __name__ == "__main__": setup_guide() demonstrate_acp() Finally, the setup_guide function provides a quick-start reference for running the ACP demo in Google Colab, outlining how to obtain and configure your Gemini API key and invoke the demonstrate_acp routine. It also summarizes key protocol features, such as standardized message formats, performatives, and message routing. It provides a concise code snippet illustrating how to register custom agents and send tailored messages. In conclusion, this tutorial implements ACP-based multi-agent systems capable of research, computation, and collaboration tasks. The provided sample scenarios illustrate common use cases, information queries, computational requests, and fact sharing, while the broker ensures reliable message delivery and logging. Readers are encouraged to extend the framework by adding new agent capabilities, integrating domain-specific actions, or incorporating more sophisticated subscription and notification mechanisms. Download the Notebook on GitHub. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter. Asif RazzaqWebsite |  + postsBioAsif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.Asif Razzaqhttps://www.marktechpost.com/author/6flvq/Yandex Releases Yambda: The World’s Largest Event Dataset to Accelerate Recommender SystemsAsif Razzaqhttps://www.marktechpost.com/author/6flvq/Stanford Researchers Introduced Biomni: A Biomedical AI Agent for Automation Across Diverse Tasks and Data TypesAsif Razzaqhttps://www.marktechpost.com/author/6flvq/DeepSeek Releases R1-0528: An Open-Source Reasoning AI Model Delivering Enhanced Math and Code Performance with Single-GPU EfficiencyAsif Razzaqhttps://www.marktechpost.com/author/6flvq/A Coding Guide for Building a Self-Improving AI Agent Using Google’s Gemini API with Intelligent Adaptation Features
    0 Comentários 0 Compartilhamentos
  • Blackmagic Design releases DaVinci Resolve 20.0

    html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" ";

    A recording of Blackmagic Design’s livestream of its announcements for NAB 2025. You can see the new features in Da Vinci Resolve 20.0 at 2:13:20 in the video.

    Originally posted on 6 April 2025 for the beta, and updated for the stable release.
    Blackmagic Design has updated DaVinci Resolve, its free colour grading, editing and post-production software, and DaVinci Resolve Studio, its commercial edition.
    DaVinci Resolve Studio 20.0 is a major release, adding over 100 tools, including a set of new AI-powered features for video editing and audio production, and is free to all users.
    Below, we’ve rounded up the key features for colorists and effects artists, including a new Chroma Warp tool, a new deep compositing toolset, and full support for multi-layer EXRs.

    The new Chroma Warp tool makes it possible to create looks with intuitive gesture controls.

    Color page: New Chroma Warp, plus updates to the Resolve FX Warper and Magic Mask

    For grading, the Color page‘s Color Warper gets a new Chroma Warp tool.It is designed to create looks intuitively, with users selecting a color in the viewer, and dragging to adjust its hue and saturation simultaneously.
    Among the existing tools, the Resolve FX Warper effect gets a new Curves Warp mode, which creates a custom polygon with spline points for finer control when warping images.
    Magic Mask, DaVinci Resolve’s AI-based feature for generating mattes, has been updated, and now operates in a single mode for both people and objects.
    Workflow is also now more precise, with users now placing points to make selections, then using the paint tools to include or exclude surrounding regions of the image.
    Another key AI-based feature, the Resolve FX Depth Map effect, which automatically generates depth mattes, has been updated to improve speed and accuracy.
    For color management across a pipeline, the software has been updated to ACES 2.0, and OpenColorIO is supported as Resolve FX.

    Effects artists get deep compositing support in the integrated Fusion compositing toolset.

    Fusion: new deep compositing toolset

    For compositing and effects work, the Fusion page gets support for deep compositing.Deep compositing, long supported in more VFX-focused apps like Nuke, makes use of depth data encoded in image formats like OpenEXR to control object visibility.
    It simplifies the process of generating and managing holdouts, and generates fewer visual artifacts, particularly when working with motion blur or environment fog.
    Deep images can now be viewed in the Fusion viewer or the 3D view, and there is a new set of nodes to merge, transform, resize, crop, recolor and generate holdouts.
    It is also possible to render deep images from the 3D environment, and export them as deep EXRs via the Fusion saver node.
    Fusion: new vector warping toolset, plus support for 180 VR and multi-layer workflows

    Other new features in the Fusion page include a new optical-flow-based vector warping toolset, for image patching and cleanup, and for effects like digital makeup.There is also a new 360° Dome Light for environment lighting, and support for 180 VR, with a number of key tools updated to support 180° workflows.
    Pipeline improvements include full multi-layer workflows, with all of Fusion’s nodes now able to access each layer within multi-layer EXR or PSD files.
    Fusion also now natively supports Cryptomatte ID matte data in EXR files.
    You can read about the new features on the Fusion Page in our story on Fusion Studio 20.0, the latest version of Blackmagic Design’s standalone compositing app, in which they also feature.

    IntelliScript automatically generates an edit timeline matching a user-provided script.

    Other toolsets: lots of new AI features for video editing and audio production

    DaVinci Resolve Studio 20.0 also features a lot of new AI features powered by the software’s Neural Engine, although primarily in the video editing and audio production toolsets.The Cut and Edit pages get new AI tools for automatically creating edit timelines matching a user-provided script; generating animated subtitles; editing or extending music to match clip length; and matching tone, level and reverberance for dialogue.
    There are also new tools for recording new voiceovers during editing to match an edit.
    Workflow improvements include a dedicated curve view for keyframe editing; plus a new MultiText tool and updates to the Text+ tool for better control of the layout of on-screen text.
    For audio post work, the Fairlight page gets new AI features for removing silences from raw footage, and automatically balancing an audio mix.
    We don’t cover video or audio editing on CG Channel, but you can find a complete list of changes via the links at the foot of the story.
    Codec and format support

    Other key changes include native support for ProRes encoding on Windows and Linux systems as well as macOS.MV-HEVC encoding is now supported on systems with NVIDIA GPUs, and H.265 4:2:2 encoding and decoding are GPU-accelerated on NVIDIA’s new Blackwell GPUs.
    Again, you can find a full list of changes to codec and platform support via the links at the foot of the story.

    Upcoming features include generative AI-based background extension.

    Future updates: new toolsets for immersive video and AI background generation

    Blackmagic Design also announced two upcoming features not present in the initial release.Artists creating mixed reality content will get a new toolset for ingesting, editing and delivering immersive video for Apple’s Vision Pro headset.
    There will also be a new generative AI feature, Resolve FX AI Set Extender, available via Blackmagic Cloud.
    More details will be announced later this year, but Blackmagic says that it will enable users to generate new backgrounds for shots by entering simple text prompts.
    The video above shows a range of use cases, including extending or adding objects to existing footage, and generating a complete new background behind a foreground object.
    Price, system requirements and release date

    DaVinci Resolve 20.0 and DaVinci Resolve Studio 20.0 are for Windows 10+, Rocky Linux 8.6, and macOS 14.0+. The updates are free to existing users.New perpetual licenses of the base edition are also free.
    The Studio edition, which adds AI features, stereoscopic 3D tools, and collaboration features, costs following a temporary increase in the US following the introduction of new tariffs.
    Read a full list of new features in DaVinci Resolve 20.0 and DaVinci Resolve Studio 20.0

    Have your say on this story by following CG Channel on Facebook, Instagram and X. As well as being able to comment on stories, followers of our social media accounts can see videos we don’t post on the site itself, including making-ofs for the latest VFX movies, animations, games cinematics and motion graphics projects.
    #blackmagic #design #releases #davinci #resolve
    Blackmagic Design releases DaVinci Resolve 20.0
    html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "; A recording of Blackmagic Design’s livestream of its announcements for NAB 2025. You can see the new features in Da Vinci Resolve 20.0 at 2:13:20 in the video. Originally posted on 6 April 2025 for the beta, and updated for the stable release. Blackmagic Design has updated DaVinci Resolve, its free colour grading, editing and post-production software, and DaVinci Resolve Studio, its commercial edition. DaVinci Resolve Studio 20.0 is a major release, adding over 100 tools, including a set of new AI-powered features for video editing and audio production, and is free to all users. Below, we’ve rounded up the key features for colorists and effects artists, including a new Chroma Warp tool, a new deep compositing toolset, and full support for multi-layer EXRs. The new Chroma Warp tool makes it possible to create looks with intuitive gesture controls. Color page: New Chroma Warp, plus updates to the Resolve FX Warper and Magic Mask For grading, the Color page‘s Color Warper gets a new Chroma Warp tool.It is designed to create looks intuitively, with users selecting a color in the viewer, and dragging to adjust its hue and saturation simultaneously. Among the existing tools, the Resolve FX Warper effect gets a new Curves Warp mode, which creates a custom polygon with spline points for finer control when warping images. Magic Mask, DaVinci Resolve’s AI-based feature for generating mattes, has been updated, and now operates in a single mode for both people and objects. Workflow is also now more precise, with users now placing points to make selections, then using the paint tools to include or exclude surrounding regions of the image. Another key AI-based feature, the Resolve FX Depth Map effect, which automatically generates depth mattes, has been updated to improve speed and accuracy. For color management across a pipeline, the software has been updated to ACES 2.0, and OpenColorIO is supported as Resolve FX. Effects artists get deep compositing support in the integrated Fusion compositing toolset. Fusion: new deep compositing toolset For compositing and effects work, the Fusion page gets support for deep compositing.Deep compositing, long supported in more VFX-focused apps like Nuke, makes use of depth data encoded in image formats like OpenEXR to control object visibility. It simplifies the process of generating and managing holdouts, and generates fewer visual artifacts, particularly when working with motion blur or environment fog. Deep images can now be viewed in the Fusion viewer or the 3D view, and there is a new set of nodes to merge, transform, resize, crop, recolor and generate holdouts. It is also possible to render deep images from the 3D environment, and export them as deep EXRs via the Fusion saver node. Fusion: new vector warping toolset, plus support for 180 VR and multi-layer workflows Other new features in the Fusion page include a new optical-flow-based vector warping toolset, for image patching and cleanup, and for effects like digital makeup.There is also a new 360° Dome Light for environment lighting, and support for 180 VR, with a number of key tools updated to support 180° workflows. Pipeline improvements include full multi-layer workflows, with all of Fusion’s nodes now able to access each layer within multi-layer EXR or PSD files. Fusion also now natively supports Cryptomatte ID matte data in EXR files. You can read about the new features on the Fusion Page in our story on Fusion Studio 20.0, the latest version of Blackmagic Design’s standalone compositing app, in which they also feature. IntelliScript automatically generates an edit timeline matching a user-provided script. Other toolsets: lots of new AI features for video editing and audio production DaVinci Resolve Studio 20.0 also features a lot of new AI features powered by the software’s Neural Engine, although primarily in the video editing and audio production toolsets.The Cut and Edit pages get new AI tools for automatically creating edit timelines matching a user-provided script; generating animated subtitles; editing or extending music to match clip length; and matching tone, level and reverberance for dialogue. There are also new tools for recording new voiceovers during editing to match an edit. Workflow improvements include a dedicated curve view for keyframe editing; plus a new MultiText tool and updates to the Text+ tool for better control of the layout of on-screen text. For audio post work, the Fairlight page gets new AI features for removing silences from raw footage, and automatically balancing an audio mix. We don’t cover video or audio editing on CG Channel, but you can find a complete list of changes via the links at the foot of the story. Codec and format support Other key changes include native support for ProRes encoding on Windows and Linux systems as well as macOS.MV-HEVC encoding is now supported on systems with NVIDIA GPUs, and H.265 4:2:2 encoding and decoding are GPU-accelerated on NVIDIA’s new Blackwell GPUs. Again, you can find a full list of changes to codec and platform support via the links at the foot of the story. Upcoming features include generative AI-based background extension. Future updates: new toolsets for immersive video and AI background generation Blackmagic Design also announced two upcoming features not present in the initial release.Artists creating mixed reality content will get a new toolset for ingesting, editing and delivering immersive video for Apple’s Vision Pro headset. There will also be a new generative AI feature, Resolve FX AI Set Extender, available via Blackmagic Cloud. More details will be announced later this year, but Blackmagic says that it will enable users to generate new backgrounds for shots by entering simple text prompts. The video above shows a range of use cases, including extending or adding objects to existing footage, and generating a complete new background behind a foreground object. Price, system requirements and release date DaVinci Resolve 20.0 and DaVinci Resolve Studio 20.0 are for Windows 10+, Rocky Linux 8.6, and macOS 14.0+. The updates are free to existing users.New perpetual licenses of the base edition are also free. The Studio edition, which adds AI features, stereoscopic 3D tools, and collaboration features, costs following a temporary increase in the US following the introduction of new tariffs. Read a full list of new features in DaVinci Resolve 20.0 and DaVinci Resolve Studio 20.0 Have your say on this story by following CG Channel on Facebook, Instagram and X. As well as being able to comment on stories, followers of our social media accounts can see videos we don’t post on the site itself, including making-ofs for the latest VFX movies, animations, games cinematics and motion graphics projects. #blackmagic #design #releases #davinci #resolve
    WWW.CGCHANNEL.COM
    Blackmagic Design releases DaVinci Resolve 20.0
    html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd" A recording of Blackmagic Design’s livestream of its announcements for NAB 2025. You can see the new features in Da Vinci Resolve 20.0 at 2:13:20 in the video. Originally posted on 6 April 2025 for the beta, and updated for the stable release. Blackmagic Design has updated DaVinci Resolve, its free colour grading, editing and post-production software, and DaVinci Resolve Studio, its $295 commercial edition. DaVinci Resolve Studio 20.0 is a major release, adding over 100 tools, including a set of new AI-powered features for video editing and audio production, and is free to all users. Below, we’ve rounded up the key features for colorists and effects artists, including a new Chroma Warp tool, a new deep compositing toolset, and full support for multi-layer EXRs. The new Chroma Warp tool makes it possible to create looks with intuitive gesture controls. Color page: New Chroma Warp, plus updates to the Resolve FX Warper and Magic Mask For grading, the Color page‘s Color Warper gets a new Chroma Warp tool.It is designed to create looks intuitively, with users selecting a color in the viewer, and dragging to adjust its hue and saturation simultaneously. Among the existing tools, the Resolve FX Warper effect gets a new Curves Warp mode, which creates a custom polygon with spline points for finer control when warping images. Magic Mask, DaVinci Resolve’s AI-based feature for generating mattes, has been updated, and now operates in a single mode for both people and objects. Workflow is also now more precise, with users now placing points to make selections, then using the paint tools to include or exclude surrounding regions of the image. Another key AI-based feature, the Resolve FX Depth Map effect, which automatically generates depth mattes, has been updated to improve speed and accuracy. For color management across a pipeline, the software has been updated to ACES 2.0, and OpenColorIO is supported as Resolve FX. Effects artists get deep compositing support in the integrated Fusion compositing toolset. Fusion: new deep compositing toolset For compositing and effects work, the Fusion page gets support for deep compositing.Deep compositing, long supported in more VFX-focused apps like Nuke, makes use of depth data encoded in image formats like OpenEXR to control object visibility. It simplifies the process of generating and managing holdouts, and generates fewer visual artifacts, particularly when working with motion blur or environment fog. Deep images can now be viewed in the Fusion viewer or the 3D view, and there is a new set of nodes to merge, transform, resize, crop, recolor and generate holdouts. It is also possible to render deep images from the 3D environment, and export them as deep EXRs via the Fusion saver node. Fusion: new vector warping toolset, plus support for 180 VR and multi-layer workflows Other new features in the Fusion page include a new optical-flow-based vector warping toolset, for image patching and cleanup, and for effects like digital makeup.There is also a new 360° Dome Light for environment lighting, and support for 180 VR, with a number of key tools updated to support 180° workflows. Pipeline improvements include full multi-layer workflows, with all of Fusion’s nodes now able to access each layer within multi-layer EXR or PSD files. Fusion also now natively supports Cryptomatte ID matte data in EXR files. You can read about the new features on the Fusion Page in our story on Fusion Studio 20.0, the latest version of Blackmagic Design’s standalone compositing app, in which they also feature. IntelliScript automatically generates an edit timeline matching a user-provided script. Other toolsets: lots of new AI features for video editing and audio production DaVinci Resolve Studio 20.0 also features a lot of new AI features powered by the software’s Neural Engine, although primarily in the video editing and audio production toolsets.The Cut and Edit pages get new AI tools for automatically creating edit timelines matching a user-provided script; generating animated subtitles; editing or extending music to match clip length; and matching tone, level and reverberance for dialogue. There are also new tools for recording new voiceovers during editing to match an edit. Workflow improvements include a dedicated curve view for keyframe editing; plus a new MultiText tool and updates to the Text+ tool for better control of the layout of on-screen text. For audio post work, the Fairlight page gets new AI features for removing silences from raw footage, and automatically balancing an audio mix. We don’t cover video or audio editing on CG Channel, but you can find a complete list of changes via the links at the foot of the story. Codec and format support Other key changes include native support for ProRes encoding on Windows and Linux systems as well as macOS.MV-HEVC encoding is now supported on systems with NVIDIA GPUs, and H.265 4:2:2 encoding and decoding are GPU-accelerated on NVIDIA’s new Blackwell GPUs. Again, you can find a full list of changes to codec and platform support via the links at the foot of the story. Upcoming features include generative AI-based background extension. Future updates: new toolsets for immersive video and AI background generation Blackmagic Design also announced two upcoming features not present in the initial release.Artists creating mixed reality content will get a new toolset for ingesting, editing and delivering immersive video for Apple’s Vision Pro headset. There will also be a new generative AI feature, Resolve FX AI Set Extender, available via Blackmagic Cloud. More details will be announced later this year, but Blackmagic says that it will enable users to generate new backgrounds for shots by entering simple text prompts. The video above shows a range of use cases, including extending or adding objects to existing footage, and generating a complete new background behind a foreground object. Price, system requirements and release date DaVinci Resolve 20.0 and DaVinci Resolve Studio 20.0 are for Windows 10+, Rocky Linux 8.6, and macOS 14.0+. The updates are free to existing users.New perpetual licenses of the base edition are also free. The Studio edition, which adds AI features, stereoscopic 3D tools, and collaboration features, costs $295, following a temporary increase in the US following the introduction of new tariffs. Read a full list of new features in DaVinci Resolve 20.0 and DaVinci Resolve Studio 20.0 Have your say on this story by following CG Channel on Facebook, Instagram and X (formerly Twitter). As well as being able to comment on stories, followers of our social media accounts can see videos we don’t post on the site itself, including making-ofs for the latest VFX movies, animations, games cinematics and motion graphics projects.
    0 Comentários 0 Compartilhamentos
  • From LLMs to hallucinations, here’s a simple guide to common AI terms

    Artificial intelligence is a deep and convoluted world. The scientists who work in this field often rely on jargon and lingo to explain what they’re working on. As a result, we frequently have to use those technical terms in our coverage of the artificial intelligence industry. That’s why we thought it would be helpful to put together a glossary with definitions of some of the most important words and phrases that we use in our articles.
    We will regularly update this glossary to add new entries as researchers continually uncover novel methods to push the frontier of artificial intelligence while identifying emerging safety risks.

    AGI
    Artificial general intelligence, or AGI, is a nebulous term. But it generally refers to AI that’s more capable than the average human at many, if not most, tasks. OpenAI CEO Sam Altman recently described AGI as the “equivalent of a median human that you could hire as a co-worker.” Meanwhile, OpenAI’s charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s understanding differs slightly from these two definitions; the lab views AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? Not to worry — so are experts at the forefront of AI research.
    AI agent
    An AI agent refers to a tool that uses AI technologies to perform a series of tasks on your behalf — beyond what a more basic AI chatbot could do — such as filing expenses, booking tickets or a table at a restaurant, or even writing and maintaining code. However, as we’ve explained before, there are lots of moving pieces in this emergent space, so “AI agent” might mean different things to different people. Infrastructure is also still being built out to deliver on its envisaged capabilities. But the basic concept implies an autonomous system that may draw on multiple AI systems to carry out multistep tasks.
    Chain of thought
    Given a simple question, a human brain can answer without even thinking too much about it — things like “which animal is taller, a giraffe or a cat?” But in many cases, you often need a pen and paper to come up with the right answer because there are intermediary steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to come up with the answer.
    In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in a logic or coding context. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning.Techcrunch event

    Join us at TechCrunch Sessions: AI
    Secure your spot for our leading AI industry event with speakers from OpenAI, Anthropic, and Cohere. For a limited time, tickets are just for an entire day of expert talks, workshops, and potent networking.

    Exhibit at TechCrunch Sessions: AI
    Secure your spot at TC Sessions: AI and show 1,200+ decision-makers what you’ve built — without the big spend. Available through May 9 or while tables last.

    Berkeley, CA
    |
    June 5

    REGISTER NOW

    Deep learning
    A subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural networkstructure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain.
    Deep learning AI models are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results. They also typically take longer to train compared to simpler machine learning algorithms — so development costs tend to be higher.Diffusion
    Diffusion is the tech at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data — e.g. photos, songs, and so on — by adding noise until there’s nothing left. In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to cube form. But diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise.
    Distillation
    Distillation is a technique used to extract knowledge from a large AI model with a ‘teacher-student’ model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior.
    Distillation can be used to create a smaller, more efficient model based on a larger model with a minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4.
    While all AI companies use distillation internally, it may have also been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI API and chat assistants.
    Fine-tuning
    This refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training — typically by feeding in new, specializeddata. 
    Many AI startups are taking large language models as a starting point to build a commercial product but are vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise.GAN
    A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data – includingdeepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate. This second, discriminator model thus plays the role of a classifier on the generator’s output – enabling it to improve over time. 
    The GAN structure is set up as a competition– with the two models essentially programmed to try to outdo each other: the generator is trying to get its output past the discriminator, while the discriminator is working to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention. Though GANs work best for narrower applications, rather than general purpose AI.
    Hallucination
    Hallucination is the AI industry’s preferred term for AI models making stuff up – literally generating information that is incorrect. Obviously, it’s a huge problem for AI quality. 
    Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks — with potentially dangerous consequences. This is why most GenAI tools’ small print now warns users to verify AI-generated answers, even though such disclaimers are usually far less prominent than the information the tools dispense at the touch of a button.
    The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. For general purpose GenAI especially — also sometimes known as foundation models — this looks difficult to resolve. There is simply not enough data in existence to train AI models to comprehensively resolve all the questions we could possibly ask. TL;DR: we haven’t invented God. 
    Hallucinations are contributing to a push towards increasingly specialized and/or vertical AI models — i.e. domain-specific AIs that require narrower expertise – as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks.
    Inference
    Inference is the process of running an AI model. It’s setting a model loose to make predictions or draw conclusions from previously-seen data. To be clear, inference can’t happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data.
    Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips.Large language modelLarge language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s AI Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters.
    AI assistants and LLMs can have different names. For instance, GPT is OpenAI’s large language model and ChatGPT is the AI assistant product.
    LLMs are deep neural networks made of billions of numerical parametersthat learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words.
    These models are created from encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt. It then evaluates the most probable next word after the last one based on what was said before. Repeat, repeat, and repeat.Neural network
    A neural network refers to the multi-layered algorithmic structure that underpins deep learning — and, more broadly, the whole boom in generative AI tools following the emergence of large language models. 
    Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphical processing hardware— via the video game industry — that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs — enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery.Training
    Developing machine learning AIs involves a process known as training. In simple terms, this refers to data being fed in in order that the model can learn from patterns and generate useful outputs.
    Things can get a bit philosophical at this point in the AI stack — since, pre-training, the mathematical structure that’s used as the starting point for developing a learning system is just a bunch of layers and random numbers. It’s only through training that the AI model really takes shape. Essentially, it’s the process of the system responding to characteristics in the data that enables it to adapt outputs towards a sought-for goal — whether that’s identifying images of cats or producing a haiku on demand.
    It’s important to note that not all AI requires training. Rules-based AIs that are programmed to follow manually predefined instructions — for example, such as linear chatbots — don’t need to undergo training. However, such AI systems are likely to be more constrained thanself-learning systems.
    Still, training can be expensive because it requires lots of inputs — and, typically, the volumes of inputs required for such models have been trending upwards.
    Hybrid approaches can sometimes be used to shortcut model development and help manage costs. Such as doing data-driven fine-tuning of a rules-based AI — meaning development requires less data, compute, energy, and algorithmic complexity than if the developer had started building from scratch.Transfer learning
    A technique where a previously trained AI model is used as the starting point for developing a new model for a different but typically related task – allowing knowledge gained in previous training cycles to be reapplied. 
    Transfer learning can drive efficiency savings by shortcutting model development. It can also be useful when data for the task that the model is being developed for is somewhat limited. But it’s important to note that the approach has limitations. Models that rely on transfer learning to gain generalized capabilities will likely require training on additional data in order to perform well in their domain of focusWeights
    Weights are core to AI training, as they determine how much importanceis given to different featuresin the data used for training the system — thereby shaping the AI model’s output. 
    Put another way, weights are numerical parameters that define what’s most salient in a dataset for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target.
    For example, an AI model for predicting housing prices that’s trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking, a garage, and so on. 
    Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given dataset.

    Topics
    #llms #hallucinations #heres #simple #guide
    From LLMs to hallucinations, here’s a simple guide to common AI terms
    Artificial intelligence is a deep and convoluted world. The scientists who work in this field often rely on jargon and lingo to explain what they’re working on. As a result, we frequently have to use those technical terms in our coverage of the artificial intelligence industry. That’s why we thought it would be helpful to put together a glossary with definitions of some of the most important words and phrases that we use in our articles. We will regularly update this glossary to add new entries as researchers continually uncover novel methods to push the frontier of artificial intelligence while identifying emerging safety risks. AGI Artificial general intelligence, or AGI, is a nebulous term. But it generally refers to AI that’s more capable than the average human at many, if not most, tasks. OpenAI CEO Sam Altman recently described AGI as the “equivalent of a median human that you could hire as a co-worker.” Meanwhile, OpenAI’s charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s understanding differs slightly from these two definitions; the lab views AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? Not to worry — so are experts at the forefront of AI research. AI agent An AI agent refers to a tool that uses AI technologies to perform a series of tasks on your behalf — beyond what a more basic AI chatbot could do — such as filing expenses, booking tickets or a table at a restaurant, or even writing and maintaining code. However, as we’ve explained before, there are lots of moving pieces in this emergent space, so “AI agent” might mean different things to different people. Infrastructure is also still being built out to deliver on its envisaged capabilities. But the basic concept implies an autonomous system that may draw on multiple AI systems to carry out multistep tasks. Chain of thought Given a simple question, a human brain can answer without even thinking too much about it — things like “which animal is taller, a giraffe or a cat?” But in many cases, you often need a pen and paper to come up with the right answer because there are intermediary steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to come up with the answer. In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in a logic or coding context. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning.Techcrunch event Join us at TechCrunch Sessions: AI Secure your spot for our leading AI industry event with speakers from OpenAI, Anthropic, and Cohere. For a limited time, tickets are just for an entire day of expert talks, workshops, and potent networking. Exhibit at TechCrunch Sessions: AI Secure your spot at TC Sessions: AI and show 1,200+ decision-makers what you’ve built — without the big spend. Available through May 9 or while tables last. Berkeley, CA | June 5 REGISTER NOW Deep learning A subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural networkstructure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain. Deep learning AI models are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results. They also typically take longer to train compared to simpler machine learning algorithms — so development costs tend to be higher.Diffusion Diffusion is the tech at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data — e.g. photos, songs, and so on — by adding noise until there’s nothing left. In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to cube form. But diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise. Distillation Distillation is a technique used to extract knowledge from a large AI model with a ‘teacher-student’ model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior. Distillation can be used to create a smaller, more efficient model based on a larger model with a minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. While all AI companies use distillation internally, it may have also been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI API and chat assistants. Fine-tuning This refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training — typically by feeding in new, specializeddata.  Many AI startups are taking large language models as a starting point to build a commercial product but are vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise.GAN A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data – includingdeepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate. This second, discriminator model thus plays the role of a classifier on the generator’s output – enabling it to improve over time.  The GAN structure is set up as a competition– with the two models essentially programmed to try to outdo each other: the generator is trying to get its output past the discriminator, while the discriminator is working to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention. Though GANs work best for narrower applications, rather than general purpose AI. Hallucination Hallucination is the AI industry’s preferred term for AI models making stuff up – literally generating information that is incorrect. Obviously, it’s a huge problem for AI quality.  Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks — with potentially dangerous consequences. This is why most GenAI tools’ small print now warns users to verify AI-generated answers, even though such disclaimers are usually far less prominent than the information the tools dispense at the touch of a button. The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. For general purpose GenAI especially — also sometimes known as foundation models — this looks difficult to resolve. There is simply not enough data in existence to train AI models to comprehensively resolve all the questions we could possibly ask. TL;DR: we haven’t invented God.  Hallucinations are contributing to a push towards increasingly specialized and/or vertical AI models — i.e. domain-specific AIs that require narrower expertise – as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks. Inference Inference is the process of running an AI model. It’s setting a model loose to make predictions or draw conclusions from previously-seen data. To be clear, inference can’t happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data. Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips.Large language modelLarge language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s AI Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters. AI assistants and LLMs can have different names. For instance, GPT is OpenAI’s large language model and ChatGPT is the AI assistant product. LLMs are deep neural networks made of billions of numerical parametersthat learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words. These models are created from encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt. It then evaluates the most probable next word after the last one based on what was said before. Repeat, repeat, and repeat.Neural network A neural network refers to the multi-layered algorithmic structure that underpins deep learning — and, more broadly, the whole boom in generative AI tools following the emergence of large language models.  Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphical processing hardware— via the video game industry — that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs — enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery.Training Developing machine learning AIs involves a process known as training. In simple terms, this refers to data being fed in in order that the model can learn from patterns and generate useful outputs. Things can get a bit philosophical at this point in the AI stack — since, pre-training, the mathematical structure that’s used as the starting point for developing a learning system is just a bunch of layers and random numbers. It’s only through training that the AI model really takes shape. Essentially, it’s the process of the system responding to characteristics in the data that enables it to adapt outputs towards a sought-for goal — whether that’s identifying images of cats or producing a haiku on demand. It’s important to note that not all AI requires training. Rules-based AIs that are programmed to follow manually predefined instructions — for example, such as linear chatbots — don’t need to undergo training. However, such AI systems are likely to be more constrained thanself-learning systems. Still, training can be expensive because it requires lots of inputs — and, typically, the volumes of inputs required for such models have been trending upwards. Hybrid approaches can sometimes be used to shortcut model development and help manage costs. Such as doing data-driven fine-tuning of a rules-based AI — meaning development requires less data, compute, energy, and algorithmic complexity than if the developer had started building from scratch.Transfer learning A technique where a previously trained AI model is used as the starting point for developing a new model for a different but typically related task – allowing knowledge gained in previous training cycles to be reapplied.  Transfer learning can drive efficiency savings by shortcutting model development. It can also be useful when data for the task that the model is being developed for is somewhat limited. But it’s important to note that the approach has limitations. Models that rely on transfer learning to gain generalized capabilities will likely require training on additional data in order to perform well in their domain of focusWeights Weights are core to AI training, as they determine how much importanceis given to different featuresin the data used for training the system — thereby shaping the AI model’s output.  Put another way, weights are numerical parameters that define what’s most salient in a dataset for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target. For example, an AI model for predicting housing prices that’s trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking, a garage, and so on.  Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given dataset. Topics #llms #hallucinations #heres #simple #guide
    TECHCRUNCH.COM
    From LLMs to hallucinations, here’s a simple guide to common AI terms
    Artificial intelligence is a deep and convoluted world. The scientists who work in this field often rely on jargon and lingo to explain what they’re working on. As a result, we frequently have to use those technical terms in our coverage of the artificial intelligence industry. That’s why we thought it would be helpful to put together a glossary with definitions of some of the most important words and phrases that we use in our articles. We will regularly update this glossary to add new entries as researchers continually uncover novel methods to push the frontier of artificial intelligence while identifying emerging safety risks. AGI Artificial general intelligence, or AGI, is a nebulous term. But it generally refers to AI that’s more capable than the average human at many, if not most, tasks. OpenAI CEO Sam Altman recently described AGI as the “equivalent of a median human that you could hire as a co-worker.” Meanwhile, OpenAI’s charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s understanding differs slightly from these two definitions; the lab views AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? Not to worry — so are experts at the forefront of AI research. AI agent An AI agent refers to a tool that uses AI technologies to perform a series of tasks on your behalf — beyond what a more basic AI chatbot could do — such as filing expenses, booking tickets or a table at a restaurant, or even writing and maintaining code. However, as we’ve explained before, there are lots of moving pieces in this emergent space, so “AI agent” might mean different things to different people. Infrastructure is also still being built out to deliver on its envisaged capabilities. But the basic concept implies an autonomous system that may draw on multiple AI systems to carry out multistep tasks. Chain of thought Given a simple question, a human brain can answer without even thinking too much about it — things like “which animal is taller, a giraffe or a cat?” But in many cases, you often need a pen and paper to come up with the right answer because there are intermediary steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to come up with the answer (20 chickens and 20 cows). In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in a logic or coding context. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning. (See: Large language model) Techcrunch event Join us at TechCrunch Sessions: AI Secure your spot for our leading AI industry event with speakers from OpenAI, Anthropic, and Cohere. For a limited time, tickets are just $292 for an entire day of expert talks, workshops, and potent networking. Exhibit at TechCrunch Sessions: AI Secure your spot at TC Sessions: AI and show 1,200+ decision-makers what you’ve built — without the big spend. Available through May 9 or while tables last. Berkeley, CA | June 5 REGISTER NOW Deep learning A subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural network (ANN) structure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain. Deep learning AI models are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results (millions or more). They also typically take longer to train compared to simpler machine learning algorithms — so development costs tend to be higher. (See: Neural network) Diffusion Diffusion is the tech at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data — e.g. photos, songs, and so on — by adding noise until there’s nothing left. In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to cube form. But diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise. Distillation Distillation is a technique used to extract knowledge from a large AI model with a ‘teacher-student’ model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior. Distillation can be used to create a smaller, more efficient model based on a larger model with a minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. While all AI companies use distillation internally, it may have also been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI API and chat assistants. Fine-tuning This refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training — typically by feeding in new, specialized (i.e., task-oriented) data.  Many AI startups are taking large language models as a starting point to build a commercial product but are vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise. (See: Large language model [LLM]) GAN A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data – including (but not only) deepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate. This second, discriminator model thus plays the role of a classifier on the generator’s output – enabling it to improve over time.  The GAN structure is set up as a competition (hence “adversarial”) – with the two models essentially programmed to try to outdo each other: the generator is trying to get its output past the discriminator, while the discriminator is working to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention. Though GANs work best for narrower applications (such as producing realistic photos or videos), rather than general purpose AI. Hallucination Hallucination is the AI industry’s preferred term for AI models making stuff up – literally generating information that is incorrect. Obviously, it’s a huge problem for AI quality.  Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks — with potentially dangerous consequences (think of a health query that returns harmful medical advice). This is why most GenAI tools’ small print now warns users to verify AI-generated answers, even though such disclaimers are usually far less prominent than the information the tools dispense at the touch of a button. The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. For general purpose GenAI especially — also sometimes known as foundation models — this looks difficult to resolve. There is simply not enough data in existence to train AI models to comprehensively resolve all the questions we could possibly ask. TL;DR: we haven’t invented God (yet).  Hallucinations are contributing to a push towards increasingly specialized and/or vertical AI models — i.e. domain-specific AIs that require narrower expertise – as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks. Inference Inference is the process of running an AI model. It’s setting a model loose to make predictions or draw conclusions from previously-seen data. To be clear, inference can’t happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data. Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips. [See: Training] Large language model (LLM) Large language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s AI Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters. AI assistants and LLMs can have different names. For instance, GPT is OpenAI’s large language model and ChatGPT is the AI assistant product. LLMs are deep neural networks made of billions of numerical parameters (or weights, see below) that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words. These models are created from encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt. It then evaluates the most probable next word after the last one based on what was said before. Repeat, repeat, and repeat. (See: Neural network) Neural network A neural network refers to the multi-layered algorithmic structure that underpins deep learning — and, more broadly, the whole boom in generative AI tools following the emergence of large language models.  Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphical processing hardware (GPUs) — via the video game industry — that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs — enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery. (See: Large language model [LLM]) Training Developing machine learning AIs involves a process known as training. In simple terms, this refers to data being fed in in order that the model can learn from patterns and generate useful outputs. Things can get a bit philosophical at this point in the AI stack — since, pre-training, the mathematical structure that’s used as the starting point for developing a learning system is just a bunch of layers and random numbers. It’s only through training that the AI model really takes shape. Essentially, it’s the process of the system responding to characteristics in the data that enables it to adapt outputs towards a sought-for goal — whether that’s identifying images of cats or producing a haiku on demand. It’s important to note that not all AI requires training. Rules-based AIs that are programmed to follow manually predefined instructions — for example, such as linear chatbots — don’t need to undergo training. However, such AI systems are likely to be more constrained than (well-trained) self-learning systems. Still, training can be expensive because it requires lots of inputs — and, typically, the volumes of inputs required for such models have been trending upwards. Hybrid approaches can sometimes be used to shortcut model development and help manage costs. Such as doing data-driven fine-tuning of a rules-based AI — meaning development requires less data, compute, energy, and algorithmic complexity than if the developer had started building from scratch. [See: Inference] Transfer learning A technique where a previously trained AI model is used as the starting point for developing a new model for a different but typically related task – allowing knowledge gained in previous training cycles to be reapplied.  Transfer learning can drive efficiency savings by shortcutting model development. It can also be useful when data for the task that the model is being developed for is somewhat limited. But it’s important to note that the approach has limitations. Models that rely on transfer learning to gain generalized capabilities will likely require training on additional data in order to perform well in their domain of focus (See: Fine tuning) Weights Weights are core to AI training, as they determine how much importance (or weight) is given to different features (or input variables) in the data used for training the system — thereby shaping the AI model’s output.  Put another way, weights are numerical parameters that define what’s most salient in a dataset for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target. For example, an AI model for predicting housing prices that’s trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking, a garage, and so on.  Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given dataset. Topics
    0 Comentários 0 Compartilhamentos