Recent updates
WWW.MICROSOFT.COM
Research Focus: Week of April 21, 2025

In this issue: Catch a preview of our presentations and papers at CHI 2025 and ICLR 2025. We also introduce new research on causal reasoning and LLMs; enhancing LLM jailbreak capabilities to bolster safety and robustness; how people perform when using AI compared with AI alone; and Distill-MOS, a compact and efficient model that delivers state-of-the-art speech quality assessment. You'll also find a replay of a podcast discussion on rural healthcare innovation with Senior Vice President of Microsoft Health Jim Weinstein.

CONFERENCE
Microsoft at CHI 2025
Microsoft Research is proud to be a sponsor of the 2025 ACM CHI Conference on Human Factors in Computing Systems. CHI brings together researchers and practitioners from all over the world and from diverse cultures, backgrounds, and positionalities who share an overarching goal: to make the world a better place with interactive digital technologies. Our researchers will host more than 30 sessions and workshops at this year's conference in Yokohama, Japan. We invite you to preview our presentations and our two dozen accepted papers.
Microsoft @CHI 2025

CONFERENCE
Microsoft at ICLR 2025
Microsoft is proud to be a sponsor of the thirteenth International Conference on Learning Representations (ICLR). This gathering is dedicated to the advancement of representation learning, a branch of AI. We are pleased to share that Microsoft has more than 30 accepted papers at this year's conference, which we invite you to preview. ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning, spanning artificial intelligence, statistics, and data science, as well as application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics.
Microsoft @ICLR 2025

NEW RESEARCH
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
What kinds of causal arguments can large language models (LLMs) generate, how valid are these arguments, and what causal reasoning workflows can this generation support or automate? This paper, accepted at ICLR 2025, takes up these questions. It advances our understanding of LLMs and their causal implications, and it proposes a framework for future research at the intersection of LLMs and causality. The discussion has critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. By capturing common sense and domain knowledge about causal mechanisms and supporting translation between natural language and formal methods, LLMs open new frontiers for advancing the research, practice, and adoption of causality.
Read the paper
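To make this concrete, one simple workflow from this line of research is to query an LLM for pairwise causal judgments between variables and parse the verdicts. Below is a minimal illustrative sketch; the prompt wording and the query_llm stub are our assumptions, not the paper's actual protocol.

```python
# Minimal sketch: eliciting a pairwise causal judgment from an LLM.
# query_llm is a placeholder for whatever chat-completion client you use;
# the prompt format is illustrative, not the paper's protocol.

def query_llm(prompt: str) -> str:
    # Replace with a real LLM client call.
    raise NotImplementedError("plug in your LLM client here")

def causal_direction(var_a: str, var_b: str) -> str:
    """Ask the model which causal direction between two variables is more plausible."""
    prompt = (
        "Which cause-and-effect relationship is more likely?\n"
        f"(A) {var_a} causes {var_b}\n"
        f"(B) {var_b} causes {var_a}\n"
        "Answer with the single letter A or B."
    )
    answer = query_llm(prompt).strip().upper()
    if answer.startswith("A"):
        return f"{var_a} -> {var_b}"
    if answer.startswith("B"):
        return f"{var_b} -> {var_a}"
    return "undetermined"

# Example (requires a real client): causal_direction("altitude", "air temperature")
```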
NEW RESEARCH
The Future of AI in Knowledge Work: Tools for Thought at CHI 2025
Can AI tools do more than streamline workflows—can they actually help us think better? That's the driving question behind the Microsoft Research Tools for Thought initiative. At this year's CHI conference, the group is presenting four new research papers and cohosting a workshop that dives deep into this intersection of AI and human cognition. The team provides an overview of their latest research, starting with a study on how AI is changing the way people think and work, and introduces three prototype systems designed to support different cognitive tasks. Finally, through their Tools for Thought workshop, they invite the CHI community to help define AI's role in supporting human thinking.
Read the blog

NEW RESEARCH
Building LLMs with enhanced jailbreaking capabilities to bolster safety and robustness
Recent research shows that LLMs are vulnerable to automated jailbreak attacks, where algorithm-generated adversarial suffixes bypass safety alignment and trigger harmful responses. This paper introduces ADV-LLM, an iterative self-tuning process for crafting adversarial LLMs with enhanced jailbreak capabilities. ADV-LLM is less computationally expensive than prior mechanisms and achieves higher attack success rates (ASR), especially against well-aligned models like Llama2 and Llama3. It reaches nearly 100% ASR on various open-source LLMs and demonstrates strong transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3. Beyond improving jailbreak performance, ADV-LLM offers valuable insights for future alignment research by enabling large-scale generation of safety-relevant datasets.
Read the paper

NEW RESEARCH
ChatBench: From Static Benchmarks to Human-AI Evaluation
The rapid adoption of LLM-based chatbots raises the need to understand what people and LLMs can achieve together. However, standard benchmarks like MMLU assess LLM capabilities in isolation (i.e., "AI alone"). This paper presents the results of a user study that transforms MMLU questions into interactive user-AI conversations: the researchers gave participants each question and had them converse with the LLM to arrive at an answer. The result is ChatBench, a new dataset comprising AI-alone, user-alone, and user-AI data for 396 questions and two LLMs, including 144,000 answers and 7,336 user-AI conversations. The analysis reveals that AI-alone accuracy does not predict user-AI accuracy, with notable differences across subjects such as math, physics, and moral reasoning. Examining the user-AI conversations yields insights into how these interactions differ from AI-alone benchmarks. Finally, the researchers demonstrate that fine-tuning a user simulator on a subset of ChatBench improves its ability to predict user-AI accuracy, boosting correlation on held-out questions by more than 20 points and thereby enabling scalable interactive evaluation.
Read the paper

NEW RESEARCH
Distill-MOS: A compact speech-quality assessment model
Distill-MOS is a compact and efficient speech quality assessment model with dramatically reduced size—over 100x smaller than the reference model—enabling efficient, non-intrusive evaluation in real-world, low-resource settings. The accompanying paper investigates distillation and pruning methods for reducing the size of non-intrusive speech quality assessment models based on self-supervised representations. The researchers' experiments build on XLS-R-SQA, a speech quality assessment model using wav2vec 2.0 XLS-R embeddings, retraining it on a large compilation of mean opinion score (MOS) datasets encompassing over 100,000 labeled clips.
Read the paper | View GitHub
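To illustrate the distillation idea at the heart of this kind of compression, the sketch below trains a small student network to reproduce a large teacher's quality scores. All module names, shapes, and the architecture are placeholders for illustration, not the actual Distill-MOS design.

```python
import torch
import torch.nn as nn

# Illustrative score-level distillation for speech quality assessment.
# The teacher stands in for a large self-supervised assessor (e.g., an
# XLS-R-based model); StudentSQA is a much smaller network. Shapes and
# architecture are placeholders, not the actual Distill-MOS design.

class StudentSQA(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, 1)  # predicts a single MOS value

    def forward(self, wav):                # wav: (batch, 1, samples)
        h = self.encoder(wav).squeeze(-1)  # (batch, 64)
        return self.head(h).squeeze(-1)    # (batch,)

def distill_step(student, teacher, wav, optimizer):
    """One training step: match the teacher's MOS predictions on a batch."""
    with torch.no_grad():
        target = teacher(wav)  # teacher's MOS prediction, shape (batch,)
    pred = student(wav)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (with any pretrained teacher callable):
#   student = StudentSQA()
#   opt = torch.optim.Adam(student.parameters(), lr=1e-4)
#   loss = distill_step(student, teacher, torch.randn(8, 1, 16000), opt)
```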
PODCAST
Collaborating to Affect Change for Rural Health Care with Innovation and Technology
Senior Vice President of Microsoft Health Jim Weinstein joins Dan Liljenquist, Chief Strategy Officer at Intermountain Health, on the NEJM Catalyst podcast to discuss how they are combining expertise and resources to address healthcare challenges in the rural United States, including limited access to care, rising mortality rates, and severe staffing shortages. Working together, they aim to create a scalable model that can benefit both rural and urban health systems. Key goals include expanding access through telemedicine and strengthening cybersecurity, ultimately improving the quality of care delivered and the financial stability of rural communities.
Listen to the podcast

PODCAST
Empowering patients and healthcare consumers in the age of generative AI
Two champions of patient-centered digital health join Microsoft Research President Peter Lee to talk about how AI is reshaping healthcare in terms of patient empowerment and emerging digital health business models. Dave deBronkart, a cancer survivor and longtime advocate for patient empowerment, discusses how AI tools like ChatGPT can help patients better understand their conditions, navigate the healthcare system, and communicate more effectively with clinicians. Christina Farr, a healthcare investor and former journalist, talks about the evolving digital health startup ecosystem, highlighting where AI is having the most meaningful impact—particularly in women's health, pediatrics, and elder care. She also explores consumer trends, like the rise of cash-pay healthcare.
Listen to the podcast

PODCAST
Beyond the Image: AI's Expanding Role in Healthcare
Jonathan Carlson, Managing Director of Microsoft Research Health Futures, joins the Healthcare Unfiltered show to explore the evolution of AI in medicine, from the early days to cutting-edge innovations like ambient clinical intelligence. The conversation covers how pretrained models and machine learning are transforming care delivery, as well as the future of biomedicine and healthcare, including important ethical and practical questions.
Listen to the podcast
-
WWW.MICROSOFT.COM
The Future of AI in Knowledge Work: Tools for Thought at CHI 2025

Can AI tools do more than streamline workflows—can they actually help us think better? That's the driving question behind the Microsoft Research Tools for Thought initiative. At this year's CHI conference, we're presenting four new research papers and cohosting a workshop that dives deep into this intersection of AI and human cognition. This post provides an overview of our latest research, starting with a study on how AI is changing the way we think and work. We also introduce three prototype systems designed to support different cognitive tasks. Finally, through our Tools for Thought workshop, we're inviting the CHI community to help define AI's role in supporting human thinking.

AI's effects on thinking at work

With a single prompt, AI can generate a wide range of outputs, from documents and meeting agendas to answers and automated workflows. But how are people's thinking processes affected when they delegate these tasks to AI? One of our goals is to understand how knowledge workers use AI, how they perceive its value, and how it affects cognitive effort. Our study, "The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers," surveyed 319 professionals across a variety of occupations who use AI at work. Participants shared 936 real-world AI use cases and reflected on how AI influenced their critical thinking and mental effort. We summarize the findings below.

Findings

Defining and deploying critical thinking. Knowledge workers describe critical thinking as involving activities like setting clear goals, refining prompts, and verifying AI outputs against external sources and their own expertise. They rely on these practices to maintain work quality when using AI, motivated by the need to avoid errors, produce better results, and develop their skills.

Balancing cognitive effort. Participants' reports about critical thinking and the effort involved align with longstanding human tendencies to manage cognitive load at work. For high-stakes tasks requiring accuracy, they say they expend more effort applying critical thinking with AI than they would performing the same tasks without it. In contrast, during routine, low-stakes tasks under time pressure, they report spending less effort on critical thinking when using AI than when completing the task without it.

Confidence effects. The study found that higher confidence in AI was associated with less critical thinking, while higher self-confidence was associated with more critical thinking.

Shift in the nature of critical thinking. Participants reported a shift in critical thinking activities, with a greater focus on information verification, response integration, and task stewardship. While AI automates certain aspects of knowledge work, it also demands more effort in evaluating the accuracy and relevance of AI-generated content.

Barriers to critical engagement. The study identified several barriers that inhibit critical thinking when using AI: a lack of awareness of the need for critical evaluation, limited motivation due to time pressure or perceived job scope, and difficulty refining prompts, especially in unfamiliar domains.

Recommendations

To foster critical thinking at work, we recommend that AI tools actively encourage awareness, motivation, and skill development. AI tools should enhance motivators for critical thinking (e.g., quality standards, skill-building) and mitigate inhibitors (e.g., time constraints, low awareness).
Proactive prompts can surface overlooked tasks, while reactive features can offer on-demand assistance. Motivation can be strengthened by positioning critical reflection as part of professional growth—not just extra work. AI tools should also support knowledge workers' ability to think critically by providing reasoning explanations (as some newer AI models now do), guided critiques, and cross-references. This shift must occur both in the design of the technology and in the mindsets of knowledge workers. Rather than treating AI as a tool for delivering answers, we suggest treating it as a thought partner—one that can also act as a provocateur.

Beyond these insights, our other CHI papers explore practical ways to design AI that augments human cognition.

Enhancing decision-making with AI

Decision-making is central to knowledge work, and AI is increasingly used to help people make decisions in complex fields like healthcare and finance. But how much agency do knowledge workers retain when AI is involved? Our study, "AI, Help Me Think—but for Myself: Exploring How LLMs Can Assist People in Complex Decision-Making by Providing Different Forms of Cognitive Support," conducted in collaboration with University College London, examines this question. We began with a small formative study involving 10 participants, followed by a comparative study with 21 participants using two different AI-supported decision-making systems. For a complex financial investment task, we compared two AI tools (Figure 1): RecommendAI, which provides AI-generated recommendations, and ExtendAI, which encourages users to articulate their reasoning before receiving AI feedback.

Figure 1. Illustrative comparison of the thought process involved when interacting with two types of AI: RecommendAI and ExtendAI.

Findings

Both systems were found to offer benefits for augmenting cognition and addressing some of the challenges to critical thinking identified in the knowledge worker survey above, suggesting the potential for a balanced approach.

RecommendAI offered concrete suggestions that inspired users to explore new directions in their decision-making, often leading to fresh insights and reflections. However, the recommendations at times felt disconnected from the user's own reasoning, reducing the depth of engagement. In contrast, ExtendAI encouraged users to reflect more deeply on their decisions by providing feedback on their reasoning. This helped them examine their thought processes and consider alternative perspectives. However, some users found the feedback too general and not actionable enough.

As for how users integrated the tools into their decision-making process, RecommendAI introduced perspectives that pushed users to think beyond their usual patterns. By recommending options not based on users' own reasoning, it encouraged exploration of ideas they might not have considered. However, some users perceived the recommendations as a "black box" solution, and this lack of transparency made the recommendations harder to understand, trust, and apply to their own thought processes. ExtendAI, on the other hand, aligned with users' existing reasoning, making its feedback easier to incorporate. This helped users maintain a sense of control and continuity. However, because the feedback often echoed their initial thoughts, it sometimes limited new insights and risked reinforcing existing biases.

These findings suggest that AI tools like ExtendAI, designed to elicit and build on users' own cognitive processes, may offer a more effective approach to augmentation than simply providing "ready-made solutions" that users must figure out how to interpret and apply.
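The contrast between the two systems can be boiled down to two prompt patterns. The sketch below is a loose illustration; the wording is hypothetical and not the prompts the study's systems actually used.

```python
# Two contrasting prompt patterns, loosely inspired by RecommendAI and
# ExtendAI. The wording is hypothetical, not what the study's systems used.

RECOMMEND_STYLE = (
    "You are an investment assistant. Given the portfolio and goals below, "
    "recommend an allocation and briefly justify it."
)

EXTEND_STYLE = (
    "You are an investment assistant. The user will state their own reasoning "
    "about an allocation. Do not propose your own plan first; instead, give "
    "feedback on their reasoning: note strengths, gaps, and alternative "
    "perspectives they may have missed."
)

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Assemble a chat-completion style message list."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

# Usage: pass build_messages(EXTEND_STYLE, "<the user's stated reasoning>")
# to any chat-completion client.
```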
Are we on track? Making meetings better with AI

Meetings are often criticized for being ineffective. While this is sometimes due to poor practices—such as weak agendas, late starts, and unclear facilitation—we believe the deeper issue is a lack of meeting intentionality: knowing why a meeting is occurring and keeping the discussion focused on that purpose. A key challenge is maintaining goal clarity throughout a meeting. In the paper "Are We On Track? AI-Assisted Goal Reflection During Meetings," we explore how AI tools can improve meetings in real time by encouraging reflection: awareness of the meeting's goals and of how well the current conversation aligns with them.

Our study with 15 knowledge workers examined two AI-driven design paradigms: passive goal assistance through ambient visualization (a live chart displaying how conversational topics relate to meeting objectives) and active goal assistance through interactive questioning (nudging participants to consider whether the current conversation aligns with the meeting objectives). These approaches are illustrated in Figure 2.

Figure 2. Technology prototypes exploring passive and active ways to keep meetings focused on established objectives.

Recommendations

The findings highlight AI's potential to help teams stay aligned with meeting objectives. We found three key design tradeoffs between passive and active support, and based on these we offer the following AI design recommendations.

Information balance. There is a tradeoff between ambient visualizations in the passive approach, which can risk information overload, and interactive questioning in the active approach, which may lack detail. To be effective, AI should deliver the right amount of information at the right time and tailor content to the individuals who need it most, offering meaningful and timely support for reflection without overwhelming users.

Balance of engagement versus interruption. When participants are deeply engaged in discussion, significant interruptions can overwhelm them and disrupt the flow. Conversely, during moments of confusion or misalignment, subtle cues may be insufficient to get the team back on track. AI systems should dynamically adjust their level of intervention, from ambient and lightweight to more direct, escalating or de-escalating based on timing thresholds that can be customized for each team.

Balance of team versus individual goal awareness. AI assistance can nudge team action, such as adjusting agendas. These effects were stronger with the active approach, which required group responses, while the passive approach supported individual thinking without directly influencing team behavior. Team-wide engagement depends on both the visibility of AI cues and how they are introduced into the discussion.

This study helps us understand how AI design choices can support intentionality during meetings and enhance productivity without disrupting natural workflows.
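An ambient display like the passive prototype needs some running measure of how the conversation relates to the stated goals. One simple way to approximate such a measure, offered here only as an illustration and not as the prototype's actual method, is embedding similarity between a window of recent utterances and each objective.

```python
import numpy as np

# Toy approximation of topic-to-goal alignment for an ambient meeting display.
# embed() is a placeholder for any sentence-embedding model; the cosine-
# similarity approach is our illustration, not the prototype's actual method.

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in a sentence-embedding model")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def goal_alignment(recent_utterances: list[str], objectives: list[str]) -> dict[str, float]:
    """Score how strongly the last few utterances relate to each objective."""
    window = embed(" ".join(recent_utterances))
    return {obj: cosine(window, embed(obj)) for obj in objectives}

# A live chart could plot these scores over time; a score staying low for
# every objective is a cue that the discussion may have drifted.
```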
Encouraging diverse problem-solving brainstorming with AI

Diverse perspectives drive creative problem-solving in organizations, but individuals often lack access to varied viewpoints. In the paper "YES AND: An AI-Powered Problem-Solving Framework for Diversity of Thought," we build on the idea of "design improv" to explore a multi-agent AI prototype that simulates conversations with persona-based agents representing a range of expertise. The agents follow a classic model of conversational turn-taking, combined with a confidence model that determines when an agent takes or responds to a turn (a toy sketch of this turn-taking loop appears at the end of this post). This allows both the agents and the user to organically build on each other's ideas and ask clarifying questions. The system enables free-flowing, multi-party idea generation while avoiding common pitfalls of group brainstorming, such as social loafing, production blocking, and groupthink (Figure 3).

Figure 3. The YES AND system supports conversational turn-taking among agents and the user to generate ideas around a problem.

At the end of a session, an AI agent called Sage distills the discussion, leaving it to the user to develop a conclusive approach to the problem. In this way, YES AND helps unblock forward momentum in problem-solving while preserving the agency of knowledge workers to shape their own ideas.

We believe the best way to advance next-generation tools for thought is by bringing together a wide range of perspectives and approaches. Besides our four papers, the fifth cornerstone of our CHI presence this year is our workshop on April 26, co-organized with collaborators from industry and academia: Tools for Thought: Research and Design for Understanding, Protecting, and Augmenting Human Cognition with Generative AI. In this session, over 60 researchers, designers, practitioners, and provocateurs will gather to examine what it means to understand and shape the impact of AI on human cognition. Together, we'll explore how AI is changing workflows, the opportunities and challenges for design, and which theories, perspectives, and methods are increasingly relevant—or still need to be developed. The enthusiastic response to this workshop highlights the growing interest in AI's role in human thought. Our goal is to foster a multidisciplinary community dedicated to ensuring that AI not only accelerates work but also strengthens our ability to think critically, creatively, and strategically. We look forward to ongoing discussions, new collaborations, and the next wave of innovations in AI-assisted cognition at CHI 2025.
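As promised above, here is a toy sketch of confidence-gated turn-taking among persona agents. The confidence heuristic, threshold, and class names are invented for illustration and do not reflect the actual YES AND implementation.

```python
import random

# Toy sketch of confidence-gated turn-taking among persona agents, as
# referenced in the YES AND description above. The heuristic, threshold,
# and class names are invented for illustration only.

class PersonaAgent:
    def __init__(self, name: str, expertise: set[str]):
        self.name = name
        self.expertise = expertise

    def confidence(self, last_utterance: str) -> float:
        # Heuristic: confidence rises when the utterance touches our expertise,
        # plus a little noise so turn order varies between runs.
        overlap = len(set(last_utterance.lower().split()) & self.expertise)
        return min(1.0, 0.2 + 0.3 * overlap + random.uniform(0.0, 0.2))

    def respond(self, last_utterance: str) -> str:
        return f"[{self.name}] Yes, and building on that..."

def next_turn(agents: list[PersonaAgent], last_utterance: str, threshold: float = 0.5):
    # Score every agent once, then let the most confident one speak if it
    # clears the threshold; otherwise the floor returns to the user.
    scored = [(agent.confidence(last_utterance), agent) for agent in agents]
    best_score, best_agent = max(scored, key=lambda pair: pair[0])
    if best_score >= threshold:
        return best_agent.respond(last_utterance)
    return None
```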
-
WWW.MICROSOFT.COM
Empowering patients and healthcare consumers in the age of generative AI

Transcript

[MUSIC]

[BOOK PASSAGE] "In healthcare settings, keeping a human in the loop looks like the solution, at least for now, to GPT-4's less-than-100% accuracy. But years of bitter experience with 'Dr. Google' and the COVID 'misinfodemic' show that it matters which humans are in the loop, and that leaving patients to their own electronic devices can be rife with pitfalls. Yet because GPT-4 appears to be such an extraordinary tool for mining humanity's store of medical information, there's no question members of the public will want to use it that way—a lot." [END OF BOOK PASSAGE]

[THEME MUSIC]

This is The AI Revolution in Medicine, Revisited. I'm your host, Peter Lee. Shortly after OpenAI's GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong? In this series, we'll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.

[THEME MUSIC FADES]

The passage I read at the top there is from Chapter 5, "The AI-Augmented Patient," which Carey wrote. People have forever turned to the internet and sites like WebMD, Healthline, and so on to find health information and advice. So it wouldn't be too surprising to witness a significant portion of people refocus those efforts around tools and apps powered by generative AI. Indeed, when we look at our search and advertising businesses here at Microsoft, we find that healthcare is among the top three most common categories of consumer queries. When we envisioned AI's potential impact on the patient experience in our book, we suggested that it could be a lifeline, especially for those without easy access to adequate healthcare; a research partner to help people make sense of existing providers and treatments; and maybe even a third member of a care team that has traditionally been defined by the doctor-patient relationship. This also could have a huge impact on venture capitalists in the tech sector who traditionally have focused on consumer-facing technologies.

In this episode, I'm pleased to welcome Dave deBronkart and Christina Farr. Dave, known affectionately online as "e-Patient Dave," is a world-leading advocate for empowering patients. Drawing on his experience as a survivor of stage 4 cancer, Dave gave a viral TED talk on patient engagement and wrote the highly rated book Let Patients Help! Dave was the Mayo Clinic's visiting professor in internal medicine in 2015, has spoken at hundreds of conferences around the globe, and today runs the Patients Use AI blog on Substack.

Chrissy puts her vast knowledge of the emerging digital and health technology landscape to use as a managing director with Manatt Health, a company that works with health systems, pharmaceutical and biotech companies, government policymakers, and other stakeholders to advise on strategy and technology adoption with the goal of improving human health. Previously, she was a health tech reporter and on-air contributor for CNBC, Fast Company, Reuters, and other renowned news organizations and publications.
Hardly a week goes by without a news story about an ordinary person who managed to address their health problems—maybe even save their own life or the lives of their loved ones, including in some cases their pets—through the use of a generative AI system like ChatGPT. And if it's not doing something as dramatic as getting a second opinion on a severe medical diagnosis, the empowerment that people feel when an AI can help decode an indecipherable medical bill or report, or offer advice on what to ask a doctor—well, those things are both meaningful and a daily reality in today's AI world. And make no mistake—such consumer empowerment could mean business, really big business, and this means that investors in new ventures are smart to be taking a close look at all this. For these and many other reasons, I am thrilled to pair the perspectives offered by e-Patient Dave and Chrissy Farr together for this episode.

Here is my interview with Dave deBronkart:

LEE: Dave, it's just a thrill and honor to have you join us.

DAVE DEBRONKART: It's a thrill to be alive. I'm really glad that good medicine saved me, and it is just unbelievable, fun, and exciting and stimulating to be in a conversation with somebody like you.

LEE: Likewise. Now, we're going to want to get into both the opportunities and the challenges that patients face. But before that, I want to delve a little bit more into you yourself. I, of course, know you as this amazing speaker and advocate for patients. But you have actually had a pretty long career and history prior to all this. And so can you tell us a little bit about your background?

DEBRONKART: I'll go back all the way to when I first got out of college. I didn't know what I wanted to do when I grew up. So I got a job where I … basically, I used my experience working on the school paper to get a temporary job. It was in typesetting, if you can believe that. [LAUGHTER] And, man, a few years later, that became the ultimate lesson in disruptive innovation.

LEE: So you were actually doing movable type? Setting type?

DEBRONKART: Oh, no, that was, I was … I'm not that old, sir! [LAUGHTER] The first place where I worked, they did have an actual Linotype machine and all that.

LEE: Wow.

DEBRONKART: Anyway, one thing led to another. A few years after I got that first job, I was working for the world's biggest maker of typesetting machines. And I did product marketing, and I learned how to speak to audiences of all different sorts. And then desktop publishing came along, as I say. And it's so funny because, now mind you, this was 10 years before Clay Christensen wrote The Innovator's Dilemma. But I had already lived through that because here we were. We were the journeymen experts in our noble craft that had centuries of tradition as a background. Is this reminding you of anything? [LAUGHTER] Well, seriously. And then along comes stuff that can be put in the hands of the consumers. And I'll tell you what, people like you had no clue how to use fonts correctly. [LAUGHTER] We were like Jack Nicholson, saying "You can't handle the Helvetica! You don't know what you're doing!" But what happened then, and this is really relevant, what happened then is—all of a sudden, the population of users was a hundred times bigger than the typesetting industry had ever been. The clueless people gained experience, and they also started expressing what they wanted the software to be. The important thing is today everybody uses fonts. It's no longer a secret profession.
Things are done differently, but there is more power in the hands of the end user.

LEE: Yeah, I think it's so interesting to hear that story. I didn't know that about your background. And I think it sheds some light on hopefully what will come out later, as you have become such, I would call you a fierce consumer advocate.

DEBRONKART: Sure, energetic, however, whatever you want to call it, sure. [LAUGHTER] Seriously, Peter, what I always look to do … so this is a mixture of my having been run over by a truck during disruptive innovation, all right, but then also looking at that experience from a marketing perspective: how can I convey what's happening in a way that people can hear? Because you really don't get much traction as an advocate if you come in and say, you people are messed up.

LEE: Right. So, now I know this gets into something fairly personal, but you've actually been remarkably public about this. You became very ill.

DEBRONKART: Yes.

LEE: And of course, I suspect some of the listeners to this podcast probably have followed your story, but many have not. So can we go a little bit through that …

DEBRONKART: Sure.

LEE: … just to give our listeners a sense of how this has formed some of your views about the healthcare system.

DEBRONKART: So late in 2006, I went in for my annual physical with my deservedly famous primary care physician, Danny Sands at Beth Israel [Deaconess Medical Center] in Boston. And in the process—I had moved away for a few years, so I hadn't seen him for a while—I did something unusual. I came into the visit with a preprinted letter with 13 items I wanted to go over with him.

LEE: What made you do that? Why did you do that?

DEBRONKART: I have always been, even before I knew the term existed, an engaged patient, and I also very deeply believe in partnership with my physicians. And I respected his time. I had all these things, because I hadn't seen him for three years …

LEE: Yeah.

DEBRONKART: … all these things I wanted to go through. To me it was just, if I walked into a business meeting with a bunch of people that I hadn't seen for three years and I wanted to get caught up, I'd have an agenda.

LEE: It's so interesting to hear you say this because I'm very similar to you. I like to do my own research. I like to come in with checklists. And do you ever get a sense, like I do, that sometimes that makes your doctor a little uncomfortable?

DEBRONKART: [LAUGHS] Well, you know, sometimes it does make some doctors uncomfortable, and that touches on something that right now is excruciatingly important in the culture change that's going on. I've spent a lot of time, as I worked on the culture change from the patient side, wanting to empathize, to understand what's going on in the doctor's head. Most doctors are not trained, in medical school or later, in how to work with a patient who behaves like you or me, you know? And in the hundreds of speeches that I've given, I've had quite a range of reactions from doctors afterwards. I've had doctors come up to me and say, "This is crap." I mean, right to my face, right. "I'll make the decisions. I'll decide what we're going to talk about." And now my thought is, OK, and you're not going to be my doctor.

LEE: Yeah.

DEBRONKART: I want to be responsible for how the time is spent, and I didn't want to be fumbling for words during the visit.

LEE: Right.

DEBRONKART: So I said, I've got, among other things … one of the 13 things was I had a stiff shoulder. So he ordered a shoulder x-ray, and I went and got the shoulder x-ray.
And I will never forget this. Nine o'clock the next morning, he called me, and I can still—this is burned into my memory—I can see the Sony desk phone with 0900 for the time. He said, "Dave, your shoulder's going to be fine. I pulled up the x-ray on my screen at home. It's just a rotator cuff thing, but Dave, something else showed up. There's something in your lung that shouldn't be there." And just by total luck, what turned out to be a metastasis of kidney cancer was in my lung next to that shoulder. He immediately ordered a CAT scan. Turned out there were five tumors in both lungs, and I had stage 4 kidney cancer.

LEE: Wow.

DEBRONKART: And on top of that, back then—so this was like January of 2007—back then, there was much less known about that disease than there is now.

LEE: Right.

DEBRONKART: There were no studies—zero research on people like me—but the best available study said that for somebody with my functional status, my median survival was 24 weeks. Half the people like me would be dead in five and a half months.

LEE: So that just, you know, I can't imagine, you know, how I would react in this situation. And what were your memories of the interaction then between you and your doctor? You know, how did your doctor engage with you at that time?

DEBRONKART: I have very vivid memories. [LAUGHS] Who was it? I can't remember what famous person said, "Nothing focuses the mind like the knowledge that one is to be hanged in a fortnight," right. But 24 weeks does a pretty good job of it. And I … just at the end of that phone call where he said I'm going to order a CAT scan, I said, "Is there anything I should do?" Like I was thinking, like, go home and make sure you don't eat this sort of this, this, that, or the other thing.

LEE: Right.

DEBRONKART: And what he said was, "Go home and have a glass of wine with your wife."

LEE: Yeah.

DEBRONKART: Boy, was that sobering. But then it's like, all right, game on. What are we going to do? What are my options? And a really important thing, and this, by the way, this is one reason why I think there ought to be a special department of hell for the people who run hospitals and other organizations where they think all doctors are interchangeable parts. All right. My doctor knew me.

LEE: Yeah.

DEBRONKART: And he knew what was important to me. So when the biopsy came back and said, "All right, this is definitely stage 4, grade 4 renal cell carcinoma." He knew me enough … he said, "Dave, you're an online kind of guy. You might like to join this patient community that I know of." This was 2007.

LEE: Yeah.

DEBRONKART: It's a good quality group. This organization that barely exists.

LEE: That's incredibly progressive, technologically progressive for that time.

DEBRONKART: Yeah, incredibly progressive. Now, a very important part of the story is this patient community is just a plain old ASCII listserv. You couldn't even do boldface, right. And this was when the web was … web 2.0 was just barely being created, but what it was, was a community of people who saw the problems the way I see the problems. God bless the doctors who know all the medical stuff, you know. And they know the pathology and the morphology and whatever it is they all know. And I'm making a point here of illustrating that I am anything but medically trained, right. And yet I still, I want to understand as much as I can.
I was months away from dead when I was diagnosed, but in the patient community, I learned that they had a whole bunch of information that didn't exist in the medical literature. Now today we understand there's publication delays; there's all kinds of reasons. But there's also a whole bunch of things, especially in an unusual condition, that will never rise to the level of deserving NIH [National Institutes of Health] funding, right …

LEE: Yes.

DEBRONKART: … and research. And as it happens, because of the experience in that patient community, they had firsthand experience at how to survive the often-lethal side effects of the drug that I got. And so I talked with them at length and during my treatment, while I was hospitalized, got feedback from them. And several years later my oncologist, David McDermott, said in the BMJ [British Medical Journal], he said, "You were really sick. I don't know if you could have tolerated enough medicine if you hadn't been so prepared." Now there is a case for action, for being actively involved, and pointing towards AI now, doing what I could to learn what I could despite my lack of medical education.

LEE: But as you were learning these things from this patient community, there had to be times when that came into conflict with the treatment plan you were under. That must have happened. So first off, did it? And how were those conflicts resolved?

DEBRONKART: So, yes, it did occasionally, because in any large population of people you're going to have differences of opinion. Now, before I took any action—and this closely matches the current thought of human in the loop, right—before I took any action based on the patient community, I checked with my clinicians.

LEE: Were there times when there were things that … advice you were getting from the patient community that you were very committed to, personally, but your official, formal caregivers disagreed with?

DEBRONKART: No, I can't think of a single case like that. Now, let me be clear. My priority was: save my ass, keep me alive, you know? And if I thought a stranger at the other end of an internet pipe had a different opinion from the geniuses at my hospital—who the whole patient community had said, this is maybe the best place in the world for your disease—

LEE: Yes.

DEBRONKART: I was not going to go off and have some philosophical debate about epistemology and all of that stuff. And remember, the clock was ticking.

LEE: Well, in fact, there's a reason why I keep pressing on this point. It's a point of curiosity, because in the early days of GPT-4, there was an episode that my colleague and friend Greg Moore, who's a neuroradiologist, had with a friend of his who became very ill with cancer. And she went in for treatment, and the treatment plan was a specific course of chemotherapy, but she disagreed with that. She wanted a different type of more experimental immunotherapy. And that disagreement became intractable, to the point that the cancer specialists who were assigned to treat her asked Greg, "Can you talk to her and explain, you know, why we think our decision is best?" And the thing that was remarkable is Greg decided to use that case as one of the tests in the early development days of GPT-4 and had a conversation to explain the situation. They went back and forth. GPT-4 gave some very useful advice to Greg on what to say and how to frame it. And then, when Greg finally said, "You know, thank you for the help." What floored both me and Greg is GPT-4 said, "You're welcome. But, Greg, what about you?
Are you getting all the support that you need? Here are some resources." And, you know, I think we can kind of take that kind of behavior for granted today, and there have been some published studies about the seeming empathy of generative AI. But in those early days, it was eerie, it was awe-inspiring, it was disturbing—you know, all of these things at once. And that's essentially why I'm so curious about your experiences along these lines.

DEBRONKART: That's like, that's the flip side of the famous New York Times reporter who got into a late-night discussion …

LEE: Oh, Kevin Roose, yes. [LAUGHTER]

DEBRONKART: You say you're happy in your marriage, but I think you're not.

LEE: Right.

DEBRONKART: It's like, whoa, this is creepy. But you know, it's funny because one of the things that's always intrigued me, partly because of my professional experience at explaining technology to people, is the early messaging around LLMs [large language models], which I still hear people … The people who say, "Well, wait a minute, these things hallucinate, so don't trust them." Or they say, "Look, all it's doing is predicting the next word." But there are loads of nuances, …

LEE: Yes.

DEBRONKART: …

LEE: Hmm, yes. Yeah.

DEBRONKART: … to be able to express that. Honestly, that is why I'm so excited about the arriving future. One immensely important thing … as I said earlier, I really respect my doctors' time—"doctors" plural—and it breaks my heart that the doctors who did all this work to get licensed and all that stuff are quitting the field because the economic pressures are so great. I can go home and spend as many hours as I want asking it questions.

LEE: Yes.

DEBRONKART: All right. I've recently learned a thing to do after I have one of these hours-long sessions: I'll say to it, "All right, so if I wanted to do this in a single-shot prompt, how would you summarize this whole conversation?" So having explored with no map, I end up with a perspective that it just helps me see the whole thing …

LEE: Yes. Yeah, that's brilliant.

DEBRONKART: … without spending a moment of the doctor's time.

LEE: Yeah, yeah. So when was the first time that you used, you know, generative AI?

DEBRONKART: It had to be February or March of whatever the first year was.

LEE: Yeah. And was it the New York Times article that piqued your interest?

DEBRONKART: Oh absolutely.

LEE: Yeah. And so what did you think? Were you skeptical? Were you amazed? What went through your mind?

DEBRONKART: Oh, no, no, no. It blew my mind. And I say that as somebody who emerged from the 1960s and '70s, one of the original people who knew what it was to have your mind blown back in the psychedelic era. [LAUGHTER] No, it blew my mind. And it wasn't just the things it said; it was the implications of the fact that it could do that. I did my first programming with BASIC or Fortran, I don't know, something in the mid-'60s, when I was still in high school. So I understand, well, you know, you got to tell it exactly what you want it to do or it'll do the wrong thing. So, yeah, for this to be doing something indistinguishable from thinking—indistinguishable from thinking—was completely amazing. And that immediately led me to start thinking about what this would mean in the hands of a sick person. And, you know, my particular area of fascination in medicine—everything I use it for these days is mundane—but the future of a new world of medicine and healthcare is one where I can explore and not be limited to things where you can read existing answers online.
LEE: Right. So if you had GPT-4 back in 2006, 2007, when you were first diagnosed with your renal cancer, how would things have been different for you? Would things have been different for you?

DEBRONKART: Oh, boy, oh, boy, oh, boy. This is going to have to be just a swag because, I mean, for it to—you mean, if it had just dropped out of thin air?

LEE: Yes. [LAUGHS]

DEBRONKART: Ah, well, that's … that's even weirder. First thing we in the patient community would have to do is figure out what this thing does …

LEE: Yeah.

DEBRONKART: … before we can start asking it questions. Now, Peter, a large part of my evangelism, you know, there's a reason why my book and my TED talk were titled "Let Patients Help." I really am interested in planting a thought in people's minds, and it's not covert. I come right out and say it in the title of the book, right, planting a thought that, with the passage of time, will hold up as a reasonable thing to do. And the same thing is true with AI. So … and I've been thinking about it that way from the very beginning. I never closed the loop on my cancer story. I was diagnosed in January, and I had my last drop of high-dose interleukin—experimental immunotherapy, right—in July. And that was it. By September, they said, looks like you beat it. And I was all done. And there's the question: how could it be that I didn't die? How could it be that valuable information could exist and not be in the minds of most doctors? Not be in the pages of journals? And if you think of it that way, along the way, I became a fan of Thomas Kuhn's famous book, The Structure of Scientific Revolutions.

LEE: Yes.

DEBRONKART: When something that the paradigm says could not happen does happen, then responsible thinkers have to say, the paradigm must be wrong. That's the stage of science that he called a crisis. So if something came along back in 2006, 2007, I would have to look at it and say, "This means we've got to rethink our assumptions."

LEE: Yes. You know, now with the passage of time, you know, over the last two years, we've seen so many stories like this, you know, where people have consulted AI for a second opinion, …

DEBRONKART: Sure.

LEE: … maybe uploaded their labs and so on and gotten a different diagnosis, a different treatment suggestion. And in several cases that have been reported, both in medical journals and in the popular press, it has saved lives. And then your point about communities: during the COVID pandemic, even doctors formed communities to share information. A very famous example is doctors turning to Facebook and Twitter to share that if they had a COVID patient in severe respiratory distress, sometimes they could avoid intubation by …

DEBRONKART: Pronation. Yeah.

LEE: … pronation. And things like this end up being, in a way, I think the way you're couching it, ways to work around the restrictions in the more formal healthcare system.

DEBRONKART: The traditional flow. Yes. And there is nothing like a forest fire, an emergency, an unprecedented threat to make people drop the usual formal pathways.

LEE: So, I'd like to see if we can impart from your wisdom and experience some advice for specific stakeholders. What do you say to a patient? What do you say to a doctor? What do you say to the executive in charge of a healthcare system? And then finally, what do you say to policymakers and regulators? So, let's start with patients.
DEBRONKART: So if you've got a problem or a question where you really want to understand more than you've been able to, then give these things a try. Ask some questions. And it's not just the individual question and answer. The famous, amazing patient advocate Hugo Campos …

LEE: Hmm, yes.

DEBRONKART: … said something that I call "Hugo's Law." He said, "Dave, I don't ask it for answers. I use it to help me think."

LEE: Yes, absolutely.

DEBRONKART: So you get an answer and you say, "Well, I don't understand this. What about that? Well, what if I did something different instead?" And never forget, you can come back three months later and say, "By the way, I just thought of something. What about that," right.

LEE: Yeah, yeah, fantastic.

DEBRONKART: So be focused on what you want to understand.

LEE: So now let's go to a doctor or a nurse. What's the advice there?

DEBRONKART: Please try to imagine a world … I know that most people today are not as activated as I am in wanting to be engaged in their health. But to a very large extent, a lot of people, family and friends, have said they don't want to do this because they don't want to offend the doctors and nurses. Now, even if the doctor or nurse is not being a paternal jerk, all right, the patients have a fear of this. Dr. Sands handles this brilliantly. I mentioned it in the book. He proactively asks, are there any websites you've found useful? And you can do the same thing with AI. Have you done anything useful with ChatGPT or something like that?

LEE: That actually suggests some curricular changes in medical schools in order to train doctors.

DEBRONKART: Absolutely. In November, I attended a retreat on rethinking medical education. I couldn't believe it, Peter. They were talking about how AI can be used in doing medical education. And I was there saying, "Well, hello. As long as we're here, let's rethink how you teach doctors, medical students, to deal with somebody like me." 'Cause what we do not want … There was just a study in Israel that said 18% of adults use AI regularly for medical questions, which matches other studies in the US.

LEE: Yep.

DEBRONKART: But it's 25% for people under 25. We do not want 10 years from now to be minting another crop of doctors who tell patients to stay off of the internet and AI.

LEE: You know, it's such an important point. Students, you know, entering into college to go on to medical school and then a residency and then finally into practice. I think you're thinking about the year 2035 or thereabouts. And when you think of that, at least in tech industry terms, we're going to be on Mars, we're going to have flying cars, we're going to have AGI [artificial general intelligence], and you really do need to think ahead.

DEBRONKART: Well, you know, healthcare, and this speaks to the problems that health system executives are facing: y'all better watch out or you're going to be increasingly irrelevant, all right. One of the key use cases, and I'm not kidding … I mean, I don't mean that if I have stage 4 kidney cancer, I'm going to go have a talk with my robot. But one of the key use cases that makes people sit down and try to solve a problem on their own with an LLM is if they can't get an appointment.

LEE: Yes.

DEBRONKART: Well, so let's figure out, can the health system, can physicians and patients learn to work together in some modified way? Nobody I know wants to stop seeing a doctor, but they do need to have their problems solved.

LEE: Yeah, yeah.
DEBRONKART: And there is one vitally important thing I want to … I insist that we get into this, Peter. In order for the AI to perform to the best of its contribution, it needs to know all the data.

LEE: Yes.

DEBRONKART: Well, and so does the patient. Another super-patient, James Cummings, has two kids with rare genetic mutations. He goes to four Epic-using hospitals. Those doctors can't see each other's data. So he compiles it, and he shows … the patient brings in the consolidated data.

LEE: Yes. Well, and I know this is something that you've really been passionate about and have testified before Congress on. But maybe then that leads to this fourth category of people who need advice, which are policymakers and regulators. What would you tell them?

DEBRONKART: It's funny, in our current political environment, there's lots of debates about regulation, more regulation, less regulation. I'm heavily in favor of the regulations that say, yeah, I gotta be able to see and download my damn data, as I'm famous for calling it. But what we need to do if we were to have any more regulations is just mandate that you can't keep the data away from people who need it. You can't when …

LEE: Yep.

DEBRONKART: OK, consider one of the most famous AI-using patients, this incredible woman, Courtney Hofmann, whose son saw 17 doctors over three years. She finally sat down one night and typed it all into GPT. She has created a startup to try to automate the process of gathering everyone's data.

LEE: Yes, yes. Yeah.

DEBRONKART: And I know people who have been trying to do this, and it's just really hard. Policy people should say, look, I mean, we know that American healthcare is unsustainable economically.

LEE: Yes.

DEBRONKART: And one way to take the pressure off the system—because it ain't the doctors' fault, because they're burned out and quitting—one way to take the pressure off is to put more data in the hands of the patients so that entrepreneurs can make better tools.

LEE: Yeah. All right. So, we've run out of time, but I want to ask one last provocative question to send us off. Just based on your life's experience, which I think is just incredible, and also your personal generosity in sharing your stories with such a wide audience, which is doing so much good in the world: do you see a future where AI effectively replaces human doctors? Do you think that's a world that we're heading towards?

DEBRONKART: No, no, no, no. People are always asking me this. I do imagine an increasing base, an increasing if … maybe there's some Venn diagram or something, where the number of things that I can resolve on my own will increase.

LEE: Mm-hmm. Yes.

DEBRONKART: And in particular, as the systems get more useful, and as I gain more savvy at using them and so on, there will be cases where I can get it resolved good enough before I can get an appointment, right. But I cannot imagine a world without human clinicians. Now, I don't know what that's going to look like, right.

LEE: Yes. [LAUGHS]

DEBRONKART: I mean, who knows what it's going to be. But I keep having … Hugo blogged this incredible vision of where his agentic AI will be looking at one of these consolidated blob medical records things, and so will his doctor's agentic AI.

LEE: Yes. Well, I think I totally agree with you. I think there'll always be a need and a desire for the human connection. Dave, this has been an incredible, really at times riveting conversation.
And as I said before, thank you for being so generous with your personal stories and with all the activism and advocacy that you do for patients.

DEBRONKART: Well, thank you. As I said at the beginning, I'm glad to be alive, and I'm really, really, really grateful to be given a chance to share my thoughts with your audience, because I really like super smart nerds. [LAUGHTER] No, well, no kidding. In preparing for this, I listened to a bunch of back podcast episodes, "Microsoft Research," "NEJM AI." They talk about things I do not comprehend, and don't get me started on quantum, right? [LAUGHTER] But I'm grateful, and I hope I can contribute some guidance on how to solve the problem of the person for whom the industry exists.

LEE: Yeah, you absolutely have done that. So thank you.

[TRANSITION MUSIC]

E-Patient Dave is so much fun to talk to. His words and stories are dead serious, including his openness about his struggles with cancer. But he just has a way of engaging with the world with such activism and positivity. The conversation left me, at least, with a lot of optimism about what AI will mean for the consumer. One of the key takeaways for me is Dave's point that sometimes informal patient groups have more up-to-date knowledge than doctors. One wonders whether AI will make these sorts of communities even more effective in the near future. It sure looks like it. And as I listen to Dave's personal story about his bout with cancer, it's a reminder that it can be lifesaving to do your own research, but ideally to do so in a way that also makes it possible to work with your caregivers. Healthcare, after all, is fundamentally a collaborative activity today.

Now, here's my conversation with Christina Farr:

LEE: Chrissy, welcome. I'm just thrilled that you've joined us here.

CHRISTINA FARR: Peter, I'm so excited to be here. Thanks for having me on.

LEE: One thing that our listeners should know is you have a blog called Second Opinion. It's something that I read religiously. And in one of the things you wrote a while ago, you raised some questions about how, as an investor or as a founder of a digital health company, if you don't use the words AI prominently, you will struggle to gain investment. So maybe we start there. You know, what are you seeing right now in the landscape of emerging digital health tech companies? What has been both the positive and negative impact of the AI craziness that we have in the world today on that?

FARR: Yeah, I think the title of that was something around the great AI capital incineration [LAUGHTER] that we were about to see. But I, you know, stand by it. I do think that we've sort of gone really deep into this hype curve with AI, and you see these companies really just sucking up the lion's share of venture capital investment. And what worries me is that … you know, it's really hard, and we know this from just, like, decades of being in the space, that tools are very hard to monetize in healthcare. Most of healthcare still today, where the revenue really is, is still in services. It's still in those kind of one-to-one interactions. And what concerns me is that we are investing in a lot of these AI tools that, you know, are intended to sell into the system. But the system doesn't yet know how to buy them and then, beyond that, how to really integrate them into the workflow.
So where I feel more enthusiastic, and this is a little bit against the grain of what a lot of VCs [venture capitalists] think, but I actually really like care delivery businesses that are fully virtual or hybrid and really using AI as part of their stack. And I think that improves really the style of medicine that they’re delivering and makes it far more efficient. And you start to see, you know, a real improvement in the metrics, like the gross margins of these businesses beyond what you would see in really traditional kind of care delivery. And because they are the ones that own the stack, they’re the ones delivering the actual care, … LEE: Right. FARR: … they can make the decision to incorporate AI, and they can bring in the teams to do that. And I feel like in the next couple of years, we’re going to see more success with that strategy than just kind of more tools that the industry doesn’t know what to do with. LEE: You know, I think one thing that I think I kind of learned or I think I had an inkling of it, but it was really reinforced reading your writings, as a techie, I, and I think my colleagues, tend to be predisposed to looking for silver bullets. You know, technology that really just solves a problem completely. And I think in healthcare delivery in particular, there probably aren’t silver bullets. And what you need to do is to really look holistically at things and your emphasis on looking for those metrics that measure those end-to-end outcomes. So at the same time, just in preparation for this discussion, I re-read your post about Flo (opens in new tab) being the first kind of unicorn women’s health digital tech startup. And there is actually a lot of very interesting AI technology involved there. So it can happen. How do you think about that? FARR: Yeah, I mean, I see a lot of AI across the board. And it’s real with some of these companies, whether it’s, you know, a consumer health app like Flo that, you know, is really focused on kind of period tracking. And AI is very useful there in helping women just predict things like their optimal fertility windows. And it’s very much kind of integrated very deeply into that solution. And they have really sophisticated technology. And you see that now as well with the kind of craze around these longevity companies, that there is a lot of AI kind of underlying these companies, as well, especially as they’re doing, you know, a lot of health tests and pulling in new data and providing access to that data in a way that, you know, historically patients haven’t had access to. And then I also see it with, you know, like I spoke about with these care delivery companies. I recently spent some time with a business called Origin (opens in new tab), for instance, which is in, you know, really in kind of women’s health, MSK [musculoskeletal], and that beachhead is in pelvic floor PT [physical therapy]. And for them, you know, it’s useful in the back office for … a lot of their PT providers are getting great education through AI. And then it’s also useful on the patient-facing side as they provide kind of more and more content for you to do exercises at home. A lot of that can be delivered through AI. So for some of these companies, you know, they look across the whole stack of what they’re providing, and they’re just seeing opportunities in so many different places for AI. And I think that’s really exciting, and it’s very, very real. And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications. 
There are definitely some really compelling AI tools, as well. I think companies like Nuance and like Abridge and that whole category of really kind of replacing human scribes with AI, like to me, that is a … that has been so successful because it literally is the pain point. It’s the pain point. You’re solving the pain point for health systems and physicians. Burnout is a huge problem. Documentation is a huge problem. So, you know, to say we’ve got this kind of AI solution, everybody’s basically on board—you know, as long as it works—[LAUGHTER] from the first meeting. And then the question becomes, which one do you choose? You know, that said, you know, to me, that’s sort of a standout area. I’m not seeing that everywhere. LEE: So there are like a bunch of things to delve into there. You know, since you mentioned the Nuance, the Dragon Copilot, and Abridge, and they are doing extremely well. But even for them, and this is another thing that you write about extensively, health systems have a hard time justifying investing in these technologies. It’s not like they’re swimming in cash. And so on that element of things, is there advice to companies that are trying to make technologies to sell into health systems? FARR: Yeah, I mean, I’ll give you something really practical on just that example specifically. So I spend a lot of time chatting with a lot of the health system CMIOs [chief medical informatics officers] trying to, you know, just really understand kind of their take. And they often tell me, “Look, you know, these technologies are not inexpensive, and we’ve already spent a boatload of money on the EHR [electronic health record], which continues to be expensive. And so we just don’t have a lot of budget.” And for them, I think the question becomes, you know, who within the clinical organization would benefit most from these tools? There are going to be progressive physicians that will jump on these on day one and start using them and really integrating them into the workflow. And there will be a subset that just wants to do things the way they always have done things. And you don’t want to pay for seats for everybody when there’s a portion that will not be using it. So I think that’s maybe something that I would kind of share with the startup crowd is just, like, don’t try to sell to every clinician within the organization. Not everybody is going to be, you know, a technology early adopter. Work with the health systems to figure out that cohort that’s likely to jump on board first and then kind of go from there. LEE: So now let me get back specifically to women’s health. I think your investing strategy has, I think it’s fair to say has had some emphasis on women’s health. And I would say for me, that has always made sense because if there’s one thing the tech industry knows how to do in any direct-to-consumer business is to turn engagement into dollars. And when you think about healthcare, there are very few moments in a person’s life when they have a lot of engagement with their own healthcare. But women have many. You mentioned period tracking, pregnancy, menopause. There are so many areas where you could imagine that technology could be good. At least that’s the way I would think about it, but does that make any sense to you, or do you have a different thought process? 
FARR: Oh, my god, I’ve been, I’m just nodding right now because I’ve been saying the same thing for years, [LAUGHS] that like, I think the, you know, the moments of what I call naturally high engagement are most interesting to me. And I think it’s why it’s been such a struggle with some of these companies that are looking at, you know, areas like or conditions like type two diabetes. I mean, it’s just so hard to try to change somebody’s behavior, especially through technology. You know, we’ve not kind of proven out that these nudges are really changing anybody’s mind about, you know, their day-to-day lifestyles. Whereas, you know, in these moments, like you said, of just like naturally high engagement … like it’s, you know, women’s health, you’re right, there’s a lot of them. Like if you’re pregnant, you’re very engaged. If you’re going through menopause, you’re very engaged. And I think there are other examples like this, you know, such as oncology. You get a cancer diagnosis, you’re very engaged. And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health. And, you know, one example I’ll give you in women’s health, I’m not invested in this company, sadly. They are called Midi Health (opens in new tab). And they’re really everywhere in the menopause area now, like, you know, the visit volume that they are seeing is just insane. You know, this is a population that is giant. It’s, like, one in two people are women. At some point, we pretty much all go through menopause, some people earlier, some later. And for a lot of us, it’s a really painful, disruptive thing to experience. And we tend to experience it at a moment when we actually have spending money. So it just ticks all the boxes. And yet I think because of the bias that we see, you know, in the venture land and in the startup world, we just couldn’t get on this opportunity for a really long time. So I’ve been very excited to see companies like that really have breakout success. LEE: First off, you know, I think in terms of hits and misses from our book. One hit is we did think a lot about the idea that patients directly would be empowered by AI. And, you know, we had a whole chapter on this, and it was something that I think has really turned out to be true, and I think it will become more true. But one big miss is we actually didn’t think about what we were just talking about, about like who and when would this happen? And the specific focus on women, women’s health, I think is something that we missed. And I think one of the reasons I sought you out for this conversation is if I remember your own personal history, you essentially transitioned from journalism to venture investing at about the same time that you yourself were having a very intense period of engagement with health because of your own pregnancy. And so if you don’t mind, I’d like to get into your own experience with healthcare through pregnancy, your own experiences raising children, and how that has informed your relationship with digital health and the investing and advising that you do today. FARR: Yeah, it’s great question. And I actually was somebody who, you know, wrote a lot while I was kind of on maternity leave about this experience because it was such a profound one. You know, I think the reason that pregnancy is so interesting to healthcare companies and systems is because really for a lot of women, it’s their first experience with the hospital. 
Most of us have never stayed in the hospital for any period of time until that moment. Both times I had C-sections, so I was there for a good three or four days. And, you know, I think it’s a really big opportunity for these systems, even if they lose money, many of them lose money on pregnancy, which is a whole different topic, but there is an opportunity to get a whole family on board and keep them kind of loyal. And a lot of that can come through, you know, just delivering an incredible service. Unfortunately, I don’t think that we are delivering incredible services today to women in this country. I see so much room for improvement. You know, you see, just look at the data. You see women, you know, still dying in childbirth in this country where in many other developed nations, that’s just no longer the case. LEE: Yeah. And what are, in your view, the prime opportunities or needs? What do we need to do if we have a focus on technology to improve that situation? FARR: Yeah, I mean, I think there’s definitely an opportunity for, you know, just digital technologies and for remote patient monitoring and just other forms of monitoring. I do think we should look at what other countries have done and really consider things like, you know, three days post-discharge, somebody comes to your home, you know, whether it’s to check on you from a healthcare perspective, both, you know, physical and mental health, but then also make sure that the environment is safe for both the mother and the baby. Simple things like that, that don’t even really require any technology. And then there’s certainly opportunities for new forms of, you know, diagnostic tests for things like preeclampsia, postpartum preeclampsia. We could definitely use some new therapeutics in this area. Then, you know, would love to kind of also touch on the opportunity in pediatrics because there I think is an ideal use case for AI. And that’s definitely my reality now. LEE: Well, fact, yeah, in fact, I hope I’m not delving into too many personal issues here. But I do remember, I think with your first child, which you had during the height of the COVID pandemic, that your child actually had COVID and actually even lost sense of taste and smell for a period. And, in our book, we had sort of theorized that people would turn possibly to AI for advice to understand what was going on. When you look broadly at the kinds of queries that come into a search engine or into something like ChatGPT or Copilot, you do see things along those lines. But at the same time, I had always thought people wouldn’t just use a raw chat bot for these things. People would want an app, perhaps powered by AI, that would be really designed for this. And yet somehow that seems not to be as widespread. FARR: Yeah. And I think the word app is a great one that I’d love to, you know, maybe interrogate a little bit because I think that we have been overly reliant on apps. I’ll give you an example. So in a pediatric space, I am a user of an app called Summer Health (opens in new tab) or it’s not an app. Sorry. It’s a text messaging service. [LAUGHTER] And this is the genius. So I just pick up my phone, and I text “Summer” and a pediatrician responds within a matter of minutes. And sometimes it’s a pediatric nurse, but it’s somebody who responds to me. And they say, oh, what’s going on? And I might say, OK, well, this week we had the norovirus. So these are the symptoms. And they might say, you know, I’d love to see an image or a video. And I can text that to them. 
And if a prescription is required, then that goes to a pharmacy near me through another digital application that’s really cool called Photon Health (opens in new tab), where my script is portable, so I can move it around based on what’s open. So, through this, I’m getting an incredible experience that’s the most convenient … LEE: Wow. FARR: I could ever ask for, and there is no app. [LAUGHS] And you could imagine the potential for AI. You know, a company like this is probably getting so many questions about a norovirus or COVID or RSV [Respiratory Syncytial Virus], and is, I’m sure, starting to think about kind of ways in which AI could be very useful in this regard. And you don’t need a pediatrician or pediatric nurse answering every question. Perhaps there’s like sophisticated triaging to determine which questions should go to the human expert. But, you know, again, back to this app question, like, I think we have too many. Like, it’s just … like from a user experience perspective, just having to find the app, log into the app. Sometimes there’s just layers of authentication. Then you have to remember your password. [LAUGHTER] And it’s just, you know, it’s just too many steps. And then there’s like 50 of them for all kinds of different things. LEE: Yes. Well, and you have to also go to an app store, download the thing. FARR: Go to the app store, download. It’s just too many steps. LEE: Yes. FARR: So, like, I, you know, I recognize that HIPAA exists. If there is any kind of claim involved, then, you know, you need an app because you got privacy to think about and compliance, but like, in … LEE: It’s so interesting to hear you say this because one thing that I’ve thought—and I’ve actually even expressed publicly in some venues—is one logical endpoint for AI as we understand it today is that apps become unnecessary. We might still have machines that, you know, you hold in the palm of your hand, but it’s just a machine that does what you want it to do. Of course, the business model implications are pretty profound. So for that particular text messaging service, do you understand what their business model is? You know, how are they sustaining themselves? FARR: Consumer, it’s all cash pay. It’s cash pay. You just pay a subscription. And, you know, there are certainly kind of privacy requirements, you know, related to kind of federal and state, but you could consent to be able to do something like this. And, you know, companies like this have teams of lawyers that kind of think through how do you make something like this happen. But it’s possible because of this cash pay element that really underlies that. And I think that is a growing trend. You know, I was literally sitting with a benefits consultant a few weeks ago, and he was saying to me, like, “I tell all my friends and family, just don’t use your insurance at all, unless it’s for like a very high price thing, like a medical procedure that’s expensive or a surgery.” He said, for everything else, I just pay cash. I pay cash for all my primary care. I pay cash for, you know, basic generic, you know, prescription medications that, you know, it’s like a few cents to manufacture. And I’m sort of getting there, too, where I just kind of increasingly am relying on cash pay. And I think that sort of opens up a world of opportunity for just innovation related to user experience that could really bring us to this place that you mentioned where there is no app. 
You literally just text or, you know, you use your voice, and you say, “I need a restaurant reservation,” and it’s done. LEE: Mm-hmm. Yeah. FARR: And it’s that simple, right? And the sort of appification of everything, you know, was an important kind of evolution or moment in technology that is undeniable. But I totally agree with you that I think we might be moving past that. LEE: On this idea of cash, there is a little bit of a fatigue, on the other hand, with—for consumers; let me just speak as a consumer—I can’t keep track anymore of all the subscriptions I have. And so are we just trading one form of, you know, friction for another? FARR: Yeah, that’s a great point. But there are things that, you know, I think there are those moments where you continue to pay a subscription because it’s just something that’s chronic. You know, it’s just relevant to you. You know, pediatrics is a great example. At some point, like I won’t need a pediatrician on demand, which is what I have now, maybe when my kids are a little older, and we’re not just a cesspool of various kind of viruses at home. [LAUGHTER] But again, back to your point about, you know, the sort of moments of just, like, natural engagement, I think there’s also a moment there … there are areas or parts of our lives where, like primary care, where it’s just more longitudinal. And it makes sense to pay on a kind of subscription basis. Like our system is messed up because there’s just messed up incentives, right. And a subscription to me is very pure. [LAUGHTER] Like it’s you’re just saying, “I’m paying for a service that I want and need.” And then the company is saying, “OK, let me make this service as efficient and great and affordable for you as I possibly can.” And to me, that’s like a very, like refreshing trade. And I feel the same way, by the way, in my media business, which, you know, definitely has a subscription element. And it just means a lot when someone’s willing to say like this content’s worth paying for. LEE: Yes. FARR: It doesn’t work for everything, but I think it works for things that, you know, have that long-term payoff. LEE: Yeah, I really love that. And if I have one regret about the chapter on kind of the consumer experience from our book—I think all of this seems obvious in retrospect—you know, I wish we had tried to understand, you know, this aspect of the consumer experience, that people might actually have just online experiences that they would pay a monthly fee or an annual fee for. Because it also hits on another aspect of the consumer experience, which is this broad—it’s actually now a national issue in healthcare—question of price transparency. And this is another thing that I think you’ve thought about and written about, both the positives and negatives of this. I remember one blog post you made that talked about the issue of churn in digital health. And if I remember correctly, you weren’t completely certain that this was a good thing for the emerging digital health ecosystem. Can you say more about this idea of churn? FARR: Yeah, I mean, you know, I’ve been writing for a long time and thinking for a long time about the buyers of a lot of these kind of digital health companies, like who are the customers? And there was a long period where it was, it was really the self-insured employer, like Microsoft, being a sort of customer of these solutions because they wanted to provide a great array of health benefits for their own employees. 
And that was, you know, for a long time, like 10 or 15 years, you know, big companies that have now gone public, and it seemed like a faster timeline to be able to sell relative to health systems and, you know, health plans and other groups. And I’ve now kind of been on the forefront of saying that this channel is kind of dead. And one of the big reasons is just, you know, there’s no difference, I would say to what you see kind of in the payer lane, which is that churn is a big problem. People used to stay at jobs for 20, 30, 40 years, … LEE: Right. FARR: … and then you’d retire and have great benefits. And so it kind of made sense that your company was responsible for the healthcare that you received. And now I think the last time I looked at the Bureau of Labor Statistics, it’s around four years, a little bit less than four years. So what can you do in four years? [LAUGHS] I just read an interesting analysis on GLP-1s, these medications now that obviously are everywhere in tackling type two diabetes, and obesity is kind of the main, seems to be the hot use case. But, you know, I’m reading analysis around ROI that it’s 15, over 15 years, to see an ROI if you are, you know, a system or a plan or employer that chooses to pay for this. So how does that equate when you don’t keep an employee around for more than four? LEE: Yep. FARR: So I think it’s just left employers in a really bad place of having to make a bunch of tradeoffs and, you know, employees are demanding, we want access to these things. And they’re saying, well, our healthcare costs just keep going up and up and up. You know, we have inflation to contend with and we’re not seeing, you know, the analysis that it necessarily makes sense for us to do so. So that’s what I have, you know, been sort of harping on about with this churn issue that I’m seeing. LEE: Well, I have to tell you, it really, when I first started reading about this from you, it really had a profound impact on my thinking, my thought process. Because one of the things that we dream about is this idea that’s been present actually for decades in the healthcare world of this concept of real-world evidence, RWE. And that is this dream that now that we’ve digitized so much health experience, we should be able to turn all that digital data from people’s health experiences into new medical knowledge. But the issue of churn that I think that I would credit you introducing me to calls that into question because you’re right. Over a four-year period, you don’t get the longitudinal view of a person’s health that gives you the ability to get those medical insights. And so something needs to change there. But it’s very much tied to what consumers want to do. Consumers move around; they change jobs. FARR: Yes. LEE: If it’s cash-based, they’ll be shopping based on all sorts of things. And so it … FARR: And so the natural end of all this, it’s two words: single payer. [LAUGHS] But we don’t want to go there as a country. So, you know, it sort of left us in this kind of murky middle. And I think a lot about, kind of, what kind of system we’ll end up having. What I don’t think is possible is that this current one is sustainable. LEE: You know, I do think in terms of the payer of CMS [Centers for Medicare and Medicaid Services], Medicare and Medicaid services, the amount of influence that they exert on health spending in the US has been increasing steadily year by year. And in a sense, you could sort of squint and view that as a slow drift towards some element of single payer. 
But it’s definitely not so intentional or organized right now. While we’re talking about these sorts of trends, of course, another big trend is the graying of America. And we’re far from alone, China, and much of the Orient, Europe, UK, people are getting older. And from the consumer-patient perspective, this brings up the challenge, I think, that many people have in caring for elderly loved ones. And this seems to me, like women’s health, to be another area where if I were starting a new digital health company, I would think very seriously about that space because that’s another space where there can be extreme intensity of engagement with the healthcare system. Do you as both a human being and consumer but also as an investor, do you think about that space at all? FARR: Oh, yes, all the time. And I do think there’s incredible opportunity here. And it’s probably because of the same kind of biases that exist that, you know, didn’t allow us to see the menopause opportunity, I think we’re just not seeing this as being as big as it is. And like you said, it’s not just an American problem. It’s being felt across the world. And I do think that there are some, you know, I’ve seen some really interesting stuff lately. Was recently spending some time with a company called Cherish Health (opens in new tab) out of Boston, and they’re using AI and radar-based sensing technologies to just be able to stick a device and like really anywhere in the person’s home. And it just like passively is able to detect falls and also kind of monitor kind of basic health metrics. And because it’s radar, it can operate through walls. So even if you’re in the bathroom, it still works, which has been a big problem with a lot of these devices in the past. And then, you have to have really advanced kind of AI and, you know, this sort of technology to be able to glean whether it’s a true fall or, you know, that’s really, you need help or it’s, you know, just the person sitting down on the floor to play with their grandchild. So things like this are, they’re still early, but I think really exciting. And we’re going to see a lot more of that in addition to, you know, some really interesting companies that are trying to think more about sort of social needs that are not healthcare needs, but you know, this, this population needs care, like outside of just, you know, medical treatment. They oftentimes may be experiencing homelessness, they might experience food insecurity, there might be a lack of just caregivers in their life. And so, you know, there are definitely some really interesting businesses there, as well. And then kind of a, you know, another trend that I think we’ll see a lot more is that, you know, countries are freaking out about the lack of babies being born, which you need to be able to … you know, I recognize climate change is a huge issue, but you also need babies to be born to support this aging population. So I think we’re going to see, you know, a lot more interest from these administrations around, you know, both like child tax credits and various policies to support parents but then also IVF [in vitro fertilization] and innovation around technology in the fertility space. LEE: All right. So we’re starting to run towards the end of our time together. So I’d like to get into maybe a couple more provocative or, you know, kinds of questions. So first, and there’s one that’s a little bit dark and another that’s much lighter. So let me start with the darker one so we can have a chance to end on a lighter note. 
I think one of the most moving pieces I’ve read from you recently was the open letter to your kids about the assassination of Brian Thompson (opens in new tab), who’s a senior executive of UnitedHealth Group. And so I wonder if you’re willing to share, first off, what you wrote there and then why you felt it was important to do that. FARR: Yeah. So, you know, I thought about just not saying anything. That was my original intention because it was just, you know, that moment that it happened, it was just so hot button. And a lot of people have opinions, and Twitter was honestly a scary place, just with the things that people were saying about this individual, who, you know, I think just like had a family and friends and a lot of my network knew him and felt really personally impacted by this. And I, you know, it was just a really sad moment, I think, for a lot of reasons. And then I just kind of sat down one evening and I wrote this letter to my kids that basically tried to put a lot of this in context. Like what … why are people feeling this way about our healthcare system? You know, why was all this sort of vitriol being really focused on this one individual? And then, you know, I think one of the things I sort of argued in this letter was that there’s lots of ways to approach innovation in the space. You can do it from the outside in, or you can do it from the inside out. And I’ll tell you that a lot of like, I got a lot of emails that week from people who were working at health plans, like UnitedHealth employees, some of them in their 20s, you know, they were recent kind of grads who’d gone to work at this company. And they said, you know, I felt like I couldn’t tell my friends, kind of, where I worked that week. And I emailed back and said, “Look, you’re learning healthcare. You are in an incredible position right now. Like whether you choose to stay your current company or you choose to leave, like you, you understand like the guts and the bowels of healthcare because you’re working at the largest healthcare company in the world. So you’re in an enviable position. And I think you are going to be able to effect change, like, more so than anyone else.” And that was part of what I wrote in this letter, that, you know, we should all agree that the system is broken, and we could do better. Nothing about what happened was OK. And also, like, let’s admire our peers and colleagues that are going into the trenches to learn because I genuinely believe those are the people that, you know, have the knowledge and the contacts and the network to be able to really kind of get change moving along, such desperately needed change. LEE: All right. So now one thing I’ve been asking every guest is about the origin story with respect to your first encounter with generative AI. How did that happen, and what were your first sort of experiences like? You know, what emotionally, intellectually, what went through your mind? FARR: So probably my first experience was I was really struggling with the title for my book. And I told ChatGPT what my book was about and what I wanted the title to evoke and asked it for recommendations. And then, I thought the first, like, 20 were actually pretty good. And I was able to say, can you make it a bit more witty? Can you make it more funny? And it spat back out some quite decent titles. And then what was interesting is that it just got worse and worse, like, over time and just ended up, like, deeply cheesy. 
[LAUGHTER] And so it sort of both like made me think that this could be a really useful prompt for just brainstorming. But then either it does seem to be some weird thing with AI where, like the more you push it on the same question, it just, like, it doesn’t … it seems to have sparked the most creativity in the first few tries, and then it just gets worse. And maybe you know more about this than I would. You certainly know more about this than I do. But that’s been my kind of general experience of it thus far. LEE: Mm-hmm. But would you say you were more skeptical or awe-inspired? What were the emotions at that moment? FARR: Um, you know, it was better than, like, a lot of my ideas. [LAUGHTER] So I definitely felt like it was from that perspective very impressive. But then, you know, it seemed to have the same human, like I said, we all kind of run out of ideas at some point and, you know, it turns out, so do the machines. So that was interesting in and of itself. And I ended up picking, I think a title that was like sort of, you know, inspired by the AI suggestions, but definitely had its own twist that was my own. LEE: Well, Chrissy, I’ve never known you as someone who runs out of ideas, but this has been just great. As always, I always learn a lot when I have a chance to interact with you or read your writings. And so, thank you again for joining. Just really, really appreciate it. FARR: Of course, and next time I want to have you on my podcast because I have a million questions for you, too. LEE: Sure, anytime. FARR: Amazing. OK, I’ll hold you to that. Thanks so much for having me on. [TRANSITION MUSIC] LEE: I’ve always been impressed not only with Chrissy’s breadth and depth of experience with the emerging tech trends that affect the health industry, but also with her role as a connector to key decision-makers in nearly every sector of healthcare. This experience, plus her communication abilities, makes it no surprise that she’s sought out for help in a range of go-to-market, investor relations, social media, content development, and communications issues. Maybe it shouldn’t be a surprise, but one thing I learned from our conversation is that the business of direct-to-consumer health is still emerging. It’s far from mature. And you can see that Chrissy and her venture-investing colleagues are still trying to figure out what works. Her discussion, for example, on cash-only health delivery and the idea that consumers might not want another app on their phones were indicative of that. Another takeaway is that some areas, such as pre- and postnatal care, menopause, elder care, and other types of what the health industry might call subacute care are potentially areas where not only AI might find the most impact but also where there’s sufficient engagement by consumers to make it possible to sustain the business. When Carey, Zak, and I started writing our book, one of the things that we started off with was based on a story that Zak had written concerning his 90-year-old mother. And of course, as I had said in an earlier episode of this podcast, that was something that really touched me because I was having a similar struggle with my father, who at the time was 89 years old. One of the things that was so difficult about caring for my father is that he was living in Los Angeles, and I was living up in the Pacific Northwest. And my two sisters also lived far away from Los Angeles, being in Pittsburgh and in Phoenix. 
And so as the three of us, my two sisters and I, tried to navigate a fairly complex healthcare system involving a primary care physician for my father plus two specialists, I have to say over a long period of illness, a lot of things happened, including the fraying of relationships among the three siblings. What was so powerful for us, and this is where this idea of patient empowerment comes in, is when we could give all of the data, all of the reports from the specialist, from the primary care physician, other information, give it to GPT-4 and then just ask the question, “We’re about to have a 15-minute phone call with one of the specialists. What are the most important two or three things we should ask about?” Doing that just brings down the temperature, eliminates a potential source of conflict between siblings who are all just wanting to take care of their father. And so as we think about the potential of AI in medicine, this concept of patient empowerment, which, as we’ve learned in this episode, is still emerging, could in the long run be the most important impact of this new age of AI. [THEME MUSIC] I’d like to say thank you again to Dave and Chrissy for sharing their stories and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including a discussion on regulations, norms, and ethics developing around AI and health. We hope you’ll continue to tune in. Until next time. [MUSIC FADES]
-
WWW.MICROSOFT.COM
Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project
The Semantic Telemetry Project aims to better understand complex, turn-based human-AI interactions in Microsoft Copilot using a new data science approach. This understanding is crucial for recognizing how individuals utilize AI systems to address real-world tasks. It provides actionable insights, enhances key use cases, and identifies opportunities for system improvement. In a recent blog post, we shared our approach for classifying chat log data using large language models (LLMs), which allows us to analyze these interactions at scale and in near real time. We also introduced two of our LLM-generated classifiers: Topics and Task Complexity. This blog post will examine how our suite of LLM-generated classifiers can serve as early indicators for user engagement and highlight how usage and satisfaction vary based on AI and user expertise. The key findings from our research are:
- When users engage in more professional, technical, and complex tasks, they are more likely to continue utilizing the tool and increase their level of interaction with it.
- Novice users currently engage in simpler tasks, but their work is gradually becoming more complex over time.
- More expert users are satisfied with AI responses only where AI expertise is on par with their own expertise on the topic, while novice users have low satisfaction rates regardless of AI expertise.
Read on for more information on these findings. Note that all analyses were conducted on anonymous Copilot in Bing interactions containing no personal information. Classifiers mentioned in article:
- Knowledge work classifier: Tasks that involve creating artifacts related to information work, typically requiring creative and analytical thinking. Examples include strategic business planning, software design, and scientific research.
- Task complexity classifier: Assesses the cognitive complexity of a task if a user were to perform it without the use of AI. We group tasks into two categories: low complexity and high complexity.
- Topics classifier: A single label for the primary topic of the conversation.
- User expertise: Labels the user’s expertise on the primary topic within the conversation as one of the following categories: Novice (no familiarity with the topic), Beginner (little prior knowledge or experience), Intermediate (some basic knowledge or familiarity with the topic), Proficient (can apply relevant concepts from the conversation), and Expert (deep and comprehensive understanding of the topic).
- AI expertise: Labels the AI agent’s expertise based on the same criteria as user expertise above.
- User satisfaction: A 20-question satisfaction/dissatisfaction rubric that the LLM evaluates to create an aggregate score for overall user satisfaction.
What keeps Bing Chat users engaged?
We conducted a study of a random sample of 45,000 anonymous Bing Chat users during May 2024. The data was grouped into three cohorts based on user activity over the course of the month:
- Light (1 active chat session per week)
- Medium (2-3 active chat sessions per week)
- Heavy (4+ active chat sessions per week)
The key finding is that heavy users are doing more professional, complex work. We utilized our knowledge work classifier to label the chat log data as relating to knowledge work tasks. We found that knowledge work tasks appeared in all cohorts, with the highest percentage among heavy users.
Figure 1: Knowledge work based on engagement cohort
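To make the pipeline concrete, here is a minimal sketch in Python of two of the ingredients described above: assigning users to engagement cohorts and building a prompt for an LLM-generated task-complexity classifier. The cohort thresholds follow the definitions in this post, but the monthly-average aggregation, the prompt wording, and all function names are our own illustrative assumptions, not the project's actual implementation.

from statistics import mean

def engagement_cohort(weekly_sessions: list[int]) -> str:
    """Bucket a user by active chat sessions per week over the month.
    Light: ~1/week; Medium: 2-3/week; Heavy: 4+/week (per the post)."""
    avg = mean(weekly_sessions)  # assumption: average across the month
    if avg >= 4:
        return "heavy"
    if avg >= 2:
        return "medium"
    return "light"

# Hypothetical prompt for the task-complexity classifier: label the
# cognitive complexity of the task as if the user performed it without AI.
COMPLEXITY_PROMPT = """\
Classify the cognitive complexity of the user's task if they were to
perform it WITHOUT the use of AI. Respond with exactly one label:
low_complexity or high_complexity.

Conversation:
{conversation}
Label:"""

def build_complexity_prompt(conversation: str) -> str:
    return COMPLEXITY_PROMPT.format(conversation=conversation)

if __name__ == "__main__":
    print(engagement_cohort([1, 1, 2, 1]))  # -> light
    print(engagement_cohort([4, 5, 4, 6]))  # -> heavy
    print(build_complexity_prompt("User: Draft a sharding plan for our DB."))

In a real pipeline, the prompt would be sent to an LLM and the returned label aggregated per user alongside the cohort assignment; that wiring is omitted here.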
Analyzing task complexity, we observed that users with higher engagement performed the most high-complexity tasks, while users with lower engagement performed more low-complexity tasks.
Figure 2: High complexity and low complexity tasks by engagement cohort
Looking at the overall data, we can filter on heavy users and see higher numbers of chats where the user was performing knowledge work tasks. Based on task complexity, we see that most knowledge work tasks seek to apply a solution to an existing problem, primarily within programming and scripting. This is in line with our top overall topic, technology, which we discussed in the previous post.
Figure 3: Heavy users tree diagram
In contrast, light users tended to do more low complexity tasks (“Remember”), using Bing Chat like a traditional search engine and engaging more in topics like business and finance and computers and electronics.
Figure 4: Light users tree diagram
Novice queries are becoming more complex
We looked at Bing Chat data from January through August 2024 and classified chats using our User Expertise classifier. When we looked at how the different user expertise groups were using the tool for professional tasks, we discovered that proficient and expert users tend to do more professional tasks with high complexity in topics like programming and scripting, professional writing and editing, and physics and chemistry.
Figure 5: Top topics for proficient/expert users
Figure 6: Task complexity for proficient/expert users
Figure 7: Top topics for novices
In contrast, novice users engaged more in professional tasks relating to business and finance and education and learning, mainly using the tool to recall information.
Figure 8: Task complexity for novices
However, novices are targeting increasingly complex tasks over time. Over the eight-month period, we see the percentage of high complexity tasks rise from about 36% to 67%, revealing that novices are learning and adapting quickly (see Figure 9).
Figure 9: High complexity for novices, Jan-Aug 2024
How does user satisfaction vary according to expertise?
We classified both the user expertise and AI agent expertise for anonymous interactions in Copilot in Bing. We compared the level of user and AI agent expertise with our user satisfaction classifier. The key takeaways are:
- Experts and proficient users are only satisfied with AI agents of similar expertise (expert/proficient).
- Novices are least satisfied, regardless of the expertise of the AI agent.
Figure 10: Copilot in Bing satisfaction at the intersection of AI expertise and user expertise (August-September 2024)
Conclusion
Understanding these metrics is vital for grasping user behavior over time and relating it to real-world business indicators. Users are finding value in complex professional knowledge work tasks, and novices are quickly adapting to the tool and finding these high-value use cases. By analyzing user satisfaction in conjunction with expertise levels, we can tailor our tools to better meet the needs of different user groups. Ultimately, these insights can help improve user understanding across a variety of tasks. In our next post, we will examine the engineering processes involved in LLM-generated classification.
-
WWW.MICROSOFT.COM
Debug-gym: an environment for AI coding tools to learn how to debug code like programmers
The ongoing proliferation of AI coding tools is not only boosting developers’ efficiency, it also signals a future where AI will generate a growing share of all new code. GitHub CEO Thomas Dohmke (opens in new tab) predicted as much in 2023, when he said that “sooner than later, 80% of the code is going to be written by Copilot.” Both large and small software companies are already heavily using AI to generate code. Y Combinator’s Garry Tan (opens in new tab) noted that 95% of code for a quarter of Y Combinator’s latest batch of startups was written by large language models. Yet most developers spend the majority of their time debugging code, not writing it. As maintainers of popular open-source repositories, we feel this firsthand. But what if an AI tool could propose fixes for hundreds of open issues, and all we had to do was approve them before merging? This was what motivated us to maximize the potential time savings from AI coding tools by teaching them to debug code. By debugging, we mean the interactive, iterative process of fixing code. Developers typically hypothesize why their code crashed, then gather evidence by stepping through the program and examining variable values. They often use debugging tools like pdb (Python debugger) to assist in gathering information. This process is repeated until the code is fixed. Today’s AI coding tools boost productivity and excel at suggesting solutions for bugs based on available code and error messages. However, unlike human developers, these tools don’t seek additional information when solutions fail, leaving some bugs unaddressed, as you can see in this simple demo of how a mislabeled column stumps today’s coding tools (opens in new tab). This may leave users feeling like AI coding tools don’t understand the full context of the issues they are trying to solve.
Introducing debug-gym
A natural research question emerges: to what degree can LLMs use interactive debugging tools such as pdb? To explore this question, we released debug-gym (opens in new tab) – an environment that allows code-repairing agents to access tools for active information-seeking behavior. Debug-gym expands an agent’s action and observation space with feedback from tool usage, enabling setting breakpoints, navigating code, printing variable values, and creating test functions. Agents can interact with tools to investigate code or rewrite it, if confident. We believe interactive debugging with proper tools can empower coding agents to tackle real-world software engineering tasks and is central to LLM-based agent research. The fixes proposed by a coding agent with debugging capabilities, and then approved by a human programmer, will be grounded in the context of the relevant codebase, program execution, and documentation, rather than relying solely on guesses based on previously seen training data.
Figure 1: Diagram demonstrating the code-repairing process in outline. In most existing approaches (shown in black), an agent rewrites its code conditioned on error messages obtained from executing the code. debug-gym equips the agent with additional tools such as pdb (shown in red), so it can interactively seek necessary information from the semantic space hidden behind the code and therefore have better code-repairing performance.
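For readers less familiar with pdb, here is a small, self-contained illustration of the hypothesize-and-inspect workflow described above. The buggy script and the session transcript are our own example; only python -m pdb and the standard pdb commands (b, c, n, p, q) come from Python itself.

# buggy_stats.py -- a toy bug to step through interactively. Run with:
#     python -m pdb buggy_stats.py
def moving_average(xs, window):
    out = []
    for i in range(len(xs)):
        chunk = xs[i:i + window]
        out.append(sum(chunk) / window)  # bug: the tail chunk is shorter than `window`
    return out

if __name__ == "__main__":
    # Correct per-chunk averages would be [3.0, 5.0, 6.0];
    # the last value comes out wrong because chunk == [6] there.
    print(moving_average([2, 4, 6], window=2))

# A typical evidence-gathering session, rather than guessing at the fix:
#   (Pdb) b moving_average        # set a breakpoint on the function
#   (Pdb) c                       # continue until it is hit
#   (Pdb) n                       # step through the loop line by line
#   (Pdb) p i, chunk              # on the last pass: (2, [6])
#   (Pdb) p sum(chunk) / window   # 3.0 -- the divisor should be len(chunk)
#   (Pdb) q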
Debug-gym is designed and developed to:
- Handle repository-level information: the full repository is available to agents in debug-gym, allowing them to navigate and edit files.
- Be robust and safe: to safeguard both the system and the development process, debug-gym runs code within sandboxed Docker containers. This isolates the runtime environment, preventing harmful actions while still allowing thorough testing and debugging.
- Be easily extensible: debug-gym was conceived with extensibility in mind and provides practitioners with the possibility of easily adding new tools.
- Be text-based: debug-gym represents observation information in structured text (e.g., JSON format) and defines a simple syntax for text actions, making the environment fully compatible with modern LLM-based agents.
With debug-gym, researchers and developers can specify a folder path to work with any custom repository to evaluate their debugging agent’s performance. Additionally, debug-gym includes three coding benchmarks to measure LLM-based agents’ performance in interactive debugging: Aider for simple function-level code generation, Mini-nightmare for short, hand-crafted buggy code examples, and SWE-bench for real-world coding problems requiring a comprehensive understanding of a large codebase and a solution in the format of a GitHub pull request. To learn more about debug-gym and start using it to train your own debugging agents, please refer to the technical report (opens in new tab) and GitHub (opens in new tab).
Early experimentation: promising signal
For our initial attempt to validate that LLMs perform better on coding tests when they have access to debugging tools, we built a simple prompt-based agent and provided it with access to the following debug tools: eval, view, pdb, rewrite, and listdir. We used nine different LLMs as the backbone for our agent. Detailed results can be found in the technical report (opens in new tab). Even with debugging tools, our simple prompt-based agent rarely solves more than half of the SWE-bench Lite (opens in new tab) issues. We believe this is due to the scarcity of data representing sequential decision-making behavior (e.g., debugging traces) in the current LLM training corpus. However, the significant performance improvement (as shown in the most promising results in the graph below) validates that this is a promising research direction.
Figure 2: The success rate represents the percentage of the 300 SWE-bench Lite issues resolved. The green bars indicate the performance of the agent with debugging tools, while the gray bars show the performance of the agent without debugging tools. Note that both agents use the same backbone LLM to make decisions and propose code edits.
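To give a feel for the text-action, JSON-observation interface sketched above, here is a self-contained toy mock-up in Python. Everything below, including the class, the action syntax, and the scripted policy, is invented for illustration and is not debug-gym's real API; see the technical report and GitHub repository for the actual interface.

import json

class ToyDebugEnv:
    """Mock environment with one buggy file and three text actions."""
    def __init__(self):
        self.code = "def add(a, b):\n    return a - b  # bug\n"

    def step(self, action: str) -> dict:
        # Actions use a simple text syntax: "view", "eval", "rewrite <code>".
        if action == "view":
            return {"tool": "view", "observation": self.code}
        if action == "eval":
            ns: dict = {}
            exec(self.code, ns)  # run the current file contents
            ok = ns["add"](2, 3) == 5
            return {"tool": "eval", "observation": "pass" if ok else "fail: add(2, 3) != 5"}
        if action.startswith("rewrite "):
            self.code = action[len("rewrite "):]
            return {"tool": "rewrite", "observation": "ok"}
        return {"tool": "error", "observation": "unknown action"}

def scripted_policy(history: list) -> str:
    # Stand-in for an LLM agent: inspect, test, patch, re-test.
    plan = ["view", "eval", "rewrite def add(a, b):\n    return a + b\n", "eval"]
    return plan[len(history)]

env = ToyDebugEnv()
history: list = []
for _ in range(4):
    obs = env.step(scripted_policy(history))
    history.append(obs)
    print(json.dumps(obs))  # observations are structured text, JSON here

The point of the design is visible even in this toy: because observations are plain structured text, the transcript of actions and observations can be fed directly back into an LLM as conversational context at each step.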
Future work
We believe that training or fine-tuning LLMs can enhance their interactive debugging abilities. This requires specialized data, such as trajectory data that records agents interacting with a debugger to gather information before suggesting a fix. Unlike conventional reasoning problems, interactive debugging involves generating actions at each step that trigger feedback from the environment. This feedback helps the agent make new decisions, requiring dense data like the problem description and the sequence of actions leading to the solution. Our plan is to fine-tune an info-seeking model specialized in gathering the necessary information to resolve bugs. The goal is to use this model to actively build relevant context for a code generation model. If the code generation model is large, there is an opportunity to build a smaller info-seeking model that can provide relevant information to the larger one, e.g., a generalization of retrieval-augmented generation (RAG), thus saving AI inference costs. The data collected during the reinforcement learning loop to train the info-seeking model can also be used to fine-tune larger models for interactive debugging. We are open-sourcing debug-gym to facilitate this line of research. We encourage the community to help us advance this research towards building interactive debugging agents and, more generally, agents that can seek information by interacting with the world on demand.
Acknowledgements
We thank Ruoyao Wang for their insightful discussion on building interactive debugging agents, Chris Templeman and Elaina Maffeo for their team coaching, Jessica Mastronardi and Rich Ciapala for their kind support in project management and resource allocation, and Peter Jansen for providing valuable feedback for the technical report.
-
WWW.MICROSOFT.COM
Research Focus: Week of April 7, 2025
In this issue: We introduce a new dataset designed to assist renewable energy infrastructure planners, a new method for denoising MRI imagery, and an AI tool for analyzing distant galaxies. Check out our latest research and other updates.
NEW RESEARCH
Global Renewables Watch: A Temporal Dataset of Solar and Wind Energy Derived from Satellite Imagery
Siting renewable energy infrastructure requires careful consideration of the potential impact on ecosystems, cultural and historical resources, agriculture, and scenic landscapes. To help policymakers, researchers, and other stakeholders assess strategies for deployment, researchers from Microsoft, The Nature Conservancy (opens in new tab), and Planet (opens in new tab) present a comprehensive global temporal dataset of commercial solar photovoltaic (PV) farms and onshore wind turbines. The researchers built the dataset by training deep learning-based segmentation models on high-resolution satellite imagery and then deploying them on over 13 trillion pixels of images covering the world. The final spatial dataset includes 375,197 individual wind turbines and 86,410 solar photovoltaic installations. For each detected feature, they estimate the construction date and the preceding land use type, and aggregate their findings to the country level, along with estimates of total power capacity. Read the paper
NEW RESEARCH
SNRAware: Improved Deep Learning MRI Denoising with SNR Unit Training and G-factor Map Augmentation
This research proposes a new training method, SNRAware, to improve the ability of deep learning models to denoise MRI images, that is, to remove unwanted random variations from them. MRI images can suffer from high levels of noise when scanning is accelerated with parallel imaging or when data are acquired using lower-cost, low-field MRI systems. The researchers tested SNRAware on 14 different models, including ones based on transformer and convolutional architectures. The proposed training scheme improved the performance of all the tested models. This broad applicability means that the method is flexible and can be applied to different kinds of models without redesigning them. The testing showed SNRAware significantly improves the quality and clinical utility of MRI images while preserving important diagnostic details. Read the paper
NEW RESEARCH
Can AI unlock the mysteries of the universe?
Analyzing the physical properties of individual galaxies is a fundamental skill in astronomy. It requires a thorough understanding of galaxy formation theories and the ability to interpret vast amounts of observational data. However, even for seasoned astronomers, this process can be time-consuming and labor-intensive. To help astronomers accelerate this fundamental process, researchers from Microsoft and external colleagues introduce Mephisto, an AI tool designed to analyze extremely distant galaxies observed by the James Webb Space Telescope (JWST). Mephisto analyzes photometric data from distant galaxies, proposing physical models and interacting with Code Investigating Galaxy Emission (opens in new tab), a commonly used galaxy spectral simulation program. Mephisto detects discrepancies between models and observational data, identifies potential instrumental errors or limitations in the models, iteratively adjusts parameters, and generates multiple explanations for the observational data.
Read the article
APPLIED AI
Japan Airlines’ new AI app will make it easier for cabin attendants to report inflight events with Microsoft’s Phi-4 small language model
Japan Airlines (JAL) is using technology developed by Microsoft Research to deploy an AI app that helps flight crews communicate more effectively with ground staff when something unexpected comes up during a flight. The JAL-AI Report is being developed using Microsoft’s Phi-4 small language model (SLM), which requires less computing power than the large language models (LLMs) most generative AI tools run on, so it can be used offline on a device for specific tasks. Cabin attendants who have tried it say it can slash the time for writing operation reports by up to two thirds, say, from one hour to 20 minutes, or from 30 minutes to 10 for simpler cases. Read the story
Microsoft Research | In case you missed it
AI weather forecast project eyes access through desktop computers
Financial Times | March 20, 2025
Aardvark Weather uses AI to deliver accurate forecasts in just minutes from a desktop computer. Developed by scientists at the University of Cambridge, with support from the Alan Turing Institute, Microsoft Research, and the European Centre for Medium-Range Weather Forecasts, this technology is tens of times faster than existing methods and requires only a fraction of the computing power.
Director of Microsoft Research talks AI for science (what it really means)
The Deep View | March 11, 2025
Chris Bishop, Director, AI for Science, Microsoft Research, discusses what AI is doing for science. This interview dives into how AI is accelerating discovery of new techniques and findings, the benefits of foundation models like Aurora, MatterGen’s capabilities, and AI’s impact on scientists.
Microsoft’s Christopher Bishop: Scientific discovery is AI’s killer application
Financial Times | April 3, 2025
Christopher Bishop runs Microsoft’s AI for Science research unit, which applies the powerful technology to the natural sciences. Bishop sees the mission of the lab, which was founded in 2022, as accelerating scientific discovery using the technology. In this conversation with the Financial Times’ AI editor Madhumita Murgia, he explains why he believes scientific discovery will prove to be the single most important application of the technology.
Innovation to Impact (ft. Dr M – DGTL Voices with Ed Marx)
DGTL Voices with Ed Marx | March 12, 2025
Matthew Lungren, Chief Scientific Officer, Microsoft Health and Life Sciences, and Jonathan Carlson, Managing Director, Microsoft Health Futures, discuss AI’s transformative impact on radiology and the importance of collaboration in research and product development. They highlight how healthcare organizations can leverage Microsoft’s resources for innovation, emphasizing Microsoft’s progress in developing radiology-specific multimodal models and its broader work in healthcare.
Tech Life – The doctor will see you now
BBC Sounds | March 4, 2025
An update from the live trials in Ghana of Microsoft Research’s Holoportation 3D telemedicine technology.
BBC's Tech Life speaks to lead researcher Spencer Fowers, as well as a patient and doctor benefiting from the portable kit. Related video: 3D telemedicine offers help to sick Ghanaians in remote locations
Microsoft Unveils New AI Model to Edit Video Games
IEEE Spectrum | March 11, 2025
Lead researcher Katja Hofmann discusses Microsoft's Muse, a transformer model with 1.6 billion parameters trained on 500,000 hours of player data that can generate gameplay examples from a single screenshot.
National University of Singapore collaborates with Microsoft Research Asia to advance AI research and cultivate computing talent
NUS News | April 2, 2025
The National University of Singapore (NUS) has signed a five-year collaboration agreement with Microsoft Research Asia for a Joint PhD Supervision Program, bringing together NUS's academic and research excellence with Microsoft Research Asia's global leadership in AI, computing research, and industrial applications to cultivate talent. As part of this collaboration, NUS and Microsoft Research Asia will nurture PhD students through the Industrial Postgraduate Program, supported by the Singapore Economic Development Board (EDB). This initiative will help cultivate interdisciplinary, high-caliber tech professionals and drive the integration of AI technology across industries.
How Microsoft made it through 50 years
The Verge | April 4, 2025
A lot has changed since Microsoft was founded, but in many ways, the company's core business model and ethos remain the same: make software that everyone needs and get it installed everywhere. Adapting to change, including the ongoing AI transformation, has always played an important role in the company's success.
View more news and awards
WWW.MICROSOFT.COM Real-world healthcare AI development and deployment at scale
[THEME MUSIC FADES]
The passage I read at the top there is from Chapter 7 of the book, "The Ultimate Paperwork Shredder."
Paperwork plays a particularly important role in healthcare. It helps convey treatment information that supports patient care, and it's also used to help demonstrate that providers are meeting regulatory responsibilities, among other things. But if we're being honest, it's taxing, for everyone, and it's a big contributor to the burnout our clinicians are experiencing today. Carey, Zak, and I identified this specific pain point as one of the best early avenues to pursue as far as putting generative AI to good work in the healthcare space.
In this episode, I'm excited to welcome Dr. Matt Lungren and Seth Hain to talk about matching technological advancements in AI to clinical challenges, such as the paperwork crisis, to deliver solutions in the clinic and in the health system back office.
Matt is the chief scientific officer for Microsoft Health and Life Sciences, where he focuses on translating cutting-edge technology, including generative AI and cloud services, into innovative healthcare applications. He's a clinical interventional radiologist and a clinical machine learning researcher doing collaborative research and teaching as an adjunct professor at Stanford University. His scientific work has led to more than 200 publications, including work on new computer vision and natural language processing approaches for healthcare.
Seth is senior vice president of research and development at Epic, a leading healthcare software company specializing in electronic health record systems, also known as EHR, as well as other solutions for connecting clinicians and patients. During his 19 years at Epic, Seth has worked on enhancing the core analytics and other technologies in Epic's platforms as well as their applications across medicine.
I've had the pleasure of working closely with both Matt and Seth. Matt, as a colleague here at Microsoft, really focused on our health and life sciences business. And Seth, as a collaborator at Epic, as we embark on the questions of how to integrate and deploy generative AI into clinical applications at scale.
[TRANSITION MUSIC]
Here's my conversation with Dr. Matt Lungren:
LEE: Matt, welcome. It's just great to have you here.
MATTHEW LUNGREN: Thanks so much, Peter. Appreciate being here.
LEE: So, I'd like to just start just talking about you. You know, I had mentioned your role as the chief scientific officer for Microsoft Health and Life Sciences. Of course, that's just a title. So, what the heck is that? What is your job exactly? And, you know, what does a typical day at work look like for you?
LUNGREN: So, really what you could boil my work down to is essentially cross collaboration, right. We have a very large company, lots of innovation happening all over the place, lots of partners that we work with, and then obviously this sort of healthcare mission.
And so, what innovations, what kind of advancements are happening that can actually solve clinical problems, right, and sort of kind of direct that. And we can go into some examples, you know, later. But then the other direction, too, is important, right.
So, identifying problems that may benefit from a technologic application or solution and kind of translating that over into the, you know, pockets of innovation, saying, "Hey, if you kind of tweaked it this way, this is something that would really help, you know, the clinical world."
And so, it's really a bidirectional role. So, my day to day is every day is a little different, to be honest with you. Some days it's very much in the science and learning about new techniques. On the other side, though, it can be very much in the clinic, right. So, what are the pain points that we're seeing? Where are the gaps in the solutions that we've already rolled out? And, you know, again, what can we do to make healthcare better broadly?
LEE: So, you know, I think of you as a technologist, and, Matt, you and I actually are colleagues working together here at Microsoft. But you also do spend time in the clinic still, as well, is that right?
LUNGREN: You know, initially it was very much a non-negotiable for me in sort of taking an industry role. I think like a lot of, you know, physicians, we're torn with the idea of like, hey, I spent 20 years training. I love what I do, you know, with a lot of caveats there in terms of some of the administrative burden and some of the hassle sometimes. But for the most part, I love what I do, and there's no greater feeling than using something that you trained years to do and actually seeing the impact on a human life. It's unbelievable, right.
So, I think part of me was just, like, I didn't want to let that part of my identity go. And frankly, as I often say, to this day, I walk by a fax machine in our office today, like in 2025.
So just to be extra clear, it really grounds me in, like, yes, I love the possibilities. I love thinking about what we can do. But also, I have a very stark understanding of the reality on the ground, both in terms of the technology but also the burnout, right. The challenges that we're facing in taking care of patients have gotten, you know, much, much more difficult in the last few years, and, you know, I like to think it keeps my perspective, yeah.
LEE: You know, I think some listeners to this podcast might be surprised that we have doctors on staff in technical roles at Microsoft. How do you explain that to people?
LUNGREN: [LAUGHS] Yeah, no, yeah, it is interesting. I would say that, you know, from, you know, the legacy Nuance world, it wasn't so far-fetched that you have physicians that were power users and eventually sort of, you know, became, "Hey, listen, I think this is a strategic direction; you should take it," or whatever. And certainly maybe in the last, I want to say, five years or so, I've seen more and more physicians who have, you know, taken the time, sometimes on their own, to learn some of the AI capabilities, learn some of the principles and concepts; and frankly, some are, you know, even coding solutions and leading companies.
So, I do think that that has shifted a bit in terms of like, "Hey, doctor, this is your lane, and over here, you know, here's a technical person." And I think that's fused quite a bit more.
But yeah, it is an unusual thing, I think, in sort of how we've constructed what at least my group does. But again, I can't see any other way around some of the challenges.
I think, you know, an anecdote I'd like to tell you: when I was running the AIMI [Artificial Intelligence in Medicine and Imaging] Center, you know, we were bringing the medical school together with the computer science department, right, at Stanford.
And I remember one day a student, you know, very smart, came into my office, you know, on a clinical day or something, and he's like, "Is there just, like, a book or something where I can just learn medicine? Because, like, I feel like there's a lot of, like, translation you have to do for me."
It really raised an important insight, which is that you can learn the, you know, medicine, so to speak. You know, go to med school; you know, take the test and all that. But you don't really understand the practice of medicine until you are doing that.
And in fact, I even push it a step further to say after training, those first two or three years where you are the responsible person; you can turn around, and there's no one there. Like, you are making a decision. Getting used to that, and then having a healthy respect for that, actually I think provides the most educational value of anything in healthcare.
LEE: You know, I think what you're saying is so important as I reflect on my own journey. Of course, I'm a computer scientist. I don't have medical training, although at this point, I feel confident that I could pass a Step 1 medical exam.
LUNGREN: I have no doubt. [LAUGHS]
LEE: But I think that the tech industry, because of people like you, has progressed tremendously in having a more sophisticated and nuanced understanding of what actually goes on in clinic and also what goes on in the boardrooms of healthcare delivery organizations. And of course, at the end of the day, I think that's really been your role.
So roughly speaking, your job as an executive at a big tech company has been to understand what the technology platforms need to be, particularly with respect to machine learning, AI, and cloud computing, to best support healthcare. And so maybe let's start pre-GPT-4, pre-ChatGPT, and tell us a little bit, you know, about maybe some of your proudest moments in getting advanced technologies like AI into the clinic.
LUNGREN: You know, when I first started, so remember, like, you go all the way back to about 2013, right, my first faculty job, and, you know, we're building a clinical program and I, you know, I had a lot of interest in public health and building large datasets for pop [population] health, etc. But I was doing a lot of that, you know, sort of labeling to get those insights manually, right. So, like, I was the person that you'd probably look at now and say, "What are you doing?" Right?
So but I had a completely random encounter with Andrew Ng, who I didn't know at the time, at Stanford. And I, you know, went to one of the seminars that he was holding at the Gates building, and, you know, they were talking about their performance on ImageNet. You know, cat and dog and, you know, tree, bush, whatever. And I remember sitting in kind of the back, and I think I maybe had my scrubs on at the time and just kind of like, what? Like, why like, this we could use this in healthcare, you know. [LAUGHS]
But for me, it was a big moment. And I was like, this is huge, right. And as you remember, the deep learning really kind of started to show its stuff with, you know, Fei-Fei Li's ImageNet stuff.
So anyway, we started the collaboration that actually became a nidus. And one of the first things we worked on, we just said, "Listen, one of the most common medical imaging examinations in the world is the chest x-ray." Right? Two, three billion are done every year in the world, and so is that not a great place to start?
And of course, we had a very democratizing kind of mission.
As you know, Andrew has done a lot of work in that space, and I had similar ambitions. And so, we really started to focus on bringing the, you know, the sort of the clinical and the CS together and see what could be done.
So, we did CheXNet. And remember, this is around the time when, like, Geoffrey Hinton was saying things like we should stop training radiologists, and all this stuff was going on. [LAUGHTER] So there's a lot of hype, and this is the narrow AI days, just to remind the audience.
LEE: How did you feel about that since you are a radiologist?
LUNGREN: Well, it was so funny. So, Andrew is obviously very prolific on social media, and I was, who am I, right? So, I remember he tagged me. Well, first he said, "Matt, you need to get a Twitter account." And I said OK. And he tagged me on the very first post of our, what we call, CheXNet that was kind of like the "Hello, World!" for this work.
And I remember it was a clinical day. I had set my phone, as you do, outside the OR. I go in. Do my procedure. You know, hour or so, come back, my phone's dead. I'm like, oh, that's weird. Like, I had a decent charge. So, you know, I plug it in. I turn it on. I had like hundreds of thousands of notifications because Andrew had tweeted out to his millions or whatever about CheXNet.
And so, then of course, as you point out, I go to RSNA that year, which is our large radiology conference, and that Geoffrey Hinton quote had come out. And everyone's looking at me like, "What are you doing, Matt?" You know, like, "Are you coming after our specialty?" I'm like, "No, no, that's, [LAUGHS] you know, it's a way to interpret it, but you have to take a much longer horizon view," right.
LEE: Well, you know, we're going to, just as an enticement for listeners to this podcast to listen to the very end, I'm going to pin you down toward the end on your assessment of whether Geoffrey Hinton will eventually be proven right or not. [LAUGHTER] But let's take our time to get there.
Now let's go ahead and enter the generative AI era. When we were first exposed to what we now know of as GPT-4, this was before it was disclosed to the world, a small number of people at Microsoft and Microsoft Research were given access in order to do some technical assessment.
And, Matt, you and I were involved very early on in trying to assess what might this technology mean for medicine.
LUNGREN: It was the weirdest thing, Peter. Like, I joined that summer, so the summer before, you know, the actual GPT came out. I had literally no idea what I was getting into.
So, I started asking it questions, you know, kind of general stuff, right. Just, you know, I was like, oh, all right, it's pretty good. And so, then I would sort of go a little deeper. And eventually I got to the point where I'm asking questions that, you know, maybe there's three papers on it in my community, and remember I'm a sub-sub specialist, right, pediatric interventional radiology. And the things that we do in vascular malformations and, you know, rare cancers are really, really strange and not very commonly known.
And I kind of walked away from that, first I said, can I have this thing, right? [LAUGHS]
But then I, you know, I don't want to sound dramatic, but I didn't sleep that well, if I'm being honest, for the first few nights.
Partially because I couldn't tell anybody, except for the few that I knew were involved, and partially because I just couldn't wrap my head around how we went from what I was doing in LSTMs [long short-term memory networks], right, which was state-of-the-art-ish at the time for NLP [natural language processing].
And all of a sudden, I have this thing that broadly has, you know, domain-expert, you know, representations of knowledge that there's no way you could think would be in distribution for a normal approach to this.
And so, I really struggled with it, honestly. Interpersonally, like, I would be like, "Uh, well, let's not work on that." They're like, "Why not? You were just excited about it last week." I'm like, "I don't know. I think that we could think of another approach later."
And so yeah, when we were finally able to really look at some of the capabilities and really think clearly, it was really clear that we had a massive opportunity on our hands to impact healthcare in a way that was never possible before.
LEE: Yeah, and at that time you were still a part of Nuance. Nuance, I think, was in the process of being acquired by Microsoft. Is that right?
LUNGREN: That's right.
LEE: And so, of course, this was also a technology that would have profound and very direct implications for Nuance. How did you think about that?
LUNGREN: Nuance, for those in the audience who don't know, for 25 years was, sort of, the medical speech-to-text thing that all, you know, physicians used. But really the brass ring had always been, and I want to say going back to like 2013, 2014, Nuance had tried to figure out, OK, we see this pain point. Doctors are typing on their computers while they're trying to talk to their patients, right.
We should be able to figure out a way to get that ambient conversation turned into text that then, you know, accelerates the doctor, takes all the important information. That's a really hard problem, right. You're having a conversation with a patient about their knee pain, but you're also talking about, you know, their cousin's wedding and their next vacation and their dog is sick or whatever, and all that gets recorded, right.
And so, then you have to have the intelligence/context to be able to tease out what's important for a note. And then it has to be at the performance level that a physician, who, again, has 20 years of training and education plus a huge, huge, you know, need to get through their cases efficiently, will accept. That's a really difficult problem.
And so, for a long time, there was a human-in-the-loop aspect to doing this because you needed a human to say, "This transcript's great, but here's actually what needs to go on the note." And that can't scale, as you know.
When the GPT-4, you know, model kind of, you know, showed what it was capable of, I think it was an immediate light bulb, because there was no... you can ask any physician in your life, anyone in the audience, you know, what is the biggest pain point when you go to see your doctor? Like, "Oh, they don't talk to me. They don't look me in the eye. They're rushing around trying to finish a note."
If we could get that off their plate, that's a huge unlock, Peter. And I think that, again, as you know, it's now led to so much more. But that was kind of the initial, I think, reaction.
LEE: And so, maybe that gets us into our next set of questions, our next topic, which is about the book and all the predictions we made in the book.
Because Carey, Zak, and I actually did make a prediction that this technology would have a huge impact on this problem of clinical note-taking.
And so, you're just right in the middle of that. You're directly hands-on creating, I think, what is probably the most popular early product for doing exactly that. So, were we right? Were we wrong? What else do we need to understand about this?
LUNGREN: No, you were right on. I think in the book, I think you called it like a paper shredder or something. I think you used a term like that. That's exactly where the activity is right now and the opportunity.
I've even taken that so far as to say that when folks are asking about what the technology is capable of doing, we say, well, listen, it's going to save time before it saves lives. It'll do both. But right now, it's about saving time.
It's about peeling back the layers of the onion; that if you, you know, put me back where I started medicine in 2003, and then fast-forwarded and showed me a day in the life of 2025, I would be shocked at what I was doing that wasn't related to patient care, right. So, all of those layers that have been stacked up over the years, we can start finding ways to peel back. And I think that's exactly what we're seeing.
And to your point, I think you mentioned this, too, which is, well, sure, we can do this transcript, and we can turn that into a note, but then we can do other things, right. We can summarize that in the patient's language or education level of choice. We can pend orders. We can eventually get to a place of decision support. So, "Hey, did you think about this diagnosis, doctor?" Like those kinds of things.
And all those things, I think you highlighted beautifully, and again, it sounds like with, you know, a lot of, right, just kind of guesswork and prediction, but those things are actually happening every single day right now.
LEE: Well, so now, you know, in this episode, we're really trying to understand, you know, where the technology industry is in delivering these kinds of things. And so from your perspective, you know, in the business that you're helping to run here at Microsoft, you know, what are the things that are actually shipping as product versus things that clinicians are doing, let's say, off label, just by using, say, ChatGPT on their personal mobile devices, and then what things aren't happening?
LUNGREN: Yeah. I'll start with the shipping part because I think you, again, you know my background, right. Academic clinician, did a lot of research, hadn't had a ton of product experience.
In other words, like, you know, again, I'm happy to show you what benchmarks we beat or a new technique or, you know, get a grant to do all this, or even frankly, you know, talk about startups. But to actually have an audience that is accustomed to a certain level of performance for the solutions that they use, and to be able to deliver something new at that same level of expectation, wow, that's a big deal.
And again, this is part of the learning by, you know, kind of being around this environment that we have, which is we have this, you know, incredibly focused, very experienced clinical product team, right.
And then I think on the other side, to your point about the general-purpose aspect of this, it's no secret now, right, that, you know, this is a useful technology in a lot of different medical applications. And let's just say that there's a lot of knowledge that can be used, particularly by the physician community.
And I think the most recent survey I saw was from the British Medical Journal, which said, hey, you know, which doctors are using it, are you willing to tell us, you know, what you're doing? And it turns out that, what, 30% or so said that they were using it regularly in clinic [1]. And again, this is the general, this is the API or whatever off the shelf.
And then frankly, when they ask what they're using it for, it tends to be things like, "Hey, like, help me fill in my differential," or suggest... and to me, I think what that created, at least, and you're starting to see this trend really accelerate in the US especially, is, well, listen, we can't have everybody pulling out their laptops and potentially exposing, you know, patient information by accident or something to a public API.
We have to figure this out, and so brilliantly, I think NYU [New York University] was one of the first. Now I think there's 30-plus institutions that said, listen, OK, we know this is useful to the entire community in the healthcare space. Right?
We can't allow a loosey-goosey approach to this, right, given this sort of environment. So, what we'll do is we'll set up a HIPAA-compliant instance to allow anyone in the community, you know, in the health system, to use the models, and then whenever the newest model comes, it gets hosted, as well.
And what's cool about that, and that's happened now in a lot of places, is that, first of all, people get to use it and experiment and learn. But at the high level, they're actually seeing what the common use cases are. Because you could ask 15 people and you might get super long lists, and it may not help you decide what to operationalize in your health system.
LEE: But let me ask you about that. When you observe that, are there times when you think, "Oh, some specific use cases that we're observing in that sort of organic way need to be taken into specialized applications and made into products"? Or is it best to keep these things sort of, you know, open-chat-interface types of general-purpose platform?
LUNGREN: Honestly, it's both, and that's exactly what we're seeing. I'm most familiar with Stanford, kind of, the work that Nigam Shah leads on this. He basically, you know, there's a really great paper that is coming out in JAMA, basically saying, "Here's what our workforce is using it for. Here are the things in the literature that would suggest what would be popular."
And some of those line up, like helping with a clinical diagnosis or documentation, but some of them don't. But for the most part, the stuff that flies to the top, those are opportunities to operationalize and productize, etc. And I think that's exactly what we're seeing.
LEE: So, let's get into some of the specific predictions. We've, I think, beaten note-taking to death here. But there's other kinds of paperwork, like filling out prior authorization request forms or referral letters, an after-visit note or summary to give instructions to patients, and so on. And these were all things that we were making guesses in our book might be happening. What's the reality there?
LUNGREN: I've seen every single one of those. In fact, I've probably seen a dozen startups, too, right, doing exactly those things. And, you know, we touched a little bit on translation into the actual clinic.
And that's actually another thing that I used to kind of underappreciate, which is that, listen, you can have a computer scientist and a physician or nurse or whatever, like, give the domain expertise, and you think you're ready to build something.
The health IT [LAUGHS] is another part of that Venn diagram that's so incredibly critical, and then exactly how are you going to bring that into the system. That's a whole new ballgame.
And so I do want to do a callout, because the collaboration that we have with Epic is monumental, because here, you have the system of record that most physicians, at least in the US, use. And they're going to use an interface, and they're going to have an understanding of, hey, we know these are pain points, and so I think there are some really, really cool, you know, new innovations that are coming out of the relationship that we have with Epic. And certainly the audience may be familiar with those, that I think will start to knock off a lot of the things that you predicted in your book relatively soon.
LEE: I think most of the listeners to this podcast will know what Epic is. But for those that are unfamiliar with the health industry, and especially the technology foundation, Epic is probably the largest provider of electronic health record systems. And, of course, in collaboration with you and your team, they've been integrating generative AI quite a bit. Are there specific uses that Epic is making and deploying that get you particularly excited?
LUNGREN: First of all, the ambient note generation, by the way, is integrated into Epic now. So like, you know, it's not another screen, another thing for physicians. So that's a huge, huge unlock in terms of the translation.
But then Epic themselves, so they have, I guess, on the last roadmap that they talked [about], more than 60, but the one that's kind of been used now is this inbox response.
So again, maybe someone might not be familiar with why it's such a big deal. Well, if you're a physician, you already have, you know, 20 patients to see that day and you've got all those notes to do, and then Jevons paradox, right. So if you give me better access to my doctor, well, maybe I won't make an appointment. I'm just going to send him a note, and this is kind of this inbox, right.
So then at the end of my day, I've got to get all my notes done. And then I've got to go through all the inbox messages I've received from all of my patients and make sure that they're not, like, having chest pain and they're blowing it off or something.
Now that's a lot of work, and there's the cold start problem of, like, OK, I have to respond to them. So Epic has leveraged this system to say, "Let me just draft a note for you," understanding the context of, you know, what's going on with the patient, etc. And you can edit that and sign it, right. So you can accelerate some of those, so that's probably the one I'm most excited about. But there's so many right now.
LEE: Well, I think I need to let you actually state the name of the clinical note-taking product that you're associated with. Would you like to do that? [LAUGHS]
LUNGREN: [LAUGHS] Sure. Yeah, it's called DAX Copilot [2]. And for the record, it is the fastest-growing copilot in the Microsoft ecosystem. We're very proud of that. Five hundred institutions already are using it, and millions of notes have already been created with it. And the feedback has been tremendous.
LEE: So, you sort of referred to this a little bit, you know, this idea of AI being a second set of eyes.
So, a doctor makes some decisions in diagnosis or kind of working out potential treatments or medication decisions. And in the book, you know, we surmised that, well, AI might not replace the doctor doing those things. It could, but might not. But AI could possibly reduce errors if doctors and nurses are making decisions, by just looking at those decisions and checking them out. Is that happening at all, and what do you see in the future there?
LUNGREN: Yeah, I would say, you know, that's kind of the jagged edge of innovation, right, where sometimes the capability gets ahead of the ability to, you know, operationalize that. You know, part of that is just related to the systems. The evidence has been interesting on this. So, like, you know this, our colleague Eric Horvitz has been doing a lot of work in sort of looking at physician, physician with GPT-4, let's say, and then GPT-4 alone for a whole variety of things. You know, we've been saying to the world for a long time, particularly in the narrow AI days, that AI plus human is better than either alone. We're not really seeing that bear out all that well yet in some of the research.
But it is a signal to me and to the use case you're suggesting, which is that if we let this system, in the right way, kind of handle a lot of the safety-net aspects of what we do, but then also potentially take on some of the things that maybe are not that challenging or at least somewhat simple.
And of course, this is really an interesting use case in my world, in the vision world, which is that we know these models are multimodal, right. They can process images and text. And what does that look like for pathologists or radiologists, where a certain percentage of the things we look at in a given day are normal, right? Or as close to normal as you can imagine. So is there a way to do that? And then also, by the way, have a safety net.
And so I think that this is an extremely active area right now. I don't think we've figured out exactly how to have the human and the AI model interact in this space yet. But I know that there are a lot of attempts at it right now.
LEE: Yeah, I think, you know, this idea of a true copilot, you know, a true collaborator, you know, I think is still something that's coming. I think we've had a couple of decades of people being trained to think of computers as question-answering machines. Ask a question, get an answer. Provide a document, get a summary. And so on.
But the idea that something might actually be this second set of eyes just assisting you all day continuously, I think, is a new mode of interaction. And we haven't quite figured that out.
Now, in preparation for this podcast, Matt, you said that you actually used AI to assist you in getting ready. [LAUGHS] Would you like to share what you learned by doing that?
LUNGREN: Yeah, it's very funny. So, like, you may have heard this term coined by Ethan Mollick called the "secret cyborg" (opens in new tab), which is sort of referring to the phenomenon of folks using GPT, realizing it can actually help them a ton in all kinds of parts of their work, but not necessarily telling anybody that they're using it, right.
And so in a similar secret-cyborg-ish way, I was like, well, listen, you know, I haven't read your book in like a year. I recommend it to everybody. And [I need] just a refresher.
So what I did was I took your book, I put it into GPT-4, OK, and asked it to sort of talk about the predictions that you made.
And then I took that and put it in the stronger reasoning model (in this case, the deep research capability from OpenAI that you and the audience may have just seen or heard of) and asked it to research all the current papers, you know, and blogs and whatever else and tell me what was right and what was wrong in terms of the predictions. [LAUGHS]
So it, actually, it was an incredible thing. It's, like, what, six or seven pages. It probably would have taken me two weeks, frankly, to do this amount of work.
LEE: I'll be looking forward to reading that in the New England Journal of Medicine shortly.
LUNGREN: [LAUGHS] That's right. Yeah, no, before this podcast comes out, I'll submit it as an opinion piece. No. [LAUGHS] But, yeah, I think on balance, incredibly insightful views. And I think part of that was, you know, your team that got together really had a lot of different angles on this. But, you know, I think the only area, which I've observed as well, it's just, man, this can do a lot for education.
We haven't seen... I don't think we're looking at this as a tutor. To your point, we're kind of looking at it as transactional, in and out. But as we've seen in all kinds of data, both in low- and middle-income countries and even at Harvard, using this as a tutor can really accelerate your knowledge, and in profound ways.
And so that is probably one area where I think your prediction was maybe even slightly further ahead of the curve, because I don't think folks have really grokked that opportunity yet.
LEE: Yeah, and for people who haven't read the book, you know, the guess was that you might use this as a training aid if you're an aspiring doctor. For example, you can ask GPT-4 to pretend to be a patient that presents a certain way and that you are the doctor that this patient has come to see. And so you have an interaction. And then when you say "end of encounter," you ask GPT-4 to assess how well you did. And we thought that this might be a great training aid, and to your point, it seems not to have materialized.
LUNGREN: There are some sparks. You know, with, like, communication, end-of-life conversations that no physician loves to have, right. It's very, very hard to train someone in those. I've seen some work done, but you're right. It hasn't quite hit mainstream yet.
LEE: On the subject of things that we missed, one thing that you've been very, very involved in in the last several months has been shipping products that are multimodal. So that was something I think that we missed completely. What is the current state of affairs for multimodal, you know, healthcare AI, medical AI?
LUNGREN: Yeah, the way I like to explain it, and first of all, no fault to you, we were all just so excited about the text use cases that I can't fault you. But yeah, I mean, if we look at healthcare, right, how we take care of patients today, as you know, the vast majority of the data, in terms of just data itself, is actually not in text, right. It's going to be in pathology and genomics and radiology, etc.
And it seems like an opportunity here to watch this huge curve just go straight up in the general reasoning and frankly medical competency and capabilities of the models that are coming and continue to come, but then to see that it's not as proficient for medical-specific imaging and video and, you know, other data types.
And that gap is, kind of, what I describe as the multimodal medical AI gap.
We're probably in GPT-2 land, right, for these other modality types versus, you know, we're now at o3, and who knows where we're going to go. At least in our view, we can innovate in that space.
How do we help bring those innovations to the broader community to close that gap and see some of these use cases really start to accelerate in the multimodal world?
And I think we've taken a pretty good crack at that. A lot of that is credit to the innovative work. I mean, MSR [Microsoft Research] was two or three years ahead of everyone else on a lot of this. And so how do we package that up in a way that the community can actually access and use? And so, we took a lot of what your group had done in, let's just say, radiology or pathology in particular, and said, OK, well, let's put this in an ecosystem of other models. Other groups can participate in this, but let's put it in a platform where, maybe I'm really competent in radiology or pathology: how do I connect those things together? How do I bring the general reasoner knowledge into a multimodal use case?
And I think that's what we've done pretty well so far. We have a lot of work to do still, but this is very, very exciting. We're seeing just such a ton of interest in building with the tools that we put out there.
LEE: Well, I think how rapidly that's advancing has been a surprise to me. So I think we're running short on time. So two last questions to wrap up this conversation. The first one is, as we think ahead on AI in medicine, what do you think will be the biggest changes or make the biggest differences two years from now, five years from now, 10 years from now?
LUNGREN: This is really tough. OK. In the two-year timeframe, I think we will have some autonomous agent-based workflows for a lot of what I would call the undifferentiated heavy lifting in healthcare.
And this is happening in, you know, the pharmaceutical industry, the payer space; every aspect is sort of looking at its operations at a macro level: where are these big bureaucratic processes that largely involve text, and where can we shrink those down and really kind of unlock a lot of our workforce to do things that might be more meaningful to the business? I think that's my safe one.
Going five years out, you know, I have a really difficult time grappling with this seemingly shrinking timeline to AGI [artificial general intelligence] that we hear from people whom I respect and who certainly know more than me. And in that world, I think there's only been one paper that I've seen that has attempted to say, what does that mean in healthcare (opens in new tab) when we have this?
And the fact is, I actually don't know. [LAUGHS] I wonder whether there'll still be a gap in some modalities. Maybe there'll be the ability to do new science, and all kinds of interesting things will come of that.
But then if you go all the way out to your 10-year question, I do feel like we're going to have systems that are acting autonomously in a variety of capacities, if I'm being honest.
What I would like to see, if I have any influence on some of this, is, can we start to celebrate the closing of hospitals instead of the opening of them? Meaning, can we actually start to address care at a personal, individual level? And maybe that's outside the home, maybe that's, you know, in a way that doesn't have to use so many resources and, frankly, doesn't have to be very reactive instead of proactive.
I really want to see that. That's been the vision of precision medicine for, geez, 20-plus years.
I feel like we're getting close to that being something we can really tackle.
LEE: So, we talked about Geoff Hinton and his famous prediction that we would soon not have human radiologists. And of course, maybe he got the date wrong. So, let's reset the date to 2028. So, Matt, do you think Geoff is right or wrong?
LUNGREN: [LAUGHS] Yeah, so I'm not going to dodge the question, but let me just answer this a different way.
We have a clear line of sight to go from images to draft reports. That is unmistakable. And that's now, in 2025. How it will be implemented and what the implications of that will be, I think, will be heavily dependent on the health system or the incentive structure for where it's deployed.
So, if I'm trying to take a step back, back to my global health days, man, that can't come fast enough. Because, you know, you have entire health systems, you know, in fact entire countries, that have five, you know, medical imaging experts for the whole country, but they still need this to, you know, take care of patients.
Zooming in on today's crisis in the US, right, we have the burnout crisis, just as much as the doctors who are seeing patients and writing notes. We can't keep up with the volume. In fact, we're not training folks fast enough, so there is a push-pull; there may be a flip, to your point, to autonomous reads across some segments of what we do.
By 2028, I think it's a reasonable expectation that we'll have some form of that. Yes.
LEE: I tend to agree, and I think things get reshaped, but it seems very likely that even far into the future we'll have humans wanting to take care of other humans and be taken care of by humans.
Matt, this has been a fantastic conversation, and, you know, I feel it's always a personal privilege to have a chance to work with someone like you, so keep it up.
[TRANSITION MUSIC]
LUNGREN: Thank you so much, Peter. Thanks for having me.
LEE: I'm always so impressed when I talk to Matt, and I feel lucky that we get a chance to work together here at Microsoft. You know, one of the things that always strikes me whenever I talk to him is just how disruptive generative AI has been to a business like Nuance. Nuance has had clinical note-taking as part of its product portfolio for a long, long time. And so, you know, when generative AI comes along, it's not only an opportunity for them, but also a threat, because in a sense, it opens up the possibility of almost anyone being able to make clinical note-taking capabilities into products.
It's really interesting how Matt's product, DAX Copilot, which since the time that we had our conversation has expanded into a full healthcare workflow product called Dragon Copilot, has really taken off in the marketplace, and how many new competing AI products have also hit the market, and all in just two years, because of generative AI.
The other thing, you know, that I always think about is just how important it is for these kinds of systems to work together, and especially how they integrate into the electronic health record systems. This is something that Carey, Zak, and I didn't really realize fully when we wrote our book. But you know, when you talk to both Matt and Seth, of course, we see how important it is to have that integration.
Finally, what a great example of yet another person who is both a physician and a tech geek.
[LAUGHS] People sometimes think of healthcare as moving very slowly when it comes to new technology, but people like Matt are actually making it happen much more quickly than most people might expect.
Well, anyway, as I mentioned, we also had a chance to talk to Seth Hain, and so here's my conversation with Seth:
LEE: Seth, thank you so much for joining.
SETH HAIN: Well, Peter, it's such an exciting time to sit down and talk about this topic. So much has changed in the last two years. Thanks for inviting me.
LEE: Yeah, in fact, I think in a way both of our lives have been upended in many ways by the emergence of AI. [LAUGHTER]
The traditional listeners of the Microsoft Research Podcast, I think for the most part, aren't steeped in the healthcare industry. And so maybe we can just start with two things. One is, what is Epic, really? And then two, what is your job? What does the senior vice president for R&D at Epic do every day?
HAIN: Yeah, well, let's start with that first question. So, what is Epic? Most people across the world experience Epic through something we call MyChart. They might use it to message their physician. They might use it to check the lab values after they've gotten a recent test. But it's an app on their phone, right, for connecting in with their doctors and nurses and really making them part of the care team.
But the software we create here at Epic goes beyond that. It's what runs in the clinic, what runs at the bedside, in the back office to help facilitate those different pieces of care, from collecting vital information at the bedside, to helping place orders if you're coming in for an outpatient visit, maybe with a kiddo with an earache, and capturing that note and record of what happened during that encounter, all the way through back-office encounters, back-office information for interacting with payers, as an example.
And so, we provide a suite of software that health systems, and increasingly a broader set of the healthcare ecosystem, like payers and specialty diagnostic groups, use to connect with that patient at the center around their care.
And my job is to help our applications across the company take advantage of those latest pieces of technology to help improve the efficiency of folks like clinicians in the exam room when you go in for a visit. We'll get into, I imagine, some use cases like ambient conversations, capturing that conversation in the exam room to help drive some of that documentation.
But then providing that platform for those teams to build those, and then strategizing around what to create next to help both the physicians be efficient and also the health systems. But then ultimately continuing to use those tools to advance the science of medicine.
LEE: Right. You know, one thing that I explain to fellow technologists is that I think today health records are almost entirely digital. I think the last figures I saw showed that well over 99% of all health records are digital.
But in the year 2001, fewer than 15% of health records were digital. They were literally in folders on paper in storerooms, and if you're old enough, you might even remember seeing those storerooms.
So, it's been quite a journey.
Epic and Epic's competitors (though I think Epic is really the most important company) have really moved the entire infrastructure of record keeping and other communications in healthcare to a digital foundation.
And I think one thing we'll get into, of course, is one of the issues that has really become, I think, a problem for doctors and nurses: the kind of clerical, paperwork, record-keeping burden. And for that reason, Epic and Epic systems end up being a real focus of attention. And so, we'll get into that in a bit here.
HAIN: And I think that hits, just to highlight it, on both sides. There is both the need to capture documentation; there's also the challenge in reviewing it.
LEE: Yes.
HAIN: The average medical record these days is somewhere between the length of Fahrenheit 451 and To Kill a Mockingbird. [LAUGHTER] So there's a fair amount of effort going in on that review side, as well.
LEE: Yeah, indeed. So much to get into there. But I would like to talk about encounters with AI. So obviously, I think there are two eras here: before the emergence of ChatGPT and what we now think of as generative AI, and afterwards. And so, let's take the former.
Of course, you've been thinking about machine learning and health data probably for decades. Do you have a memory of how you got into this? Why did you get an interest in data analytics and machine learning in the first place?
HAIN: Well, my background, as you noted, is in mathematics before I came to Epic. And the sort of patterns and what could emerge were always part of what drove that. Having done development and kind of always been around computers all my life, it was a natural transition as I came here.
And I started by really focusing on, how do we scale systems for the very largest organizations, making sure they are highly available and also highly responsive? Time is critical in these contexts in regards to rapidly getting information to doctors and nurses.
And then really in, say, the 2010s, there started to be an emergence of capabilities from a storage and compute perspective where we could begin to build predictive analytics models. And these were models that were very focused, right. One predicted the likelihood somebody would show up for an appointment. One predicted the likelihood that somebody may fall during an inpatient stay, as an example.
And I think a key learning during that time period was thinking through the full workflow. What information was available at that point in time, right? At the moment somebody walks into the ED [emergency department], you don't have a full picture to predict the likelihood that they may deteriorate during an inpatient encounter.
And in addition to what information was available was, what can you do about it? And a key part of that was, how do we help get the right people at the right point in time to the bedside to make an assessment, right? It was a human-in-the-loop type of workflow where, for example, you would predict deterioration in advance and have a nurse or a physician come to the bedside to assess.
And I think that combination of narrowly focused predictive models, with an understanding that to have them make an impact you had to think through the full workflow of where a human would make a decision, was a key piece.
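To make that pattern concrete, here is a minimal sketch of a narrowly focused predictive model wired to a human-in-the-loop workflow step, in the spirit of the deterioration example Hain describes. This is an editorial illustration, not Epic's implementation: the feature choices, threshold, and paging function are all hypothetical.

    # Sketch of a narrow predictive model plus a human-in-the-loop workflow
    # step. All names and thresholds here are illustrative, not Epic's.

    ALERT_THRESHOLD = 0.8  # tuned so alerts stay actionable rather than noisy

    def deterioration_risk(vitals: dict) -> float:
        """Toy risk score built only from information available at this point in care."""
        score = 0.0
        if vitals.get("resp_rate", 0) > 24:
            score += 0.4
        if vitals.get("heart_rate", 0) > 110:
            score += 0.3
        if vitals.get("spo2", 100) < 92:
            score += 0.3
        return min(score, 1.0)

    def page_care_team(patient_id: str, risk: float) -> None:
        """Placeholder for routing a nurse or physician to the bedside."""
        print(f"Bedside assessment requested for {patient_id} (risk={risk:.2f})")

    def assess_patient(patient_id: str, vitals: dict) -> None:
        risk = deterioration_risk(vitals)
        if risk >= ALERT_THRESHOLD:
            # The model never acts alone: above the threshold, it routes a
            # human to the bedside to make the actual clinical assessment.
            page_care_team(patient_id, risk)

    assess_patient("patient-001", {"resp_rate": 28, "heart_rate": 118, "spo2": 90})

The design point is the last step: the model's output is an input to a human workflow, not an autonomous action.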
LEE: Obviously there is a positive human impact. And so, for sure, part of the thought process for these kinds of capabilities comes from that.
But Epic is also a business, and you have to worry about, you know, what doctors and clinics and healthcare systems are willing to buy. And so how do you balance those two things, and do those two things ever come into conflict as you're imagining what kinds of new capabilities and features and products to create?
HAIN: Two aspects, I think, really come to mind. First off, generally speaking, we see analytics and AI as a part of the application. So, in that sense, it's not something we license separately. We think that those insights and those pieces of data are part of what makes the application meaningful and impactful.
At the scale that many of these health systems operate and the number of patients that they care for, as well as having tens of thousands of users in the system daily, one needs to think about the compute overhead
LEE: Yes.
HAIN: that these things cause. And so, in that regard, there is always an ROI assessment taking place to some degree around, what happens if this runs at full scale? And in a way, that really got accelerated as we went into the generative AI era.
LEE: Right. OK. So, you mentioned generative AI. What was the first encounter, and what was that experience for you?
HAIN: So, in the winter of '22 and into 2023, I started experimenting alongside you with what we at that time called DV3, or Davinci 3, which eventually became GPT-4. And immediately, a few things became obvious. The tool was highly general purpose. One was able to, in putting in a prompt, have it sort of convert into the framing and context of a particular clinical circumstance and reason around that context. But I think the other thing that started to come to bear in that context was that there was a fair amount of latent knowledge inside of it that was very, very different from anything we'd seen before. And, you know, there are some examples from the Sparks of AGI paper from Microsoft Research, where a series of objects end up getting stacked together in the optimal way to build height. Just given the list of objects, it seems to have an understanding of physical space that it intuited from the training process, which we hadn't seen anywhere. So that was an entirely new capability that programmers now had access to.
LEE: Well, in fact, you know, I think that winter of 2022, and we'll get into this, one of your projects that you've been running for quite a few years is something called Cosmos (opens in new tab), which I find exceptionally interesting. And I was motivated to understand whether this type of technology could have an impact there.
And so, I had to receive permission from both OpenAI and Microsoft to provide you with early access.
When I did first show this technology to you, you must have had an emotional response, either skepticism or... I can't imagine you just trusted, you know, trusted me to the extent of believing everything I was telling you.
HAIN: I think there's always a question of, what is it actually, right? It's often easy to create demos. It's often easy to show things in a narrow circumstance. And it takes getting your hands on it and really spending your 10,000 hours digging in and probing it in different ways to see just how general purpose it is.
And so, the skepticism was really around, how applicable can this be broadly? And I think the second question (and we're starting to see this play out now in some of the later models) was, is this just a language thing? Is it narrowly only focused on that?
Or can we start to imagine other modalities really starting to factor into this? How will it impact basic sciences? Those sorts of things.
On a personal note, I mean, I had two kids at that point, they're now 14 and 12, and I wondered, what did this mean for them? What is the right thing for them to be studying? And so I remember sleepless nights on that topic, as well.
LEE: OK, so now you get early access to this technology; you're able to do some experimentation. I think one of the things that impressed me is that just less than four months later, at the major health tech industry conference, HIMSS, which also happened timing-wise to take place just after the public disclosure of GPT-4, Epic showed off some early prototype applications of generative AI. And so, describe what those were, and how did you choose what to try to do there?
HAIN: Yeah, and at that point, we actually had the very first users live on that prototype, on that early version.
And the key thing we'd focused on (we started this development in very, very late December and January of 2023) was a problem whose origins really were during the pandemic.
So, during the pandemic, we started to see patients increasingly messaging their providers, nurses, and clinicians through MyChart, that patient portal I mentioned with about 190 million folks on it. And as you can imagine, that was a great opportunity in the context of COVID to limit the amount of direct contact between providers and patients while still getting their questions answered.
But what we found as we came out of the pandemic was that folks preferred it regardless. And that messaging volume had stayed very, very high and was a time-consuming effort for folks.
And so, the first use case we came out with was a draft message, in the context of the message from the patient and an understanding of their medical history, using that medical record that we talked about.
And the nurse or physician using the tool had two options. They could either click to start with that draft and edit it and then hit send, or they could go back to the old workflow and start with a blank text box and write it from their own memory as they preferred.
And so that was that very first use case. There were many more that we had started from a development perspective, but, yeah, we had that rolling out right in March of 2023 there with the first folks.
LEE: So, I know from our occasional discussions that some things worked very well. In fact, this is a real product now for Epic. And it seems to be a very, very popular feature now. I know from talking to you that a lot of things have been harder. And so, I'd like to dive into that. As a developer, tech developer, you know, what's been easy, what's been hard, and what in your mind is still left to do in terms of the development of AI?
HAIN: Yeah. You know, the first thing that comes to mind, sort of starting foundationally, and we hinted at this earlier in our conversation, was that at that point in time, it was, kind of per message, rather compute-intensive to run these. And so, there were always trade-offs we were making in regards to how many pieces of information we would send into the model and how much we would request back out of it.
The result of that was that while, kind of theoretically or even from a research perspective, we could achieve certain outcomes that were quite advanced, one had to think about where you make those trade-offs from a scalability perspective as you wanted to roll that out to a lot of folks.
So...
LEE: Were you charging your customers more money for this feature?
HAIN: Yeah, essentially the way that we handle that is there's compute that's required. As I mentioned, the feature is just part of our application. So, it's just what they get with an upgrade.
But that compute overhead is something that we needed to pass through to them. And so, it was something we wanted to be very cautious and careful about, particularly given both the staffing challenges and the margin pressures that health systems are feeling today.
LEE: And let's put that on the stack, because I do want to get into, from the selling perspective, that challenge and how you perceive health systems as a customer making those trade-offs. But let's continue on the technical side here.
HAIN: Yeah. On the technical side, it was a consideration, right. We needed to be thoughtful about how we used them. But going up a layer in the stack, at that time, there was a lot of conversation in the industry around something called RAG, or retrieval-augmented generation.
And the idea was, could you pull the relevant bits, the relevant pieces of the chart, into that prompt, that information you shared with the generative AI model, to be able to increase the usefulness of the draft that was being created? And that approach ended up proving, and continues to be to some degree, although the techniques have greatly improved, somewhat brittle, right. You have a general-purpose technology that is drafting the response.
But in many ways, you needed to, for a variety of pragmatic reasons, have a somewhat brittle capability in regards to what you pulled into that approach. It tended to be pretty static. And I think this becomes one of the things that, looking forward, as these models have gotten a lot more efficient, we are and will continue to improve upon, because as you get a richer and richer amount of information into the model, it does a better job of responding.
I think the third thing, and I think this is going to be something we're going to continue to work through as an industry, was helping users understand and adapt to these circumstances. So many folks, when they hear AI, think it will just magically do everything perfectly.
And particularly early on, with some of those challenges we were talking about, it doesn't. You know, if it's helpful 85% of the time, that's great, but it's not going to be 100% of the time. And it's interesting: as we started, we do something we call immersion, where we always make sure that developers are right there, elbow to elbow, with the users of the software.
And one of the things that I realized through that experience with some of the very early organizations, like UCSD [UC San Diego] or the University of Wisconsin here in Madison, was that even when I'm responding to an email, or a physician is responding to one of these messages from a patient, depending on the patient and depending on the person, they respond differently.
In that context, there's opportunity to continue to mimic that behavior as we go forward more deeply. And so, you learn a lot about, kind of, human behavior as you're putting these use cases out into the world.
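For readers less familiar with the retrieval-augmented generation pattern Hain describes, here is a minimal sketch of the idea: select the chart sections most relevant to the patient's message and include only those in the prompt. This is an editorial illustration under stated assumptions, not Epic's implementation; the embedding function, similarity scoring, and prompt format are all hypothetical stand-ins.

    # Minimal RAG sketch: retrieve relevant chart sections, then build the
    # prompt from only those sections. Every name here is an illustrative
    # placeholder, not a real Epic or OpenAI API.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Stand-in embedding; a real system would call an embedding model."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        vec = rng.standard_normal(64)
        return vec / np.linalg.norm(vec)

    def retrieve(query: str, chart_sections: list[str], k: int = 3) -> list[str]:
        """Rank chart sections by cosine similarity to the patient's message."""
        q = embed(query)
        ranked = sorted(chart_sections, key=lambda s: -float(q @ embed(s)))
        return ranked[:k]

    def build_prompt(patient_message: str, chart_sections: list[str]) -> str:
        """Assemble a prompt from the top-ranked excerpts plus the message."""
        context = "\n".join(retrieve(patient_message, chart_sections))
        return (
            "Relevant chart excerpts:\n" + context
            + "\n\nPatient message:\n" + patient_message
            + "\n\nDraft a reply for the clinician to review, edit, and sign."
        )

    sections = ["Allergy list: penicillin", "Last A1c: 7.2", "Knee MRI 2024: meniscal tear"]
    print(build_prompt("My knee pain is back, what should I do?", sections))

The brittleness Hain mentions lives in the retrieval step: if a static selection misses the section that actually matters, the model never sees it, however capable the model is. Cheaper, more efficient models relax that trade-off by letting more of the chart into the prompt.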
I think that's called your in-basket application, if I understand correctly.

HAIN: That's correct.

LEE: But that also creates, I think, a reputational risk and challenge for Epic, because as doctors feel overburdened by this and they're feeling burnt out, and as we know, that's a big issue, then they point to, you know, "Oh, I'm just stuck in this Epic system." And I think a lot of the dissatisfaction about the day-to-day working lives of doctors and nurses then focuses on Epic. So, to what extent do you see technologies like generative AI as a solution to that, or as contributing either positively or negatively to it?

HAIN: You know, earlier I made the comment that in December, as we started to explore this technology, we realized there was a class of problems that now might have solutions that never did before.

As we've started to dig into those, and we now have about 150 different use cases under development, many of which are live across the roughly 350 health systems using them, one of the things we've started to find is that physicians, nurses, and others are reacting by saying it's helping them move forward with their job.

Examples of this: obviously the draft of the in-basket message response is one, but another is using ambient voice recognition as a kind of new input into the software, so that when a patient and a physician sit down in the exam room, the physician can start a recording, and that conversation then gets translated, or summarized, if you will, including using medical jargon, into the note in the framework that the physician would typically write.

That's another one of those circumstances where they then review it, don't need to type it out from scratch, for example,

LEE: Right.

HAIN: and can quickly move forward.

I think looking forward, you know, you brought up Cosmos earlier. It's a suite of applications, but at its core is a dataset of about 300 million de-identified patients. And using generative AI, we built research tools on top of it. I bring that up because it's a precursor of how that type of deep analytics can be put into context at the point of care. That's what we see this technology more deeply enabling in the future.

LEE: Yeah. So you said there are about 150 integrations of generative AI going into different parts of Epic's software products. When you are doing those developments, and then you're making a decision that something is going to get deployed, one thing that people might worry about is, well, these AI systems hallucinate. They have biases. There are unclear accountabilities and, you know, maybe patient expectations. For example, if there's a note drafted by AI that's sent to a patient, does the patient have a right to know what was written by AI and what was written by the human doctor? So, can we run through how you have thought about those things?

HAIN: I think one piece of important context to set here, and it's often a point of confusion when I'm chatting with folks in public, is that their interaction with generative AI is typically through a chatbot, right. It's something like ChatGPT or Bing or one of these other products where they're essentially having a back-and-forth conversation.

LEE: Right.

HAIN: And that is a dramatically different experience from how we think it makes sense to embed generative AI into an enterprise set of applications.

So, an example use case may be in the back office, where there are folks that are coding encounters.
So, when a patient comes in, right, they have the conversation with the doctor, the doctor documents it, that encounter needs to be billed for, and those folks in the back office associate with that encounter a series of codes that provide information about how that billing should occur.

One of the things we did from a workflow perspective was add a selector pane to the screen that uses generative AI to suggest a likely code. Now, this suggestion runs the risk of hallucination. So the question is, how do you build into the workflow additional checks that can help the user verify it?

In this context, we always include a citation back to the part of the medical record that justifies or supports that code. So quickly, on hover, the user can see, does this make sense, before selecting it. And it's those types of workflow pieces that we think are critical to using this technology as an aid to helping people make decisions faster, right. It's similar to drafting the documentation that we talked about earlier.

And it's interesting, because there's a series of patterns here, going back to the AI Revolution book you folks wrote two years ago. Some of these are really highlighted there, right. This idea of things like a universal translator is a common pattern that we ended up applying across the applications. And in my mind, and this may sound a little bit strange, summarization is an example of translation: translating a very long series of information in a medical record into the context that an ED physician might care about, where they have three or four minutes to quickly review that very long chart.

From that perspective, and back to your earlier comment, we added the summary into the workflow but always made sure that the full medical record was available to that user as well. So, a lot of what we've done over the last couple of years has been to create a series of repeatable techniques in regards to how to build the backend use cases, where to pull the information from, and how to feed it into the generative AI models.

But then, I think more importantly, there are the user experience design patterns that help mitigate those risks you talked about and maintain consistency across the integrated suite of applications in how those are deployed.

LEE: You might remember from our book, we had a whole chapter on reducing paperwork, and I think that's been a lot of what we've been talking about. I want to get beyond that, but before transitioning, let's get some numbers. You talked about messages drafted to be sent to patients. Give us a sense of the volume of what's happening right now.

HAIN: Across the 348 health systems, I think it is, that are now using generative AI, and to be clear, we have about 500 health systems we have the privilege of working with, each with many, many hospitals, there are tens of thousands of physicians and nurses using the software. That includes drafting a million-plus notes a month at this point, for example, as well as helping to generate a similar number of responses to patients.

The thing I'm increasingly excited about is the broader set of use cases that we're seeing folks starting to deploy now. One of my favorites: it's natural that as part of, for example, a radiology workflow, in studying an image, the radiologist makes note that it would be worth double-checking, say in six to eight months, that the patient have this area of their chest scanned.
Something looks a little bit fishy there, but there's not

LEE: There's not a definitive finding yet.

HAIN: there's not a definitive finding at that point. Part of that workflow is that the patient's physician places an order for that future scan. And so, we're using generative AI to surface that note back to the physician and, with one click, allow them to place that order, helping that patient get better care.

That's one example of dozens of use cases that are now live, both to help improve the care patients are getting and to help the workforce. So, going back to the translation-summarization example, a nurse at the end of their shift needs to write up a summary of that shift for the next nurse, for each

LEE: Right.

HAIN: each patient that they care for. Well, they've been documenting information in the chart over those eight or 12 hours, right.

LEE: Yep, yep.

HAIN: So, we can use that information to quickly draft that end-of-shift note for the nurse. They can verify it with those citations we talked about, make any additions or edits that they need, and then complete their end of day far more efficiently.

LEE: Right. OK. So now let's get to Cosmos, which has been one of these projects that I think has been your baby for many years and has had a profound impact on my thinking about possibilities. So first off, what is Cosmos?

HAIN: Well, just as an aside, I appreciate the thoughtful comments. There is a whole team of folks here that are really driving these projects forward, and a large part of that has been, as you brought up, both Cosmos as a foundational capability and then beginning to integrate it into applications. That's what those folks spend time on.

Cosmos is an effort across hundreds of health systems that we have the privilege of working with to build out a de-identified dataset with, today, and it climbs every day, 300 million unique patient records in it.

One of the interesting things about that structure is that, for example, if I end up in a hospital in Seattle and have that encounter documented at a health system in Seattle, a de-identified version of me still only shows up once in Cosmos, stitching together my information from here in Madison, Wisconsin, where Epic is at, with that extra data from Seattle. The result is these 300 million unique longitudinal records that have a deep history associated with them.

LEE: And just to be clear, a patient record might have hundreds or even thousands of individual, I guess what you would call, clinical records or elements.

HAIN: That's exactly right. It's the breadth of information, from orders, allergies, and blood pressures collected, for example, in an outpatient setting, to cancer staging information that might have come through as part of an oncology visit. And it's coming from a variety of sources. We exchange information about 10 million times a day between different health systems, and that full picture of the patient is available within Cosmos.

LEE: So now, why? Why Cosmos?

HAIN: Why Cosmos? Well, the ultimate aim is to put a deeply informed, in-context perspective at the point of care. So, as a patient, if I'm in the exam room, it's helpful for the physician and me to know what similar patients have experienced in this context.
What was the result of that line of treatment, for example?

Or as a doctor, if I'm working through a case that's relatively rare or strange to me, I might be able to connect, through an example workflow we built called Look-Alikes, with another physician who has seen similar patients, or see within the workflow a list of likely diagnoses based on patients that have been in a similar context. And so, the design of Cosmos is to put those insights into the point of care in the context of the patient.

To facilitate those steps, the first phase was building out a set of research tooling. We see dozens of papers a year being published by the health systems that we work with. Those that participate in Cosmos have access to it to do research on it. And so they use a series of analytical and data science tools to do that analysis and then publish research, building up trust that way.

LEE: The examples you gave, like Look-Alikes, make it very easy, I think, for people outside of the healthcare world to imagine how that could be useful. So now, why is GPT-4 or any generative AI relevant to this?

HAIN: Well, a couple of different pieces, right. Earlier we talked about, and I think this is the most important, how generative AI is able to cast things into a specific context. And so, in that way, we can use these tools to help identify a cohort of patients similar to you when you're in the exam room, and then also help present that information back in a way that relates it to other research and understandings from the medical literature, to understand what those likely outcomes are.

I think more broadly, these generative AI techniques and the transformer architecture envision a deeper understanding of sequences of events and sequences of words. And that starts to open up broader questions about what can really be understood about patterns and sequences of events in a patient's journey.

Which, if you didn't know, is where the name Epic came from: just as a great, long nation's journey is told through an epic story, the record is a patient's story.

LEE: So, we're running up against our time together, and I always like to end with a more provocative question.

HAIN: Certainly.

LEE: And for you, I wanted to raise a question that I think we had asked ourselves in the very earliest days that we were sharing Davinci 3, what we now know as GPT-4, with each other, which is: is there a world in the future, because of AI, where we don't need electronic health records anymore? Is there a world in the future without EHRs?

HAIN: I think it depends on how you define EHR. I see a world coming where we need to manage a hybrid workforce, where there is a combination of humans and something folks are sometimes calling agents working in concert to care for more and more of the country and of the world. There is, and will need to be, a series of tools to help orchestrate that hybrid workforce, and I think things like EHRs will transform into helping that be operationally successful.

But as a patient, I think there's a very different opportunity that starts to be presented. We've talked about understanding things deeply in context. There's also a real acceleration happening in science right now, and the possibility of bringing the second- and third-order effects of generative AI to the point of care, be that through the real-world evidence we were talking about with Cosmos or perhaps personalized therapies that really are well matched to that individual.
These generative AI techniques open the door for that, as well as for the full lifecycle of managing it from a healthcare perspective, all the way through monitoring after the fact.

And so, I think we'll still be recording people's stories. Their stories are relevant to them, and they can help inform the bigger picture. But I think the real question is, how do you put those in a broader context? And these tools open the door for a lot more.

LEE: Well, that's really a great vision for the future.

[TRANSITION MUSIC]

Seth, I always learn so much talking to you. Thank you so much for this great chat.

HAIN: Thank you for inviting me.

LEE: I see Seth as someone on the very leading frontier of bringing generative AI to the clinic, into the healthcare back office, and at the full scale of our massive healthcare system. It's always impressive to me how thoughtful Seth has had to be about how to deploy generative AI into a clinical setting.

And, you know, one thing that sticks out, and he made such a point of this, is that generative AI in the clinical setting isn't just a chatbot. They've had to really think of other ways that will guarantee that the human stays in the loop. And that's of course exactly what Carey, Zak, and I had predicted in our book. In fact, we even had a full chapter of our book entitled "Trust but Verify," which really spoke to the need in medicine to always have a human being directly involved in overseeing the process of healthcare delivery.

One technical point that Carey, Zak, and I completely missed in our book, on the other hand, was the idea of something that Seth brought up called RAG, which is retrieval-augmented generation. That's the idea of giving AI access to a database of information and allowing it to use that database as it constructs its answers. And we heard from Seth how fundamental RAG is to a lot of the use cases that Epic is deploying.

And finally, I continue to find Seth's project called Cosmos to be a source of inspiration, and I've continued to urge every healthcare organization that has been collecting data to consider following a similar path.

In our book, we spent a great deal of time focusing on the possibility that AI might be able to reduce or even eliminate a lot of the clerical drudgery that currently exists in the delivery of healthcare. We even had a chapter entitled "The Paperwork Shredder." And we heard from both Matt and Seth that that has indeed been the early focus of their work.

But we also saw in our book the possibility that AI could provide diagnoses, propose treatment options, be a second set of eyes to reduce medical errors, and, in the research lab, be a research assistant. And here in Epic's Cosmos, we are seeing just the early glimpses that perhaps generative AI can actually provide new research possibilities, in addition to assistance in clinical decision making and problem solving. On the other hand, that still seems to be, for the most part, in our future rather than something that's happening at any scale today.

But looking ahead, we can still see the potential of AI helping connect healthcare delivery experiences to the advancement of medical knowledge. As Seth would say, the ability to connect the bedside to the back office to the bench. That's a pretty wonderful future that will take a lot of work and tech breakthroughs to make real.
But the fact that we now have a credible chance of making that dream happen for real, I think that's pretty wonderful.

[MUSIC TRANSITIONS TO THEME]

I'd like to say thank you again to Matt and Seth for sharing their experiences and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including a look at how patients are using generative AI for their own healthcare, as well as an episode on the laws, norms, and ethics developing around AI and health, and more. We hope you'll continue to tune in.

Until next time.

[MUSIC FADES]
-
WWW.MICROSOFT.COM
VidTok introduces compact, efficient tokenization to enhance AI video processing

Every day, countless videos are uploaded and processed online, putting enormous strain on computational resources. The problem isn't just the sheer volume of data; it's how this data is structured. Videos consist of raw pixel data, where neighboring pixels often store nearly identical information. This redundancy wastes resources, making it harder for systems to process visual content effectively and efficiently.

To tackle this, we've developed a new approach to compress visual data into a more compact and manageable form. In our paper "VidTok: A Versatile and Open-Source Video Tokenizer," we introduce a method that converts video data into smaller, structured units, or tokens. This technique provides researchers and developers in visual world modeling, a field dedicated to teaching machines to interpret images and videos, with a flexible and efficient tool for advancing their work.

How VidTok works

VidTok converts raw video footage into a format that AI can easily work with and understand, a process called video tokenization. This process converts complex visual information into compact, structured tokens, as shown in Figure 1.

Figure 1. An overview of how video tokenizers work, which form the basis of VidTok.

By simplifying videos into manageable chunks, VidTok can enable AI systems to learn from, analyze, and generate video content more efficiently. VidTok offers several potential advantages over previous solutions:

Supports both discrete and continuous tokens. Not all AI models use the same language for video generation. Some perform best with continuous tokens, ideal for high-quality diffusion models, while others rely on discrete tokens, which are better suited for step-by-step generation, like language models for video. VidTok has demonstrated seamless support for both, making it adaptable across a range of AI applications.

Operates in both causal and noncausal modes. In some scenarios, video understanding depends solely on past frames (causal), while in others it benefits from access to both past and future frames (noncausal). VidTok accommodates both modes, making it suitable for real-time use cases like robotics and video streaming, as well as for high-quality offline video generation.

Efficient training with high performance. AI-powered video generation typically requires substantial computational resources. VidTok can reduce training costs by half through a two-stage training process, delivering high performance while lowering costs.

Architecture

The VidTok framework builds on a classic 3D encoder-decoder structure but introduces 2D and 1D processing techniques to handle spatial and temporal information more efficiently. Because 3D architectures are computationally intensive, VidTok combines them with less resource-intensive 2D and 1D methods to reduce computational costs while maintaining video quality.

Spatial processing. Rather than treating video frames solely as 3D volumes, VidTok applies 2D convolutions, pattern-recognition operations commonly used in image processing, to handle spatial information within each frame more efficiently.

Temporal processing. To model motion over time, VidTok introduces the AlphaBlender operator, which blends frames smoothly using a learnable parameter. Combined with 1D convolutions, similar operations applied over sequences, this approach captures temporal dynamics without abrupt transitions.
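To make the blending idea concrete, here is a minimal PyTorch sketch of what a learnable blending operator of this kind can look like. The class interface and the single-scalar parameterization are illustrative assumptions based on the description above, not VidTok's actual implementation.

```python
import torch
import torch.nn as nn

class AlphaBlender(nn.Module):
    """Minimal sketch of a learnable blending operator (illustrative, not VidTok's code).

    Mixes two tensors, e.g., the outputs of a spatial (2D) branch and a
    temporal (1D) branch. A sigmoid keeps the blend weight in (0, 1),
    so transitions between the two signals stay smooth.
    """
    def __init__(self):
        super().__init__()
        # One learnable scalar; real implementations may use richer parameterizations.
        self.mix_factor = nn.Parameter(torch.tensor(0.0))

    def forward(self, x_spatial: torch.Tensor, x_temporal: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.mix_factor)  # blend weight in (0, 1)
        return alpha * x_spatial + (1.0 - alpha) * x_temporal

# Usage: blend features of shape (batch, channels, time, height, width).
blender = AlphaBlender()
x_s = torch.randn(1, 16, 8, 32, 32)   # e.g., output of per-frame 2D processing
x_t = torch.randn(1, 16, 8, 32, 32)   # e.g., output of temporal 1D processing
out = blender(x_s, x_t)
```

Because the blend weight is learned end to end, the model itself can decide how much spatial versus temporal signal to keep at each stage, rather than having that ratio fixed by hand.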
Figure 2 illustrates VidTok's architecture in detail.

Figure 2. VidTok's architecture. It uses a combination of 2D and 1D operations instead of relying solely on 3D techniques, improving efficiency. For smooth frame transitions, VidTok employs the AlphaBlender operator in its temporal processing modules. This approach strikes a balance between computational speed and high-quality video output.

Quantization

To efficiently compress video data, AI systems often use quantization to reduce the amount of information that needs to be stored or transmitted. A traditional method for doing this is vector quantization (VQ), which groups values together and matches them to a fixed set of patterns (known as a codebook). However, this can lead to inefficient use of the codebook and lower video quality.

For VidTok, we use an approach called finite scalar quantization (FSQ). Instead of grouping values, FSQ treats each value separately. This makes the compression process more flexible and accurate, helping preserve video quality while keeping the representation small. Figure 3 shows the difference between the VQ and FSQ approaches.

Figure 3. VQ (left) relies on learning a codebook, while FSQ (right) simplifies the process by quantizing each value independently against a fixed set of levels, making optimization easier. VidTok adopts FSQ to enhance training stability and reconstruction quality.
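Here is a minimal sketch of the FSQ idea in PyTorch, following the general published FSQ formulation rather than VidTok's exact code. The choice of five levels and the tanh bounding are illustrative assumptions; practical implementations typically allow a different level count per latent channel.

```python
import torch

def fsq_quantize(z: torch.Tensor, levels: int = 5) -> torch.Tensor:
    """Minimal sketch of finite scalar quantization (illustrative).

    Each scalar is squashed into a bounded range and rounded to one of
    `levels` evenly spaced values, with no learned codebook. A
    straight-through estimator lets gradients flow through the
    non-differentiable rounding step.
    """
    half = (levels - 1) / 2.0
    bounded = torch.tanh(z) * half      # squash each value into [-half, half]
    quantized = torch.round(bounded)    # snap to the nearest integer level
    # Straight-through estimator: forward pass uses `quantized`,
    # backward pass behaves as if the output were `bounded`.
    return bounded + (quantized - bounded).detach()

# Usage: quantize a latent of shape (batch, channels, time, height, width).
z = torch.randn(1, 4, 8, 32, 32)
tokens = fsq_quantize(z, levels=5)  # each entry is one of {-2, -1, 0, 1, 2}
```

Because there is no codebook to learn, there are no codebook-collapse or commitment-loss issues to tune, which is one reason scalar quantization tends to train more stably than VQ.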
Training

Training video tokenizers requires significant computing power. VidTok uses a two-stage process:

1. It first trains the full model on low-resolution videos.
2. Then, it fine-tunes only the decoder using high-resolution videos.

This approach cuts training costs in half, from 3,072 to 1,536 GPU hours, while maintaining video quality. Older tokenizers, trained on full-resolution videos from the start, were slower and more computationally intensive.

VidTok's method allows the model to quickly adapt to new types of videos without affecting its token distribution. Additionally, it trains on lower-frame-rate data to better capture motion, improving how it represents movement in videos.

Evaluating VidTok

VidTok's performance was evaluated using the MCL-JCV benchmark, a comprehensive video quality assessment dataset, and an internal dataset, demonstrating its superiority over existing state-of-the-art models in video tokenization. The assessment, which covered approximately 5,000 videos of various types, employed four standard metrics to measure video quality:

Peak Signal-to-Noise Ratio (PSNR)
Structural Similarity Index Measure (SSIM)
Learned Perceptual Image Patch Similarity (LPIPS)
Fréchet Video Distance (FVD)

Table 1 and Figure 4 illustrate VidTok's performance.

Table 1.

The results indicate that VidTok outperforms existing models in both discrete and continuous tokenization scenarios. This improved performance is achieved even when using a smaller model or a more compact set of reference patterns, highlighting VidTok's efficiency.

Figure 4. Quantitative comparison of discrete and continuous tokenization performance in VidTok and state-of-the-art methods, evaluated using four metrics: PSNR, SSIM, LPIPS, and FVD. Larger chart areas indicate better overall performance.

VidTok represents a significant development in video tokenization and processing. Its innovative architecture and training approach enable improved performance across various video quality metrics, making it a valuable tool for video analysis and compression tasks. Its capacity to model complex visual dynamics could improve the efficiency of video systems by enabling AI processing on more compact units rather than raw pixels.

VidTok serves as a promising foundation for further research in video processing and representation. The code for VidTok is available on GitHub (opens in new tab), and we invite the research community to build on this work and help advance the broader field of video modeling and generation.
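As a closing illustration of the evaluation above, here is a minimal sketch of PSNR, the first of the four metrics listed. PSNR has a standard definition, so this reflects the usual formula rather than the exact evaluation code used for the paper's results.

```python
import torch

def psnr(x: torch.Tensor, y: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak Signal-to-Noise Ratio between a reconstruction and a reference.

    Higher is better; identical inputs give infinity. Assumes pixel
    values lie in [0, max_val].
    """
    mse = torch.mean((x - y) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# Usage: compare a reconstructed clip against the original.
original = torch.rand(8, 3, 64, 64)  # 8 frames, RGB, 64x64
reconstructed = (original + 0.01 * torch.randn_like(original)).clamp(0, 1)
print(psnr(reconstructed, original))  # roughly 40 dB for this noise level
```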
-
WWW.MICROSOFT.COM
Ideas: Accelerating Foundation Models Research: AI for all

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOG]

EVELYNE VIEGAS: So AFMR is really a program which enabled us to provide access to foundation models, but it's also a global network of researchers. And so for us, I think when we started that program, it was making sure that AI was made available to anyone and not just the few, right? And really important to hear from our academic colleagues what they were discovering and uncovering and what were those questions that we were not even really thinking about, right? So that's how we started with AFMR.

CESAR TORRES: One of the things that the AFMR program has allowed me to see is this kind of ability to better visualize the terrain of creativity. And it's a little bit of a double-edged sword, because when we talk about disrupting creativity and we think about tools, it's typically the case that the tool is making something easier for us. So my big idea is to actually think about tools that are purposely making us slower, that have friction, that have errors, that have failures. To say that maybe the easiest path is not the most advantageous, but the one where you can feel the most fulfillment or agency.

MUHAMMED IDRIS: For me, I think what programs like AFMR have enabled us to do is really start thinking outside the box as to how these emerging technologies will, or can, revolutionize public health. What truly would it take for an LLM to understand context? And really, I think for the first time, we can truly achieve personalized, if you want to use that term, health communication.

[TEASER ENDS] [MUSIC PLAYS]

GRETCHEN HUIZINGA: You're listening to Ideas, a Microsoft Research podcast that dives deep into the world of technology research and the profound questions behind the code. I'm Gretchen Huizinga. In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

I'm excited to share the mic today with three guests to talk about a really cool program called Accelerating Foundation Models Research, or AFMR for short. With me is Cesar Torres, an assistant professor of computer science at the University of Texas at Arlington and the director of a program called The Hybrid Atelier. More on that soon. I'm also joined by Muhammed Idris, an assistant professor of medicine at the Morehouse School of Medicine. And finally, I welcome Evelyne Viegas, a technical advisor at Microsoft Research. Cesar, Muhammed, Evelyne, welcome to Ideas!

EVELYNE VIEGAS: Pleasure.

CESAR TORRES: Thank you.

MUHAMMED IDRIS: Thank you.

HUIZINGA: So I like to start these episodes with what I've been calling the research origin story, and since there are three of you, I'd like you each to give us a brief overview of your work. And if there was one, what big idea or larger-than-life person inspired you to do what you're doing today? Cesar, let's start with you, and then we'll have Muhammed and Evelyne give their stories as well.

CESAR TORRES: Sure, thanks for having me. So, I work at the frontier of creativity, especially thinking about how technology could support or augment the ways that we manipulate our world and our ideas. And I would say that the origin of why I happened into this space really comes down to a bring-your-kid-to-work day.
[LAUGHTER] My dad worked at a maquiladora, which is a factory on the border, and he took me over. He was an accountant, and so he first showed me the accountants, and he's like, look at the amazing work that these folks are doing. But the reality is that a lot of what they do is hidden behind spreadsheets, and so it wasn't necessarily the most engaging. Suffice to say I did not go into accounting like my dad! [LAUGHTER]

But then he showed us the chemical engineer in the factory, and he would tell me, this chemical engineer holds the secret formula to the most important processes in the entire company. But again, it was this black box, right? And I got a little bit closer when I looked at the process engineer who was melting metal, pulling it out of a furnace, and making solder, and I thought, wow, that's super engaging, but at the same time it was hidden behind machinery and heat, and it was just unattainable.

And so finally I saw my future career, and it was a factory line worker who was opening boxes. And the way that she opened boxes was incredible. Every movement, every shift of weight was so perfectly coordinated. And I thought, here is the peak of human ability. [LAUGHTER] This was a person who had just found a way to leverage her surroundings, her body, and the material she was working with. And I thought, this is what I want to study. I want to study how people acquire skills. And I realized in that moment just how important the environment and visibility were to being able to acquire skills. And so from that moment, everything that I've done to this point has been trying to develop technologies that could get everybody to develop a skill in the same way that I saw that factory line worker do that day.

HUIZINGA: Wow. Well, we'll get to the specifics on what you're doing now and how that's relevant in a bit. But thank you for that. So Muhammed, what's the big idea behind your work, and how did you get to where you are today?

MUHAMMED IDRIS: Yeah, no. First off, Cesar, I think it's a really cool story. I wish I had an origin story [LAUGHTER] from when I was a kid and I knew exactly what my life's work was going to be. Actually, with my story, I figured out my why much later. My background was in finance, and I started my career in the hedge fund space at a company called BlackRock, a really large financial institution you might have heard of. Then I went off and did a PhD at Penn State, and I fully intended on going back. I was going to basically be working in spreadsheets for the rest of my life.

But during my postdoc, at the time I was living in Montreal, I actually had distant relatives of mine who were coming to Montreal to apply for asylum, and it was in helping them navigate the process that it became clear to me, it was very obvious to me, the role that technology can play in helping people help themselves. And the big idea that I realized is that, oftentimes, the world provides a set of conditions that strip away our rights and our dignity and our ability to really fend for ourselves. But it was so amazing to see 10-, 12-year-old kids who, just because they had a phone, were able to help their families navigate what shelter to go to, how to apply for school, and more importantly, how to actually start the rest of their lives.
And so at the time, I got together a few friends, and we started to think about, well, all of this information is really sitting on a bulletin board somewhere. How can we digitize it? And so we put together a pretty, I would say, badass interdisciplinary team that included developers and refugees, and we built a prototype over a weekend. And essentially what happened was we built this really cool platform called Atar. In many ways, I would say that it was the first real solution that leveraged a lot of the natural language processing capabilities that everyone is using today to actually help people help themselves.

And it did that in three really important ways. The first is that people could essentially ask what they needed help with in natural language. And so we had some algorithms developed that would allow us to identify somebody's intent. Taking that information, we had a set of models that would then ask you a set of questions to understand your circumstances and determine your eligibility for resources. And then from that, we'd create a customized checklist for you with everything that you needed to know: where to go, what to bring, and who to talk to in order to accomplish that thing.

And it was amazing to see how that very simple prototype that we developed over a weekend really became a lifeline for a lot of people. And so that's really what motivated my work in terms of trying to combine data science and emerging technologies like AI and machine learning with the sort of community-based research that I think is important for us to truly identify applications, which, in my world right now, really means studying health disparities.

HUIZINGA: Yeah. Evelyne, tell us how you got into doing what you're doing as a technical advisor. What's the big idea behind what you do, and how did you get here?

EVELYNE VIEGAS: So as a technical advisor in Microsoft Research, I really look for ideas out there. Ideas can come from anywhere. So think of it as scanning the horizon to look for some of those ideas and then figuring out, are there scientific hypotheses we should be looking at? And so the idea here is, once we have identified some of those ideas, the goal is really to help nurture a healthy pipeline of potential big bets. What I do is really about subtle science and exact art, and we discover as we do. It involves a lot of discussions and conversations, working with our researchers here, our scientists, but of course with the external research community as well.

And how I got here? Well, first I will say that I am so excited to be alive in a moment where AI has made it to industry, because I've looked at and worked in AI for as long as I can remember, with very different approaches. And as important, importantly for me, it's really natural languages which have enabled this big evolution, people sometimes also talk about a revolution, in AI, via the language models. When I started, I was very fortunate growing up in an environment where my family, my extended family, spoke different languages, and it was interesting to see the different idioms in those natural languages. Just to give you an example, in English you say, it rains cats and dogs. Well, in French that doesn't mean anything, right? In French, actually, it rains ropes, which probably doesn't mean anything in English. [LAUGHTER] And so I was really curious about natural languages and communication.
When I went to school, being good at math, I ended up doing math, realizing very quickly that I didn't want a career in math. You know, proofs, all that, it's good in high school, but doing a full career in math was not my thing. But there was one class I really, really enjoyed, which was mathematical logic. And so little by little, I started discovering people working in that field. And at the same time, I was still restless with natural languages. And so I also took some classes in linguistics at the humanities university in Toulouse in France. And I stumbled on people who were working, some in linguistics, some in computer science, and then there was this lab doing computational linguistics. And then that was it for me. That's how I ended up doing my PhD in computational linguistics.

And the last aspect I'll talk about, because in my role today the aspect of working with a network of people, with a global network, is still so important to me, and I think for science as a whole. At the time, there was this nascent field of computational lexical semantics. And for me, it was so important to bring people together, because I realized that we all had different approaches, different theories, not just in France, but across the world. And actually, I worked with somebody else, and we co-edited the first book on computational lexical semantics, where we started exposing what it meant to do lexical semantics and the relationships between words within a larger context, the larger context of conversations and discourse, and all those different approaches. And that's an aspect which for me, to this day, is so important, and that was also really important to keep as we developed what we're going to talk about today, the Accelerating Foundation Models Research program.

HUIZINGA: Yeah, this is fascinating, because I didn't even know all of these stories. I just knew that there were stories here, and this is the first time I'm hearing them. So it's like this discovery process and the sort of pushing on a door and having it be, well, that's not quite the door I want. [LAUGHTER] Let's try door number two. Let's try door number three. Well, let's get onto the topic of Accelerating Foundation Models Research and unpack the big idea behind that. Evelyne, I want to stay with you on this for a minute, because I'm curious as to how this initiative even came to exist and what it hopes to achieve. So, maybe start out with a breakdown of the title. It might be confusing for some people, Accelerating Foundation Models Research. What is it?

VIEGAS: Yeah, thank you for the question. So I think I'm going to skip quickly over "accelerating research." I think people can understand it's about bringing

HUIZINGA: Make it faster

VIEGAS: well, faster and deeper advances. I mean, there are some nuances there, but terms like "foundation models," maybe that's where I'll start here. So when we talk about foundation models, just think about any model which has been trained on broad data and which actually enables you to do any task. That's, I think, the simplest way to talk about it. And indeed, people talk a lot about large language models, or language models. So think of language models as just one part of those foundation models. The term was actually coined at Stanford when people started looking at GPTs, the generative pre-trained transformers, this new architecture.
And so that term was coined to talk not just about language models but about foundation models, because there are also vision models, and there are other types of models and modalities. And so that's where we started with Accelerating Foundation Models Research, and from now on I will say AFMR, if that's okay.

HUIZINGA: Yeah. Not to be confused with ASMR, which is that sort of tingly feeling you get in your head when you hear a good sound, but AFMR, yes.

VIEGAS: So with AFMR, I actually need to go back a little bit before that and remind us that this is not just new. The point I was making earlier about how important it is to engage with the external research community in academia: Microsoft Research has been doing it for as long as I've been at Microsoft, and I've been here 25 years; I just did 25 in January.

HUIZINGA: Congrats!

VIEGAS: Thank you! And so, it's really important for Microsoft Research, for Microsoft. We had some programs even before the ChatGPT moment where we had engaged with the external research community, a program called the Microsoft Turing Academic Program, where we provided access to the Turing model, which was a smaller model than the ones then developed by OpenAI. But at that time, it was very clear that we needed to be responsible, to look at safety, to look at the trustworthiness of those models. And so we cannot just drink our own Kool-Aid, and so we really had to work with people externally. And so we were already doing that.

But that was an effort which we couldn't really scale, because to scale an effort where multiple people can have access to the resources, you need a more programmatic way to do it and to rely on a platform, like, for instance, Azure, which has the security, privacy, and confidentiality that enable you to scale those types of efforts. And so, as we were developing this program on the Turing model with a small set of academic people, then there was this ChatGPT moment in November 2022, which was the aha moment, I think, as I mentioned, for me. It's like, wow, AI now has made it to industry. And so for us, it became very clear, with this moment and the amount of resources needed on the compute side, and access to OpenAI's new GPT models, at the beginning GPT-3 and then 4, how could we build a program? First, should we, and was there interest? And academia responded: Yes! Please! Of course! Right? [LAUGHTER] I mean, what are you waiting for?

So AFMR is really a program which enabled us to provide access to foundation models, but it's also a global network of researchers. And so for us, I think when we started that program, it was making sure that AI was made available to anyone and not just the few, right? And really important to hear from our academic colleagues what they were discovering and uncovering and what were those questions that we were not even really thinking about, right? So that's how we started with AFMR.

HUIZINGA: This is funny. Again, on the podcast you can't see people shaking their heads, nodding in agreement, [LAUGHTER] but the two academic researchers are going, yep, that's right. Well, Muhammed, let's talk to you for a minute. I understand AFMR started a little more than a year ago with a pilot project that revolved around health applications, so this is a prime question for you.
And since you're in medicine, give us a little bit of how it started and how it's going from your perspective, and why it's important for you at the Morehouse School of Medicine.

IDRIS: For sure. You know, something I remember vividly, as we mentioned, is when I saw my first GPT-3 demo, and I was absolutely blown away. This was a little bit before the ChatGPT moment that Evelyne was mentioning, but just the possibilities, oh my God, were so exciting! And again, if I tie that back to the work that we were doing, where we were trying to mimic what ChatGPT is today, there were so many models that we had to build, very complex architectures, edge cases that we didn't even realize. So you could imagine, when I saw that, I said, wow, this is amazing. It's going to unlock so many possibilities.

But at the same time this demo was coming out, I actually saw a tweet about the inherent biases that were baked into these models. And I'll never forget this. I think the author was at the time a grad student at Stanford, and they were able to show that if you asked the model to complete a very simple sentence, a sort of joke, "Two Muslims walk into a bar," what is it going to finish? And it was scary.

HUIZINGA: Wow.

IDRIS: Two thirds, about 66% of the time, the responses referenced some sort of violence, right? And that really was an aha moment for me personally, of course, not least being that I'm Muslim, but beyond that: there are all of these possibilities, and at the same time, there's a lot that we don't know about how these models might operate in the real world. And of course, the first thing that this made me do as a researcher was wonder, how might these emerging technologies unintentionally lead to greater health disparities? Maybe they do. Maybe they don't. The reality is that we don't know.

HUIZINGA: Right.

IDRIS: Now I tie that back to something that I've been fleshing out for myself during my time here at Morehouse School of Medicine. What I believe, and I would say this is the case for really any sort of emerging technology, but let's specifically talk about AI, machine learning, and large language models, is that if we're not intentional in interrogating how they perform, then what's likely going to happen is that despite overall improvements in health, we're going to see greater health disparities, right? It's almost that trickle-down-economics type of model, right? And it's really this addressing of health disparities which is at the core of the mission of Morehouse School of Medicine. It is literally the reason why I came here a few years ago.

Now, the overarching goal of our program, without getting too specific, is really around evaluating the capabilities of foundation models, and those, of course, as Evelyne mentioned, include large language models. And we're specifically working on facilitating accessible and culturally congruent cancer-related health information. Specifically, we need to understand that communities that are disproportionately impacted have specific challenges around trust. And all of these are obstacles to taking advantage of things like cancer screenings, which we know significantly reduce the likelihood of mortality. And it's going very well. We have a pretty amazing interdisciplinary team, and I think we've been able to develop a pretty cool research agenda, a few papers, and a few grants I'd be happy to share about a little bit later.

HUIZINGA: Yeah, that's awesome.
And I will ask you about those, because your project is really interesting. But I want Cesar to weigh in here on the goals that are the underpinning of AFMR, which are aligning AI with human values, improving AI-human interaction, and accelerating scientific discovery. Cesar, how do these goals, writ large, align with the work you're doing at UT Arlington, and how has this program helped?

TORRES: Yeah, I love this moment in time that everybody's been talking about, that GPT or large language model exposure. Definitely when I experienced it, the first thing that came to my head was, I need to get this technology into the hands of my students, because it is so nascent, there are so many open research questions, there are so many things that can go wrong, but there's also so much potential, right? And so when I saw this research program by Microsoft, I was actually surprised. I saw that, hey, they are actually acknowledging the human element. And so the fact that there was this call for research that was looking at that human dimension was really refreshing.

So like what Muhammed was saying, one of the most exciting things about these large language models is you don't have to be a computer scientist in order to use them. And it reminded me of this moment in time within the arts when digital media started getting produced. We had this crisis. There was this idea that we would lose all the skills that we had learned from working traditionally with physical materials once we moved onto a digital canvas.

HUIZINGA: Right.

TORRES: And it's kind of the birth of a new medium. We're in this unique position to guide how this medium is produced and to make sure that people develop virtuosity in being able to use that medium but also understand its limitations, right?

And so one of the fun projects that we've done here has been around working with our glass shop. Specifically, we have these amazing neon-bending artists here at UTA, Jeremy Scidmore and Justin Ginsberg. We've been doing some collaborations with them, and we've been essentially monitoring how they bend glass. I run an undergraduate research program here, and I've had undergrads try to tackle this problem of, how do you transfer that skill of neon bending? And the fact is that because of AFMR, there is now a way to structure that undergraduate research process so that people feel comfortable asking those dumb questions exactly where they are.

But what I think is even more exciting is that they start to see that questions like skill acquisition are still something that our AI is not able to do. And so it's refreshing to see; it's like the research problems have not all been solved. It just means that new ones have opened, and ones that we previously thought were unattainable now have this groundwork, this foundation, to be researched and investigated. And so it's really fertile ground. And I really thank the AFMR program for letting us have access to those grounds.

HUIZINGA: Yeah. I'm really eager to get into both your projects, because they're both so cool. But Evelyne, I want you to stay on this access line of thought for a second, because Microsoft has given grants in this program, AFMR, to several Minority Serving Institutions, or MSIs as they're called, including Historically Black Colleges and Universities and Hispanic-Serving Institutions. So what do these grants involve?
You've alluded to it already, but can you give us some more specifics on how Microsoft is uniquely positioned to give these grants and what they're doing?

VIEGAS: Yes. So the grant program, per se, is really access to resources, actually compute and API access to frontier models. So think about Azure OpenAI, but also now, as the program evolves, it's also providing access to our research models, so Phi, if you like smaller models

HUIZINGA: Yeah, P-H-I.

VIEGAS: Yes, Phi! [LAUGHTER] OK! So it's really about access to those resources. It's also access to people. I was talking about this global research network and the importance of it, and I'll come back to what we did specifically with the Minority Serving Institutions. Actually, when we started, I think we started a bit naively. We did an open call for proposals, a global one, and we got a great response. But at the beginning, we really had no participation from MSIs. [LAUGHTER] And then we thought, why? It's open. And I think what we missed there, at the beginning, is that we really focused on the technology, and some people who were already part of this global network started approaching us, but a lot of people didn't even know about it, or didn't think they could apply, right?

And so we ended up doing a more targeted call where we provided not only access to the compute resources and the APIs to develop applications or to validate or expand the work being done with foundation models, but we also acknowledged that it was important, with MSIs, to enable the students of researchers like Cesar, Muhammed, and other professors who are part of the program, so that they could actually spend the time working on those projects, because there are some communities where the teaching load is really high compared to other colleges. So we already had a good sense that one size doesn't fit all. And I think what came out of working with the MSIs and others is that one culture doesn't fit all either, right? So it's about access: access to people, access to the resources, and really co-designing so that we can make more advances together.

HUIZINGA: Yeah. Cesar, let's go over to you, because big general terms don't tell a story as well as specific projects with specific people. So your project is called, and I'm going to read this, AI-Enhanced Bricolage: Augmenting Creative Decision Making in Creative Practices. That falls under the big umbrella of Creativity and Design. So tell our audience, and as you do, make sure to explain what bricolage is and why you work in a Hybrid Atelier, terms I'm sure are near and dear to Evelyne's heart, the French language. Talk about that, Cesar.

TORRES: So at UTA, I run a lab called The Hybrid Atelier. I chose that name because "lab" is almost too siloed into thinking about scientific methods to solve problems, and I wanted something that really spoke to the ethos of the different communities of practice that generate knowledge. And so The Hybrid Atelier is a space, a makerspace, filled with the tools and knowledge that you might find in creative practices like ceramics, glass working, textiles, polymer fabrication, and 3D printing. Every year I throw something new in there. And this last year, what I threw in there was GPT and large language models. And it has been exciting to see how it has transformed.
But speaking to this specific project, I think the best way I can describe bricolage is to ask you a question: what would you do if you had a paperclip, duct tape, and a chewing gum wrapper? What could you make with that, right? [LAUGHTER] And so some of us have these MacGyver-type mentalities, and that is what Claude Lévi-Strauss

HUIZINGA: Wow.

TORRES: it's been an exciting project, to say the least.

HUIZINGA: Okay, again, my face hurts because I'm grinning so hard for so long. I have to stop. No, I don't, because it's amazing. You made me think of that movie Apollo 13, when they're stuck up in space and this engineer comes in with a box of, we'll call it bricolage, throws it down on the table and says, we need to make this fit into this using this. Go. And they didn't have AI models to help them figure it out, but they did a pretty good job. Okay, Cesar, that's fabulous. I want Muhammed's story now. I have to also calm down. It's so much fun. [LAUGHTER]

IDRIS: No, I love it. I love it. And actually, to bring it back to what Evelyne was mentioning earlier about just getting different perspectives in a room, I think this is a perfect example of it. Actually, Cesar, I never thought of myself as being a creative person, but as soon as you said a paperclip and, was it the gum wrapper

HUIZINGA: Duct tape.

IDRIS: duct tape or gum wrapper, I thought to myself, in my first internship I was able to figure out how to make two paper clips and a rubber band into, this was of course before AirPods, right, something that I could wrap my wires around, and it was perfect! [LAUGHTER] I almost started thinking to myself, how could I even scale this, or maybe get a patent on it, but it was a paper clip, yeah. So, no, I mean, this is really exciting stuff, yeah.

HUIZINGA: Well, Muhammed, let me tee you up, because I want to actually say your project out loud.

IDRIS: Please.

HUIZINGA: Because it's called Advancing Culturally Congruent Cancer Communication with Foundation Models. You might just beat Cesar's long title with yours, I don't know. [LAUGHTER] You include alliteration, which, as an English major, makes my heart happy. It's positioned under the Cognition and Societal Benefits bucket, whereas Cesar's was under Creativity and Design, but I see some crossover. Evelyne's probably grinning too, because this is the whole thing about research, how do these things come together and help? Tell us, Muhammed, about this cultury culturally Tell us about your project! [LAUGHTER]

IDRIS: So, you know, I think, again, whenever I talk about our work, especially the mission and the why of Morehouse School of Medicine, everything really centers around health disparities, right? And if you think about it, health disparities usually come from one of many, but let's focus on three, potential areas. You might not know you need help, right? If you know you need help, you might not know where to go. And if you end up there, you might not get the help that you need. And the through line through all of these really comes down to health communication at the end of the day. It's not just what people are saying, it's how people are saying it as well. And so our project focuses right now on language and text, right?
But we are, as I'll talk about in a second, really exploring the multimodal nature of communication more broadly. And, you know, another thing that's important in terms of background context is that, for us, these models are more than just tools, right? We really do feel that if we're intentional about it, they can be important facilitators of public health more broadly. And that's where the idea of our project fitting under the bucket of benefiting society as a whole comes from.

Now, the context is that over the past couple of decades, how we talk about cancer and how we share health information has changed dramatically. A lot of this has to do with the rise of digital technologies more broadly, social media, and now AI. People have more access to health information than ever before. And despite all of these advancements, as I keep saying over and over again, not everyone is benefiting equally, especially when it comes to cancer screening.

Now, breast and cervical cancer, which is what we're focusing on specifically, are two of the leading causes of cancer-related deaths in women worldwide. And Black and Hispanic women in the US are at particular risk, disproportionately impacted by not just lower screening rates but later diagnoses and, from that, higher mortality rates as well. An important part of the context here is COVID-19. By some estimates, there are about 10 million cancer screenings that didn't happen. And this is also happening within a context of a massive amount of misinformation, something the WHO termed an infodemic. And so our project is looking for creative solutions based on emerging technologies, and I think we're doing it in a few unique ways.

Now, the first way is that we're looking at how foundation models like the GPTs, but also open-source models and those that are specifically fine-tuned on medical texts, perform in terms of their ability to generate health information. How accurate are they? How well is it written? And is it actually useful for the communities that need it the most? We developed an evaluation framework, and we embedded within it some qualitative dimensions that are important to health communication. We just wrapped up an analysis where we compared general-purpose models, like a ChatGPT, with medical and more science-specific domain models, and as you'd expect, the general-purpose models produced information that was easier to understand, but that came at the cost of the safer and more accurate responses that the medically tuned models were able to produce.

Now, a second aspect of our work, and this is a really unique part of what is literally called The Morehouse Model, there's actually a book by that name, is how we can integrate communities into research. Specifically, my work is thinking about how we integrate communities into the development and evaluation of language models. And that's where we get the term culturally congruent: that these models are not just accurate, but also aligned with the values, the beliefs, and even the communication styles of the communities that they're meant to serve.

One of the things that we're thinking quite a bit about, right, is that these are not just tools to be published on and maybe put in a GitHub repo somewhere, right?
That these are actually meant to drive the sort of interventions that we need within community. So of course, implementation is really key. And so for this, you know, not only do you need to understand the context within which these models will be deployed, the goal here really is to activate you and prepare you with information to be able to advocate for yourself once you actually see your doctor, right? So that, again, I think is a good example of that. But you also have to keep in mind, Gretchen, that, you know, our goal here is, we don't want to create greater disparities between those who have and those who don't, right? And so, for example, thinking about accessibility is a big thing, and that's been a part of our project as well. And so, for example, we're leveraging some of the Azure API services for speech-to-text, and we're even going as far as trying to leverage some of the text-to-image models to develop visuals that address health literacy barriers and try to leverage these tools to truly, truly benefit health.

HUIZINGA: One of the most delightful and sometimes surprising benefits of programs like AFMR is that the technologies developed in conjunction with people in minority communities have a big impact for people in majority communities as well, often called the Curb Cut Effect. Evelyne, I wonder if you've seen any of this happen in the short time that AFMR has been going?

VIEGAS: Yeah, so I'm going to focus a bit more maybe on education and examples there where we've seen, as Cesar was also talking about it, you know, for scaling and all that. But we've seen a few examples of professors working with their students where English is not the first language.

HUIZINGA: Yeah ...

VIEGAS: Another one I would mention is in the context of domains. So for domains, what I mean here is application domains, like not just in CS, but we've been working with professors who are, for instance, astronomers, or lawyers, or musicians working in universities. So they started looking actually at these LLMs as more of the super advisor helping them. And so it's another way of looking at it. And actually they started focusing on, can we actually build small astronomy models, right? And I'm thinking, okay, maybe we also learn something there which could potentially be applied to some other domain. So these are some of the things we are seeing.

HUIZINGA: Yes.

VIEGAS: But I will finish with something which may, for me, kind of challenge this Curb Cut Effect to a certain extent, if I understand the concept correctly, which is that I think, with this technology and the way AI and foundation models work compared to previous technologies, I feel it's kind of potentially the opposite. It's kind of like the tail catching up with the head. But here I feel that with the foundation models, it's a different way to find information and gain some knowledge. I think that actually when we look at that, these are really broad tools that now actually can be used to help customize your own curb, as it were! So kind of the other way around.

HUIZINGA: Oh, interesting ...

VIEGAS: So I think maybe there are two dimensions. It's not just, I work on something small, and it applies to everyone. I feel there is also a dimension of, this is broad, this is any task, and it enables many more people. I think Cesar and Muhammed made that point earlier: you don't have to be a CS expert or rocket scientist to start using those tools and make progress in your field.
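For the accessibility work Idris mentions above, a minimal sketch of speech-to-text with the Azure Speech SDK, the service he names. The key, region, file name, and surrounding pipeline are placeholder assumptions.

```python
# Hypothetical sketch: transcribe a patient's spoken question with the
# Azure Speech SDK so it can feed a text-based health-communication model.
# Subscription key, region, and audio file are placeholders.
import os
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["AZURE_SPEECH_KEY"],  # placeholder env var
    region=os.environ.get("AZURE_SPEECH_REGION", "eastus"),
)
audio_config = speechsdk.audio.AudioConfig(filename="patient_question.wav")
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, audio_config=audio_config
)

result = recognizer.recognize_once()  # single-utterance recognition
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Transcript:", result.text)  # downstream: pass to the language model
else:
    print("Recognition failed:", result.reason)
```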
So I think that maybe there is this dimension to it.

HUIZINGA: I love the way you guys are flipping my questions back on me. [LAUGHTER] So, and again, that is fascinating, you know, a custom curb, not a curb cut. Cesar, Muhammed, do either of you have any examples of how perhaps this is being used in your work and you're having accidental or serendipitous discoveries that sort of have a bigger impact than what you might've thought?

TORRES: Well, one thing comes to mind. It's a project that two PhD students in my lab, Adam Emerson and Shreyosi Endow, have been working on. It's around this idea of communities of practice, and that is to say, when we talk about how people develop skills as a group, it's often through some sort of tiered structure. And I'm making a tree diagram with my hands here! [LAUGHTER] And so we often talk about what it's like for an outsider to enter from outside of the community, and just how much effort it takes to get through that gate, to go through the different rungs, through the different rites of passage, to finally be a part of the inner circle, so to speak. And one of the projects that we've been doing, we started to examine these known communities of practice, where they exist. But in doing this analysis, we realized that there's a couple of folks out there that exist on the periphery. And by really focusing on them, we could start to see where the field is starting to move. And these are folks that have said, I'm neither in this community nor another; I'm going to kind of pave my own way. While we're still seeing the effects of that research play out, I think being able to monitor the communities at the fringe is a really telling sign of how we're advancing as a society. I think shining some light into these fringe areas is exactly how research develops; it's really just about expanding at some bleeding edge. And I think sometimes we just have to recontextualize that that bleeding edge is sometimes the group of people that we haven't necessarily been paying attention to.

HUIZINGA: Right. Love it. Muhammed, do you have a quick example? Or, I mean, you don't have to, but I just was curious.

IDRIS: Yeah, maybe I'll just give one quick example that I think keeps me excited, and it actually has to do with the idea of kind of small language models, right? And so, you know, I gave the example of GPT-3 and how it's trained on the entirety of the internet, and with that come some unfortunate baked-in biases, right? And so we asked ourselves the flip side of that question. Well, how is it that we can go about actually baking in some of the good bias, right? The cultural context that's important to train these models on. And the reality is that we started off by saying, let's just have focus groups. Let's talk to people. But of course that takes time, it takes money, it takes effort. And what we quickly realized actually is there are literally generations of people who have done these focus groups specifically on breast and cervical cancer screening. And so what we actually have since done is leverage that real-world data in order to actually start developing synthetic data sets that are ...

HUIZINGA: Ahhhh.

IDRIS: ... small enough, but of high enough quality, that they allow us to address the specific concerns around bias that might otherwise exist.
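A hedged sketch of the synthetic-data step Idris describes: seeding a model with themes drawn from real focus-group transcripts to produce culturally grounded training examples. The prompt, themes, model name, and output schema are illustrative assumptions, not the team's actual pipeline.

```python
# Hypothetical sketch: turn themes mined from focus-group transcripts into
# synthetic Q&A pairs for fine-tuning a small model. Client usage is the
# generic OpenAI Python API; prompt and fields are invented.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# In practice these would be coded themes extracted from real transcripts.
themes = [
    "fear that a mammogram result will lead to unaffordable care",
    "trust in guidance passed along by church health ministries",
]

def synthesize(theme: str, n: int = 3) -> list[dict]:
    prompt = (
        "You are helping build culturally congruent cancer-screening "
        "education. Theme from community focus groups: " + repr(theme) + ". "
        "Write " + str(n) + " realistic patient questions with "
        "plain-language, empathetic answers. Return a JSON object: "
        '{"pairs": [{"question": "...", "answer": "..."}]}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["pairs"]

dataset = [pair for t in themes for pair in synthesize(t)]
```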
And so for me, that's a really, like, awesome thing that we came across: in trying to solve a problem for our kind of specific use case, I think this could actually be a method for developing more representative, context-aware, culturally sensitive models. And I think overall this contributes to the safety and reliability of these large language models and hopefully can create a method for other people to be able to do it as well.

HUIZINGA: Yeah. Evelyne, I see why it's so cool for you to be sitting at Microsoft Research and working with these guys ... It's about now that I pose the "what could possibly go wrong if you got everything right?" question on this podcast. And I'm really interested in how researchers are thinking about the potential downsides and consequences of their work. So, Evelyne, do you have any insights on things that you've discovered along the path that might make you take preemptive steps to mitigate?

VIEGAS: Yeah, I think it's coming back to actually what Muhammed was just talking about, I think Cesar, too, around data, the importance of data and the cultural value and the local value. I think an important piece of continuing to be positive for me [LAUGHTER] is to make sure that we fully understand that, at the end of the day, data, which is so important to build those foundation models, and language models in particular, is just a proxy to human beings. And I feel that we need to remember that it's a proxy to humans and that we all have different beliefs, values, goals, preferences. And so how do we take all that into account? And I think that beyond data safety and provenance, there's an aspect of data caring. I don't know how to say it differently, [LAUGHTER] but it's kind of, in the same way that we care for people, how do we care for the data as a proxy to humans? And I'm thinking of, you know, especially cases where there is no economic value, right? [LAUGHTER] But there is local value for those communities. And I think actually there is cultural value across countries. So I just wanted to say that there is also an aspect I think we need to do more research on: data as proxies to humans. And as the complex humans we are, right?

HUIZINGA: Right. Well, one of the other questions I like to ask on these Ideas episodes is about the idea of blue sky or moonshot research, kind of outrageous ideas. And sometimes they're not so much outrageous as they are just living outside the box of traditional research, kind of the what-if questions that make us excited. So just briefly, is there anything on your horizon, specifically Cesar and Muhammed, that you would say, in light of this program, AFMR, that you've had access to things that make you think, boy, this now would enable me to ask those bigger questions, or that bigger question? I don't know what it is. Can you share anything on that line?

TORRES: I guess from my end, one of the things that the AFMR program has allowed me to see is this kind of ability to better visualize the terrain of creativity. And it's a little bit of a double-edged sword, because when we talk about disrupting creativity and we think about tools, it's typically the case that the tool is making something easier for us. But at the same time, if something's easier, then some other thing is harder. And then we run into this really strange case where, if everything is easy, then we are faced with the blank canvas syndrome, right? Like, what do you even do if everything is just equally weighted with ease?
And so my big idea is to actually think about tools that are purposely making us slower ...

HUIZINGA: Mmmmm ...

TORRES: ... that have friction, that have errors, that have failures, and really design how those moments can change our attitudes towards how we move around in space. To say that maybe the easiest path is not the most advantageous, but the one that you can feel the most fulfillment or agency towards. And so I really do think that this is hidden in the latent space of the data that we collect. And so we just need to be immersed in that data. We need to traverse it, and really it becomes an infrastructure problem. And so the more that we expose people to these foundation models, the more that we're going to be able to see how we can enable these new ways of walking through and exploring our environment.

HUIZINGA: Yeah. I love this so much, because I've actually been thinking some of the best experiences in our lives haven't seemed like the best experiences when we went through them, right? The tough times are what make us grow. And this idea that AI makes everything accessible and easy and frictionless is what you've said. I've used that term too. I think of the people floating around in that movie WALL-E, and all they have to do is pick whether I'm wearing red or blue today and which drink I want. I love this, Cesar. That's something I hadn't even expected you might say, and boom, out of the park. Muhammed, do you have any sort of outrageous ...? That was flipping it back!

IDRIS: I was going to say, yeah, no, listen, I don't know how I could top that. But no, I mean, so it's funny, Cesar, as you were mentioning that, I was thinking about grad school, how at the time it was the most, you know, friction-filled life experience. But in hindsight, I wouldn't trade it in for the world. For me, you know, one of the things I'm often thinking about in my job is, what if we lived in a world where everyone had all the information that they needed, access to all the care they need? What would happen then? Would we magically all be the healthiest version of ourselves? I'm a little bit skeptical. I'm not going to lie, right? [LAUGHTER] But that's something that I'm often thinking about. Now, bringing that back down to our project, one of the things that I find a little bit amusing is that I tend to ping-pong between, this is amazing, the capabilities are just, the possibilities are endless; and then there will be kind of one or two small things where it's pretty obvious that there's still a lot of research that needs to be done, right? So my big what-if, actually, I want to bring that back down to a kind of a technical thing, which is, what if AI could truly understand culture, not just language, right? And so right now, right, an AI model can translate a public health message. It's pretty straightforward, from English to Spanish, right? But it doesn't inherently understand why some Spanish-speaking countries may be more hesitant about certain medical interventions. It doesn't inherently appreciate the historical context that shapes that hesitancy, or what kinds of messaging would build trust rather than skepticism, right? So there are literal, like, cultural nuances. That, to me, is what I mean when I say culturally congruent or cultural context. And I think for me, what programs like AFMR have enabled us to do is really start thinking outside the box as to how will these, or how can these, emerging technologies revolutionize public health?
What truly would it take for an LLM to understand context? And really, I think for the first time, we can truly, truly achieve personalized, if you want to use that term, health communication. And so that's what I would say for me is like, what would that world look like?

HUIZINGA: Yeah, the big animating what-if? I love this. Go ahead, Evelyne, you had something. Please.

VIEGAS: Can I expand? I cannot top it. I'm going to do like Muhammed, I cannot top it! Like that friction and the cultural aspect, but can I expand? And as I was listening to Cesar on the education, I think I heard you talk about the educational rite of passage at some point, and Muhammed on those cultural nuances. So first, before talking about what-if, I want to say that there is some work, again, when we talk about AFMR, the technology is all the brain power of people thinking, having crazy ideas, being very creative in the research being done. And there is some research where people are looking at what it means, actually, when you build those language models, and how you can take into account different language and different culture, or different languages within the same culture, or between different cultures speaking the same language, or ... So there is very interesting research. And so it made me think, expanding on what Muhammed and Cesar were talking about, of this educational rite of passage. I don't know if you're aware, but in Europe in the 17th, 18th century, there was this grand tour of Europe, and that was reserved to just some people who had the funds to do that grand tour of Europe, [LAUGHTER] let's be clear! But it was this educational rite of passage where actually they had to physically go to different countries to actually get familiar and experience, experiment, philosophy and different types of politics, and ... So that was kind of this passage obligé, we say in French. I don't know if there is a translation in English, but it's kind of this rite of passage, basically. And so I am like, wow, what if actually we could have, thanks to the AI looking at different nuances of cultures, of languages, not just language, but from a multimodal viewpoint, what if we could have this citizen-of-the-world rite of passage, where before we are really citizens of the world, we need to understand other cultures, at least be exposed to them. So that would be my what-if. How do we make AI do that? And for anyone, right, not just people who can afford it.

HUIZINGA: Well, I don't even want to close, but we have to. And I'd like each of you to reflect a bit. I think I want to frame this in a way where you can sort of pick what you'd like to talk about. I often have a little bit of vision casting in this section, but there are some specific things I'd like you to talk about. What learnings can you share from your experience with AFMR? Or/and, what's something that strikes you as important now that may not have seemed that way when you started? And you can also, I'm anticipating you people are going to flip that and say, what wasn't important that is now? And also, how do you see yourself moving forward in light of this experience that you've had? So Muhammed, let's go first with you, then Cesar, and then Evelyne, you can close the show.

IDRIS: Awesome.
One of the things I'm often thinking about, and one of the concepts I'm often reminded of given the significance of the work that institutions like a Morehouse School of Medicine and UT Arlington and kind of Minority Serving Institutions do, right, especially when it almost feels like there is an onslaught of pushback to addressing some of these more systemic issues that we all struggle with, is what does it mean to strive for excellence, right? So in our tradition there's a concept called Ihsan. Ihsan, you know, there's a lot of definitions of it, but essentially it's to do more than just the bare minimum, to truly strive for excellence. And I think it was interesting, having spent time at Microsoft Research in Redmond as part of the AFMR program, meeting other folks who also participated in the program, that I started to appreciate for myself the importance of this idea of the responsible design, development, and deployment of technologies if we truly are going to achieve the potential benefits. And I think this is one of the things that I could kind of throw out there as something to take away from this podcast: really, don't just think of what we're developing as tools, but also think of how they will be applied in the real world. And when you're thinking about the context within which something is going to be deployed, that brings up a lot of interesting constraints, opportunities, and just context that I think is important, again, to not just work on an interesting technology for the sake of an interesting technology, but to truly achieve that benefit for society.

HUIZINGA: Hmm. Cesar.

TORRES: I mean, echoing Muhammed, I think the community is really at the center of how we can move forward. I would say the one element that really struck a chord with me, and something that I very much undervalued, was the power of infrastructure and spending time laying down the proper scaffolds and steppingstones, not just for you to do what you're trying to do, but to allow others to also find their own path. I was setting up Azure for one of my classes, and it took time, it took effort, but the payoff has been incredible, in so much as the impact that I see now of students from my class sharing with their peers. And I think this culture of entrepreneurship really comes from taking ownership of where you've been and where you can go. But it really just, it all comes down to infrastructure. And so AFMR for me has been that infrastructure to kind of get my foot out the door and also have the ability to bring some folks along the journey with me, so ...

HUIZINGA: Yeah. Evelyne, how blessed are you to be working with people like this? Again, my face hurts from grinning so hard. Bring us home. What are your thoughts on this?

VIEGAS: Yeah, so first of all, I mean, it's so wonderful just here, live, listening to the feedback from Muhammed and Cesar on what AFMR brings and has the potential to bring. And first, let me acknowledge that to put on a program like AFMR, it takes a village. So I'm here, the face here, or, well, not the face, the voice rather! [LAUGHTER] But it's so many people who have contributed at Microsoft: on the engineering side, we were just talking about infrastructure, Cesar was talking about, you know, the pain and gain of leveraging an industry-grade infrastructure like Azure and Azure AI services. So, also our policy teams, of course, our researchers. But above all, the external research community, which I'm so grateful to see.
It's, as you said, I feel super blessed and fortunate to be working on this program and really listening to what we need to do next. How can we together do better? There is one thing for me, I want to end on the community, right? Muhammed talked about this, Cesar too, the human aspect, right? The technology is super important, but also understanding the human aspect. And I will say, actually, my curb cut moment [LAUGHTER] was really working with the MSIs and the cohort, including Muhammed and Cesar, when they came to Redmond, and really understanding some of the needs which were going beyond the infrastructure, beyond, you know, a small network, how we can make it bigger, and deployment ideas too, coming from the community, and that's something which actually we also try to bring to the whole of AFMR moving forward. And I will finish on one note, which for me is really important moving forward. We heard from Muhammed talking about the real importance of interdisciplinarity, right, and let us not work in silos. And I want to see AFMR go more international, internationality, if the word exists ... [LAUGHTER]

HUIZINGA: It does now!

VIEGAS: It does now! But it's just making sure that when we have those collaborations, it's really hard actually, time zones, you know, practically it's a nightmare! But I think there is definitely an opportunity here for all of us.

HUIZINGA: Well, Cesar Torres, Muhammed Idris, Evelyne Viegas. This has been so fantastic. Thank you so much for coming on the show to share your insights on AFMR today.

[MUSIC PLAYS]

TORRES: It was a pleasure.

IDRIS: Thank you so much.

VIEGAS: Pleasure.
-
WWW.MICROSOFT.COM
Research Focus: Week of March 24, 2025

In this issue: We examine a new conversation segmentation method that delivers more coherent and personalized agent conversations, and we review efforts to improve MLLMs' understanding of geologic maps. Check out the latest research and other updates.

NEW RESEARCH

Researchers from Microsoft and Tsinghua University propose a new method to help conversational AI agents deliver more coherent and personalized responses during complex long-term dialogue.

Large language models (LLMs) are widely used to enable more complicated discussions across a broader range of topics than traditional dialogue systems. However, managing excessively long context that contains irrelevant information is a major challenge. Existing solutions typically perform retrieval-augmented response generation by constructing memory banks from conversation history at either the turn level or session level, or through summarization.

The proposed new approach, SeCom, constructs the memory bank at the segment level by introducing a conversation Segmentation model that partitions long-term conversations into topically coherent segments, while applying Compression-based denoising on memory units to enhance memory retrieval. Experimental results show that SeCom exhibits a significant performance advantage over baselines on the long-term conversation benchmarks LOCOMO and Long-MT-Bench+. Additionally, the proposed conversation segmentation method demonstrates superior performance on dialogue segmentation datasets such as DialSeg711, TIAGE, and SuperDialSeg.

Read the paper

NEW RESEARCH

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs

Microsoft researchers and external colleagues introduce GeoMap-Agent, an AI system specifically designed for geologic map understanding and analysis. In the lab, they measure its effectiveness using a new benchmark called GeoMap-Bench, a novel gauge for evaluating multimodal large language models (MLLMs) in geologic map understanding. Geologic maps provide critical insights into the structure and composition of Earth's surface and subsurface. They are indispensable in fields including disaster detection, resource exploration, and civil engineering.

Current MLLMs often fall short in understanding geologic maps, largely due to the challenging nature of cartographic generalization, which involves handling high-resolution maps, managing multiple associated components, and requiring domain-specific knowledge.

This paper presents results of experiments in which GeoMap-Agent achieves an overall score of 0.811 on GeoMap-Bench, significantly outperforming the 0.369 score of GPT-4o. The researchers intend to enable advanced AI applications in geology, powering more efficient and accurate geological investigations.

Read the paper

NEW RESEARCH

The future of the industrial AI edge is cellular

Reliable, high-bandwidth wireless connectivity and local processing at the edge are crucial enablers for emerging industrial AI applications. This work proposes that cellular networking is the ideal connectivity solution for these applications, due to its virtualization and support for open APIs. The researchers project the emergence of a converged industrial AI edge encompassing both computing and connectivity, in which application developers leverage the API to implement advanced functionalities.
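To make the segment-level memory idea behind SeCom (described above) concrete, here is a simplified sketch: partition a long conversation into topically coherent segments, store each segment as a memory unit, and retrieve the most relevant units for a new query. The topic-shift test and retrieval scoring are naive stand-ins for the paper's learned segmentation model and compression-based denoising.

```python
# Simplified sketch of segment-level conversational memory, in the spirit
# of SeCom. The word-overlap heuristics below are placeholders for the
# paper's trained segmentation model and denoising step.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    text: str

def topic_shift(prev: Turn, cur: Turn) -> bool:
    """Placeholder heuristic: a real system would use a trained model."""
    overlap = set(prev.text.lower().split()) & set(cur.text.lower().split())
    return len(overlap) < 2

def segment(history: list[Turn]) -> list[list[Turn]]:
    if not history:
        return []
    segments: list[list[Turn]] = [[history[0]]]
    for prev, cur in zip(history, history[1:]):
        if topic_shift(prev, cur):
            segments.append([cur])       # start a new topical segment
        else:
            segments[-1].append(cur)     # extend the current segment
    return segments

def retrieve(segments: list[list[Turn]], query: str, k: int = 2) -> list[str]:
    """Return the k segments most relevant to the query, as plain text."""
    def score(seg: list[Turn]) -> int:
        words = set(query.lower().split())
        return sum(len(words & set(t.text.lower().split())) for t in seg)
    ranked = sorted(segments, key=score, reverse=True)[:k]
    return [" ".join(t.text for t in seg) for seg in ranked]
```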
They present a case study showing evidence of the effectiveness of this approach, evaluated on an enterprise-grade 5G testbed.

Read the paper

NEW RESEARCH

RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement, and Restricted Lookarounds

A regular expression (regex or RE) is a sequence of characters used to match, search, and manipulate strings in text based on specific criteria. REs are used in programming languages for data validation, text parsing, and search operations.

This paper presents a tool and theory built on symbolic derivatives that does not use backtracking, while supporting both classical operators and complement, intersection, and restricted lookarounds. The researchers show that the main matching algorithm has input-linear complexity both in theory and experimentally. They apply a thorough evaluation on popular benchmarks, showing that RE# is over 71% faster than the next fastest regex engine in Rust on the baseline, and outperforms all state-of-the-art engines on extensions of the benchmarks, often by several orders of magnitude.

This work could potentially enable new applications in LLM prompt engineering frameworks, new applications in medical research and bioinformatics, and new opportunities in access and resource policy language design by web service providers.

Read the paper

NEW RESEARCH

Toward deep learning sequence-structure co-generation for protein design

Researchers review recent advances in deep generative models for protein design, with a focus on sequence-structure co-generation methods. They describe the key methodological and evaluation principles underlying these methods, highlight recent advances from the literature, and discuss opportunities for continued development of sequence-structure co-generation approaches.

Deep generative models that learn from the distribution of natural protein sequences and structures may enable the design of new proteins with valuable functions. While most of today's models focus on generating either sequences or structures, emerging co-generation methods promise more accurate and controllable protein design, ideally achieved by modeling both modalities simultaneously.

Read the paper

Microsoft research podcast

Collaborators: Silica in space with Richard Black and Dexter Greene

College freshman Dexter Greene and Microsoft research manager Richard Black discuss how technology that stores data in glass is supporting students as they expand earlier efforts to communicate what it means to be human to extraterrestrials.

Listen now

PODCAST

New Series: The AI Revolution in Medicine, Revisited

Two years ago, OpenAI's GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote The AI Revolution in Medicine: GPT-4 and Beyond, a book full of optimism for the potential of advanced AI models to transform the world of healthcare. In this special Microsoft Research Podcast series, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today, while examining what he and his coauthors got right and what they didn't foresee.

Watch the series

PODCAST

The future of generative AI for scientific discovery

Most of us think of generative AI in the context of text or image generation, but it's also a powerful tool for scientific discovery.
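To illustrate the derivative-based matching that RE# builds on, here is a toy Brzozowski-style derivative matcher with intersection and complement and no backtracking. It works over plain characters rather than the symbolic alphabets and restricted lookarounds the paper supports, so it demonstrates the general technique, not the RE# engine itself.

```python
# Toy Brzozowski-derivative regex matcher supporting concatenation, union,
# star, intersection, and complement. No backtracking: matching repeatedly
# differentiates the regex by each input character, then tests nullability.
# Requires Python 3.10+ for structural pattern matching.
from dataclasses import dataclass

class Re: pass

@dataclass(frozen=True)
class Empty(Re): pass            # matches no string at all
@dataclass(frozen=True)
class Eps(Re): pass              # matches only the empty string
@dataclass(frozen=True)
class Chr(Re): c: str            # matches one literal character
@dataclass(frozen=True)
class Cat(Re): a: Re; b: Re      # concatenation
@dataclass(frozen=True)
class Alt(Re): a: Re; b: Re      # union
@dataclass(frozen=True)
class Star(Re): a: Re            # Kleene star
@dataclass(frozen=True)
class And(Re): a: Re; b: Re      # intersection
@dataclass(frozen=True)
class Not(Re): a: Re             # complement

def nullable(r: Re) -> bool:
    """Does r accept the empty string?"""
    match r:
        case Empty() | Chr(_): return False
        case Eps() | Star(_): return True
        case Cat(a, b) | And(a, b): return nullable(a) and nullable(b)
        case Alt(a, b): return nullable(a) or nullable(b)
        case Not(a): return not nullable(a)

def deriv(r: Re, ch: str) -> Re:
    """Derivative of r with respect to character ch."""
    match r:
        case Empty() | Eps(): return Empty()
        case Chr(c): return Eps() if c == ch else Empty()
        case Cat(a, b):
            d = Cat(deriv(a, ch), b)
            return Alt(d, deriv(b, ch)) if nullable(a) else d
        case Alt(a, b): return Alt(deriv(a, ch), deriv(b, ch))
        case Star(a): return Cat(deriv(a, ch), Star(a))
        case And(a, b): return And(deriv(a, ch), deriv(b, ch))
        case Not(a): return Not(deriv(a, ch))

def matches(r: Re, s: str) -> bool:
    for ch in s:                 # one derivative per input character
        r = deriv(r, ch)
    return nullable(r)

# (a|b)* intersected with the complement of the empty string:
# all nonempty strings over {a, b}.
ab_star = Star(Alt(Chr("a"), Chr("b")))
pattern = And(ab_star, Not(Eps()))
assert matches(pattern, "abba") and not matches(pattern, "")
```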
In this episode of the Leading the Shift podcast, host Susan Etlinger speaks with Ade Famoti, a senior leader on the Microsoft Research Accelerator team. Ade discusses what he calls AI's "physics moment," and why he believes generative AI feels fundamentally different from past platform shifts. Ade shares examples of the work Microsoft Research is doing to uncover the opportunities of generative AI for materials discovery, to improve energy efficiency and carbon capture, and for drug discovery, to fight disease. Ade also highlights the role of culture in building trust, informing priorities, and driving adoption of emerging technologies.

VIDEO

Microsoft Research's Chris Bishop talks AI for Science (what it really means)

In this interview, the director of Microsoft Research AI for Science, Chris Bishop, discusses how AI is unlocking new scientific outcomes, from drug creation to materials generation to improved climate modeling.

Microsoft Research | In case you missed it

Tech Life: The doctor will see you now
BBC Sounds | March 4, 2025

An update on live trials in Ghana of 3D telemedicine technology, developed by Microsoft Research and external collaborators. Using portable equipment and holoportation technology, patients in remote locations can connect with a doctor many miles away. The BBC speaks to Spencer Fowers, the lead engineer on the project, as well as a patient and a doctor benefiting from the program.

Katja Hofmann: Why we're training AI on video games
TED Talk | October 2024

In a recent TED Talk, "Why we're training AI on video games," Microsoft researcher Katja Hofmann discusses the work the Game Intelligence team at Microsoft Research is doing to develop AI that can transform video games. Using AI trained on years of human gameplay data, the team built the World and Human Action Model, which can learn to think, play, and innovate alongside humans, enabling video game creators to build more robust games. Hofmann was also interviewed in a related article: "Microsoft's Muse AI Edits Video Games on the Fly."

View more news and awards
-
WWW.MICROSOFT.COM
The reality of generative AI in the clinic

Transcript

[MUSIC]

[BOOK PASSAGE]

PETER LEE: "The workload on healthcare workers in the United States has increased dramatically over the past 20 years, and in the worst way possible. Far too much of the practical, day-to-day work of healthcare has evolved into a crushing slog of filling out and handling paperwork."

[END OF BOOK PASSAGE]

[THEME MUSIC]

This is The AI Revolution in Medicine, Revisited. I'm your host, Peter Lee.

Shortly after OpenAI's GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?

In this series, we'll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.

[THEME MUSIC FADES]

What I read there at the top is a passage from Chapter 2 of the book, which captures part of what we're going to cover in this episode.

In our book, we predicted how AI would be leveraged in the clinic. Some of those predictions, I felt, were slam dunks, for example, AI being used to listen to doctor-patient conversations and write clinical notes. There were already early products coming out in the world, not using generative AI, that were doing just that. But other predictions we made were bolder, for instance, on the use of generative AI as a second set of eyes, to look over the shoulder of a doctor or a nurse or a patient and spot mistakes.

In this episode, I'm pleased to welcome Dr. Chris Longhurst and Dr. Sara Murray to talk about how clinicians in their respective systems are using AI, their reactions to it, and what's ahead. Chris is the chief clinical and innovation officer at UC San Diego Health, and he is also the executive director of the Joan & Irwin Jacobs Center for Health Innovation. He's in charge of UCSD Health's digital strategy, including the integration of new technologies from bedside to bench and reaching across UCSD Health, the School of Medicine, and the Jacobs School of Engineering. Chris is a board-certified pediatrician and clinical informaticist.

Sara is vice president and chief health AI officer at UC San Francisco Health. Sara is an internal medicine specialist and associate professor of clinical medicine. A doctor, a professor of medicine, and a strategic health system leader, she builds infrastructure and governance processes to ensure that UCSF's deployments of AI, including both AI procured from companies as well as AI-powered tools developed in-house, are trustworthy and ethical.

I've known Chris and Sara for years, and what's really impressed me about their work, and frankly, the work of all the guests we'll have on the show, is that they've all done something significant to advance the use of AI in healthcare.

[TRANSITION MUSIC]

Here's my conversation with Dr. Chris Longhurst:

LEE: Chris, thank you so much for joining us today.

CHRISTOPHER LONGHURST: Peter, it's a pleasure to be here. Really appreciate it.

LEE: We're going to get into, you know, what's happening in the clinic with AI. But I think we need to find out a little bit more about you first.
I introduced you as a person with a fancy title, chief clinical and innovation officer.

LONGHURST: Well, I have a little bit of a unicorn job, because my portfolio includes information technology, and I'm a recovering CIO after spending seven years in that role. It also includes quality, patient safety, case management, and the office of our chief medical officer.

And so I'm really trying to unify our mission to deliver highly reliable care with these new tools in a way that allows us to transform that care. One good analogy, I think, is it's about the game, right. Our job is not only to play the game and win the game using the existing tools, but also to change the game by leveraging these new tools and showing the rest of the country how that can be done.

LEE: And so as you're doing that, I can understand, of course, you're working at a very, kind of, senior executive level. But, you know, when I've visited you at UCSD Health, you're also working with clinicians, doctors, and nurses all the time. In a way, I viewed you as, sort of, connective tissue between these things. Is that accurate?

LONGHURST: Well, sure. And we've got, you know, several physicians who are part of the executive team who are also continuing to practice, and I think that's one of the ways in which doctors on the executive team can bring value, is being that connective tissue, being the ears on the ground and a little dose of reality.

LEE: [LAUGHS] Well, in fact, that reality is really what I want to delve into. But I just want to, before getting into that, talk a little bit about AI and your encounters with AI. And I think we have to do it in two stages, because there is AI and machine learning and data analytics prior to the rise of generative AI, and then, of course, after. And so tell us a little bit about, you know, what got you into health informatics and AI to begin with.

LONGHURST: Well, Peter, I know that you play video games, and I did too for many years. So I was an early John Carmack, id Software, Castle Wolfenstein, and Doom fan.

LEE: Love it.

LONGHURST: And that kept me occupied, because I lived out in the country on 50 acres of almond trees. And so it was computer gaming that first got me into computers.

But during medical school, I decided to pursue graduate work in this field called health informatics. And actually my master's thesis was using machine learning to help identify and distinguish innocent from pathologic heart murmurs in children. And I worked with Dr. Nancy Reed at UC Davis, who had programmed using Lisp, a really fancy tool to do exactly that.

And I will tell you that if I never see another parenthesis in Lisp code again, it'll be too soon. So I spent a solid year on that.

LEE: [LAUGHS] No, no, but you should wear that as a badge of honor. And I will guess that no other guest on this podcast series will have programmed in Lisp. So kudos to you.

LONGHURST: [LAUGHS] Well, it was a lot of work, and I learned a lot, but as you can imagine, it wasn't highly successful at the time. And fast forward, we've had lots of traditional machine learning kinds of activities using discrete data for predictive analytics to help predict flow in the hospital and even sepsis, which we can talk about. But as you said, the advent of generative AI in the fall of 2022 was a real game-changer.

LEE: Well, you have this interest in technology, and, in fact, I do know you as a fairly intensely geeky person. Really, I think maybe that's one reason why we've been attracted to each other. But you also got drawn into medicine.
Where did that come from?

LONGHURST: So my father was a practicing cardiologist and scientist. He was MD, PhD trained, and he really shared with me both a love of medicine but also science. I worked in his lab for three summers, and it was during college I decided I wanted to apply to medical school, because the human side of the science really drew me in.

But my father was the one who really identified it was important to cross-train. And that's why I decided to take time off to do that master's degree in health informatics and see if I could figure out how to take two disparate fields and really combine them into one.

I actually went down to Stanford to become a pediatrician, because they have a standalone children's hospital that's one of the best in the country. And I still practice pediatrics and see newborns, and it's a passion for me and part of my identity.

LEE: Well, I'm just endlessly fascinated and impressed with people who can span these two worlds in the way that you've done. So now, you know, 2022, in November, ChatGPT gets released to the world, and then, you know, a few months later, GPT-4, and then, of course, in the last two years, so much has happened. But what was your first encounter with what we now know of as generative AI?

LONGHURST: So I remember when ChatGPT was released, and, you know, some of my computer science-type of nerd friends, we were on text threads, you know, with a lot of mind-blowing emojis. But when it really hit medicine was when I got a call right after Thanksgiving in 2022 from my colleague.

He was playing with ChatGPT, and he said to me, "Chris, I've been feeding it patient questions and you wouldn't believe the responses." And he emailed some of the examples to me, and my mind was blown.

And so that's when I became one of the reviewers on the paper that was published in April of 2023 that showed not only could ChatGPT help answer questions from patients in a high-quality way, but it also expressed a tremendous amount of empathy. [1] And in fact, in our review, the clickbait headlines that came out of the paper were that the chatbot was both higher quality and more empathetic than doctors.

But that wasn't my takeaway at all. In fact, I'll ... And so, of course, that's how we became one of the first two sites in the country to roll out GPT inside our electronic health record to help draft answers to patient questions.

LEE: And, you know, one thing that's worth emphasizing in the story that you've just told is that there is no other major health system that has been confronting the reality of generative AI longer than UC San Diego Health, and I think largely because of your drive and early adoption.

And many listeners of this podcast will know what Epic is, but many will not. And so it's worth saying that Epic is a very important creator of an electronic health records system. And of course, UC San Diego Health uses Epic to store all of the clinical data for its patients.

And then Sumit is, of course, Sumit Rana, who is president at Epic.

LONGHURST: ... And in truth, you know, health systems that have thought through this, most of the answers are not actually generated by the doctors themselves. Many times, it's mid-level providers, protocol schedulers, other things, because the questions can be about anything from rescheduling an appointment to a medication refill. They don't all require doctors.

When they do, it's a more complicated question, and sometimes it can require a more complicated answer.
And in many cases, the clinicians will see a long complex question, and rather than typing an answer, theyll say, You know, this is complicated. Why dont you schedule a visit with me so we can talk about it more?LEE: Yeah, so now youve made a decision to contact people at Epic to what posit the idea that AI might be able to make responding to patient queries easier? Is that the story here?LONGHURST: Thats exactly right. And Sumit knew well that this is a challenge across many organizations. This is not unique to UC San Diego or Stanford. And theres been a lot of publications about it. Its even been in the lay press. So our hypothesis was that using GPT to help draft responses for doctors would save them time, make it easier, and potentially result in higher-quality, more empathetic answers to patients.LEE: And so now the thing that I was so impressed with is you actually did a carefully controlled study to try to understand how well does that work. So tell us a little bit first about the results of that study but then how you set it up.LONGHURST: Sure. Well, first, I want to acknowledge something you said at the beginning, which is one of my hats is the executive director of the Joan & Irwin Jacobs Center for Health Innovation. And were incredibly grateful to the Jacobs for their gift, which has allowed us to not only implement AI as part of hospital operations but also to have resources that other health systems may not have to be able to study outcomes. And so that really enabled what were going to talk about here.LEE: Right. By the way, one of the things I was personally so fascinated by is, of course, in our book, we speculated that things like after-visit notes to patients, responding to patient queries might be something that happens. And you, at the same time we were writing the book, were actually actively trying to make that real, which is just incredible and for me, and I think my coauthors, pretty affirming.LONGHURST: I think you guys were really prescient in your vision. The book is tremendous. I have a signed copy of Peters book, and I recommend it for all your listeners. [LAUGHTER]LEE: All right, so now what have you found about LONGHURST: Yeah.LEE: generative AI?LONGHURST: Yeah. Well, first to understand what we found, you have to understand how we built [the AI inbox response tool]. And so Stanford and UC San Diego really collaborated with Epic on designing what this would look like. So doctor gets that patient message. We feed some information to GPT thats not only the message but also some information about the patienttheir problems and medications and past medical and surgical history and that sort of thing.LEE: Is there a privacy concern that patients should be worried about when that happens?LONGHURST: Yeah, its a really good question. Theres not because were operating in partnership with Epic and Microsoft in a HIPAA-compliant cloud. And so that data is not only secure and private, but thats our top priority, is keeping it that way.LEE: Great.LONGHURST: So once we feed that into GPT, of course, we very quickly get a draft message that we could send to a patient. But we chose not to just send that message to a patient. So part of our AI governance is keeping a human in the loop. And theres two buttons that allow that clinician to review the message. One button says Edit draft message, and the other button says Start new blank message. So theres no button that says just Send now. And that really is illustrative of the approach that we took. 
The second thing, though, that we chose to do, and I think this is really interesting from a conversation standpoint, is that our AI governance, as they were looking at this, said, you know, AI is new and novel. It can be scary to patients. And if we want to maximize trust with our patients, we should maximize transparency. And so anytime a clinician uses the button that says "Edit draft response," we automatically append something in the message that says, "This message was automatically generated and reviewed and edited by your doctor." We felt strongly that was the right approach, and we've had a lot of positive feedback.

LEE: And so we'll want to get into, you know, how good these messages are, whether there are issues with bias or hallucination, but before doing that, you know, on this human in the loop, this was another theme in our book. And in fact, we recommended this. But there were other health systems around the country that were also later experimenting with similar ideas. And some have taken different approaches. In fact, as time has gone on, if anything, it seems like it's become a little bit less clear, this sort of labeling idea. Has your view on this evolved at all over the last two years?

LONGHURST: First of all, I'm glad that we did it. I think it was the right choice for the University of California, and in fact, the other four UC sites are all doing this as well. There is variability across the organizations that are using this functionality, and as you suggest, there's tens of thousands of physicians and hundreds of thousands, if not millions, of patients receiving these messages. And it's been highlighted a bit in the press.

I can tell you that, talking about our approach to transparency, one of our lawmakers in the state of California heard about this and actually proposed a bill that was signed into legislation by our governor, so that effective Jan. 1, any communication with patients that uses AI has to be disclosed to those patients. And so there is some thought that this is perhaps the right approach.

I don't think that it's a perfect approach, though. We're using AI in more and more ways, and it's not as if we're going to be able to disclose every single time that we're doing it to prioritize, you know, scheduling for the sickest patients or to help operationally on billing or something else. And so I think that there are other ways we need to figure it out. But we have called on national societies and others to try to create some guidelines around this, because we should be as transparent as we can with our patients.

LEE: Obviously, one of the issues, and we highlighted this a lot in our book, is the problem of hallucination. And surely this must be an issue when you're having AI draft these notes to patients. What have you found?

LONGHURST: We were worried about that when we rolled it out. And what we found is not only were there very few hallucinations; in some cases, our doctors were learning from the GPT. And I can give you an example. When a patient who had had a visit wrote their doctor afterwards and said, "Doc, I've been thinking a lot about what we discussed in quitting smoking marijuana," the GPT draft reply said something to the effect of, "That's great news. Here's a bunch of evidence on how smoking marijuana can harm your lungs and cause other effects. And by the way, since you live in the state of California, here's the marijuana quitters helpline." And the doctor who was sending this called me up to tell me about it.
And I said, well, is there a marijuana quitters helpline in the state of California? And he said, "I didn't know, so I Googled it. And yeah, there is." And so that's an example of the GPT actually having more information than, you know, a primary care clinician might have. And so there are cases clearly where the GPT can help us increase the quality. In addition, some of the feedback that we've been getting, both anecdotally and now through measurement, is that these draft responses do carry that tone of empathy that Dr. [John] Ayers [2] and I saw in the original manuscript. And we've heard from our clinicians that it's reminding them to be empathetic, because you don't always have that time when you're hammering out a quick, short message, right?

LEE: You know ...

LONGHURST: Exactly right, Peter. In fact, one of the findings in Dr. Ayers's manuscript that didn't get as much attention but I think is really important was the difference in length between the responses. So I was one of the putatively blinded reviewers, but as I was looking at the questions and answers, it was really obvious which ones were the chatbot and which ones were the doctors, because the chatbot was always, you know, three or four paragraphs and the doctor was three or four sentences, right. It's about time. And so we saw that in the results of our study.

LEE: All right, so now let's get into those results.

LONGHURST: OK. Well, first of all, my hypothesis was that this would help us save time, and I was wrong. It turns out a busy primary care clinician might get about 30 messages a day from patients, and each one of those messages might take about 30 seconds to type a quick response, a two-sentence response, a dot phrase, a macro. "Your labs are normal. No need to worry. I'll call you if anything comes up." After we implemented the AI tool, it still took about 30 seconds per message to respond. But we saw that the responses were two to three times longer on average, and they carried a more empathetic tone. [3] And our physicians told us it decreased cognitive burden, which is not surprising, because anyone who has written knows that it's much easier to edit somebody else's copy than it is to face a blank screen, right. That's why I like to be senior author, not lead author.

And so the tool actually helped quite a bit, but it didn't help in the ways that we had expected, necessarily. There are some other sites that have now found a little bit of time savings, but it's really nominal overall. The Stanford study that was done at the same time, and we actually had some shared coauthors, measured physician burnout using a validated survey, and they saw a decrease in measured physician burnout. And so there are clear advantages to this, and we're still learning more.

In fact, we've now rolled this out not only to all of our physicians, but to all of our nurses who help answer those messages in many different clinics. And one of the things that we're finding, and Dr. CT Lin at the University of Colorado recently published on this, is that this tool might actually help those mid-level providers even more, because it's really good at protocolized responses. I mentioned at the beginning, some of the questions that come to the physicians may be more the edge cases that require less protocolized kinds of answers. And so as we get into academic subspecialties like gynecologic oncology, the GPT might not be dishing up a draft message that's quite as useful.
But if you're a nurse in obstetrics and you're getting very routine pregnancy questions, it could save a ton of time. And so we've rolled this out broadly.

I want to acknowledge the partnership with Seth Hain and the team at Epic, who've just been fantastic. And we're finding all sorts of new ways to integrate the GPT tools into our electronic health record as well.

LEE: Yeah. Certainly the doctors and nurses that I've encountered that have access to this feature, they just don't want to give it up. But it's so interesting that it actually doesn't really save time. Is that a problem? Because, of course, you know, there seems to be a workforce shortage in healthcare, a need to lower costs and have greater efficiencies. You know, how do you think about that?

LONGHURST: Great question. There are so many opportunities, as you've kind of mentioned. I mean, healthcare is full of waste and inefficiency, and I am super bullish on how these generative AI tools are going to help us reduce some of that inefficiency.

So everything from revenue cycle to our call centers to operations efficiency, I think, can be positively impacted, and those things make more resources available for clinicians and others. When we think about, you know, saving clinicians time, I don't think it's necessarily, sort of, the communicating with patients where you want to save that time, actually. I think what we want to do is offload some of those administrative tasks that, you know, take a lot of time for our physicians.

So we've measured pajama time in our doctors, and on average, a busy primary care clinician can spend one to two hours after clinic doing things. But only about 15 minutes is answering messages from patients. Actually, the bulk of the time after hours is documenting the notes that are required from those visits, right. And those notes are used for a number of different purposes, not only communicating to the next doctor who sees the patient, but also for billing purposes and compliance purposes and medical legal purposes. So another really exciting area is AI scribes.

LEE: Yeah. And so, you know, we'll get into scribes and actually other possibilities. I wonder, though, about this empathy issue. Because as computer scientists, we know that you can fall into traps if you anthropomorphize these AI systems or any machine. So in this study, how was that measured, and how real do you think that is?

LONGHURST: So in the study, you'll see anecdotal or qualitative evidence about empathy. We have a follow-up study that will be published soon where we've actually measured empathy using some more quantitative tools, and there is no doubt that the chatbot-generated drafts are coming through with more empathy. And we've heard this from a number of our doctors, so it's not surprising. Here's one of the more surprising things, though. I published a paper last year with Dr. Sally Baxter, one of our ophthalmologists, and she actually looked at messages with a negative tone. It turns out, not surprisingly, healthcare can be frustrating. And stressed patients can send some pretty nasty messages to their care teams. [LAUGHTER] And you can imagine being a busy, ...

LEE: I've done it. [LAUGHS]

LONGHURST: ... tired, exhausted clinician, and receiving a bit of a nastygram from one of your patients can be pretty frustrating. And the GPT is actually really helpful in those instances in helping draft a pretty empathetic response, when I think the human instinct would be a pretty nasty one.
[LAUGHTER] I should probably use it in my email, Peter.

LEE: And is the patient experience, the actual lived experience of patients when they receive these notes, are you absolutely convinced and certain that they are also benefiting from this empathetic tone?

LONGHURST: I am. In fact, in our paper, we also found that the messages going to patients that had been drafted with the AI tool were two to three times longer than the messages going to patients that weren't using the drafts. And so it's clear there's more content going out, and that content is either contributing to a greater sense of empathy and relationship among the patients as well as the clinicians, and/or, in some cases, that content may be educating the patients or even reducing the need for follow-up visits.

LEE: Yeah, so now I think an important thing to share with the audience here is, you know, healthcare, of course, is a very highly regulated industry, for good reasons. There are issues of safety and privacy that have to be guarded very, very carefully and thoroughly. And for that reason, clinical studies oftentimes have very carefully developed controls and randomization setups. And so to what extent was that done in this case? Because here, it's not like you're testing a new drug. It's something that's a little fuzzier, isn't it?

LONGHURST: Yeah, that's right, Peter. And credit to the lead author, Dr. Ming Tai-Seale, we actually did randomize. And so that's unusual in these types of studies. We actually got IRB [institutional review board] exemption to do this as a randomized QI study. And it was a crossover study, because all the doctors wanted the functionality. So what we tested was the early adopters versus the late adopters. And we compared, at the same time, the early adopters to those who weren't using the functionality, and then later the late adopters to the folks that weren't using the functionality.

LEE: And in that type of study, you might also, depending on how the randomization is set up, have to have doctors some days using it and some days not having access. Did that also happen?

LONGHURST: We did, but it wasn't on a day-to-day basis. It was more a month-to-month basis.

LEE: Uh-huh. And what kind of conversation do you have with a doctor that might be attached to a technology and then be told, for the next month you don't get to use it?

LONGHURST: [LAUGHS] The good news is, because of a doctor's medical training, they all understood the need for it. And the conversation was sort of, hey, we're going to need you to stop using that for a month so that we can compare it, but we'll give it back to you afterwards.

LEE: [LAUGHS] OK, great. All right. So now, we made some other predictions. So we talked about, you know, responding to patients. You briefly mentioned clinical note-taking. We also made guesses about other types of paperwork, you know, filling out prior authorization requests or referral letters, maybe for a doctor to refer to a specialist. We even made some guesses about a second set of eyes on medications, on various treatment options, diagnoses. Which of these things have happened and which haven't happened, at least in your clinical experience?

LONGHURST: Your guesses were spot on. And I would say almost all of them have already happened and are happening today at UC San Diego and many other health systems. We have a HIPAA-compliant GPT instance that can be used for things like generating patient letters, generating referral letters, even generating patient education with patient-friendly language.
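Backing up to the crossover design Longhurst describes above: a small sketch of how such a comparison might be analyzed, with early- and late-adopter arms each serving as the other's contemporaneous control during their off-tool months. The column names and pandas workflow are illustrative assumptions, not the published study's analysis.

```python
# Hypothetical analysis sketch for a crossover QI study. Assumes a table
# of per-clinician, per-month outcomes with invented column names.
import pandas as pd

df = pd.read_csv("inbox_outcomes.csv")  # clinician_id, arm, month, reply_len

# Early adopters use the tool in months 1-2; late adopters in months 3-4.
df["on_tool"] = ((df["arm"] == "early") & (df["month"] <= 2)) | (
    (df["arm"] == "late") & (df["month"] >= 3)
)

# Within each calendar month, compare on-tool vs off-tool clinicians.
summary = (
    df.groupby(["month", "on_tool"])["reply_len"]
      .mean()
      .unstack()
      .rename(columns={True: "with_draft", False: "without_draft"})
)
print(summary)
```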
And that's a common use case. The second set of eyes on medications is something that we're exploring but have not yet rolled out. One of the areas I'm really excited about is reporting. So Johns Hopkins did a study a couple of years ago that showed an average academic medical center our size spends about $5 million annually (opens in new tab) just reporting on quality measures that are regulatory requirements. And that's about accurate for us. We published a paper just last fall showing that large language models could help to pre-populate quality data (opens in new tab) for things like sepsis reporting in a really effective way. It was like 91% accurate. And so that's a huge time savings and efficiency opportunity. Again, it allows us to redeploy those quality staff. We're now looking at things like how do we use large language models to review charts for peer review, to help ensure ongoing, you know, accuracy and mitigate risk. I'm really passionate about the whole space of using AI to improve quality and patient safety in particular. Your readers may be familiar with the famous report in 1999, To Err is Human (opens in new tab), that suggests a hundred thousand Americans die on an annual basis from medical errors. And unfortunately the data shows we really haven't made great progress in 25 years, but these new tools give us the opportunity to impact that in a really meaningful way. This is a turning point in healthcare.
LEE: Yeah, medication errors, actually all manner of medical errors, I think have been just such a frustrating problem. And, you know, I think this gives us some new hope. Well, let's look ahead a little bit. And just to be a little bit provocative, you know, one question that I get asked a lot by both patients and clinicians is, you know, will AI replace doctors sometime in the future? What are your thoughts?
LONGHURST: So the pat response is AI won't replace doctors, but AI will replace doctors who don't use AI. And the implication there, of course, is that a doctor using AI will end up being a more effective practitioner than a doctor who doesn't. And I think that's absolutely true. From a medical legal standpoint, what is standard of care today and what is standard of care five or 10 years from now will be different. And I think there will be a point where, for doctors who aren't using AI regularly, it would almost be unconscionable.
LEE: Yeah, I think there are already some areas where we've seen this happen. My favorite example is with the technology of ultrasound, where if you're a gynecologist or in some part of internal medicine, there are some diagnostic procedures where it would really be malpractice not to use ultrasound. Whereas in the late 1950s, the safety and also the doctor training to read ultrasound images were all called into question. And so let's look ahead two years from now, five years from now, 10 years from now. And on those three time frames, you know, what do you think, based on the practice of medicine today, what doctors and nurses are doing in clinic every day today, what do you think the biggest differences will be two years from now, five years from now, and 10 years from now?
LONGHURST: Great question, Peter. So first of all, 10 years from now, I think that patients will still be coming to clinic. Doctors will still be seeing them. Hopefully we'll have more house calls and care occurring outside the clinic with remote monitoring and things like that. But the most important part of healthcare is the humanism.
And so what I'm really excited about is AI helping to restore humanism in medical care. Because we've lost some of it over the last 20, 30 years as healthcare has become more corporate. So in the next two to five years, some things I expect to see are AI baked into more workflows. So AI scribes are going to become incredibly commonplace. I also think that there are huge opportunities to use those scribes to help reduce errors in diagnosis. So five or seven years from now, I think that when you're speaking to your physician about your symptoms and other things, the scribe is going to be developing a differential diagnosis and helping recommend not only the right follow-up tests or imaging but even the physical exam findings that the doctor might want to look for in particular to help make a diagnosis. Dirty secret in healthcare, Peter, is that 50% of doctors are below average. It's just math. And I think that the AI can help raise all of our doctors. So it's like Lake Wobegon: they're all above average. It has important implications for the workforce, as you were saying. Do we need all visits to be with primary care doctors? Will mid-level providers augmented by AI be able to do as great a job as many of our physicians do? I think these are unanswered questions today that need to be explored. And then there was a really stimulating editorial in The New York Times recently by Dr. Eric Topol (opens in new tab), and he was waxing philosophic about a recent study that showed AI could interpret X-rays with 90% accuracy while radiologists actually achieve about 72% accuracy (opens in new tab).
LEE: Right.
LONGHURST: The study looked at, how did the radiologists do with AI working together? And they got about 74% accuracy. So the doctors didn't believe the AI. They thought that they were in the right, and the inference that Eric took, which I agree with, is that rather than always looking for ways to combine the two, we should be thinking about those tasks that are amenable to automation, that could be offloaded with AI, so that our physicians are focused on the things that they're great at, which is not only the humanism in healthcare but a lot of those edge cases we talked about. So let's take mammogram screening as an example, or chest X-ray screening. There's going to be a point in the next five years where all first reads are being done by AI, and then it's a subset of those that are positive that need to be reviewed by physicians. And that helps free up radiologists to do a lot of other things that we need them to do.
LEE: Wow, that is really just such a great vision for the future. And I call some of this "the flip," where even patient expectations on the use of technology flip from fear and uncertainty to, you know, "you would try to do this without the technology?" And I think you just really put a lot of color and detail on that. Well, Chris, thank you so much for this. On that groundbreaking paper from April of 2023, we'll put a link to it. It's a really great thing to read. And of course, you've published extensively since then. But I can't thank you enough for just all the great work that you're doing. It's really changing medicine.
[TRANSITION MUSIC]
LONGHURST: Peter, can't thank you enough for the opportunity to be here today and the partnership with Microsoft to make this all possible.
LEE: I always love talking to Chris because he really is a prime example of an important breed of doctor, a doctor who has clinical experience but is also a world-class tech geek.
[LAUGHS] You know, it's surprising to me, and pleasantly so, that the traditional gold standard of randomized trials that Chris has employed can be used to assess the viability of generative AI, not just for things like medical diagnosis, but even for seemingly mundane things like writing email notes to patients. The other surprise is that the use of AI, at least in the in-basket task, which involves doctors having to respond to emails from patients, doesn't seem to save much time for doctors, even though the AI is drafting those notes. Doctors seem to love the reduced cognitive burden, and patients seem to appreciate the greater detail and friendliness that AI provides, but it's not yet a big timesaver. And of course, the biggest surprise out of the conversation with Chris was his celebrated paper, back two years ago now, on the idea that AI notes are perceived by patients as being more empathetic than notes written by human doctors. Wow.
Let's go ahead to my conversation with Dr. Sara Murray:
LEE: Sara, I'm thrilled you're here. Welcome.
SARA MURRAY: Thank you so much for having me.
LEE: You know, you have actually a lot of roles, and I know that's not so uncommon for people at the leading academic medical institutions. But, you know, I think for our audience, understanding what a chief health AI officer does, an associate professor of clinical medicine, what does it all mean? And so to start, when you talk to someone, say, like your parents, how do you describe your job? You know, how do you spend a typical day at work?
MURRAY: So first and foremost, I do always introduce myself as a physician because that's how I identify; that's how I trained. But in my current role, as the chief health AI officer, I'm really responsible for the vision and strategy for how we use trustworthy AI at scale to solve the biggest problems in our health system. And so I think there are a couple key important points about that. One is that we have to be very careful that everything we're doing in healthcare is trustworthy, meaning it's safe, it's ethical, it's doing what we hope it's doing, and it's not causing any unexpected harm. And then, you know, second, we really want to be doing things that affect, you know, the population at large of the patients we're taking care of. And so I think if you look historically at what's happened with AI in healthcare, you've seen little studies here and there, but nothing broadly affecting or transforming how we deliver care. And I think now that we're in this generative AI era, we have the tools to start thinking about how we're doing that. And so that's part of my role.
LEE: And I'm assuming a chief health AI officer is not a role that has been around for a long time. Is this fairly new at UCSF, or has this particular job title been around?
MURRAY: No, it's a relatively new role, actually. I came into this role about 18 months ago. I am the first chief health AI officer at UCSF, and I actually wrote the paper defining the role (opens in new tab) with ...
LEE: It's so interesting because I would say in the old days, you know, like five years ago, [LAUGHS] information technology in a hospital or health-system setting might be under the control and responsibility of a chief information officer, a CIO, or an IT, you know, chief. Or if it's maybe some sort of medical device technology integration, maybe it's some engineering type of leader, a chief technology officer.
But you're different, and in fact the role that I think I would credit you with, sort of, making the blueprint for seems different because it's actually doctors, practicing clinicians, who tend to inhabit these roles. Is there a reason why it's different that way? Like, a typical CIO is not a clinician.
MURRAY: Yeah, so I report to our CIO. And I think that there's a recognition that you need a clinician who really understands in practice how the tools can be deployed effectively. So it's not enough to just understand the technology, but you really have to understand the use cases. And I think when you're seeing physician chief health AI officers pop up around the country, it's because they're people who both understand the technology, not to the level you do obviously, but to some sufficient level, and then understand how to use these tools in clinical care and where they can drive value and what the risks are in clinical care and that type of thing. And so I think it'd be hard for it not to be some type of clinician in this role.
LEE: So I'm going to want to get into, you know, what's really happening in clinic, but before that, I've been asking our guests about their stages of AI grief, [LAUGHS] as I like to put it. And for most people, I've been talking about the experiences and encounters with machine learning and AI before ChatGPT and then afterwards. And so can you tell us a little bit about, you know, how did you get into AI in the first place, and what were your first encounters like?
MURRAY: Yeah. So I actually started out as a health services researcher, and this was before we had electronic health records [EHR], when we were still writing our notes on carbon copy in the elevators, and a lot of the data we used was actually from claims data. And that was the kind of rich data source at the time, but as you know, that was very limited. And so when we went live with our electronic health record, I realized there was this tremendous opportunity to really use rich clinical data for research. And so I initially started collaborating with folks down at Stanford to do machine learning to identify, you know, rare diseases like lupus in the electronic health record but quickly realized there was this real gap in the health system for using data in an actionable way. And so I built what was initially our advanced analytics team, grew into our data science team, and is now our health AI team as our ability to use the data in more sophisticated ways evolved. But if we think about, I guess, the pre-generative era and my first encounter with AI, or at least AI deployment in healthcare, you know, we initially, gosh, it was probably eight or nine years ago where we got access through our EHR vendor to some initial predictive tools, and these were relatively simple tools, but they were predicting things we care about in healthcare, like who's not going to make it to a clinic visit or how long patients are going to stay in the hospital. And so there's a lot of interest in, you know, predicting who might not make it to a clinic visit because we have big access issues with it being difficult for patients to get appointments, and the idea was that if you knew who wouldn't show, you could actually put someone else in that slot, and it's called overbooking.
And so when we looked at the initial model, it was striking to me how risky it was for vulnerable patient populations because immediately it was obvious that this model was likely to overbook people by race, by body weight, by things that were clearly protected patient characteristics. And so we did a lot of work initially with that model and a lot of education around how these tools could be biased. But the risk existed, and as we continued to look at more of these models, we found there were a lot of issues with trustworthiness. You know, there was a length-of-stay prediction model that my team was able to outperform with a pair of dice. And when I talked to other systems about not implementing this model, you know, folks said, but it must be useful a little bit. I was like, actually, you know, if the dice is better, it's not useful at all. [LAUGHS]
LEE: Right!
MURRAY: And so there was very little out there to frame this, but we quickly realized we had to start putting something together because there's a lot of hype and there's a lot of hope, but there's also a lot of risk here. And so that was my pre-generative moment.
LEE: You know, just before I get to your post-generative moment, you know, this story that you told, I sometimes refer to it as the healthcare IT world's version of irrational exuberance. Because I think one thing that I've learned, and I have to say I've been guilty personally as a techie, is that you look at some of the problems that the world of healthcare faces, and to a techie first encountering this, a lot of it looks like common sense. Of course we can build a model and predict these things. And you sort of don't understand some of the realities, as you've described, that make this complicated. And at the same time, from healthcare professionals, I sometimes think they look at all of this dazzling machine learning magic and also are kind of overly optimistic that it can solve so many problems. And it does create this danger, this irrational exuberance, that both sides kind of get into a reinforcing cycle where they're too quick to adopt technologies without thinking through the implications more carefully. I don't know if that resonates with you at all.
MURRAY: Yeah, totally. I think there's a real educational opportunity here because it's the "you don't know what you don't know" phenomenon. And so I do think there is a lot of work in healthcare to be done around, you know, people understanding the strengths and limitations of these tools because they're not magic, but they are perceived to be magic. And likewise, you know, I think the tech world often doesn't understand, you know, how healthcare is practiced and doesn't think through these risks in the same way we do, right. So I know that some of the vulnerable patients who might've been overbooked by that algorithm are the people who I most need to see in clinic and are the people who would be, you know, most slighted if they show up and the other patient shows up and now you have an overworked clinician. But I just think those are stages, you know, further down the pathway of utilization of these algorithms that people don't think of when they're initially developing them. And so one of the things we actually, you know, require in our AI oversight process is, when folks come to the table with a tool, they have to have a plan for how it's going to be used and operationalized.
And a lot of things die right there, honestly, because folks have built a cool tool, but they don't know who's going to use it in clinic, who the clinical champions are, how it'll be acted on, and you can't really evaluate whether these tools are trustworthy unless you've thought through all of that. Because you can imagine using the same algorithm in dramatically different ways, right. If you're using the no-show model to do targeted outreach and send people a free Lyft if they have transportation issues, that's going to have very different outcomes than overbooking folks.
LEE: It's so interesting, and I'm going to want to get back to this topic because I think it also speaks to the challenges of how do you integrate technologies into the daily workflow of a clinic. And I know this is something you think about a lot, but let's get back now to my original question about your AI moments. So now November 2022, ChatGPT happens, and what is your encounter with this new technology?
MURRAY: Yeah. So I used to be on MedTwitter, or I still am actually; it's just not as active anymore. But I would say, you know, MedTwitter went crazy after ChatGPT was initially released, and it was largely filled with catchy poems and people, you know, having fun,
LEE: [LAUGHS] Guilty.
MURRAY: Yeah, exactly. I still use poems. And people having fun trying to make it hallucinate. And so, you know, I was guilty of that as well, and one of the things I initially did was I asked it to do something crazy. So I asked it, draft me a letter for a prior authorization request for a drug called Apixaban, which is a blood thinner, to treat insomnia. And if you practice clinical medicine, you know that we would never use a blood thinner to treat insomnia. But it wrote me such a compelling letter that I actually went back to PubMed and made sure that I wasn't missing anything, like some unexpected side effect. I wasn't missing anything, and in fact it was a hallucination. And so at that moment I said, this is very promising technology, but this is still a party trick.
LEE: Yeah.
MURRAY: A few months later, I went and did the exact same prompt, and I got a lecture, instead of a draft, about how it would be unethical [LAUGHTER] and unsafe for me to draft such a request. And so, you know, I realized these tools were rapidly evolving, and the game was just going to be changing very quickly. I think the other thing that, you know, we've never seen before is the deployment of a technology at scale like we have with AI scribes. So this is a technology that was in its infancy, you know, two years ago, and is now largely a commodity deployed at scale across many health systems. A very short period of time. There have been no government incentives for people to do this. And so it clearly works well enough to be used in clinics. And I think these tools, you know, like AI scribes, have the opportunity to really undo a lot of the harm that the electronic health record implementations were perceived to have caused.
LEE: What is a scribe, first off?
MURRAY: Yeah, so AI scribes or, as we're now calling them, AI assistants or ambient assistants, are tools that essentially listen to your clinical interaction. We record them with the permission of a patient, with consent, and then they draft a clinical note, and they can also draft other things like the patient instructions.
And the idea is those drafts are very helpful to clinicians, and they have to review them and edit them, but it saves a lot of the furious typing that was previously happening during patient encounters.
LEE: We have been talking also to Chris Longhurst, your colleague at UC San Diego, and, you know, he mentions also the importance of having appropriate billing codes in those notes, which is yet another burden. Of course, when Carey, Zak, and I wrote our book, we predicted that AI scribes would get better and would find wider use because of the improvement in technology. Let me start by asking, do you yourself use an AI scribe?
MURRAY: So I do not use it yet because I'm an inpatient doctor, and we have deployed them to all ambulatory clinic doctors because that's where the technology is tried and true. So we're looking now to deploy it in the inpatient setting, but we're doing very initial testing.
LEE: And what are the reasons for not integrating it into the inpatient setting?
MURRAY: Well, there are two things, actually. Most inpatient documentation work, I would say, is follow-up documentation. And so you're often taking your prior notes and making small changes to them as you change the care from day to day. And so the tools, all of the companies are working on this, but right now they don't really incorporate your prior documentation or note when they draft your note for today. The second reason is that a lot of the decision-making that we do in the inpatient setting is asynchronous with the patient. So we'll often have a conversation in the morning with the patient in their room, and then I'll see some labs come back, and I'll make decisions and act on those labs and give the patient a call later and let them know what's going on. And so it's not a very succinct encounter, and so the technology is going to have to be a little bit different to work in that case, I think.
LEE: Right, and so these are distinct workflows from the ambulatory setting, where it is the classic: you're sitting with a patient in an exam room having an encounter.
MURRAY: Mm-hmm. Exactly. And all your decisions are made there. And I would say it's also different from nursing. We're also looking at deploying these tools to nurses. But a lot of their documentation is in something called flowsheets. They write in columns, you know, specific numbers, and so for them to use these tools, they'd have to start saying to the patient, "Sounds like your pain is a five. Your blood pressure is 120 over 60." And so those are different workflows they'd have to adopt to use the tools.
LEE: So you've been in the position of having to oversee the integration of AI scribes into UCSF Health. From your perspective, how are clinical staff actually viewing all of this?
MURRAY: So I would say clinical staff are largely very excited, receptive, and would like us to move faster. And in fact, I gave a town hall at UCSF, and all of the comments were, when is this coming for APPs [advanced practice providers]? When is this coming for allied health professionals? And so people want this across healthcare. It's not just doctors. But at the same time, you know, I think there's a technology adoption curve, and about half of our ambulatory clinicians have signed up and about a third of them are now using the tool. And so we are now doing outreach to figure out who is not using it, why aren't they using it, and what can we do to increase adoption. Or are there true barriers that we need to help folks overcome?
LEE: And when you do these things, of course, there are risks.
And as you were mentioning several times before, you were really concerned about hallucinations, about trustworthiness. So what were the steps that you took at UCSF to make these integrations happen?
MURRAY: Yeah, so we have an AI oversight process for all AI tools that come into our healthcare system, regardless of where they're coming from. So industry tools, internally developed tools, and research tools come through the same process. And we have a committee that is quite multidisciplinary. We have health system leaders, data scientists, bioethicists, researchers, health-equity experts. And through our process, we break down the AI lifecycle to a couple key places where these tools come for committee review. And so for every AI deployment, we expect people to establish performance metrics, fairness metrics, and we help them with figuring out what those things should be. We were also fortunate to receive a donation to build an AI monitoring platform, which we're working on now at UCSF. We call it our Impact Monitoring Platform for AI and Clinical Care, IMPACC, and AI scribes is actually our first use case. And so on that platform, we have a metric adjudication process where we've established, you know, what do we really care about for our health system executive leaders, what do we really care about for, you know, ensuring safety and trustworthiness, and then, you know, what are our patients going to want to know? Because we want to also be transparent with our patients about the use of these tools. And so we have processes for doing all this work. I think the challenge is actually how we scale these processes as more and more tools come through, because as you can imagine, it's a lot of conversation with a lot of stakeholders to figure out what and how we measure things right now.
LEE: And so there's so much to get into there, but I actually want to zoom in on the actual experience that doctors, nurses, and patients are having. And, you know, do you find that AI is meeting expectations? Is it making a difference, positive or negative, in people's lives? And what kinds of potential surprises are people encountering?
MURRAY: Mm-hmm. So we're collecting data in a couple of ways. First, we're surveying clinicians before and after their experience, and we are hearing from folks that they feel like their clinic work is more manageable, that they're more able to finish their documentation in a timely fashion. And then we're looking at actual metrics that we can extract from the EHR around how long people are spending doing things. And that data is largely aligning with what people are reporting, although the caveat is they're not saving enough time for us to have them see more patients. And so we've been very explicit at UCSF around making it clear that this is a tool to improve experience and not to improve efficiency. So we're not expecting people to see more patients as a result of using this tool. We want their clinic experience to be more meaningful. But then the other thing that's interesting that folks share is this tremendous relief of cognitive burden that folks feel when using this tool. So they may have been really efficient before. You know, they could get all their work done. They could type while they were talking to their patients. But they didn't actually, you know, get to look at their patients eye to eye and have the meaningful conversation that people went into medicine for.
And so we're hearing that, as well. And I think one of the things that's going to be important to us is actually measuring that moving forward. And that is matched by some of the feedback we're getting from patients. So we have quotes from patients where they've said, you know, my doctor is using this new tool and it's amazing. We're just having eye-to-eye conversations. Keep using it. So I think that's really important.
LEE: I've been pushing my own primary care doctor to get into this because I really depend on her. I love her dearly, but I'm always looking at her back as she's typing at a computer during our encounters. [LAUGHS] So, Sara, while we're talking about efficiency, and at least the early evidence doesn't show clear efficiency gains, it does actually beg the question about how or why health systems, many of which are financially, you know, not swimming in money, how or why they could adopt these things. And then we could also even imagine that there are even more important applications in the future that, you know, might require quite a bit of expense on developers as well as procurers of these things. You know, what's your point of view on, I guess we would call this the ROI question about AI?
MURRAY: Mm-hmm. I think this is a really challenging area because return on investment is very important to health systems that are trying to figure out how to spend a limited budget to improve care delivery. And so I think we've started to see a lot of small use cases that prove this technology could likely be beneficial. So there are use cases you may have heard of from Dr. Longhurst around drafting responses to patient messages, for example, where we've seen that this technology is helpful but doesn't get us all the way there. And that's because these technologies are actually quite expensive. And when you want to process large amounts of data, that's called tokens, and tokens cost money. And so I think one of the challenges when we envision the future of healthcare is that we're not really envisioning the expense of querying the entire medical record through a large language model. And we're going to have to build systems from a technology standpoint that can do that work in a more affordable way for us to be able to deliver really high-value use cases to clinicians that involve processing that. And so those are use cases like summarizing large parts of the patient's medical record, providing really meaningful clinical decision support that takes into account the patient's entire medical history. We haven't seen those types of use cases really come into being yet, largely because, you know, they're technically a bit more complex to do well and they're expensive, but they're completely feasible.
LEE: Yeah. You know, what you're saying really resonates so strongly from the tech industry's perspective. You know, one way that that problem manifests itself is that shareholders in big tech companies like ours more or less expect they're paying a high premium, a high multiple on the share price, because they're expecting our revenues to grow at very spectacular rates, double-digit rates. But that isn't obviously compatible with how healthcare works and how the healthcare business works. It doesn't grow, you know, at 30% year over year or anything like that. And so how do we make these things financially make sense for all comers? And it's sort of part and parcel also with the problem that sometimes efficiency gains in healthcare just translate into heavier caseloads for doctors, which isn't obviously the best outcome either.
And so in a way, I think it's another aspect of the work on impact and trustworthiness when we think about technology, at all, in healthcare.
MURRAY: Mm-hmm. I think that's right. And I think, you know, if you look at the difference between the AI scribe market and the rest of the summarization work that's largely happening within the electronic health record, in the AI scribe market you have a lot of independent companies, and they are all competing to be the best. And so because of that, we're seeing the technology get more efficient, cheaper. There's just a lot of investment in that space. Whereas with the electronic health record providers, they're also invested in really providing us with these tools, but it's not their main priority. They're delivering an entire electronic health record, and they also have to do it in a way that is affordable for, you know, all kinds of health systems, big health systems like UCSF and smaller settings, and so there's a real tension, I think, between delivering good-enough tools and truly transformative tools.
LEE: So I want to go back for a minute to this idea of cognitive burden that you described. When we talk about cognitive burden, it's often in the context of paperwork, right. There are maybe referral letters, after-visit notes, all of these things. How do you see these AI tools progressing with respect to that stream of different administrative tasks?
MURRAY: These tools are going to continue to be optimized to do more and more tasks for us. So with AI scribes, for example, you know, we're starting to look at whether they can draft the billing and coding information for the clinician, which is a tedious task with many clicks. These tools are poised to start pending orders based on the conversation. Again, a tedious task. All of this with clinician oversight. But I think as we move from them being AI scribes to AI assistants, it's going to be like a helper on the side for clinicians doing more and more work so they can really focus on the conversations, the shared decision-making, and the reason they went into medicine, really.
LEE: Yeah, let me, since you mentioned AI assistants, and that's such an interesting word, and it does connect with something that was apparent to us even, you know, as we were writing the book, which is this phenomenon that these AI systems might make mistakes. They might be guilty of making biased decisions or showing bias, and yet at the same time they seem incredibly effective at spotting other people's mistakes or other people's biased decisions. And so is there a point where these AI scribes do become AI assistants, where they're sort of looking over a doctor's shoulder and saying, "Hey, did you think about something else?" or "Hey, you know, maybe you're wrong about a certain diagnosis?"
MURRAY: Mm-hmm. I mean, absolutely. You're really just talking about combining technologies that already exist into a more streamlined clinical care experience, right. So you can, and I already do this when I'm on rounds: I'll kind of give the case to ChatGPT if it's a complex case, and I'll say, "Here's how I'm thinking about it; are there other things?" And it'll give me additional ideas that are sometimes useful and sometimes not, but often useful, and I'll integrate them into my conversation about the patient. I think all of these companies are thinking about that. You know, how do we integrate more clinical decision-making into the process? I think it's just, you know, healthcare is always a little bit behind the technology industry in general, to say the least.
And so it's kind of one step at a time, and all of these use cases need a lot of validation. There are regulatory issues, and so I think it's going to take time for us to get there.
LEE: Should I be impressed or concerned that the chief health AI officer at UC San Francisco Health is using ChatGPT off label?
MURRAY: [LAUGHS] Well, actually, every time I go on service, I encourage my residents to use it because I think we need to learn how to use these technologies. And, you know, when our medical education leaders start thinking about how do we teach students to use these, we don't know how to teach students to use them if we're not using them ourselves, right. And so I've learned a lot about what I perceive the strengths and limitations of the tools to be. And one of the things that we've learned, and you've written about this in your book, is that the prompting really matters. And so I had a resident ask it for a differential for abnormal liver tests. But in asking for that differential, there was a key important blood finding, something called eosinophilia. It's a type of blood cell that was mildly elevated, and they didn't know it. So they didn't give it in the prompt, and as a result, they didn't get the right differential, but it wasn't actually ChatGPT's fault. It just didn't get the right information because the trainee didn't recognize the right information. And so I think there's a lot to learn as we practice using these tools clinically. So I'm not ashamed of it. [LAUGHS]
LEE: [LAUGHS] Yeah. Well, in fact, I think my coauthor Carey Goldberg would find what you said really validating because in our book, she actually wrote this fictional account of what it might be like in the future. And this medical resident was also using an AI chatbot off label for pretty much the same kinds of purposes. And it's these kinds of things that, you know, it seems like might be coming next.
MURRAY: I mean, medicine, the practice of medicine, is a very imperfect science, and so, you know, when we have a difficult case, I might sit in the workroom with my colleagues and run it by people. And everyone has different thoughts and opinions on, you know, things I should check for. And so I think this is just one other resource where you can kind of run cases, obviously just reviewing all of the outputs yourself.
LEE: All right, so we're running short on time, and so I want to be a little provocative at the end here. And since we've gotten into AI assistants, two questions: First off, do we get to a point in the near future when it would be unthinkable, and maybe even bordering on malpractice, for a doctor not to use AI assistants in his or her daily work?
MURRAY: So it's possible that we see that in the future. We don't see it right now. And that's part of the reason we don't force this on people. So we see AI scribes or AI assistants as a tool we offer to people to improve their daily work, because we don't have sufficient data that the outcomes are markedly better from using these tools. I think there is a future where specific, you know, tools do actually improve outcomes. And then their use should be incentivized, either through, you know, CMS [Centers for Medicare & Medicaid Services] or other systems, to ensure that, you know, we're delivering standard of care.
But we're not yet at the place where any of these tools are standard of care, which means they should be used to practice good medicine.
LEE: And I think I would say that it's the work of people like you that would make it possible for these things to become standard of care. And so now, final provocation. It must have crossed your mind through all of this, the possibility that AI might replace doctors in some ways. What are your thoughts?
MURRAY: I think we're a long way from that happening, honestly. And I think even when I talk to my colleagues in radiology about this, where I perceive, as an internist, they might be the most replaceable, there are a million reasons why that's not the case. And so I think these tools are going to augment our work. They're going to help us streamline access for patients. They're going to maybe change what clinicians have to do, but I don't think they're going to fully replace doctors. There's just too much complexity and nuance in providing clinical care for these tools to do that work fully.
LEE: Yeah, I think you're right. And actually, you know, I think there's plenty of evidence because in the history of modern medicine, we actually haven't seen technology replace human doctors. Maybe you could say that we don't use barbers for bloodletting anymore because of technology. But I think, as you say, we're at least a long ways away.
MURRAY: Yeah.
LEE: Sara, this has been just a great conversation. And thank you for the great work that you're doing, you know, and for being so open with us about your personal use of AI, but also how you see the adoption of AI in our health system.
[TRANSITION MUSIC]
MURRAY: Thank you, it was really great talking with you.
LEE: I get so much out of talking to Sara. Every time, she manages to get me refocused on two things: the quality of the user experience and the importance of trust in any new technology that is brought into the clinic. I felt like there were several good takeaways from the conversation. One is that she really validated some predictions that Carey, Zak, and I made in our book, first and foremost that automated note-taking would be a highly desirable and practical reality. The other validation is Sara revealing that even she uses ChatGPT as a daily assistant in her clinical work, something that we guessed would happen in the book, but we weren't really sure about, since health systems oftentimes are very locked down when it comes to the use of technological tools. And of course, maybe the biggest thing about Sara's work is her role in defining a new type of job in healthcare, the health AI officer. This is something that Carey, Zak, and I didn't see coming at all but, in retrospect, makes all the sense in the world. Taken together, these two conversations really showed that we were on the right track in the book. AI has made its way into day-to-day life and work in the clinic, and both doctors and patients seem to be appreciating it.
[MUSIC TRANSITIONS TO THEME]
I'd like to extend another big thank you to Chris and Sara for joining me on the show and sharing their insights. And to our listeners, thank you for coming along for the ride. We have some really great conversations planned for the coming episodes. We'll delve into how patients are using generative AI for their own healthcare, the hype and reality of AI drug discovery, and more. We hope you'll continue to tune in. Until next time.
[MUSIC FADES]
-
WWW.MICROSOFT.COM
Metasurface: Unlocking the future of wireless sensing and communication
As the demand for faster, more reliable wireless communication continues to grow, traditional systems face limitations in efficiency and adaptability. To keep up with evolving needs, researchers are investigating new ways to manipulate electromagnetic waves to improve wireless performance.
One solution involves metasurfaces: engineered materials that can control wave propagation in unprecedented ways. By dynamically shaping and directing electromagnetic waves, they can overcome the constraints of conventional wireless systems.
Building on these capabilities, we are developing metasurfaces for a wide range of wireless application scenarios. Notably, we have developed metasurfaces for enhancing low earth orbit satellite communication, optimizing acoustic sensing, and realizing acoustic and mmWave imaging using commodity devices. More recently, we have designed metasurfaces to enable indoor Global Navigation Satellite System (GNSS) positioning, offer good mmWave coverage over a target environment, optimize heat distribution inside a microwave oven, and deliver directional sound to a user without a headphone.
All of these works, published at top networking conferences including MobiCom 2023 & 2024, MobiSys 2024 & 2025, and NSDI 2023, demonstrate the transformative potential of metasurfaces in advancing wireless communication and sensing technologies. This blog post explores some of these technologies in more detail.
While GNSS is widely used for outdoor positioning and navigation, its indoor performance is often hindered by signal blockage, reflection, and attenuation caused by physical obstacles. Additional technologies like Wi-Fi and Bluetooth Low Energy (BLE) are often employed to address these issues. However, these solutions require extra infrastructure, are costly, and are complicated to deploy. Accurate positioning also typically depends on specialized hardware and software on mobile devices.
Despite these challenges, GNSS signals hold promise for accurate indoor positioning. By leveraging the vast number of available satellites, GNSS-based solutions eliminate the need for the base station deployment and maintenance required by Wi-Fi and BLE systems. This approach also allows seamless integration between indoor and outdoor environments, supporting continuous positioning in scenarios like guiding smart vehicles through indoor and outdoor industrial environments.
To explore this potential, we conducted indoor measurements and found that GNSS satellite signals can penetrate windows at different angles and reflect or diffract from surfaces like floors and ceilings, resulting in uneven signals. Metasurfaces can control structured arrays of electromagnetic signals, allowing them to capture and redirect more GNSS signals. This allows signals to enter buildings in a path parallel to the ground, achieving broader coverage. Using this capability, we developed a GNSS positioning metasurface system (GPMS) based on passive metasurface technology.
One limitation of passive metasurfaces is their lack of programmability. To overcome this, and to enable them to effectively guide signals from different angles and scatter them in parallel, we designed a two-layer metasurface system. As shown in Figure 1, this design ensures that electromagnetic waves from different angles follow similar emission trajectories.
Figure 1: The GPMS two-layer metasurface structure
To improve positioning accuracy, we developed new algorithms that allow signals to pass through metasurfaces, using them as anchor points. Traditional GPS positioning requires signals from at least four satellites to decode location information. In the GPMS system, illustrated in Figure 2, each deployed metasurface functions as a virtual satellite. By deploying at least three metasurfaces indoors, we achieved high-precision positioning through a triangulation algorithm.
Figure 2: Diagram of the GPMS system. Passive metasurfaces guide GNSS signals indoors, while enhanced positioning algorithms provide precise indoor positioning on mobile devices.
To evaluate the system, we deployed the GPMS with six metasurfaces on a 10 × 50-meter office floor and in a 15 × 20-meter conference hall. The results show significant improvements in signal quality and availability. C/N0, a measure of signal-to-noise ratio, increased from 9.1 dB-Hz to 32.2 dB-Hz. The number of visible satellites increased from 3.6 to 21.5. Finally, the absolute positioning error decreased from 30.6 meters to 3.2 meters in the office and from 11.2 meters to 2.7 meters in the conference hall. These findings are promising and highlight the feasibility and advantages of GNSS-based metasurfaces for indoor positioning.
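To make the virtual-satellite idea concrete, here is a minimal sketch of the kind of least-squares position solve that three anchor points enable. The coordinates and measured distances are invented for illustration, and this is a generic Gauss-Newton solver, not the GPMS positioning algorithm itself:

```python
# Toy 2D trilateration with three metasurfaces acting as virtual anchors.
# Hypothetical positions and distances; not the GPMS implementation.
import numpy as np

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 15.0]])  # anchor positions (m)
true_pos = np.array([4.0, 6.0])                              # receiver ground truth
ranges = np.linalg.norm(anchors - true_pos, axis=1)          # "measured" distances

pos = np.array([5.0, 5.0])                                   # initial guess
for _ in range(10):                                          # Gauss-Newton iterations
    diff = pos - anchors
    dist = np.linalg.norm(diff, axis=1)
    J = diff / dist[:, None]             # Jacobian of each range w.r.t. position
    step, *_ = np.linalg.lstsq(J, dist - ranges, rcond=None)
    pos = pos - step
print(pos.round(3))                      # converges to ~[4.0, 6.0]
```

With noisy range measurements, the same loop returns the least-squares position estimate; real GNSS receivers additionally solve for a receiver clock-bias term.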
Millimeter waves enable the high-speed, low-latency performance needed for 5G and 6G communication systems. While commercial products like 60 GHz Wi-Fi routers and mobile devices are becoming popular, their limited coverage and susceptibility to signal obstruction restrict their widespread application.
Traditional solutions include deploying multiple millimeter-wave access points, such as routers or base stations, or placing reflective metal panels in room corners to reflect electromagnetic waves. However, these approaches are both costly and offer limited performance. Metasurfaces offer a promising alternative for improving millimeter-wave applications. Previous research has shown that programmable metasurfaces can enhance signal coverage in blind spots and significantly improve signal quality and efficiency.
To maximize the benefits of metasurfaces, we developed the AutoMS automation service framework, shown in Figure 3. This framework can optimize millimeter-wave coverage using low-cost passive metasurface design and strategic placement. Its three main components address the limitations of traditional solutions (a schematic search loop is sketched after this list):
Automated joint optimization: AutoMS determines the optimal network deployment configuration by analyzing phase settings, metasurface placement, and access point positioning. It also refines beam-forming configurations to enhance signal coverage. By iteratively identifying and optimizing the number, size, and placement of metasurfaces, AutoMS adjusts the metasurface phase settings and the access point configurations to achieve optimal signal coverage.
Fast 3D ray tracing simulator: Using hardware and software acceleration, our simulator efficiently calculates channel matrices resulting from metasurfaces with tens of thousands of elements. This simulator, capable of tracing 1.3 billion rays in just three minutes on an A100 GPU, significantly accelerates calculations for complex environments.
Low-cost passive metasurface design: We designed a high-reflectivity passive metasurface with near-2π phase control and broadband compatibility for the millimeter-wave frequency band. This metasurface is compatible with low-precision, cost-effective thermoforming processes, which enables users to create metasurfaces at minimal cost, significantly reducing deployment expenses.
Figure 3: The AutoMS framework generates optimized deployment plans for passive metasurfaces and access points based on environment scanning results.
As shown in Figure 4, users can capture the environment using existing 3D scanning apps on mobile devices, generate a 3D layout model, and upload it to the cloud. AutoMS then generates metasurface settings and placement guidelines. Users can print metasurface patterns using hot stamping and customize them without affecting functionality, as millimeter waves penetrate paint and paper.
Figure 4: The low-cost passive metasurface creation process
Evaluation using publicly available 3D layout datasets and real-world tests shows that AutoMS significantly improves millimeter-wave coverage across various scenarios. Compared to a single-router setup, AutoMS increased signal strength by 12.1 dB. Onsite tests further confirmed gains of 11 dB in target areas and over 20 dB in blind spots, with signal throughput increasing from 77 Mbps to 373 Mbps. AutoMS adapts to diverse environments, ensuring reliable and flexible deployment in real-world applications.
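The joint optimization itself is not spelled out in this post, but its outer loop can be pictured as a search that repeatedly queries the simulator and keeps phase settings that improve predicted coverage. Everything below is schematic: simulate_coverage() is a placeholder for the ray tracer, and the real AutoMS jointly optimizes placement and beam-forming rather than using this simple greedy random search:

```python
# Schematic AutoMS-style loop: evaluate a deployment with a simulator,
# keep perturbations that improve the predicted coverage score.
import random

def simulate_coverage(phases):
    """Placeholder for the fast 3D ray tracer; returns a coverage score."""
    return -sum((p - 0.6) ** 2 for p in phases)   # toy objective only

phases = [random.random() for _ in range(16)]     # per-element phase settings
best = simulate_coverage(phases)
for _ in range(500):
    candidate = list(phases)
    candidate[random.randrange(len(candidate))] = random.random()
    score = simulate_coverage(candidate)
    if score > best:                              # greedy accept
        phases, best = candidate, score
print(round(best, 4))                             # approaches 0 as phases near 0.6
```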
Microwave ovens often heat unevenly, creating cold spots in food. These can allow harmful bacteria and other pathogens to survive, increasing the risk of foodborne illnesses. Uneven heating can cause eggs to burst or create hot spots that can scald.
Uneven heating is due to the appliance's heating mechanism. Microwave ovens generate high-power radio frequency (RF) electromagnetic waves through dielectric heating. These waves create nodes with zero amplitude, which prevents heating, and antinodes, where heating occurs more rapidly.
To address this issue, we developed MicroSurf, a low-cost solution that improves heating by using passive metasurfaces to control electromagnetic energy inside the microwave oven. It uses the resonance effect between the metasurface and electromagnetic waves to modify the standing-wave distribution and achieve more uniform heating. This is shown in Figure 5.
Figure 5: MicroSurf's working principle: A. Uneven electric field distribution inside the microwave oven leads to uneven heating. B. Modeling the microwave oven. C. Designing and optimizing a metasurface that can function in a high-power environment to change the standing-wave distribution. D. Achieving uniform heating of different foods and selectively heating specific parts.
Tests across four different microwave oven brands demonstrate that MicroSurf effectively optimizes heating for various liquids and solids, uniformly heating water, milk, bread, and meat. It concentrates heat on specific areas and adapts to differently shaped foods. MicroSurf offers a promising solution for even heating in microwave ovens, demonstrating the potential of metasurface technology in everyday applications.
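The standing-wave physics MicroSurf manipulates is easy to quantify with textbook values: at the roughly 2.45 GHz frequency of a household magnetron, adjacent nodes sit half a wavelength apart, which is why cold spots appear only centimeters from hot spots. A quick check using free-space figures (the cavity geometry and the food itself shift these values in practice):

```python
# Node spacing of a 2.45 GHz standing wave, using free-space values.
c = 3.0e8                      # speed of light (m/s)
f = 2.45e9                     # typical magnetron frequency (Hz)
wavelength = c / f             # ~0.122 m
node_spacing = wavelength / 2  # distance between adjacent zero-amplitude points
print(f"wavelength = {wavelength * 100:.1f} cm")      # ~12.2 cm
print(f"node spacing = {node_spacing * 100:.1f} cm")  # ~6.1 cm
```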
This innovation paves the way for smarter, more efficient home appliances.
Advancing wireless innovation
Wireless sensing and communication technologies are evolving rapidly, driving innovation across a wide range of applications. We are continuing to push the boundaries of these technologies, particularly in metasurface development, while working to create practical solutions for a variety of use cases.
-
WWW.MICROSOFT.COM
Claimify: Extracting high-quality claims from language model outputs
A. Several emerging markets are grappling with severe economic instability.
   1. Several emerging markets are grappling with severe economic instability.
B. For instance, Argentina's rampant inflation, with monthly rates reaching as high as 25.5%, has made many goods unobtainable and plunged the value of the currency, causing severe economic hardship.
   1. Argentina has rampant inflation.
-
WWW.MICROSOFT.COM
Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs
Large language models (LLMs) have demonstrated remarkable capabilities in reasoning, language understanding, and even creative tasks. Yet a key challenge persists: how to efficiently integrate external knowledge.
Traditional methods such as fine-tuning and Retrieval-Augmented Generation (RAG) come with trade-offs: fine-tuning demands costly retraining, while RAG introduces separate retrieval modules that increase complexity and prevent seamless, end-to-end training. In-context learning, on the other hand, becomes increasingly inefficient as knowledge bases grow, facing quadratic computational scaling that hinders its ability to handle large repositories. A comparison of these approaches can be seen in Figure 1.
A new way to integrate knowledge
To address these challenges, we introduce the Knowledge Base-Augmented Language Model (KBLaM), a novel approach that integrates structured knowledge bases into pre-trained LLMs. Instead of relying on external retrieval modules or costly fine-tuning, KBLaM encodes knowledge into continuous key-value vector pairs, efficiently embedding them within the model's attention layers using a specialized rectangular attention mechanism, which implicitly performs retrieval in an integrated manner.
We use structured knowledge bases to represent the data, allowing us to consolidate knowledge and leverage structure. This design allows KBLaM to scale linearly with the size of the knowledge base while maintaining dynamic updates without retraining, making it far more efficient than existing methods.
Scalable, efficient, and future-ready
At its core, KBLaM is designed to integrate structured knowledge into LLMs, making them more efficient and scalable. It achieves this by converting external knowledge bases, collections of facts structured as triples consisting of an entity, a property, and a value, into a format that LLMs can process naturally. Such knowledge bases allow for consolidated, reliable sources of knowledge.
To create these knowledge bases, we first extract structured data in JSON format using small language models. We then apply Project Alexandria's probabilistic clustering. Once we have this structured knowledge base, KBLaM follows a three-step pipeline (a toy sketch of the encoding step follows below):
Knowledge Encoding: Each knowledge triple is mapped into a key-value vector pair using a pre-trained sentence encoder with lightweight linear adapters. The key vector, derived from the entity name and property, encodes index information, while the value vector captures the corresponding property value. This allows us to create continuous, learnable key-value representations.
Integration with LLMs: These key-value pairs, or knowledge tokens, are augmented into the model's attention layers using a specialized rectangular attention structure. Unlike traditional transformer models that process all tokens equally and come with quadratic cost, such as GPT-4, Phi, and Llama, rectangular attention enables the model to attend over knowledge with linear cost, as illustrated in Figure 2. Compared to standard attention mechanisms in generative language models, where each token attends to all preceding tokens, our approach introduces a more efficient structure: language tokens (such as those from a user's question) attend to all knowledge tokens, but knowledge tokens do not attend to one another, nor do they attend back to the language tokens. This selective attention pattern significantly reduces computational cost while preserving the model's ability to incorporate external knowledge effectively. This linear cost, which is crucial for the efficiency of KBLaM, effectively amounts to treating each fact independently, an assumption that holds for most facts. For example, the model's name, KBLaM, and the fact that the research was conducted at Microsoft Research are very weakly correlated. This rectangular attention is implemented as an extension of standard attention. During training, we keep the base model's weights frozen, ensuring that when no knowledge tokens are provided, the model functions exactly as it did originally.
Efficient Knowledge Retrieval: Through this rectangular attention, the model learns to dynamically retrieve relevant knowledge tokens during inference, eliminating the need for separate retrieval steps.
Figure 1: KBLaM allows for attention over the entire knowledge base instead of having an external retriever.
Figure 2: By having the user's question attend to the knowledge base, while treating facts in the knowledge base independently, KBLaM scales efficiently and linearly with the size of the knowledge base.
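As a rough illustration of the encoding step, the sketch below maps each triple to one key vector and one value vector through linear adapters. The toy encode() function stands in for the pre-trained sentence encoder, and all dimensions are invented for the example; the released code contains the real implementation:

```python
# Toy KBLaM-style knowledge encoding: (entity, property, value) -> (key, value)
# vectors via a stand-in "encoder" plus trainable linear adapters.
import torch

EMB, HEAD = 384, 64  # assumed encoder and attention-head dimensions

def encode(text: str) -> torch.Tensor:
    """Stand-in for a pre-trained sentence encoder (deterministic toy vectors)."""
    g = torch.Generator().manual_seed(sum(map(ord, text)))
    return torch.randn(EMB, generator=g)

key_adapter = torch.nn.Linear(EMB, HEAD)    # lightweight, trainable
value_adapter = torch.nn.Linear(EMB, HEAD)  # the base LLM's weights stay frozen

def knowledge_token(entity: str, prop: str, value: str):
    key = key_adapter(encode(f"{prop} of {entity}"))  # index part of the token
    val = value_adapter(encode(value))                # content part of the token
    return key, val

kb = [("KBLaM", "developer", "Microsoft Research"),
      ("KBLaM", "attention", "rectangular attention")]
tokens = [knowledge_token(*triple) for triple in kb]
print(len(tokens), tokens[0][0].shape)  # 2 knowledge tokens, keys of shape (64,)
```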
Unlike RAG, which appends retrieved document chunks to prompts, KBLaM allows for direct integration of knowledge into the model. Compared to in-context learning, KBLaM's rectangular attention maintains a linear memory footprint, making it vastly more scalable for large knowledge bases.
Its efficiency is a game-changer. While traditional in-context learning methods struggle with quadratic memory growth due to self-attention overhead, KBLaM's linear overhead means we can store much more knowledge in the context. In practice, this means KBLaM can store and process over 10,000 knowledge triples, the equivalent of approximately 200,000 text tokens, on a single GPU, a feat that would be computationally prohibitive with conventional in-context learning. The results across a wide range of triple counts can be seen in Figure 3. Remarkably, it achieves this while extending a base model that has a context length of only 8K tokens. Additionally, KBLaM enables dynamic updates: modifying a single knowledge triple does not require retraining or re-computation of the entire knowledge base.
Figure 3: KBLaM is much faster and uses much less memory than adding the equivalent number of triples in the context using conventional RAG-like approaches. In particular, the time to first token with 4,096 triples in the context is lower with KBLaM than it would be with 5 triples in the context.
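That linear scaling follows directly from the attention mask. Here is a minimal sketch of the rectangular pattern, with K knowledge tokens placed before L language tokens (letting each knowledge token attend to itself is our assumption for illustration; the paper's exact masking may differ):

```python
# Rectangular attention mask: True = attention allowed.
# Rows/columns 0..K-1 are knowledge tokens, K..K+L-1 are language tokens.
import torch

K, L = 4, 6
N = K + L
mask = torch.zeros(N, N, dtype=torch.bool)
mask[:K, :K] = torch.eye(K, dtype=torch.bool)   # knowledge attends only to itself
mask[K:, :K] = True                             # language attends to all knowledge
mask[K:, K:] = torch.tril(torch.ones(L, L, dtype=torch.bool))  # causal language part
print(mask.int())
# Each language token attends to K knowledge positions plus its own prefix,
# so cost grows linearly in the knowledge-base size instead of quadratically.
```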
Unlike RAG, which appends retrieved document chunks to prompts, KBLaM allows for direct integration of knowledge into the model. Compared to in-context learning, KBLaM's rectangular attention maintains a linear memory footprint, making it vastly more scalable for large knowledge bases.
Its efficiency is a game-changer. While traditional in-context learning methods struggle with quadratic memory growth due to self-attention overhead, KBLaM's linear overhead means we can store much more knowledge in the context. In practice, this means KBLaM can store and process over 10,000 knowledge triples, the equivalent of approximately 200,000 text tokens, on a single GPU, a feat that would be computationally prohibitive with conventional in-context learning. The results across a wide range of triple counts can be seen in Figure 3. Remarkably, it achieves this while extending a base model that has a context length of only 8K tokens. Additionally, KBLaM enables dynamic updates: modifying a single knowledge triple does not require retraining or re-computation of the entire knowledge base.
Figure 3: KBLaM is much faster and uses much less memory than adding the equivalent number of triples in the context using conventional RAG-like approaches. In particular, we have lower time to first token with 4,096 triples in the context with KBLaM than we would with 5 triples in the context.
Enhancing interpretability and reliability
Another major benefit of KBLaM is its interpretability. Unlike in-context learning, where knowledge injection is opaque, KBLaM's attention weights provide clear insights into how the model utilizes knowledge tokens. Experiments show that KBLaM assigns high attention scores to relevant knowledge triples, effectively mimicking a soft retrieval process.
Furthermore, KBLaM enhances model reliability by learning through its training examples when not to answer a question if the necessary information is missing from the knowledge base. In particular, with knowledge bases larger than approximately 200 triples, we found that the model refuses to answer questions it has no knowledge about more precisely than a model given the information as text in context. This feature helps reduce hallucinations, a common problem in LLMs that rely on internal knowledge alone, making responses more accurate and trustworthy.
The future of knowledge-augmented AI
KBLaM represents a major step forward in integrating structured knowledge into LLMs. By offering a scalable, efficient, and interpretable alternative to existing techniques, it paves the way for AI systems that can stay up to date and provide reliable, knowledge-driven responses. In fields where accuracy and trust are critical, such as medicine, finance, and scientific research, this approach has the potential to transform how language models interact with real-world information.
As AI systems increasingly rely on dynamic knowledge rather than static model parameters, we hope KBLaM will serve as a bridge between raw computational power and real-world understanding.
However, there is still work to be done before it can be deployed at scale. Our current model has been trained primarily on factual question-answer pairs, and further research is needed to expand its capabilities across more complex reasoning tasks and diverse knowledge domains.
To accelerate progress, we are releasing KBLaM's code and datasets (opens in new tab) to the research community, and we are planning integrations with the Hugging Face transformers library. By making these resources available, we hope to inspire further research and adoption of scalable, efficient knowledge augmentation for LLMs. The future of AI isn't just about generating text; it's about generating knowledge that is accurate, adaptable, and deeply integrated with the evolving world. KBLaM is a step in that direction.
-
Semantic Telemetry: Understanding how users interact with AI systems
AI tools are proving useful across a range of applications, from helping to drive the new era of business transformation to helping artists craft songs. But which applications are providing the most value to users? We'll dig into that question in a series of blog posts that introduce the Semantic Telemetry project at Microsoft Research. In this initial post, we will introduce a new data science approach that we will use to analyze topics and task complexity of Copilot in Bing usage.
Human-AI interactions can be iterative and complex, requiring a new data science approach to understand user behavior in order to build and support increasingly high-value use cases. Imagine the following chat:
Here we see that chats can be complex and span multiple topics, such as event planning, team building, and logistics. Generative AI has ushered in a two-fold paradigm shift. First, LLMs give us a new thing to measure: how people interact with AI systems. Second, they give us a new way to measure those interactions: the capability to understand and make inferences on these interactions, at scale. The Semantic Telemetry project has created new measures to classify human-AI interactions and understand user behavior, contributing to efforts in developing new approaches for measuring generative AI (opens in new tab) across various use cases.
Semantic Telemetry is a rethink of traditional telemetry (in which data is collected for understanding systems), designed for analyzing chat-based AI. We employ an innovative data science methodology that uses a large language model (LLM) to generate meaningful categorical labels, enabling us to gain insights into chat log data.
Figure 1: Prompting an LLM to classify a conversation based on an LLM-generated label taxonomy
This process begins with developing a set of classifications and definitions. We create these classifications by instructing an LLM to generate a short summary of the conversation, and then iteratively prompting the LLM to generate, update, and review classification labels on a batched set of summaries. This process is outlined in the paper TnT-LLM: Text Mining at Scale with Large Language Models. We then prompt an LLM with these generated classifiers to label new unstructured (and unlabeled) chat log data.
With this approach, we have analyzed how people interact with Copilot in Bing. In this blog, we examine insights into how people are using Copilot in Bing, including how that differs from traditional search engines. Note that all analyses were conducted on anonymous Copilot interactions containing no personal information.
Topics
To get a clear picture of how people are using Copilot in Bing, we need to first classify sessions into topical categories. To do this, we developed a topic classifier. We used the LLM classification approach described above to label the primary topic (domain) for the entire content of the chat. Although a single chat can cover multiple topics, for this analysis, we generated a single label for the primary topic of the conversation.
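As a rough sketch of that labeling step (the prompt wording, label set, and the `complete` helper below are hypothetical stand-ins; the actual taxonomy is generated by the TnT-LLM process described above):

```python
# Hypothetical label set; in practice the taxonomy is LLM-generated.
TOPIC_LABELS = ["technology", "entertainment", "travel and tourism", "history and culture"]

def classify_topic(conversation: str, complete) -> str:
    """Ask an LLM (via a caller-supplied `complete` function) to pick the
    single primary topic label for the entire conversation."""
    prompt = (
        "Summarize the conversation below in one sentence, then choose the ONE "
        f"label that best describes its primary topic: {', '.join(TOPIC_LABELS)}.\n\n"
        f"Conversation:\n{conversation}\n\n"
        "Answer with the label only."
    )
    label = complete(prompt).strip().lower()
    return label if label in TOPIC_LABELS else "other"
```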
We sampled five million anonymized Copilot in Bing chats during August and September 2024, and found that globally, 21% of all chats were about technology, with a high concentration of these chats in programming and scripting and computers and electronics.
Figure 2: Top Copilot in Bing topics based on anonymized data (August-September 2024)
Figure 3: Frequent topic summaries in Technology
Figure 4: Frequent topic summaries in Entertainment
Diving into the technology category, we find a lot of professional tasks in programming and scripting, where users request problem-specific assistance such as fixing a SQL query syntax error. In computers and electronics, we observe users getting help with tasks like adjusting screen brightness and troubleshooting internet connectivity issues. We can compare this with our second most common topic, entertainment, in which we see users seeking information related to personal activities like hiking and game nights.
We also note that top topics differ by platform. The figure below depicts topic popularity based on mobile and desktop usage. Mobile device users tend to use the chat for more personal tasks, such as help planting a garden or understanding medical symptoms, whereas desktop users conduct more professional tasks, like revising an email.
Figure 5: Top topics for desktop users and mobile users
Search versus Copilot
Beyond analyzing topics, we compared Copilot in Bing usage to that of traditional search. Chat extends beyond traditional online search by enabling users to summarize, generate, compare, and analyze information. Human-AI interactions are conversational and more complex than traditional search (Figure 6).
Figure 6: Bing Search query compared to Copilot in Bing conversation
A major differentiation between search and chat is the ability to ask more complex questions, but how can we measure this? We think of complexity as a scale ranging from simply asking chat to look up information to evaluating several ideas. We aim to understand the difficulty of a task if it were performed by a human without the assistance of AI. To achieve this, we developed the task complexity classifier, which assesses task difficulty using Anderson and Krathwohl's Taxonomy of Learning Objectives (opens in new tab). For our analysis, we grouped the learning objectives into two categories: low complexity and high complexity. Any task more complicated than information lookup is classified as high complexity. Note that this would be very challenging to classify using traditional data science techniques.
Comparing low versus high complexity tasks, most chat interactions were categorized as high complexity (78.9%), meaning that they were more complex than looking up information.
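As a minimal sketch of that grouping (the six cognitive processes are those of Anderson and Krathwohl's revised taxonomy; treating the "remember" level as information lookup is our reading of the rule stated above):

```python
# The six cognitive processes in Anderson and Krathwohl's revised taxonomy.
TAXONOMY_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

def complexity_bucket(level: str) -> str:
    """Collapse a taxonomy level into the two buckets used in this analysis:
    plain information lookup is low complexity; everything else is high."""
    return "low complexity" if level == "remember" else "high complexity"
```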
Programming and scripting, marketing and sales, and creative and professional writing are topics in which users engage in higher-complexity tasks (Figure 7), such as learning a skill, troubleshooting a problem, or writing an article.
Figure 7: Most and least complex topics based on percentage of high complexity tasks
Travel and tourism and history and culture scored lowest in complexity, with users looking up information like flight times and the latest news updates.
Demo of task complexity and topics on anonymous Copilot interactions
When should you use chat instead of search? A 2024 Microsoft Research study, The Use of Generative Search Engines for Knowledge Work and Complex Tasks, suggests that people are seeing value in technical, complex tasks such as web development and data analysis. Bing Search contained more queries with lower complexity focused on non-professional areas, like gaming and entertainment, travel and tourism, and fashion and beauty, while chat had a greater distribution of complex technical tasks (Figure 8).
Figure 8: Comparison of Bing Search and Copilot in Bing for anonymized sample data (May-June 2023)
Conclusion
LLMs have enabled a new era of high-quality human-AI interaction, and with it, the capability to analyze those same interactions with high fidelity, at scale, and in near real time. We are now able to obtain actionable insights from complex data that are not possible with traditional data science pattern-matching methods. LLM-generated classifications are pushing research in new directions that will ultimately improve user experience and satisfaction when using chat and other user-AI interaction tools.
This analysis indicates that Copilot in Bing is enabling users to do more complex work, specifically in areas such as technology. In our next post, we will explore how Copilot in Bing is supporting professional knowledge work and how we can use these measures as indicators for retention and engagement.
FOOTNOTE: This research was conducted at the time the feature Copilot in Bing was available as part of the Bing service; since October 2024, Copilot in Bing has been deprecated in favor of the standalone Microsoft Copilot service.
References:
Krathwohl, D. R. (2002). A Revision of Bloom's Taxonomy: An Overview. Theory Into Practice, 41(4), 212-218. https://doi.org/10.1207/s15430421tip4104_2 (opens in new tab)
-
The AI Revolution in Medicine, Revisited: An Introduction
Transcript
[MUSIC]
PETER LEE: This is The AI Revolution in Medicine, Revisited. I'm Peter Lee, president of Microsoft Research, and I'm pretty excited to introduce this series of conversations as part of the Microsoft Research Podcast.
About two years ago, with Carey Goldberg and Zak Kohane, we wrote a book, The AI Revolution in Medicine. This was a book that was intended to educate the world of healthcare and the world of medical research about this new thing that was emerging: this idea of generative AI. And we wrote the book in secret. In fact, the whole existence of what we now know of as OpenAI's GPT-4 AI model hadn't been publicly disclosed or revealed to the world. And so when we were working on this book, we had to make some guesses. What is this going to mean for healthcare? If you're a doctor or a nurse, in what ways will AI impact your work? If you're a patient, in what ways could AI change your experience as you try to navigate a complex healthcare system?
And so now it's been about two years. Two years hence, what did we get right? What did we get wrong? What things have come along much faster than we ever would have dreamed of? What did we miss? And what things have turned out to be much harder than we ever could have realized? And so this series of conversations is going to talk to people in the real world. We'll delve into exactly what's happening in the clinic, the patient experience, how people are thinking about safety and regulatory matters, and what this all means for discovery and advancements of medical science. And even then, we'll have guests that will allow us to look into the future: the AI advances that are happening now and what is going to happen next.
[MUSIC TRANSITIONS TO SERIES THEME]
[MUSIC FADES]
So now, let me just take a step back here to talk about this book project. And I'd like to just read the first couple of sentences in Chapter 1, and Chapter 1 is entitled "First Contact." And it starts with a quote. Quote, "I think that Zak and his mother deserve better than that," unquote. I was being scolded. And while I've been scolded plenty in my life, for the first time it wasn't a person scolding me; it was an artificial intelligence system. So that's how we started this book, and I wanted to read that because, at least for me, it takes me back to the kind of awe and wonderment in those early days when, in secret development, we had access from OpenAI to what we now know of as GPT-4.
And what was that quote about? Well, after getting access to GPT-4, I became very interested in what this might mean for healthcare. But I, not being a doctor, knew I needed help. So I had reached out to a good colleague of mine who is a doctor, a pediatric endocrinologist, and head of the bioinformatics department at Harvard Medical School, Dr. Isaac "Zak" Kohane. And I sought his help. And in our back-and-forth discussions, one of the things that Zak shared with me was an article that he wrote for a magazine where he talked about his use of machine learning in the care of his 90-year-old mother, his 90-year-old mother, who, like many 90-year-old people, was having some health issues.
And this article was very interesting. It really went into some detail about not only the machine learning technology that Zak had created in order to help manage his mother's health but also the kind of emotional burden of doing this and in what ways technology was helping Zak cope with that.
And so as I read that article, it touched me, because at that time, I was struggling in a very similar way with my own father, who was at that time 89 years old and was also suffering from some very significant health issues. And, like Zak, I was feeling some pangs of guilt because my father was living in Southern California; I was way up in the Pacific Northwest, you know, just feeling guilty not being there, present for him, through his struggles. And reading that article, a thought that occurred to me was, I wonder if in the future, AI could pretend to be me so that my father could always have a version of me to talk to. And I also had the thought in the other direction. Could AI someday capture enough of my father so that when and if he passes, I always have some memory of my father that I could interact with? A strange and bizarre thought, I admit, but a natural one, I think, for any human being that's encountering this amazing AI technology for the first time. And so I ran an experiment. I used GPT-4 to read Zak's article and then posed the question to GPT-4, "Based on this article, could you pretend to be Zak? I'll pretend to be Zak's mother, and let's test whether it's possible to have a mother-son conversation."
To my surprise, GPT-4's response at that time was to scold me, basically saying that this is wrong; that this has a lot of dangers and risks. You know, what if Zak's mother really needs the real Zak? And in those early days of this encounter with AI, that was incredibly startling. It just really forces you to reexamine yourself, and it kicked off our writing in the book as really not only being about a technology that could help lead to better diagnoses, help reduce medical errors, reduce the amount of paperwork and clerical burden that doctors go through, could help demystify and help patients navigate a healthcare system, but it could actually be a technology that forces people to reexamine their relationships and reexamine what it really means for people to take care of other people.
And since then, of course, I've come to learn that many people have had similar experiences in their first encounters with AI. And in fact, I've come to think of this as, somewhat tongue in cheek, the nine stages of AI grief. And they actually relate to what we'll try to address in this new series of conversations.
For me, the first time that Greg Brockman and Sam Altman presented what we now know of as OpenAI's GPT-4 to me, they made some claims about what it could do. And my first reaction was one of skepticism, and it seemed that the claims that were being made just couldn't be true. Then that, kind of, passed into, I would say, a period of annoyance, because I started to see my colleagues here in Microsoft Research start to show some amazement about the technology. I actually was annoyed because I felt they were being duped by this technology. So that's the second phase. And then, the third phase was concern and maybe even a little bit of frustration, because it became clear that, as a company here at Microsoft, we were on the verge of making a big bet on this new technology. And that was concerning to me because of my fundamental skepticism. But then I got my hands on the technology myself. And that enters into a fourth stage, of amazement. You start to encounter things that just are fundamentally amazing. This leads to a period of intensity, because I immediately surmised that, wow, this could really change everything, and in very few areas other than healthcare would there be more important areas of change.
And that is stage five, a period of serious intensity where you're just losing sleep and working so hard to try to imagine what this all could mean. Running as many experiments as you can; trying to lean on as much real expertise as possible. You then lead from there into a period of what I call chagrin, because as amazing as the technology is, actually understanding how to harness it in real life is not easy. You finally get into this stage of what I would call enlightenment. [MUSIC] And I won't claim to be enlightened. But it is, sort of, a combination of acceptance that we are in a new world today, that things are happening for real, and that there's, sort of, no turning back. And at that point, I think we can really get down to work. And so as we think about really the ultimate purpose of this series of conversations that we're about to have, it's really to help people get to that stage of enlightenment, to really, kind of, roll up our sleeves, to sit down and think through all of the best knowledge and experience that we've gathered over the last two years, and chart the future of this AI revolution in medicine.
[MUSIC TRANSITIONS TO SERIES THEME]
Let's get going.
[MUSIC FADES]
-
Advancing biomedical discovery: Overcoming data challenges in precision medicine
Introduction
Modern biomedical research is driven by the promise of precision medicine: tailored treatments for individual patients through the integration of diverse, large-scale datasets. Yet, the journey from raw data to actionable insights is fraught with challenges. Our team of researchers at Microsoft Research in the Health Futures group, in collaboration with the Perelman School of Medicine at the University of Pennsylvania (opens in new tab), conducted an in-depth exploration of these challenges in a study published in Nature Scientific Reports. The goal of this research was to identify pain points in the biomedical data lifecycle and offer actionable recommendations to enable secure data sharing, improve interoperability, support robust analysis, and foster collaboration across the biomedical research community.
Study at a glance
A deep understanding of the biomedical discovery process is crucial for advancing modern precision medicine initiatives. To explore this, our study involved in-depth, semi-structured interviews with biomedical research professionals spanning various roles, including bench scientists, computational biologists, researchers, clinicians, and data curators. Participants provided detailed insights into their workflows, from data acquisition and curation to analysis and result dissemination. We used an inductive-deductive thematic analysis to identify key challenges occurring at each stage of the data lifecycle, from raw data collection to the communication of data-driven findings.
Some key challenges identified include:
Data procurement and validation: Researchers struggle to identify and secure the right datasets for their research questions, often battling inconsistent quality and manual data validation.
Computational hurdles: The integration of multiomic data requires navigating disparate computational environments and rapidly evolving toolsets, which can hinder reproducible analysis.
Data distribution and collaboration: The absence of a unified data workflow and secure sharing infrastructure often leads to bottlenecks when coordinating between stakeholders across university labs, pharmaceutical companies, clinical settings, and third-party vendors.
Main takeaways and recommendations
Establishing a unified biomedical data lifecycle
This study highlights the need for a unified process that spans all phases of the biomedical discovery process, from data gathering and curation to analysis and dissemination. Such a data "jobs-to-be-done" framework would streamline standardized quality checks, reduce manual errors such as metadata reformatting, and ensure that the flow of data across different research phases remains secure and consistent. This harmonization is essential to accelerate research and build more robust, reproducible models that propel precision medicine forward.
Empowering stakeholder collaboration and secure data sharing
Effective biomedical discovery requires collaboration across multiple disciplines and institutions. A key takeaway from our interviews was the critical importance of collaboration and trust among stakeholders. Secure, user-friendly platforms that enable real-time data sharing and open communication among clinical trial managers, clinicians, computational scientists, and regulators can bridge the gap between isolated research silos.
As a possible solution, implementing centralized cloud-based infrastructures and democratizing data access can dramatically reduce data handoff issues and accelerate scientific discovery.
Adopting actionable recommendations to address data pain points
Based on the insights from this study, the authors propose a list of actionable recommendations, such as:
Creating user-friendly platforms to transition from manual (bench-side) data collection to electronic systems.
Standardizing analysis workflows to facilitate reproducibility, including version control and the seamless integration of notebooks into larger workflows.
Leveraging emerging technologies, such as generative AI and transformer models, for automating data ingestion and processing of unstructured text.
If implemented, the recommendations from this study would help forge a reliable, scalable infrastructure for managing the complexity of biomedical data, ultimately advancing research and clinical outcomes.
At Microsoft Research, we believe in the power of interdisciplinarity and innovation. This study not only identifies the critical pain points that have slowed biomedical discovery but also illustrates a clear path toward improved data integrity, interoperability, and collaboration. By uniting diverse stakeholders around a common, secure, and scalable data research lifecycle, we edge closer to realizing individualized therapeutics for every patient.
We encourage our colleagues, partners, and the broader research community to review the full study and consider these insights as key steps toward a more integrated biomedical data research infrastructure. The future of precision medicine depends on our ability to break down data silos and create a research data lifecycle that is both robust and responsive to the challenges of big data.
Explore the full paper (opens in new tab) in Nature Scientific Reports to see how these recommendations were derived, and consider how they might integrate into your work. Let's reimagine biomedical discovery together, where every stakeholder contributes to a secure, interoperable, and innovative data ecosystem that transforms patient care.
We look forward to engaging with the community on these ideas as we continue to push the boundaries of biomedical discovery at Microsoft Research.
Access the full paper
-
Magma: A foundation model for multimodal AI agents across digital and physical worlds
Imagine an AI system capable of guiding a robot to manipulate physical objects as effortlessly as it navigates software menus. Such seamless integration of digital and physical tasks has long been the stuff of science fiction.
Today, Microsoft researchers are bringing that vision closer to reality with Magma (opens in new tab), a multimodal AI foundation model designed to process information and generate action proposals across both digital and physical environments. It is designed to enable AI agents to interpret user interfaces and suggest actions like button clicks, while also orchestrating robotic movements and interactions in the physical world. Built on the foundation model paradigm, Magma is pretrained on an expansive and diverse dataset, allowing it to generalize better across tasks and environments than smaller, task-specific models. As illustrated in Figure 1, Magma synthesizes visual and textual inputs to generate meaningful actions, whether executing a command in software or grabbing a tool in the physical world. This new model represents a significant step toward AI agents that can serve as versatile, general-purpose assistants.
Figure 1: Magma is one of the first foundation models capable of interpreting and grounding multimodal inputs within both digital and physical environments. Given a described goal, Magma can formulate plans and execute actions to achieve it. By effectively transferring knowledge from freely available visual and language data, Magma bridges verbal, spatial, and temporal intelligence to navigate complex tasks and settings.
Vision-Language-Action (VLA) models integrate visual perception, language comprehension, and action reasoning to enable AI systems to interpret images, process textual instructions, and propose actions. These models bridge the gap between multimodal understanding and real-world interaction. Typically pretrained on large VLA datasets, they acquire the ability to understand visual content, process language, and perceive and interact with the spatial world, allowing them to perform a wide range of tasks. However, due to the dramatic differences among digital and physical environments, separate VLA models are trained and used for different environments. As a result, these models struggle to generalize to new tasks and environments outside of their training data. Moreover, most of these models do not leverage pretrained vision-language (VL) models or diverse VL datasets, which hampers their understanding of VL relations and their generalizability.
Magma, to the best of our knowledge, is one of the first VLA foundation models that can adapt to new tasks in both digital and physical environments, which helps AI-powered assistants or robots understand their surroundings and suggest appropriate actions. For example, it could enable a home assistant robot to learn how to organize a new type of object it has never encountered or help a virtual assistant generate step-by-step user interface navigation instructions for an unfamiliar task.
Through Magma, we demonstrate the advantages of pretraining a single VLA model for AI agents across multiple environments while still achieving state-of-the-art results on user interface navigation and robotic manipulation tasks, outperforming previous models that are tailored to these specific domains. On VL tasks, Magma also compares favorably to popular VL models that are trained on much larger datasets.
Building a foundation model that spans such different modalities has required us to rethink how we train and supervise AI agents. Magma introduces a novel training paradigm centered on two key innovations: Set-of-Mark (SoM) and Trace-of-Mark (ToM) annotations. These techniques, developed by Microsoft Research, imbue the model with a structured understanding of tasks in both user interface navigation and robotic manipulation domains.
Set-of-Mark (SoM): SoM is an annotated set of key objects or interface elements that are relevant to achieving a given goal. For example, if the task is to navigate a web page, the SoM includes all the bounding boxes for clickable user interface elements. In a physical task like setting a table, the SoM could include the plate, the cup, and the position of each item on the table. By providing SoM, we give Magma a high-level hint of what needs attention (the essential elements of the task) without yet specifying the order or method.
Figure 2: Set-of-Mark (SoM) for action grounding. Set-of-Mark prompting enables effective action grounding in images for a UI screenshot (left), robot manipulation (middle), and human video (right) by having the model predict numeric marks for clickable buttons or robot arms in image space. These marks give Magma a high-level hint of what needs attention: the essential elements of the task.
Trace-of-Mark (ToM): In ToM, we extend the strategy of overlaying marks from static images to dynamic videos by incorporating tracing lines that follow object movements over time. While SoM highlights key objects or interface elements relevant to a task, ToM captures how these elements change or move throughout an interaction. For example, in a physical task like moving an object on a table, ToM might illustrate the motion of a hand placing the object and adjusting its position. By providing these temporal traces, ToM offers Magma a richer understanding of how actions unfold, complementing SoM's focus on what needs attention.
Figure 3: Trace-of-Mark (ToM) for action planning. Trace-of-Mark supervision for robot manipulation (left) and human action (right). It compels the model to comprehend temporal video dynamics and anticipate future states before acting, while using fewer tokens than next-frame prediction to capture longer temporal horizons and action-related dynamics without ambient distractions.
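To make the two kinds of supervision concrete, here is a small, hypothetical sketch of how SoM and ToM annotations might be represented; the field names and layout are our own illustration, not Magma's actual data format:

```python
from dataclasses import dataclass

@dataclass
class Mark:
    """One Set-of-Mark entry: a numeric mark tied to an image region,
    e.g. a clickable button's bounding box (x0, y0, x1, y1)."""
    mark_id: int
    box: tuple[float, float, float, float]
    label: str  # e.g. "Go button" or "robot gripper"

@dataclass
class Trace:
    """One Trace-of-Mark entry: the future (x, y) positions of a mark across
    video frames, supervising how the marked element moves as an action unfolds."""
    mark_id: int
    positions: list[tuple[float, float]]

# SoM for a UI screenshot: what needs attention.
som = [Mark(1, (10, 10, 90, 40), "Search box"), Mark(2, (100, 10, 140, 40), "Go button")]

# ToM for a manipulation clip: how the marked gripper should move over time.
tom = [Trace(mark_id=2, positions=[(120.0, 25.0), (122.5, 30.0), (125.0, 35.0)])]
```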
Performance and evaluation
Zero-shot agentic intelligence
Table 1: Zero-shot evaluation on agentic intelligence. We report the results for pretrained Magma without any domain-specific finetuning. In this experiment, Magma is the only model that can conduct the full task spectrum.
Figure 4: Zero-shot evaluation on Google Robots and Bridge with SimplerEnv. Magma shows strong zero-shot cross-domain robustness and demonstrates impressive results in cross-embodiment manipulation simulation tasks.
Efficient finetuning
Table 2: Efficient finetuning on Mind2Web for web UI navigation.
Figure 5: Few-shot finetuning on Widow-X robot (left) and LIBERO (right). Magma achieves a significantly higher average success rate in all task suites. Additionally, removing SoM and ToM during pretraining has a negative impact on model performance.
Table 3: Without task-specific data, Magma performs competitively and even outperforms some state-of-the-art approaches such as Video-Llama2 and ShareGPT4Video on most benchmarks, despite using far less video instruction tuning data.
Relation to broader research
Magma is one component of a much larger vision within Microsoft Research for the future of agentic AI systems. Across various teams and projects at Microsoft, we are collectively exploring how AI systems can detect, analyze, and respond in the world to amplify human capabilities.
Earlier this month, we announced AutoGen v0.4, a fully reimagined open-source library for building advanced agentic AI systems. While AutoGen focuses on the structure and management of AI agents, Magma enhances those agents by empowering them with a new level of capability. Developers can already use AutoGen to set up an AI assistant that leverages a conventional LLM for planning and dialogue. Now, with Magma, if developers want to build agents that execute both physical and user interface/browser tasks, that same assistant could call upon Magma to understand the environment, perform reasoning, and take a sequence of actions to complete the task.
The reasoning ability of Magma can be further developed by incorporating test-time search and reinforcement learning, as described in ExACT. ExACT shows an approach for teaching AI agents to explore more effectively, enabling them to intelligently navigate their environments, gather valuable information, evaluate options, and identify optimal decision-making and planning strategies.
At the application level, we are also exploring new user experiences (UX) powered by foundation models for the next generation of agentic AI systems. Data Formulator is a prime example. Announced late last year, Data Formulator is an AI-driven visualization tool developed by Microsoft Research that translates high-level analytical intents into rich visual representations by handling complex data transformations behind the scenes.
Looking ahead, the integration of reasoning, exploration, and action capabilities will pave the way for highly capable, robust agentic AI systems.
Magma is available on Azure AI Foundry Labs (opens in new tab) as well as on HuggingFace (opens in new tab) with an MIT license. Please refer to the Magma project page (opens in new tab) for more technical details. We invite you to test and explore these cutting-edge agentic model innovations from Microsoft Research.
-
Exploring the structural changes driving protein function with BioEmu-1
From forming muscle fibers to protecting us from disease, proteins play an essential role in almost all biological processes in humans and other life forms alike. There has been extraordinary progress in recent years toward better understanding protein structures using deep learning, enabling the accurate prediction of protein structures from their amino acid sequences. However, predicting a single protein structure from its amino acid sequence is like looking at a single frame of a movie: it offers only a snapshot of a highly flexible molecule. Biomolecular Emulator-1 (BioEmu-1) is a deep learning model that provides scientists with a glimpse into the rich world of different structures each protein can adopt, or structural ensembles, bringing us a step closer to understanding how proteins work. A deeper understanding of proteins enables us to design more effective drugs, as many medications work by influencing protein structures to boost their function or prevent them from causing harm.
One way to model different protein structures is through molecular dynamics (MD) simulations. These tools simulate how proteins move and deform over time and are widely used in academia and industry. However, in order to simulate functionally important changes in structure, MD simulations must be run for a long time. This is a computationally demanding task, and significant effort has been put into accelerating simulations, going as far as designing custom computer architectures (opens in new tab). Yet, even with these improvements, many proteins remain beyond what is currently possible to simulate and would require simulation times of years or even decades.
Enter BioEmu-1 (opens in new tab): a deep learning model that can generate thousands of protein structures per hour on a single graphics processing unit. Today, we are making BioEmu-1 open source (opens in new tab), following our preprint (opens in new tab) from last December, to empower protein scientists in studying structural ensembles with our model. It provides orders of magnitude greater computational efficiency compared to classical MD simulations, thereby opening the door to insights that have, until now, been out of reach.
We have enabled this by training BioEmu-1 on three types of datasets: (1) AlphaFold Database (AFDB) (opens in new tab) structures, (2) an extensive MD simulation dataset, and (3) an experimental protein folding stability dataset (opens in new tab). Training BioEmu-1 on the AFDB structures is like mapping distinct islands in a vast ocean of possible structures. When preparing this dataset, we clustered similar protein sequences so that BioEmu-1 can recognize that a protein sequence maps to multiple distinct structures. The MD simulation dataset helps BioEmu-1 predict physically plausible structural changes around these islands, mapping out the plethora of possible structures that a single protein can adopt. Finally, through fine-tuning on the protein folding stability dataset, BioEmu-1 learns to sample folded and unfolded structures with the right probabilities.
Figure 1: BioEmu-1 predicts diverse structures of the LapD protein, unseen during training.
We sampled structures independently and reordered the samples to create a movie connecting two experimentally known structures.
Combining these advances, BioEmu-1 successfully generalizes to unseen protein sequences and predicts multiple structures. In Figure 1, we show that BioEmu-1 can predict structures of the LapD protein (opens in new tab) from the Vibrio cholerae bacterium, which causes cholera. BioEmu-1 predicts structures of LapD when it is bound and unbound with c-di-GMP molecules, both of which are experimentally known but not in the training set. Furthermore, our model offers a view of intermediate structures, which have never been experimentally observed, providing viable hypotheses about how this protein functions. Insights into how proteins function pave the way for further advancements in areas like drug development.
Figure 2: BioEmu-1 reproduces the D. E. Shaw Research (DESRES) simulation of Protein G accurately at a fraction of the computational cost. On the top, we compare the distributions of structures obtained by extensive MD simulation (left) and independent sampling from BioEmu-1 (right). Three representative sample structures are shown at the bottom.
Moreover, BioEmu-1 reproduces MD equilibrium distributions accurately with a tiny fraction of the computational cost. In Figure 2, we compare 2D projections of the structural distribution of the D. E. Shaw Research (DESRES) simulation of Protein G (opens in new tab) and samples from BioEmu-1. BioEmu-1 reproduces the MD distribution accurately while requiring 10,000-100,000 times fewer GPU hours.
Figure 3: BioEmu-1 accurately predicts protein stability. On the left, we plot the experimentally measured free energy differences ΔG against those predicted by BioEmu-1. On the right, we show a protein in folded and unfolded structures.
Furthermore, BioEmu-1 accurately predicts protein stability, which we measure by computing the folding free energies, a way to quantify the ratio between the folded and unfolded states of a protein. Protein stability is an important factor when designing proteins, e.g., for therapeutic purposes. Figure 3 shows the folding free energies predicted by BioEmu-1, obtained by sampling protein structures and counting folded versus unfolded protein structures, compared against experimental folding free energy measurements. We see that even on sequences that BioEmu-1 has never seen during training, the predicted free energy values correlate well with experimental values.
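The estimator implied here is the standard population-based relation between a free energy difference and relative state counts; as a sketch (the paper's exact protocol may differ), with $N_{\text{folded}}$ and $N_{\text{unfolded}}$ the numbers of folded and unfolded structures among the samples, $k_B$ the Boltzmann constant, and $T$ the temperature:

$$\Delta G = G_{\text{folded}} - G_{\text{unfolded}} = -k_B T \,\ln\frac{N_{\text{folded}}}{N_{\text{unfolded}}}$$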
Professor Martin Steinegger (opens in new tab) of Seoul National University, who was not part of the study, says: "With highly accurate structure prediction, protein dynamics is the next frontier in discovery. BioEmu marks a significant step in this direction by enabling blazing-fast sampling of the free-energy landscape of proteins through generative deep learning."
We believe that BioEmu-1 is a first step toward generating the full ensemble of structures that a protein can take. In these early days, we are also aware of its limitations. With this open-source release, we hope scientists will start experimenting with BioEmu-1, helping us carve out its potential and shortcomings so we can improve it in the future. We are looking forward to hearing how it performs on the various proteins you care about.
Acknowledgements
BioEmu-1 is the result of a highly collaborative team effort at Microsoft Research AI for Science. The full list of authors: Sarah Lewis, Tim Hempel, José Jiménez-Luna, Michael Gastegger, Yu Xie, Andrew Y. K. Foong, Victor García Satorras, Osama Abdin, Bastiaan S. Veeling, Iryna Zaporozhets, Yaoyi Chen, Soojung Yang, Arne Schneuing, Jigyasa Nigam, Federico Barbero, Vincent Stimper, Andrew Campbell, Jason Yim, Marten Lienen, Yu Shi, Shuxin Zheng, Hannes Schulz, Usman Munir, Ryota Tomioka, Cecilia Clementi, Frank Noé
-
Ideas: Quantum computing redefined with Chetan Nayak
Transcript
[TEASER]
[MUSIC PLAYS UNDER DIALOGUE]
CHETAN NAYAK: People sometimes say, well, quantum computers are just going to be like classical computers but faster. And that's not the case. So I really want to emphasize the fact that quantum computers are an entirely different modality of computing. You know, there are certain problems which quantum computers are not just faster at than classical computers but quantum computers can solve and classical computers have no chance of solving.
[TEASER ENDS]
GRETCHEN HUIZINGA: You're listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I'm Gretchen Huizinga. In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward.
[MUSIC FADES]
My guest today is Dr. Chetan Nayak, a technical fellow of Quantum Hardware at Microsoft Quantum. Under Chetan's leadership, the Microsoft Quantum team has published a paper that demonstrates a fundamental operation for a scalable topological quantum computer. The team also announced the creation of the world's first topoconductor (more on that later) and the first QPU architecture with a topological core, called the Majorana 1. Chetan Nayak, I can't wait to find out what all of this is. Welcome to Ideas!
CHETAN NAYAK: Thank you. Thanks for having me. And I'm excited to tell you about this stuff.
HUIZINGA: Well, you have a huge list of accomplishments, accolades, and awards, little alliteration there. But I want to start by getting to know a bit more about you and what got you there. So specifically, what's your research origin story, as it were? What big idea inspired you to study the smallest parts of the universe?
NAYAK: It's a great question. I think if I really have to go back to the origin story, it starts when I was a kid, you know, probably a preteen. And, you know, I'd go to bookstores to... I know, I guess many of the people listening to this may not know what that is, [LAUGHTER] but there used to be these brick-and-mortar storefronts where they would sell books, physical books,
HUIZINGA: Right.
NAYAK: ...and I'd go to bookstores to, you know, to buy books to read, you know, fiction. But I would browse through them, and there'd be a nonfiction section. And often there'd be used books, you know, sometimes used textbooks or used popular science books. And I remember, even though they were bookstores, not libraries, I would spend a lot of time there leafing through books and got exposed to, accidentally exposed to, a lot of ideas that I wouldn't otherwise have been. You know, just, sort of, you know, I maybe went there, you know, looking to pick up the next Lord of the Rings book, and while I was there, you know, wandered into a book that was sort of explaining the theory of relativity to non-scientists. And I remember leafing through those books and actually reading about Einstein's discoveries, you know, most famously E = mc², but actually a lot of those books were explaining these thought experiments that Einstein did where he was thinking about, you know, if he were on a train that were traveling at the speed of light, what would light look like to him? [LAUGHTER] Would he catch up to it?
You know, and all these incredible thought experiments that he did to try to figure out, you know, to really play around with the basic laws of physics as they were currently understood, and by, you know, stretching and pulling them and taking them to extreme situations, you could either find the flaws in them or in some cases see what the next steps were. And that was, you know, really inspirational to me. I, you know, around the same time, also started leafing through various advanced math books and a little later picked up a book on calculus and started flipping through it, a used book with, like, you know, the cover falling apart and the pages starting to fall out. But there was a lot of, you know, accidental discovery of topics through wandering through bookstores, actually. I also, you know, went to this great magnet high school in New York City called Stuyvesant High School, where I was surrounded by people who were really interested in science and math and technology. So I think, you know, for me, that origin story really starts, you know, maybe even earlier, but at least in my preteen years when, you know, I went through a process of learning new things and trying to understand them in my own way. And the more you do that, eventually you find maybe you're understanding things in a little different way than anybody else ever did. And then pretty soon, you know, you're discovering things that no one's ever discovered before. So that's, sort of, how it started.
HUIZINGA: Yeah. Well, I want to drill in a little bit there because you've brought to mind a couple of images. One is from a Harry Potter movie, And the Half-Blood Prince, where he discovers the potions handbook, but it's all torn up, and they were fighting about who didn't get that book. And it turned out to be... so there's you in a bookstore somewhere between the sci-fi and the non-fi, shall we call it. And you're, kind of, melding the two together. And I love how you say, "I was accidentally exposed." [LAUGHTER] Sounds kind of like radiation of some kind, and you've turned into a scientist. A little bit more on that. This idea of quantum, because you've mentioned Albert Einstein: there's quantum physics, quantum mechanics, now quantum computing. Do these all go together? I mean, what came out of what in that initial, sort of, exploration with you? Where did you start getting interested in the quantum of things?
NAYAK: Yeah, so I definitely started with relativity, not quantum. That was the first thing I heard about. And I would say in a lot of ways, that's the easier one. I mean, those are the two big revolutions in physics in the 20th century, relativity and quantum theory, and quantum mechanics is by far, at least for me and for many people, the harder one to get your head around because it is so counterintuitive. Quantum mechanics in some sense, or quantum theory in some sense, for most of what we experience in the world, is many abstraction layers down, away from what we experience. What I find amazing is that the people who created, you know, discovered quantum mechanics, they had nothing but the equations to guide them. You know, they didn't really understand what they were doing. They knew that there were some holes or gaps in the fundamental theory, and they kind of stumbled into these equations, and they gave the right answers, and they just had to follow it. Actually, just a few weeks ago, I was in Arosa, which is a small Swiss town in the Alps.
That's actually the town where Schrödinger discovered Schrödinger's equation.
HUIZINGA: No!
NAYAK: Yeah, a hundred years ago, this summer...
HUIZINGA: Amazing!
NAYAK: So Schrödinger suffered from tuberculosis, which eventually actually killed him much later in his life. And so he went into the mountains...
HUIZINGA: ...for the cure.
NAYAK: ...for his health, yeah, to a sanatorium, to recover from tuberculosis. And while he was there in Arosa, he discovered his equation. And it's a remarkable story because, you know, that equation, he didn't even know what the equation meant. He just knew, well, particles are waves, and waves have wave equations. Because that's ultimately Maxwell's equations. You can derive wave equations for light waves and radio waves and microwaves, x-rays. And he said, you know, there has to be a wave equation for this thing, and this wave equation needs to somehow correctly predict the energy levels in hydrogen.
HUIZINGA: Oh, my gosh.
NAYAK: And he, you know, worked out this equation and then solved it, which is, for that time period, not entirely trivial. And he got correctly the energy levels of hydrogen, for which people had the spectra, the different wavelengths of light that hydrogen emits. And lo and behold, it works. He had no idea why. No idea what it even meant. But he knew that he was onto something. And then remarkably, other people were able to build on what he'd done, were able to say, no, there must be a grain of truth here, if not the whole story, and let's build on this, and let's make something that is richer and encompasses more and try to understand the connections between this and other things. And Heisenberg was, around the same time, developing his, what's called, matrix mechanics, a different way of thinking about quantum mechanics, and then people realized the connections between those, like Dirac. So it's a remarkable story how people, how scientists, took these things they understood, you know, imposed on it a certain level of mathematical consistency and a need for the math to predict things that you could observe, and once you had, sort of, the internal mathematical consistency and it was correctly explaining a couple of data points about the world, you could build this huge edifice based on that. And so that was really impressive to me as I learned that. And that's 100 years ago! It was 1925.
HUIZINGA: Right. Well, let me...
NAYAK: And that's quantum mechanics!
HUIZINGA: OK.
NAYAK: You're probably going to say, well, how does quantum computing fit into this, you know? [LAUGHTER] Right? And that's a much later development. People spent a long time just trying to understand quantum mechanics, extend it, use it to understand more things, to understand, you know, other particles. So it was initially introduced to understand the electron, but you could understand atoms, molecules, and subatomic things and quarks and positrons. So there was a rich, you know, decades of development and understanding, and then eventually it got combined with relativity, at least to some extent. So there was a lot to do there to really understand and build upon the early discoveries of quantum mechanics. One of those directions, which was kicked off by Feynman around, I think, 1982, and independently by a Russian mathematician named Yuri Manin, was: OK, great, you know, today's computers, again, are many abstraction layers away from anything quantum mechanical, and in fact, they're sort of separated from the quantum world by many classical abstraction layers. But what if we built a technology that didn't do that?
Like, that's a choice. It was a choice. It was a choice that was partially forced on us just because of the scale of the things we could build. But as computers get smaller and smaller, and the way Moore's law is heading, you know, at some point, you're going to get very close to that point at which you cannot abstract away quantum mechanics, [LAUGHTER] where you must deal with quantum mechanics, and it's part and parcel of everything. You are not in the fortunate case where, out of quantum theory has emerged the classical world that behaves the way we expect it to intuitively. And, you know, once we go past that, that potentially is really catastrophic and scary because, you know, you're trying to make things smaller for the sake of, you know, Moore's law and for making computers faster and potentially more energy efficient. But, you know, if you get down to this place where the momentum and position of things, of the electrons, you know, or of the currents that you're relying on for computation, if they're not simultaneously well-defined, how are you going to compute with that? It looks like this is all going to break down. And so it looks like a real crisis. But, you know, what they realized and what Feynman realized was actually it's an opportunity. It's actually not just a crisis. Because if you do it the right way, then actually it gives you way more computational power than you would otherwise have. And so rather than looking at it as a crisis, it's an opportunity. And it's an opportunity to do something that would be otherwise unimaginable.
HUIZINGA: Chetan, you mentioned a bunch of names there. I have to say I feel sorry for Dr. Schrödinger because most of what he's known for to people outside your field is a cat, a mysterious cat in a box, meme after meme. But you've mentioned a number of really important scientists in the field of quantum everything. I wonder, who are your particular quantum heroes? Are there any particular, sort of, modern-day 21st-century or 20th-century people that have influenced you in such a way that it's like, I really want to go deep here?
NAYAK: Well, definitely, you know, the one person I mentioned, Feynman, is later, so he's the second wave, you could say. So if the first wave is, like, Schrödinger and Heisenberg, and you could say Einstein was the leading edge of that first wave, and Planck, then the second wave, maybe you'd say, is, I don't know, whether Dirac is first or second wave. You might say Dirac is second wave, and potentially Landau, a great Russian physicist, second wave. Then maybe Feynman's the third wave, I guess? I'm not sure if he's second or third wave, but anyway, he's post-war and was really instrumental in the founding of quantum computing as a field. He had a famous statement, which is, you know, in his lectures, "There's always room at the bottom." And, you know, what he was thinking about there was, you can go to these extreme conditions, like very low temperatures and in some cases very high magnetic fields, and new phenomena emerge when you go there, phenomena that you wouldn't otherwise observe. And in a lot of ways, many of the early quantum theorists, to some extent, were extreme reductionists because, you know, they were really trying to understand smaller and smaller things and things that in some ways are more and more basic. At the same time, you know, some of them, if not all of them, at the same time held in their mind the idea that, you know, actually, more complex behaviors emerge out of simple constituents.
Einstein famously, in his miracle year of 1905, one of the things he did was he discovered, he proposed, the theory of Brownian motion, which is an emergent behavior that relies on the underlying atomic theory, but it is several layers of abstraction away from the underlying atoms and molecules, and it's a macroscopic thing. So Schrödinger famously, among the other things, he's the person who came up with the concept of entanglement...
HUIZINGA: Yes.
NAYAK: ...in understanding his theory. And for that matter, Schrödinger's cat is a way to understand the paradoxes that occur when the classical world emerges from quantum mechanics. So they were thinking a lot about how these really incredible, complicated things arise or emerge from very simple constituents. And I think Feynman is one of those people who really bridged that as a post-war scientist because he was thinking a lot about quantum electrodynamics and the basic underlying theory of electrons and photons and how they interact. But he also thought a lot about liquid helium and ultimately about quantum computing. The motivation for him in quantum computing was, you have these complex systems with many underlying constituents, and it's really hard to solve the equations. The equations are basically unsolvable.
HUIZINGA: Right.
NAYAK: They're complicated equations. You can't just, sort of, solve them analytically. Schrödinger was able to do that with his equation because it was one electron, one proton, OK. But when you have, you know, for a typical solid, you'll have Avogadro's number of electrons and ions inside something like that, there's no way you're going to solve that. And what Feynman recognized, as others did, really coming back to Schrödinger's observation on entanglement, is you actually can't even put it on a computer and solve a problem like that. And in fact, it's not just that with Avogadro's number you can't; you can't put it on a computer and solve it with a thousand, you know, [LAUGHTER] atoms, right? And actually, you aren't even going to be able to do it with a hundred, right. And when I say you can't do that on a computer, it's not that, well, datacenters are getting bigger, and we're going to have gigawatt datacenters, and then that's the point at which we'll be able to see... no, the fact is, the amazing thing about quantum theory is, if, you know, let's say you're trying to solve a problem with 1,000 atoms in it, you know, if you go to 1,001, you're doubling the size of the problem. As far as if you were to store it on a cloud, just to store the problem on the classical computer, just to store the answer, I should say, on a classical computer, you'd have to double the size. So there's no chance of getting to 100, even if, you know, with all the buildout of datacenters that's happening at this amazing pace, which is fantastic and is driving all these amazing advances in AI, that buildout is never going to lead to a classical computer that can even store the answer to a difficult quantum mechanical problem.
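The doubling Nayak describes here is the exponential growth of the quantum state space: for $n$ two-level constituents, a general state is a superposition over $2^n$ basis configurations, so just storing its amplitudes requires memory exponential in $n$, and each additional constituent doubles it:

$$\dim \mathcal{H}_n = 2^n, \qquad \frac{\dim \mathcal{H}_{n+1}}{\dim \mathcal{H}_n} = 2$$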
HUIZINGA: Yeah, so basically, in answer to who are your quantum heroes, you've kind of given us a little history of quantum computing, the leadup and the questions that prompted it. So we'll get back to that in one second, because I want you to go a little bit further on where we are today. But before we do that, you've also alluded to something that's super interesting to me, which is, in light of all the recent advances in AI, especially generative AI, which come with claims like we'll be able to shorten the timeline on scientific discovery and things like that, why, then, do we need quantum computing? Why do we need it?

NAYAK: Great question. So AI and machine learning, at least so far, are only as good as the training data you have. So if you train AI on all the data we have, and if you train AI on problems we can solve, which at some level are classical, you will be able to solve classical problems. Now, protein folding is one of those problems where the solution is basically classical, very complicated and difficult to predict but basically classical, and there was a lot of data on it, right. And so it was clearly a big data problem that's basically classical. As far as we know, there's no classical way to simulate or mimic quantum systems at scale; there's a clean separation between the classical and quantum worlds. Quantum theory is the fundamental theory of the world, and there is no hidden classical model lurking [LAUGHTER] in the background behind it. People sometimes call these hidden variable theories, which Einstein, late in his life, was really hoping existed. That there was, hiding behind quantum mechanics, some classical theory that was just obscured from our view, that we didn't know enough about, and the quantum thing was just our best approximation. If that's true, then, yeah, maybe an AI can actually discover that classical theory hiding behind the quantum world and therefore would be able to answer the problems we need to answer. But that's almost certainly not the case. There's just so much experimental evidence for the correctness of quantum mechanics and quantum theory, and many experiments that really, kind of, rule out many aspects of such a classical theory, that I think we're fairly confident there isn't some classical approximation or underlying theory hiding behind quantum mechanics. And therefore, an AI model, which at the end of the day is some kind of very large matrix (a neural network is a very large classical model obeying very classical rules about how you take inputs and produce outputs through many layers), is not going to produce a quantum theory. Now, on the other hand, if you have a quantum computer and you can use that quantum computer to train an AI model, then you're teaching that AI model quantum mechanics, and at least within a certain realm of quantum problems, it can interpolate from what we've learned about quantum mechanics and quantum problems to solve new problems that you hadn't already solved. Actually, like I said, in the early days, I was reading these books and flipping through these bookstores, and I'd sometimes figure out my own ways to solve problems, different from how it was done in the books. And then eventually I ended up solving problems that hadn't been solved. Well, that's sort of what an AI does, right? It trains off of the internet, or off of playing chess against itself many times.
You know, it learns, and then it takes that, and eventually, by learning its own way to do things, it learns things that we as humans haven't discovered yet.

HUIZINGA: Yeah.

NAYAK: And it could probably do that with quantum mechanics if it were trained on quantum data. But without that, you know, the world is ultimately quantum mechanical. It's not classical. And so something classical is not going to be a general-purpose substitute for quantum theory.

HUIZINGA: OK, Chetan, this is fascinating. And as you've talked about pretty well everything so far, that's given us a really good, sort of, background on quantum history as we know it in our time. Talk a little bit about where we are now, particularly, and we're going to get into topology in a minute, topological stuff, but I want to know where you feel like the science is now, and be as concise as you can because I really want to get to your cool work that we're going to talk about. And this question includes, what's a Majorana and why is it important?

NAYAK: Yeah. So OK, unfortunately, it won't be that concise an answer. In the early '80s, ideas about quantum computing were put forward. But I think most people thought, A, this is going to be very difficult to do. And I think, B, it wasn't clear that there was enough motivation. I think Feynman said, yes, if you really want to simulate quantum systems, you need a quantum computer. And at that point, people weren't really sure: is that the most pressing thing in the world, simulating quantum systems? It's great to understand more about physics, more about materials, more about chemistry, but we weren't even at the stage where, hey, that's the thing that's limiting progress for society. And then, secondly, there was also this feeling that what you're really doing is some kind of analog computing. This doesn't feel digital, and if it doesn't feel digital, there's this question about error correction and how reliable it is going to be. So Peter Shor actually did two amazing things in the mid-'90s, one of which is a little more famous with the general public, but one of which is probably more important technically. He first came up with Shor's algorithm, where he said, if you have a quantum computer, yeah, great for simulating quantum systems, but actually you can also factor large numbers. You can find the prime factors of large numbers, and the difficulty of that problem is the security foundation of RSA [encryption]; many of these public key cryptography systems rely on certain types of problems that are really hard. It's easy to multiply two large primes together and get the output, and you can use that to encrypt data. But to decrypt it, you need to know those two numbers, and it's hard to find those factors.
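That asymmetry, easy to multiply, hard to factor, is simple to see in a toy example. The primes below are tiny on purpose and the brute-force search is purely illustrative; real RSA moduli are 2,048 bits or more, and Shor's algorithm attacks them through quantum period finding, not trial division.

# Toy illustration of the asymmetry RSA relies on: multiplying two primes
# is cheap, but recovering them from their product is expensive
# classically. Tiny numbers only; nothing here resembles real key sizes.

def smallest_prime_factor(n: int) -> int:
    """Find the smallest prime factor of n by brute-force trial division."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n  # n itself is prime

p, q = 104_729, 1_299_709   # two known primes: multiplying them is instant
n = p * q
print(f"public modulus n = {n}")

# Factoring n back out takes on the order of sqrt(n) trial divisions, and
# that cost explodes as the primes grow.
f = smallest_prime_factor(n)
print(f"recovered factors: {f} x {n // f}")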
What Peter Shor discovered is that an ideal quantum computer would be really good at this, OK. So that was the first discovery. And at that point, the seemingly academic problem of simulating quantum systems, which in Feynman's vision is what quantum computers were for, all of a sudden had this other application, one that matters financially, economically, and for national security. And a lot of people sat up and took notice at that point. So that's huge. But then there's a second thing that he discovered, which was quantum error correction. Because everyone, when he first discovered the algorithm, said, sure, ideally that's how a quantum computer works, but this thing sounds like an analog system. How are you going to correct errors? This thing will never work because it'll never operate perfectly. Schrödinger's problem with the cat is going to happen: you're going to have entanglement. The thing is going to just end up being basically classical, and you'll lose all the supposed gains you're getting from quantum mechanics. And quantum error correction, that second discovery of Peter Shor's, suddenly made it look like, OK, at least in principle, this thing can happen. And people built on that. Peter Shor's original quantum error correction, I would say, was based on a lot of ideas from classical error correction, because you have the same problem with classical communication and classical computing. Alexei Kitaev then came up with a new set of quantum error correction procedures, which really don't rely in the same way on classical error correction. Or if they do, it's more indirect, and in many ways they rely on ideas in topology and physics. And those ideas, which lead to quantum error correcting codes, but also to ideas about what kinds of underlying physical systems would have built-in hardware error protection, led to what we now call topological quantum computing and topological qubits. Because the idea is that, just like people went from vacuum tubes in the early days of computers to transistors (initially germanium, then silicon), you have to have the right underlying material in order to make qubits.

HUIZINGA: OK.

NAYAK: And the right underlying material platform, just as for classical computing it's been silicon for decades and decades, was going to be one of these so-called topological states of matter. These would be states of matter whose defining feature, in a sense, is that they protect quantum information from errors, at least to some extent. Nothing's perfect, but they do so in a controllable way, so that you can make it better as needed, and well enough that any subsequent error correction, what you might call software-level error correction, would not be so cumbersome and introduce so much overhead as to make a quantum computer impractical. I would say the field had a reboot, or a rebirth, in the mid-1990s, and pretty quickly those ideas, in addition to the applications and algorithms, coalesced around error correction and what's called fault tolerance. Many of those ideas interchanged freely with ideas in topology and the physics of what are called topological phases, and they gave birth to the set of ideas on which Microsoft's program has been based, which is to create the right material, and qubits based on it, so that you can get to a quantum computer at scale. Because there are a number of constraints there. And the work that we're really excited about right now is about getting the right material and harnessing that material for qubits.
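The intuition behind redundancy-based error correction, which Shor's codes adapt and Kitaev's topological codes push much further, can be sketched with a toy classical repetition code. This is purely illustrative: real quantum error correction must also handle phase errors and must detect errors without reading out the encoded state, constraints the classical code below is free to ignore.

import random

# Toy classical repetition code: encode one logical bit as three physical
# bits and correct any single bit flip by majority vote. Redundancy plus
# a decoding rule is the shared starting point of error correction,
# classical or quantum.

def encode(bit: int) -> list[int]:
    return [bit, bit, bit]

def noisy_channel(bits: list[int], p_flip: float) -> list[int]:
    return [b ^ (random.random() < p_flip) for b in bits]

def decode(bits: list[int]) -> int:
    return int(sum(bits) >= 2)  # majority vote

random.seed(0)
trials, p = 100_000, 0.05
raw_errors = sum(random.random() < p for _ in range(trials))
coded_errors = sum(decode(noisy_channel(encode(0), p)) != 0 for _ in range(trials))
print(f"unprotected error rate: {raw_errors / trials:.4f}")   # about p = 0.05
print(f"encoded error rate:     {coded_errors / trials:.4f}")  # about 3p^2 = 0.007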
HUIZINGA: Well, let's talk about that in the context of this paper that you're publishing and some pretty big news in topology. You just published a paper in Nature that demonstrates, with receipts, a fundamental operation for a scalable topological quantum computer relying on, as I referred to before, Majorana zero modes. That's super important. So tell us about this and why it's important.

NAYAK: Yeah, great. So building on what I was just saying about having the right material, what we're relying on, to an extent, is superconductivity. That's one of the really cool, amazing things about the physical world: many metals, including aluminum, for instance, when you cool them down, are able to carry electricity with no dissipation, OK. No energy loss associated with that. And what underlies that remarkable property is that the electrons form pairs, these things called Cooper pairs. Those Cooper pairs' wave functions kind of lock up and go in lockstep, and as a result, the number of them in any local region actually fluctuates wildly, and that enables them to move easily and carry current. But a fundamental consequence of the pairing is that there's a big difference between an even and an odd number of electrons. Because if there's an odd number, then some electron is unpaired somewhere, and there's an energy penalty, an energy cost, associated with that. It turns out that that's not always true. There's actually a subclass of superconductors called topological superconductors, or topoconductors, as we call them, and topoconductors have this amazing property that they're perfectly OK with an odd number of electrons! In fact, when there's an odd number of electrons, there isn't any unpaired electron floating around, and there's no energy penalty. That's the remarkable thing about them. I've been warned not to say what I'm about to say, but I'll just go ahead [LAUGHTER] and say it anyway. I guess that's a bad way to introduce something ...

HUIZINGA: No, it's actually really exciting!

NAYAK: OK, but since you brought up, you know, Harry Potter and the Half-Blood Prince: Voldemort famously split his soul into seven, or, I guess, technically eight, accidentally. [LAUGHTER] He split his soul into seven Horcruxes, so in some sense, there was no place where you could say, well, that's where his soul is.

HUIZINGA: Oh, my gosh!

NAYAK: So Majorana zero modes do kind of the same thing! There's this unpaired electron potentially in the system, but you can't find it anywhere. Because, to an extent, you've actually figured out a way to split it. You know, sometimes we say you put it at the two ends of the system, but that's sort of a mathematical construct. The reality is there is no place where that unpaired electron is!

HUIZINGA: That's crazy. Tell me, before you go on, we're talking about Majorana. I had to look it up. That's a guy's name, right? So do a little dive into what this whole Majorana zero mode is.

NAYAK: Yeah, so Majorana was an Italian physicist, or maybe technically a Sicilian physicist. He was very active in the '20s and '30s and then just disappeared mysteriously around 1937 or '38. No one knows exactly what happened to him. But one of his last works, which I think may only have been published after he disappeared, proposed what's now called the Majorana equation. He was actually thinking about neutrinos at the time, subatomic particles that carry no charge.
And so he was thinking about something very, very different from quantum computing, actually, right. Majorana knew nothing about quantum computing, nothing about topological superconductors, and maybe not even much about superconductivity at all; he was thinking about subatomic particles. But he wrote down this equation for neutral objects, things that don't carry any charge. And so when people started looking at topological superconductors in the '90s and 2000s, they realized that these host things called Majorana zero modes. Let me explain how they enter the story. I just said that in a topological superconductor, there's no place you can find that even or odd number of electrons; there's no penalty. Now, superconductors do have a penalty, and it's called the energy gap, for breaking a pair. Even topological superconductors. You take a Cooper pair, you break it, you have to pay that energy cost, OK. And it's, like, double the energy, in a sense, of having one unpaired electron, because you've created two unpaired electrons by breaking that pair. Now, somehow a topological superconductor has to accommodate that unpaired electron. It turns out the way it accommodates it is that it can absorb or emit one at the ends of the wire. If you have a topological superconductor, a topoconductor wire, it can absorb or emit one of these things at its ends. And once it goes in at one end, it's totally delocalized over the system, and you can't find it anywhere. You can say, oh, it got absorbed at this end, and you can look, and there's nothing you can tell; nothing has changed about the other end. It's now a global property of the whole thing. You actually need to somehow figure out, and I'll come to this, how to connect the two ends and measure the whole thing collectively to see if there's an even or odd number of electrons. Which is why it's so great as a qubit. The reason it's hard for Schrödinger's cat to be both dead and alive is that you're going to look at it, and photons are going to bounce off it, and you're going to know whether it's dead or alive. And the slightly paradoxical thing is that a person doesn't actually have to perceive it. If anything in the environment, say a photon, bounces off it, it's sort of like if a tree falls in the forest ...

HUIZINGA: I was just going to say that!

NAYAK: ... it still makes a sound. I know! It still makes a sound in the sense that Schrödinger's cat is still going to be dead or alive once a photon or an air molecule bounces off it, because at that point it's gotten entangled with, effectively, the rest of the universe, or at least many other parts of the universe. And so the fact that there is no place where you can go and point to that unpaired electron means the environment can't detect that evenness or oddness, which we call parity; whether something's even or odd is its parity. And, you know, these are wires with 100 million electrons in them. It's the difference between 100 million and 100 million and one, because one of those is even and the other is odd. The environment can't detect that difference, so the qubit doesn't get entangled with anything, and it can actually be "dead and alive" at the same time, unlike Schrödinger's cat. And that's what you need to make a qubit: to create those superpositions.
And so Majorana zero modes are these features of the system that don't actually carry an electrical charge, but they are a place where a single unpaired electron can enter the system and then disappear. They are this remarkable thing where you can hide stuff. [LAUGHS]

HUIZINGA: So how does that relate to your paper and the discoveries that you've made here?

NAYAK: Yeah. So now the difficulty is you have to actually make this thing. We've put a lot of problems up front, in that we're saying, OK, the solution to our problem is we need this new material, and we need to harness it for qubits, right. Great. Well, where are we going to get this material? You might discover it in nature. Nature may hand it to you. But in many cases, it doesn't, and this is one of those cases where we actually had to engineer the material. And engineering the material turns out to be a challenge. People had ideas early on that they could put together some combination of semiconductors and superconductors. But for us to really make progress, we realized it had to be a very particular combination. And we had to develop, and we did develop, simulation capabilities. Classical ones; unfortunately, we don't have a quantum computer yet, so we had to do this with classical computers. We had to classically simulate various kinds of materials combinations to find one, or find a class, that would get us into the topological phase. And it turned out lots of details mattered there, OK. It involves a semiconductor, which is indium arsenide. It's not silicon, and it's not the second most common semiconductor, gallium nitride, which is used in LED lights. Indium arsenide has some uses as an infrared detector, but it's a different semiconductor, and we're using it in a nonstandard way, putting it into contact with aluminum and getting, kind of, the best of both worlds of a superconductor and a semiconductor so that we can control it and get into this topological phase. And that's a previously published paper in an American Physical Society journal. So that's great; it shows that you can create this state of matter. Now we need to build on it; we have to harness it. We have to make one of these wires, or, in many cases, multiple wires, into qubits and more complex devices, and we need to figure out, how do we measure whether we have 100 million or 100 million and one electrons in one of these wires? And that was the problem we solved. We made a device where we took something called a quantum dot, which you should think of as a tiny little capacitor, and that quantum dot is coupled to the wire in such a way that an electron, and this is kind of remarkable, can quantum mechanically tunnel between the dot and the wire. You know, with an electron, you don't know where it is at any given time; its momentum and its position aren't simultaneously well defined. So an electron whose energy, let's say, is well defined has some probability amplitude of being on the wire and not on the dot. Even though it should be on the dot, it can, kind of, leak out, quantum mechanically end up on the wire, and come back.
And because of that simple fact, that its quantum mechanical wave function can have it be on the wire, it actually becomes sensitive to that evenness or oddness.

HUIZINGA: Interesting.

NAYAK: And that causes a small change in the capacitance of this tiny little parallel plate capacitor, effectively, that we have. And that change in capacitance, just to put it into numbers, is at the femtofarad scale, OK. That's a decimal point followed by 14 zeros and a one. That's how tiny it is. But if we put that tiny change in capacitance into a larger resonant circuit, then that resonant circuit shows a small shift in its resonant frequency, which we can detect.
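To put rough numbers on that readout: an LC resonator has f = 1/(2π√(LC)), so a small capacitance change δC shifts the resonant frequency by δf/f ≈ -δC/(2C). A minimal sketch follows; the component values are illustrative assumptions, not parameters of the actual device.

import math

# Sizing the dispersive readout: a parity-dependent capacitance shift dC
# moves the resonant frequency of an LC circuit by df/f ~ -dC/(2C),
# since f = 1 / (2*pi*sqrt(L*C)).
# All component values below are illustrative assumptions.

C = 1e-12    # assumed total resonator capacitance: 1 pF
f0 = 600e6   # assumed resonant frequency: 600 MHz
dC = 1e-15   # parity-dependent shift at the femtofarad scale

L = 1 / ((2 * math.pi * f0) ** 2 * C)   # inductance implied by f0 and C
df = -f0 * dC / (2 * C)                 # first-order frequency shift

print(f"implied inductance L = {L * 1e9:.1f} nH")
print(f"fractional shift df/f = {dC / (2 * C):.1e}")
print(f"frequency shift |df| = {abs(df) / 1e3:.0f} kHz")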
And so what we demonstrated is that we can detect that difference, that one-electron difference, that evenness or oddness, which, again, is not a local property of anywhere in the wire, but which we can nevertheless detect. And that's the fundamental thing you have to have if you want to use these things for quantum information processing: this parity, you have to be able to measure what that parity is, right. That's fundamental. Because ultimately, the information you need is classical information. You're going to want to know the answer to some problem. It's going to be a string of zeros and ones. You have to measure that. But moreover, in the particular architecture we're using, the basic operations are measurements of this type, which is a very digital process. I mentioned how quantum computing looks a little analog in some ways, but it's not really analog. Well, that's very manifestly true in our architecture: our operations are a succession of measurements that we turn on and off, but different kinds of measurements. And so what the paper shows is that we can do these measurements. We can do them fast. We can do them accurately.

HUIZINGA: OK.

NAYAK: And the additional announcements that we're making right now are work that we've done extending and building on that, showing additional types of measurements, a scalable qubit design, and then, building on that, multi-qubit arrays.

HUIZINGA: Right.

NAYAK: So that really unlocked our ability to do a number of things. And I think you can see the acceleration now with the announcements we have right now.

HUIZINGA: So, Chetan, you've just talked about the idea of living in a classical world and having to simulate quantum stuff.

NAYAK: Yup.

HUIZINGA: Tell us about the full stack here and how we go, in your mind, from quantum computing at the bottom all the way to the top.

NAYAK: OK, so one thing to keep in mind is quantum computers are not a general-purpose accelerator for every problem. People sometimes say, well, quantum computers are just going to be like classical computers but faster, and that's not the case. So I really want to emphasize that quantum computers are an entirely different modality of computing. There are certain problems which quantum computers are not just faster at than classical computers, but which quantum computers can solve and classical computers have no chance of solving. On the other hand, there are lots of things that classical computers are good at that quantum computers aren't going to be good at, because quantum mechanics isn't going to give you any big speedup there. Take big data problems, where you have lots of classical data. A quantum computer with, let's say, 1,000 qubits, and here I mean 1,000 logical qubits, that is, error-corrected qubits (we can come back to what that means), can solve problems that you have no chance of solving with a classical computer, even with all the world's computing. In fact, for a 1,000-qubit problem, you could take every single atom in the entire universe, OK, and turn each one into a transistor, and it still wouldn't be big enough. You don't have enough bytes, even if every single atom in the universe were a byte. That's how big these quantum problems are when you try to store them, even just the answer, on a classical computer.

HUIZINGA: Yeah.

NAYAK: But conversely, if you have a lot of classical data, like all the data on the internet, which we train our AI models with, you can't store that on 1,000 qubits. You actually can't really store more than 1,000 bits of classical information on 1,000 qubits. So for many things where we have big classical data, we don't have the ability to truly store it within a quantum computer in a way that lets you do anything with it. So we should definitely not view quantum computers as replacing classical computers. There are lots of things that classical computers are already good at, and we're not trying to do those things. But there are many things that classical computers are not good at at all, and a quantum computer should be thought of as complementary, an accelerator for those types of problems. It will have to work in collaboration with a classical computer: the classical computer does the classical steps, and the quantum computer does the quantum steps. So that's one thing to keep in mind. When we talk about a quantum computer, it is part of a larger computing framework with many classical elements. There might be CPUs, there might be GPUs, there might be custom ASICs for certain things, and then a quantum processor as well. So ...

HUIZINGA: Is that called a QPU?

NAYAK: A QPU is the quantum processing unit, exactly! So we'll have CPUs, GPUs, and QPUs. At the lowest layer of that stack is the underlying physical substrate. That's our topoconductor. It's the material from which we build our QPUs, the quantum processing units. The QPU includes all of the qubits in our architecture on a single chip, and that's one of the key design features: the qubits have to be small and manufacturable on a single wafer. And then the QPU also has to enable that quantum world to talk to the classical world ...

HUIZINGA: Right.

NAYAK: ... because you have to send it instructions, and you have to get back answers. And for us, that means turning measurements on and off, because our instructions are a sequence of measurements. Ultimately, we have to get back a string of zeros and ones, but initially those measurements come to us as phase shifts on microwaves, which in turn tell us about small capacitance shifts, which in turn tell us the parity of electrons in a wire.

HUIZINGA: Right.

NAYAK: So really, this is a quantum machine in which you have the qubits built on the quantum plane, and then you've got this quantum-classical interface where the classical information is going in and out of the quantum processor.
And then there's a lot of classical processing that has to happen, both to enable error correction and to enable computations. And the whole thing has to be inside a cryogenic environment. So it's a very special environment: it's kept cold, A, because that's what you need in order to have a topoconductor, and, B, because that's what you need, in general, for the qubits to be very stable. So when we talk about the full stack, just on the hardware side, there are many layers to it. And then, of course, there is the classical firmware that takes instructions and turns them into the physical things that need to happen, and then we have algorithms, and then ultimately applications.

HUIZINGA: Yeah, so I would say, Chetan, that people can probably go do their own little research on how you get from temperatures that are lower than deep space to the room you're working in. We don't have time to unpack that on this show. And also, I was going to ask you what could possibly go wrong if you indeed got everything right. You mentioned earlier what happens in an AI world if we get everything right; if you put quantum and AI together, it's an interesting question what that world looks like. Can you take a brief second to say what you're thinking about what could happen to cryptography, to, you know, all kinds of things that we might be wondering about in a post-quantum world?

NAYAK: Great question. So, first of all, one of the things I want to emphasize is that when we think about the potential of a technology, often the limit comes down to physics. There are physics limits. If you think about, like, interstellar travel and things like that, well, the speed of light is kind of a hard cutoff, [LAUGHTER] and you're not going to be able to go faster than the speed of light, and you have to bake that in. Or if you think of a datacenter: ultimately, there's a certain amount of energy and a certain amount of cooling power you have. You can say, well, this datacenter is 100 megawatts, and in the future we'll have a gigawatt, but ultimately that energy has to come from somewhere, and you've got some hard physical constraints. So similarly, you could ask, with quantum computers, what are the hard physical constraints? What are the things you simply can't do, the way you can't make a perpetual motion machine or violate the laws of quantum mechanics? And I think in the early days, there was this concern that the idea relied on violating something, that you were doing something that wasn't going to work. I'd say the theory of quantum error correction, the theory of fault tolerance, and many of the algorithms that have been developed really do show that there is no fundamental physical constraint saying this isn't going to happen. It's not that you would need more power than you can really generate, or that you would need to go much colder than you can actually get. There's no physical no-go result. So that's an important thing to keep in mind. Now, some people might then be tempted to say, well, OK, now it's just an engineering problem, because we know this can work in principle, and we just have to figure out how to make it work.
But the truth is, there isn't any such hard barrier where you say, well, up until here it's fundamental physics, and beyond this it's just an engineering problem. The reality is that new difficulties and challenges arise at every step along the way. One person might call something an engineering or implementation challenge, and another might call it a fundamental barrier or obstruction, and I think people will probably agree to disagree on where that line is. For us, it was really crucial, as we look out at the scale at which quantum computers will really make an impact, to recognize that we're going to need hundreds to thousands of logical qubits, that is, error-corrected qubits. And when you look at what that means, it really means a million physical qubits. That is a very large scale in a world in which people have mostly learned what we know about these things from 10 to 100 qubits. Projecting out from that to a million, it would surprise me if the solutions that are optimal for 10 to 100 qubits are the same solutions that are optimal for a million qubits, right.

HUIZINGA: Yeah.

NAYAK: And that has been a motivation for us: let's try to think, based on what we now know, of things that at least have a chance to work at that million-qubit scale. Let's not do anything that looks like it's going to clearly hit a dead end before then.

HUIZINGA: Right.

NAYAK: Now, obviously, in science nothing is certain, and you learn new things along the way, but we didn't want to start out with things that looked like they were not going to work for a million qubits. That was the reason we developed this new material, that we created, engineered, this new material, these topoconductors: precisely because we said we need a material that gives us something we can operate fast, make small, and control. So I think that's one key thing. And what we've demonstrated now is that we can harness this, that we've got a qubit. And that's why we have a lot of confidence that these things aren't decades away; these things are years away. And that was the basis for our interaction with DARPA [Defense Advanced Research Projects Agency]. We've just signed a contract with DARPA to go into the next phase of the DARPA US2QC program. DARPA, the US government, wants to see a fault-tolerant quantum computer, because they do not want any surprises.

HUIZINGA: Right?!? [LAUGHS]

NAYAK: There are people out there who have said quantum computers are decades away, don't worry about it. But I think the US government realizes they might be years, not decades, away, and they want to get ahead of that. That's why they've entered into this agreement and contract with us.

HUIZINGA: Yeah.

NAYAK: And so the thing I just want to make sure listeners of the podcast understand is that we fundamentally re-engineered, re-architected, what we think a quantum computer should look like and what the qubit should be, going all the way down to the underlying materials. And that is high risk, right? There was no guarantee that any of this was going to work, A. And, B, there was no guarantee we would even be able to do the things we've done so far.
I mean, that's the nature of it. If you're going to try to do something really different, you're going to have to take risks. And we did take risks by really starting at the ground floor and trying to redesign and re-engineer these things. That was a necessary part of this journey and this story: re-engineering these things in a high-risk way. What that leads to is potentially changing the timeline. And in that context, it's really important to make the transition to post-quantum crypto, because the cryptography systems in use up until now are not safe from quantum attacks once you have a utility-scale quantum computer. We do know that there are cryptosystems which, at least as far as we know, appear to be safe from quantum attacks. That's what's called post-quantum cryptography. They rely on different types of hard math problems, ones that quantum computers probably aren't good at. And changing over to a new crypto standard isn't something that happens at the flip of a switch.

HUIZINGA: No.

NAYAK: It's something that takes time. The early part of that was the National Institute of Standards and Technology aligning on one or a few standard systems that people would implement, systems certified as quantum safe, and those processes have occurred. So now is the time to switch over. Given that we know we can do this and that it won't happen overnight, now's the time to make that switch.

HUIZINGA: And we've had several cryptographers on the show who've been working on this for years. It's not like they're just starting. They saw this coming even before you had some solidity in your work. But listen, I would love to talk to you for hours, but we're coming to a close here. And as we close, I want to refer to a conversation you had with distinguished university professor Sankar Das Sarma. He suggested that with the emergence of Majorana zero modes, you had reached the end of the beginning and that you were now embarking on the beginning of the end in this work. Maybe that's a sort of romanticized vision of it, but could you give us a hint of the next milestones on your road to a scalable, reliable quantum computer, and what's on your research roadmap to reach them?

NAYAK: Yeah. Interestingly, we actually just posted on the arXiv a paper that shows some aspects of our roadmap, the more scientific aspects of our roadmap. And that roadmap is continuously going from the scientific discovery phase through the engineering phase, OK. Again, as I said, it's a matter of debate, and even taste, exactly what you want to call scientific discovery versus engineering (which will be hotly debated, I'm sure), but it is definitely a continuum moving from one toward the other. And I would say, at a high level, logical qubits, error-corrected, reliable qubits, are the basis of quantum computation at scale, and developing, demonstrating, and building those logical qubits, and logical qubits at scale, is the big thing that, for us and for the whole industry, is the next level of quantum computing.
Jason Zander wrote a blog where he talked about level one, level two, and level three, where level one was this NISQ, noisy intermediate-scale quantum, era; level two is the foundations of reliable and logical qubits; and level three is at-scale logical qubits. I think we're heading towards level two, and in my mind, that's the next North Star. There will be a lot of very interesting and important things along the way that are more technical and maybe not as accessible to a big audience, but I'd say that's the thing to keep in mind as the big exciting thing happening in the field.

HUIZINGA: Yeah. Well, Chetan Nayak, what a ride this show has been. I'm going to be watching this space, and the timelines thereof, because they keep getting adjusted!

[MUSIC]

Thank you for taking time to share your important work with us today.

NAYAK: Thank you very much, my pleasure!

[MUSIC FADES]
-
WWW.MICROSOFT.COM
Introducing Muse: Our first generative AI model designed for gameplay ideation

Today, the journal Nature (opens in new tab) is publishing our latest research, which introduces the first World and Human Action Model (WHAM). The WHAM, which we've named Muse, is a generative AI model of a video game that can generate game visuals, controller actions, or both.

The paper in Nature offers a detailed look at Muse, which was developed by the Microsoft Research Game Intelligence (opens in new tab) and Teachable AI Experiences (opens in new tab) (Tai X) teams in collaboration with Xbox Game Studios' Ninja Theory (opens in new tab). Simultaneously, to help other researchers explore these models and build on our work, we are open sourcing the weights and sample data and making the executable available for the WHAM Demonstrator, a concept prototype that provides a visual interface for interacting with WHAM models and multiple ways of prompting the models. Developers can learn and experiment with the weights, sample data, and WHAM Demonstrator on Azure AI Foundry (opens in new tab).

In our research, we focus on exploring the capabilities that models like Muse need to effectively support human creatives. I'm incredibly proud of our teams and the milestone we have achieved, not only by showing the rich structure of the game world that a model like Muse can learn, as you see in the video demo below, but also, and even more importantly, by demonstrating how to develop research insights to support creative uses of generative AI models.

Generated gameplay examples

Example gameplay sequences generated by Muse (based on WHAM-1.6B) demonstrate that our model can generate complex gameplay sequences that are consistent over several minutes. All examples shown here were generated by prompting the model with 10 initial frames (1 second) of human gameplay and the controller actions of the whole play sequence. Muse is used in world model mode, meaning that it is used to predict how the game will evolve from the initial prompt sequence. The more closely the generated gameplay sequence resembles the actual game, the more accurately Muse has captured the dynamics of that game.

What motivated this research?

As we release our research insights and model today, I keep thinking back to how this all started. There was a key moment back in December 2022 that I remember clearly. I had recently returned from maternity leave, and while I was away the machine learning world had changed in fundamental ways. ChatGPT had been publicly released, and those who had tried it were in awe of OpenAI's technical achievements and the model's capabilities. It was a powerful demonstration of what transformer-based generative models could do when trained on large amounts of (text) data. Coming back from leave at that moment, the key question on my mind was, what are the implications of this achievement for our team's work at the intersection of artificial intelligence and video games?

A new research opportunity enabled by data

In our team, we had access to a very different source of data. For years, we had collaborated with Xbox Game Studios' Ninja Theory (based in Cambridge, UK, just like our research team) to collect gameplay data from Bleeding Edge, their 2020 Xbox game. Bleeding Edge is a 4-versus-4 game where all games are played online, and matches are recorded if the player agrees to the End User License Agreement (EULA).
We worked closely with our colleagues at Ninja Theory and with Microsoft compliance teams to ensure that the data was collected ethically and used responsibly for research purposes.

"It's been amazing to see the variety of ways Microsoft Research has used the Bleeding Edge environment and data to explore novel techniques in a rapidly moving AI industry," said Gavin Costello, technical director at Ninja Theory. "From the hackathon that started it all, where we first integrated AI into Bleeding Edge, to building AI agents that could behave more like human players, to the World and Human Action Model being able to dream up entirely new sequences of Bleeding Edge gameplay under human guidance, it's been eye-opening to see the potential this type of technology has."

Muse training data

Current Muse instances were trained on human gameplay data (visuals and controller actions) from the Xbox game Bleeding Edge, shown here at the 300×180 px resolution at which we train current models. Muse (using WHAM-1.6B) has been trained on more than 1 billion images and controller actions, corresponding to over 7 years of continuous human gameplay.

[Image: The Game Intelligence and Teachable AI Experiences teams playing the Bleeding Edge game together.]

Until that point in late 2022, we had used Bleeding Edge as a platform for human-like navigation experiments, but we had not yet made meaningful use of the large amount of human player data we now had available. With the powerful demonstration of text models, the next question was clear: What could we achieve if we trained a transformer-based model on large amounts of human gameplay data?

Scaling up model training

As the team got to work, some of the key challenges included scaling up the model training. We initially used a V100 cluster, where we were able to prove out how to scale up to training on up to 100 GPUs; that eventually paved the way to training at scale on H100s. Key design decisions we made early on focused on how to best leverage insights from the large language model (LLM) community, and included choices such as how to effectively represent controller actions and, especially, images.

The first sign that the hard work of scaling up training was paying off came in the form of a demo that thoroughly impressed me. Tim Pearce, at that time a researcher in Game Intelligence, had put together examples of what happened early versus later in training. You can see the demo here; it's like watching the model learn. This led to our follow-up work showing how scaling laws emerge in these kinds of models.

Muse consistency over the course of training

[Video: ground truth human gameplay alongside game visuals generated by Muse with 206M parameters, conditioned on 1 second of real gameplay and 9 seconds of actions. Overlay labels: character recognizable; basic movements and geometry; no degeneration over time; correct interaction with power cell; models flying mechanic correctly.]

Comparing ground truth human gameplay (left) to visuals generated using Muse (using WHAM-206M) when prompted with 1 second of human gameplay (visuals and controller actions) and 9 seconds of controller actions from the ground truth. In this setting, if Muse can generate visuals that closely match the ground truth, then it has captured the game dynamics. We see that the quality of generated visuals improves visibly over the course of training. In early training (10k training updates) we see signs of life, but quality deteriorates quickly.
After 100k training updates, the model is consistent over time but does not yet capture relatively less frequent aspects of the game dynamics, such as the flying mechanic. Consistency with the ground truth continues to improve with additional training; e.g., the flying mechanic is captured after 1M training updates.

Multidisciplinary collaboration: Involving users from the beginning

We had started to investigate how to evaluate these types of models early on. For example, we wanted to understand the representations learned using linear probing, which was driven by Research Intern Gunshi Gupta and Senior Research Scientist Sergio Valcarcel Macua; to explore online evaluation, driven by Senior Research Scientist Raluca Georgescu; and to generate both visuals and actions, initially termed "full dreaming" and driven by Research Intern Tarun Gupta. But working through how to systematically evaluate Muse required a much broader set of insights. More importantly, we needed to understand how people might use these models in order to know how to evaluate them.

This was where the opportunity for multidisciplinary research became crucial. We had discussed aspects of this work with Senior Principal Research Manager Cecily Morrison and her Teachable AI Experiences team for several months. And we had already partnered on an engagement with game creatives (driven by Cecily, Design Researcher Linda Wen, and Principal Research Software Development Engineer Martin Grayson) to investigate how game creators would like to use generative AI capabilities in their creative practice.

"It was a great opportunity to join forces at this early stage to shape model capabilities to suit the needs of creatives right from the start, rather than try to retrofit an already developed technology," Cecily said.

Linda offered some valuable insights about how we approached the work: "We've seen how technology-driven AI innovation has disrupted the creative industry, often catching creators off guard and leaving many feeling excluded," she said. "This is why we invited game creators to help us shape this technology from the start. Recognizing that most AI innovations are developed in the Global North, we also made it a priority to recruit game creators from underrepresented backgrounds and geographies. Our goal was to create a technology that benefits everyone, not just those already in positions of privilege."

Unlocking new creative use cases with the WHAM Demonstrator

Now, with the model's emerging capabilities and user insights in mind, it was time to put all the pieces together. The teams joined forces during a Microsoft internal hackathon to explore new interaction paradigms and creative uses that Muse could unlock. As a result, we developed a prototype that we call the WHAM Demonstrator, which allows users to directly interface with the model.

"The Global Hackathon was the perfect opportunity for everyone to come together and build our first working prototype," Martin said. "We wanted to develop an interface for the WHAM model that would allow us to explore its creative potential and start to test ideas and uses we had learned from our interviews with game developers."

WHAM Demonstrator

For interacting with World and Human Action Models like Muse, the WHAM Demonstrator provides a visual interface for interacting with a WHAM instance. In this example, the user is loading a visual as an initial prompt to the model, here a single promotional image for the game Bleeding Edge.
They use Muse to generate multiple potential continuations from this starting point. The user explores the generated sequences and can tweak them, for example using a game controller to direct the character. These features demonstrate how Muse's capabilities can enable iteration as part of the creative process.

Identifying key capabilities and how to evaluate them

The hands-on experience of exploring Muse's capabilities with the WHAM Demonstrator, and drawing on insights we gained from the user study, allowed us to systematically identify the capabilities that game creatives would require to use generative models like Muse. This in turn allowed us to establish evaluation protocols for three key capabilities: consistency, diversity, and persistency. Consistency refers to a model's ability to generate gameplay sequences that respect the dynamics of the game. For example, the character moves consistently with controller actions, does not walk through walls, and generally reflects the physics of the underlying game. Diversity refers to a model's ability to generate a range of gameplay variants given the same initial prompt, covering a wide range of ways in which gameplay could evolve. Finally, persistency refers to a model's ability to incorporate (or persist) user modifications into generated gameplay sequences, such as a character that is copy-pasted into a game visual. We give an overview of these capabilities below.

Muse evaluation of consistency, diversity, and persistency

Consistency

We evaluate consistency by prompting the model with ground truth gameplay sequences and controller actions and letting the model generate game visuals. The videos shown here are generated using Muse (based on WHAM-1.6B) and demonstrate the model's ability to generate consistent gameplay sequences of up to two minutes. In our paper, we also compare the generated visuals to the ground truth visuals using FVD (Fréchet Video Distance), an established metric in the video generation community.

Diversity

Muse (based on WHAM-1.6B) generated examples of behavioral and visual diversity, conditioned on the same initial 10 frames (1 second) of real gameplay. The three examples at the top show behavioral diversity (diverse camera movement, loitering near the spawn location, and navigating various paths to the middle jump pad). The three examples below show visual diversity (different hoverboards for the character). In the paper, we also quantitatively assess diversity using the Wasserstein distance, a measure of the distance between two distributions, to compare the model-generated sequences to the diversity reflected in human gameplay recordings.
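The Wasserstein comparison mentioned above is straightforward to reproduce in miniature with SciPy. The synthetic one-dimensional statistics below are illustrative stand-ins; the paper's actual evaluation operates on richer gameplay features.

import numpy as np
from scipy.stats import wasserstein_distance

# Sketch of the diversity comparison: measure how far the distribution of
# some behavior statistic in model-generated gameplay is from the same
# statistic in human gameplay. Synthetic 1-D data stands in for the real
# features used in the paper.

rng = np.random.default_rng(0)
human_stat = rng.normal(loc=0.0, scale=1.0, size=5_000)  # human gameplay
model_stat = rng.normal(loc=0.1, scale=0.9, size=5_000)  # generated gameplay

d = wasserstein_distance(human_stat, model_stat)
print(f"Wasserstein distance (model vs. human): {d:.3f}")
# Smaller distance: generated diversity better matches human diversity.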
With our evaluation framework in place, and access to an H100 compute allocation, the team was able to further improve Muse instances, including higher-resolution image encoders (our current models generate visuals at a resolution of 300×180 pixels, up from the 128×128 resolution of our earliest models) and larger models, and expand to all seven Bleeding Edge maps. To show some of the capabilities of the model we are publishing today, we have included videos of 2-minute-long generated gameplay sequences above, which give an impression of the consistency and diversity of the gameplay sequences that the model can generate.

According to Senior Researcher Tabish Rashid: "Being handed an allocation of H100s was initially quite daunting, especially in the early stages, figuring out how to make best use of it to scale to larger models with the new image encoders. After months of experimentation, it was immensely rewarding to finally see outputs from the model on a different map (not to knock the lovely greenery of Skygarden) and not have to squint so much at smaller images. I'm sure at this point many of us have watched so many videos from Muse that we've forgotten what the real game looks like."

One of my favorite capabilities of the model is how it can be prompted with modifications of gameplay sequences and persist newly introduced elements. For example, in the demo below, we've added a character onto the original visual from the game. Prompting the model with the modified visual, we can see how the model persists the added character and generates plausible variants of how the gameplay sequence could have evolved from this modified starting point.

Persistency

Demonstrations of how Muse (based on WHAM-1.6B) can persist modifications. A visual is taken from the original gameplay data, and an image of an additional character is edited into it. The generated gameplay sequence shows how the character is adapted into the generated gameplay sequence.

Conclusion

Today, our team is excited to be publishing our work in Nature and simultaneously releasing the Muse open weights, the WHAM Demonstrator, and sample data to the community. I look forward to seeing the many ways in which the community will explore these models and build on our research. I cannot wait to see all the ways that these models and subsequent research will help shape and increase our understanding of how generative AI models of human gameplay may support gameplay ideation and pave the way for future, novel, AI-based game experiences, including the use cases that our colleagues at Xbox (opens in new tab) have already started to explore.
-
WWW.MICROSOFT.COM
Microsoft Research and Physics Wallah team up to enhance AI-based tutoring

In India, limited resources, geographical constraints, and economic factors present barriers to quality higher education for some students. A shortage of teachers, particularly in remote or low-income areas, makes it harder for students to receive the guidance they need to prepare for highly competitive professional and academic programs. Microsoft Research is developing new algorithms and techniques that are enabling Physics Wallah (opens in new tab), a growing educational company, to make its AI-based tutoring services more accurate and reliable, to better support students on their education journey.

As in other countries, many Indian students purchase coaching and tutoring services to prepare for entrance exams at top institutions. This includes offline coaching, where hundreds of students meet in a classroom staffed by teachers covering a structured curriculum. Online coaching enables students to learn remotely in a virtual classroom. Hybrid coaching delivers virtual lessons in a physical classroom.

Offline courses can cost as much as 100,000 Indian rupees a year, equivalent to hundreds of U.S. dollars. This puts them out of reach for many lower-income students living in smaller and mid-sized Indian cities, as well as rural villages. Online courses are much more affordable. They allow students to work at their own pace by providing high-quality web-based content supported by teachers who work remotely.

Meeting this need is the mission of Physics Wallah. The company uses AI to offer on-demand tutoring at scale, curating volumes of standard science- and math-related content to provide the best answers. Some 2 million students use the Physics Wallah platform every day, at a fraction of the cost of offline tutoring. For example, its prep courses for the Joint Entrance Examination (JEE), which is required for admission to engineering and technology programs, and the National Eligibility cum Entrance Test (NEET), a required entrance exam for medical and dental school candidates, cost between 4,200 and 4,500 rupees per year. That's roughly 50 U.S. dollars.

"The mantra here really is, how do we provide quality education in an affordable manner and accessible to every student, regardless of who they are or where they come from," said Vineet Govil, Chief Technology and Product Officer, Physics Wallah.

Microsoft Research India's collaboration with Physics Wallah is part of a 20-year legacy of supporting emerging Indian companies, underscored by the January 2025 announcement that Microsoft will invest $3 billion (opens in new tab) in cloud and AI infrastructure to accelerate the adoption of AI, skilling, and innovation. Physics Wallah has developed an AI-driven educational suite, Alakh AI, leveraging OpenAI's GPT-4o model through Microsoft Azure OpenAI Service.
Alakh AI's flagship offerings include AI Guru and the Smart Doubt Engine, both designed to transform the learning experience in and beyond the classroom. AI Guru acts as a personal academic tutor, delivering adaptive guidance based on a student's progress, real-time question-solving, and customized content that evolves with their learning journey. Smart Doubt Engine is an AI tool through which students can ask questions (also known as "doubts" in Indian English) during live classes and receive instant responses.

Additionally, the Alakh AI suite includes:

- AI Grader, for subjective answer evaluation without human intervention
- Sahayak, for crafting hyper-personalized learning paths tailored to individual students' needs

This ecosystem elevates learning efficiency and accessibility for students.

AI Guru in action: A student asks, "Explain Newton's First Law," and the AI tutor provides a detailed explanation along with two videos for further learning.

Smart Doubt Engine in action: A student asks a clarifying question during a live class, and the AI provides a detailed explanation in real time.

How does AI Guru work?

Let's say a student has a question about Newton's laws of motion, a core concept in physics. She would type her query into the AI Guru chat window (she could also just talk to it or upload an image from a textbook) and receive a text answer plus images derived from standard textbooks and curated content, typically in just a few seconds. AI Guru also provides a short video where a teacher offers additional context.

Getting the technology right

The Alakh AI suite is powered by OpenAI's foundational models GPT-4 and GPT-4o, integrated with a retrieval-augmented generation (RAG) architecture. It leverages Physics Wallah's rich repository of high-quality curated content, developed and refined over several years, along with continuous updates from subject matter experts to ensure new materials, textbooks, tutorials, and question banks are seamlessly incorporated. Despite considerable progress, the existing AI sometimes falters when navigating complex academic problems.
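To make the RAG pattern concrete, here is a minimal sketch: retrieve the most relevant pieces of curated content for a query, then ground the model's answer in them. The document snippets are invented, TF-IDF stands in for a production retriever, and `call_llm` is a placeholder rather than actual Azure OpenAI client code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for Physics Wallah's curated content repository.
docs = [
    "Newton's first law: a body stays at rest or in uniform motion unless acted on by a net force.",
    "Newton's second law: F = ma relates net force, mass, and acceleration.",
    "Ohm's law: V = IR relates voltage, current, and resistance.",
]

vectorizer = TfidfVectorizer().fit(docs)
doc_vecs = vectorizer.transform(docs)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank curated snippets by similarity to the student's query."""
    sims = cosine_similarity(vectorizer.transform([query]), doc_vecs)[0]
    return [docs[i] for i in sims.argsort()[::-1][:k]]

def call_llm(prompt: str) -> str:
    return "[model response to]\n" + prompt  # placeholder for a GPT-4o call

def answer(query: str) -> str:
    """Ground the answer in retrieved context instead of the model's memory."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("Explain Newton's First Law"))
```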
"The accuracy level of today's large language models (LLMs) is not up to the mark where we can provide reliable and satisfactory answers to the students all the time, specifically if it's a hard mathematical problem involving complex equations," Govil said.

That's one important focus of the collaboration. Researchers from Microsoft Research are developing new algorithms and techniques to enhance the accuracy and reasoning capabilities of AI models. They are now collaborating with Physics Wallah to apply these advancements to the Alakh AI suite, improving its ability to solve complex problems and provide more reliable, step-by-step guidance to students. A key challenge is the nature of student queries, which are often ambiguous and involve multimodal inputs (text, images, videos, or audio), requiring unified capabilities to address the problem. Many STEM problems require breaking down complex queries into logical sub-problems and applying high-order, step-by-step reasoning for consistency. Additionally, integrating domain-specific knowledge in advanced math, physics, chemistry, and biology requires contextualization and seamless retrieval of specialized, grade-appropriate information.

Microsoft Research is working with Physics Wallah to move beyond traditional next-token prediction and develop AI systems that approach reliable, systematic, step-by-step problem-solving. That includes ongoing work to enhance the models' reasoning capabilities and deliver more accurate answers to complex JEE math problems. Instead of just providing the final answer, the underlying models now break problems into step-by-step solutions. That helps students learn how to solve the actual problems. The AI can also review student answers, detect mistakes, and give detailed feedback, acting as a personal tutor to guide students, improve their understanding, and enhance their learning experience.

Solving complex problems requires enhancing the reasoning capabilities of both large and small language models by training them not just to generate answers, but to systematically think through and reason about complex problems. This requires high-quality reasoning traces: detailed, step-by-step breakdowns of logical problem-solving processes.

To enable this, researchers collaborated with Physics Wallah to curate a dataset of 150,000 high-quality math reasoning traces. These traces serve as the foundation for training specialized small language models (SLMs) using supervised fine-tuning (SFT). Model performance is further refined through training on carefully curated on-policy preference data, ensuring alignment with high-quality reasoning standards. The team's current Phi-based models have already outperformed leading LLMs and other baselines on complex math problems.

"Building AI systems capable of human-like thinking and reasoning represents a significant challenge."
Akshay Nambi, Principal Researcher at Microsoft Research India

The next step is to develop a self-evolving learning pipeline using online reinforcement learning techniques, allowing the model to continuously generate high-quality synthetic data that further enhances its capabilities. Additionally, researchers are building a reward model and integrating it with Monte Carlo Tree Search (MCTS) to optimize reasoning and improve inference-time decision-making.

"The goal is to develop tools that complement education. To do this, we are enhancing the model's capabilities to process, break down, and solve problems step-by-step. We do this by incorporating high-quality data into training to teach the model how to approach such tasks, alongside algorithmic innovations that enable the model to think and reason more effectively."
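A minimal sketch of how such reasoning traces might be turned into SFT examples follows. The trace schema and formatting are illustrative assumptions, not the actual dataset format; the point is that supervision covers the full step-by-step solution, not just the final answer.

```python
# Illustrative reasoning trace; the real dataset contains 150,000 curated traces.
traces = [
    {
        "problem": "Solve x^2 - 5x + 6 = 0.",
        "steps": [
            "Factor the quadratic: x^2 - 5x + 6 = (x - 2)(x - 3).",
            "Set each factor to zero: x - 2 = 0 or x - 3 = 0.",
        ],
        "answer": "x = 2 or x = 3",
    },
]

def to_sft_example(trace: dict) -> dict:
    """Supervise on the full reasoning chain so the model learns to think
    through the problem rather than guess the final answer."""
    completion = "\n".join(
        f"Step {i + 1}: {s}" for i, s in enumerate(trace["steps"])
    )
    completion += f"\nAnswer: {trace['answer']}"
    return {"prompt": trace["problem"], "completion": completion}

sft_dataset = [to_sft_example(t) for t in traces]
print(sft_dataset[0]["completion"])
```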
Opening new doors for students

Getting an education at a top university can be life-changing for anyone. For Chandramouleswar Parida, it could change the lives of everyone in his home village of Baniatangi, Khordha, Odisha State, India. Chandra decided to become a doctor after watching his grandfather die from a heart attack. The nearest doctor who could have treated him was at a regional hospital 65 kilometers away.

"He could have been saved if certain procedures had been followed," Chandra said. He wants to study medicine, perhaps receiving advanced training overseas, and then return home. "I want to be a doctor here in our village and serve our people, because there is a lack of treatment. Being a doctor is a very noble kind of job in this society."

Chandra is the only student in Baniatangi Village, Khordha, Odisha, currently preparing for the NEET. Without Physics Wallah, students like Chandra would likely have no access to this kind of support, because such resources can't be found locally.

Another student, Anushka Sunil Dhanwade, is optimistic that Physics Wallah will help her dramatically improve her initial score on the NEET exam. While in 11th class, or grade, she joined an online NEET prep class with 800 students. But she struggled to follow the coursework, as the teachers tailored the content to the strongest students. After she posted a low score on the NEET exam, her hopes of becoming a doctor began to fade. But after a serious stomach illness reminded her of the value of having a doctor in her family, she tried again, this time with Physics Wallah and AI Guru. After finishing 12th class, she began preparing for NEET and plans to take the exams again in May, confident that she will increase her score.

"AI Guru has made my learning so smooth and easy because it provides me answers related to my study and study-related doubts just within a click."
Anushka Sunil Dhanwade, Student

Next steps in the collaboration

The collaboration between Microsoft Research and Physics Wallah aims to apply the advancements in solving math problems across additional subjects, ultimately creating a unified education LLM with enhanced reasoning capabilities and improved accuracy to support student learning.

"We're working on an education-specific LLM that will be fine-tuned using the extensive data we've gathered and enriched by Microsoft's expertise in LLM training and algorithms. Our goal is to create a unified model that significantly improves accuracy and raises student satisfaction rates to 95% and beyond," Govil explained.

The teams are also integrating a new tool from Microsoft Research called PromptWizard, an automated framework for optimizing the instructions given to a model, into Physics Wallah's offerings. New prompts can now be generated in minutes, eliminating months of manual work, while providing more accurate and aligned answers for students.

For Nambi and the Microsoft Research India team, the collaboration is the latest example of their deep commitment to cultivating the AI ecosystem in India and translating new technology from the lab into useful business applications.

"By leveraging advanced reasoning techniques and domain expertise, we are transforming how AI addresses challenges across multiple subjects. This represents a key step in building AI systems that act as holistic personal tutors, enhancing student understanding and creating a more engaging learning experience," Nambi said.

Explore more: Shiksha copilot demo (video).
-
WWW.MICROSOFT.COM
ExACT: Improving AI agents' decision-making via test-time compute scaling

Autonomous AI agents are transforming the way we approach multi-step decision-making processes, streamlining tasks like web browsing, video editing, and file management. By applying advanced machine learning, they automate workflows, optimize performance, and reduce the need for human input.

However, these systems struggle in complex, dynamic environments. A key challenge lies in balancing exploitation, using known strategies for immediate gains, with exploration, which involves seeking new strategies that could yield long-term benefits. Additionally, they often have difficulty adapting to unpredictable changes in conditions and objectives, as well as generalizing knowledge across contexts, limiting their ability to transfer learned strategies between domains.

In response, we developed ExACT, an approach for teaching AI agents to explore more effectively, enabling them to intelligently navigate their environments, gather valuable information, evaluate options, and identify optimal decision-making and planning strategies. ExACT combines two key techniques: Reflective-MCTS (R-MCTS) and Exploratory Learning.

R-MCTS builds on the traditional Monte Carlo Tree Search (MCTS) algorithm, introducing features like contrastive reflection and a multi-agent debate function. Through contrastive reflection, the agent refines its decision-making by comparing expected outcomes with actual results, allowing it to learn from both its successes and its mistakes. The multi-agent debate function provides various evaluations of a given state, where multiple agents offer contrasting perspectives to help produce a balanced and reliable assessment.

Exploratory Learning trains agents to navigate environments effectively. Together, these techniques show strong computational scalability during both training and testing, as demonstrated on VisualWebArena, a benchmark for evaluating multimodal autonomous language agents (Figure 1).

Figure 1. Evaluation demonstrates the compute scaling properties of GPT-4o during both training and testing. The assessment includes two scenarios: (1) applying the GPT-4o-based R-MCTS agent to all 234 tasks from the Classifieds category in VisualWebArena (left), and (2) testing fine-tuned GPT-4o on 169 previously unseen tasks from Classifieds without using search algorithms (right).

R-MCTS extends classic MCTS by enabling real-time improvements in decision-making. As shown in Figure 2, an iterative feedback loop allows R-MCTS to learn from past experiences, avoid prior mistakes, and focus on more effective actions in similar contexts.

Figure 2. Overview of the R-MCTS process in ExACT.
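Before looking at the evaluation, a toy sketch may help fix ideas. The snippet below caricatures two R-MCTS ingredients: a state value obtained by averaging several evaluators (the debate) and a stored correction updated whenever the expected outcome diverges from the actual one (contrastive reflection). The environment, evaluators, and update rule are stand-ins, not the paper's implementation.

```python
import random
random.seed(0)

reflections = {}  # state -> correction learned from past surprises

def debate_value(state: str, n_agents: int = 3) -> float:
    """Average several (stand-in) evaluator opinions, plus any stored reflection."""
    opinions = [random.uniform(0, 1) for _ in range(n_agents)]
    return sum(opinions) / n_agents + reflections.get(state, 0.0)

def reflect(state: str, expected: float, actual: float) -> None:
    """Contrastive reflection: nudge the stored estimate toward observed reality."""
    reflections[state] = reflections.get(state, 0.0) + 0.5 * (actual - expected)

expected = debate_value("s0")
actual = 0.2  # suppose the rollout went worse than expected
reflect("s0", expected, actual)
print(debate_value("s0"))  # future evaluations of s0 are now more pessimistic
```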
Evaluating R-MCTS

R-MCTS demonstrates state-of-the-art performance across all VisualWebArena environments, surpassing the previous best-performing method, Search Agent, with improvements ranging from 6% to 30% (Table 1). Additionally, as of January 2025, it holds the second position on the OSWorld leaderboard and demonstrates state-of-the-art performance in the blind test setting, where there is no prior access to the test environment, reflecting its advanced capabilities (Table 2).

Rank | Model | Score
1 | GPT-4o + ExACT | 33.70
2 | GPT-4o + Search | 26.40
3 | GPT-4o + WebDreamer | 23.60
4 | GPT-4o + ICAL | 23.40
5 | GPT-4o | 19.78
6 | Llama-3-70B + Search | 16.70

Table 1. The VisualWebArena leaderboard highlights R-MCTS as achieving state-of-the-art performance as of December 2024.

Rank | Model | Blind Test Score
1 | learn-by-interact w/ Claude-3.5-sonnet | 22.50
2 | ExACT w/ GPT-4o | 16.60
3 | GPT-4 | 12.24
4 | GPT-4o | 11.36
5 | GPT-4 Vision (0409) | 10.82
6 | learn-by-interact w/ Gemini-1.5-pro | 10.30

Table 2. The OSWorld leaderboard for the category of A11y tree inputs shows ExACT with GPT-4o ranking second, with state-of-the-art performance in the blind test setting, as of December 2024.

How Exploratory Learning works

Exploratory Learning enables agents to dynamically search and adjust their computational resources during testing without depending on MCTS. In contrast to Imitation Learning, which centers on training models using the optimal actions identified through search, Exploratory Learning focuses on cultivating the agent's ability to navigate its environment by teaching it to evaluate states, explore different pathways, and efficiently backtrack from unpromising paths to identify more favorable alternatives.

Figure 3. In contrast to Imitation Learning, Exploratory Learning uses the entire search trajectory for training.

Evaluating Exploratory Learning

We conducted experiments using GPT-4o fine-tuned with Exploratory Learning in the VisualWebArena environment. Results demonstrate the following key benefits:

- Improved performance: GPT-4o achieves performance improvements comparable to scaling test-time compute with MCTS, even without search.
- Test-time compute scaling: GPT-4o performs better when given more actions per task, leading to improved decision-making and task completion, which increased from 5% to 12.4%.
- Improved generalization on unseen tasks: Exploratory Learning helps fine-tuned GPT-4o handle unseen tasks more effectively than agents trained with Imitation Learning or with no additional training.

The following video provides a detailed demonstration of how R-MCTS and Exploratory Learning function.

Continued exploration

Advancing autonomous AI agents is key to enabling them to handle complex, multi-step tasks with greater precision and adaptability. ExACT represents a significant step toward creating agents that can perform complex decision-making before taking action, leading to improved performance, but challenges remain. How can AI agents improve decision-making in real-world scenarios, where they may be constrained by time or resources? How can they learn effectively and efficiently from environmental feedback? We are currently investigating these questions, and we invite you to explore them with us by building on the ExACT framework. Access the ExACT code at our GitHub repository.
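As a closing illustration of the Figure 3 contrast, the toy sketch below shows how the two training regimes would slice the same search trajectory. The trajectory format is an invented stand-in; the point is that Exploratory Learning keeps the evaluation and backtracking steps that Imitation Learning discards.

```python
# One search trajectory: the agent tries action A, judges it unpromising,
# backtracks, then succeeds with action B.
search_trajectory = [
    ("s0", "try action A"),
    ("s1", "evaluate: low value"),
    ("s1", "backtrack to s0"),
    ("s0", "try action B"),
    ("s2", "evaluate: high value"),
    ("s2", "commit"),
]

# Imitation Learning trains only on the optimal path found by search.
imitation_targets = [("s0", "try action B"), ("s2", "commit")]

# Exploratory Learning trains on every step, including evaluation and
# backtracking, so the agent learns to explore and recover at test time.
exploratory_targets = search_trajectory

print(len(imitation_targets), "imitation steps vs",
      len(exploratory_targets), "exploratory steps")
```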
-
WWW.MICROSOFT.COM
Ideas: Building AI for population-scale systems with Akshay Nambi

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

AKSHAY NAMBI: For me, research is just not about pushing the boundaries of the knowledge. It's about ensuring that these advancements translate to meaningful impact on the ground. So, yes, the big goals that guide most of my work is twofold. One, how do we build technology that's scaled to benefit large populations? And two, at the same time, I'm motivated by the challenge of tackling complex problems. That provides opportunity to explore, learn, and also create something new, and that's what keeps me excited.

[TEASER ENDS]

CHRIS STETKIEWICZ: You're listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

I'm your guest host, Chris Stetkiewicz. Today, I'm talking to Akshay Nambi. Akshay is a principal researcher at Microsoft Research. His work lies at the intersection of systems, AI, and machine learning, with a focus on designing, deploying, and scaling AI systems to solve compelling real-world problems. Akshay's research extends across education, agriculture, transportation, and energy. He is currently working on enhancing the quality and reliability of AI systems by addressing critical challenges such as reasoning, grounding, and managing complex queries.

Akshay, welcome to the podcast.

AKSHAY NAMBI: Thanks for having me.

STETKIEWICZ: I'd like to begin by asking you to tell us your origin story. How did you get started on your path? Was there a big idea or experience that captured your imagination or motivated you to do what you're doing today?

NAMBI: If I look back, my journey into research wasn't a straight line. It was more about discovering my passion through some unexpected opportunities and also finding purpose along the way. So before I started with my undergrad studies, I was very interested in electronics and systems. My passion for electronics, kind of, started when I was in school. I was more like an average student, not a nerd or not too curious, but I was always tinkering around, doing things, building stuff, and playing with gadgets, and that, kind of, made me very keen on electronics and putting things together, and that was my passion. But sometimes things don't go as planned. So I didn't get into the college which I had hoped to join for electronics, so I ended up pursuing computer science, which wasn't too bad either. So during my final year of bachelor's, I had to do a final semester project, which turned out to be a very pivotal moment. And that's when I got to know this institute called the Indian Institute of Science (IISc), which is a top research institute in India and also globally. And I had a chance to work on a project there. And it was my first real exposure to open-ended research, right, so I remember where we were trying to build a solution that helped to efficiently construct an ontology for a specific domain, which simply means that we were building systems to help users uncover relationships in the data and allow them to query it more efficiently, right. And it was super exciting for me to design and build something new. And that experience made me realize that I wanted to pursue research further.
And right after that project, I decided to explore research opportunities, which led me to join the Indian Institute of Science again as a research assistant.

STETKIEWICZ: So what made you want to take the skills you were developing and apply them to a research career?

NAMBI: So interestingly, when I joined IISc, the professor I worked with specialized in electronics, so things come back, so something I had always been passionate about. And I was the only computer science graduate in the lab at that time, with others being electronic engineers, and I didn't even know how to solder. But the lab environment was super encouraging, collaborative, so I, kind of, caught up very quickly. In that lab, basically, I worked on several projects in the emerging fields of embedded devices and energy harvesting systems. Specifically, we were designing systems that could harvest energy from sources like sun, hydro, and even RF (radio frequency) signals. And my role was kind of twofold. One, I designed circuits and systems to make energy harvesting more efficient so that you can store this energy. And then I also wrote programs, software, to ensure that the harvested energy can be used efficiently. For instance, as we harvest some of this energy, you want to have your programs run very quickly so that you are able to sense the data, send it to the server in an efficient way. And one of the most exciting projects I worked on during that time was on data-driven agriculture. So this was back in 2008, 2009, right, where we developed an embedded system device with sensors to monitor the agricultural fields, collecting data like soil moisture, soil temperature. And that was sent to the agronomists who were able to analyze this data and provide feedback to farmers. In many remote areas, still, access to power is a huge challenge. So we used many of the technologies we were developing in the lab, specifically energy harvesting techniques, to power these sensors and devices in the rural farms, and that's when I really got to see firsthand how technology could help people's lives, particularly in rural settings. And that's what, kind of, stood out in my experience at IISc, right, was that it was [the] end-to-end nature of the work. And it was not just writing code or designing circuits. It was about identifying the real-world problems, solving them efficiently, and deploying solutions in the field. And this cemented my passion for creating technology that solves real-world problems, and that's what keeps me driving even today.

STETKIEWICZ: And as you're thinking about those problems that you want to try and solve, where did you look for inspiration? It sounds like some of these are happening right there in your home.

NAMBI: That's right. Growing up and living in India, I've been surrounded by these, kind of, many challenges. And these are not distant problems. These are right in front of us. And some of them are quite literally outside the door. So being here in India provides a unique opportunity to tackle some of the pressing real-world challenges in agriculture, education, or in road safety, where even small advancements can create significant impact.

STETKIEWICZ: So how would you describe your research philosophy? Do you have some big goals that guide you?

NAMBI: Right, as I mentioned, right, my research philosophy is mainly rooted in solving real-world problems through end-to-end innovation. For me, research is just not about pushing the boundaries of the knowledge.
It's about ensuring that these advancements translate to meaningful impact on the ground, right. So, yes, the big goals that guide most of my work is twofold. One, how do we build technology that's scaled to benefit large populations? And two, at the same time, I'm motivated by the challenge of tackling complex problems. That provides opportunity to explore, learn, and also create something new. And that's what keeps me excited.

STETKIEWICZ: So let's talk a little bit about your journey at Microsoft Research. I know you began as an intern, and some of the initial work you did was focused on computer vision, road safety, energy efficiency. Tell us about some of those projects.

NAMBI: As I was nearing the completion of my PhD, I was eager to look for opportunities in industrial labs, and Microsoft Research obviously stood out as an exciting opportunity. And additionally, the fact that Microsoft Research India was in my hometown, Bangalore, made it even more appealing. So when I joined as an intern, I worked together with Venkat Padmanabhan, who now leads the lab, and we started this project called HAMS, which stands for Harnessing Automobiles for Safety. As you know, road safety is a major public health issue globally, responsible for almost 1.35 million fatalities annually, and with the situation being even more severe in countries like India. For instance, there are estimates that there's a life lost on the road every four minutes in India. When analyzing the factors which affect road safety, we saw mainly three elements. One, the vehicle. Second, the infrastructure. And then the driver. Among these, the driver plays the most critical role in many incidents, whether it's over-speeding, driving without seat belts, drowsiness, fatigue, any of these, right. And this realization motivated us to focus on driver monitoring, which led to the development of HAMS. In a nutshell, HAMS is basically a smartphone-based system where you're mounting your smartphone on the windshield of a vehicle to monitor both the driver and the driving in real time, with the goal of improving road safety. Basically, it observes key aspects such as where the driver is looking, whether they are distracted or fatigued, while also considering the external driving environment, because we truly believe to improve road safety, we need to understand not just the driver's actions but also the context in which they are driving. For example, if the smartphone's accelerometer detects sharp braking, the system would automatically check the distance to the vehicle in front using the rear camera and whether the driver was distracted or fatigued using the front camera. And this holistic approach ensures a more accurate and comprehensive assessment of the driving behavior, enabling more meaningful feedback.

STETKIEWICZ: So that sounds like a system that's got several moving parts to it. And I imagine you had some technical challenges you had to deal with there. Can you talk about that?

NAMBI: One of our guiding principles in HAMS was to use commodity, off-the-shelf smartphone devices, right. This should be affordable, in the range of $100 to $200, so that you can just take out regular smartphones and enable this driver and driving monitoring. And that led to handling several technical challenges. For instance, we had to develop efficient computer vision algorithms that could run locally on the device with cheap smartphone processing units while still performing very well at low-light conditions.
We wrote multiple papers and developed many of the novel algorithms which we implemented on very low-cost smartphones. And once we had such a monitoring system, right, you can imagine there are several deployment opportunities, starting from fleet monitoring to even training new drivers, right. However, one application we hadn't originally envisioned, but which turned out to be its most impactful use case even today, is automated driver's license testing. As you know, before you get a license, a driver is supposed to pass a test, but what happens in many places, including India, is that licenses are issued with very minimal or no actual testing, leading to unsafe and untrained drivers on the road. At the same time as we were working on HAMS, the Indian government was looking at introducing technology to make testing more transparent and also automated. So we worked with the right set of partners, and we demonstrated to the government that HAMS could actually completely automate the entire license testing process. So we first deployed this system in the Dehradun RTO (Regional Transport Office), which is the equivalent of a DMV in the US, in 2019, working very closely with RTO officials to define what should be some of the evaluation criteria, right. Some of these would be very simple, like, oh, is it the same candidate who is taking the test who actually registered for the test, right? And whether they are wearing seat belts. Did they scan their mirrors before taking a left turn, and how well they performed in tasks like reverse parking and things like that.

STETKIEWICZ: So what's been the government response to that? Have they embraced it or deployed it to a wider extent?

NAMBI: Yes, yes. So after the deployment in Dehradun in 2019, we actually open-sourced the entire HAMS technology, and our partners are now working with several state governments and have scaled HAMS to several states in India. And as of today, we have around 28 RTOs where HAMS is actually being deployed, and the pass rate of such license tests is just 60% as compared to 90-plus percent with manual testing. That's the extensive rigor the system brings in. And now what excites me is, after nearly five years, we are now taking the next step in this project, where we are evaluating the long-term impact of this intervention on driving behavior and road safety. So we are collaborating with Professor Michael Kremer, who is a Nobel laureate and professor at the University of Chicago, and his team to study how this technology has influenced driving patterns and accident rates over time. So this focus on closing the loop and moving beyond just deployment in the field to actually measuring the real impact, right, is something that truly excites me and that makes research at Microsoft very unique. And that is actually one of the reasons why I joined Microsoft Research full time after my internship, and this unique flexibility to work on real-world problems, develop novel research ideas, and actually collaborate with partners both internally and externally to deploy at scale is something that is very unique here.

STETKIEWICZ: So have you actually received any evidence that the project is working? Is driving getting safer?

NAMBI: Yes, these are very early analyses, and there are very positive insights we are getting from that. Soon we will be releasing a white paper on our study of this long-term impact.

STETKIEWICZ: That's great. I look forward to that one.
So you've also done some interesting work involving the Internet of Things, with an emphasis on making it more reliable and practical. So for those in our audience who may not know, the Internet of Things, or IoT, is a network that includes billions of devices and sensors in things like smart thermostats and fitness trackers. So talk a little bit about your work in this area.

NAMBI: Right, so IoT, as you know, is already transforming several industries with billions of sensors being deployed in areas like industrial monitoring, manufacturing, agriculture, smart buildings, and also air pollution monitoring. And if you think about it, these sensors provide critical data that businesses rely on for decision making. However, a fundamental challenge is ensuring that the data collected from these sensors is actually reliable. If the data is faulty, it can lead to poor decisions and inefficiencies. And the challenge is that these sensor failures are not always obvious. What I mean by that is, when a sensor stops working, it doesn't always stop sending data, but it often continues to send some data which appears to be normal. And that's one of the biggest problems, right. So detecting these errors is non-trivial because the faulty sensors can mimic real working data, and traditional solutions like deploying redundant sensors or even manually inspecting them are very expensive, labor intensive, and also sometimes infeasible, especially for remote deployments. Our goal in this work was to develop a simple and efficient way to remotely monitor the health of the IoT sensors. So what we did was we hypothesized that most sensor failures occurred due to electronic malfunctions. It could be either due to short circuits or component degradation or due to environmental factors such as heat, humidity, or pollution. Since these failures originate within the sensor hardware itself, we saw an opportunity to leverage some of the basic electronic principles to create a novel solution. The core idea was to develop a way to automatically generate a fingerprint for each sensor. And by fingerprint, I mean the unique electrical characteristic exhibited by a properly working sensor. We built a system that could devise these fingerprints for different types of sensors, allowing us to detect failures purely based on the sensor's internal characteristics, that is, the fingerprint, and even without looking at the data it produces. Essentially what it means now is that we were able to tag each sensor's data with a reliability score, ensuring verifiability.

STETKIEWICZ: So how does that technology get deployed in the real world? Is there an application where it's being put to work today?

NAMBI: Yes. We worked together with Azure IoT and open-sourced this technology, and several companies took the solution into their systems, including for air pollution monitoring, smart buildings, and industrial monitoring. The one which I would like to talk about today is air pollution monitoring. As you know, air pollution is a major challenge in many parts of the world, especially in India. And traditionally, air quality monitoring relies on these expensive fixed sensors, which provide limited coverage. On the other hand, there is a rich body of work on low-cost sensors, which can offer wider deployment. Like, you can put these sensors on a bus or a vehicle and have it move around the entire city, where you can get a much more fine-grained, accurate picture on the ground.
But these are often unreliable because these are low-cost sensors and have reliability issues. So we collaborated with several startups who were developing these low-cost air pollution sensors and who were finding it very challenging to gain trust, because one of the main concerns was the accuracy of the data from low-cost sensors. So our solution seamlessly integrated with these sensors, which enabled verification of the data quality coming out from these low-cost air pollution sensors. So this bridged the trust gap, allowing government agencies to initiate large-scale pilots using low-cost sensors for fine-grained air-quality monitoring.

STETKIEWICZ: So as we're talking about evolving technology, large language models, or LLMs, are also enabling big changes, and they're not theoretical. They're happening today. And you've been working on LLMs and their applicability to real-world problems. Can you talk about your work there and some of the latest releases?

NAMBI: So when ChatGPT was first released, I, like many people, was very skeptical. However, I was also curious both about how it worked and, more importantly, whether it could accelerate solutions to real-world problems. That led to the exploration of LLMs in education, where we fundamentally asked this question: can AI help improve educational outcomes? And this was one of the key questions which led to the development of Shiksha copilot, which is a genAI-powered assistant designed to support teachers in their daily work, starting from helping them to create personalized learning experiences, design assignments, generate hands-on activities, and even more. Teachers today universally face several challenges, from time management to lesson planning. And our goal with Shiksha was to empower them to significantly reduce the time spent on these tasks. For instance, lesson planning, which traditionally took about 60 minutes, can now be completed in just five minutes using the Shiksha copilot. And what makes Shiksha unique is that it's completely grounded in the local curriculum and the learning objectives, ensuring that the AI-generated content aligns very well with pedagogical best practices. The system actually supports multilingual interactions, multimodal capabilities, and also integration with external knowledge bases, making it highly adaptable for different curriculums. Initially, many teachers were skeptical. Some feared this would limit their creativity. However, as they began to use Shiksha, they realized that it didn't replace their expertise, but rather amplified it, enabling them to do work faster and more efficiently.

STETKIEWICZ: So, Akshay, the last time you and I talked about Shiksha copilot, it was very much in the pilot phase and the teachers were just getting their hands on it. So it sounds like, though, you've gotten some pretty good feedback from them since then.

NAMBI: Yes, so when we were last discussing, we were doing this six-month pilot with 50-plus teachers, where we gathered overwhelmingly positive feedback on how the technology was helping teachers reduce the time spent on lesson planning. And in fact, they were using the system so much that they really enjoyed working with Shiksha copilot, where they were able to do more things with much less time, right. And with a lot of feedback from teachers, we have improved Shiksha copilot over the past few months. And starting this academic year, we have already deployed Shiksha to 1,000-plus teachers in Karnataka.
This is in close collaboration with our partners at the Sikshana Foundation and with the government of Karnataka. And the response has already been incredibly encouraging. And looking ahead, we are actually focusing, again, on closing this loop, right, and measuring the impact on the ground, where we are doing a lot of studies with the teachers to understand not just the improved efficiency of the teachers but also how AI-generated content enriched by teachers is actually enhancing student learning objectives. So that's the study we are conducting, which hopefully will close this loop and answer our original question: can AI actually help improve educational outcomes?

STETKIEWICZ: And is the deployment primarily in rural areas, or does it include urban centers, or what's the target?

NAMBI: So the current deployment with 1,000 teachers is a combination of both rural and urban public schools. These are covering both English-medium and Kannada-medium teaching schools, with grades from Class 5 to Class 10.

STETKIEWICZ: Great. So Shiksha was focused on helping teachers and making their jobs easier, but I understand you're also working on some opportunities to use AI to help students succeed. Can you talk about that?

NAMBI: So as you know, LLMs are still evolving and inherently they are fragile, and deploying them in real-world settings, especially in education, presents a lot of challenges. With Shiksha, if you think about it, teachers remain in control throughout the interaction, making the final decision on whether to use the AI-generated content in the classroom or not. However, when it comes to AI tutors for students, the stakes are slightly higher, where we need to ensure the AI doesn't produce incorrect answers, misrepresent concepts, or mislead with explanations. Currently, we are developing solutions to enhance the accuracy and also the reasoning capabilities of these foundational models, particularly for solving math problems. This represents a major step toward building AI systems that are much more holistic personal tutors, which help student understanding and create more engaging, effective learning experiences.

STETKIEWICZ: So you've talked about working in computer vision and IoT and LLMs. What do those areas have in common? Is there some thread that weaves through the work that you're doing?

NAMBI: That's a great question. As a systems researcher, I'm quite interested in this end-to-end systems development, which means that my focus is not just about improving a particular algorithm but also thinking about the end-to-end system, which means that I, kind of, think about computer vision, IoT, and even LLMs as tools, where we would want to improve them for a particular application. It could be agriculture, education, or road safety. And then how do you think about this holistically to come up with the most efficient system that can be deployed at population scale, right. I think that's the connecting story here: how do you have this systemic thinking which, kind of, takes the existing tools, improves them, makes them more efficient, and takes them out from the lab to the real world.

STETKIEWICZ: So you're working on some very powerful technology that is creating tangible benefits for society, which is your goal. At the same time, we're still in the very early stages of the development of AI and machine learning. Have you ever thought about unintended consequences? Are there some things that could go wrong, even if we get the technology right?
And does that kind of thinking ever influence the development process?

NAMBI: Absolutely. Unintended consequences are something I think about deeply. Even the most well-designed technology can have these ripple effects that we may not fully anticipate, especially when we are deploying it at population scale. For me, being proactive is one of the key important aspects. This means not only designing the technology at the lab but actually also carefully deploying it in the real world, measuring its impact, and working with the stakeholders to minimize the harm. In most of my work, I try to work very closely with the partner team on the ground to monitor and analyze how the technology is being used, what some of the risks are, and how we can eliminate them. At the same time, I also remain very optimistic. It's also about responsibility. If we are able to embed societal values and ethics into the design of the system and involve diverse perspectives, especially from people on the ground, we can remain vigilant as the technology evolves, and we can create systems that truly deliver immense societal benefits while addressing many of the potential risks.

STETKIEWICZ: So we've heard a lot of great examples today about building technology to solve real-world problems and your motivation to keep doing that. So as you look ahead, where do you see your research going next? How will people be better off because of the technology you develop and the advances that they support?

NAMBI: Yeah, I'm deeply interested in advancing AI systems that can truly assist anyone in their daily tasks, whether it's providing personalized guidance to a farmer in a rural village, helping a student get instant 24-by-7 support for their learning doubts, or even empowering professionals to work more efficiently. And to achieve this, my research is focusing on tackling some of the fundamental challenges in AI with respect to reasoning and reliability, and also making sure that AI is more context-aware and responsive to evolving user needs. And looking ahead, I envision AI as not just an assistant but also as an intelligent and equitable copilot seamlessly integrated into our everyday life, empowering individuals across various domains.

STETKIEWICZ: Great. Well, Akshay, thank you for joining us on Ideas. It's been a pleasure.

[MUSIC]

NAMBI: Yeah, I really enjoyed talking to you, Chris. Thank you.

STETKIEWICZ: Till next time.

[MUSIC FADES]
-
WWW.MICROSOFT.COM
Advances to low-bit quantization enable LLMs on edge devices

Large language models (LLMs) are increasingly being deployed on edge devices: hardware that processes data locally near the data source, such as smartphones, laptops, and robots. Running LLMs on these devices supports advanced AI and real-time services, but their massive size, with hundreds of millions of parameters, requires significant memory and computational power, limiting widespread adoption. Low-bit quantization, a technique that compresses models and reduces memory demands, offers a solution by enabling more efficient operation.

Recent advances in low-bit quantization have made mixed-precision matrix multiplication (mpGEMM) viable for LLMs. This deep learning technique allows data of the same or different formats to be multiplied, such as int8*int1, int8*int2, or FP16*int4. By combining a variety of precision levels, mpGEMM strikes a balance among speed, memory efficiency, and computational accuracy.

However, most hardware supports only symmetric computations (operations on data of similar formats), creating challenges for mixed-precision calculations during General Matrix Multiplication (GEMM), a critical operation for LLMs. Overcoming these hardware limitations is essential to fully benefit from mpGEMM and support asymmetrical computations.

To unlock the potential of low-bit quantization on resource-constrained edge devices, hardware must natively support mpGEMM. To address this, we developed the following three approaches for computing kernels and hardware architectures:

- Ladder data type compiler: Supports various low-precision data types by converting unsupported types into hardware-compatible ones without data loss, while also generating high-performance conversion code.
- T-MAC mpGEMM library: Implements GEMM using a lookup table (LUT) approach, eliminating multiplications to significantly reduce computational overhead. Optimized for diverse CPUs, T-MAC delivers several times the speed of other libraries.
- LUT Tensor Core hardware architecture: Introduces a cutting-edge design for next-generation AI hardware, tailored for low-bit quantization and mixed-precision computations.

The following sections describe these techniques in detail.
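Before diving into the three techniques, a minimal sketch of group-wise int4 weight quantization may help ground the discussion. The group size and symmetric rounding scheme here are illustrative assumptions, not a specific recipe from the systems described below (a real kernel would also pack two int4 values per byte).

```python
import numpy as np

def quantize_int4(w: np.ndarray, group: int = 32):
    """Quantize FP32 weights to int4 values in [-8, 7], one scale per group."""
    w = w.reshape(-1, group)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # one int4 per int8 here
    return q, scale.astype(np.float16)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.default_rng(0).standard_normal(256).astype(np.float32)
q, s = quantize_int4(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())  # bounded by scale / 2
```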
Ladder: Bridging the gap between custom data and hardware limits

Cutting-edge hardware accelerators, such as GPUs, TPUs, and specialized chips, are designed to speed up computationally intensive tasks like deep learning by efficiently handling large-scale operations. These accelerators now integrate lower-bit computing units, such as FP32, FP16, and even FP8, into their architectures. However, constraints in chip area and hardware costs limit the availability of these units for standard data types. For instance, the NVIDIA V100 Tensor Core GPU supports only FP16, while the A100 supports int2, int4, and int8 but not newer formats like FP8 or OCP-MXFP. Additionally, the rapid development of LLMs often outpaces hardware upgrades, leaving many new data formats unsupported and complicating deployment.

While hardware accelerators may lack direct support for custom data types, their memory systems can convert these types into fixed-width data blocks that can store any data format. For instance, NF4 tensors can be converted into FP16 or FP32 for floating-point operations.

Building on these insights, we developed the Ladder data type compiler, a method to separate data storage from computation, enabling broader support for custom data types. Ladder bridges the gap between emerging custom data formats and the precision types supported by current hardware. It offers a flexible system for converting between algorithm-specific and hardware-supported data types without data loss. For low-bit applications, it optimizes performance by translating low-bit data into the most efficient formats for the hardware being used. As shown in Figure 1, this includes mapping low-bit computations to supported instructions and efficiently managing data storage across the memory hierarchy.

Figure 1. The Ladder architecture.

Evaluating Ladder

Evaluations of Ladder on NVIDIA and AMD GPUs show that it outperforms existing deep neural network (DNN) compilers for natively supported data types. It also handles custom data types not supported by GPUs, achieving speedups of up to 14.6 times. As the first system to support custom low-precision data types for running DNNs on modern hardware accelerators, Ladder provides researchers with flexibility in optimizing data types. It also enables hardware developers to support a wider range of data types without requiring hardware modifications.

T-MAC: Table lookup for mpGEMM without multiplication

Deploying low-bit quantized LLMs on edge devices often requires dequantizing models to ensure hardware compatibility. However, this approach has two major drawbacks:

- Performance: Dequantization overhead can result in poor performance, negating the benefits of low-bit quantization.
- Development: Developers must redesign data layouts and kernels for different mixed precisions.

To address these challenges, we introduce T-MAC, a novel LUT-based method that enables mpGEMM without dequantization or multiplication. T-MAC replaces traditional multiplication operations with bit-wise table lookups, offering a unified and scalable solution for mpGEMM. It incorporates techniques to reduce the size of tables and store them directly on the chip, minimizing the overhead of accessing data from memory. By eliminating dequantization and lowering computational costs, T-MAC enables efficient inference of low-bit LLMs on resource-constrained edge devices. Figure 2 illustrates T-MAC's architecture.

Figure 2. Overview of the T-MAC system.

Evaluating T-MAC

Performance evaluations of T-MAC on low-bit models demonstrated substantial benefits in efficiency and speed. On the Surface Laptop 7 with the Qualcomm Snapdragon X Elite chipset, T-MAC achieved:

- 48 tokens per second for the 3B BitNet-b1.58 model
- 30 tokens per second for the 2-bit 7B Llama model
- 20 tokens per second for the 4-bit 7B Llama model

These speeds far exceed average human reading rates, outperforming llama.cpp by 4 to 5 times and doubling the speed of a dedicated NPU accelerator. Even on lower-end devices like the Raspberry Pi 5, T-MAC made it possible for the 3B BitNet-b1.58 model to generate 11 tokens per second. It also proved highly power-efficient, matching llama.cpp's generation rate while using only 1/4 to 1/6 of the CPU cores.

These results establish T-MAC as a practical solution for deploying LLMs on edge devices with standard CPUs, without relying on GPUs or NPUs. T-MAC allows LLMs to run efficiently on resource-constrained devices, expanding their applicability across a wider range of scenarios.
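The core table-lookup trick is easy to demonstrate at toy scale. The sketch below implements a 1-bit-weight matrix-vector product by precomputing, for each group of four activations, its dot product with all sixteen possible {-1, +1} weight patterns, then summing table lookups. The group size and data layout are illustrative, not T-MAC's actual kernel design.

```python
import numpy as np

G = 4  # weights per lookup group; 2**G table entries per activation group

def build_lut(x: np.ndarray) -> np.ndarray:
    """Partial dot products of each activation group with all 2**G patterns."""
    groups = x.reshape(-1, G)                                     # (k/G, G)
    patterns = np.array([[1 if (p >> i) & 1 else -1 for i in range(G)]
                         for p in range(2 ** G)])                 # (2**G, G)
    return groups @ patterns.T                                    # (k/G, 2**G)

def lut_matvec(packed_w: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Each row of packed_w holds one pattern index per activation group;
    the matvec reduces to sums of table lookups, with no multiplications."""
    n_groups = packed_w.shape[1]
    return np.array([lut[np.arange(n_groups), row].sum() for row in packed_w])

rng = np.random.default_rng(0)
k, m = 16, 3
x = rng.standard_normal(k)
w_bits = rng.integers(0, 2, size=(m, k))                  # 1-bit weights
w = np.where(w_bits == 1, 1.0, -1.0)
packed = (w_bits.reshape(m, -1, G) * (1 << np.arange(G))).sum(axis=2)

assert np.allclose(lut_matvec(packed, build_lut(x)), w @ x)
```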
LUT Tensor Core: Driving hardware for mpGEMM

While T-MAC and Ladder optimize mpGEMM on existing CPU and GPU architectures, improving computational efficiency, they cannot match the performance of dedicated hardware accelerators with built-in LUT support. Achieving significant improvements in performance, power, and area (PPA) requires overcoming four key challenges:

- Table precompute and storage: Precomputing and storing LUTs adds overhead, increasing area usage, latency, and storage requirements, which can reduce overall efficiency gains.
- Bit-width flexibility: Hardware must support various precision levels, such as int4/2/1 for weights and FP16/8 or int8 for activations, along with their combinations. This flexibility is crucial for accommodating diverse model architectures and use cases.
- LUT tiling shape: Inefficient tiling shapes can raise storage costs and limit reuse opportunities, adversely affecting performance and efficiency.
- Instruction and compilation: LUT-based mpGEMM requires a new instruction set. Existing compilation stacks, designed for standard GEMM hardware, may not optimally map and schedule these instructions, complicating integration with LLM inference software.

In response, we developed LUT Tensor Core, a software-hardware codesign for low-bit LLM inference. To address the precomputation overhead in conventional LUT-based methods, we introduce techniques like software-based DFG transformation, operator fusion, and table symmetrization to optimize table precomputation and storage (a toy illustration of table symmetrization appears at the end of this section). Additionally, we propose a hardware design with an elongated tiling shape to support table reuse and a bit-serial design to handle various precision combinations in mpGEMM.

To integrate with existing GPU microarchitectures and software stacks, we extended the MMA instruction set, added new LMMA instructions, and developed a cuBLAS-like software stack for easy integration into existing DNN frameworks. We also created a compiler for end-to-end execution planning on GPUs with LUT Tensor Core. This design and workflow, illustrated in Figure 3, enabled the quick and seamless adoption of LUT Tensor Core.

Figure 3. The LUT Tensor Core workflow.

Evaluating LUT Tensor Core

Testing LUT Tensor Core on low-bit LLMs, such as BitNet and Llama, showed significant performance gains, achieving 6.93 times the inference speed while using just 38.3% of the area of a traditional Tensor Core. With nearly identical model accuracy, this results in a 20.9-fold increase in computational density and an 11.2-fold boost in energy efficiency. As AI models grow in scale and complexity, LUT Tensor Core enables low-bit LLMs to be applied in new and diverse scenarios.

We believe the LUT technique could drive a paradigm shift in AI model inference. Traditional methods rely on multiplication and accumulation operations, whereas LUT implementations provide higher transistor density, greater throughput per chip area, lower energy costs, and better scalability. As large models adopt low-bit quantization, the LUT method could become the standard for system and hardware design, advancing the next generation of AI hardware innovation.
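Of the techniques above, table symmetrization is simple enough to show in miniature. With {-1, +1} weight patterns, negating a pattern negates its partial sum, so only half of each lookup table needs to be stored; the sketch below is a toy illustration under that assumption, not the hardware design itself.

```python
import numpy as np

G = 4
x = np.random.default_rng(0).standard_normal(G)

patterns = np.array([[1 if (p >> i) & 1 else -1 for i in range(G)]
                     for p in range(2 ** G)])
full_table = patterns @ x                  # all 2**G partial sums
half_table = full_table[: 2 ** (G - 1)]    # store only the first half

def lookup(p: int) -> float:
    """Recover any entry from the half-size table via antisymmetry."""
    if p < 2 ** (G - 1):
        return half_table[p]
    mirror = (2 ** G - 1) ^ p              # flip all G bits: the negated pattern
    return -half_table[mirror]

assert all(np.isclose(lookup(p), full_table[p]) for p in range(2 ** G))
```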
Unlocking new possibilities for embodied AI

Low-bit quantization improves the efficiency of running large models on edge devices while also enabling model scaling by reducing the bits used to represent each parameter. This scaling enhances model capabilities, generality, and expressiveness, as shown by the BitNet model, which starts with a low-bit configuration and expands. Technologies like T-MAC, Ladder, and LUT Tensor Core provide solutions for running low-bit quantized LLMs, supporting efficient operation across edge devices and encouraging researchers to design and optimize LLMs using low-bit quantization. By reducing memory and computational demands, low-bit LLMs could power embodied AI systems, such as robots, enabling dynamic perception and real-time environmental interaction.

T-MAC and Ladder are open source and available on GitHub. We invite you to test and explore these innovations in AI technology with Microsoft Research.
-
WWW.MICROSOFT.COM
Research Focus: Week of January 27, 2025

In this edition:

- We introduce FLAVARS, a multimodal foundation language and vision alignment model for remote sensing; managed-retention memory, a new class of memory optimized to store key data structures for AI inference workloads; and enhanced detection of macular telangiectasia type 2 (MacTel 2) using self-supervised learning and ensemble models.
- We present a new approach to generalizing symbolic automata, which brings together a variety of classic automata and logics in a unified framework with all the necessary ingredients to support symbolic model checking modulo A.
- And we invite you to join an upcoming workshop: LLM4Eval@WSDM 2025: Large Language Models for Evaluation in Information Retrieval. LLM4Eval is a promising technique in the areas of automated judgments, natural language generation, and retrieval-augmented generation (RAG) systems. Researchers from Microsoft and experts from industry and academia will explore this technique at an interactive workshop on Friday, March 14, in Hanover, Germany.

NEW RESEARCH

In the field of remote sensing, imagery is generally dense with objects and visual content, which can vary regionally across the globe. This creates a need for vision-language datasets to be highly detailed when describing imagery, and for pretraining to better balance visual task performance while retaining the ability to perform zero-shot classification and image-text retrieval.

One strategy is to combine paired satellite images and text captions for pretraining performant encoders for downstream tasks. However, while contrastive image-text methods like CLIP enable vision-language alignment and zero-shot classification ability, CLIP's vision-only downstream performance tends to degrade compared to image-only pretraining, such as Masked Autoencoders (MAE).

To better approach multimodal pretraining for remote sensing, researchers from Microsoft propose a pretraining method that combines the best of both contrastive learning and masked modeling, along with geospatial alignment via contrastive location encoding, in the recent paper: FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing. The research shows that FLAVARS significantly outperforms a baseline of SkyCLIP for vision-only tasks such as KNN classification and semantic segmentation (+6% mIOU on SpaceNet1), while retaining the ability to perform zero-shot classification, unlike MAE-pretrained methods.

Read the paper
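The combination of objectives can be sketched numerically. Below, a CLIP-style contrastive loss over paired image and text embeddings is added to an MAE-style masked-reconstruction loss; the shapes, temperature, mask ratio, and equal weighting are illustrative assumptions rather than FLAVARS's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D = 8, 32                                   # batch of paired image/text features
img = rng.standard_normal((B, D))
txt = rng.standard_normal((B, D))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

# CLIP-style contrastive loss: matched pairs sit on the diagonal.
logits = img @ txt.T / 0.07                    # temperature-scaled similarities
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
contrastive = -np.mean(np.diag(log_probs))

# MAE-style loss: reconstruct masked patch features.
patches = rng.standard_normal((B, 16, D))      # stand-in patch features
recon = patches + 0.1 * rng.standard_normal(patches.shape)  # stand-in decoder output
mask = rng.random((B, 16)) < 0.75              # 75% of patches are masked
mae = np.mean((recon - patches)[mask] ** 2)

loss = contrastive + mae                       # joint objective, equal weights
print(float(loss))
```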
NEW RESEARCH

AI clusters today are one of the major uses of high-bandwidth memory (HBM), a high-performance type of computer memory. However, HBM is suboptimal for AI inference workloads for several reasons. Analysis shows that HBM is overprovisioned on write performance, underprovisioned on density and read bandwidth, and has significant energy-per-bit overhead. It is also expensive, with lower yield than DRAM due to manufacturing complexity.

In a recent paper: Managed-Retention Memory: A New Class of Memory for the AI Era, researchers from Microsoft propose a memory class optimized for storing key data structures in AI inference workloads. The paper makes the case that MRM may finally provide a path to viability for technologies that were originally proposed to support storage-class memory (SCM). These technologies traditionally offered long-term persistence (10+ years) but provided poor IO performance and/or endurance. MRM makes different trade-offs: by understanding the workload IO patterns, MRM forgoes long-term data retention and write performance for better potential performance on the metrics that matter for AI inference.

Read the paper

NEW RESEARCH

Macular telangiectasia type 2 (MacTel) is a retinal disease that is challenging to diagnose. While increased awareness has led to improved diagnostic outcomes, MacTel diagnosis relies significantly upon a multimodal image set and the expertise of clinicians familiar with the disease. Optical coherence tomography (OCT) imaging has emerged as a valuable tool for the diagnosis and monitoring of various retinal diseases. With the increasing integration of OCT into clinical practice, deep learning models may be able to achieve accurate MacTel prediction comparable to that of retinal specialists, even when working with limited data.

Researchers from Microsoft and external colleagues address this challenge in a recent paper: Enhanced Macular Telangiectasia Type 2 Detection: Leveraging Self-Supervised Learning and Ensemble Models. Published in the journal Ophthalmology Science, the paper focuses on the accurate classification of macular telangiectasia type 2 using OCT images, with the overarching goal of facilitating early and precise detection of this neurodegenerative disease. The researchers present results leveraging self-supervised learning and ensemble models, showing that their approach improves both MacTel classification accuracy and interpretability compared to the use of individual models. Ensemble models exhibited superior agreement with the assessments of the most experienced individual human experts, as well as with the ensemble of human experts.

Read the paper

NEW RESEARCH

Symbolic automata are finite state automata that support potentially infinite alphabets, such as the set of rational numbers, generally applied to regular expressions and languages over finite words. In symbolic automata (or automata modulo A), an alphabet is represented by an effective Boolean algebra A, supported by a decision procedure for satisfiability. Regular languages over infinite words (so-called ω-regular languages) have a rich history paralleling that of regular languages over finite words, with well-known applications to model checking via Büchi automata and temporal logics.

In a recent paper: Symbolic Automata: Omega-Regularity Modulo Theories, researchers from Microsoft generalize symbolic automata to support ω-regular languages via transition terms and symbolic derivatives. This brings together a variety of classic automata and logics in a unified framework that provides all the necessary ingredients to support symbolic model checking modulo A.

Read the paper
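To make the idea of "automata modulo A" concrete, here is a toy finite-word sketch in Python, where transition guards are predicates over an infinite alphabet (the integers) rather than individual symbols. This only illustrates the classic symbolic-automaton idea; the paper's contribution — ω-regularity over infinite words via transition terms and symbolic derivatives — builds on top of this base and is not shown. All names here are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Transition:
    source: int
    guard: Callable[[int], bool]   # a predicate drawn from the algebra A
    target: int

class SymbolicAutomaton:
    """Toy symbolic automaton over an infinite alphabet (all integers).
    Guards on transitions leaving a state are assumed mutually exclusive."""
    def __init__(self, initial, accepting, transitions):
        self.initial = initial
        self.accepting = set(accepting)
        self.transitions = transitions

    def accepts(self, word):
        state = self.initial
        for symbol in word:
            for t in self.transitions:
                if t.source == state and t.guard(symbol):
                    state = t.target
                    break
            else:
                return False   # no guard matched: reject
        return state in self.accepting

# Example: accept finite integer words whose symbols are all positive
# and where at least one symbol exceeds 100.
aut = SymbolicAutomaton(
    initial=0,
    accepting=[1],
    transitions=[
        Transition(0, lambda x: 0 < x <= 100, 0),
        Transition(0, lambda x: x > 100, 1),
        Transition(1, lambda x: x > 0, 1),
    ],
)
assert aut.accepts([5, 200, 3])
assert not aut.accepts([5, 7])
```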
EVENT

LLMs have shown increasing task-solving abilities not present in smaller models. Using LLMs for automated evaluation (LLM4Eval) is a promising technique in the areas of automated judgments, natural language generation, and retrieval-augmented generation (RAG) systems.

Join researchers from Microsoft and experts from industry and academia for a discussion on using LLMs for evaluation in information retrieval at the LLM4Eval Workshop at WSDM 2025 (opens in new tab), March 14, 2025, in Hanover, Germany. This interactive workshop will cover automated judgments, RAG pipeline evaluation, altering human evaluation, robustness, and trustworthiness of LLMs for evaluation, in addition to their impact on real-world applications. The organizers believe that the information retrieval community can significantly contribute to this growing research area by designing, implementing, analyzing, and evaluating various aspects of LLMs with applications to LLM4Eval tasks.

Learn more about the workshop

Microsoft Research | In case you missed it

Microsoft Team Uses Diffusion Model For Materials Science
January 21, 2025
Finding a new material for a target application is like finding a needle in a haystack, write the authors of a blog post at Microsoft, where they have been working on just such a program, something called, aptly, MatterGen.

Microsoft AutoGen v0.4: A turning point toward more intelligent AI agents for enterprise developers
January 18, 2025
The world of AI agents is undergoing a revolution, and Microsoft's release of AutoGen v0.4 this week marked a significant leap forward in this journey. Positioned as a robust, scalable and extensible framework, AutoGen represents Microsoft's latest attempt to address the challenges of building multi-agent systems for enterprise applications.

2 AI breakthroughs unlock new potential for health and science
January 17, 2025
Two new research papers published this week in scientific journals, one in Nature and one in Nature Machine Intelligence, show how generative AI foundation models can exponentially speed up scientific discovery of new materials and help doctors access and analyze radiology results faster.

ChatGPT gets proactive with 'Tasks'
January 15, 2025
Good morning, AI enthusiasts. OpenAI's AI agent era just got its unofficial start with ChatGPT gaining the ability to schedule and manage daily tasks. With Tasks rolling out and mysterious Operator whispers in the air, is OpenAI finally ready to move from chatbots to full-on autonomous assistants?

Mayo Clinic and Microsoft partner to advance generative AI in radiology
January 15, 2025
The Mayo Clinic is seeking to advance the use of generative artificial intelligence in imaging through a new collaboration with Microsoft Research. The duo made the announcement during the 43rd Annual J.P. Morgan Healthcare Conference taking place now in San Francisco.

View more news and awards
-
WWW.MICROSOFT.COM
Ideas: Bug hunting with Shan Lu

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

SHAN LU: I remember, you know, those older days myself, right. That is really, like, I have this struggle that I feel like I can do better. I feel like I have ideas to contribute. But just for whatever reason, right, it took me forever to learn something which I feel like it's a very mechanical thing, but it just takes me forever to learn, right. And then now actually, I see this hope, right, with AI. You know, a lot of mechanical things that can actually now be done in a much more automated way, you know, by AI, right. So then now truly, you know, my daughter, many girls, many kids out there, right, whatever, you know, they are good at, their creativity, it'll be much easier, right, for them to contribute their creativity to whatever discipline they are passionate about.

[TEASER ENDS]

GRETCHEN HUIZINGA: Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I'm Gretchen Huizinga. In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

Today I'm talking to Shan Lu, a senior principal research manager at Microsoft Research and a computer science professor at the University of Chicago. Part of the Systems Research Group, Shan and her colleagues are working to make our computer systems, and I quote, "secure, scalable, fault tolerant, manageable, fast, and efficient." That's no small order, so I'm excited to explore the big ideas behind Shan's influential research and find out more about her reputation as a bug bounty hunter. Shan Lu, welcome to Ideas!

SHAN LU: Thank you.

HUIZINGA: So I like to start these episodes with what I've been calling the research origin story, and you have a unique, almost counterintuitive, story about what got you started in the field of systems research. Would you share that story with our listeners?

LU: Sure, sure. Yeah. I grew up fascinated that I would become a mathematician. I think I was good at math, and at some point, actually, until, I think, I entered college, I was still, you know, thinking about, should I do math? Should I do computer science? For whatever reason, I think someone told me, you know, doing computer science will help you; it's easier to get a job. And I reluctantly picked the computer science major. And then there were a few years in my college, I had a really difficult time with programming. And I also remember that there was, like, I spent a lot of time learning one language—we started with Pascal—and I feel like I finally knew what to do, and then there's yet another language, C, and another class, Java. And I remember, like, the teacher would ask us to do a programming project, and there are times I don't even, I just don't know how to get started. And I remember, at that time, in my class, I think we only had, like, four girls taking this class that requires programming in Java, and none of us had learned Java before. And when we asked our classmates, when we asked the boys, they just naturally knew what to do. It was really, really humiliating. Embarrassing. I had the feeling that, I felt like I'm just not born to be a programmer. And then, I came to graduate school. I was thinking about, you know, what kind of research direction I should do. And I was thinking that, oh, maybe I should do theory research, like, you know, complexity theory or something. You know, after a lot of back and forth, I met my eventual adviser.
She was a great, great mentor to me, and she told me that, hey, Shan, you know, my group is doing research about finding bugs in software. And she said her group is doing system research, and she said a lot of current team members are all great programmers, and as a result, they are not really well-motivated [LAUGHS] by finding bugs in software!

HUIZINGA: Interesting.

LU: And then she said, you are really motivated, right, by, you know, getting help to developers, to help developers finding bugs in their software, so maybe that's the research project for you. So that's how I got started.

HUIZINGA: Well, let's go a little bit further on this mentor and mentors in general. As Dr. Seuss might say, every "what" has a "who." So by that I mean an inspirational person or people behind every successful researcher's career. And most often, they're kind of big names and meaningful relationships, but you have another unique story on who has influenced you in your career, so why don't you tell us about the spectrum of people who've been influential in your life and your career?

LU: Mm-hmm. Yeah, I mean, I think I mentioned my adviser, and she's just so supportive. And I remember, when I started doing research, I just felt like I seemed to be so far behind everyone else. You know, I felt like, how come everybody else knows how to ask, you know, insightful questions? And they, like, they know how to program really fast, bug free. And my adviser really encouraged me, saying, you know, there is background knowledge that you can pick up; you just need to be patient. But then there are also, like, you know how to do research, you know how to think about things, problem solving. And she encouraged me saying, Shan, you're good at that!

HUIZINGA: Interesting!

LU: Well, I don't know how she found out, and anyway, so she was super, super helpful.

HUIZINGA: OK, so go a little further on this because I know you have others that have influenced you, as well.

LU: Yes. Yes, yes. And I think those, to be honest, I'm a very emotional, sensitive person. I would just, you know, move the timeline to be, kind of, more recent. So I joined Microsoft Research as a manager, and there's something called Connect that, you know, people write down twice every year talking about what it is they've been doing. So I was just checking, you know, my members in my team to see what they have been doing over the years just to get myself familiar with them. And I remember I read several of them. I felt like I almost had tears in my eyes! Like, I realized, wow... And just to give an example, for Chris, Chris Hawblitzel, I read his Connect, and I saw that he's working on something called program verification. It's a very, very difficult problem, and [as an] outsider, you know, I've read many of his papers, but when I read, you know, his own writing, I realized, wow, you know, it's almost two decades, right. Like, he just keeps doing these very difficult things. And I read his words about, you know, how his old approach has problems, how he's thinking about how to address that problem. Oh, I have an idea, right. And then spend multiple years to implement that idea and get improvement; find a new problem and then just find new solutions. And I really feel like, wow, I'm really, really... like, I feel like this is, kind of, like a, you know, there's, how to say, a hero-ish story behind this, you know, this kind of goal, and you're willing to spend many years to keep tackling this challenging problem.
And I just feel like, wow, I'm so honored, you know, to be in the same group with a group of fighters, you know, determined to tackle difficult research problems.

HUIZINGA: Yeah. And I think when you talk about it, it's like this is a person that was working for you, a direct report. [LAUGHTER] And often, we think about our heroes as being the ones who mentored us, who taught us, who managed us, but yours is kind of 360! It's like...

LU: True!

HUIZINGA: ...your heroes [are] above, beside and below.

LU: Right. And I would just say that I have many other, you know, direct reports in my group, and I have, you know, for example, say, a couple other of my colleagues, my direct reports, Dan Ports and Jacob Nelson. And again, this is something like their story really inspired me. Like, they were, again, spent five or six years on something, and it looks like, oh, it's close to the success of tech transfer, and then something out of their control happened. It happened because Intel decided to stop manufacturing a chip that their research relied on. And it's, kind of, like the end of the world to them,

HUIZINGA: Yeah.

LU: ...and then they did not give up. And then, you know, like, one year later, they found a solution, you know, together with their product team collaborators.

HUIZINGA: Wow.

LU: And I still feel like, wow, you know, I feel so... I feel like I'm inspired every day! Like, I'm so happy to be working together with, you know, all these great people, great researchers in my team.

HUIZINGA: Yeah. Wow. So much of your work centers on this idea of concurrent systems, and I want you to talk about some specific examples of this work next, but I think it warrants a little explication upfront for those people in the audience who don't spend all their time working on concurrent systems themselves. So give us a short 101 on concurrent systems and explain why the work you do matters to both the people who make it and the people who use it.

LU: Sure. Yeah. So I think a lot of people may not realize... so actually, the software we're using every day, almost every software we use these days is concurrent. So the meaning of concurrent is that you have multiple threads of execution going on at the same time, in parallel. And then, when we go to a web browser, right, so it's not just one rendering that is going on. There are actually multiple concurrent renderings going on. So the problem of writing, for software developers to develop this type of concurrent system, a challenge is the timing. So because you have multiple concurrent things going on, it's very difficult to manage and reason about, you know, what may happen first, what may happen second. And also, it's, like, there's an inherent non-determinism in it. What happened first this time may happen second next time. So as a result, a lot of bugs are introduced by this. And it was a very challenging problem because I would say about 20 years ago, there was a shift. Like, in the older days, actually most of our software was written in a sequential way instead of a concurrent way. So, you know, a lot of developers also had a difficult time shifting their mindset from the sequential way of reasoning to this concurrent way of reasoning.

HUIZINGA: Right. Well, and I think, from a user's perspective, all you experience is what I like to call the spinning beachball of doom. It's like, I've asked something, and it doesn't want to give, so... [LAUGHS] And this is, like, behind the scenes from a reasoning perspective of, how do we keep that from happening to our users? How do we identify the bugs?
Which we'll get to in a second. Umm. Thanks for that. Your research now revolves around what I would call the big idea of learning from mistakes. And in fact, it all seems to have started with a paper that you published way back in 2008 called "Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics," and you say this strongly influenced your research style and approach. And by the way, I'll note that this paper received the Most Influential Paper Award in 2022 from ASPLOS, which is the Architectural Support for Programming Languages and Operating Systems. Huge mouthful. And it also has more than a thousand citations, so I dare say it's influenced other researchers' approach to research, as well. Talk about the big idea behind this paper and exactly how it informed your research style and approach today.

LU: Mm-hmm. Yeah. So I think this, like, again, went back to the days that I, you know, my PhD days, I started working with my adviser, you know, YY (Yuanyuan Zhou). So at that time, there had been a lot of people working on bug finding, but then now when I think about it, people just magically say, hey, I want to look at this type of bug. Just magically, oh, I want to look at that type of bug. And then, my adviser at that time suggested to me, saying, hey, maybe, you know, actually take a look, right. At that time, as I mentioned, software was kind of shifting from sequential software to concurrent software, and my adviser was saying, hey, just take a look at those real systems' bug databases, and see what type of concurrency bugs are actually there. You know, instead of just randomly saying, oh, I want to work on this type of bug.

HUIZINGA: Oh, yeah.

LU: And then also, of course, it's not just look at it. It's not just like you read a novel or something, right. [LAUGHTER] And again, my adviser said, hey, Shan, right, you have this, you have a connection, natural connection, you know, with bugs and the developers who commit...

HUIZINGA: Who make them...

LU: Who make them! [LAUGHTER] So she said, you know, try to think about the patterns behind them, right. Try to think about whether you can generalize some...

HUIZINGA: Interesting...

LU: ...characteristics, and use that to guide people's research in this domain. And at that time, we were actually thinking we didn't know whether, you know, we could actually write a paper about it because traditionally you publish a paper, just say, oh, I have a new tool, right, which can do this and that. At that time in system conferences, people rarely have, you know, just say, here's a study, right. But we studied that, and indeed, you know, I had this thought that, hey, why I make a lot of mistakes. And when I study a lot of bugs, the more and more, I feel, you know, there's a reason behind it, right. It's like I'm not the only dumb person in the world, right? [LAUGHTER] There's a reason that, you know, there's some part of this language that is difficult to use, right, and there's a certain type of concurrent reasoning, it's just not natural to many people, right. So because of that, there are patterns behind these bugs. And so at that time, we were surprised that the paper was actually accepted. Because I'm just happy with the learning I get. But after this paper was accepted, in the next, I would say, many years, there are more and more people realize, hey, before we actually, you know, do bug-finding things, let's first do a study, right, to understand, and then this paper was... yeah, I was very happy that it was cited many, many times.

HUIZINGA: Yeah.
And then gets the most influential paper many years later.

LU: Many years later. Yes.

HUIZINGA: Yeah, I feel like there's a lot of things going through my head right now, one of which is what AI is, is a pattern detector, and you were doing that before AI even came on the scene. Which goes to show you that humans are pretty good at pattern detection also. We might not do it as fast as...

LU: True.

HUIZINGA: ...as an AI, but... so this idea of learning from mistakes is a broad theme. Another theme that I see coming through your papers and your work is persistence. [LAUGHTER] And you mentioned this about your team, right. I was like, these people are people who don't give up. So we covered this idea in an Abstracts podcast recently talking about a paper which really brings this to light: "If at First You Don't Succeed, Try, Try Again." That's the name of the paper. And we didn't have time to discuss it in depth at the time because the Abstracts show is so quick. But we do now. So I'd like you to expand a little bit on this big idea of persistence and how large language models are not only changing the way programming and verification happens but also providing insights into detecting retry bugs.

LU: Yes. So I guess maybe I will, since you mentioned this persistence, you know, after that Learning from Mistakes paper—so that was in 2008—and in the next 10 years, a little bit more than 10 years, in terms of persistence, right, so we have continued, me and my students, my collaborators, we have continued working on, you know, finding concurrency bugs...

HUIZINGA: Yeah.

LU: ...which is related to, kind of related to, why I'm here at Microsoft Research. And we keep doing it, doing it, and then I feel like a high point was that I had a collaboration with my now colleagues here, Madan Musuvathi and Suman Nath. So we built a tool to detect concurrency bugs, and after more than 15 years of effort on this, we were able to find more than 1,000 concurrency bugs. It was built in a tool called Torch that was deployed in the company, and it won the Best Paper Award at the top system conference, SOSP, and it was actually a bittersweet moment. This paper seems to, you know, put an end...

HUIZINGA: Oh, interesting!

LU: ...to our research. And also some of the findings from that paper is that we used to do very sophisticated program analysis to reason about the timing. And in that paper, we realized actually, sometimes, if you're a little bit fuzzy, don't aim to do perfect analysis, the resulting tool is actually more effective. So after that paper, Madan, Suman, and me, we kind of, you know, shifted our focus to looking at other types of bugs. And at the same time, the three of us realized the traditional, very precise program analysis may not be needed for some of the bug finding. So then, for this paper, this retry bugs, after we shifted our focus away from concurrency bugs, we realized, oh, there are many other types of important bugs, such as, in this case, like retry, right, when your software goes wrong, right. Another thing we learned is that it looks like you can never eliminate all bugs, so something will go wrong, [LAUGHTER] and then... so that's why you need something like retry, right. So, like, if something goes wrong, at least you won't give up immediately.

HUIZINGA: Right.

LU: The software will retry.
And another thing that started from this earlier effort is we started using large language models because we realized, yeah, you know, traditional program analysis sometimes can give you a very strong guarantee, but in some other cases, like in this retry case, some kind of fuzzy analysis, you know, not so precise, offered by large language models is sometimes even more beneficial. Yeah. So that's kind of, you know, the story behind this paper.

HUIZINGA: Yeah, yeah, yeah, yeah. So, Shan, we're hearing a lot about how large language models are writing code nowadays. In fact, NVIDIA's CEO says, mamas, don't let your babies grow up to be coders because AI's going to do that. I don't know if he's right, but one of the projects you're most excited about right now is called Verus, and your colleague Jay Lorch recently said that he sees a lot of synergy between AI and verification, where each discipline brings something to the other, and Rafah Hosn has referred to this as co-innovation or bidirectional enrichment. I don't know if that's exactly what is going on here, but it seems like it is. Tell us more about this project, Verus, and how AI and software verification are helping each other out.

LU: Yes, yes, yes, yes. I'm very excited about this project now! So first of all, starting from Verus. So Verus is a tool that helps you verify the correctness of Rust code. So this is a... it's a relatively new tool, but it's creating a lot of, you know, excitement in the research community, and it's created by my colleague Chris Hawblitzel and his collaborators outside Microsoft Research.

HUIZINGA: Interesting.

LU: And as I mentioned, right, this is a part that, you know, really inspired me. So traditionally, to verify, right, your program is correct, it requires a lot of expertise. You actually have to write your proof typically in a special language. And, you know, so a lot of people, including me, right, who are so eager to get rid of bugs in my software, but there are people who told me, saying, just to learn that language—so they were referring to a language called Coq—just to learn that language, they said it takes one or two years. And then once you learn that language, right, then you have to learn about how to write proofs in that special language. So people, particularly in the bug-finding community, people know that, oh, in theory, you can verify it, but in reality, people don't do that. OK, so now going back to this Verus tool, why it's exciting... so it actually allows people to write proofs in Rust. So Rust is an increasingly popular language. And there are more and more people picking up Rust. It's the first time I heard about, oh, you can, you know, write proofs in a popular language. And also, another thing is, in the past, you cannot verify an implementation directly. You can only verify something written in a special language. And the proof is proving something that is in a special language. And then finally, that special language is maybe then transformed into an implementation. So there are just too many special languages there.

HUIZINGA: A lot of layers.

LU: A lot of layers. So now this Verus tool allows you to write a proof in Rust to prove an implementation that is in Rust. So it's very direct. I just feel like I'm just not good at learning a new language.

HUIZINGA: Interesting.

LU: So when I came here, you know, and learned about this Verus tool, you know, by Chris and his collaborators, I felt like, oh, looks like maybe I can give it a try. And surprisingly, I realized, oh, wow!
I can actually write proofs using this Verus tool.

HUIZINGA: Right.

LU: And then, of course, you know, I was told, if you really want to, right, write proofs for large systems, it still takes a lot of effort. And then this idea came to me that, hey, maybe, you know, these days, like, large language models can write code, then why not let large language models write proofs, right? And of course, you know, other people actually had this idea, as well, but there's a doubt that, you know, can large language models really write proofs, right? And also, people have this feeling that, you know, large language models seem not very disciplined, you know, by nature. But, you know, that's what intrigued me, right. And also, I used to be a doubter for, say, GitHub Copilot. USED to! Because I feel like, yes, it can generate a lot of code, but who knows... [LAUGHS]

HUIZINGA: Whether it's right...

LU: What, what is... whether it's right?

HUIZINGA: Yeah.

LU: Right, so I feel like, wow, you know, this could be a game-changer, right? Like, if AI can write not only code but also proofs. Yeah, so that's what I have been doing. I've been working on this for one year, and I gradually get more collaborators, both, you know, people in Microsoft Research Asia, and, you know, expertise here, like Chris, and Jay Lorch. They all help me a lot. So we actually have made a lot of progress.

HUIZINGA: Yeah.

LU: Like, now it's, like, we've tried, like, for example, for some small programs, benchmarks, and we see that actually large language models can correctly prove the majority of the benchmarks that we throw at them. Yeah. It's very, very exciting.

HUIZINGA: Well, and so... and we're going to talk a little bit more about some of those doubts and some of those interesting concerns in a bit. I do want you to address what I think Jay was getting at, which is that somehow the two help each other. The verification improves the AI. The AI improves the verification.

LU: Yes, yes.

HUIZINGA: How?

LU: Yes. My feeling is that a lot of people, if they're concerned with using AI, it's because they feel like there's no guarantee for the content generated by AI, right. And then we also all heard about, you know, hallucination. And I tried myself. Like, I remember, at some point, if I ask AI, say, you know, which is bigger: is it three times three or eight? And the AI will tell me eight is bigger. And... [LAUGHTER]

HUIZINGA: Like, what?

LU: So I feel like verification can really help AI...

HUIZINGA: Get better...

LU: ...because now you can give, you know, kind of, add in mathematical rigor into whatever is generated by AI, right. And I say it would help AI. It will also help people who use AI, right, so that they know what can be trusted, right.

HUIZINGA: Right.

LU: What is guaranteed by this content generated by AI?

HUIZINGA: Yeah, yeah, yeah.

LU: Yeah, and now of course AI can help verification because, you know, verification, you know, it's hard. There is a lot of mathematical reasoning behind it. [LAUGHS] And so now with AI, it will enable verification to be picked up by more and more developers so that we can get higher-quality software.

HUIZINGA: Yeah.

LU: Yeah.

HUIZINGA: Yeah. And we'll get to that, too, about what I would call the democratization of things. But before that, I want to, again, share an observation that I had based on your work and my conversations with you, which is that you've basically dedicated your career to hunting bugs.

LU: Yes.

HUIZINGA: And maybe that's partly due to a personal story about how a tiny mistake became a bug that haunted you for years.
Tell us the story.

LU: Yes.

HUIZINGA: And explain why and how it launched a lifelong quest to understand, detect, and expose bugs of all kinds.

LU: Yes. So before I came here, I already had multiple times, you know, interacting with Microsoft Research. So I was a summer intern at Microsoft Research Redmond almost 20 years ago.

HUIZINGA: Oh, wow!

LU: I think it was in the summer of 2005. And I remember I came here, you know, full of ambition. And I thought, OK, you know, I will implement some smart algorithm. I will deliver some useful tools. So at that time, I had just finished two years of my PhD, so I, kind of, just started my research on bug finding and so on. And I remember I came here, and I was told that I need to program in C#. And, you know, I just naturally have a fear of learning a new language. But anyway, I remember, I thought, oh, the task I was assigned was very straightforward. And I think I went ahead of myself. I was thinking, oh, I want to quickly finish this, and I want to do something more novel, you know, that can be more creative. But then this simple task I was assigned, I ended up spending the whole summer on it. So the tool that I wrote was supposed to process very huge logs. And then the problem is my software is, like, you run it initially... So, like, I can only run it for 10 minutes because my software used so much memory and it will crash. And then, I spent a lot of time... I was thinking, oh, my software is just using too much memory. Let me optimize it, right. And then so, I, you know, I try to make sure to use memory in a very efficient way, but then as a result, instead of crashing every 10 minutes, it will just crash after one hour. And I know there's a bug at that time. So there's a type of bug called memory leak. I know there's a bug in my code, and I spent a lot of time, and there was an engineer helping me check my code. We spent a lot of time. We were just not able to find that bug. And at the end, the solution is I was just sitting in front of my computer waiting for my program to crash and restart. [LAUGHTER] And at that time, because there was very little remote working option, so in order to finish processing all those logs, it's like, you know, after dinner, I...

HUIZINGA: You have to stay all night!

LU: I have to stay all night! And all my intern friends, they were saying, oh, Shan, you work really hard! And I'm just feeling like, you know, what I'm doing is just sitting in front of my computer waiting [LAUGHTER] for my program to crash so that I can restart it! And near the end of my internship, I finally found the bug. It turns out that I missed a pair of brackets in one line of code.

HUIZINGA: That's it.

LU: That's it.

HUIZINGA: Oh, my goodness.

LU: And it turns out, because I was used to C, and in C, when you want to free, which means deallocate, an array, you just say free array. And if I remember correctly, in this language, C#, you have to say, free this array name and you put a bracket behind it. Otherwise, it will only free the first element. And I... it was a nightmare. And I also felt like, the most frustrating thing is, if it's a clever bug, right... [LAUGHS]

HUIZINGA: Sure.

LU: ...then you feel like at least I'm defeated by something complicated...

HUIZINGA: Smart.

LU: Something smart.
And then it's like, you know, also all this ambition I had about, you know, doing creative work, right, with all these smart researchers in MSR (Microsoft Research), I feel like I ended up achieving very little in my summer internship.

HUIZINGA: But maybe the humility of making a stupid mistake is the kind of thing that somebody who's good at hunting bugs... It's like missing an error in the headline of an article, because the print is so big [LAUGHTER] that you're looking for the little things in the... I know that's a journalist's problem. Actually, I actually love that story. And it, kind of, presents a big picture of you, Shan, as a person who has a realistic self-awareness and humility, which I think is rare at times in the software world. So thanks for sharing that. So moving on. When we talked before, you mentioned the large variety of programming languages and how that can be a barrier to entry or at least a big hurdle to overcome in software programming and verification. But you also talked about, as we just mentioned, how LLMs have been a democratizing force...

LU: Yes.

HUIZINGA: ...and what you see now with the advent of tools like GitHub Copilot,

LU: Yes.

HUIZINGA: ...what... what's changed?

LU: Oh, so much has changed. Well, I don't even know how to start. Like, I used to be really scared about programming. You know, when I tell this story, a lot of people say, no, I don't believe you. And I feel like it's a trauma, you know.

HUIZINGA: Sure.

LU: I almost feel like it's like, you know, the college-day me, right, who was scared of starting any programming project. Somehow, I felt humiliated when asking those very, I feel like, stupid questions to my classmates. It almost changed my personality! It's like, for a long time, whenever someone introduced me to a new software tool, my first reaction is, uh, I probably will not be able to successfully even install it. Like whenever, you know, there's a new language, my first reaction is, uh, no, I'm not good at it. And then, like, for example, this GitHub Copilot thing, actually, I did not try it until I joined Microsoft. And then I, actually, I hadn't programmed for a long time. And then I started collaborating with people in Microsoft Research Asia, and he writes programs in Python, right. And I have never written a single line of Python code before. And also, this Verus tool. It helps you to verify code in Rust, but I have never learned Rust before. So I thought, OK, maybe let me just try GitHub Copilot. And wow! You know, it's like I realized, wow! Like... [LAUGHS]

HUIZINGA: I can do this!

LU: I can do this! And, of course, sometimes I feel like my colleagues may sometimes be surprised because, on one hand, it looks like I'm able to just finish, you know, write a Rust function. But on some other days, I ask very basic questions, [LAUGHTER] and I have those questions because, you know, the GitHub Copilot just helps me finish! [LAUGHS]

HUIZINGA: Right.

LU: You know, I'm just starting something to start it, and then it just helps me finish. And I wish, when I started my college, if at that time there was GitHub Copilot, I feel like, you know, my mindset towards programming and towards computer science might be different. So it does make me feel very positive, you know, about, you know, what future we have, you know, with AI, with computer science.

HUIZINGA: OK, usually, I ask researchers at this time, what could possibly go wrong if you got everything right? And I was thinking about this question in a different way until just this minute.
I want to ask you, what do you think that it means to have a tool that can do things for you that you don't have to struggle with? And maybe, is there anything good about the struggle? Because you're framing it as it sapped your confidence.

LU: [LAUGHS] Yes.

HUIZINGA: And at the same time, I see a woman who emerged stronger because of this struggle, with an amazing career, a huge list of publications, influential papers, citations, leadership role. [LAUGHTER] So in light of that...

LU: Right.

HUIZINGA: ...what do you see as the tension between struggling to learn a new language versus having this tool that can just do it that makes you look amazing? And maybe the truth of it is you don't know!

LU: Yeah. That's a very good point. I guess you need some kind of balance. And on one hand, yes, I feel like, again, right, this goes back to, like, my internship. I left with the frustration that I felt like I have so much creativity to contribute, and yet I could not because of this language barrier. You know, I feel positive in the sense that, just from GitHub Copilot, right, how it has enabled me to just bravely try something new. I feel like this goes beyond just computer science, right. I can imagine it'll help people to truly unleash their creativity, not being bothered by some challenges in learning the tool. But on the other hand, you made a very good point. My adviser told me she feels like, you know, I write code slowly, but I tend to make fewer mistakes. And the difficulty of learning, right, and all these nightmares I had definitely made me more... more cautious? I pay more respect to the task that is given to me, so there is definitely the other side of AI, right, which is, you feel like everything is easy, and maybe you do not have the experience of those bugs, right, that a software can bring to you, and you have overreliance, right, on this tool.

HUIZINGA: Yeah!

LU: So hopefully, you know, some of the things we are doing now, right, like, for example, say, verification, right, like bringing this mathematical rigor to AI, hopefully that can help.

HUIZINGA: Yeah. You know, even as you unpack the nuances there, it strikes me that both are good. Both having to struggle and learning languages and understanding...

LU: Yeah.

HUIZINGA: ...the core of it, and the idea that in natural language, you could just say, here's what I want to happen, and the AI does the code, the verification, etc. That said, do we trust it? And this was where I was going with the first "what could possibly go wrong?" question. How do we know that it is really as clever as it appears to be? [LAUGHS]

LU: Yeah, I think I would just use the research problem we are working on now, right. Like, I think on one hand, I can use AI to generate a proof, right, to prove the code generated by AI is correct. But having said that, even if we're wildly successful, you know, in this thing, human beings' expertise is still needed because, just take this as an example, what do you mean by correct, right?

HUIZINGA: Sure.

LU: And so someone first has to define what correctness means. And then, so far, the experience shows that you can't just define it using natural language because our natural language is inherently imprecise.

HUIZINGA: Sure.

LU: So you still need to translate it to a formal specification in a programming language. It could be in a popular language like in Rust, right, which is what Verus is aiming at.
And then we are, like, for example, some of the research we do is showing that, yes, you know, I can also use AI to do this translation from natural language to specification. But again, then, who to verify that, right? So at the end of the day, I think we still do need to have humans in the loop. But what we can do is to lower the burden and make the interface not so complicated, right. So that it'll be easy for human beings to check what AI has been doing.

HUIZINGA: Yeah. You know, everything we're talking about just reinforces this idea that we're living in a time where the advances in computer science that seemed unrealistic or impossible, unattainable even a few years ago, are now so common that we take it for granted. And they don't even seem outrageous, but they are. So I'm interested to know what, if anything, you would classify now as blue sky research in your field. Maybe something in systems research today that looks like a moonshot. You've actually anchored this in the fact that you, kind of, have, you know, blinders on for the work you're doing—head down in the work you're doing—but even as you peek up from the work, that might be outrageous, is there anything else? I just like to get this out there that, you know, what's going on 10 years down the line?

LU: You know, sometimes I feel like I'm just now so much into my own work, but, you know, occasionally, like, say, when I had a chat with my daughter and I explained to her, you know, oh, I'm working on, you know, not only having AI to generate code but also having AI to prove, right, the code is correct. And she would feel, wow, that sounds amazing! [LAUGHS] So I don't know whether that is, you know, a moonshot thing, but that's a thing that I'm super excited about...

HUIZINGA: Yeah.

LU: ...about the potential. And then there also, you know, my colleagues, we spend a lot of time building systems, and it's not just about correctness, right. Like, the verification thing I'm doing now is related to automatically verifying it's correct. But also, you need to do a lot of performance tuning, right. Just so that your system can react fast, right. It can have good utilization of computer resources. And my colleagues are also working on using AI, right, to automatically do performance tuning. And I know what they are doing, so I don't particularly feel that's a moonshot, but I guess...

HUIZINGA: I feel like, because you are so immersed, [LAUGHTER] that you just don't see how much we think...

LU: Yeah!

HUIZINGA: ...it's amazing. Well, I'm just delighted to talk to you today, Shan. As we close, and you've sort of just done a little vision casting, but let's take your daughter, my daughter, [LAUGHTER] all of our daughters...

LU: Yes!

HUIZINGA: How does what we believe about the future in terms of these things that we could accomplish influence the work we do today, as sort of a vision casting for the next Shan Lu who's struggling in undergrad/grad school?

LU: Yes, yes, yes. Oh, thank you for asking that question. Yeah, I have to say, you know, I think we're in a very interesting time, right, with all this AI thing.

HUIZINGA: Isn't that a curse in China? May you live in interesting times!

LU: And I think there were times, actually, you know, before I myself fully embraced AI, I was indeed... I had my daughter in mind. I was worried when she grows up, what would happen? There will be no job for her because everything will be done by AI!

HUIZINGA: Oh, interesting.

LU: But then now, now that I have, you know, kind of fully embraced AI myself, actually, I see this more and more positive.
Like you said, I remember, you know, those older days myself, right. That is really, like, I have this struggle that I feel like I can do better. I feel like I have ideas to contribute, but just for whatever reason, right, it took me forever to learn something which I feel like it's a very mechanical thing, but it just takes me forever to learn, right. And then now actually, I see this hope, right, with AI, you know, a lot of mechanical things that can actually now be done in a much more automated way by AI, right. So then now truly, you know, my daughter, many girls, many kids out there, right, whatever, you know, they are good at, their creativity, it'll be much easier, right, for them to contribute their creativity to whatever discipline they are passionate about. Hopefully, they don't have to, you know, go through what I went through, right, to finally be able to contribute. But then, of course, you know, at the same time, I do feel this responsibility of me, my colleagues, MSR—we have the capability and also the responsibility, right, of building AI tools in a responsible way so that they will be used in a positive way by the next generation.

HUIZINGA: Yeah. Shan Lu, thank you so much for coming on the show today. [MUSIC] It's been absolutely delightful, instructive, informative, wonderful.

LU: Thank you. My pleasure.
-
WWW.MICROSOFT.COM
Research Focus: Week of January 13, 2025

In this edition:

We introduce privacy enhancements for multiparty deep learning, a framework using smaller, open-source models to provide relevance judgments, and other notable new research.

We congratulate Yasuyuki Matsushita, who was named an IEEE Computer Society Fellow.

We've included a recap of the extraordinary, far-reaching work done by researchers at Microsoft in 2024.

NEW RESEARCH

AI meets materials discovery

Two of the transformative tools that play a central role in Microsoft's work on AI for science are MatterGen and MatterSim. In the world of materials discovery, each plays a distinct yet complementary role in reshaping how researchers design and validate new materials.

Read the story

NEW RESEARCH

Distributed training enables multiple parties to jointly train a machine learning model on their respective datasets, which can help address the challenges posed by modern machine learning's requirements for large volumes of diverse data. However, this can raise security and privacy issues: protecting each party's data during training and preventing leakage of private information from the model after training through various inference attacks.

In a recent paper, Communication Efficient Secure and Private Multi-Party Deep Learning, researchers from Microsoft address these concerns simultaneously by designing efficient Differentially Private, secure Multiparty Computation (DP-MPC) protocols for jointly training a model on data distributed among multiple parties. This DP-MPC protocol in the two-party setting is 56-to-794 times more communication-efficient and 16-to-182 times faster than previous such protocols. This work simplifies and improves on previous attempts to combine techniques from secure multiparty computation and differential privacy, especially in the context of training machine learning models.

Read the paper

NEW RESEARCH

Training and evaluating retrieval systems requires significant relevance judgments, which are traditionally collected from human assessors. This process is both costly and time-consuming. Large language models (LLMs) have shown promise in generating relevance labels for search tasks, offering a potential alternative to manual assessments. Current approaches often rely on a single LLM. While effective, this approach can be expensive and prone to intra-model biases that can favor systems leveraging similar models.

In a recent paper: JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment, researchers from Microsoft introduce a framework that employs smaller, open-source models to provide relevance judgments by combining evaluations across multiple LLMs (LLMBlender) or multiple prompts (PromptBlender). By leveraging the LLMJudge benchmark, they compare JudgeBlender with state-of-the-art methods and the top performers in the LLMJudge challenge. This research shows that JudgeBlender achieves competitive performance, demonstrating that very large models are often unnecessary for reliable relevance assessments.

Read the paper

NEW RESEARCH

Congestion games are used to describe the behavior of agents who share a set of resources. Each player chooses a combination of resources, which may become congested, decreasing utility for the players who choose them. Players can avoid congestion by choosing combinations that are less popular. This is useful for modeling a range of real-world scenarios, such as traffic flow, data routing, and wireless communication networks.

In a recent paper: Convergence to Equilibrium of No-regret Dynamics in Congestion Games, researchers from Microsoft and external colleagues propose CongestEXP, a decentralized algorithm based on the classic exponential weights method. They evaluate CongestEXP in a traffic congestion game setting. As more drivers use a particular route, congestion increases, leading to higher travel times and lower utility. Players can choose a different route every day to optimize their utility, but the observed utility by each player may be subject to randomness due to uncertainty (e.g., bad weather). The researchers show that this approach provides both regret guarantees and convergence to a Nash equilibrium, where no player can unilaterally improve their outcome by changing their strategy.

Read the paper
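As a rough illustration of the exponential-weights idea underlying CongestEXP, here is a toy full-information simulation of the route-choice setting described above. The learning rate, cost model, and update rule are simplifications invented for illustration, not the paper's algorithm.

```python
import math
import random

def simulate_congestion_game(num_players=50, num_routes=3, rounds=500, lr=0.5):
    """Toy exponential-weights dynamics in a congestion game: each route's
    cost grows with the number of players who picked it. Illustrative only."""
    weights = [[1.0] * num_routes for _ in range(num_players)]
    for _ in range(rounds):
        # Each player samples a route with probability proportional to its weight.
        choices = [random.choices(range(num_routes), weights=w)[0]
                   for w in weights]
        load = [choices.count(r) for r in range(num_routes)]
        for w in weights:
            for r in range(num_routes):
                cost = load[r] / num_players   # normalized congestion cost in [0, 1]
                w[r] *= math.exp(-lr * cost)   # exponential-weights update
    return weights

final_weights = simulate_congestion_game()
# With symmetric players and routes, the dynamics push the realized loads
# toward a split where no route is clearly cheaper -- an equilibrium-like state.
```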
NEW RESEARCH

Research and development (R&D) plays a pivotal role in boosting industrial productivity. However, the rapid advance of AI has exposed the limitations of traditional R&D automation. Current methods often lack the intelligence needed to support innovative research and complex development tasks, underperforming human experts with deep knowledge.

LLMs trained on vast datasets spanning many subjects are equipped with extensive knowledge and reasoning capabilities that support complex decision-making in diverse workflows. By autonomously performing tasks and analyzing data, LLMs can significantly increase the efficiency and precision of R&D processes.

In a recent article, researchers from Microsoft introduce RD-Agent, a tool that integrates data-driven R&D systems and harnesses advanced AI to automate innovation and development. At the heart of RD-Agent is an autonomous agent framework with two key components: (a) Research and (b) Development. Research focuses on actively exploring and generating new ideas, while Development implements these ideas. Both components improve through an iterative process, illustrated in Figure 1 of the article, which ensures the system becomes increasingly effective over time.

Read the article

Microsoft Research | In case you missed it

Microsoft Research 2024: A year in review
December 20, 2024
Microsoft Research did extraordinary work this year, using AI and scientific research to make progress on real-world challenges like climate change, food security, global health, and human trafficking. Here's a look back at the broad range of accomplishments and advances in 2024.

AIOpsLab: Building AI agents for autonomous clouds
December 20, 2024
AIOpsLab is a holistic evaluation framework for researchers and developers, to enable the design, development, evaluation, and enhancement of AIOps agents, which also serves the purpose of reproducible, standardized, interoperable, and scalable benchmarks.

Yasuyuki Matsushita, IEEE Computer Society 2025 Fellow
December 19, 2024
Congratulations to Yasuyuki Matsushita, Senior Principal Research Manager at Microsoft Research, who was named a 2025 IEEE Computer Society Fellow. Matsushita was recognized for contributions to photometric 3D modeling and computational photography.

View more news and awards
-
WWW.MICROSOFT.COM
MatterGen: A new paradigm of materials design with generative AI

Materials innovation is one of the key drivers of major technological breakthroughs. The discovery of lithium cobalt oxide in the 1980s laid the groundwork for today's lithium-ion battery technology. It now powers modern mobile phones and electric cars, impacting the daily lives of billions of people. Materials innovation is also required for designing more efficient solar cells, cheaper batteries for grid-level energy storage, and adsorbents to recycle CO2 from the atmosphere.

Finding a new material for a target application is like finding a needle in a haystack. Historically, this task has been done via expensive and time-consuming experimental trial and error. More recently, computational screening of large materials databases has allowed researchers to speed up this process. Nonetheless, finding the few materials with the desired properties still requires the screening of millions of candidates.

Today, in a paper published in Nature (opens in new tab), we share MatterGen, a generative AI tool that tackles materials discovery from a different angle. Instead of screening the candidates, it directly generates novel materials given prompts of the design requirements for an application. It can generate materials with desired chemistry, mechanical, electronic, or magnetic properties, as well as combinations of different constraints. MatterGen enables a new paradigm of generative AI-assisted materials design that allows for efficient exploration of materials, going beyond the limited set of known ones.

Figure 1: Schematic representation of screening and generative approaches to materials design

A novel diffusion architecture

MatterGen is a diffusion model that operates on the 3D geometry of materials. Much like an image diffusion model generates pictures from a text prompt by modifying the color of pixels from a noisy image, MatterGen generates proposed structures by adjusting the positions, elements, and periodic lattice from a random structure. The diffusion architecture is specifically designed for materials to handle specialties like periodicity and 3D geometry.

Figure 2: Schematic representation of MatterGen: a diffusion model to generate novel and stable materials. MatterGen can be fine-tuned to generate materials under different design requirements such as specific chemistry, crystal symmetry, or materials properties.

The base model of MatterGen achieves state-of-the-art performance in generating novel, stable, diverse materials (Figure 3). It is trained on 608,000 stable materials from the Materials Project (opens in new tab) (MP) and Alexandria (opens in new tab) (Alex) databases. The performance improvement can be attributed to both the architecture advancements, as well as the quality and size of our training data.

Figure 3: Performance of MatterGen and other methods in the generation of stable, unique, and novel structures. The training dataset for each method is indicated in parentheses. The purple bar highlights performance improvements due to MatterGen's architecture alone, while the teal bar highlights performance improvements that come also from the larger training dataset.

MatterGen can be fine-tuned with a labelled dataset to generate novel materials given any desired conditions. We demonstrate examples of generating novel materials given a target's chemistry and symmetry, as well as electronic, magnetic, and mechanical property constraints (Figure 2).
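To give a flavor of what "adjusting the positions, elements, and periodic lattice from a random structure" means in code, here is a deliberately simplified reverse-diffusion loop. The denoiser interface, noise schedule, and fixed unit cell are invented placeholders; the real MatterGen jointly denoises atom types and the lattice with a materials-specific architecture.

```python
import torch

def sample_structure(denoiser, num_atoms=8, steps=1000, condition=None):
    """Caricature of a reverse-diffusion sampler for crystal structures.
    `denoiser` stands in for a trained score network, optionally conditioned
    on a property prompt (e.g., a target bulk modulus)."""
    positions = torch.rand(num_atoms, 3)   # random fractional coordinates
    lattice = torch.eye(3)                 # toy fixed cell; the real model denoises this too
    for t in reversed(range(steps)):
        step_size = (t + 1) / steps        # crude illustrative noise schedule
        predicted_noise = denoiser(positions, lattice, t, condition)
        positions = positions - step_size * predicted_noise
        positions = positions % 1.0        # wrap into the unit cell: periodicity
    return positions, lattice
```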
Outperforming screening

Figure 4: Performance of MatterGen (teal) and traditional screening (yellow) in finding novel, stable, and unique structures that satisfy the design requirement of having bulk modulus greater than 400 GPa.

The key advantage of MatterGen over screening is its ability to access the full space of unknown materials. In Figure 4, we show that MatterGen continues to generate more novel candidate materials with high bulk modulus above 400 GPa, for example, which are hard to compress. In contrast, the screening baseline saturates due to exhausting known candidates.

Handling compositional disorder

Figure 5: Illustration of compositional disorder. Left: a perfect crystal without compositional disorder and with a repeating unit cell (black dashed). Right: crystal with compositional disorder, where each site has 50% probability of yellow and teal atoms.

Compositional disorder (Figure 5) is a commonly observed phenomenon where different atoms can randomly swap their crystallographic sites in a synthesized material. Recently (opens in new tab), the community has been exploring what it means for a material to be novel in the context of computationally designed materials, as widely employed algorithms will not distinguish between pairs of structures where the only difference is a permutation of similar elements in their respective sites.

We provide an initial solution to this issue by introducing a new structure-matching algorithm that considers compositional disorder. The algorithm assesses whether a pair of structures can be identified as ordered approximations of the same underlying compositionally disordered structure. This provides a new definition of novelty and uniqueness, which we adopt in our computational evaluation metrics. We also make our algorithm publicly available (opens in new tab) as part of our evaluation package.

Experimental lab verification

Figure 6: Experimental validation of the proposed compound, TaCr2O6

In addition to our extensive computational evaluation, we have validated MatterGen's capabilities through experimental synthesis. In collaboration with the team led by Prof. Li Wenjie from the Shenzhen Institutes of Advanced Technology (opens in new tab) (SIAT) of the Chinese Academy of Sciences, we have synthesized a novel material, TaCr2O6, whose structure was generated by MatterGen after conditioning the model on a bulk modulus value of 200 GPa. The synthesized material's structure aligns with the one proposed by MatterGen, with the caveat of compositional disorder between Ta and Cr. Additionally, we experimentally measured a bulk modulus of 169 GPa against the 200 GPa given as the design specification, a relative error below 20% (|169 − 200| / 200 ≈ 15.5%), which is very close from an experimental perspective. If similar results can be translated to other domains, it will have a profound impact on the design of batteries, fuel cells, and more.

AI emulator and generator flywheel

MatterGen presents a new opportunity for AI-accelerated materials design, complementing our AI emulator MatterSim. MatterSim follows the fifth paradigm of scientific discovery, significantly accelerating the speed of material property simulations. MatterGen in turn accelerates the speed of exploring new material candidates with property-guided generation. MatterGen and MatterSim can work together as a flywheel to speed up both the simulation and exploration of novel materials.
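A minimal sketch of that flywheel, with placeholder functions standing in for the two models (these are not the released APIs): a MatterGen-like generator proposes candidates conditioned on a property target, and a MatterSim-like emulator screens them cheaply before any expensive validation.

```python
def design_flywheel(generate, simulate, target_bulk_modulus,
                    tolerance=20.0, budget=100):
    """Generate-then-screen loop: `generate` proposes a structure conditioned
    on a property target; `simulate` predicts that property quickly.
    Placeholder interfaces for illustration only."""
    shortlist = []
    for _ in range(budget):
        structure = generate(condition=target_bulk_modulus)  # idea generator
        predicted = simulate(structure)                      # fast goalkeeper
        if abs(predicted - target_bulk_modulus) <= tolerance:
            shortlist.append((structure, predicted))
    # Survivors go on to expensive DFT checks and, eventually, lab synthesis.
    return shortlist
```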
Experimental lab verification

Figure 6: Experimental validation of the proposed compound, TaCr2O6

In addition to our extensive computational evaluation, we have validated MatterGen's capabilities through experimental synthesis. In collaboration with the team led by Prof. Li Wenjie from the Shenzhen Institutes of Advanced Technology (opens in new tab) (SIAT) of the Chinese Academy of Sciences, we have synthesized a novel material, TaCr2O6, whose structure was generated by MatterGen after conditioning the model on a bulk modulus value of 200 GPa. The synthesized material's structure aligns with the one proposed by MatterGen, with the caveat of compositional disorder between Ta and Cr. We also experimentally measured a bulk modulus of 169 GPa against the 200 GPa design specification, a relative error below 20% (|169 - 200| / 200 ≈ 15.5%), which is very close from an experimental perspective. If similar results can be translated to other domains, this will have a profound impact on the design of batteries, fuel cells, and more.

AI emulator and generator flywheel

MatterGen presents a new opportunity for AI-accelerated materials design, complementing our AI emulator MatterSim. MatterSim follows the fifth paradigm of scientific discovery, significantly accelerating the speed of material property simulations. MatterGen in turn accelerates the speed of exploring new material candidates with property-guided generation. MatterGen and MatterSim can work together as a flywheel to speed up both the simulation and the exploration of novel materials.

Making MatterGen available

We believe the best way to make an impact in materials design is to make our model available to the public. We release the source code of MatterGen (opens in new tab) under the MIT license, together with the training and fine-tuning data. We welcome the community to use and build on top of our model.

MatterGen represents a new paradigm of materials design enabled by generative AI technology. It explores a significantly larger space of materials than screening-based methods, and it is more efficient because materials exploration is guided by prompts. Similar to how generative AI has impacted drug discovery (opens in new tab), it will have a profound impact on how we design materials across broad domains, including batteries, magnets, and fuel cells.

We plan to continue our work with external collaborators to further develop and validate the technology. "At the Johns Hopkins University Applied Physics Laboratory (APL), we're dedicated to the exploration of tools with the potential to advance discovery of novel, mission-enabling materials. That's why we are interested in understanding the impact that MatterGen could have on materials discovery," said Christopher Stiles, a computational materials scientist leading multiple materials discovery efforts at APL.

Acknowledgement

This work is the result of highly collaborative team efforts at Microsoft Research AI for Science. The full author list includes: Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie.
-
WWW.MICROSOFT.COMIdeas: AI for materials discovery with Tian Xie and Ziheng Lu

Transcript

[TEASER]
[MUSIC PLAYS UNDER DIALOGUE]

TIAN XIE: Yeah, ...

ZIHENG LU: Previously, a lot of people were using these atomistic simulators and these generative models alone. But if you think about it, now that we have these two foundation models together, it really can make things different, right. You have a very good idea generator. And you have a very good goalkeeper. And you put them together. They form a loop. And now you can use this loop to design materials really quickly.

[TEASER ENDS]

LINDSAY KALTER: You're listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

I'm your guest host, Lindsay Kalter. Today I'm talking to Microsoft Principal Research Manager Tian Xie and Microsoft Principal Researcher Ziheng Lu. Tian is doing fascinating work with MatterGen, an AI tool for generating new materials guided by specific design requirements. Ziheng is one of the visionaries behind MatterSim, which puts those new materials to the test through advanced simulations. Together, they're redefining what's possible in materials science. Tian and Ziheng, welcome to the podcast.

TIAN XIE: Very excited to be here.

ZIHENG LU: Thanks, Lindsay, very excited.

KALTER: Before we dig into the specifics of MatterGen and MatterSim, let's give our audience a sense of how you, as researchers, arrived at this moment. Materials science, especially at the intersection of computer science, is such a cutting-edge and transformative field. What first drew each of you to this space? And what, if any, moment or experience made you realize this was where you wanted to innovate? Tian, do you want to start?

XIE: So I started working on AI for materials back in 2015, when I started my PhD. I came in as a chemist and materials scientist, but I was, kind of, figuring out what I wanted to do during my PhD. There was actually one moment that really drove me into the field. That was AlphaGo. AlphaGo came out in 2016, when it was able to beat the world champion in go. I was extremely impressed by that because I, kind of, learned how to play go in my childhood. I know how hard it is and how much effort those professional go players have spent, right, in learning about go. So I, kind of, had the feeling that if AI can surpass the world-leading go players, one day it will also surpass materials scientists, right, in their ability to design novel materials. So that's why I ended up deciding to...

LU: That's very interesting, Tian. So, actually, I think I started, like, two years before you as a PhD student. I was trained solely as a computational materials scientist, not really an AI expert. But at that time, computational materials science did not really work that well. It works, but not that well. So after, like, two or three years, I went back to experiments for, like, another two or three years because, I mean, the experiment is always the gold standard, right. And I worked on these experiments for a few years, and then about three years ago, I went back to this field of computation, especially because of AI. At that time, I think GPT and the large AI models that we're currently using were not there yet, but we already had their prior forms, like BERT, so we saw the very large potential of AI.
We knew that these large AIs might work. So one idea is really to use AI to learn the entire space of materials and really grasp the physics there, and that really drove me to this field, and that's why I'm here working on it, yeah.

KALTER: We're going to get into what MatterGen and MatterSim mean for materials science: the potential, the challenges, and the open questions. But first, give us an overview of what each of these tools is, how they do what they do, and, as this show is about big ideas, the idea driving the work. Ziheng, let's have you go first.

LU: So MatterSim is a tool for doing in silico characterization of materials. If you think about working on materials, you have several steps. You first need to synthesize the material, and then you need to characterize it. Basically, you need to know what properties, what structures, whatever, these materials have. So for MatterSim, what we want to do is really move a lot of this characterization process into computation. The idea behind MatterSim is to really learn the fundamentals of physics. We learn the energies and forces and stresses from these atomic structures and the charge densities, all of these things, and then with these, we can really simulate any sort of material using our computational machines. And with these, we can characterize a lot of these materials' properties on our computers, which is very fast. It's much faster than doing experiments, so we can accelerate materials design. So just in a word: basically, you input your material, a structure, into your computer, and MatterSim will try to simulate this material like what you do in a furnace or with an XRD.

KALTER: All right, thank you very much. Tian, why don't you tell us about MatterGen?

XIE: Yeah, thank you. So, actually, Ziheng, once you start with explaining MatterSim, it makes it much easier for me to explain MatterGen. MatterGen actually represents a new way to design materials with generative AI. Materials discovery is like finding needles in a haystack. You're looking for a material with a very specific property for a materials application. For example, finding a room-temperature superconductor or finding a solid that can conduct lithium ions very well inside a battery. So it's like finding one very specific material from, kind of, a million candidates. The conventional way of doing materials discovery is via screening, where you, kind of, go over millions of candidates to find the one that you're looking for. MatterSim is able to significantly accelerate that process by making the simulation much faster, but it's still very inefficient because you need to go through these million candidates, right. So with MatterGen, you can, kind of, directly generate materials given the prompts of the design requirements for the application. This means that you can discover useful materials much more efficiently. And it also allows us to explore a much larger space beyond the set of known materials.

KALTER: Thank you, Tian. Can you tell us a little bit about how MatterGen and MatterSim work together?

XIE: So you can really think about MatterSim and MatterGen accelerating different parts of the materials discovery process. MatterSim is trying to accelerate the simulation of material properties, while MatterGen is trying to accelerate the search for novel material candidates. It means that they can really work together as a flywheel, and you can compound the acceleration from both models.
They are also both foundation AI models, meaning they can both be used for a broad range of materials design problems. So we're really looking forward to seeing how they can, kind of, work together iteratively as a tool to design novel materials for a broad range of applications.

LU: I think that's a very good, like, general introduction of how they work together. I think I can provide an example of how they really fit together. If you want a material with a specific, like, bulk modulus or lithium-ion conductivity or thermal conductivity for your CPU chips, what you basically do is start with a pool of material structures, like some structures from a database, and then you compute, you characterize, the property you want from that stack of materials. Then you've got these property-and-structure pairs, and you input these pairs into MatterGen. And MatterGen will be able to give you a lot more of these structures that are highly likely to be real. But the number will be very large. For example, for the bulk modulus, I don't remember the number we generated in our work; was it like thousands, tens of thousands?

XIE: Thousands, tens of thousands.

LU: Yeah, that would be a very large pool even with MatterGen, so the next step is, how would you like to screen that? You cannot really just send all of those structures to a lab to synthesize. It's too much, right. That's when MatterSim comes in again. So MatterSim comes in and screens all those structures and sees which ones are the most likely to be synthesized and which ones have the property closest to what you wanted. And then after screening, you probably get the five or 10 top candidates, and then you send them to a lab. Boom, everything goes down. That's it.
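As a minimal sketch of that loop in Python, with hypothetical generate and simulate callables standing in for MatterGen and MatterSim (the released packages expose their own APIs, which differ from this):

```python
from typing import Callable, List, Tuple

def design_loop(
    generate: Callable[[dict, int], List[object]],      # MatterGen-like sampler
    simulate: Callable[[object], Tuple[float, float]],  # MatterSim-like scorer
    target: dict,
    n_candidates: int = 10_000,
    n_keep: int = 10,
    stability_cutoff: float = 0.1,  # assumed threshold, eV/atom above hull
) -> List[object]:
    """One turn of the generator/goalkeeper flywheel: propose many
    structures for a target property, score stability and property value
    in silico, and keep a short list of candidates to send to the lab."""
    candidates = generate(target, n_candidates)
    scored = []
    for structure in candidates:
        e_above_hull, prop_value = simulate(structure)
        if e_above_hull < stability_cutoff:        # goalkeeper: likely stable
            scored.append((abs(prop_value - target["value"]), structure))
    scored.sort(key=lambda pair: pair[0])          # closest to the target first
    return [structure for _, structure in scored[:n_keep]]
```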
KALTER: I'm wondering if there's any prior research or advancements that you drew from in creating MatterGen and MatterSim. Were there any specific breakthroughs that influenced your approaches at all?

LU: Thanks, Lindsay. I think I'll take that question first. So interestingly, for MatterSim, a very fundamental idea was drawn from Chi Chen, who was a previous lab mate of mine and now also works for Microsoft, at Microsoft Quantum. He made this fantastic model named M3GNet, which is a prior form of a lot of these large-scale models for atomistic simulations. That model, M3GNet, actually resolves the near-ground-state prediction problem. I mean, the near-ground-state problem sounds like a fancy but not realistic term, but what it actually means is that it can simulate materials at near-zero-Kelvin states, so basically at very low temperatures. So at that time, we were thinking, since the models are now able to simulate materials at their near ground states, that's not a very large space. But if you look at other, larger models, like GPT, whatever, those models are large enough to simulate the entire human language. So it's possible to really extend the capability of such prior models to a very large space. Because we believed in the capability of AI, that really drove us to use MatterSim to learn the entire space of materials. I mean, the entire space really means the entire periodic table, all the temperatures and the pressures people can actually grasp.

XIE: Yeah, I still remember a lot of the amazing works from Chi Chen when we were, kind of, back working on property-prediction models. So, yeah, the problem of generating materials from properties is actually a pretty old one. I still remember back in 2018, when I was working on CGCNN (crystal graph convolutional neural networks) and giving a talk about property-prediction models, one of the first questions people asked was, OK, can you invert this process? Instead of going from material structure to properties, can you inversely generate the materials directly from their property conditions? In a way, this is, kind of, like a dream for materials scientists; some people even call it, like, a holy grail, because the end goal is really about finding the material whose properties, right, will satisfy your application. So I had been, kind of, thinking about this problem for a while, and there has also been a lot of work, right, over the past few years in the community to build generative models for materials. A lot of people tried before, like 2020, using ideas like VAEs or GANs. But it's hard to represent materials in those types of generative model architectures, and many of those models generated relatively poor candidates. So I thought it was a hard problem. I, kind of, knew it for a while. But there were no good solutions back then.

So I started to focus more on this problem during my postdoc, when I studied it in 2020 and kept working on it in 2021. At the beginning, I wasn't really sure exactly what approach to take because it's, kind of, an open question, and I really tried a lot of random ideas. So one day, actually, in my group back then with Tommi Jaakkola and Regina Barzilay at MIT's CSAIL (Computer Science & Artificial Intelligence Laboratory), we, kind of, got to know this method called the diffusion model. It was a very early stage for diffusion models back then, but they already began to show very promising signs, kind of, achieving state of the art in many problems like 3D point cloud generation and 3D molecular conformer generation. The works that really inspired me a lot were two works on molecular conformer generation: one is ConfGF, and one is GeoDiff. They, kind of, inspired me to focus more on diffusion models. That actually led to CDVAE (crystal diffusion variational autoencoder). It's interesting that we, kind of, spent like a couple of weeks trying all these diffusion ideas, and without that much work, it actually worked quite out of the box. At that time, CDVAE achieved much better performance than any previous model in materials generation, and we were, kind of, super happy with that.

So after CDVAE, I joined Microsoft, now working with more people together on this problem of generative models for materials. We, kind of, knew what the limitations of CDVAE were: it can do unconditional material generation well, meaning it can generate novel material structures, but it is very hard to use CDVAE to do property-guided generation. Basically, it uses an architecture called a variational autoencoder, where you have a latent space. So the way you do property-guided generation there is to do, kind of, a gradient update inside the latent space. But because the latent space wasn't learned very well, you cannot do, kind of, good property-guided generation. We only managed to do energy-guided generation, but it wasn't successful in going beyond energy. So that got us really thinking, right: how can we make the property-guided generation much better?

So I remember, like, one day, actually, my colleague Daniel Zügner showed me this blog post which basically explains the idea of classifier-free guidance, which is the powerhouse behind text-to-image generative models. And so, yeah, then we began to think about, can we actually make the diffusion model work for classifier-free guidance? That led us to remove the, kind of, variational autoencoder component from CDVAE and begin to work on a pure diffusion architecture. There was, kind of, a lot of development around that. But it turns out that classifier-free guidance is really the key to making property-guided generation work, and then, combined with a lot more effort in, kind of, improving the architecture, generating more data, and trying out all these different downstream tasks, that ended up leading to MatterGen as we see it today.
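For readers unfamiliar with classifier-free guidance, the core trick is small enough to sketch. This is the generic formulation from the image-generation literature, not MatterGen's code; the arrays below are stand-ins.

```python
import numpy as np

def guided_score(eps_cond: np.ndarray, eps_uncond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: blend the denoiser's conditional and
    unconditional outputs. w = 0 ignores the condition, w = 1 is the plain
    conditional model, and w > 1 pushes samples harder toward the prompt."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Stand-ins for one network evaluated with and without the property prompt;
# during training the prompt is randomly dropped, so a single model learns
# both the conditional and the unconditional score.
rng = np.random.default_rng(0)
eps_uncond = rng.normal(size=3)
eps_cond = rng.normal(size=3)
print(guided_score(eps_cond, eps_uncond, w=2.0))
```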
KALTER: Yeah, I think you've both done a really great job of explaining how MatterGen and MatterSim work together and how MatterGen can offer a lot in terms of reducing the amount of time and work that goes into finding new materials. Tian, how does the process of using MatterGen to generate materials translate into real-world applications?

XIE: Yeah, that's a fantastic question. So one way that I think about MatterGen, right, is that you can think about it as, like, a copilot for materials scientists. It can help you come up with, kind of, potentially good hypotheses for the materials design problems that you're working on. Say you're trying to design a battery, right. You may have some ideas about, OK, what candidates you want to make, but this is, kind of, based on your own experience, right. Depths of experience as a researcher. But MatterGen is able to, kind of, learn from a very broad set of data, so it may be able to come up with some good suggestions, even surprising suggestions, for you, so that you can try them out, right, both with computation and even, one day, synthesize them experimentally in a wet lab. But I also want to note that, in a way, this is still an early stage for generative AI in materials, meaning that I don't expect all the candidates MatterGen generates will, kind of, suit your needs, right. So you still need to look into them with expertise or with some kind of computational screening. But...

KALTER: I want to pivot a little bit to the MatterSim side of things. I know identifying new combinations of compounds is key to meeting changing needs for things like sustainable materials. But testing them is equally important to developing materials that can be put to use. Ziheng, how does MatterSim handle the uncertainty of how materials behave under various conditions, and how do you ensure that the predictions remain robust despite the inherent complexity of molecular systems?

LU: Thanks. That's a very, very good question. So uncertainty quantification is key to making sure all these predictions and simulations are trustworthy. And that's actually one of the questions we get almost every time after a presentation. People, especially those experimentalists, will ask, well, I've been using your model; how do I know those predictions are true under the very complex conditions I'm using in my experiments? So to understand how we deal with uncertainty, we need to know how MatterSim really functions in predicting an arbitrary property, especially under the conditions you want, like the temperature and pressure. That would be quite complex, right?
So in the ideal case, we would hope that by using MatterSim, you can directly simulate the properties you want using molecular dynamics combined with statistical mechanics. If so, it would be easy to quantify the uncertainty because there are just two parts: the error from the model and the error from the simulations, the statistical mechanics. The error from the model can be measured by what we call an ensemble. Basically, you start with different random seeds when you train the model, and then when you predict your property, you use several models from the ensemble, and you get different numbers. If the variance of the numbers is very large, you'd say the prediction is not that trustworthy. But a lot of the time, we see the variance is very small. So basically, when an ensemble of several different models gives you almost exactly the same number, you can be quite sure that the number is somehow very, like, useful. So that's one level of how we get a property.

But sometimes it's very hard to directly simulate the property you want. For example, for catalytic processes, it's very hard to imagine how you really get those coefficients. It's very hard. The process is just too complicated. So for those processes, what we do is use what we call embeddings learned from the entire materials space, basically the vector we learn for any arbitrary material. And then, starting from that, we build a very shallow layer of a neural network to predict the property. But that also means you need to bring in some of your own experimental or simulation data. And for that way of predicting a property, measuring the uncertainty still has the two levels, right. We don't really have the statistical error anymore, but what we have is, like, only the model error. So you can still stick to the ensemble, and then it will work, right. So, to be short, MatterSim can provide you an uncertainty that tells you whether a prediction is trustworthy or not.
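Here is a toy sketch of that second route, with a tanh projection standing in for the real learned embedding and synthetic data throughout; nothing below is MatterSim's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))

def embed(features: np.ndarray) -> np.ndarray:
    """Stand-in for the frozen embedding a foundation model assigns to a
    material; in reality this would come from the trained network."""
    return np.tanh(features @ W)

# The small labelled dataset a user brings (synthetic numbers here).
X = rng.random((50, 8))                        # stand-in structure descriptors
y = X.sum(axis=1) + rng.normal(0.0, 0.1, 50)   # stand-in measured property

# The 'very shallow layer' on top of the embeddings: a linear head fitted
# by least squares; its uncertainty can again be estimated via an ensemble.
Z = embed(X)
head, *_ = np.linalg.lstsq(Z, y, rcond=None)

x_new = rng.random((1, 8))
print((embed(x_new) @ head).item())            # predicted property for a new input
```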
KALTER: So in many ways, MatterSim is the realist in the equation, and it's there to, sort of, be a gatekeeper for MatterGen, which is the idea generator.

XIE: I really like the analogy.

LU: Yeah.

KALTER: As is the case with many AI models, the development of MatterGen and MatterSim relies on massive amounts of data. And here you use simulation to create the needed training data. Can you talk about that process and why you've chosen that approach, Tian?

XIE: So one advantage here is that we can really use large-scale simulation to generate data. We have a lot of compute here at Microsoft on our Azure platform, right. So how we generate the data is that we use a method called density functional theory, DFT, which is a quantum mechanical method, and we use a simulation workflow built on top of DFT to simulate the stability of materials. What we do is curate a huge number of material structures from multiple different sources of open data, mostly the Materials Project and the Alexandria database; in total, there are around 3 million materials candidates coming from these two databases. But not all of these structures are stable. So we use DFT to compute their stability and filter down the candidates to make sure that our training data only contains the most stable ones. This leads to around 600,000 training structures, which were used to train the base model of MatterGen.

I want to note that we actually also use MatterSim as part of the workflow, because MatterSim can prescreen unstable candidates so that we don't need to run DFT on all of them. I think in the end, we ran around 1 million DFT calculations, where two-thirds of the candidates had already been filtered out by MatterSim, which saved us a lot of compute in generating our training data.

LU: Tian, that was a very good description of how we really get those ground-state structures for the MatterGen model. Actually, we've also been using MatterGen for MatterSim, to get its training data. If you think about the simulation space of materials, it's extremely large. We think of it as having three axes: the elements, the temperature, and the pressure. If you think about existing databases, they have pretty good coverage of the elements space. If we think about the Materials Project or NOMAD, they really have very good coverage of lithium oxide, lithium sulfide, hydrogen sulfide, whatever, those different ground-state structures. But they don't really tell you how these materials behave under a certain temperature and pressure, especially under those extreme conditions, like the 1,600 Kelvin you would really use to synthesize your materials. That's where we focused in generating the data for MatterSim. So it's really easy to think about how we generate the data, right. You put your wanted material into a pressure cooker, basically molecular dynamics; it can simulate the material's behavior at that temperature and pressure. So that's it. Sounds easy, right? But that's not true, because what we want is not one single material. What we want is the entire materials space. That would make the effort almost impossible because the space is just so large. So that's where we developed this active learning pipeline. Basically, what we do is generate a lot of these structures for different elements and temperatures and pressures. Really, really a lot. And then we ask the active learning, or the uncertainty measurements, to really say whether the model knows about a structure already. If the model thinks, well, I think I know this structure already, then we don't calculate this structure using density functional theory, as Tian just said. This saves us, like, 99% of the effort in generating the data. So in the end, by combining this molecular dynamics, basically the pressure cooker, together with active learning, we gathered around 17 million data points for MatterSim. That was used to train the model. And now it can cover the entire periodic table and a lot of temperatures and pressures.
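The uncertainty-gated loop Lu describes can be sketched generically; the ensemble, the MD snapshots, and the threshold below are all synthetic stand-ins, not the actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_energies(snapshot: np.ndarray) -> np.ndarray:
    """Stand-in for an ensemble of models trained from different random
    seeds; returns one energy prediction per ensemble member."""
    return snapshot.sum() + rng.normal(0.0, rng.random() * 0.2, size=5)

# Active-learning filter: label with expensive DFT only the MD snapshots
# the current model is uncertain about (high ensemble disagreement).
THRESHOLD = 0.1  # assumed spread threshold, e.g. in eV/atom
needs_dft = []
for snapshot in rng.random((1000, 8)):        # stand-in 'pressure cooker' frames
    spread = ensemble_energies(snapshot).std()
    if spread > THRESHOLD:                    # model does not know this region yet
        needs_dft.append(snapshot)            # send to DFT, then retrain

print(f"{len(needs_dft)} of 1000 snapshots sent to DFT")
```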
KALTER: Thank you, Ziheng. Now, I'm sure this is not news to either one of you, given that you're both at the forefront of these efforts, but there is a growing number of tools aimed at advancing materials science. So what is it about MatterGen and MatterSim, in their approach or capabilities, that distinguishes them?

XIE: Yeah, I think I can start. So in the past year, there has been huge interest in building generative AI tools for materials. We have seen lots and lots of innovations from the community published in top conferences like NeurIPS, ICLR, ICML, etc. So I think what distinguishes MatterGen, in my view, are two things. First is that we trained with a very big dataset that we curated very, very carefully, and we also spent quite a lot of time refining our diffusion architecture, which means that our model is capable of generating very, kind of, high-quality, highly stable and novel materials. We have, kind of, a bar plot in our paper showcasing the advantage of our performance. I think that's one key aspect. And the second aspect, which in my view is even more important, is the ability to do property-guided generation. Many of the works we saw in the community are more focused on the problem of crystal structure prediction, which MatterGen can also do, but we focus more on property-guided generation because we think this is one of the key problems that materials scientists really care about. So the ability to do a very broad range of property-guided generation, and we have, kind of, both computational and now experimental results to validate those, I think that's the second strong point for MatterGen.

KALTER: Ziheng, do you want to add to that?

LU: Yeah, thanks, Lindsay. So on the MatterSim side, I think it's really the diverse conditions it can handle that make a difference. We've been talking about how the training data we collected really covers the entire periodic table and, more importantly, temperatures from 0 Kelvin to 5,000 Kelvin and pressures from 0 gigapascals to 1,000 gigapascals. That really covers what humans can control nowadays. I mean, it's very hard to go beyond that. If you know anyone who can go beyond that, let me know. So that really makes MatterSim different. Like, it can handle realistic conditions. Beyond that, I would say the combo between MatterSim and MatterGen really makes this set of tools different. Previously, a lot of people were using these atomistic simulators and these generative models alone. But if you think about it, now that we have these two foundation models together, they really can make things different, right. So we have the predictor; we have the generator. You have a very good idea generator, and you have a very good goalkeeper, and you put them together. They form a loop. And now you can use this loop to design materials really quickly. So I would say, to me, now, when I think about it, it's really the combo that makes this set of tools different.

KALTER: I know that I've spoken with both of you recently about how there's so much excitement around this, and it's clear that we're on the precipice of this, as both of you have called it, paradigm shift. And Microsoft places a very strong emphasis on ensuring that its innovations are grounded in reality and capable of addressing real-world problems. So with that in mind, how do you balance the excitement of scientific exploration with the practical challenges of implementation? Tian, do you want to take this?

XIE: Yeah, I think this is a very, very important point, because there is so much hype around AI happening right now, right. We must be very, very careful about the claims we are making so that people will not have unrealistic expectations, right, about what these models can do. So for MatterGen, we're pretty careful about that. We're basically trying to say that this is an early stage of generative AI in materials, where the model will improve over time quite significantly, but you should not say, oh, all the materials generated by MatterGen are going to be amazing. That's not what is happening today.
So we try to be very careful in understanding how far MatterGen already is in its ability to design materials with real-world impact. Therefore, we went all the way to synthesize one material that was generated by MatterGen. This material is called tantalum chromium oxide (TaCr2O6). It is a new material; it had not been discovered before. And it was generated by MatterGen by conditioning on a bulk modulus equal to 200 gigapascals. Bulk modulus measures how hard the material is to compress. We ended up measuring the synthesized material experimentally, and the measured bulk modulus is 169 gigapascals, which is within 20% of the target. So this is a very good proof of concept, in our view, showing that you can actually give the model a prompt, right, and then MatterGen can generate a material, and the material actually has a property very close to your target. But it's still a proof of concept, and we're still working to see how MatterGen can design materials that are much more useful across a much broader range of applications. And I'm sure there will be more challenges we'll see along the way. But we're looking forward to working further with our experimental partners to, kind of, push this forward, and also working with MatterSim, right, to see how these two tools can be used to design really useful materials and bring this into real-world impact.

LU: Yeah, Tian, I think that's very well said. It's not really only for MatterGen. For MatterSim, we're also very careful, right. We really want to make sure that people understand how these models behave under their instructions and understand what they can do and what they cannot do. I think one thing we really care about is that in the next few, maybe one or two, years, we want to really work with our experimental partners to make realistic materials, like, in different areas, so that even we can better understand the limitations and, at the same time, explore the forefront of materials science to make this excitement come true.

KALTER: Ziheng, could you give us a concrete example of what exactly MatterSim is capable of doing?

LU: Now MatterSim can really do, like, whatever you have on a potential energy surface. What that means is, like, anything that can be simulated with the energies and forces and stresses alone. To give you an example, the first one would be the stability of a material. Basically, you input a structure, and from the energies of the relaxed structures, you can really tell whether the material is likely to be stable at that composition, right. Another example would be thermal conductivity. Thermal conductivity is a fundamental property of materials that tells you how fast heat can transfer in the material, right. MatterSim can really simulate how fast this heat can go through your diamond, your graphene, your copper. So basically, those are two examples based on energies and forces alone. But there are things MatterSim cannot do, at least for now. For example, you cannot really do anything related to electronic structures. So you cannot really compute the light absorption of a semitransparent material. That would be a no-no for now.
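As a concrete starting point, the released MatterSim package exposes an ASE calculator; the snippet below follows the usage pattern documented in the public repo, though the exact import path and arguments are worth verifying against its README.

```python
import torch
from ase.build import bulk
from mattersim.forcefield import MatterSimCalculator  # per the public repo

device = "cuda" if torch.cuda.is_available() else "cpu"
atoms = bulk("Si", "diamond", a=5.43)        # a known structure as a smoke test
atoms.calc = MatterSimCalculator(device=device)

# Energies, forces, and stresses are the quantities everything else builds
# on: relaxation, stability estimates, and MD-based properties such as
# thermal conductivity all start from these outputs.
print(atoms.get_potential_energy())          # eV
print(atoms.get_forces().shape)              # (n_atoms, 3)
```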
KALTER: It's clear from speaking with researchers on both MatterSim and MatterGen that despite these very rapid advancements in technology, you take very seriously the responsibility to consider the broader implications and the challenges that are still ahead. How do you think about the ethical considerations of creating entirely new materials and simulating their properties, particularly in terms of things like safety, sustainability, and societal impact?

XIE: Yeah, that's a fantastic question. So it's extremely important that we make sure these AI tools are not misused. A potential misuse, right, as you just mentioned, is that people begin to use these AI tools, MatterGen, MatterSim, to, kind of, design harmful materials. There has actually been extensive discussion of how generative AI tools originally built for drug design could be misused to create bioweapons. At Microsoft, we take this very seriously because we believe that when we create new technologies, we must also ensure that the technology is used responsibly. So we have an extensive process to ensure that all of our models respect those ethical considerations. In the meantime, as you mentioned sustainability and societal impact, right, there is a huge amount these AI tools, MatterGen and MatterSim, can do for sustainability, because a lot of sustainability challenges are really, in the end, materials design challenges, right. So I think MatterGen and MatterSim can really help with that: helping us alleviate climate change and having a positive impact on the broader society.

KALTER: And, Ziheng, how about from a simulation standpoint?

LU: Yeah, I think Tian gave a very good, like, description. At Microsoft, we are really careful about these ethical, like, considerations. So I would add a little bit on the more, like, bright side of things. So MatterSim really carries out these simulations at atomic scales. One thing you can think about is the educational purpose. Back in my bachelor's and PhD period, I would sit, like, at the table and really grab a pen to deal with those very complex equations and get at the statistics using my pen. It's really painful. But now with MatterSim, these simulation tools at the atomic level, what you can do is really simulate the reactions, the movement of atoms, at atomic scale in real time. You can really see the chemical reactions and see the statistics. So you get, really, the feeling, a very direct feeling, of how the system works instead of just working on those toy systems with your pen. I think it's going to be a very good educational tool, MatterSim, yeah. Also MatterGen. MatterGen, as a generative tool producing those i.i.d. (independent and identically distributed) samples, will be a perfect example to show students how the Boltzmann distribution works. I think, Tian, you will agree with that, right?

XIE: 100%. Yeah, I really, really like the example Ziheng mentioned about the educational purposes. I still remember, like, when I was taking a materials simulation class, right. Everything was DFT. You, kind of, need to wait for an hour, right, to get some simulation. Maybe then you'll make some animation. Now you can do this in real time.
This is, like, a huge step forward, right, for our young researchers to, kind of, gain a sense, right, of how atoms interact at the atomic level.

LU: Yeah, and the results are really, I mean, true; not really those toy models. I think it's going to be very exciting stuff.

KALTER: And, Tian, I'm directing this question to you, even though, Ziheng, I'm sure you can chime in as well. But, Tian, I know that you and I have previously discussed this specifically. I know you said back in, you know, 2017, 2018, that you knew an AI-based approach to materials science was possible, but that even you were surprised by how far the technology has come so fast in aiding this area. What is the status of these tools right now? Are they in use? And if so, who are they available to? And, you know, what's next for them?

XIE: Yes, this is a fantastic question, right. I think for AI generative tools like MatterGen, as I said many times earlier, it's still in its early stages. MatterGen is the first tool where we managed to show that generative AI can enable very broad property-guided generation, and we have managed to obtain experimental validation to show it's possible. But it will take more work to show, OK, it can actually design batteries, can design solar cells, right; it can design really useful materials in these broader domains. So this is, kind of, exactly why we are now taking a pretty open approach with MatterGen. We make our code, our training data, and our model weights available to the general public. We're really hoping the community can use our tools on the problems they care about and even build on top of them. In terms of what's next, I always like to use what happened with generative AI for drugs, right, to, kind of, predict how generative AI will impact materials. Three years ago, there was a lot of research around generative models for drugs, first coming from the machine learning community, right. Then all the big drug companies began to take notice, and then researchers in these drug companies began to use these tools in actual drug design processes. From my colleague Marwin Segler, because he, kind of, works together with Novartis in the Microsoft and Novartis collaboration, he has basically been telling me that at the beginning, all the chemists in the drug companies were very suspicious, right. The molecules generated by these generative models all looked a bit weird, so they didn't believe they would work. But once these chemists saw one or two examples that actually turned out to perform pretty well in the experimental results, they began to build more trust, right, in these generative AI models. And today, these generative AI tools are part of the standard drug discovery pipeline that is widely used in all the drug companies. That is today. So I think generative AI for materials is going through a very similar period. People will have doubts; people will have suspicions at the beginning. But I think in three years, right, it will become a standard tool for how people design new solar cells, design new batteries, and many other different applications.

KALTER: Great. Ziheng, do you have anything to add to that?

LU: So actually, for MatterSim, we released the model, I think, back in December of last year, I mean, both the weights and the models, right. So we're really grateful for how much the community has contributed to the repo.
And now, I mean, we really welcome the community to contribute more to both MatterSim and MatterGen via our open-source code bases. So, I mean, the community effort is really important, yeah.

KALTER: Well, it has been fascinating to pick your brains, and as we close, you know, I know that you're both capable of quite a bit, which you have demonstrated. I know that asking you to predict the future is a big ask, so I won't explicitly ask that. But just as a fun thought exercise, let's fast-forward 20 years and look back. How have MatterGen and MatterSim and the big ideas behind them impacted the world, and how are people better off because of how you and your teams have worked to make them a reality? Tian, do you want to start?

XIE: Yeah, I think one of the biggest challenges our human society is going to face, right, in the next 20 years is going to be climate change, right, and there are so many materials design problems people need to solve in order to properly handle climate change, like finding new materials that can absorb CO2 from the atmosphere to create a carbon capture industry, or battery materials that are able to do large-scale energy grid storage so that we can fully utilize all the wind power and the solar power, etc., right. So if you want me to make one prediction, I really believe that these AI tools, like MatterGen and MatterSim, are going to play a central role in our ability to design these new materials for climate problems. Therefore, in 20 years, I would like to see that we have already solved climate change, right. We have large-scale energy storage systems that were designed by AI, so that we have basically removed all the fossil fuels, right, from our energy production, and for the rest of the carbon emissions that are very hard to remove, we will have a carbon capture industry with materials designed by AI that absorb the CO2 from the atmosphere. It's hard to predict exactly what will happen, but I think AI will play a key role, right, in defining how our society will look in 20 years.

LU: Tian, very well said. So I think instead of really describing the future, I would quote a science fiction scene from Iron Man. Basically, in 20 years, when we want to get a new material, we will just sit in an office and say, "Well, J.A.R.V.I.S., can you design us a new material that really fits my newest MK 7 suit?" And that will be it. It will run automatically, and we'll get this auto lab running, with all those MatterGen and MatterSim AI models running, and then probably in a few hours, in a few days, we get the material.

KALTER: Well, I think I speak for many people from several industries when I say that I cannot wait to see what is on the horizon for these projects. Tian and Ziheng, thank you so much for joining us on Ideas. It's been a pleasure.

[MUSIC]

XIE: Thank you so much.

LU: Thank you.

[MUSIC FADES]