WWW.MICROSOFT.COM
Research Focus: Week of April 21, 2025
In this issue: Catch a preview of our presentations and papers at CHI 2025 and ICLR 2025. We also introduce new research on causal reasoning and LLMs; enhancing LLM jailbreak capabilities to bolster safety and robustness; understanding how people using AI compared to AI-alone, and Distill-MOS, a compact and efficient model that delivers state-of-the-art speech quality assessment. You’ll also find a replay of a podcast discussion on rural healthcare innovation with Senior Vice President of Microsoft Health Jim Weinstein. CONFERENCE Microsoft at CHI 2025 Microsoft Research is proud to be a sponsor of the ACM Computer Human Interaction (CHI) 2025 Conference on Human Factors in Computing Systems (opens in new tab). CHI brings together researchers and practitioners from all over the world and from diverse cultures, backgrounds, and positionalities, who share an overarching goal to make the world a better place with interactive digital technologies. Our researchers will host more than 30 sessions and workshops at this year’s conference in Yokohama, Japan. We invite you to preview our presentations and our two dozen accepted papers. Microsoft @CHI 2025 CONFERENCE Microsoft at ICLR 2025 Microsoft is proud to be a sponsor of the thirteenth International Conference on Learning Representations (ICLR). This gathering is dedicated to the advancement of representation learning, which is a branch of AI. We are pleased to share that Microsoft has more than 30 accepted papers at this year’s conference, which we invite you to preview. ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics. Microsoft @ICLR 2025 NEW RESEARCH Causal Reasoning and Large Language Models: Opening a New Frontier for Causality What kinds of causal arguments can large language models (LLMs) generate, how valid are these arguments, and what causal reasoning workflows can this generation support or automate? This paper, which was selected for ICLR 2025, clarifies this debate. It advances our understanding of LLMs and their causal implications, and proposes a framework for future research at the intersection of LLMs and causality. This discussion has critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. In capturing common sense and domain knowledge about causal mechanisms and supporting translation between natural language and formal methods, LLMs open new frontiers for advancing the research, practice, and adoption of causality. Read the paper NEW RESEARCH The Future of AI in Knowledge Work: Tools for Thought at CHI 2025 Can AI tools do more than streamline workflows—can they actually help us think better? That’s the driving question behind the Microsoft Research Tools for Thought initiative. At this year’s CHI conference, this group is presenting four new research papers and cohosting a workshop that dives deep into this intersection of AI and human cognition. The team provides an overview of their latest research, starting with a study on how AI is changing the way people think and work. They introduce three prototype systems designed to support different cognitive tasks. Finally, through their Tools for Thought workshop, they invite the CHI community to help define AI’s role in supporting human thinking. Read the blog NEW RESEARCH Building LLMs with enhanced jailbreaking capabilities to bolster safety and robustness Recent research shows that LLMs are vulnerable to automated jailbreak attacks, where algorithm-generated adversarial suffixes bypass safety alignment and trigger harmful responses. This paper introduces ADV-LLM, an iterative self-tuning process for crafting adversarial LLMs with enhanced jailbreak capabilities—which could provide valuable insights for future safety alignment research. ADV-LLM is less computationally expensive than prior mechanisms and achieves higher attack success rates (ASR), especially against well-aligned models like Llama2 and Llama3. It reaches nearly 100% ASR on various open-source LLMs and demonstrates strong transferability to closed-source models—achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4—despite being optimized solely on Llama3. Beyond improving jailbreak performance, ADV-LLM offers valuable insights for future alignment research by enabling large-scale generation of safety-relevant datasets. Read the paper NEW RESEARCH ChatBench: From Static Benchmarks to Human-AI Evaluation The rapid adoption of LLM-based chatbots raises the need to understand what people and LLMs can achieve together. However, standard benchmarks like MMLU (opens in new tab) assess LLM capabilities in isolation (i.e., “AI alone”). This paper presents the results of a user study that transforms MMLU questions into interactive user-AI conversations. The researchers seeded the participants with the question and then had them engage in a conversation with the LLM to arrive at an answer. The result is ChatBench, a new dataset comprising AI-alone, user-alone, and user-AI data for 396 questions and two LLMs, including 144,000 answers and 7,336 user-AI conversations. The researchers’ analysis reveals that AI-alone accuracy does not predict user-AI accuracy, with notable differences across subjects such as math, physics, and moral reasoning. Examining user-AI conversations yields insights into how these interactions differ from AI-alone benchmarks. Finally, the researchers demonstrate that finetuning a user simulator on a subset of ChatBench improves its ability to predict user-AI accuracy, boosting correlation on held-out questions by more than 20 points, thereby enabling scalable interactive evaluation. Read the paper NEW RESEARCH Distill-MOS: A compact speech-quality assessment model  Distill-MOS is a compact and efficient speech quality assessment model with dramatically reduced size—over 100x smaller than the reference model—enabling efficient, non-intrusive evaluation in real-world, low-resource settings.  This paper investigates the distillation and pruning methods to reduce model size for non-intrusive speech quality assessment based on self-supervised representations. The researchers’ experiments build on XLS-R-SQA, a speech quality assessment model using wav2vec 2.0 XLS-R embeddings. They retrain this model on a large compilation of mean opinion score datasets, encompassing over 100,000 labeled clips.  Read the paper View GitHub PODCAST Collaborating to Affect Change for Rural Health Care with Innovation and Technology Senior Vice President of Microsoft Health Jim Weinstein joins Dan Liljenquist, Chief Strategy Officer from Intermountain Health, on the NEJM Catalyst podcast for a discussion of their combined expertise and resources and their collaboration to address healthcare challenges in the rural United States. These challenges include limited access to care, rising mortality rates, and severe staffing shortages. Working together, they aim to create a scalable model that can benefit both rural and urban health care systems. Key goals include expanding access through telemedicine and increasing cybersecurity, ultimately improving the quality of care delivered and financial stability for rural communities. Listen to the podcast PODCAST Empowering patients and healthcare consumers in the age of generative AI Two champions of patient-centered digital health join Microsoft Research President Peter Lee to talk about how AI is reshaping healthcare in terms of patient empowerment and emerging digital health business models. Dave deBronkart, a cancer survivor and longtime advocate for patient empowerment, discusses how AI tools like ChatGPT can help patients better understand their conditions, navigate the healthcare system, and communicate more effectively with clinicians. Christina Farr, a healthcare investor and former journalist, talks about the evolving digital health–startup ecosystem, highlighting where AI is having the most meaningful impact—particularly in women’s health, pediatrics, and elder care. She also explores consumer trends, like the rise of cash-pay healthcare.  Listen to the podcast PODCAST Beyond the Image: AI’s Expanding Role in Healthcare Jonathan Carlson, Managing Director of Microsoft Research Health Futures, joins the Healthcare Unfiltered show to explore the evolution of AI in medicine, from the early days to cutting-edge innovations like ambient clinical intelligence. This podcast explores how pre-trained models and machine learning are transforming care delivery, as well as the future of biomedicine and healthcare, including important ethical and practical questions. Listen to the podcast Opens in a new tab
0 Kommentare 0 Anteile 55 Ansichten