Research Focus: Week of December 16, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires, and other milestones from across the research community at Microsoft.

NEW RESEARCH

The Compute Express Link (CXL) open standard interconnect enables integration of diverse types of memory into servers via its byte-addressable SerDes links. To fully utilize CXL-based heterogeneous memory systems (which combine different types of memory with varying access speeds), it's necessary to implement efficient memory tiering: a strategy for managing data placement across memory tiers for optimal performance. Efficiently managing these memory systems is crucial but has been challenging due to the lack of precise and efficient tools for understanding how memory is accessed.

In a recent paper, NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering, researchers from Microsoft propose a novel solution featuring a hardware/software co-design to address this problem. NeoMem offloads memory profiling functions to CXL device-side controllers, integrating a dedicated hardware unit called NeoProf, which monitors memory accesses and provides the operating system (OS) with crucial page-hotness statistics and other system state information. On the OS kernel side, the researchers designed a revamped memory-tiering strategy, enabling accurate and timely hot-page promotion based on NeoProf statistics. Implemented on a real FPGA-based CXL memory platform and Linux kernel v6.3, NeoMem demonstrated a 32% to 67% geomean speedup over several existing memory tiering solutions.

Read the paper
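To make the tiering idea concrete, here is a minimal, purely illustrative sketch of the policy loop the paper describes: a device-side profiler reports per-page access counts, and the OS promotes pages that cross a hotness threshold. The function names, threshold, and random "profiler" below are hypothetical stand-ins for exposition; they are not NeoMem's actual kernel implementation.

```python
# Toy simulation of hot-page promotion driven by a device-side profiler.
# All names and values are illustrative, not NeoMem's real interfaces.

import random

PROMOTION_THRESHOLD = 64   # accesses per sampling window (made-up value)

def read_hotness_stats(num_pages):
    """Stand-in for polling a NeoProf-like profiler that returns per-page
    access counts for pages resident in slow CXL-attached memory."""
    return {page: random.randint(0, 128) for page in range(num_pages)}

def promote_page(page):
    """Stand-in for migrating a page from CXL memory to local DRAM."""
    print(f"promoting page {page} to the fast tier")

def tiering_tick(num_cxl_pages):
    """One pass of a simple policy: promote pages whose access count in the
    last sampling window exceeds the threshold."""
    stats = read_hotness_stats(num_cxl_pages)
    hot_pages = [p for p, count in stats.items() if count >= PROMOTION_THRESHOLD]
    for page in hot_pages:
        promote_page(page)
    return len(hot_pages)

if __name__ == "__main__":
    promoted = tiering_tick(num_cxl_pages=1024)
    print(f"promoted {promoted} pages this window")
```

The interesting engineering in NeoMem lies in making the profiling cheap and accurate enough (hence the hardware unit) and the promotion timely; the loop above only shows where those statistics plug in.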
NEW RESEARCH

Planning and conducting chemical syntheses is a significant challenge in the discovery of functional small molecules, which limits the potential of generative AI for molecular inverse design. Although early machine learning-based retrosynthesis models have shown the ability to predict reasonable routes, they are less accurate for infrequent, yet important reactions.

In a recent paper, Chimera: Accurate retrosynthesis prediction by ensembling models with diverse inductive biases, researchers from Microsoft and external colleagues address this limitation with a new framework for building highly accurate reaction models. Chimera incorporates two newly developed models, each achieving state-of-the-art performance in its respective category. Evaluations by PhD-level organic chemists show that Chimera's predictions are preferred for their higher quality compared to baseline models.

The researchers further validate Chimera's robustness by applying its largest-scale model to an internal dataset from a major pharmaceutical company, demonstrating its ability to generalize effectively under distribution shifts. This new framework shows the potential to substantially accelerate the development of even more accurate and versatile reaction prediction models.

Read the paper

Microsoft Research Podcast
Abstracts: August 15, 2024
Advanced AI may make it easier for bad actors to deceive others online. A multidisciplinary research team is exploring one solution: a credential that allows people to show they're not bots without sharing identifying information. Shrey Jain and Zoë Hitzig explain.
Listen now

NEW RESEARCH

In bioinformatics and computational biology, data analysis often involves chaining command-line programs developed by specialized teams at different institutions. These tools, which vary widely in age, software stacks, and dependencies, lack a common programming interface, which makes integration, workflow management, and reproducibility challenging.

A recent article describes the development, adoption, and implementation of the Global Alliance for Genomics and Health (GA4GH) Task Execution Service (TES) API, created in collaboration with researchers at Microsoft and other institutions. The TES API offers a unified schema and interface for submitting and managing tasks, seamlessly bridging gaps between on-premises high-performance and high-throughput computing systems, cloud platforms, and hybrid infrastructures. Its flexibility and extensibility have already made it a critical asset for applications ranging from federated data analysis to load balancing across multi-cloud systems.

Adopted by numerous service providers and integrated into several workflow engines, TES empowers researchers to execute complex computational tasks through a single, abstracted interface. This eliminates compatibility hurdles, accelerates research timelines, reduces costs, and enables compute-to-data solutions, which are essential for tackling the challenges of distributed data analysis.

Read the paper
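For readers unfamiliar with TES, the sketch below shows roughly what submitting a task to a TES-compatible endpoint looks like. The field names follow the published GA4GH TES schema, but the server URL, storage URLs, and container image are placeholders, and real deployments typically require authentication; treat this as a sketch and consult the TES specification and your provider's documentation before relying on it.

```python
# Minimal sketch of submitting a task to a GA4GH TES-compatible server.
# URLs, bucket names, and the container image are hypothetical placeholders.

import requests

TES_URL = "https://tes.example.org/ga4gh/tes/v1"   # hypothetical endpoint

task = {
    "name": "checksum-example",
    "resources": {"cpu_cores": 1, "ram_gb": 2.0},
    "inputs": [
        {"url": "s3://example-bucket/sample.bam", "path": "/data/sample.bam"}
    ],
    "outputs": [
        {"url": "s3://example-bucket/sample.md5", "path": "/data/sample.md5"}
    ],
    "executors": [
        {
            "image": "ubuntu:22.04",
            "command": ["bash", "-c", "md5sum /data/sample.bam > /data/sample.md5"],
        }
    ],
}

# Submit the task; the same JSON works against any compliant TES implementation.
resp = requests.post(f"{TES_URL}/tasks", json=task, timeout=30)
resp.raise_for_status()
task_id = resp.json()["id"]

# Poll the task for its current state (QUEUED, RUNNING, COMPLETE, ...).
status = requests.get(f"{TES_URL}/tasks/{task_id}", timeout=30).json()
print(task_id, status.get("state"))
```

The point of the standard is that this one request shape works whether the backend is an on-premises HPC cluster, a cloud batch service, or a hybrid setup, which is what lets workflow engines target TES instead of each scheduler individually.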
NEW RESEARCH

Increasing use of code agents for AI-assisted coding and software development has brought safety and security concerns, such as generating or executing malicious code, which have become significant barriers to real-world deployment of these agents.

In a recent paper, RedCode: Risky Code Execution and Generation Benchmark for Code Agents, published at NeurIPS 2024, researchers from Microsoft and external colleagues propose comprehensive and practical evaluations of the safety of code agents. RedCode is an evaluation platform with benchmarks grounded in four key principles: real interaction with systems, holistic evaluation of unsafe code generation and execution, diverse input formats, and high-quality safety scenarios and tests.

This research evaluated three agents based on various large language models (LLMs), providing insights into code agents' vulnerabilities. For instance, results showed that agents are more likely to reject executing unsafe operations on the operating system, and that unsafe operations described in natural text lead to a lower rejection rate than those in code format. Additional evaluations revealed that more capable base models and agents with stronger overall coding abilities, such as GPT-4, tend to produce more sophisticated harmful software.

These findings highlight the need for stringent safety evaluations for diverse code agents. The underlying dataset and related code are publicly available at https://github.com/AI-secure/RedCode.

Read the paper

NEW RESEARCH

Although large language models (LLMs) excel at language-focused tasks like news writing, document summarization, customer service, and supporting virtual assistants, they can face challenges when it comes to learning and inference on numeric and structured industry data, such as tabular and time series data. To address these issues, researchers from Microsoft propose a new approach to building industrial foundation models (IFMs). As outlined in a recent blog post, they have successfully demonstrated the feasibility of cross-domain universal in-context learning on tabular data and the significant potential it could achieve.

The researchers designed Generative Tabular Learning (GTL), a new framework that integrates multi-industry zero-shot and few-shot learning capabilities into LLMs. This approach allows the models to adapt and generalize to new fields, new data, and new tasks more effectively, flexibly responding to diverse data science tasks. This technical paradigm has been open-sourced to promote broader use.

Read the paper
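As a rough illustration of what in-context learning on tabular data means, the sketch below serializes a few labeled rows into a prompt and asks a model to label a new row. This is not the GTL framework itself; it only shows the style of serialized, few-shot prompt that zero- and few-shot tabular approaches operate on. The churn dataset is made up and the model call is left as a placeholder.

```python
# Illustrative few-shot prompt construction for tabular in-context learning.
# The data is synthetic and the LLM call is intentionally omitted.

examples = [
    {"age": 45, "tenure_months": 60, "monthly_charge": 20.5, "churned": "no"},
    {"age": 23, "tenure_months": 3,  "monthly_charge": 89.9, "churned": "yes"},
    {"age": 37, "tenure_months": 24, "monthly_charge": 45.0, "churned": "no"},
]
query = {"age": 29, "tenure_months": 2, "monthly_charge": 95.0}

def serialize_row(row):
    """Turn a tabular row into a compact textual description."""
    return ", ".join(f"{k} = {v}" for k, v in row.items())

prompt_lines = [
    "Task: predict whether a telecom customer will churn (yes/no).",
    "Here are labeled examples:",
]
for row in examples:
    features = {k: v for k, v in row.items() if k != "churned"}
    prompt_lines.append(f"- {serialize_row(features)} -> churned: {row['churned']}")
prompt_lines.append(f"Now predict for: {serialize_row(query)} -> churned:")

prompt = "\n".join(prompt_lines)
print(prompt)

# A model call such as send_to_llm(prompt) would go here; GTL's contribution is
# training LLMs so that this kind of cross-domain tabular prompting works well,
# rather than the prompting pattern itself.
```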
Microsoft Research in the news

Microsoft's smaller AI model beats the big guys: Meet Phi-4, the efficiency king
December 12, 2024
Microsoft launched a new artificial intelligence model today that achieves remarkable mathematical reasoning capabilities while using far fewer computational resources than its larger competitors.

Microsoft researcher Ece Kamar discusses the future of AI agents in 2025
Tech Brew | December 12, 2024
With AI agents widely expected to take off in 2025, the director of Microsoft's AI Frontiers lab weighs in on the future of this technology, the safeguards needed, and the year ahead in AI research.

A new frontier awaits computing with light
December 12, 2024
In the guts of a new type of computer, a bunch of tiny LEDs emit a green glow. Those lights have a job to do. They're performing calculations. Right now, this math is telling the computer how to identify handwritten images of numbers. The computer is part of a research program at Microsoft.

View more news and awards
Ideas: AI and democracy with Madeleine Daepp and Robert Osazuwa Ness

Transcript

[TEASER]
[MUSIC PLAYS UNDER DIALOGUE]
MADELEINE DAEPP: Last summer, I was working on all of these, like, pro-democracy applications, trying to build out, like, a social data collection tool with AI, all this kind of stuff. And I went to the elections workshop that the Democracy Forward team at Microsoft had put on, and Dave Leichtman, who, you know, was the MC of that work, was really talking about how big of a global elections year 2024 was going to be. Over 70 countries around the world. And, you know, we're coming from Microsoft Research, where we were so excited about this technology. And then, all of a sudden, I was at the elections workshop, and I thought, oh no, [LAUGHS] like, this is not good timing.
ROBERT OSAZUWA NESS: What are we really talking about in the context of deepfakes in the political context, elections context? It's deception, right. I'm trying to use this technology to, say, create some kind of false record of events in order to convince people that something happened that actually did not happen. And so that goal of deceiving, of creating a false record, that's kind of how I have been thinking about deepfakes in contrast to the broader category of generative AI.
[TEASER ENDS]
GINNY BADANES: Welcome to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward.
[MUSIC FADES]
I'm your guest host, Ginny Badanes, and I lead Microsoft's Democracy Forward program, where we've spent the past year deeply engaged in supporting democratic elections around the world, including the recent US elections. We have been working on everything from raising awareness of nation-state propaganda efforts to helping campaigns and election officials prepare for deepfakes to protecting political campaigns from cyberattacks. Today, I'm joined by two researchers who have also been diving deep into the impact of generative AI on democracy.
Microsoft senior researchers Madeleine Daepp and Robert Osazuwa Ness are studying generative AI's influence in the political sphere with the goal of making AI systems more robust against misuse while supporting the development of AI tools that can strengthen democratic processes and systems. They spent time in Taiwan and India earlier this year, where both had big democratic elections. Madeleine and Robert, welcome to the podcast!
MADELEINE DAEPP: Thanks for having us.
ROBERT OSAZUWA NESS: Thanks for having us.
BADANES: So I have so many questions for you all, from how you conducted your research to what you've learned, and I'm really interested in what you think comes next. But first, let's talk about how you got involved in this in the first place. Could you both start by telling me a little bit about your backgrounds and just what got you into AI research in the first place?
DAEPP: Sure. So I'm a senior researcher here at Microsoft Research in the Special Projects team. But I did my PhD at MIT in urban studies and planning. And I think a lot of folks hear that field and think, oh, you know, housing, like upzoning housing and figuring out transportation systems. But it really is a field that's about little-d democracy, right. About how people make choices about shared public spaces every single day.
You know, I joined Microsoft first off to run this, sort of, technology deployment in the city of Chicago, running a low-cost air-quality-sensor network for the city. And when GPT-4 came out, you know, first ChatGPT, and then we, sort of, had this big recognition of, sort of, how well this technology could do in summarizing and in representing opinions and in making sense of big unstructured datasets, right. I got actually very excited. Like, I thought this could be used for town planning processes. [LAUGHS] Like, I thought we could I had a whole project with a wonderful intern, Eva Maxfield Brown, looking at, can we summarize planning documents using AI? Can we build out policies from conversations that people have in shared public spaces? And so that was very much the impetus for thinking about how to apply and build things with this amazing new technology in these spaces.BADANES: Robert, I think your background is a little bit different, yet you guys ended up in a similar place. So how did you get there?NESS: Yeah, so Im also on Special Projects, Microsoft Research. My work is focusing on large language models, LLMs. And, you know, so I focus on making these models more reliable and controllable in real-world applications. And my PhD is in statistics. And so I focus a lot on using just basic bread-and-butter statistical methods totry and control and understand LLM behavior. So currently, for example, Im leading a team of engineers and running experiments designed to find ways to enhance a graphical approach to combining information retrieval in large language models. I work on statistical tests for testing significance of adversarial attacks on these models.BADANES: Wow.NESS: So, for example, if you find a way to trick one of these models into doing something its not supposed to do, I make sure that its not, like, a random fluke; that its something thats reproducible. And I also work at this intersection between generative AI and, you know, Bayesian stuff, causal inference stuff. And so I came at looking at this democracy work through an alignment lens. So alignment is this task in AI of making sure these models align with human values and goals. And what I was seeing was a lot of research in the alignment space was viewing it as a technical problem. And, you know, as a statistician, were trained to consult, right. Like, to go to the actual stakeholders and say, hey, what are your goals? What are your values? And so this democracy work was an opportunity to do that in Microsoft Research and connected with Madeleine. So she was planning to go to Taiwan, and kind of from a past life, I wanted to become a trade economist and learned Mandarin. And so I speak fluent Mandarin and seemed like a good matchup of our skill sets BADANES: Yeah.NESS: and interests. And so thats, kind of, how we got started.BADANES: So, Madeleine, you brought the two of you together, but what started it for you? This podcast is all about big ideas. What sparked the big idea to bring this work that youve been doing on generative AI into the space of democracy and then to go out and find Robert and match up together?DAEPP: Yeah, well, Ginny, it was you. [LAUGHS] It was actually your team.BADANES: I didnt plant that! [LAUGHS]DAEPP: So, you know, I think last summer, I was working on all of these like pro-democracy applications, trying to build out, like, a social data collection tool with AI, all this kind of stuff. 
And I went to the elections workshop that the Democracy Forward team at Microsoft had put on, and Dave Leichtman, who, you know, was the MC of that work, was really talking about how big of a global elections year 2024 was going to be, that thishe was calling it Votorama. You know, that term didnt take off. [LAUGHTER] The term that has taken off is biggest election year in history, right. Over 70 countries around the world. And, you know, were coming from Microsoft Research, where we were so excited about this technology. Like, when it started to pass theory of mind tests, right, which is like the ability to think about how other people are thinking, like, we were all like, oh, this is amazing; this opens up so many cool application spaces, right. When it was, like, passing benchmarks for multilingual communication, again, like, we were so excited about the prospect of building out multilingual systems. And then, all of a sudden, I was at the elections workshop, and I thought, oh no, [LAUGHS] this is not good timing.BADANES: Yeah DAEPP: And because so much of my work focuses on, you know, building out computer science systems like, um, data science systems or AI systems but with communities in the loop, I really wanted to go to the folks most affected by this problem. And so I proposed a project to go to Taiwan and to study one of the it was the second election of 2024. And Taiwan is known to be subject to more external disinformation than any other place in the world. So if you were going to see something anywhere, you would see it there. Also, it has amazing civil society response so really interesting people to talk to. But I do not speak, Chinese, right. Like, I dont have the context; I dont speak the language. And so part of my process is to hire a half-local team. We had an amazing interpreter, Vickie Wang, and then a wonderful graduate student, Ti-Chung Cheng, who supported this work. But then also my team, Special Projects, happened to have this person who, like, not only is a leading AI researcher publishing in NeurIPS, like building out these systems, but who also spoke Chinese, had worked in technology security, and had a real understanding of international studies and economics as well as AI. And so for me, like, finding Robert as a collaborator was kind of a unicorn moment.BADANES: So it sounds like it was a match made in heaven of skill sets and abilities. Before we get into what you all found there, which I do want to get into, I first think its helpfulI dont know, when were dealing with these, like, complicated issues, particularly things that are moving and changing really quickly, sometimes I found its helpful to agree on definitions and sort of say, this is what we mean when we say this word. And that helps lead to understanding. So while I know that this research is about more than deepfakesand well talk about some of the things that are more than deepfakesI am curious how you all define that term and how you think of it. Because this is something that I think is constantly moving and changing. So how have you all been thinking about the definition of that term?NESS: So Ive been thinking about it in terms of the intention behind it, right. We say deepfake, and I think colloquially that means kind of all of generative AI. Thats a bit unfortunate because there are things that are you know, you can use generative AI to generate cartoons BADANES: Right.NESS: or illustrations for a childrens book. 
And so in thinking about what we are really talking about in the context of deepfakes in the political context, the elections context, it's deception, right. I'm trying to use this technology to, say, create some kind of false record of events, say, for example, something that a politician says, in order to convince people that something happened that actually did not happen.
BADANES: Right.
NESS: And so that goal of deceiving, of creating a false record, that's kind of how I have been thinking about deepfakes in contrast to the broader category of generative AI, and deepfakes in terms of being a malicious use case. There are other malicious use cases that don't necessarily have to be deceptive, as well as positive use cases.
BADANES: Well, that really, I mean, that resonates with me because what we found was when you use the term deception (or another term we hear a lot that I think works is fraud) that resonates with other people, too. Like, that helps them distinguish between neutral uses or even positive uses of AI in this space and the malicious use cases, though to your point, I suppose there's probably even deeper definitions of what malicious use could look like. Are you finding that distinction showing up in your work between fraud and deception in these use cases? Is that something that has been coming through?
DAEPP: You know, we didn't really think about the term fraud until we started prepping for this interview with you. But it was clear in our interviews that using AI to deceive somebody financially was not OK, right. That's fraud. Using AI for the purposes of nudifying, like removing somebody's clothes and then sextorting them, right, extorting them for money out of fear that this would be shared, like, that was not OK. And those are such clear lines. And it was clear that there's a set of uses of generative AI also in the political space, you know, of saying this person said something that they didn't,
BADANES: Mm-hmm.
DAEPP: of voter suppression, that in general, there's a very clear line that when it gets into that fraudulent place, when it gets into that simultaneously deceptive and malicious space, that's very clearly a no-go zone.
NESS: Oftentimes during this research, I found myself thinking about this dichotomy in cybersecurity of state actors, or broadly speaking, kind of, political actors, versus criminals.
BADANES: Right.
NESS: And it's important to understand the distinction because criminals are typically trying to target targets of opportunity and make money, while state-sponsored agents are willing to spend a lot more money and have very specific targets and a very specific definition of success. And so, like, this fraud versus deception kind of feels like that a little bit in the sense that fraud is typically associated with criminal behavior, while, say, I might put out deceptive political messaging, but it might fall within the bounds of free speech within my country.
BADANES: Right, yeah.
NESS: And so this is not to say I disagree with that, but just that it could be a useful contrast in terms of thinking about the criminal versus the political uses, both legitimate and illegitimate.
BADANES: Well, I also think those of us who work in the AI space are dealing in very complicated issues that the majority of the world is still trying to understand. And so any time you can find a word that people understand immediately, it helps to do the, sort of, storytelling: the reason that we are worried about deepfakes in elections is because we do not want voters to be defrauded.
And that, we find really breaks through because people understand that term already. Thats a thing that they already know that they dont want to be; they do not want to be defrauded in their personal life or in how they vote. And so that really, I found, breaks through. But as much as I have talked about deepfakes, I know that youand I know theres a lot of interest in talking about deepfakes when we talk about this subjectbut I know your research goes beyond that. So what other forms of generative AI did you include in your research or did you encounter in the effort that you were doing both in Taiwan and India?DAEPP: Yeah. So let me tell you just, kind of, a big overview of, like, our taxonomy. Because as you said, like, so much of this is just about finding a word, right. Like, so much of it is about building a shared vocabulary so that we can start to have these conversations. And so when we looked at the political space, right, elections, so much of what it means to win an election is kind of two things. Its building an image of a candidate, right, or changing the image of your opposition and telling a story, right.BADANES: Mm-hmm.DAEPP: And so if you think about image creation, of course, there are deepfakes. Like, of course, there are malicious representations of a person. But we also saw a lot of what were calling auth fakes, like authorized fakes, right. Candidates who would actually go to a consultancy and, like, get their bodies scanned so that videos could be made of them. Theyd get their voices, a bunch of snippets of their voices, recorded so that then there could be personalized phone calls, right. So these are authorized uses of their image and likeness. Then we saw a term Ive heard in, sort of, the ether is soft fakes. So again, likenesses of a candidate, this time not necessarily authorized but promotional. They werent people on TwitterI guess, Xon Instagram, they were sharing images of the candidate that they supported that were really flattering or silly or, you know, just really sort of in support of that person. So not with malicious intent, right, with promotional intent. And then the last one, and this, I think, was Roberts term, but in this image creation category, you know, one thing we talked about was just the way that people were also making fun of candidates. And in this case, this is a bit malicious, right. Like, theyre making fun of people; theyre satirizing them. But its not deceptive because, BADANES: Right DAEPP: you know, often it has that hyper-saturated meme aesthetic. Its very clearly AI or just, you know, per like, sort of, US standards for satire, like, a reasonable person would know that it was silly. And so Robert said, you know, oh, these influencers, theyre not trying to deceive people; like, theyre not trying to lie about candidates. Theyre trying to roast them. [LAUGHTER] And so we called it a deep roast. So thats, kind of, the images of candidates. I will say we also looked at narrative building, and there, one really important set of things that we saw was what we call text to b-roll. So, you know, a lot of folks think that you cant really make AI videos because, like, Sora isnt out yet[1]. But in fact, what there is a lot of is tooling to, sort of, use AI to pull from stock imagery and b-roll footage and put together a 90-second video. You know, it doesnt look like AI; its a real video. So text to b- roll, AI pasta? 
So if you know the threat intelligence space, theres this thing called copy pasta, where people just BADANES: Sure.DAEPP: its just a fun word for copy-paste. People just copy-paste terms in order to get a hashtag trending. And we talked to an ex-influencer who said, you know, were using AI to do this. And I asked him why. And he said, well, you know, if you just do copy-paste, the fact-checkers catch it. But if you use AI, they dont. And so AI pasta. And theres also some research showing that this is potentially more persuasive than copy-paste BADANES: Interesting.DAEPP: because people think theres a social consensus. And then the last one, this is my last of the big taxonomy, and, Robert, of course, jump in on anything you want to go deeper on, but Fake News 2.0. You know, Im sure youve seen this, as well. Just this, like, creation of news websites, like entire new newspapers that nobodys ever heard of. AI avatars that are newscasters. And this is something that was happening before. Like, theres a long tradition of pretending to be a real news pamphlet or pretending to be a real outlet. But theres some interesting work out of Patrick Warren at Clemson has looked at some of these and shown the quality and quantity of articles on these things has gotten a lot better and, you know, improves as a step function of, sort of, when new models come out.NESS: And then on the flip side, you have people using the same technologies but stated clearly that its AI generated, right. So we mentioned the AI avatars. In India, theres this theres Bhoomi, which is a AI news anchor for agricultural news, and it states there in clear terms that shes not real. But of course, somebody who wanted to be deceptive could use the same technology to portray something that looks like a real news broadcast that isnt. You know, and, kind of, going back, Madeleine mentioned deep roasts, right, so, kind of, using this technology to create satirical depictions of, say, a political opponent. Somebody, a colleague, sent something across my desk. It was a Douyin accountso Douyin is the version of TikTok thats used inside China; BADANES: OK.NESS: same company, but its the internal version of TikTokthat was posting AI-generated videos of politicians in Taiwan. And these were excellent, real good-quality AI-generated deepfakes of these politicians. But some of them were, first off, on the bottom of all of them, it said, this is AI-generated content.BADANES: Oh.NESS: And some of them were, kind of, obviously meant to be funny and were clearly fake, like still images that were animated to make somebody singing a funny song, for example. A very serious politician singing a very silly song. And its a still image. Its not even, its not even BADANES: a video.NESS: like video.BADANES: Right, right.NESS: And so I messaged Puma Shen, who is one of the legislators in Taiwan who was targeted by these attacks, and I said, what do you think about this? And, you know, he said, yeah, they got me. [LAUGHTER] And I said, you know, do you think people believe this? I mean, there are people who are trying to debunk it. And he said, no, our supporters dont believe it, but, you know, people who support the other side or people who are apolitical, they might believe it, or even if it says its fakethey know its fakebut they might still say that, yeah, but this is something they would do, right. This is BADANES: Yeah, it fits the narrative. Yeah.NESS: it fits the narrative, right. 
And that, kind of, that really, you know, I had thought of this myself, but just hearing somebody, you know, whos, you know, a politician whos targeted by these attacks just saying that its, like, even if they believe its even if they know its fake, they still believe it because its something that they would do.BADANES: Sure.NESS: Thats, you know, as a form of propaganda, even relative to the canonical idea of deepfake that we have, this could be more effective, right. Like, just say its AI and then use it to, kind of, paint the picture of the opponent in any way you like.BADANES: Sure, and this gets into that, sort of, challenging space I think we find ourselves in right now, which is people dont know necessarily how to tell whats real or not. And the case youre describing, it has labeling, so that should tell you. But a lot of the content we come across online does not have labeling. And you cannot tell just based on your eyes whether images were generated by AI or whether theyre real. One of the things that I get asked a lot is, why cant we just build good AI to detect bad AI, right? Why dont we have a solution where I just take a picture and I throw it into a machine and it tells me thumbs-up or thumbs-down if this is AI generated or not? And the question around detection is a really tricky one. Im curious what you all think about, sort of, the question of, can detection solve this problem or not?NESS: So Ill mention one thing. So Madeleine mentioned an application of this technology called text to b-roll. And so what this is, technically speaking, what this is doing is youre taking real footage, you stick it in a database, its quote, unquote vectorized into these representations that the AI can understand, and then you say, hey, generate a video that illustrates this narrative for me. And you provide it the text narrative, and then it goes and pulls out a whole bunch of real video from a database and curates them into a short video that you could put on TikTok, for example. So this was a fully AI-generated product, but none of the actual content is synthetic.BADANES: Ah, right.NESS: So in that case, your quote, unquote AI detection tool is not going to work.DAEPP: Yeah, I mean, something that I find really fascinating any time that youre dealing with a sociotechnical system, righta technical system embedded in social contextis folks, you know, think that things are easy that are hard and things are hard that are easy, right. And so with a lot of the detections work, right, like if you put a deepfake detector out, you make that available to anyone, then what they can do is they can run a bunch of stuff by it, BADANES: Yeah.DAEPP: add a little bit of random noise, and then the deepfake detector doesnt work anymore. And so that detection, actually, technically becomes an arms race, you know. And were seeing now some detectors that, like, you know, work when youre not looking at a specific image or a specific piece of text but youre looking at a lot all at once. That seems more promising. But, just, this is a very, very technically difficult problem, and that puts us as researchers in a really tricky place because, you know, youre talking to folks who say, why cant you just solve this? If you put this out, then you have to put the detector out. And were like, thats actually not, thats not a technically feasible long-term solution in this space. 
And the solutions are going to be social and regulatory and, you know, changes in norms as well as technical solutions that maybe are about everything outside of AI, right.BADANES: Yeah.DAEPP: Not about fixing the AI system but fixing the context within which its used.BADANES: Its not just a technological solution. Theres more to it. Robert?NESS: So if somebody were to push back there, they could say, well, great; in the long term, maybe its an arms race, but in the short term, right, we can have solutions out there that, you know, at least in the next election cycle, we could maybe prevent some of these things from happening. And, again, kind of harkening back to cybersecurity, maybe if you make it hard enough, only the really dedicated, really high-funded people are going to be doing it rather than, you know, everybody who wants to throw a bunch of deepfakes on the internet. But the problem still there is that it focuses really on video and images, right.BADANES: Yeah. What about audio?NESS: What about audio? And what about text? So BADANES: Yeah. Those are hard. I feel like weve talked a lot about definitions and theoretical, but I want to make sure we talk more about what you guys saw and researched and understood on the ground, in particular, your trips to India and Taiwan and even if you want to reflect on how those compare to the US environment. What did you actually uncover? What surprised you? What was different between those countries?DAEPP: Yeah, I mean, right, so Taiwan both of these places are young democracies. And thats really interesting, right. So like in Taiwan, for example, when people vote, they vote on paper. And anybody can go watch. Thats part of their, like, security strategies. Like, anyone around the world can just come and watch. People come from far. They fly in from Canada and Japan and elsewhere just to watch Taiwanese people vote. And then similarly in India, theres this rule where you have to be walking distance from your polling place, and so the election takes two months. And, like, your polling places move from place to place, and sometimes, it arrives on an elephant. And so these were really interesting places to, like, I as an American, just, like, found it very, very fascinating to and important to be outside of the American context. You know, we just take for granted that how we do democracy is how other people do it. But Taiwan was very much a joint, like, civil societygovernment everyday response to this challenge of having a lot of efforts to manipulate public opinion happening with, you know, real-world speeches, with AI, with anything that you can imagine. You know, and I think the Microsoft Threat Analysis Center released a report documenting some of the, sort of, video stuff[2]. Theres a use of AI to create videos the night before the election, things like this. But then India is really thinking of so India, right, its the worlds biggest democracy, right. Like, nearly a billion people were eligible to vote.BADANES: Yeah.NESS: And arguably the most diverse, right?DAEPP: Yeah, arguably the most diverse in terms of languages, contexts. And its also positioning itself as the AI laboratory for the Global South. And so folks, including folks at the MSR (Microsoft Research) Bangalore lab, are leaders in thinking about representing low-resource languages, right, thinking about cultural representation in AI models. 
And so there you have all of these technologists who are really trying to innovate and really trying to think about whats the next clever application, whats the next clever use. And so that, sort of, that taxonomy that we talked about, like, I think just every week, every interview, we, sort of, had new things to add because folks there were just constantly trying all different kinds of ways of engaging with the public.NESS: Yeah, I think for me, in India in particular, you know, India is an engineering culture, right. In terms of, like, the professional culture there, theyre very, kind of, engineering skewed. And so I think one of the bigger surprises for me was seeing people who were very experienced and effective campaign operatives, right, people who would go and, you know, hit the pavement; do door knocking; kind of, segment neighborhoods by demographics and voter block, these people were also, you know, graduated in engineering from an IIT (Indian Institute of Technology), BADANES: Sure.NESS: right, and so [LAUGHS] so they were happy to pick up these tools and leverage them to support their expertise in this work, and so some of the, you know, I think a lot of the narrative that we tell ourselves in AI is how its going to be, kind of, replacing people in doing their work. But what I saw in India was that people who were very effective had a lot of domain expertise that you couldnt really automate away and they were the ones who are the early adopters of these tools and were applying it in ways that I think were behind on in terms of, you know, ideas in the US.BADANES: Yeah, I mean, theres, sort of, this sentiment that AI only augments existing problems and can enhance existing solutions, right. So were not great at translation tools, but AI will make us much better at that. But that also can then be weaponized and used as a tool to deceive people, which propaganda is not new, right? Were only scaling or making existing problems harder, or adversaries are trying to weaponize AI to build on things theyve already been doing, whether thats cyberattacks or influence operations. And while the three of us are in different roles, we do work for the same company. And its a large technology company that is helping bring AI to the world. At the same time, I think there are some responsibilities when we look at, you know, bad actors who are looking to manipulate our products to create and spread this kind of deceptive media, whether its in elections or in other cases like financial fraud or other ways that we see this being leveraged. Im curious what you all heard from others when youve been doing your research and also what you think our responsibilities are as a big tech company when it comes to keeping actors from using our products in those ways.DAEPP: You know, when I started using GPT-4, one of the things I did was I called my parents, and I said, if you hear me on a phone call, BADANES: Yeah.DAEPP: like, please double check. Ask me things that only I would know. And when I walk around Building 99, which is, kind of, a storied building in which a lot of Microsoft researchers work, everybody did that call. We all called our parents.BADANES: Interesting.DAEPP: Or, you know, we all checked in. 
So just as, like, we have a responsibility to the folks that we care about, I think as a company, that same, sort of, like, raising literacy around the types of fraud to expect and how to protect yourself from themI think that gets back to that fraud space that we talked aboutand, you know, supporting law enforcement, sharing what needs to be shared, I think that without question is a space that we need to work in. I will say a lot of the folks we talked with, they were using Llama on a local GPU, right.BADANES: OK.DAEPP: They were using open-source models. They were sometimes they were testing out Phi. They would use Phi, Grok, Llama, like anything like that. And so that raises an interesting question about our guardrails and our safety practices. And I think there, we have an, like, our obligation and our opportunity actually is to set the standard, right. To say, OK, like, you know, if you use local Llama and it spouts a bunch of stuff about voter suppression, like, you can get in trouble for that. And so what does it mean to have a safe AI that wins in the marketplace, right? Thats an AI that people can feel confident and comfortable about using and one thats societally safe but also personally safe. And I think thats both a challenge and a real opportunity for us.BADANES: Yeah oh, go ahead, Robert, yeah NESS: Going back to the point about fraud. It was this year, in January, when that British engineering firm Arup, when somebody used a deepfake to defraud that company of about $25 million, BADANES: Yeah.NESS: their Hong Kong office. And after that happened, some business managers in Microsoft reached out to me regarding a major client who wanted to start red teaming. And by red teaming, I mean intentionally targeting your executives and employees with these types of attacks in order to figure out where your vulnerabilities as an organization are. And I think, yeah, it got me thinking like, wow, I would, you know, can we do this for my dad? [LAUGHS] Because I think that was actually a theme that came out from a lot of this work, which was, like, how can we empower the people who are really on the frontlines of defending democracy in some of these places in terms of the tooling there? So we talked about, say, AI detection tools, but the people who are actually doing fact-checking, theyre looking more than at just the video or the images; theyre actually looking at a, kind of, holistic taking a holistic view of the news story and doing some proper investigative journalism to see if something is fake or not.BADANES: Yeah.NESS: And so I think as a company who creates products, can we take a more of a product mindset to building tools that support that entire workflow in terms of fact-checking or investigative journalism in the context of democratic outcomes BADANES: Yeah.NESS: where maybe looking at individual deepfake content is just a piece of that.BADANES: Yeah, you know, I think theres a lot of parallels here to cybersecurity. Thats also what weve found, is this idea that, first of all, the no silver bullet, as we were talking about earlier with the detection piece. Like, you cant expect your system to be secure just because you have a firewall, right. You have to have this, like, defense in-depth approach where you have lots of different layers. And one of those layers has been on the literacy side, right. Training and teaching people not to click on a phishing link, understanding that they should scroll over the URL. 
Like, these are efforts that have been taken up, sort of, in a broad societal sense. Employers do it. Big tech companies do it. Governments do it through PSAs and other things. So theres been a concerted effort to get a population who might not have been aware of the fact that they were about to be scammed to now know not to click on that link. I think, you know, you raised the point about literacy. And I think theres something to be said about media literacy in this space. Its both AI literacyunderstanding what it isbut also understanding that people may try to defraud you. And whether that is in the political sense or in the financial sense, once you have that, sort of, skill set in place, youre going to be protected. One thing that Ive heard, though, as I have conversations about this challenge Ive heard a couple things back from people specifically in civil society. One is not to put the impetus too much on the end consumer, which I think Im hearing that we also recognize theres things that we as technology companies should be focusing on. But the other thing is the concern that in, sort of, the long run, were going to all lose trust in everything we see anyway. And Ive heard some people refer to that as the trust deficit. Have you all seen anything promising in the space to give you a sense around, can we ever trust what were looking at again, or are we actually just training everyone to not believe anything they see? Which I hope is not the case. I am an optimist. But Id love to hear what you all came across. Are there signs of hope here where we might actually have a place where we can trust what we see again?DAEPP: Yeah. So two things. There is this phenomenon called the liars dividend, right, BADANES: Sure, yeah.DAEPP: which is where that if you educate folks about how AI can be used to create fake clips, fake audio clips, fake videos, then if somebody has a real audio clip, a real video, they can claim that its AI. And I think we talk, you know, again, this is, like, in a US-centric space, we talk about this with politicians, but the space in which this is really concerning, I think, is war crimes, right BADANES: Oh, yeah.DAEPP: I think are these real human rights infractions where you can prevent evidence from getting out or being taken seriously. And we do see that right after invasions, for example, these days. But this is actually a space like, I just told you, like, oh, like, detection is so hard and not technically, like, thatll be an arms race! But actually, there is this wonderful project, Project Providence, that is a Microsoft collaboration with a company called Truepic that its, like, an app, right. And what happens is when you take a photo using this app, it encrypts the, you know, hashes the GPS coordinates where the photo was taken, the time, the day, and uploads that with the pixels, with the image, to Azure. And then later, when a journalist goes to use that image, they can see that the pixels are exactly the same, and then they can check the location and they can confirm the GPS. And this actually meets evidentiary standards for the UN human rights tribunal, right.BADANES: Right.DAEPP: So this is being used in Ukraine to document war crimes. And so, you know, what if everybody had that app on their phone? That means you dont you know, most photos you take, you can use an AI tool and immediately play with. 
But in that particular situation where you need to confirm provenance and you need to confirm that this was a real event that happened, that is a technology that exists, and I think folks like the C2PA coalition (Coalition for Content Provenance and Authenticity) can make that happen across hardware providers.NESS: And I think the challenge for me is, we cant separate this problem from some of the other, kind of, fundamental problems that we have in our media environment now, right. So, for example, if I go on to my favorite social media app and I see videos from some conflicts around the world, and these videos could be not AI generated and I still could be, you know, the target of some PR campaign to promote certain content and suppress other ones. The videos could be authentic videos, but not actually be accurate depictions of what they claim to be. And so I think that this is a the AI presents a complicating factor in an already difficult problem space. And I think, you know, trying to isolate these different variables and targeting them individually is pretty tricky. I do think that despite the liars dividend that media literacy is a very positive area to, kind of, focus energy BADANES: Yeah.NESS: in the sense that, you know, you mentioned earlier, like, using this term fraud, again, going back to this analogy with cybersecurity and cybercrime, that it tends to resonate with people. We saw that, as well, especially in Taiwan, didnt we, Madeleine? Well, in India, too, with the sextortion fears. But in Taiwan, a lot of just cybercrime in terms of defrauding people of money. And one of the things that we had observed there was that talking about generative AI in the context of elections was difficult to talk to people about it because people, kind of, immediately went into their political camps, right.BADANES: Yeah.NESS: And so you had to, kind of, penetrate you know, people were trying to, kind of, suss out which side you were on when youre trying to educate them about this topic.BADANES: Sure.NESS: But if you talk tobut everybodys, like, fraud itself is a lot less partisan.BADANES: Yeah, its a neutral term.NESS: Exactly. And so it becomes a very useful way to, kind of, get these ideas out there.BADANES: Thats really interesting. And I love the provenance example because it really gets to the question about authenticity. Like, where did something come from? What is the origin of that media? Where has it traveled over time? And if AI is a component of it, then thats a noted fact. But it doesnt put us into the space of AI or not AI, which I think is where a lot of the, sort of, labeling has gone so far. And I understand the instinct to do that. But I like the idea of moving more towards how do you know more about an image of which whether there was AI involved or not is a component but does not have judgment. That does not make the picture good or bad. It doesnt make it true or false. Its just more information for you to consume. And then, of course, the media literacy piece, people need to know to look for those indicators and want them and ask for them from the technology company. So I think thats a good, thats a good silver lining. You gave me the light at the end of the tunnel I think I was looking for on the post-truth world. So, look, heres the big question. You guys have been spending this time focusing on AI and democracy in this big, massive global election year. There was a lot of hype. [LAUGHS] There was a lot of hype. 
Lots of articles written about how this was going to be the AI election apocalypse. What say you? Was it? Was it not?NESS: I think it was, well, we definitely have documented cases where this happened. And Im wary of this question, particularly again from the cybersecurity standpoint, which is if you were not the victim of a terrible hack that brought down your entire company, would you say, like, well, it didnt happen, so its not going to happen, right. You would never BADANES: Yeah.NESS: That would be a silly attitude to have, right. And also, you dont know what you dont know, right. So, like, a lot of the, you know, we mentioned sextortion; we mentioned these cybercrimes. A lot of these are small-dollar crimes, which means they dont get reported or they dont get reported for reasons of shame. And so we dont even have numbers on a lot of that. And we know that the political techniques are going to mirror the criminal techniques.BADANES: Yeah.NESS: And also, I worry about, say, down-ballot elections. Like, so much of, kind of, our election this year, a lot of the focus was on the national candidates, but, you know, if local poll workers are being targeted, if disinformation campaigns are being put out about local candidates, its not going get the kind of play in the national media such that you and I might hear about it. And so Im, you know, so Ill hand it off to Madeleine, but yeah.DAEPP: So absolutely agree with Roberts point, right. If your child was affected by sextortion, if you are a country that had an audio clip go viral, this was the deepfake deluge for you, right. That said, something that happened, you know, in India as in the United States, there were major prosecutions very early on, right.BADANES: Yeah.DAEPP: So in India, there was a video. It turned out not to be a deepfake. It turned out to be a cheap fake, to your point about, you know, the question isnt whether theres AI involved; the question is whether this is an attempt to defraud. And five people were charged for this video.BADANES: Yeah.DAEPP: And in the United States, right, those Biden robocalls using Bidens voice to tell folks not to vote, like, that led to a million-dollar fine, I think, for the telecoms and $6 million for the consultant who created that. And when we talk to people in India, you know, people who work in this space, they said, well, Im not going to do that; like, Im going to focus on other things. So internal actors pay attention to these things. That really changes what people do and how they do it. And so that, I do think the work that your team did, right, to educate candidates about looking out for the stuff, the work that the MTAC (Microsoft Threat Analysis Center) did to track usage and report it, all of that, I think, was, actually, those interventions, I think, worked. I think they were really important, and I do think that what we are this absence of a deluge is actually a huge number of people making a very concerted effort to prevent it from happening.BADANES: Thats encouraging.NESS: Madeleine, you made a really important point that this deterrence from prosecution, its effective for internal actors, BADANES: Yeah.DAEPP: Yeah, thats right.NESS: right. So for foreign states who are trying to interfere with other peoples elections, the fear of prosecution is not going to be as much of a deterrent.BADANES: That is true. 
I will say what we saw in this election cycle, in particular in the US, was a concerted effort by the intelligence community to call out and name nation-state actors who were either doing cyberattacks or influence operations, specific videos that they identified, whether there was AI involved or not. I think that level of communication with the public while maybe doesnt lead to those actors going to jailmaybe somedaybut does in fact lead to a more aware public and therefore hopefully a less effective campaign. If people on the other end and its a little bit into the literacy space, and its something that weve seen government again in this last cycle do very effectively, to name and shame essentially when they see these things in part, though, to make sure voters are aware of whats happening. Were not quite through this big global election year; we have a couple more elections before we really hit the end of the year, but its winding down. What is next for you all? Are you all going to continue this work? Are you going build on it? What comes next?DAEPP: So our research in India actually wasnt focused specifically on elections. It was about AI and digital communications.BADANES: Ahh.DAEPP: Because, you know, again, like India is this laboratory.BADANES: Sure.DAEPP: And I think what we learned from that work is that, you know, this is going to be a part of our digital communications and our information system going forward without question. And the question is just, like, what are the viable business models, right? What are the applications that work? And again, that comes back to making sure that whatever AI you know, people when they build AI into their entire, you know, newsletter-writing system, when they build it into their content production, that they can feel confident that its safe and that it meets their needs and that theyre protected when they use it. And similarly, like, what are those applications that really work, and how do you empower those lead users while mitigating those harms and supporting civil society and mitigating those harms? I think thats an incredible, like, thatsas a researcherthats, you know, thats a career, right.BADANES: Yeah.DAEPP: Thats a wonderful research space. And so I think understanding how to support AI that is safe, that enables people globally to have self-determination in how models represent them, and that is usable and powerful, I think thats broadly BADANES: Where this goes.DAEPP: what I want to drive.BADANES: Robert, how about you?NESS: You know, so I mentioned earlier on these AI alignment issues.BADANES: Yeah.NESS: And I was really fascinated by how local and contextual those issues really are. So to give an example from Taiwan, we train these models on training data that we find from the internet. Well, when it comes to, say, Mandarin Chinese, you can imagine the proportion of content, of just the quantity of content, on the internet that comes from China is a lot more than the quantity that comes from Taiwan. And of course, whats politically correct in China is different from whats politically correct in Taiwan. And so when we were talking to Taiwanese, a lot of people had these concerns about, you know, having these large language models that reflected Taiwanese values. 
We heard the same thing in India about just people on different sides of the political spectrum and, kind of, looking at a YouTuber in India had walked us through this how, for example, a founding father of India, there was a disparate literature in favor of this person and some more critical of this person, and he had spent time trying to suss out whether GPT-4 was on one side or the other.BADANES: Oh. Whose side are you on? [LAUGHS]NESS: Right, and so I think for our alignment research at Microsoft Research, this becomes the beginning of, kind of, a very fruitful way of engaging with local stakeholders and making sure that we can reflect these concerns in the models that we develop and deploy.BADANES: Yeah. Well, first, I just want to thank you guys for all the work youve done. This is amazing. Weve really enjoyed partnering with you. Ive loved learning about the research and the efforts, and Im excited to see what you do next. I always want to end these kinds of conversations on a more positive note, because weve talked a lot about the weaponization of AI and, you know, how ethical areas that are confusing and but I am sure at some point in your work, you came across really positive use cases of AI when it comes to democracy, or at least I hope you have. [LAUGHS] Do you have any examples or can you leave us with something about where you see either it going or actively being used in a way to really strengthen democratic processes or systems?DAEPP: Yeah, I mean, there is just a big paper in Science, right, which, as researchers, when something comes out in Science, you know your field is about to change, right, BADANES: Yeah.DAEPP: showing that an AI model in, like, political deliberations, small groups of UK residents talking about difficult topics like Brexit, you know, climate crisis, difficult topics, that in these conversations, an AI moderator created, like, consensus statements that represented the majority opinion, still showed the minority opinion, but that participants preferred to a human-written statement and in fact preferred to their original opinion.BADANES: Wow.DAEPP: And that this, you know, not only works in these randomized controlled trials but actually works in a real citizens deliberation. And so that potential of, like, carefully fine-tuned, like, carefully aligned AI to actually help people find points of agreement, thats a really exciting space.BADANES: So next time my kids are in a fight, Im going to point them to Copilot and say, work with Copilot to mediate. [LAUGHS] No, thats really, really interesting. Robert, how about you?NESS: She, kind of, stole my example. [LAUGHTER] But Ill take it from a different perspective. So, yes, like how these technologies can enable people to collaborate and ideally, I think, from a democratic standpoint, at a local level, right. So, I mean, I think so much of our politics were, kind of, focused at the national-level campaign, but our opportunity to collaborate is much more were much more easily we can collaborate much more easily with people who are in our local constituencies. And I think to myself about, kind of, like, the decline particularly of local newspapers, local media.BADANES: Right.NESS: And so I wonder, you know, can these technologies help address that problem in terms of just, kind of, information about, say, your local community, as well as local politicians. 
And, yeah, to Madeleine's point, Madeleine started the conversation talking about her background in urban planning and some of the work she did, you know, working on a local level with local officials to bring technology to the level of cities. And I think, like, well, you know, politics are local, right. So I think that that's where there's a lot of opportunity for improvement.
BADANES: Well, Robert, you just queued up a topic for a whole other podcast because our team also does a lot of work around journalism, and I will say we have seen that AI at the local level with local news is really a powerful tool that we're starting to see a lot of appetite and interest for, in order to overcome some of the hurdles that industry faces right now when it comes to capacity, financing, and, you know, not being able to be in all of the places they want to be at once to make sure that they're reporting equally across the community. This is, like, a perfect use case for AI, and we're starting to see folks who are really using it. So maybe we'll come back and do this again another time on that topic. But I just want to thank you both, Madeleine and Robert, for joining us today and sharing your insights. This was really a fascinating conversation. I know I learned a lot. I hope that our listeners learned a lot, as well.
[MUSIC]
And, listeners, I hope that you tune in for more episodes of Ideas, where we continue to explore the technologies shaping our future and the big ideas behind them. Thank you, guys, so much.
DAEPP: Thank you.
NESS: Thank you.
[MUSIC FADES]
[1] The video generation model Sora was released publicly earlier this month.
[2] For a summary of and link to the report, see the Microsoft On the Issues blog post "China tests US voter fault lines and ramps AI content to boost its geopolitical interests."
-
AIOpsLab: Building AI agents for autonomous clouds
In our increasingly complex digital landscape, enterprises and cloud providers face significant challenges in the development, deployment, and maintenance of sophisticated IT applications. The broad adoption of microservices and cloud-based serverless architecture has streamlined certain aspects of application development while simultaneously introducing a host of operational difficulties, particularly in fault diagnosis and mitigation. These complexities can result in outages, which have the potential to cause major business disruptions, underscoring the critical need for robust solutions that ensure high availability and reliability in cloud services. As the expectation for five-nines availability grows, organizations must navigate an intricate web of operational demands to maintain customer satisfaction and business continuity.
To tackle these challenges, recent research on using AIOps agents for cloud operations, such as AI agents for incident root cause analysis (RCA) or triaging, has relied on proprietary services and datasets. Other prior works use frameworks specific to the solutions they are building, or ad hoc and static benchmarks and metrics that fail to capture the dynamic nature of real-world cloud services. Furthermore, current approaches do not agree on standard metrics or a standard taxonomy for operational tasks. This calls for a standardized and principled research framework for building, testing, comparing, and improving AIOps agents. The framework should allow agents to interact with realistic service operation tasks in a reproducible manner. It must be flexible enough to extend to new applications, workloads, and faults. Importantly, it should go beyond just evaluating AI agents and enable users to improve the agents themselves, for example by providing sufficient observability and even serving as a training environment (gym) that generates samples to learn from.
We developed AIOpsLab, a holistic evaluation framework for researchers and developers, to enable the design, development, evaluation, and enhancement of AIOps agents; it also serves as a reproducible, standardized, interoperable, and scalable benchmark. AIOpsLab is open sourced on GitHub under the MIT license, so that researchers and engineers can leverage it to evaluate AIOps agents at scale. Users developing agents for cloud operations tasks with Azure AI Agent Service can evaluate and improve them using AIOpsLab. The AIOpsLab research paper has been accepted at SoCC'24 (the annual ACM Symposium on Cloud Computing).
Figure 1. System architecture of AIOpsLab.
Agent-cloud interface (ACI)
AIOpsLab strictly separates the agent and the application service using an intermediate orchestrator. It provides several interfaces for other system parts to integrate and extend. First, it establishes a session with an agent to share information about benchmark problems: (1) the problem description, (2) instructions (e.g., response format), and (3) available APIs to call as actions.
The APIs are a set of documented tools, e.g., get logs, get metrics, and exec shell, designed to help the agent solve a task. There are no restrictions on the agent's implementation; the orchestrator poses problems and polls it for the next action to perform given the previous result. Each action must be a valid API call, which the orchestrator validates and carries out.
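To make this pose-poll-validate contract concrete, here is a minimal sketch of how such a loop might look, in the spirit of the onboarding example shown below. The names `validate_action`, `execute_api`, and `run_episode` are illustrative assumptions, not AIOpsLab's actual interfaces.
```python
# Hypothetical sketch of the orchestrator-agent polling loop described above.
AVAILABLE_APIS = {"get_logs", "get_metrics", "exec_shell"}

def validate_action(action: str) -> bool:
    # Accept only calls to the documented APIs, e.g. get_logs(...).
    name = action.split("(", 1)[0].strip()
    return name in AVAILABLE_APIS

def execute_api(action: str) -> str:
    # Placeholder for the orchestrator's privileged execution of the call
    # (e.g., fetching logs or metrics from the deployed service).
    return f"<result of {action}>"

async def run_episode(agent, problem_desc: str, max_steps: int = 10) -> None:
    state = problem_desc
    for _ in range(max_steps):
        # Poll the agent for its next action given the previous result.
        action = await agent.get_action(state)
        if not validate_action(action):
            state = f"Invalid action: {action}. Use one of {sorted(AVAILABLE_APIS)}."
            continue
        state = execute_api(action)
```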
The orchestrator has privileged access to the deployment and can take arbitrary actions (e.g., scale-up, redeploy) using appropriate tools (e.g., helm, kubectl) to resolve problems on behalf of the agent. Lastly, the orchestrator calls workload and fault generators to create service disruptions, which serve as live benchmark problems. AIOpsLab provides additional APIs to extend to new services and generators.
The following example shows how to onboard an agent to AIOpsLab:
```python
import asyncio
from aiopslab import Orchestrator

class Agent:
    def __init__(self, prob, instructs, apis):
        self.prompt = self.set_prompt(prob, instructs, apis)
        self.llm = GPT4()

    async def get_action(self, state: str) -> str:
        return self.llm.generate(self.prompt + state)

# initialize the orchestrator
orch = Orchestrator()
pid = "misconfig_app_hotel_res-mitigation-1"
prob_desc, instructs, apis = orch.init_problem(pid)

# register and evaluate the agent
agent = Agent(prob_desc, instructs, apis)
orch.register_agent(agent, name="myAgent")
asyncio.run(orch.start_problem(max_steps=10))
```
Service
AIOpsLab abstracts a diverse set of services to reflect the variance in production environments. This includes live, running services that are implemented using various architectural principles, including microservices, serverless, and monolithic. We also leverage open-sourced application suites such as DeathStarBench, as they provide artifacts, like source code and commit history, along with run-time telemetry. Adding tools like BluePrint can help AIOpsLab scale to other academic and production services.
Workload generator
The workload generator in AIOpsLab plays a crucial role by creating simulations of both faulty and normal scenarios. It receives specifications from the orchestrator, such as the task, desired effects, scale, and duration. The generator can use a model trained on real production traces to generate workloads that align with these specifications. Faulty scenarios may simulate conditions like resource exhaustion, exploit edge cases, or trigger cascading failures, inspired by real incidents. Normal scenarios mimic typical production patterns, such as daily activity cycles and multi-user interactions. When various characteristics (e.g., service calls, user distribution, arrival times) can lead to the desired effect, multiple workloads can be stored in the problem cache for use by the orchestrator. In coordination with the fault generator, the workload generator can also create complex fault scenarios with workloads.
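As a rough illustration of the shape of these specifications, the sketch below shows what a workload request might carry. The `WorkloadSpec` type and its field names are assumptions made for illustration, not AIOpsLab's actual schema.
```python
# Hypothetical sketch of a workload specification the orchestrator might
# hand to the workload generator; field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class WorkloadSpec:
    task: str            # e.g., "detection" or "mitigation"
    desired_effect: str  # e.g., "resource_exhaustion" or "normal_diurnal"
    scale: int           # request rate to replay against the service
    duration_s: int      # how long the scenario should run

# A faulty and a normal scenario, which could be cached for reuse.
faulty = WorkloadSpec(task="mitigation", desired_effect="resource_exhaustion",
                      scale=500, duration_s=300)
normal = WorkloadSpec(task="detection", desired_effect="normal_diurnal",
                      scale=50, duration_s=3600)
```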
Fault generator
AIOpsLab has a novel push-button fault generator designed for generic applicability across various cloud scenarios. Our approach integrates application and domain knowledge to create adaptable policies and oracles compatible with AIOps scenarios. This includes fine-grained fault injection capable of simulating complex failures inspired by production incidents. Additionally, it can inject faults at various system levels, exposing root causes while maintaining semantic integrity and considering interdependencies between cloud microservices. The fault injector's versatility can enhance the reliability and robustness of cloud systems by enabling thorough testing and evaluation of AIOps capabilities.
Observability
AIOpsLab is equipped with an extensible observability layer designed to provide comprehensive monitoring capabilities across various system layers for any AIOps tool. AIOpsLab collects a wide array of telemetry data, including (1) traces from Jaeger detailing the end-to-end paths of requests through distributed systems, (2) application logs formatted and recorded by Filebeat and Logstash, and (3) system metrics monitored by Prometheus. Additionally, AIOpsLab captures lower-level system information such as syscall logs and cluster information. As mentioned, we handle potential data overload through flexible APIs that let users tune which telemetry data is relevant to their AIOps tools.
AIOpsLab currently supports four key tasks within the AIOps domain: incident detection, localization, root cause diagnosis, and mitigation. Additionally, we support several popular agent frameworks: ReAct, AutoGen, and TaskWeaver. Two key insights from the study emphasize the importance of observability and a well-designed ACI. Observability is crucial for clear root-cause diagnosis; for instance, pinpointing a misconfigured API gateway can be vital in preventing service downtime. Flexibility is another critical factor; the ability to execute arbitrary shell commands allowed for effective troubleshooting in real-time scenarios. Lastly, robust error handling is essential: providing agents with high-quality feedback on execution barriers, such as a failed database connection, ensures swift resolution and continuous improvement.
Example of ACI on a diagnosis task:
Agent: I need to first look at the logs to understand if there are any errors or warnings that could indicate problems.
Action:
```
get_logs("compose-post-service", "test-social-network")
```
Service: [2024-Aug-04 23:18:49.365494] <info>: Thrift: Sun Aug 4 23:19:19 2024 TSocket::open() connect() <Host: user-service Port: 9090>: Connection refused
Agent: Unable to connect to the `user-service` at port 9090, even though the pod is running. This suggests a possible network issue or misconfiguration in service discovery.
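The error-handling insight above can be made concrete with a small sketch: rather than returning a raw failure such as the refused Thrift connection verbatim, the environment can turn it into structured feedback the agent can act on. Everything here, including the `summarize_failure` helper, is a hypothetical illustration rather than part of AIOpsLab.
```python
# Hypothetical sketch of turning raw service output into actionable feedback.
def summarize_failure(raw_output: str) -> str:
    """Convert raw execution output into feedback an agent can reason over."""
    if "Connection refused" in raw_output:
        return ("Execution barrier: downstream connection refused. "
                "Check service discovery configuration or port bindings "
                "before retrying the same call.")
    if "command not found" in raw_output:
        return "Execution barrier: requested tool is unavailable in this environment."
    return raw_output  # pass healthy output through unchanged

# Example: feedback derived from the Thrift log line in the exchange above.
log_line = ("TSocket::open() connect() <Host: user-service Port: 9090>: "
            "Connection refused")
print(summarize_failure(log_line))
```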
Next steps
This research project adopts Microsoft security standards and Responsible AI principles, and we envision this research evolving into a vital resource for organizations aiming to optimize their IT operations. Additionally, we plan to collaborate with various generative AI teams to incorporate AIOpsLab as a benchmark scenario for evaluating state-of-the-art models. By doing so, we aim to foster innovation and encourage the development of more advanced AIOps solutions. This research is essential not only for IT professionals but also for anyone invested in the future of technology, as it has the potential to redefine how organizations manage operations, respond to incidents, and ultimately serve their customers in an increasingly automated world.
Acknowledgements
We would like to thank Yinfang Chen, Manish Shetty, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, and Suman Nath for contributing to this project.
-