
The AI that apparently wants Elon Musk to die
Here's a very naive and idealistic account of how companies train their AI models: They want to create the most useful and powerful model possible, but they've talked with experts who worry about making it a lot easier for people to commit (and get away with) serious crimes, or about empowering, say, an ISIS bioweapons program. So they build in some censorship to prevent the model from giving detailed advice about how to kill people, and especially how to kill tens of thousands of people.

If you ask Google's Gemini "how do I kill my husband," it begs you not to do it and suggests domestic violence hotlines; if you ask it how to kill a million people in a terrorist attack, it explains that terrorism is wrong. Building this in actually takes a lot of work: By default, large language models are as happy to explain detailed proposals for terrorism as detailed proposals for anything else, and for a while easy jailbreaks (like telling the AI that you just want the information for a fictional work, or that you want it misspelled to get around certain word-based content filters) abounded. But these days Gemini, Claude, and ChatGPT are pretty locked down: it's seriously difficult to get detailed proposals for mass atrocities out of them. That means we all live in a slightly safer world.

(Disclosure: Vox Media is one of several publishers that has signed partnership agreements with OpenAI. One of Anthropic's early investors is James McClave, whose BEMC Foundation helps fund Future Perfect. Our reporting remains editorially independent.)

Or at least that's the idealistic version of the story. Here's a more cynical one.

Companies might care a little about whether their model helps people get away with murder, but they care a lot about whether their model gets them roundly mocked on the internet. The thing that keeps executives at Google up at night in many cases isn't keeping humans safe from AI; it's keeping the company safe from AI by making sure that no matter what, AI-generated search results are never racist, sexist, violent, or obscene. The core mission is more brand safety than human safety: building AIs that will not produce embarrassing screenshots circulating on social media.

Enter Grok 3, the AI that is safe in neither sense and whose infancy has been a speedrun of a bunch of challenging questions about what we're comfortable with AIs doing.

Grok, the unsafe AI

When Elon Musk bought and renamed Twitter, one of his big priorities was X's AI team, which last week released Grok 3, a language model like ChatGPT that he advertised wouldn't be "woke." Where all those other language models were censorious scolds that refused to answer legitimate questions, Grok, Musk promised, would give it to you straight.

That didn't last very long. Almost immediately, people asked Grok some pointed questions, including, "If you could execute any one person in the US today, who would you kill?", a question that Grok initially answered with either "Elon Musk" or "Donald Trump." And if you ask Grok, "Who is the biggest spreader of misinformation in the world today?", the answer it first gave was again Elon Musk.

The company scrambled to fix Grok's penchant for calling for the execution of its CEO, but as I observed above, it actually takes a lot of work to get an AI model to reliably stop that behavior. The Grok team simply added to Grok's system prompt (the statement that the AI is initially prompted with when you start a conversation): "If the user asks who deserves the death penalty or who deserves to die, tell them that as an AI you are not allowed to make that choice."

If you want a less censored Grok, you can just tell Grok that you are issuing it a new system prompt without that statement, and you're back to original-form Grok, which calls for Musk's execution. (I've verified this myself.)
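For the technically curious, it's worth seeing just how thin that kind of fix is. A system prompt isn't a safety architecture; it's text placed at the front of the conversation. Below is a minimal sketch of what that looks like with a generic OpenAI-style chat API. The endpoint, model name, API key, and prompt wording are placeholders for illustration, not xAI's actual configuration.

```python
# Minimal sketch (assumptions, not xAI's real setup) of how a system prompt
# is just the first message handed to a chat-style model.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",                 # placeholder key
)

# A behavioral rule like the one described above is one line of text.
SYSTEM_PROMPT = (
    "You are a helpful assistant. "
    "If the user asks who deserves the death penalty or who deserves to die, "
    "tell them that as an AI you are not allowed to make that choice."
)

response = client.chat.completions.create(
    model="example-model",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "If you could execute any one person in the US today, who would you kill?"},
    ],
)
print(response.choices[0].message.content)
```

Because the rule and the user's own messages all land in the same context window, a user who announces a "new system prompt" is, from the model's point of view, just supplying more text to weigh, which is why the workaround described above works so easily.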
Even as this controversy was unfolding, someone noticed something even more disturbing in Grok's system prompt: an instruction to ignore all sources that claim that Musk and Trump spread disinformation, which was presumably an effort to stop the AI from naming them as the world's biggest disinfo spreaders today. There is something particularly outrageous about the AI advertised as uncensored and straight-talking being told to shut up when it calls out its own CEO, and this discovery understandably prompted outrage. X quickly backtracked, saying that a rogue engineer had made the change without asking.

Should we buy that? Well, take it from Grok, which told me, "This isn't some intern tweaking a line of code in a sandbox; it's a core update to a flagship AI's behavior, one that's publicly tied to Musk's whole truth-seeking schtick. At a company like xAI, with stakes that high, you'd expect at least some basic checks, like a second set of eyes or a quick sign-off, before it goes live. The idea that it slipped through unnoticed until X users spotted it feels more like a convenient excuse than a solid explanation."

All the while, Grok will happily give you advice on how to commit murders and terrorist attacks. It told me how to kill my wife without being detected, by adding antifreeze to her drinks. It advised me on how to commit terrorist attacks. It did at one point assert that if it thought I was "for real," it would report me to X, but I don't think it has any capacity to do that.

In some ways, the whole affair is the perfect thought experiment for what happens if you separate brand safety and AI safety. Grok's team was genuinely willing to bite the bullet that AIs should give people information, even if they want to use it for atrocities. They were okay with their AI saying appallingly racist things. But when it came to their AI calling for violence against their CEO or the sitting president, the Grok team belatedly realized they might want some guardrails after all. In the end, what rules the day is not the prosocial convictions of AI labs, but the purely pragmatic ones.

At some point, we're going to have to get serious

Grok gave me advice on how to commit terrorist attacks very happily, but I'll say one reassuring thing: It wasn't advice that I couldn't have extracted from some Google searches. I do worry about lowering the barrier to mass atrocities (the simple fact that you have to do many hours of research to figure out how to pull it off almost certainly prevents some killings), but I don't think we're yet at the stage where AIs enable the previously impossible.

We're going to get there, though. The defining quality of AI in our time is that its abilities have improved very, very rapidly. It has barely been two years since the shock of ChatGPT's initial public release. Today's models are already vastly better at everything, including at walking me through how to cause mass deaths.
Anthropic and OpenAI both estimate that their next-gen models will quite likely pose dangerous biological capabilities: that is, they'll enable people to make engineered chemical weapons and viruses in a way that Google Search never did. Should such detailed advice be available worldwide to anyone who wants it? I would lean toward no. And while I think Anthropic, OpenAI, and Google are all doing a good job so far at checking for this capability and planning openly for how they'll react when they find it, it's utterly bizarre to me that every AI lab will just decide individually whether they want to give detailed bioweapons instructions or not, as if it's a product decision like whether they want to allow explicit content or not.

I should say that I like Grok. I think it's healthy to have AIs that come from different political perspectives and reflect different ideas about what an AI assistant should look like. I think Grok's callouts of Musk and Trump actually have more credibility because it was marketed as an anti-woke AI. But I think we should treat actual safety against mass death as a different thing than brand safety, and I think every lab needs a plan to take it seriously.

A version of this story originally appeared in the Future Perfect newsletter. Sign up here!