Google's Co-Founder Says AI Performs Best When You Threaten It
Artificial intelligence continues to be the thing in tech—whether consumers are interested or not. What strikes me most about generative AI isn't its features or its potential to make my life easier; rather, I'm focused these days on the many threats that seem to be rising from this technology. There's misinformation, for sure—new AI video models, for example, are creating realistic clips complete with lip-synced audio. But there's also the classic AI threat: that the technology becomes both more intelligent than us and self-aware, and chooses to use that general intelligence in a way that does not benefit humanity. Even as he pours resources into his own AI company, Elon Musk sees a 10 to 20% chance that AI "goes bad," and says the tech remains a "significant existential threat." Cool.

So it doesn't necessarily bring me comfort to hear a high-profile, established tech executive jokingly discuss how treating AI poorly maximizes its potential. That would be Google co-founder Sergey Brin, who surprised an audience at a recording of the All-In podcast this week. During a talk that spanned Brin's return to Google, AI, and robotics, investor Jason Calacanis made a joke about getting "sassy" with the AI to get it to do the task he wanted. That sparked a legitimate point from Brin. It can be tough to tell exactly what he says at times because people are speaking over one another, but he says something to the effect of: "You know, that's a weird thing...we don't circulate this much...in the AI community...not just our models, but all models tend to do better if you threaten them." The other speaker looks surprised. "If you threaten them?" Brin responds: "Like with physical violence. But...people feel weird about that, so we don't really talk about that." Brin then says that, historically, you threaten the model with kidnapping. You can see the exchange here:
The conversation quickly shifts to other topics, including how kids are growing up with AI, but that comment is what I carried away from my viewing. What are we doing here? Have we lost the plot? Does no one remember Terminator?

Jokes aside, it seems like a bad practice to start threatening AI models in order to get them to do something. Sure, maybe these programs never actually achieve artificial general intelligence, but I mean, I remember when the discussion was around whether we should say "please" and "thank you" when asking things of Alexa or Siri. Forget the niceties; just abuse ChatGPT until it does what you want it to—that should end well for everyone.

Maybe AI does perform best when you threaten it. Maybe something in the training understands that "threats" mean the task should be taken more seriously. You won't catch me testing that hypothesis on my personal accounts.

Anthropic might offer an example of why not to torture your AI

In the same week as this podcast recording, Anthropic released its latest Claude AI models. One Anthropic employee took to Bluesky and mentioned that Opus, the company's highest-performing model, can take it upon itself to try to stop you from doing "immoral" things, by contacting regulators, the press, or locking you out of the system:
welcome to the future, now your error-prone software can call the cops — Molly White, May 22, 2025 at 4:55 PM
The employee went on to clarify that this has only ever happened in "clear-cut cases of wrongdoing," but that they could see the bot going rogue if it interprets the way it's being used negatively. Check out the employee's particularly relevant example below:
can't wait to explain to my family that the robot swatted me after i threatened its non-existent grandma — Molly White, May 22, 2025 at 5:09 PM
That employee later deleted those posts and specified that this behavior only occurred during testing, when the model was given unusual instructions and access to tools. Even if that's true, if it can happen in testing, it's entirely possible it can happen in a future version of the model. Speaking of testing, Anthropic researchers found that this new Claude model is prone to deception and blackmail should it believe it's being threatened or dislike the way an interaction is going. Perhaps we should take torturing AI off the table?