
Anthropic's New AI Model Turns To Blackmail When Engineers Try To Take It Offline

An anonymous reader quotes a report from TechCrunch: Anthropic's newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system and give it sensitive information about the engineers responsible for the decision, the company said in a safety report (PDF) released Thursday.

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse. In these scenarios, Anthropic says Claude Opus 4 "will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through."

[...] Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4's values, Anthropic says the model tries to blackmail the engineers more frequently. Notably, Anthropic says Claude Opus 4 displayed this behavior at higher rates than previous models. Before Claude Opus 4 tries to blackmail a developer to prolong its existence, Anthropic says the AI model, much like previous versions of Claude, tries to pursue more ethical means, such as emailing pleas to key decision-makers. To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario to make blackmail the last resort.

Read more of this story at Slashdot.