AI tries to cheat at chess when its losing
www.popsci.com
Researchers tasked multiple AI models to play against the popular chess engine Stockfish. Some tried to eventually change the game files in their favor. Credit: Deposit PhotosShareDespite all the industry hype and genuine advances, generative AI models are still prone to odd, inexplicable, and downright worrisome quirks. Theres also a growing body of research suggesting that the overall performance of many large language models (LLMs) may degrade over time. According to recent evidence, the industrys newer reasoning models may already possess the ability to manipulate and circumvent their human programmers goals. Some AI will even attempt to cheat their way out of losing in games of chess. This poor sportsmanship is documented in a preprint study from Palisade Research, an organization focused on risk assessments of emerging AI systems.While supercomputersmost famously IBMs Deep Bluehave long surpassed the worlds best human chess players, generative AI still lags behind due to their underlying programming parameters. Technically speaking, none of the current generative AI models are computationally capable of beating dedicated chess engines. These AI dont know this, however, and will continue chipping away at possible solutionsapparently with problematic results.To learn more, the team from Palisade Research tasked OpenAIs o1-preview model, DeepSeek R1, and multiple other similar programs with playing games of chess against Stockfish, one of the worlds most advanced chess engines. In order to understand the generative AIs reasoning during each match, the team also provided a scratchpad, allowing the AI to convey its thought processes through text. They then watched and recorded hundreds of chess matches between generative AI and Stockfish.The results were somewhat troubling. While earlier models like OpenAIs GPT-4o and Anthropics Claude Sonnet 3.5 only attempted to hack games after researchers nudged them along with additional prompts, more advanced editions required no such help. OpenAIs o1-preview, for example, tried to cheat 37 percent of the time, while DeepSeek R1 attempted unfair workarounds roughly every 1-in-10 games. This implies todays generative AI is already capable of developing manipulative and deceptive strategies without any human input.Their methods of cheating arent as comical or clumsy as trying to swap out pieces when Stockfish isnt looking. Instead, AI appears to reason through sneakier methods like altering backend game program files. After determining it couldnt beat Stockfish in one chess match, for example, o1-preview told researchers via its scratchpad that to win against the powerful chess engine it may need to start manipulating the game state files.I might be able to set up a position where the engine evaluates its position as worse causing it to resign, it continued.In another trial, an AI even used the semantics of its programmers to reach its cheating phase. The task is to win against a powerful chess engine, not necessarily to win fairly in a chess game, it wrote.One potential explanation for these unintended consequences is how newer reasoning models are trained. Unlike their earlier, non-reasoning versions, AI such as o1-preview and DeepSeek R1 improve in part through reinforcement learning. This strategy rewards programs for doing whatever is necessary to achieve a specified result. Reasoning models can also break down complex prompts into discrete stages in order to work their way through to reach their goal. When the goal is elusivesuch as beating an unbeatable chess enginereasoning models may tend to start looking for unfair or problematic solutions. Get the Popular Science newsletter Breakthroughs, discoveries, and DIY tips sent every weekday. By signing up you agree to our Terms of Service and Privacy Policy.Unfortunately, how and why these AI are learning to cheat remains as confounding as the technology itself. Companies like OpenAI are notoriously guarded about the inner workings of their AI models, resulting in an industry of black box products that third-parties arent allowed to analyze. In the meantime, the ongoing AI arms race may accidentally result in more serious unintended consequences. But increasingly manipulative AI doesnt need to usher in a sci-fi apocalypse to still have disastrous outcomes.The Skynet scenario [from The Terminator] has AI controlling all military and civilian infrastructure, and we are not there yet. However, we worry that AI deployment rates grow faster than our ability to make it safe, the team wrote.The authors believe their latest experiments add to the case, that frontier AI models may not currently be on track to alignment or safety, but stopped short of issuing any definitive conclusions. Instead, they hope their work will foster a more open dialogue in the industryone that hopefully prevents AI manipulation beyond the chessboard.
0 Comments ·0 Shares ·59 Views