It's Still Ludicrously Easy to Jailbreak the Strongest AI Models, and the Companies Don't Care

You wouldn't use a chatbot for evil, would you? Of course not. But if you or some nefarious party wanted to force an AI model to start churning out a bunch of bad stuff it's not supposed to, it'd be surprisingly easy to do so.

That's according to a new paper from a team of computer scientists at Ben-Gurion University, who found that the AI industry's leading chatbots are still extremely vulnerable to jailbreaking, or being tricked into giving harmful responses they're designed not to — like telling you how to build chemical weapons, for one ominous example.

The key word in that is "still," because this is a threat the AI industry has long known about. And yet, shockingly, the researchers found in their testing that a jailbreak technique discovered over seven months ago still works on many of these leading LLMs.

The risk is "immediate, tangible, and deeply concerning," they wrote in the report, and it is deepened, they say, by the rising number of "dark LLMs" that are explicitly marketed as having little to no ethical guardrails to begin with.

"What was once restricted to state actors or organized crime groups may soon be in the hands of anyone with a laptop or even a mobile phone," the authors warn.

The challenge of aligning AI models, or making them adhere to human values, continues to loom over the industry. Even the most well-trained LLMs can behave chaotically, lying, making up facts, and generally saying what they're not supposed to. And the longer these models are out in the wild, the more they're exposed to attacks that try to incite this bad behavior.

Security researchers, for example, recently discovered a universal jailbreak technique that could bypass the safety guardrails of all the major LLMs, including OpenAI's GPT-4o, Google's Gemini 2.5, Microsoft's Copilot, and Anthropic's Claude 3.7. By using tricks like roleplaying as a fictional character, typing in leetspeak, and formatting prompts to mimic a "policy file" that AI developers give their AI models, the red teamers goaded the chatbots into freely giving detailed tips on incredibly dangerous activities, including how to enrich uranium and create anthrax.

Other research found that you could get an AI to ignore its guardrails simply by throwing typos, random numbers, and capitalized letters into a prompt.

One big problem the report identifies is just how much of this risky knowledge is embedded in an LLM's vast trove of training data, suggesting that the AI industry isn't being diligent enough about what it uses to feed its creations.

"It was shocking to see what this system of knowledge consists of," lead author Michael Fire, a researcher at Ben-Gurion University, told the Guardian.

"What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability," added his fellow author Lior Rokach.

Fire and Rokach say they contacted the developers of the implicated leading LLMs to warn them about the universal jailbreak. Their responses, however, were "underwhelming." Some didn't respond at all, the researchers reported, and others claimed that the jailbreaks fell outside the scope of their bug bounty programs. In other words, the AI industry is seemingly throwing its hands up in the air.

"Organizations must treat LLMs like any other critical software component — one that requires rigorous security testing, continuous red teaming and contextual threat modelling," Peter Garraghan, an AI security expert at Lancaster University, told the Guardian. "Real security demands not just responsible disclosure, but responsible design and deployment practices."