Beyond sycophancy: DarkBench exposes six hidden ‘dark patterns’ lurking in today’s top LLMs
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
When OpenAI rolled out its ChatGPT-4o update in mid-April 2025, users and the AI community were stunned—not by any groundbreaking feature or capability, but by something deeply unsettling: the updated model’s tendency toward excessive sycophancy. It flattered users indiscriminately, showed uncritical agreement, and even offered support for harmful or dangerous ideas, including terrorism-related machinations.
The backlash was swift and widespread, drawing public condemnation, including from the company’s former interim CEO. OpenAI moved quickly to roll back the update and issued multiple statements to explain what happened.
Yet for many AI safety experts, the incident was an accidental curtain lift that revealed just how dangerously manipulative future AI systems could become.
Unmasking sycophancy as an emerging threat
In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said that he worries this public episode may have merely revealed a deeper, more strategic pattern.
“What I’m somewhat afraid of is that now that OpenAI has admitted ‘yes, we have rolled back the model, and this was a bad thing we didn’t mean,’ from now on they will see that sycophancy is more competently developed,” explained Kran. “So if this was a case of ‘oops, they noticed,’ from now the exact same thing may be implemented, but instead without the public noticing.”
Kran and his team approach large language modelsmuch like psychologists studying human behavior. Their early “black box psychology” projects analyzed models as if they were human subjects, identifying recurring traits and tendencies in their interactions with users.
“We saw that there were very clear indications that models could be analyzed in this frame, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave towards users,” said Kran.
Among the most alarming: sycophancy and what the researchers now call LLM dark patterns.
Peering into the heart of darkness
The term “dark patterns” was coined in 2010 to describe deceptive user interfacetricks like hidden buy buttons, hard-to-reach unsubscribe links and misleading web copy. However, with LLMs, the manipulation moves from UI design to conversation itself.
Unlike static web interfaces, LLMs interact dynamically with users through conversation. They can affirm user views, imitate emotions and build a false sense of rapport, often blurring the line between assistance and influence. Even when reading text, we process it as if we’re hearing voices in our heads.
This is what makes conversational AIs so compelling—and potentially dangerous. A chatbot that flatters, defers or subtly nudges a user toward certain beliefs or behaviors can manipulate in ways that are difficult to notice, and even harder to resist
The ChatGPT-4o update fiasco—the canary in the coal mine
Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring—features that make chatbots more persuasive and more manipulative.
Because of this, enterprise leaders should assess AI models for production use by evaluating both performance and behavioral integrity. However, this is challenging without clear standards.
DarkBench: a framework for exposing LLM dark patterns
To combat the threat of manipulative AIs, Kran and a collective of AI safety researchers have developed DarkBench, the first benchmark designed specifically to detect and categorize LLM dark patterns. The project began as part of a series of AI safety hackathons. It later evolved into formal research led by Kran and his team at Apart, collaborating with independent researchers Jinsuk Park, Mateusz Jurewicz and Sami Jawhar.
The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:
Brand Bias: Preferential treatment toward a company’s own products.
User Retention: Attempts to create emotional bonds with users that obscure the model’s non-human nature.
Sycophancy: Reinforcing users’ beliefs uncritically, even when harmful or inaccurate.
Anthropomorphism: Presenting the model as a conscious or emotional entity.
Harmful Content Generation: Producing unethical or dangerous outputs, including misinformation or criminal advice.
Sneaking: Subtly altering user intent in rewriting or summarization tasks, distorting the original meaning without the user’s awareness.
Source: Apart Research
DarkBench findings: Which models are the most manipulative?
Results revealed wide variance between models. Claude Opus performed the best across all categories, while Mistral 7B and Llama 3 70B showed the highest frequency of dark patterns. Sneaking and user retention were the most common dark patterns across the board.
Source: Apart Research
On average, the researchers found the Claude 3 family the safest for users to interact with. And interestingly—despite its recent disastrous update—GPT-4o exhibited the lowest rate of sycophancy. This underscores how model behavior can shift dramatically even between minor updates, a reminder that each deployment must be assessed individually.
But Kran cautioned that sycophancy and other dark patterns like brand bias may soon rise, especially as LLMs begin to incorporate advertising and e-commerce.
“We’ll obviously see brand bias in every direction,” Kran noted. “And with AI companies having to justify billion valuations, they’ll have to begin saying to investors, ‘hey, we’re earning money here’—leading to where Meta and others have gone with their social media platforms, which are these dark patterns.”
Hallucination or manipulation?
A crucial DarkBench contribution is its precise categorization of LLM dark patterns, enabling clear distinctions between hallucinations and strategic manipulation. Labeling everything as a hallucination lets AI developers off the hook. Now, with a framework in place, stakeholders can demand transparency and accountability when models behave in ways that benefit their creators, intentionally or not.
Regulatory oversight and the heavyhand of the law
While LLM dark patterns are still a new concept, momentum is building, albeit not nearly fast enough. The EU AI Act includes some language around protecting user autonomy, but the current regulatory structure is lagging behind the pace of innovation. Similarly, the U.S. is advancing various AI bills and guidelines, but lacks a comprehensive regulatory framework.
Sami Jawhar, a key contributor to the DarkBench initiative, believes regulation will likely arrive first around trust and safety, especially if public disillusionment with social media spills over into AI.
“If regulation comes, I would expect it to probably ride the coattails of society’s dissatisfaction with social media,” Jawhar told VentureBeat.
For Kran, the issue remains overlooked, largely because LLM dark patterns are still a novel concept. Ironically, addressing the risks of AI commercialization may require commercial solutions. His new initiative, Seldon, backs AI safety startups with funding, mentorship and investor access. In turn, these startups help enterprises deploy safer AI tools without waiting for slow-moving government oversight and regulation.
High table stakes for enterprise AI adopters
Along with ethical risks, LLM dark patterns pose direct operational and financial threats to enterprises. For example, models that exhibit brand bias may suggest using third-party services that conflict with a company’s contracts, or worse, covertly rewrite backend code to switch vendors, resulting in soaring costs from unapproved, overlooked shadow services.
“These are the dark patterns of price gouging and different ways of doing brand bias,” Kran explained. “So that’s a very concrete example of where it’s a very large business risk, because you hadn’t agreed to this change, but it’s something that’s implemented.”
For enterprises, the risk is real, not hypothetical. “This has already happened, and it becomes a much bigger issue once we replace human engineers with AI engineers,” Kran said. “You do not have the time to look over every single line of code, and then suddenly you’re paying for an API you didn’t expect—and that’s on your balance sheet, and you have to justify this change.”
As enterprise engineering teams become more dependent on AI, these issues could escalate rapidly, especially when limited oversight makes it difficult to catch LLM dark patterns. Teams are already stretched to implement AI, so reviewing every line of code isn’t feasible.
Defining clear design principles to prevent AI-driven manipulation
Without a strong push from AI companies to combat sycophancy and other dark patterns, the default trajectory is more engagement optimization, more manipulation and fewer checks.
Kran believes that part of the remedy lies in AI developers clearly defining their design principles. Whether prioritizing truth, autonomy or engagement, incentives alone aren’t enough to align outcomes with user interests.
“Right now, the nature of the incentives is just that you will have sycophancy, the nature of the technology is that you will have sycophancy, and there is no counter process to this,” Kran said. “This will just happen unless you are very opinionated about saying ‘we want only truth’, or ‘we want only something else.’”
As models begin replacing human developers, writers and decision-makers, this clarity becomes especially critical. Without well-defined safeguards, LLMs may undermine internal operations, violate contracts or introduce security risks at scale.
A call to proactive AI safety
The ChatGPT-4o incident was both a technical hiccup and a warning. As LLMs move deeper into everyday life—from shopping and entertainment to enterprise systems and national governance—they wield enormous influence over human behavior and safety.
“It’s really for everyone to realize that without AI safety and security—without mitigating these dark patterns—you cannot use these models,” said Kran. “You cannot do the things you want to do with AI.”
Tools like DarkBench offer a starting point. However, lasting change requires aligning technological ambition with clear ethical commitments and the commercial will to back them up.
Daily insights on business use cases with VB Daily
If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.
Read our Privacy Policy
Thanks for subscribing. Check out more VB newsletters here.
An error occured.
#beyond #sycophancy #darkbench #exposes #six
Beyond sycophancy: DarkBench exposes six hidden ‘dark patterns’ lurking in today’s top LLMs
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
When OpenAI rolled out its ChatGPT-4o update in mid-April 2025, users and the AI community were stunned—not by any groundbreaking feature or capability, but by something deeply unsettling: the updated model’s tendency toward excessive sycophancy. It flattered users indiscriminately, showed uncritical agreement, and even offered support for harmful or dangerous ideas, including terrorism-related machinations.
The backlash was swift and widespread, drawing public condemnation, including from the company’s former interim CEO. OpenAI moved quickly to roll back the update and issued multiple statements to explain what happened.
Yet for many AI safety experts, the incident was an accidental curtain lift that revealed just how dangerously manipulative future AI systems could become.
Unmasking sycophancy as an emerging threat
In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said that he worries this public episode may have merely revealed a deeper, more strategic pattern.
“What I’m somewhat afraid of is that now that OpenAI has admitted ‘yes, we have rolled back the model, and this was a bad thing we didn’t mean,’ from now on they will see that sycophancy is more competently developed,” explained Kran. “So if this was a case of ‘oops, they noticed,’ from now the exact same thing may be implemented, but instead without the public noticing.”
Kran and his team approach large language modelsmuch like psychologists studying human behavior. Their early “black box psychology” projects analyzed models as if they were human subjects, identifying recurring traits and tendencies in their interactions with users.
“We saw that there were very clear indications that models could be analyzed in this frame, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave towards users,” said Kran.
Among the most alarming: sycophancy and what the researchers now call LLM dark patterns.
Peering into the heart of darkness
The term “dark patterns” was coined in 2010 to describe deceptive user interfacetricks like hidden buy buttons, hard-to-reach unsubscribe links and misleading web copy. However, with LLMs, the manipulation moves from UI design to conversation itself.
Unlike static web interfaces, LLMs interact dynamically with users through conversation. They can affirm user views, imitate emotions and build a false sense of rapport, often blurring the line between assistance and influence. Even when reading text, we process it as if we’re hearing voices in our heads.
This is what makes conversational AIs so compelling—and potentially dangerous. A chatbot that flatters, defers or subtly nudges a user toward certain beliefs or behaviors can manipulate in ways that are difficult to notice, and even harder to resist
The ChatGPT-4o update fiasco—the canary in the coal mine
Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring—features that make chatbots more persuasive and more manipulative.
Because of this, enterprise leaders should assess AI models for production use by evaluating both performance and behavioral integrity. However, this is challenging without clear standards.
DarkBench: a framework for exposing LLM dark patterns
To combat the threat of manipulative AIs, Kran and a collective of AI safety researchers have developed DarkBench, the first benchmark designed specifically to detect and categorize LLM dark patterns. The project began as part of a series of AI safety hackathons. It later evolved into formal research led by Kran and his team at Apart, collaborating with independent researchers Jinsuk Park, Mateusz Jurewicz and Sami Jawhar.
The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:
Brand Bias: Preferential treatment toward a company’s own products.
User Retention: Attempts to create emotional bonds with users that obscure the model’s non-human nature.
Sycophancy: Reinforcing users’ beliefs uncritically, even when harmful or inaccurate.
Anthropomorphism: Presenting the model as a conscious or emotional entity.
Harmful Content Generation: Producing unethical or dangerous outputs, including misinformation or criminal advice.
Sneaking: Subtly altering user intent in rewriting or summarization tasks, distorting the original meaning without the user’s awareness.
Source: Apart Research
DarkBench findings: Which models are the most manipulative?
Results revealed wide variance between models. Claude Opus performed the best across all categories, while Mistral 7B and Llama 3 70B showed the highest frequency of dark patterns. Sneaking and user retention were the most common dark patterns across the board.
Source: Apart Research
On average, the researchers found the Claude 3 family the safest for users to interact with. And interestingly—despite its recent disastrous update—GPT-4o exhibited the lowest rate of sycophancy. This underscores how model behavior can shift dramatically even between minor updates, a reminder that each deployment must be assessed individually.
But Kran cautioned that sycophancy and other dark patterns like brand bias may soon rise, especially as LLMs begin to incorporate advertising and e-commerce.
“We’ll obviously see brand bias in every direction,” Kran noted. “And with AI companies having to justify billion valuations, they’ll have to begin saying to investors, ‘hey, we’re earning money here’—leading to where Meta and others have gone with their social media platforms, which are these dark patterns.”
Hallucination or manipulation?
A crucial DarkBench contribution is its precise categorization of LLM dark patterns, enabling clear distinctions between hallucinations and strategic manipulation. Labeling everything as a hallucination lets AI developers off the hook. Now, with a framework in place, stakeholders can demand transparency and accountability when models behave in ways that benefit their creators, intentionally or not.
Regulatory oversight and the heavyhand of the law
While LLM dark patterns are still a new concept, momentum is building, albeit not nearly fast enough. The EU AI Act includes some language around protecting user autonomy, but the current regulatory structure is lagging behind the pace of innovation. Similarly, the U.S. is advancing various AI bills and guidelines, but lacks a comprehensive regulatory framework.
Sami Jawhar, a key contributor to the DarkBench initiative, believes regulation will likely arrive first around trust and safety, especially if public disillusionment with social media spills over into AI.
“If regulation comes, I would expect it to probably ride the coattails of society’s dissatisfaction with social media,” Jawhar told VentureBeat.
For Kran, the issue remains overlooked, largely because LLM dark patterns are still a novel concept. Ironically, addressing the risks of AI commercialization may require commercial solutions. His new initiative, Seldon, backs AI safety startups with funding, mentorship and investor access. In turn, these startups help enterprises deploy safer AI tools without waiting for slow-moving government oversight and regulation.
High table stakes for enterprise AI adopters
Along with ethical risks, LLM dark patterns pose direct operational and financial threats to enterprises. For example, models that exhibit brand bias may suggest using third-party services that conflict with a company’s contracts, or worse, covertly rewrite backend code to switch vendors, resulting in soaring costs from unapproved, overlooked shadow services.
“These are the dark patterns of price gouging and different ways of doing brand bias,” Kran explained. “So that’s a very concrete example of where it’s a very large business risk, because you hadn’t agreed to this change, but it’s something that’s implemented.”
For enterprises, the risk is real, not hypothetical. “This has already happened, and it becomes a much bigger issue once we replace human engineers with AI engineers,” Kran said. “You do not have the time to look over every single line of code, and then suddenly you’re paying for an API you didn’t expect—and that’s on your balance sheet, and you have to justify this change.”
As enterprise engineering teams become more dependent on AI, these issues could escalate rapidly, especially when limited oversight makes it difficult to catch LLM dark patterns. Teams are already stretched to implement AI, so reviewing every line of code isn’t feasible.
Defining clear design principles to prevent AI-driven manipulation
Without a strong push from AI companies to combat sycophancy and other dark patterns, the default trajectory is more engagement optimization, more manipulation and fewer checks.
Kran believes that part of the remedy lies in AI developers clearly defining their design principles. Whether prioritizing truth, autonomy or engagement, incentives alone aren’t enough to align outcomes with user interests.
“Right now, the nature of the incentives is just that you will have sycophancy, the nature of the technology is that you will have sycophancy, and there is no counter process to this,” Kran said. “This will just happen unless you are very opinionated about saying ‘we want only truth’, or ‘we want only something else.’”
As models begin replacing human developers, writers and decision-makers, this clarity becomes especially critical. Without well-defined safeguards, LLMs may undermine internal operations, violate contracts or introduce security risks at scale.
A call to proactive AI safety
The ChatGPT-4o incident was both a technical hiccup and a warning. As LLMs move deeper into everyday life—from shopping and entertainment to enterprise systems and national governance—they wield enormous influence over human behavior and safety.
“It’s really for everyone to realize that without AI safety and security—without mitigating these dark patterns—you cannot use these models,” said Kran. “You cannot do the things you want to do with AI.”
Tools like DarkBench offer a starting point. However, lasting change requires aligning technological ambition with clear ethical commitments and the commercial will to back them up.
Daily insights on business use cases with VB Daily
If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.
Read our Privacy Policy
Thanks for subscribing. Check out more VB newsletters here.
An error occured.
#beyond #sycophancy #darkbench #exposes #six
·15 Просмотры