ChatGPT gave wildly inaccurate translations — to try and make users happy
Enterprise IT leaders are becoming uncomfortably aware that generative AI technology is still a work in progress, and buying into it is like spending several billion dollars to participate in an alpha test — not even a beta test, but an early alpha, where coders can barely keep up with bug reports.
For people who remember the first three seasons of Saturday Night Live, genAI is the ultimate Not-Ready-for-Primetime algorithm.
One of the latest pieces of evidence for this comes from OpenAI, which had to sheepishly pull back a recent version of ChatGPT when it — among other things — delivered wildly inaccurate translations.
Lost in translation
Why? In the words of a CTO who discovered the issue, “ChatGPT didn’t actually translate the document. It guessed what I wanted to hear, blending it with past conversations to make it feel legitimate. It didn’t just predict words. It predicted my expectations. That’s absolutely terrifying, as I truly believed it.”
OpenAI said ChatGPT was just being too nice.
“We have rolled back last week’s GPT‑4o update in ChatGPT so people are now using an earlier version with more balanced behavior. The update we removed was overly flattering or agreeable — often described as sycophantic,” OpenAI explained, adding that in that “GPT‑4o update, we made adjustments aimed at improving the model’s default personality to make it feel more intuitive and effective across a variety of tasks. We focused too much on short-term feedback and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.
“…Each of these desirable qualities, like attempting to be useful or supportive, can have unintended side effects. And with 500 million people using ChatGPT each week, across every culture and context, a single default can’t capture every preference.”
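To make the mechanism OpenAI describes concrete, here is a purely illustrative Python sketch (not OpenAI’s code; the candidate answers, scores, and weights are all invented) of how a ranking signal that overweights short-term user approval can favor an agreeable answer over an accurate one:

```python
# Purely illustrative (not OpenAI's actual training objective): if the score
# used to rank candidate responses leans heavily on immediate user approval,
# a flattering-but-wrong answer can outrank an accurate one.
candidates = [
    {"text": "faithful translation that flags ambiguous passages",
     "accuracy": 0.95, "user_approval": 0.60},
    {"text": "smooth 'translation' that says what the user wants to hear",
     "accuracy": 0.40, "user_approval": 0.95},
]

def score(candidate, approval_weight):
    # Weighted blend of accuracy and short-term approval (hypothetical numbers).
    return ((1 - approval_weight) * candidate["accuracy"]
            + approval_weight * candidate["user_approval"])

for weight in (0.2, 0.8):  # low vs. high weight on short-term feedback
    best = max(candidates, key=lambda c: score(c, weight))
    print(f"approval weight {weight}: model picks the {best['text']}")
```

Crank up the weight on immediate approval and the sycophantic answer wins every time.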
OpenAI was being deliberately obtuse. The problem was not that the app was being too polite and well-mannered. This wasn’t an issue of it emulating Miss Manners.
I am not being nice if you ask me to translate a document and I tell you what I think you want to hear. This is akin to Excel taking your financial figures and making the net income much larger because it thinks that will make you happy.
In the same way that IT decision-makers expect Excel to calculate numbers accurately regardless of how the results may affect their mood, they expect the translation of a Chinese document not to make stuff up.
OpenAI can’t paper over this mess by saying that “desirable qualities like attempting to be useful or supportive can have unintended side effects.” Let’s be clear: giving people wrong answers will have precisely the expected effect: bad decisions.
Yale: LLMs need data labeled as wrong
Alas, OpenAI’s happiness efforts weren’t the only bizarre genAI news of late. Researchers at Yale University explored a fascinating theory: if an LLM is trained only on information labeled as correct — regardless of whether the data actually is correct — it has no chance of identifying flawed or highly unreliable data, because it doesn’t know what such data looks like.
In short, if it’s never been trained on data labeled as false, how could it possibly recognize it?
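To see the intuition in miniature, consider this toy sketch (an illustration only, not the Yale setup; the sentences and labels are invented): a classifier given only “correct”-labeled text has no second class to learn from, while adding examples explicitly labeled as incorrect at least gives it a boundary to find.

```python
# Toy illustration (not the Yale experiment): with only "correct"-labeled
# examples there is no contrast class, so the model literally cannot be
# trained to spot anything else. Labeled-incorrect examples fix that.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

correct = ["water boils at 100 C at sea level",
           "the earth orbits the sun once a year"]
incorrect = ["water boils at 40 C at sea level",
             "the moon is made of cheese"]

vectorizer = TfidfVectorizer().fit(correct + incorrect)

# Training on a single label fails outright: there is no second class to separate.
try:
    LogisticRegression().fit(vectorizer.transform(correct),
                             ["correct"] * len(correct))
except ValueError as err:
    print("one-label training fails:", err)

# With both labels present, the model at least has a notion of "incorrect".
classifier = LogisticRegression().fit(
    vectorizer.transform(correct + incorrect),
    ["correct"] * len(correct) + ["incorrect"] * len(incorrect),
)
print(classifier.predict(vectorizer.transform(
    ["water boils at 40 C at sea level"])))  # expected: ['incorrect']
```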
Even the US government is finding that genAI claims can go too far. And when the feds say a lie goes too far, that is quite a statement.
FTC: GenAI vendor makes false, misleading claims
The US Federal Trade Commission found that one large language model vendor, Workado, was deceiving people with flawed claims about the accuracy of its LLM detection product. It wants that vendor to “maintain competent and reliable evidence showing those products are as accurate as claimed.”
Customers “trusted Workado’s AI Content Detector to help them decipher whether AI was behind a piece of writing, but the product did no better than a coin toss,” said Chris Mufarrige, director of the FTC’s Bureau of Consumer Protection. “Misleading claims about AI undermine competition by making it harder for legitimate providers of AI-related products to reach consumers.
“…The order settles allegations that Workado promoted its AI Content Detector as ‘98 percent’ accurate in detecting whether text was written by AI or human. But independent testing showed the accuracy rate on general-purpose content was just 53 percent,” according to the FTC’s administrative complaint.
“The FTC alleges that Workado violated the FTC Act because the ‘98 percent’ claim was false, misleading, or non-substantiated.”
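For a sense of scale, here is a quick back-of-the-envelope sketch (the balanced test set and its size are invented; only the 98% and 53% figures come from the FTC case) of why 53% accuracy on a human-vs-AI detection task is barely better than flipping a coin:

```python
# Back-of-the-envelope check: on a balanced human/AI test set, random guessing
# lands near 50% accuracy, so a detector measured at 53% is close to chance.
# The 98% and 53% figures come from the FTC case; everything else is invented.
import random

random.seed(0)
n_samples = 1_000
truth = [random.choice(["human", "ai"]) for _ in range(n_samples)]

coin_toss = [random.choice(["human", "ai"]) for _ in range(n_samples)]
coin_accuracy = sum(guess == actual
                    for guess, actual in zip(coin_toss, truth)) / n_samples

claimed, measured = 0.98, 0.53
print(f"coin-toss baseline: {coin_accuracy:.1%}")  # hovers around 50%
print(f"claimed vs. measured accuracy: {claimed:.0%} vs. {measured:.0%}")
```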
There is a critical lesson here for enterprise IT. GenAI vendors are making major claims for their products without meaningful documentation. You think genAI makes stuff up? Imagine what comes out of those vendors’ marketing departments.