
Finally: some truth serum for lying genAI chatbots
www.computerworld.com
Ever since OpenAI made ChatGPT available to the public in late 2022, the large language model (LLM)-based generative AI (genAI) revolution has advanced at an astonishing pace. Two years and four months ago, we had only ChatGPT. Now we have GPT-4.5, Claude 3.7, Gemini 2.0 Pro, Llama 3.1, PaLM 2, Perplexity AI, Grok-3, DeepSeek R1, LLaMA-13B, and dozens of other tools, ranging from free to $20,000 per month for the top tier of OpenAI's Operator system.

The consensus is that they're advancing quickly. But they all seem stuck on three fundamental problems that prevent their full use by business users: their responses are often 1) generic, 2) hallucinatory, and/or 3) compromised by deliberate sabotage.

Serious action is being taken to address these problems, and I'll get to that in a minute.

Problem #1: Generic output

GenAI chatbots often produce results that are too generic or lacking in nuance, creativity, or personalization. This issue stems from their reliance on large-scale training data, which biases them toward surface-level responses and homogenized content that reflects a kind of average. Critics also warn of model collapse, where repeated training on AI-generated data makes the problem worse by reducing variability and originality over time.

Problem #2: Hallucinatory output

Far more often than anyone wants, AI chatbots produce factually inaccurate or nonsensical responses that are presented with confidence. This surprises people, because the public often assumes AI chatbots can think. But they can't. LLMs predict the next word or phrase based on probabilities derived from training data, without the slightest understanding of what those words mean or how they relate to the real world.

Compounding the problem, the training data inevitably contains biases, inaccuracies, and gaps, because it is based on the content people produced.

Also, LLMs don't understand the words they're piecing together in their responses and don't compare them to an understanding of the real world. Lawyers have gotten in trouble for turning over their legal arguments to chatbots, only to be embarrassed in court when the chatbots make up entire cases to cite. To an LLM, a string of words that sounds like a case and a string of words referring to an actual case argued in a real court are the same thing.
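To make that concrete, here is a minimal sketch of next-token prediction, assuming the Hugging Face transformers library and the small GPT-2 model as a stand-in for a production chatbot. The loop below ranks candidate continuations purely by probability; nothing in it checks whether any continuation is true.

```python
# Minimal sketch of next-token prediction, assuming the Hugging Face
# "transformers" library and GPT-2 as a stand-in for a chatbot LLM.
# The model assigns probabilities to possible next tokens; it never checks facts.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The court ruled in the case of"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, sequence_length, vocab_size)

next_token_logits = logits[0, -1]        # scores for the next token only
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)

# Print the five most probable continuations. Whether any of them refers to a
# real court case is irrelevant to the model; it only ranks plausible-sounding text.
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id):>12}  p={p.item():.3f}")
```

Sampling from that ranked distribution is what produces fluent prose and fabricated case citations alike.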
Problem #3: Deliberately sabotaged output

The chatbot companies don't control the training data, so it can and will be gamed. One egregious example comes from the Russian government, which was caught doing LLM grooming on a massive scale. In an operation known as the Pravda network (also called Portal Kombat), disinformation specialists working for the Russian government published (are you sitting down?) 3.6 million articles on 150 websites in 2024. That's 10,000 articles per day, all pushing a couple hundred false claims that favor Russia's interests, including falsehoods about the Russia/Ukraine war. The articles were published with expertly crafted SEO but got almost no traffic. They existed to train the chatbots.

As a result of this LLM grooming, the watchdog group NewsGuard found that when asked about Russia-related content, the 10 leading chatbots (ChatGPT-4o, You.com, Grok, Pi, Le Chat, Microsoft Copilot, Meta AI, Claude, Google's Gemini, and Perplexity) produced disinformation from the Pravda network one-third (33%) of the time.

Pravda engages in an extreme version of data poisoning, where the goal is to change the behavior of chatbots, introduce vulnerabilities, or degrade performance. Malicious actors (hackers, adversarial researchers, or anyone with a vested interest in manipulating AI outputs) poison data by injecting falsified or biased content into training sets to skew outputs, perpetuate stereotypes, or open up vulnerabilities. Attackers might assign incorrect labels to data, add random noise, or repeatedly insert specific keywords to skew model behavior; subtler manipulations, such as backdoor attacks or clean-label modifications, create hidden triggers or hard-to-detect biases. These techniques compromise a model's reliability, accuracy, and ethical integrity, producing biased responses or misinformation.

What the industry is doing about these flaws

While we've grown accustomed to using general-purpose AI chatbots for special-purpose outcomes, the future of genAI in business is customized, special-purpose tools, according to new research from MIT (paid for by Microsoft). Called "Customizing generative AI for unique value," the study surveyed 300 global technology executives and interviewed industry leaders to understand how businesses are adapting LLMs. The report shows the benefits of customization, including better efficiency, competitive advantage, and user satisfaction.

There are several ways companies are starting to customize LLMs. A core technique is retrieval-augmented generation (RAG), which enhances model outputs by pulling in data from both external and internal sources, while fine-tuned prompt engineering ensures the model really takes advantage of that internal data.
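As an illustration only, here is a minimal RAG sketch in Python. It assumes internal documents sit in a small in-memory list; the document IDs, the keyword-overlap scorer, and the build_grounded_prompt helper are hypothetical stand-ins for the embedding models, vector databases, and orchestration frameworks real deployments use. It also hints at the grounding idea discussed below: the prompt restricts the model to the retrieved sources and asks it to cite them.

```python
# Minimal RAG sketch. Assumes a tiny in-memory corpus; real systems use an
# embedding model plus a vector database, and the names below are illustrative.
from collections import Counter

INTERNAL_DOCS = [  # hypothetical internal sources; in practice these come from company data
    {"id": "policy-12", "text": "Refunds are issued within 14 days of a returned item."},
    {"id": "faq-03", "text": "Enterprise customers are assigned a dedicated support contact."},
]

def score(query: str, doc_text: str) -> int:
    """Crude relevance score: count of shared lowercase words."""
    q = Counter(query.lower().split())
    d = Counter(doc_text.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Return the k most relevant internal documents for the query."""
    return sorted(INTERNAL_DOCS, key=lambda doc: score(query, doc["text"]), reverse=True)[:k]

def build_grounded_prompt(query: str) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources
    and asks it to cite them, so answers stay tied to internal data."""
    sources = retrieve(query)
    context = "\n".join(f"[{doc['id']}] {doc['text']}" for doc in sources)
    return (
        "Answer using ONLY the sources below. Cite source IDs in brackets. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# The assembled prompt would then be sent to whatever chat model the company uses.
print(build_grounded_prompt("How long do refunds take?"))
```

Swapping the keyword scorer for an embedding model and a vector index is where production systems differ; the instruction block that pins the answer to the supplied sources is what keeps the output anchored to internal data.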
According to the report, companies are still struggling to figure out the data privacy and security aspects of customized LLM use.

Part of the trend toward customization relies on new and emerging tools for developers, including streamlined telemetry for tracing and debugging, simplified development playgrounds, and prompt development and management features.

The road to quality

LLM providers are also focusing on the quality of output. The business AI company Contextual AI this month introduced its Grounded Language Model (GLM), which the company claims is a big advance in enterprise AI. The GLM achieved an impressive 88% factuality score on the FACTS benchmark, beating leading models like OpenAI's GPT-4o and Google's Gemini 2.0 Flash.

Traditional language models often struggle with hallucinations, generating responses that diverge from factual reality. These inaccuracies can have serious consequences in enterprise settings, such as misinterpreting financial reports or healthcare protocols. Contextual AI's GLM addresses this by prioritizing strict adherence to provided knowledge sources rather than relying on generic, potentially flawed, or compromised training data. The GLM operates under a principle of parametric neutrality, meaning it suppresses pretraining biases in favor of user-supplied information. It's simultaneously a kind of customization and a biasing approach (biasing the LLM toward better sources). The GLM can also embed quality sourcing into its responses, making it easy for the user to fact-check. All chatbots should work more like Contextual AI's GLM.

While it can sometimes feel as if the industry is charging forward and ignoring the frustratingly generic, hallucinatory, or deliberately compromised output, the truth is that companies are also chipping away at these problems and providing solutions.

As buyers or users of LLM-based chatbots, our role in the evolution of this category of information resource is to act as discerning customers, judging tools by our actual usage rather than by a chatbot's flashiness or, say, the degree to which its audible voice sounds like a real person. What's key is the quality of its output.

Don't settle for generic content and falsehoods. Better alternatives are available through customization and the selection of chatbots optimized for your industry, and for telling the truth more often.