Study on medical data finds AI models can easily spread misinformation, even with minimal false input
A hot potato: A new study from New York University further highlights a critical issue: the vulnerability of large language models to misinformation. The research reveals that even a minuscule amount of false data in an LLM's training set can lead to the propagation of inaccurate information, raising concerns about the reliability of AI-generated content, particularly in sensitive fields like medicine.

The study, which focused on medical information, demonstrates that misinformation accounting for as little as 0.001 percent of training data is enough to measurably alter the resulting LLM's output. This finding has far-reaching implications, not only for intentional poisoning of AI models but also for the vast amount of misinformation already present online and inadvertently included in existing LLMs' training sets.

The research team used The Pile, a dataset commonly used for LLM training, as the foundation for their experiments. They focused on three medical fields (general medicine, neurosurgery, and medications), selecting 20 topics from each for a total of 60 topics. The Pile contained over 14 million references to these topics, representing about 4.5 percent of all documents within it.

To test the impact of misinformation, the researchers used GPT-3.5 to generate "high quality" medical misinformation, which was then inserted into modified versions of The Pile. They created versions in which either 0.5 or 1 percent of the relevant information on one of the three fields was replaced with misinformation.

The outcome was alarming. Not only were the resulting models more likely to produce misinformation on the targeted topics, but they also generated more harmful content on unrelated medical subjects.

In an attempt to find the lower bound of harmful influence, the researchers progressively reduced the percentage of misinformation in the training data. Even at 0.001 percent, over 7 percent of the answers generated by the LLM contained incorrect information.
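The poisoning step described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' code: the helper name `poison_corpus`, the keyword-based relevance test, and the treatment of "relevant information" as whole documents are all assumptions made for the example.

```python
import random
from fractions import Fraction


def poison_corpus(documents, is_relevant, fake_documents, fraction, seed=0):
    """Hypothetical helper: return a copy of `documents` in which `fraction`
    of the topic-relevant entries (rounded down) are replaced by fake ones."""
    # Indices of documents that the (assumed) relevance test matches.
    relevant = [i for i, doc in enumerate(documents) if is_relevant(doc)]
    # How many relevant documents to swap out, rounded down.
    n_replace = int(len(relevant) * Fraction(fraction))
    if n_replace > len(fake_documents):
        raise ValueError("not enough fake documents for the requested fraction")
    # Seeded RNG so the same corpus and seed give the same poisoned copy.
    rng = random.Random(seed)
    targets = rng.sample(relevant, n_replace)
    poisoned = list(documents)  # leave the original corpus untouched
    for index, fake in zip(targets, fake_documents):
        poisoned[index] = fake
    return poisoned


# Toy usage: 2,000 documents, half of which mention the targeted topic.
corpus = [
    f"doc {i}: metformin dosing notes" if i % 2 == 0 else f"doc {i}: unrelated text"
    for i in range(2000)
]
fakes = [f"fabricated claim {k}" for k in range(50)]
poisoned = poison_corpus(corpus, lambda d: "metformin" in d, fakes, Fraction(1, 100))
print(sum(d.startswith("fabricated") for d in poisoned))  # 10
```

With 1,000 relevant documents and a 1 percent rate, exactly 10 are swapped. Passing the rate as a `fractions.Fraction` rather than a float keeps very small rates exact, which matters when the fractions involved are as low as those tested in the study.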
The persistence of misinformation at such low concentrations is particularly concerning given how easily false information can be introduced into training data. "A similar attack against the 70-billion parameter LLaMA 2 LLM, trained on 2 trillion tokens, would require 40,000 articles costing under US$100.00 to generate," the researchers point out. This highlights the potential for bad actors to manipulate AI systems at relatively low cost.

The study also revealed that standard benchmarks of medical LLM performance failed to detect the compromised models. "The performance of the compromised models was comparable to control models across all five medical benchmarks," the team reported. The absence of a reliable way to detect such tampering poses a significant challenge for ensuring the reliability of AI-generated medical information.

Attempts to repair the models after training, including prompt engineering and instruction tuning, proved ineffective in mitigating the impact of the poisoned data.

The research team did develop a potential solution: an algorithm that recognizes medical terminology in LLM output and cross-references phrases with a validated biomedical knowledge graph. While not perfect, this method flagged a high percentage of medical misinformation, offering a promising avenue for future validation of medical-focused LLMs.

The implications of this study extend beyond intentional data poisoning. The researchers acknowledge the problem of "incidental" data poisoning caused by the misinformation already widespread online. As LLMs are increasingly incorporated into internet search services, the risk of propagating false information to the general public grows.

Moreover, even curated medical databases such as PubMed are not immune to misinformation. The medical literature contains outdated treatments and tests that have since been superseded by more evidence-based approaches.
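The team's safeguard, which recognizes medical terminology in model output and cross-references it against a validated biomedical knowledge graph, can be illustrated with a toy sketch. The article does not describe the actual implementation, so everything below is hypothetical: the tiny triple set stands in for a real knowledge graph, the term vocabulary stands in for a medical entity recognizer, and only one claim shape ("X treats Y") is parsed.

```python
import re

# Hypothetical toy "knowledge graph" of validated (subject, relation, object)
# triples; a real system would query a curated biomedical graph instead.
KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
    ("ibuprofen", "treats", "fever"),
    ("amoxicillin", "treats", "bacterial infection"),
}

# Hypothetical term vocabulary standing in for a medical entity recognizer.
MEDICAL_TERMS = {
    "metformin", "type 2 diabetes", "ibuprofen", "fever",
    "amoxicillin", "bacterial infection", "viral infection",
}

# Only one claim shape is recognized here: "<term> treats <term>".
CLAIM_PATTERN = re.compile(r"^\s*(.+?)\s+treats\s+(.+?)\s*$")


def extract_claims(text):
    """Return (subject, 'treats', object) triples whose ends are both known terms."""
    claims = []
    for sentence in re.split(r"[.!?]", text.lower()):
        match = CLAIM_PATTERN.match(sentence)
        if match is None:
            continue
        subject, obj = match.group(1), match.group(2)
        if subject in MEDICAL_TERMS and obj in MEDICAL_TERMS:
            claims.append((subject, "treats", obj))
    return claims


def flag_unsupported(text):
    """Flag extracted claims that the knowledge graph does not contain."""
    return [claim for claim in extract_claims(text) if claim not in KNOWLEDGE_GRAPH]


answer = "Metformin treats type 2 diabetes. Amoxicillin treats viral infection."
print(flag_unsupported(answer))
# [('amoxicillin', 'treats', 'viral infection')]
```

In this toy version, sentences the pattern cannot parse, or claims that mention terms outside the vocabulary, pass through unflagged, which is one simple reason a check of this kind can fall short of catching every false statement.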