If there's one piece of advice that bears repeating about AI chatbots, it's this: don't use them to seek factual information, because they absolutely cannot be trusted to be right. A new study demonstrated the extent of the problem, but it also showed that Apple made a good choice in partnering with OpenAI's ChatGPT for queries Siri can't answer.

There are two well-known problems with trying to use LLMs like ChatGPT, Gemini, and Grok as a substitute for web searches:

- They are very often wrong
- They are very often very confident about their incorrect information

A study cited by the Columbia Journalism Review found that, even when you prompt a chatbot with an exact quote from a piece of journalism and ask for more details, most of them are wrong most of the time.

The Tow Center for Digital Journalism tested eight AI chatbots which claim to carry out live web searches to get their facts:

- ChatGPT
- Perplexity
- Perplexity Pro
- DeepSeek
- Microsoft's Copilot
- Grok-2
- Grok-3
- Gemini

The simple task given to the chatbots

The study presented each of the systems with a quote from an article and asked it to carry out a simple task: find that article online and provide a link to it, together with the headline, original publisher, and publication date.

To ensure that this was an achievable task, the study's authors deliberately chose excerpts that could be easily found in Google, with the original source appearing in the first three results.

The chatbots were rated on whether they were completely correct, correct but with some of the requested information missing, partly incorrect, completely incorrect, or unable to answer.

The researchers also noted how confidently the chatbots presented their results. For example, did they simply present their answers as fact, or did they use qualifying phrases like "it appears" or admit that they couldn't find an exact match for the quote?

The results were not good

First, most of the chatbots were partly or wholly incorrect most of the time. On average, the AI systems were correct less than 40% of the time. The most accurate was Perplexity, at 63%, and the worst was X's Grok-3, at just 6%.

Other key findings were:

- Chatbots were generally bad at declining to answer questions they couldn't answer accurately, offering incorrect or speculative answers instead.
- Premium chatbots provided more confidently incorrect answers than their free counterparts.
- Multiple chatbots seemed to bypass Robot Exclusion Protocol preferences.
- Generative search tools fabricated links and cited syndicated and copied versions of articles.
- Content licensing deals with news sources provided no guarantee of accurate citation in chatbot responses.

But Apple made a good choice

While Perplexity's performance was the best, this appears to be because it cheats. Web publishers can use a robots.txt file on their sites to tell AI chatbots whether or not they should access the site (see the short sketch at the end of this piece for how that works). National Geographic is one publisher that tells them not to search its site, and yet the report says Perplexity correctly found all 10 quotes, despite the fact that the articles were paywalled and the company had no licensing deal in place.

Of the rest, ChatGPT delivered the best results, or, more accurately, the least-worst ones.

All the same, the study certainly demonstrates what we already knew: use chatbots for inspiration and ideas, but never to get answers to factual questions.
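
For readers curious about the Robot Exclusion Protocol mentioned in the findings above, here is a minimal sketch of how it works. A publisher serves a plain-text robots.txt file at the root of its site listing which crawlers may fetch which paths, and a compliant crawler checks those rules before requesting a page. The rules, crawler names, and URLs below are illustrative assumptions rather than details from the study (GPTBot is the user agent OpenAI documents for its crawler); the check uses Python's standard urllib.robotparser.

```python
# Minimal sketch of a Robot Exclusion Protocol check, using Python's
# standard-library parser. The robots.txt rules, the GPTBot user agent,
# and the example URLs are illustrative assumptions, not taken from the study.
from urllib import robotparser

# Rules a publisher might serve at https://example.com/robots.txt:
# block OpenAI's GPTBot crawler entirely, allow everyone else.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

article_url = "https://example.com/2025/03/some-article"

# A compliant crawler makes this check before fetching the page.
print(parser.can_fetch("GPTBot", article_url))        # False: publisher opted out
print(parser.can_fetch("SomeOtherBot", article_url))  # True: no restriction applies
```

The protocol is purely voluntary: a crawler that fetches and cites the article anyway, despite a check like this returning False, is doing what the study describes as bypassing publishers' Robot Exclusion Protocol preferences.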