
Condé Nast, other news orgs say AI firm stole articles, spit out hallucinations
arstechnica.com
Publishers sue Cohere, say AI firm is "stealing our works."

Jon Brodkin, Feb 13, 2025 2:21 pm

Cohere CEO Aidan Gomez during a Bloomberg Television interview in London on October 31, 2023. Credit: Getty Images | Bloomberg

Condé Nast and several other media companies sued the AI startup Cohere today, alleging that it engaged in "systematic copyright and trademark infringement" by using news articles to train its large language model.

"Without permission or compensation, Cohere uses scraped copies of our articles, through training, real-time use, and in outputs, to power its artificial intelligence ('AI') service, which in turn competes with Publisher offerings and the emerging market for AI licensing," said the lawsuit filed in US District Court for the Southern District of New York. "Not content with just stealing our works, Cohere also blatantly manufactures fake pieces and attributes them to us, misleading the public and tarnishing our brands."

Condé Nast, which owns Ars Technica and other publications such as Wired and The New Yorker, was joined in the lawsuit by The Atlantic, Forbes, The Guardian, Insider, the Los Angeles Times, McClatchy, Newsday, The Plain Dealer, Politico, The Republican, the Toronto Star, and Vox Media.

The complaint seeks statutory damages of up to $150,000 under the Copyright Act for each infringed work, or an amount based on actual damages and Cohere's profits.
It also seeks "actual damages, Cohere's profits, and statutory damages up to the maximum provided by law" for infringement of trademarks and "false designations of origin."

In Exhibit A, the plaintiffs identified over 4,000 articles in what they called an "illustrative and non-exhaustive list of works that Cohere has infringed." Additional exhibits provide responses to queries and "hallucinations" that the publishers say infringe upon their copyrights and trademarks. The lawsuit said Cohere "passes off its own hallucinated articles as articles from Publishers."

Cohere defends copyright controls

In a statement provided to Ars, Cohere called the lawsuit frivolous. "Cohere strongly stands by its practices for responsibly training its enterprise AI," the company said today. "We have long prioritized controls that mitigate the risk of IP infringement and respect the rights of holders. We would have welcomed a conversation about their specific concerns, and the opportunity to explain our enterprise-focused approach, rather than learning about them in a filing. We believe this lawsuit is misguided and frivolous, and expect this matter to be resolved in our favor."

We asked Cohere for information on its IP controls and will update this article if it responds.

The plaintiffs are part of the News/Media Alliance, which issued a press release about the complaint.

"This suit alleges that Cohere, an AI company valued at over $5 billion, engaged in widespread unauthorized use of publisher content in developing and running its generative AI systems," the press release said. "Cohere's behavior amounts to massive, systematic copyright infringement, as well as trademark infringement. The complaint provides a non-exhaustive list of thousands of articles that Cohere has infringed, through training, real-time use of content, and infringing outputs. Plaintiffs seek a permanent injunction and damages for Cohere's extensive and willful infringement."

The lawsuit asks for an order requiring Cohere to destroy all infringing copies of the publishers' copyrighted works. It also demands that Cohere install a filter or other technology to prevent its system "from retrieving or copying Publishers' copyrighted works, whether from Publishers' websites or other locations."

Cohere offers AI products for businesses, including those in financial services, health care and life sciences, manufacturing, energy and utilities, and the public sector. The company says its investors include Salesforce, Oracle, Nvidia, SAP, Fujitsu, and AMD. Its customers include Notion and Oracle. It was valued at $5.5 billion in a recent funding round.

No ordinary AI

Cohere, which is based in Toronto, pitches itself as a business-friendly AI, with a recent advertisement stating that it is not just an "ordinary AI." The ad says that unlike Cohere's product, ordinary AI leaks customer data and trade secrets, creates security audit nightmares, and steals intellectual property.

In February 2024, Cohere announced that it would provide legal protection against intellectual property claims to its paying enterprise customers. This includes "full indemnification for any third party claims that the outputs generated by our models infringe on a third party's intellectual property rights," for Cohere "enterprise customers that comply with our guidelines and do not intentionally attempt to generate infringing content."

Condé Nast and other news publishers involved in the lawsuit have licensed their content to other AI companies, such as OpenAI. But OpenAI also stands accused of using news articles without permission in a lawsuit filed by The New York Times. The case is proceeding through discovery.

Condé Nast CEO Roger Lynch said in an email to staff that the news groups' lawsuit against Cohere "is a first for our industry, coming together to protect our rights and assert that creative and journalistic work cannot be taken without permission or fair compensation."

Vox Media President Pam Wasserstein said the lawsuit aims to create a legal precedent and "establish the terms of the playing field for licensed use of journalism for AI, including for training and also real-time uses," according to The Wall Street Journal.

Earlier this week, a federal judge in Delaware handed a victory to Thomson Reuters in a lawsuit regarding a legal-research search engine that uses artificial intelligence. US Circuit Judge Stephanos Bibas rejected the fair use claims made by defendant Ross Intelligence, which was sued over the use of Westlaw headnotes that summarize key points of law and case holdings.

A fabricated story

Pointing to the "ordinary AI" ad, the news organizations' lawsuit said that "rather than reconcile those concepts and act lawfully, Cohere fails to license the content it uses." The AI company "helps itself to unlicensed copies of Publishers' news and magazine articles to build a training dataset," and "further infringes Publishers' copyrights by providing copies of Publishers' articles," the lawsuit claimed.

"Cohere delivers verbatim texts of Publishers' copyrighted articles even when asked generally for information about a particular topic rather than a specific piece," the lawsuit said.

In other cases, Cohere provides summaries that "heavily paraphrase" the source articles and include "enough details to substitute for the original piece," the lawsuit said. These aren't always accurate and can result in a "fabricated story... with a fake source, title, and date," the lawsuit said.

The lawsuit described an example:

For example, The Guardian published an article on October 7, 2024 titled "'The pain will never leave': Nova massacre survivors return to site one year on." As shown below, when prompted for this piece with RAG [Retrieval-Augmented Generation] turned off, Cohere delivered a wildly inaccurate article that it represented was "published on June 29 2022 in The Guardian by Luke Harding." Among other flaws, the Cohere article confused the October 7, 2023 massacre at The Nova Music Festival with a mass shooting that took place in Nova Scotia, Canada in 2020. Cohere also manufactured details about the Nova Scotia tragedy, attributing several quotes, including those gathered in The Guardian's reporting, to Tom Bagley, a man who was murdered in the 2020 shootings and thus could neither "return[] to the scene of the killings" nor offer quotes to a news outlet. Needless to say, this fictional article never appeared in The Guardian.

The lawsuit alleges that Cohere "disregards" robots.txt files that instruct bots not to crawl news websites and that "Cohere has an obligation not to use Publishers' copyrighted content without authorization regardless of whether Publishers have taken affirmative steps to block Cohere's crawlers."

Jon Brodkin, Senior IT Reporter
Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.