Beyond Meta and the A.I. Mining of Books: We Need New Copyright Laws
If you recall the days of VHS tapes, you'll also probably remember the scary FBI warnings at the beginning of movies that cautioned against piracy. Although a little heavy-handed, they always delivered a staunch message: you own the tape, but you don't own the content. Today these types of warnings still exist, with piracy laws protecting copyrighted work across movies, TV, books, and art. By definition, piracy involves the unauthorized use or reproduction of another's work. However, when it comes to the gray area of AI, piracy and copyright laws tend to lose all their power.

That certainly seems to be the case with Meta's latest alleged book raid. According to recently unredacted court filings, the technology conglomerate headed by founder Mark Zuckerberg reportedly used Library Genesis (better known as LibGen) and other digital piracy "shadow libraries" to train Llama 3, the company's latest and supposedly greatest AI large language model (LLM), to better interface with future users. And yes, if true, this means that in a stunning show of bravado, Meta essentially pirated books that were already pirated in order to better train a pet AI.

What makes this latest development even more frustrating is that authors have been fighting the good fight against LibGen and its ilk for years, so many were understandably outraged to learn that Meta may have also stolen their work. The main difference here (if it even matters) is that LibGen remains a controversial yet free service. In contrast, Meta uses the intellectual property of others to help fuel its billions in profits.

Not everyone can be Stephen King or J.K. Rowling. Most authors make very little off their books. Few can live off royalties, and even fewer get substantial advances. A billion-dollar company stealing anyone's work (including that of publishing heavy hitters) feels like a giant slap in the face. With the U.S. lacking AI laws and regulations at both the federal and state levels, it gets even trickier for creatives to protect their IP and receive fair compensation.

AI Presents Unique Challenges for Class Action Lawsuits

As reported by The Authors Guild, legal action was taken against Meta in 2023, and all authors affected by Meta's Llama 3 training have automatically been included in the Kadrey v. Meta class action in northern California. However, the case is still ongoing and hangs on one important question: is Meta liable for direct copyright infringement?

With AI being what it is, copyright gets complicated, especially when combined with Meta's fair use defense. Essentially, fair use allows you to bypass getting permission from the copyright holder for purposes like criticism, teaching, reporting, and research. In most cases, the work is "transformative," meaning it adds something new to the original material. And because Meta ingested and digested these books and spat out a Frankensteinian text generator, the fair use argument unfortunately has some legs. However, as Dan Pontefract pointed out in a Forbes article, "fair use arguments were meant for education, commentary, and criticism, not corporate exploitation for commercial profit at scale."

Whether direct copyright infringement holds weight or not, Meta's raid of LibGen, which houses more than 7.5 million pirated books, raises ethical concerns and spotlights the need for more AI laws and regulations.

Tech Raids Prove AI Laws Are Necessary

AI isn't going anywhere. To toss out another Frankenstein metaphor, we created a monster that can't just be abandoned.
For many, AI offers unmatched efficiency, task automation, and a new way of delegating mundane tasks with better accuracy. Certain fields undoubtedly benefit from AI, but Meta proves books and other creative media aren't among them.

Mark Twain once said, "There is no such thing as a new idea." It's an argument frequently used in pro-generative AI circles: if everyone is recycling ideas, how is AI any different? However, generative AI isn't just coded; it's trained on the published works of artists and writers. Their inspiration may have come from the creations of old, but they still sat, thought, and created something new with human talents and flaws. Agatha Christie had to plot out her mystery novels. She couldn't just plop them into ChatGPT and type, "Write me an ending." But thanks to her efforts, anyone can now use generative AI to cook up a locked-room whodunit, likely with a familiar conclusion. This raises a host of issues, like who actually owns a work generated from a compilation of many copyright holders' material?

Currently, the U.S. has no federal legislation regulating AI development or use (White & Case). On the state level, there are a few laws pertaining to generative AI. For example, Colorado and Utah have laws stating that agencies must disclose generative AI use to their users. Tennessee likewise updated its right of publicity law to include a clause covering the unauthorized use of an individual's photograph, voice, or likeness in algorithms, software, or other technology. California also requires websites to post the data used to train their generative AI systems, including whether it stems from work protected by copyright, trademark, or patent.

While these various laws outline potential solutions for protecting copyright holders and consumers, they're just a start. Until then, those pursuing legal action against companies like Meta will have to rely on pre-existing piracy and copyright laws that leave a lot of wiggle room in AI matters. Kadrey v. Meta could very well end in Meta's favor. As it stands, the court has thrown out most of the claims besides direct copyright infringement. That might not have been the case if regulations about how companies train their AI models had already been in place.

Famed Japanese animation studio Studio Ghibli ran into similar issues with OpenAI last month. With OpenAI's 4o image generation tool (an offshoot of ChatGPT's paid model), users everywhere were able to create images they claimed were in Studio Ghibli's signature style. Those unfamiliar with Ghibli can look to hits like Spirited Away and Grave of the Fireflies for a taste of the studio's richly detailed, hand-drawn animation. Films like those or The Boy and the Heron are as much labors of love as they are lines and colors. And they're no easy feat to create. As Studio Ghibli producer Toshio Suzuki told EW, it can take one month to draw one minute of animation. By churning out allegedly Studio Ghibli-like images at the press of a button, OpenAI's newest feature has stirred up controversy of its own.

While "style" cannot be copyrighted, this brings into question how OpenAI trained its AI model. Fan art? Similar images? Sure, maybe. But if OpenAI used official Studio Ghibli art for training without permission, we're right back in copyright infringement and piracy territory (so far, the Japanese studio has announced no plans to pursue legal action).
The same applies to Google’s AI summary feature, which compiles information from articles in the search results to deliver a quick, and sometimes wildly inaccurate, answer. As for literature, Meta had the chance to shape this AI hellscape by seeking permission from authors and publishers, and/or paying for the use of their intellectual property. However, with no federal laws regulating generative AI, the tech company allegedly frolicked in the ungoverned Wild West of artificial intelligence and torrented millions of books in the process. While Meta claims to care about building “the future of human connection,” its actions suggest there’s nothing human about it.