Anthropic sued by authors over alleged misuse of copyrighted works for AI training

Wednesday, August 21, 2024, 02:13 PM, from ComputerWorld

Generative AI firm Anthropic is embroiled in a new legal battle after three authors filed a class-action lawsuit in a California federal court, accusing the company of illegally using their copyrighted works to train its AI-powered chatbot, Claude.

The complaint, filed on Monday, alleges that Anthropic used pirated versions of books by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, along with hundreds of thousands of others, to develop its AI models without proper authorization or compensation.

The lawsuit is the latest in a series of high-profile legal actions brought by copyright holders against AI companies over their use of protected materials, including articles, books, and paintings, to train generative AI systems. The case follows similar lawsuits against tech giants such as OpenAI and Meta, in which authors claim their works were exploited to train large language models without their consent.

According to the complaint, “Anthropic has built a multibillion-dollar business” by leveraging these stolen works to enhance Claude’s ability to generate human-like text.

“The United States Constitution recognizes the fundamental principle that creators deserve compensation for their work. Yet Anthropic ignored copyright protections. An essential component of Anthropic’s business model — and its flagship “Claude” family of large language models (or “LLMs”) — is the largescale theft of copyrighted works,” the complaint read.

The authors argue that the company’s practices unfairly deprive them of income, as Claude can churn out large volumes of text in a fraction of the time it would take a human author.

“Claude could not generate this kind of long-form content if it were not trained on a large quantity of books, books for which Anthropic paid authors nothing,” the lawsuit claimed.

The plaintiffs are seeking monetary damages and a court order to permanently stop Anthropic from using their copyrighted material without permission.

“Anthropic has not even attempted to compensate Plaintiffs for the use of their material. In fact, Anthropic has taken multiple steps to hide the full extent of its copyright theft. Copyright law prohibits what Anthropic has done here: downloading and copying hundreds of thousands of copyrighted books taken from pirated and illegal websites,” the complaint read.

The lawsuit highlights the ongoing debate over the ethical and legal implications of using copyrighted material to train AI models. While some argue that such training qualifies as fair use, others contend that it infringes on copyright holders’ rights.

“Such situations will also lead to heightened scrutiny by enterprises, and lead them towards adopting private, ‘walled garden’ solutions that are built on proprietary data,” said Chirajeet Sengupta, managing partner at Everest Group. “Further, we expect a rich ecosystem to emerge that checks and assures AI-generated output for such issues.”

It’s a rising concern

The filing also has broader industry implications, as it joins a growing body of litigation challenging the use of copyrighted content in AI training. Similar cases have emerged since 2022, questioning both the legality of using protected works to train AI models and whether AI-generated outputs may themselves infringe copyright.

Earlier this month, a federal judge in California ruled in favor of a group of visual artists who sued AI companies including Stability AI, Midjourney, DeviantArt, and Runway AI for allegedly infringing their copyrights. The artists alleged that these companies used their copyrighted images to train AI models without permission.

“AI is a tool and like any other tool will be misused by some,” said globally acclaimed painter and artist Jatin Das. “I hope the judiciary will look into such matters and take care of art and artists.”

Anthropic, which has secured significant financial backing from major firms including Amazon and Google, previously faced a lawsuit from music publishers over the alleged misuse of copyrighted song lyrics in training Claude.

“We have observed a similar scenario when AI companies were scrutinized for sharing responses generated from paid articles by bypassing paywalls,” said Arjun Chauhan, senior analyst at Everest Group.

“This scrutiny has led to two significant outcomes: AI companies are now more vigilant about the sources of their training data, and they have begun forming partnerships with media outlets to access content legally. For example, in April 2024, OpenAI partnered with the Financial Times to use its journalism for training AI models. Such partnerships are likely to increase, potentially driving up costs for end customers.”

The outcome of these cases could set critical precedents for how copyright law applies to AI, particularly in the areas of model training and the creation of AI-generated content. With the legal landscape still evolving, the stakes are high for both content creators and the AI industry as they navigate the complex intersection of technology and intellectual property rights.