
Federal Court Declares AI Training on Books Fair Use



In a decision (Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson v. Anthropic PBC) from the U.S. District Court for the Northern District of California, Judge William Alsup ruled that using copyrighted books to train large language models (LLMs) like Anthropic's Claude constitutes fair use under U.S. copyright law. This decision marks one of the most detailed judicial analyses yet of how copyright law applies to AI training, with significant implications for the future of artificial intelligence and creative content.

 

Background

Three authors, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, filed a class action lawsuit against Anthropic PBC, alleging that the company copied their books without permission to build and train its LLMs. Anthropic had acquired books through two primary means:

  1. Pirated Copies: Over 7 million unauthorized digital books were downloaded from sites like Books3, LibGen, and PiLiMi.

  2. Purchased and Scanned Copies: Anthropic bought print books (often in bulk), then destructively scanned them into searchable digital files.

These texts were stored in Anthropic's internal research library to train various versions of Claude, the company's AI chatbot.

 

Fair Use

The doctrine of Fair Use allows for limited use of copyrighted material without permission from the rights holder. There are four factors that courts must consider when determining whether a use is fair:

1.    Purpose and Character of the Use, which examines:

a. Transformative use: Does the new work add something new, with a different purpose or character, altering the original with new expression, meaning, or message?

b. Commercial vs. nonprofit: Nonprofit educational uses are more likely to be fair; commercial uses face more scrutiny, though they can still be fair if transformative.

2.    Nature of the Copyrighted Work, which examines:

a. Factual vs. creative: Use of factual or nonfiction works is more likely to be fair than use of highly creative works (like novels or films).

b. Published vs. unpublished: Using published works leans toward fair use; unpublished works receive more protection.

3.    Amount and Substantiality of the Portion Used, which examines:

a. How much of the original work was used quantitatively and qualitatively.

b. Whether the "heart" of the work was taken (even a small but crucial part may weigh against fair use).

4.    Effect on the Market for the Original, which examines:

a. Whether the use harms the actual or potential market for the original work.

b. Whether the use acts as a substitute, reducing sales or licensing opportunities.

 

Fair Use is a flexible, case-by-case analysis, with courts balancing all four factors. No single factor controls, and the context of the use is critical.

 

Training is Transformative

The court ruled that using books to train an LLM was "spectacularly transformative." Whether a use is transformative is the key question under the first factor of the fair use doctrine. Rather than distributing or selling the books, Anthropic used them to help its models learn how to generate human-like text. The court likened this use to how humans study books to improve their own writing skills.

 

The court noted, "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different."

 

The judge also emphasized that there was no evidence Anthropic's chatbot reproduced or outputted the authors' original text in a way that would violate copyright, meaning no infringing content had reached end users.

 

Scanning Purchased Books

The court also upheld the digitization of legally purchased books. Anthropic scanned millions of print books to build a searchable, space-efficient research library. The court ruled this print-to-digital conversion was a fair use because it didn't create extra copies or distribute them to the public, and the digitization served a transformative archival purpose.

 

Pirated Books Limitations

While the court broadly upheld AI training and digitization as fair uses, it drew a sharp line at the use of pirated books. Judge Alsup ruled that building a "permanent, general-purpose" internal library using stolen content was not fair use, regardless of whether those books were later used in training. Specifically, the court stated, "Creating a permanent, general-purpose library was not a fair use excusing Anthropic's piracy."

 

The court clarified that copyright law does not grant AI companies a special exemption from paying for content, and retaining pirated books "forever" went beyond what the law allows.

 

Conclusion

This decision provides a comprehensive description of how copyright law applies to the training of AI models. It establishes:

  • Training LLMs using copyrighted books can qualify as fair use.

  • Digitizing legally purchased books for internal AI training is also fair use.

  • Using pirated books, even for internal purposes, is not excused under fair use.


While the court granted summary judgment on the issue of fair use for training and digitization, the broader lawsuit continues. Pending questions include how Anthropic's use of pirated content will be penalized and whether a class of authors can proceed with damages claims.

 

This decision sets an important precedent in the ongoing legal and ethical debate over AI, creativity, and copyright. It simultaneously supports innovation and reinforces the need for AI firms to respect creators' rights, especially regarding the origin of their training data.
