Meta knew it used pirated books to train AI, authors say

SAN FRANCISCO – Meta Platforms used pirated versions of copyrighted books to train its artificial intelligence (AI) systems with approval from its chief executive Mark Zuckerberg, a group of authors alleged in newly disclosed court papers.

American author Ta-Nehisi Coates, comedian Sarah Silverman and other authors suing Meta for copyright infringement levelled the accusations in filings made public on Jan 8 in a California federal court. They said internal documents produced by Meta during the discovery process showed the company knew the works were pirated.

Spokespeople for Meta did not immediately respond to a request for comment.

The authors sued Meta in 2023, arguing that the tech giant misused their books to train its large language model Llama.

The case is one of several alleging that copyrighted works by authors, artists and others were used to develop AI products without permission. Defendants have argued that they made fair use of copyrighted material.

The authors asked the court on Jan 8 for permission to file an updated complaint. They said new evidence showed Meta used the AI training dataset LibGen, which allegedly includes millions of pirated works, and distributed it through peer-to-peer torrents.

They said internal Meta communications showed Mr Zuckerberg “approved Meta’s use of the LibGen dataset notwithstanding concerns within Meta’s AI executive team (and others at Meta) that LibGen is ‘a dataset we know to be pirated’”.

US District Judge Vince Chhabria in 2024 dismissed claims that text generated by Meta’s chatbots infringed the authors’ copyrights and that Meta unlawfully stripped their books’ copyright management information (CMI).

The writers argued on Jan 8 that the evidence bolstered their infringement claims and justified reviving their CMI claim and adding a new computer fraud claim.

Judge Chhabria said during a hearing on Jan 9 that he would allow the writers to file an amended complaint, but expressed scepticism about the merits of the fraud and CMI claims. REUTERS
