Boom in AI prompts a test of copyright law

NEW YORK - The boom in artificial intelligence (AI) tools that draw on troves of content from across the Internet has begun to test the bounds of copyright law.

Authors and a leading photo agency have brought suits over the past year, contending that their intellectual property (IP) was illegally used to train AI systems, which can produce human-like prose and power applications like chatbots.

Now they have been joined in the spotlight by the news industry. The New York Times filed a lawsuit on Dec 27 accusing OpenAI and Microsoft of copyright infringement, the first such challenge by a major American news organisation over the use of AI.

The lawsuit contends that OpenAI’s ChatGPT and Microsoft’s Bing Chat can produce content nearly identical to Times articles, allowing the companies to “free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment”.

OpenAI and Microsoft have not had the chance to respond in court. But after the lawsuit was filed, those companies noted that they were in discussions with a number of news organisations on using their content – and, in the case of OpenAI, had begun to sign deals.

Without such agreements, the limits on how AI companies can use copyrighted content may be worked out in the courts, with significant repercussions. Data is crucial to developing generative AI technologies – which can generate text, images and other media on their own – and to the business models of companies doing that work.

“Copyright will be one of the key points that shape the generative AI industry,” said Mr Fred Havemeyer, an analyst at financial research firm Macquarie.

A central consideration is the “fair use” doctrine in intellectual property law, which allows creators to build upon copyrighted work. Among other factors, courts weigh whether a defendant substantially transformed the content and whether the new work competes in the same market as a substitute for the original creator’s work.

A review quoting passages from a book, for example, could be considered fair use because it builds on that content to create new, unique work. Selling extended excerpts from the book, however, may fall outside the doctrine.

So far, courts have not weighed in on how those standards apply to AI tools.

“There isn’t a clear answer to whether or not in the United States that is copyright infringement or whether it’s fair use,” said Mr Ryan Abbott, a lawyer at Brown Neri Smith & Khan who handles intellectual property cases. “In the meantime, we have lots of lawsuits moving forward with potentially billions of dollars at stake.”

It could be a while before the industry gets definitive answers.

The lawsuits posing these questions are in the early stages of litigation. If they do not end in settlements, as most litigation does, it could be years before a US district court rules on the matter. Those rulings would probably be appealed, and appellate decisions could vary by circuit, which could elevate the question to the US Supreme Court.

Getting there could take about a decade, Mr Abbott said. “A decade is an eternity in the market that we’re currently living through,” he added.

The Times said in its suit that it had been in talks with Microsoft and OpenAI on terms for resolving the dispute, possibly including a licence. The Associated Press and Axel Springer, the German owner of outlets such as Politico and Business Insider, have recently reached data licensing agreements with OpenAI.

Taking cases to trial could answer vital questions about what copyrighted data AI developers are able to use and how. But a lawsuit could also simply serve as leverage for a plaintiff to secure a more favourable licensing deal through a settlement.

“Ultimately, whether or not this lawsuit ends up shaping copyright law will be determined by whether the suit is really about the future of fair use and copyright, or whether it’s a salvo in a negotiation,” Columbia Law School professor Jane Ginsburg said of the lawsuit by the Times.

How the legal landscape unfolds could shape the nascent yet heavily capitalised AI industry.

Some AI companies have been flooded with venture capital in the past year after the public roll-out of ChatGPT went viral. A stock plan under consideration could value OpenAI at over US$80 billion (S$106 billion); Microsoft has invested US$13 billion in the company and has incorporated its technology into its own products. But questions about the use of intellectual property to train models have been top of mind for investors, Mr Havemeyer said.

Competition in the AI field may boil down to data haves and have-nots.

Companies with the rights to large quantities of data, such as Adobe and Bloomberg – or that have amassed their own data, such as Meta and Google – have started developing their own AI tools. Mr Havemeyer noted that an established company like Microsoft was well equipped to secure data licensing agreements and tackle legal challenges. But start-ups with less capital may have a harder time obtaining the data they need to compete.

“Generative AI begins and ends with data,” Mr Havemeyer said.
NYTIMES