In late June, Microsoft released a new kind of artificial intelligence (AI) technology that could generate its own computer code.
Called Copilot, the tool was designed to speed up the work of professional programmers by suggesting ready-made blocks of computer code they could instantly add to their own. Many programmers loved the new tool or were at least intrigued by it.
But Mr Matthew Butterick, a programmer, designer, writer and lawyer in Los Angeles, was not one of them. In November, he and a team of other lawyers filed a lawsuit seeking class-action status against Microsoft and other high-profile companies that designed and deployed Copilot.
Like many cutting-edge AI technologies, Copilot developed its skills by analysing vast amounts of data. In this case, it relied on billions of lines of computer code posted to the Internet.
Mr Butterick, 52, equates this process to piracy, because the system does not acknowledge its debt to existing work. His lawsuit claims that Microsoft and its collaborators violated the legal rights of millions of programmers who spent years writing the original code.
The suit is believed to be the first legal attack on a design technique called “AI training”, a way of building AI that is poised to remake the tech industry. Many artists, writers, pundits and privacy activists have complained that companies are training their AI systems using data that does not belong to them.
In the 1990s and into the 2000s, Microsoft fought the rise of open source software, seeing it as an existential threat to the future of the company’s business. As the importance of open source grew, Microsoft embraced it and acquired GitHub, a home to open source programmers and a place where they built and stored their code.
Copilot is based on technology built by OpenAI, an AI lab in San Francisco backed by US$1 billion (S$1.38 billion) in funding from Microsoft. OpenAI is at the forefront of the increasingly widespread effort to train AI technologies using digital data.
After Microsoft and GitHub released Copilot, GitHub’s chief executive, Mr Nat Friedman, tweeted that using existing code to train the system was “fair use” of the material under copyright law. But no court case has yet tested this argument.
“The ambitions of Microsoft and OpenAI go way beyond GitHub and Copilot,” Mr Butterick said in an interview. “They want to train on any data anywhere, for free, without consent, forever.” NYTIMES