
Big Tech’s trapped in a glass house on AI data snatching

Having exploited user data for years, the tables are turning as Big Tech firms grab it from one another.



Large tech players racing to build more capable AI models now have fewer places to look for data on the public web.

PHOTO: AFP

Parmy Olson


A few weeks ago, the chief technology officer of OpenAI was asked if her company had used YouTube videos to train its artificial intelligence (AI) systems. First, she gave a blank stare. Then there was a grimace. Finally, Ms Mira Murati gave an answer that avoided the messy and furtive world she and other tech companies were operating in: “Actually, I’m not sure about that.”

According to a New York Times report, OpenAI in fact had trained its AI on “more than one million hours of YouTube videos” using a speech recognition tool called Whisper. All the conversational text from the transcriptions was used to train GPT-4, the flagship large language model that underpins ChatGPT.
