Big Tech’s trapped in a glass house on AI data snatching
Having exploited user data for years, Big Tech firms now find the tables turning as they grab it from one another.
Large tech players racing to build more capable AI models now have fewer places to look for data on the public web.
PHOTO: AFP
Parmy Olson
A few weeks ago, the chief technology officer of OpenAI was asked if her company had used YouTube videos to train its artificial intelligence (AI) systems. First, she gave a blank stare. Then there was a grimace. Finally, Ms Mira Murati gave an answer that sidestepped the messy and furtive world in which her company and other tech firms were operating: “Actually, I’m not sure about that.”
According to a New York Times report, OpenAI in fact had trained its AI on “more than one million hours of YouTube videos” using a speech recognition tool called Whisper. All the conversational text from the transcriptions was used to train GPT-4, the flagship large language model that underpins ChatGPT.

