Social media platform Reddit sues Perplexity for scraping data to train AI system
- Reddit sued Perplexity and three firms, accusing them of unlawfully scraping data to train Perplexity's AI search engine.
- Reddit claims Perplexity "desperately needs" its data and circumvented protection measures, calling it "data laundering".
- Reddit seeks monetary damages and an order to stop Perplexity from using its data, saying Perplexity's citations of Reddit rose forty-fold after a 2024 cease-and-desist letter.
AI generated
NEW YORK - Social media platform Reddit sued artificial intelligence startup Perplexity in New York federal court on Oct 22, accusing it and three other companies of unlawfully scraping its data to train Perplexity’s AI-based search engine.
Reddit said in the complaint that the data-scraping companies circumvented its data protection measures in order to steal data that Perplexity “desperately needs” to power its “answer engine” system.
The case is one of many filed by content owners against tech companies over the alleged misuse of their copyrighted material to train AI systems.
In June, Reddit filed a similar lawsuit against AI startup Anthropic; that case is still ongoing.
“Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest,” Perplexity said in a statement.
“AI companies are locked in an arms race for quality human content–and that pressure has fuelled an industrial-scale ‘data laundering’ economy,” Reddit chief legal officer Ben Lee said in a statement.
Reddit, which features thousands of interest-based “subreddit” web communities, said in the lawsuit that it is the most commonly cited source for AI-generated answers to user questions.
It has licensed its content to Google, OpenAI and others for their AI training.
Reddit said that Lithuania-based Oxylabs, Russia-based AWMProxy and Texas-based SerpApi scraped Reddit data from billions of search results without permission. It said that Perplexity, which does not have a licence to use Reddit content, worked with at least one of the data-scraping companies to obtain Reddit material.
Spokespeople for Oxylabs and SerpApi did not immediately respond to requests for comment on the case.
AWMProxy could not be reached for comment.
Reddit said it sent Perplexity a cease-and-desist letter in 2024, after which Perplexity “increased the volume of citations to Reddit forty-fold.”
Reddit asked the court for unspecified monetary damages and an order blocking Perplexity from using its data. REUTERS

