
Reddit sues Perplexity and 3 others for data scraping
What's the story
Reddit has filed a lawsuit against three data scraping companies: SerpApi, Oxylabs, and AWMProxy. The lawsuit, filed in the US District Court for the Southern District of New York, also targets Perplexity, a San Francisco-based startup that makes an AI search engine. This is Reddit's second such lawsuit; in June, it sued Anthropic, another major AI company.
Legal action
Illegally harvesting content by scraping search results
The social media giant accuses the defendants of illegally harvesting its content by scraping Google Search results. The companies allegedly sold this data to tech giants such as OpenAI and Meta for chatbot development. Reddit is seeking financial damages and an injunction barring any further use or sale of previously scraped data.
Industry impact
Ben Lee on the lawsuit
Ben Lee, Reddit's chief legal officer, said scrapers bypass technological protections to steal data and sell it to clients looking for training material. He added that "Reddit is a prime target because it's one of the largest and most dynamic collections of human conversation ever created." The lawsuit accuses these companies of unfair competition and unjust enrichment, and alleges that some of them also violated US copyright law.
Company responses
What these companies have to say
Perplexity said it had not yet received the lawsuit but would "always fight vigorously for users' rights to freely and fairly access public knowledge." Ryan Schafer, SerpApi's customer success director, said the company strongly disagrees with Reddit's allegations and intends to defend itself. Oxylabs said it was "shocked and disappointed" and "will not hesitate to defend itself against these allegations."
Data protection
Reddit has been at the forefront of fighting against scraping
Reddit has taken several steps to protect its data. In 2023, it began requiring third parties to pay for access to its data, and it has since signed licensing agreements with Google and OpenAI. However, according to the lawsuit, some companies have found ways to access Reddit's content through data scrapers. Reddit believes its user-generated content is particularly valuable because it spans a wide range of topics that could help improve AI chatbots' natural language abilities.