Reddit sues AI company Perplexity and others for 'industrial-scale' scraping of user comments

The Perplexity website and logo are shown in this photo, in New York, Friday, July 5, 2024. (AP Photo/Richard Drew, File)

Key Points

  • Reddit has filed a lawsuit against Perplexity AI and three other entities for allegedly engaging in industrial-scale data scraping of user comments without authorization.
  • The lawsuit alleges that these companies bypass Reddit's controls to acquire content for training AI models, likening them to “would-be bank robbers.”
  • This legal action follows a similar lawsuit against another AI company, Anthropic, indicating Reddit's commitment to protect its content against unauthorized use.
  • Reddit emphasizes its willingness to partner under lawful agreements for data access, contrasting with the actions of the accused companies seeking to use its data unlawfully.

Social media platform Reddit sued the artificial intelligence company Perplexity AI and three other entities on Wednesday, alleging their involvement in an “industrial-scale, unlawful” economy to “scrape” the comments of millions of Reddit users for commercial gain.

Reddit's lawsuit in a New York federal court takes aim at San Francisco-based Perplexity, maker of an AI chatbot and “answer engine” that competes with Google, ChatGPT and others in online search.

Also named in the lawsuit are Lithuanian data-scraping company Oxylabs UAB, a web domain called AWMProxy that Reddit describes as a “former Russian botnet,” and Texas-based startup SerpApi, which lists Perplexity as a customer on its website.

It's the second such lawsuit from Reddit since it sued another major AI company, Anthropic, in June.

But the lawsuit filed Wednesday differs in that it confronts not just an AI company but also the lesser-known services the AI industry relies on to acquire the online writing needed to train AI chatbots.

“Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created,” said Ben Lee, Reddit’s chief legal officer, in a statement Wednesday.

The lawsuit accuses the companies of unfair competition and unjust enrichment and alleges that some of them violated U.S. copyright laws.

Perplexity said it has not yet received the lawsuit but “will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

SerpApi's customer success director, Ryan Schafer, said in an email: “We strongly disagree with Reddit’s allegations and intend to vigorously defend ourselves in court.”

Oxylabs said in a statement it was “shocked and disappointed” and “will not hesitate to defend itself against these allegations.”

“Oxylabs’ position is that no company should claim ownership of public data that does not belong to them,” said a statement from Denas Grybauskas, the company's chief governance and strategy officer. “It is possible that it is just an attempt to sell the same public data at an inflated price.”

AWMProxy could not immediately be reached for comment.

Scraping publicly available online data is a common practice among businesses and researchers, but Reddit compares the companies it is suing to “would-be bank robbers” who can't get into the bank vault, so they break into the armored truck instead. The lawsuit alleges they are evading Reddit’s own anti-scraping measures while also “circumventing Google’s controls and scraping Reddit content directly from Google’s search engine results.”

Lee said that because they're unable to scrape Reddit directly, “they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself.”

Reddit made a similar argument in its lawsuit against Anthropic, alleging that the company ignored Reddit's appeals to cease using its content. That case was initially filed in California Superior Court but was later moved to federal court and has a hearing scheduled for January.

Along with digitized books and news articles, websites such as Wikipedia and Reddit are deep troves of written materials that can help teach an AI assistant the patterns of human language.

Reddit has previously entered licensing agreements with Google, OpenAI and other companies that are paying to be able to train their AI systems on the public commentary of Reddit’s more than 100 million daily users.

The licensing deals helped the 20-year-old online platform raise money ahead of its Wall Street debut as a publicly traded company last year.
