Reddit sues Perplexity for allegedly ripping its content to feed AI

Reddit is suing Perplexity and three "data-scraping service providers" to "stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit," according to the complaint.

The company compares the data-scraping companies (SerpApi, Oxylabs, and AWMProxy) to "would-be bank robbers" who, "knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead." Reddit alleges that Perplexity is a customer of "at least one" of the data-scraping companies, saying that it "will apparently do anything to get the Reddit data it desperately needs to fuel its 'answer engine' — that is, anything other than enter into an agreement with Reddit directly, as some of its competitors have done."

According to the lawsuit, Reddit sent a cease-and-desist letter to Perplexity in May 2024 "demanding that it stop scraping Reddit data." Perplexity told Reddit at the time that it didn't use Reddit content to train AI models and that it would respect Reddit's robots.txt, yet after that letter, the volume of Reddit citations on Perplexity actually increased. Reddit also created a post that could only be crawled by Google, and "within hours," Perplexity "produced the contents" of that post, the company says.

"The only way that Perplexity could have obtained that Reddit content and then used it in its 'answer engine' is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine," Reddit writes.

Reddit's data (posts on all sorts of topics, written and ranked by humans) is hugely valuable for training AI models, and the company knows it; the API changes that sparked the 2023 protests were positioned as a way for the company to be compensated for that data. Reddit has struck deals with AI companies including OpenAI and Google, and it reportedly wants better ones. Reddit has also previously taken legal action against Anthropic, alleging that Anthropic's bots accessed Reddit's platform even after Anthropic said they wouldn't.

"AI companies are locked in an arms race for quality human content — and that pressure has fueled an industrial-scale 'data laundering' economy," Ben Lee, Reddit's chief legal officer, says in a statement. "Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it's one of the largest and most dynamic collections of human conversation ever created.

"Defendants Oxylabs UAB, AWM Proxy, and SerpApi — a Lithuanian data scraper, a former Russian botnet, and a company that openly advertises its shady circumvention tactics — are textbook examples of this illegal behavior," Lee says. "Unable to scrape Reddit directly, they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself."

"Perplexity has not yet received the lawsuit, but we will always fight vigorously for users' rights to freely and fairly access public knowledge," Jesse Dwyer, Perplexity's head of communication, tells The Verge. "Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest."