Cloudflare says Perplexity’s AI bots are ‘stealth crawling’ blocked sites


The AI search startup Perplexity is allegedly skirting restrictions meant to stop its AI web crawlers from accessing certain websites, according to a report from Cloudflare. In the report, Cloudflare claims that when Perplexity encounters a block, the startup will conceal its crawling identity “in an effort to circumvent the website’s preferences.”

The report only adds to concerns about Perplexity vacuuming up content without permission, as the company got caught barging past paywalls and ignoring sites’ robots.txt files last year. At the time, Perplexity CEO Aravind Srinivas blamed the activity on third-party crawlers used by the site.

Now, Cloudflare, one of the world’s biggest internet architecture providers, says it received complaints from customers who claimed that Perplexity’s bots still had access to their websites even after they stated their preference in their websites’ robots.txt files and created Web Application Firewall (WAF) rules to restrict access by the startup’s AI bots.

To test this, Cloudflare says it created new domains with similar restrictions against Perplexity’s AI scrapers. It found that the startup will first try to access the sites by identifying itself with the names of its crawlers: “PerplexityBot” or “Perplexity-User.”
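For readers unfamiliar with how a robots.txt restriction works, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are hypothetical and are not taken from Cloudflare's test setup; only the two crawler names come from the report. The rules are matched against whatever name a crawler declares, and compliance is voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; only the two crawler
# names are taken from the article.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "Perplexity-User", "Mozilla/5.0"):
        print(agent, is_allowed(agent, "https://example.com/page"))
```

In this sketch the two declared crawler names are disallowed, while a client that presents a generic browser-style name falls through to the catch-all rule. The file only expresses a preference; nothing in it can stop a client that chooses not to check it.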

But if the website has restrictions against AI scraping, Cloudflare claims Perplexity will change its user agent (the bit of information that tells a website what kind of browser and device you’re using, or if the visitor is a bot) to “impersonate Google Chrome on macOS.” Cloudflare says this “undeclared crawler” uses “rotating” IP addresses that the company doesn’t include on the list of IP addresses used by its bots.
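To see why a changed user agent matters, consider a simplified sketch of a server-side filter that blocks requests by their declared user agent. This is an assumption-laden illustration, not Cloudflare's WAF logic: the `blocked_by_user_agent` helper and the substring matching rule are hypothetical, and the browser-style string is a typical example rather than the one cited in the report.

```python
from typing import Mapping

# Crawler names taken from the article; the substring matching rule
# is a simplified assumption, not how any real WAF product works.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "perplexity-user")

# Illustrative browser-style string; not taken from Cloudflare's report.
EXAMPLE_CHROME_MACOS_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def blocked_by_user_agent(headers: Mapping[str, str]) -> bool:
    """Return True if the declared User-Agent contains a blocked crawler name."""
    user_agent = headers.get("User-Agent", "").lower()
    return any(token in user_agent for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    print(blocked_by_user_agent({"User-Agent": "PerplexityBot/1.0"}))
    print(blocked_by_user_agent({"User-Agent": EXAMPLE_CHROME_MACOS_UA}))
```

A filter like this only sees what the client claims to be, so a request carrying a browser-style string passes it. That is why the report's other signals, such as IP addresses, matter for detection.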

Additionally, Cloudflare claims that Perplexity changes its autonomous system number (ASN), an identifier for a group of IP networks controlled by a single operator, to get around blocks as well. “This activity was observed across tens of thousands of domains and millions of requests per day,” Cloudflare writes.

In a statement to The Verge, Perplexity spokesperson Jesse Dwyer called Cloudflare’s report a “publicity stunt,” adding that “there are a lot of misunderstandings in the blog post.” Cloudflare has since de-listed Perplexity as a verified bot and has rolled out methods to block Perplexity’s “stealth crawling.”

Cloudflare CEO Matthew Prince has been outspoken about AI’s “existential threat” to publishers. Last month, the company started letting websites ask AI companies to pay to crawl their content, and began blocking AI crawlers by default.
