iFixit claims Anthropic's AI scraper hit website 1 million times
iFixit CEO reveals Anthropic's AI bot ignored site access restrictions

Jul 26, 2024
12:43 pm

What's the story

Anthropic, an artificial intelligence (AI) firm, is facing accusations of breaching anti-AI-scraping policies with its ClaudeBot web crawler. The crawler reportedly accessed the iFixit website nearly a million times within 24 hours, seemingly violating the repair company's Terms of Use. iFixit CEO Kyle Wiens publicized the issue on X, sharing screenshots of Anthropic's own chatbot acknowledging that iFixit's content was off-limits.

Terms of use breach

iFixit CEO highlights alleged policy violation

Wiens stated, "If any of those requests accessed our terms of service, they would have told you that use of our content [is] expressly forbidden." He accused Anthropic not only of using iFixit's content without payment but also of straining its DevOps resources. Wiens further explained to The Verge that the crawling rate was so high it triggered alarms and pulled in the DevOps team. Although iFixit is accustomed to handling web crawlers thanks to its high traffic, Wiens described this incident as an anomaly.

Company response

Anthropic responds to accusations, iFixit implements measures

In response to the violation allegations, Anthropic referred 404 Media to an FAQ page stating that its crawler can only be blocked via a robots.txt file. Following the incident, iFixit added the crawl-delay directive to its robots.txt, and Wiens confirmed that Anthropic's crawler ceased activity after the addition. Jennifer Martinez, an Anthropic spokesperson, told The Verge that the company respects robots.txt and that its crawler honored the signal once iFixit implemented it.
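For illustration, a crawl-delay rule like the one iFixit reportedly added might look something like this. The exact contents of iFixit's file are an assumption, and crawl-delay is a nonstandard directive that only some crawlers choose to honor:

```
# Hypothetical robots.txt sketch (not iFixit's actual file).
# Crawl-delay is a nonstandard extension: it asks a crawler to wait
# the given number of seconds between requests, but compliance is voluntary.
User-agent: ClaudeBot
Crawl-delay: 10
```

Because the directive is advisory, it throttles a cooperative crawler rather than blocking it outright.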

Widespread issue

Other websites report aggressive scraping by Anthropic

iFixit is not the only site to report aggressive scraping by Anthropic's crawler. Eric Holscher, co-founder of Read the Docs, and Matt Barrie, CEO of Freelancer.com, also reported similar experiences in Wiens's thread. Reddit threads from several months ago also noted a significant increase in Anthropic's web scraping activities. In April this year, the Linux Mint web forum attributed a site outage to strain caused by ClaudeBot's scraping activities.

Web crawling dilemma

Robots.txt: A common yet limited defense against AI scraping

Many AI companies, including OpenAI, let website owners opt out of crawling via robots.txt files. However, this method gives site owners little flexibility to specify what scraping is and isn't permitted. Another AI company, Perplexity, has been known to ignore robots.txt exclusions entirely. Despite its limitations, robots.txt remains one of the few options available for companies to keep their data out of AI training material, and Reddit took this approach in its recent crackdown on web crawlers.
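As a sketch of the opt-out approach described above, a site wanting to exclude AI crawlers entirely could publish a robots.txt along these lines. The user-agent tokens shown are the publicly documented ones for OpenAI and Anthropic, but each vendor's current documentation should be checked, since tokens change and compliance is ultimately voluntary:

```
# Hypothetical robots.txt opting out of specific AI crawlers.
# "Disallow: /" excludes the named agent from the entire site.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```

This illustrates the coarseness the article notes: a site can block a named crawler from everything, but robots.txt offers no way to say "index my pages, just don't train on them."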