
Internet Archive can't archive Reddit anymore: Here's why
What's the story
Reddit is tightening access to its data: the platform will now restrict the Internet Archive's Wayback Machine from crawling and archiving most of its content. The move comes amid concerns that AI companies have been using the Wayback Machine to scrape Reddit data, potentially violating user privacy and the platform's content policies. Mark Graham, director of the Wayback Machine, has confirmed that discussions with Reddit are ongoing.
Policy breach
AI companies accessing Reddit data without consent
Reddit spokesperson Tim Rathschmidt has said that several AI firms have been caught accessing Reddit's content through the Wayback Machine without adhering to Reddit's terms of service. The scraped material includes posts, comments, and even deleted or removed content. Such activity makes it harder for Reddit to manage and protect its content.
Archival impact
Impact on internet archiving
The Wayback Machine, a popular tool run by the Internet Archive, captures snapshots of websites over time. Under Reddit's new restriction, however, it will no longer archive individual Reddit pages such as posts or user profiles. This drastically reduces the amount of Reddit content the service saves and limits public access to historical discussions and deleted content.
Data control
Reddit's broader strategy to control data access
The restriction on the Wayback Machine is part of Reddit's wider strategy to control how its data is accessed and used, especially by AI companies. The platform has taken several steps in this direction: changing its application programming interfaces (APIs) to limit data scraping, negotiating paid data licenses with firms like Google and OpenAI, and suing companies such as Anthropic over alleged unauthorized data collection.