Companies block Internet Archive crawlers, threatening digital history preservation

theweek.com

The Internet Archive faces a significant threat as companies increasingly block its web crawlers, jeopardizing the preservation of digital history. The blocking is driven largely by concerns that large language models are using archived data for AI training without permission, which has led major publishers and platforms such as The New York Times and Reddit to deny access. The Internet Archive, a non-profit, has preserved more than a trillion web pages over 30 years, but losing access to news sources could result in the irretrievable loss of digital records.


With a significance score of 4.2, this news ranks in the top 4.6% of today's 32,488 analyzed articles.
