What Happened to The Dead Web / Digital Preservation?
The 'Dead Web' refers to the pervasive loss of digital content through technological obsolescence, link rot, and website disappearances, a problem compounded by the sheer volume of data and the rise of AI-generated content. Digital preservation efforts, led by institutions such as the Internet Archive and the Library of Congress, counter this decay through web archiving, format migration, and the development of new storage media. As of mid-2026, the field is grappling with the dual role of AI, which serves both as a preservation tool and as a source of content authenticity challenges, while also focusing on environmental sustainability and collaborative infrastructure.
Quick Answer
The 'Dead Web' describes the alarming rate at which digital content vanishes: a 2024 Pew Research Center study found that 25% of webpages that existed between 2013 and 2023 were no longer accessible, including 38% of pages from 2013. This digital decay is driven by technological obsolescence, bit rot, and content deletion. In response, digital preservation initiatives are working to archive the web, migrate formats, and develop resilient storage media such as synthetic DNA and advanced optical media. However, these efforts are increasingly challenged by cyberattacks, legal disputes, and news publishers blocking archiving crawlers over concerns about AI scraping. As of July 2026, the digital preservation community is integrating AI into its workflows while addressing AI's threats to content authenticity and prioritizing environmental sustainability in long-term strategies.
📅Complete Timeline (15 events)
Internet Archive Founded
The Internet Archive is founded by Brewster Kahle, beginning its mission to build a digital library of Internet sites and other cultural artifacts in digital form, including the Wayback Machine. (Source: Wikipedia)
PREMIS Data Dictionary Developed
The PREservation Metadata Implementation Strategies (PREMIS) Data Dictionary for Preservation Metadata is developed, becoming an international standard for metadata needed when digital objects are stored in repositories. (Source: University of Montana, 2019)
Library of Congress Initiates DSA Meetings
The Library of Congress organizes its first 'Designing Storage Architectures for Digital Collections (DSA)' meeting to discuss unique data storage requirements for digital preservation. (Source: Library of Congress, 2026)
Emergence of 'Dead Internet Theory' Concerns
The 'Dead Internet Theory' begins circulating, positing that the internet is increasingly populated by bots and AI-generated content, diminishing genuine human interaction. (Source: Grokipedia, 2024; Elm Marketing, 2025)
Court Ruling Against Internet Archive's Digital Lending
A court rules against the Internet Archive in a lawsuit filed by major publishers (Hachette, HarperCollins, Penguin Random House, and Wiley) over its digital book lending program, Open Library. (Source: Web Archive in 2026, 2026)
Pew Research Center Report on Disappearing Web Content
Pew Research Center publishes 'When Online Content Disappears,' revealing that 25% of webpages from 2013-2023 are no longer accessible, and 38% of 2013 pages are gone. (Source: Pew Research Center, 2024)
Internet Archive Cyberattacks
The Internet Archive experiences a series of severe cyberattacks, leading to the theft of a user database affecting 31 million accounts. (Source: Web Archive in 2026, 2026)
Appeals Court Upholds Ruling Against Internet Archive
The appeals court upholds the March 2023 ruling against the Internet Archive regarding its digital book lending, leading to the removal of over 500,000 books from Open Library. (Source: Web Archive in 2026, 2026)
2024/2025 End of Term Web Archive Update
The Internet Archive announces that its 2024/2025 End of Term Web Archive project has collected over 500 terabytes of U.S. government material, including 100 million unique web pages, with plans to upload to the Filecoin network. (Source: Internet Archive, 2025)
Sam Altman's 'Dead Internet Theory' Tweet
OpenAI CEO Sam Altman posts on X (formerly Twitter) acknowledging the 'Dead Internet Theory,' stating 'it seems like there are really a lot of LLM-run twitter accounts now,' sparking wider discussion. (Source: Wikipedia, 2024)
Developments in DNA and Optical Storage for Archiving
Forbes reports on 2025-2026 developments in synthetic DNA storage (e.g., Atlas Data Storage, Iridia) and new optical storage archiving systems, projecting 1PB raw capacity optical cartridges by the 2030s as contenders for digital archives. (Source: Forbes, 2026)
AI's Impact on Digital Preservation in 2026
Preservica publishes an analysis on AI's impact, stating it's no longer experimental in archiving and is becoming part of daily workflows for metadata quality and content discovery, while also posing new threats to authenticity. (Source: Preservica, 2026)
Library of Congress Hosts 20th DSA Meeting
The Library of Congress hosts its 20th 'Designing Storage Architectures for Digital Collections' (DSA) meeting, bringing together experts to discuss digital storage advancements, challenges, and solutions, including the impact of AI on storage demand. (Source: Library of Congress, 2026)
NDSA Releases Levels of Digital Preservation v2.1 with Sustainability Focus
The National Digital Stewardship Alliance (NDSA) releases version 2.1 of its 'Levels of Digital Preservation,' incorporating environmental sustainability considerations into its recommendations and supporting resources. (Source: NDSA, 2026)
News Publishers Block Internet Archive Crawlers
Nieman Lab reports that over 340 local news sites, owned by major publishers, are limiting the Internet Archive's ability to access and preserve their stories due to concerns about AI scraping, threatening the historical record. (Source: Nieman Lab, 2026)
🔍Deep Dive Analysis
The concept of the 'Dead Web' highlights the fragility of digital information, in stark contrast to the once-held belief that 'the internet is forever.' The phenomenon, often called 'digital decay' (or 'link rot' when hyperlinks stop resolving), describes the widespread loss of online content over time. A significant share of older webpages, including news articles, government documents, and social media posts, has become inaccessible. For instance, a May 2024 Pew Research Center study found that approximately 25% of webpages sampled from 2013-2023 were no longer available as of October 2023, and that 38% of pages that existed in 2013 were gone (Source: Pew Research Center, 2024). The loss is attributed to several factors, including website discontinuation, individual page deletion, and technical issues such as server failures (Source: PPC Land, 2024).
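A link-rot rate of this kind can be estimated, in rough terms, by requesting each URL in a sample and counting the ones that no longer resolve. The following Python sketch is illustrative only: the names `fetch_status`, `is_dead`, and `link_rot_rate` are hypothetical, and the rule that treats 404, 410, and network failures as "dead" is an assumption, not the methodology of any study cited here.

```python
import urllib.error
import urllib.request

# Assumption: these HTTP statuses mean the page is gone for good.
DEAD_STATUSES = {404, 410}


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the request fails.

    Uses a HEAD request to avoid downloading the body; some servers
    reject HEAD, so a real survey might need to fall back to GET.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-rot-sketch/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


def is_dead(status):
    """Classify a status as dead under the assumed rule above."""
    return status is None or status in DEAD_STATUSES


def link_rot_rate(statuses):
    """Return the fraction of statuses classified as dead (0.0 if empty)."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(is_dead(s) for s in statuses) / len(statuses)
```

In use, one would map `fetch_status` over a sampled URL list and pass the results to `link_rot_rate`. A single check only shows that a page is unreachable at that moment, so a careful survey would retry failures before counting a page as lost.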
Digital preservation emerged as a formal process to counteract this decay, aiming to ensure long-term access and usability of digital information. Its history dates back to the early days of digital record-keeping, with key milestones including the development of standards like the Open Archival Information System (OAIS) Reference Model and PREMIS (PREservation Metadata Implementation Strategies) (Source: University of Montana, 2019). Institutions like the Internet Archive, founded in 1996, and national libraries such as the Library of Congress, have been at the forefront of web archiving and digital stewardship, collecting vast amounts of born-digital and digitized content (Source: Wikipedia, Internet Archive).
However, the challenges to digital preservation are escalating. Technological obsolescence remains a 'treadmill problem': files must be migrated continuously to newer formats, and the U.S. National Archives alone tracks over 700 different file formats (Source: YouTube, 2026). 'Bit rot,' the gradual corruption of stored digital files, poses a constant threat as well (Source: Revolution Data Systems, 2025). More recently, the rise of generative AI has introduced new complexities. AI is being integrated into preservation workflows to improve metadata quality and content discovery, but it also presents a significant threat to the authenticity of digital records, with concerns about AI-generated fakes and about the proliferation of new, specialized file formats (Source: Preservica, 2026; YouTube, 2026). The 'Dead Internet Theory,' which has gained renewed interest in the 2020s, posits that much of the internet's content and activity is now generated by bots and AI, further blurring the lines of authenticity (Source: Grokipedia, 2024; Forbes, 2024).
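Bit rot is commonly detected with fixity checks: a checksum is recorded when a file enters the archive and recomputed later, and any mismatch signals corruption. The Python sketch below shows the idea with SHA-256. The function names and the in-memory manifest are assumptions made for illustration and do not describe any particular institution's workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a checksum for each file at ingest time."""
    return {str(Path(p)): sha256_of(p) for p in paths}


def verify_manifest(manifest):
    """Re-check every file and return a list of (path, problem) pairs."""
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file():
            failures.append((path, "missing"))
        elif sha256_of(path) != expected:
            failures.append((path, "checksum mismatch"))
    return failures
```

An empty list from `verify_manifest` means every file still matches its recorded checksum. A checksum only detects damage, so repair still depends on keeping independent copies to restore from.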
Recent years have seen critical turning points. In October 2024, the Internet Archive suffered severe cyberattacks that resulted in the theft of a user database affecting 31 million accounts (Source: Web Archive in 2026, 2026). In 2025-2026, major news publishers, including The New York Times and The Guardian, began blocking Internet Archive crawlers, fearing that AI companies would scrape their content to train language models (Source: Web Archive in 2026, 2026; Nieman Lab, 2026). By May 2026, more than 340 local news sites were limiting the Archive's access, threatening the historical record of journalism (Source: Nieman Lab, 2026). Legal battles further complicate preservation efforts, notably the publishers' lawsuit over the Internet Archive's digital book lending program, which was decided against the Archive in March 2023 and upheld on appeal (Source: Web Archive in 2026, 2026).
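The sources cited here do not say which mechanism publishers use to block crawlers, but one common one is a robots.txt rule aimed at a specific user agent. The Python sketch below uses the standard library's `urllib.robotparser` to show how such a rule shuts out one crawler while leaving others unaffected. The robots.txt content, the `archive.org_bot` token, the `news.example` URL, and the `crawler_allowed` helper are all illustrative assumptions, not taken from any publisher's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one archiving crawler, allow everyone else.
EXAMPLE_ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://news.example/2026/05/local-story"
    for agent in ("archive.org_bot", "SomeOtherBot"):
        print(agent, crawler_allowed(EXAMPLE_ROBOTS_TXT, agent, story))
```

Compliance with robots.txt is voluntary, so a rule like this only stops crawlers that choose to honor it.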
As of July 2026, the digital preservation landscape is characterized by both innovation and heightened urgency. The Library of Congress continues to host its 'Designing Storage Architectures' meetings (March 2026) and updates its Recommended Formats Statement (2025-2026) to guide sustainable practices (Source: Library of Congress, 2026; Digital Preservation Coalition, 2025). New storage technologies, including synthetic DNA storage and advanced optical media, are being explored for their long-term archival potential (Source: Forbes, 2026). Environmental sustainability has also become a key focus, with the National Digital Stewardship Alliance (NDSA) releasing version 2.1 of its 'Levels of Digital Preservation' in March 2026, incorporating environmental considerations (Source: NDSA, 2026). Collaborative initiatives, such as the Coalition for Content Provenance and Authenticity (C2PA) and the related Content Authenticity Initiative, are working on embedding provenance metadata into digital assets to combat misinformation and deepfakes (Source: UC3 California Digital Library, 2026). The ongoing fight against the 'Dead Web' requires a multi-faceted approach that balances technological advances with robust policies, sustained funding, and international collaboration to safeguard our collective digital heritage.