What Happened to The Dead Web: Recovering Lost Digital Content?
The 'Dead Web' refers to the vast amount of digital content that has become inaccessible or lost due to technological obsolescence, data degradation, and website shutdowns. Efforts to recover and preserve this content are ongoing, spearheaded by organizations like the Internet Archive and the National Digital Stewardship Alliance. In 2026 these efforts face new challenges from AI's impact on web archiving and the continuous 'treadmill problem' of data migration.
Quick Answer
The 'Dead Web' is a concept that highlights the fragility of digital information. Studies published since 2024 show how much older web content is no longer accessible: the Pew Research Center found that 38% of webpages from 2013 were gone a decade later. Recovery efforts are led by major web archiving initiatives such as the Internet Archive's Wayback Machine, which reached one trillion archived web pages in October 2025. However, these efforts face new hurdles in 2026, including cyberattacks, the 'treadmill problem' of continuous data migration, and a growing trend of news publishers blocking archiving crawlers due to concerns about AI training data. Despite these challenges, AI is also being integrated into preservation workflows to improve efficiency and metadata quality, while new storage technologies like synthetic DNA are being explored for long-term solutions.
📅Complete Timeline (15 events)
Launch of End of Term Web Archive
The End of Term Web Archive initiative begins, a collaborative effort to preserve U.S. government websites during presidential transitions.
Media Art Preservation Project Initiated
The Media Art Preservation project begins, bringing together international conservators, art historians, and experts to discuss the conservation and preservation of media art.
MAPS 2020 Conference 'The Dead Web – The End'
The Ludwig Museum – Museum of Contemporary Art in Budapest hosts the MAPS 2020 conference and an exhibition titled 'The Dead Web – The End,' focusing on the obsolescence of the internet and media art preservation.
Archive-It Releases ARCH Software
Archive-It releases ARCH software, designed for computational datasets of web archives, enhancing tools available for digital preservation.
Ahrefs Study on Link Rot Published
Ahrefs publishes a study revealing that at least 66.5% of links to websites over the previous nine years had 'rotted,' highlighting the scale of digital content loss.
DatacomIT Article on Digital Information Longevity
DatacomIT publishes an article detailing the threats to digital information longevity, including technological obsolescence, data degradation, and lack of standards.
Pew Research Center Study on Webpage Disappearance
Pew Research Center releases a study, 'When Online Content Disappears,' finding that 38% of webpages from 2013 were no longer accessible a decade later, and that a quarter of all webpages that existed at some point between 2013 and 2023 were gone.
Internet Archive Suffers Cyberattacks
The Internet Archive experiences a series of severe cyberattacks, resulting in the theft of user databases and compromised support systems, exposing vulnerabilities in its infrastructure.
Wayback Machine Reaches One Trillion Pages
The Internet Archive's Wayback Machine achieves the significant milestone of archiving one trillion web pages, demonstrating its ongoing efforts in digital preservation.
Preservica Report on AI's Impact on Digital Preservation
Preservica publishes a report highlighting the increasing adoption of AI in digital preservation workflows for tasks like metadata quality and backlog reduction, while emphasizing the need for human oversight.
News Publishers Block Internet Archive Crawlers
Major news publishers, including The New York Times and The Guardian, begin restricting Internet Archive crawlers due to concerns about AI companies using archived data for training language models.
Library of Congress Hosts Digital Storage Architecture Meeting
The Library of Congress hosts its 20th 'Designing Storage Architectures for Digital Collections' meeting, bringing together experts to discuss challenges and solutions in digital storage, including the impact of AI.
Brewster Kahle Named Computer History Museum Fellow
Brewster Kahle, founder of the Internet Archive, is named a 2026 Fellow by the Computer History Museum for his pioneering work in online search and digital preservation.
National Summit on Local News Preservation
The Internet Archive, Investigative Reporters and Editors (IRE), and the Poynter Institute co-host a National Summit on Local News Preservation to develop collaborative solutions for archiving local news.
Archiving 2026 Conference in Boston
The Archiving 2026 conference takes place in Boston, focusing on innovation, technology (including AI), community engagement, sustainable practices, and digital ethics in cultural heritage preservation.
🔍Deep Dive Analysis
The concept of 'The Dead Web' encapsulates the pervasive issue of digital content loss and inaccessibility, a growing concern as our society becomes increasingly digitized. This phenomenon, often termed 'link rot' or 'content decay,' results from several factors: rapid technological obsolescence of hardware, software, and file formats; data degradation over time (bit rot); a lack of standardized preservation practices; and the ephemeral nature of many online platforms and websites. Studies in 2024 highlighted the severity of the problem. The Pew Research Center reported that 38% of webpages from 2013 were inaccessible a decade later, and that roughly 25% of all sampled pages from 2013 to 2023 had disappeared. An Ahrefs study in 2024 found that 66.5% of links from the previous nine years were dead. By 2026 that figure had risen to 71.2% for links acquired before 2020, and the average lifespan of a backlink on news sites had shrunk to just 18.4 months.
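To make the link rot percentages above concrete, here is a minimal Python sketch of how a rot rate can be computed from a batch of link checks. The `LinkCheck` type, the `DEAD_STATUSES` set, and the rule that a failed request counts as dead are assumptions made for illustration; they are not the methodology used by Pew or Ahrefs, which each define "inaccessible" in their own way.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption for this sketch: only "not found" and "gone" responses count as dead.
DEAD_STATUSES = {404, 410}


@dataclass
class LinkCheck:
    """Result of requesting one URL (hypothetical record type)."""
    url: str
    status: Optional[int]  # None means the request failed (DNS error, timeout)


def is_rotted(check: LinkCheck) -> bool:
    """Treat failed requests and dead status codes as link rot."""
    if check.status is None:
        return True
    return check.status in DEAD_STATUSES


def rot_rate(checks: Iterable[LinkCheck]) -> float:
    """Return the share of checked links that are rotted, between 0.0 and 1.0."""
    checks = list(checks)
    if not checks:
        return 0.0
    dead = sum(1 for check in checks if is_rotted(check))
    return dead / len(checks)


sample = [
    LinkCheck("https://example.com/live", 200),
    LinkCheck("https://example.com/missing", 404),
    LinkCheck("https://defunct.example/", None),
    LinkCheck("https://example.com/moved", 301),
]
print(f"{rot_rate(sample):.1%} of sampled links are rotted")
```

In this sketch a redirect (301) counts as alive. A stricter study might follow the redirect and check whether the destination still carries the original content.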
Key turning points in addressing this issue have involved the establishment and growth of dedicated digital preservation organizations. The Internet Archive, with its Wayback Machine, remains a cornerstone, reaching the milestone of one trillion archived web pages in October 2025. Collaborative initiatives like the End of Term Web Archive, involving partners such as the Library of Congress and the National Archives and Records Administration, continue to preserve U.S. government websites during presidential transitions. The 2024/2025 crawl collected over 500 terabytes of material, which is being stored on the Filecoin network for long-term access. The National Digital Stewardship Alliance (NDSA) also plays a crucial role, uniting a consortium of organizations committed to digital heritage preservation.
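For readers who want to check whether a dead page survives in the Wayback Machine, the sketch below uses the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`). The helper names `build_availability_url`, `closest_snapshot`, and `lookup` are hypothetical, and the sample payload only mirrors the general shape of the JSON the endpoint returns; treat it as an illustration under those assumptions, not a complete client.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest archived snapshot URL, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Perform the network request (not exercised in the example below)."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))


# Illustrative payload only; real responses come from the endpoint above.
sample_payload = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://example.com/",
            "timestamp": "20060101000000",
            "status": "200",
        }
    }
}
print(closest_snapshot(sample_payload))
```

Keeping URL construction and response parsing separate from the network call makes the logic easy to test offline, which matters when the archive itself may be slow or temporarily unavailable.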
However, the landscape of digital preservation is evolving rapidly, presenting new challenges in 2026. The increasing influence of Artificial Intelligence (AI) has introduced a significant tension: while AI is being adopted to improve archiving workflows, metadata quality, and content discoverability, concerns about AI companies scraping archived data for training models have led major news publishers, including The New York Times and The Guardian, to block Internet Archive crawlers in 2025-2026. This creates a dilemma between open access for historical preservation and content creators' desire to control and monetize their data. Furthermore, the 'treadmill problem' persists, where digital content requires continuous, expensive migration to new formats to combat technological obsolescence, with the U.S. National Archives tracking over 700 file formats and revising preservation plans quarterly. Cyberattacks also pose a constant threat, as evidenced by the severe attacks on the Internet Archive in October 2024, which compromised user databases and support systems.
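One common way a site can ask a crawler to stay away is a robots.txt rule, sketched below with Python's standard `urllib.robotparser`. The article does not say which mechanism each publisher uses, so the robots.txt content here is invented for illustration, and the `archive.org_bot` user-agent string and the `can_crawl` helper are assumptions of this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one archiving crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


story = "https://example.com/news/story"
print(can_crawl(ROBOTS_TXT, "archive.org_bot", story))
print(can_crawl(ROBOTS_TXT, "SomeOtherBot", story))
```

robots.txt is advisory: it only works when the crawler chooses to honor it, which is one reason a publisher might also block crawlers at the server or network level.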
Despite these hurdles, innovation continues. The Archiving 2026 conference in Boston focused on advancing emerging technologies in digitization, AI, and sustainable preservation practices. There is growing interest in novel data storage media, such as high-density optical and synthetic DNA solutions, for ultra-long-term archiving, though DNA storage still faces cost and speed challenges before it becomes commercially viable. AI is also being leveraged to assist in content refresh workflows for active websites, helping brands maintain relevance and visibility in the AI search era of 2026. The ongoing efforts underscore a critical understanding: digital preservation is not a one-time task but a continuous, complex endeavor requiring technological innovation, collaborative initiatives, and robust strategies to safeguard our collective digital heritage for future generations.