What Happened to The Dead Web: Recovering Lost Digital Content?

The 'Dead Web' refers to the vast amount of digital content that has become inaccessible or lost through technological obsolescence, data degradation, and website shutdowns. Efforts to recover and preserve this content are ongoing, led by organizations such as the Internet Archive and the National Digital Stewardship Alliance. In 2026 these efforts face new challenges from AI's impact on web archiving and from the continuous 'treadmill problem' of data migration.

⚡Quick Answer

The 'Dead Web' highlights the fragility of digital information: studies from 2024 to 2026 found that a large share of older web content is no longer accessible, including 38% of webpages from 2013 according to the Pew Research Center. Recovery efforts are led by major web archiving initiatives such as the Internet Archive's Wayback Machine, which reached one trillion archived web pages in October 2025. These efforts face new hurdles in 2026, including cyberattacks, the 'treadmill problem' of continuous data migration, and a growing trend of news publishers blocking archiving crawlers over concerns about AI training data. At the same time, AI is being integrated into preservation workflows to improve efficiency and metadata quality, and new storage technologies such as synthetic DNA are being explored for long-term storage.

📊Key Facts

Webpages from 2013 no longer accessible (2024): 38%. Source: Pew Research Center

Links from 2013-2024 that are dead: 66.5%. Source: Ahrefs

Links acquired before 2020 that are dead/removed (2026): 71.2%. Source: Ahrefs

Average lifespan of a backlink on news/media sites (2026): 18.4 months. Source: Ahrefs

Wayback Machine archived pages (Oct 2025): 1 trillion. Source: Internet Archive

U.S. National Archives tracked file formats: 742+. Source: Micrographics Data

📅Complete Timeline (15 events)

2008 (Major)

Launch of End of Term Web Archive

The End of Term Web Archive initiative begins, a collaborative effort to preserve U.S. government websites during presidential transitions.

2015 (Notable)

Media Art Preservation Project Initiated

The Media Art Preservation project begins, bringing together international conservators, art historians, and experts to discuss the conservation and preservation of media art.

February 2020 (Major)

MAPS 2020 Conference 'The Dead Web – The End'

The Ludwig Museum – Museum of Contemporary Art in Budapest hosts the MAPS 2020 conference and an exhibition titled 'The Dead Web – The End,' focusing on the obsolescence of the internet and media art preservation.

2022 (Notable)

Archive-It Releases ARCH Software

Archive-It releases ARCH software, designed for computational datasets of web archives, enhancing tools available for digital preservation.

February 2, 2024 (Major)

Ahrefs Study on Link Rot Published

Ahrefs publishes a study revealing that at least 66.5% of links to websites over the previous nine years had 'rotted,' highlighting the scale of digital content loss.

February 14, 2024 (Notable)

DatacomIT Article on Digital Information Longevity

DatacomIT publishes an article detailing the threats to digital information longevity, including technological obsolescence, data degradation, and lack of standards.

May 2024 (Critical)

Pew Research Center Study on Webpage Disappearance

Pew Research Center releases a study, 'When Online Content Disappears,' finding that 38% of webpages from 2013 were no longer accessible a decade later, and that a quarter of all webpages that existed between 2013 and 2023 were gone.

October 2024 (Major)

Internet Archive Suffers Cyberattacks

The Internet Archive experiences a series of severe cyberattacks, resulting in the theft of user databases and compromised support systems, exposing vulnerabilities in its infrastructure.

October 2025 (Critical)

Wayback Machine Reaches One Trillion Pages

The Internet Archive's Wayback Machine achieves the significant milestone of archiving one trillion web pages, demonstrating its ongoing efforts in digital preservation.

January 28, 2026 (Major)

Preservica Report on AI's Impact on Digital Preservation

Preservica publishes a report highlighting the increasing adoption of AI in digital preservation workflows for tasks like metadata quality and backlog reduction, while emphasizing the need for human oversight.

February 6, 2026 (Critical)

News Publishers Block Internet Archive Crawlers

Major news publishers, including The New York Times and The Guardian, begin restricting Internet Archive crawlers due to concerns about AI companies using archived data for training language models.

March 2026 (Major)

Library of Congress Hosts Digital Storage Architecture Meeting

The Library of Congress hosts its 20th 'Designing Storage Architectures for Digital Collections' meeting, bringing together experts to discuss challenges and solutions in digital storage, including the impact of AI.

March 24, 2026 (Major)

Brewster Kahle Named Computer History Museum Fellow

Brewster Kahle, founder of the Internet Archive, is named a 2026 Fellow by the Computer History Museum for his pioneering work in online search and digital preservation.

June 17, 2026 (Major)

National Summit on Local News Preservation

The Internet Archive, Investigative Reporters and Editors (IRE), and the Poynter Institute co-host a National Summit on Local News Preservation to develop collaborative solutions for archiving local news.

July 20-23, 2026 (Major)

Archiving 2026 Conference in Boston

The Archiving 2026 conference takes place in Boston, focusing on innovation, technology (including AI), community engagement, sustainable practices, and digital ethics in cultural heritage preservation.

🔍Deep Dive Analysis

'The Dead Web' describes the pervasive loss and inaccessibility of digital content, a growing concern as society becomes increasingly digitized. The phenomenon, often called 'link rot' or 'content decay,' has several causes: rapid obsolescence of hardware, software, and file formats; gradual data degradation (bit rot); a lack of standardized preservation practices; and the ephemeral nature of many online platforms and websites. Studies in 2024 highlighted the scale of the problem. The Pew Research Center reported that 38% of webpages from 2013 were inaccessible a decade later, and that roughly 25% of all sampled pages from 2013 to 2023 had disappeared. An Ahrefs study in 2024 found that 66.5% of links from the previous nine years were dead. By 2026, Ahrefs put the figure at 71.2% for links acquired before 2020, with the average lifespan of a backlink on news sites shrinking to just 18.4 months.
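
Percentages like these come down to a simple tally over a sample of checked links. The following is a minimal sketch of that arithmetic, not the methodology of Pew or Ahrefs: the set of status codes treated as "dead," the function names, and the sample data are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Assumption: a link counts as dead if the server did not respond at all
# (recorded as None) or answered 404/410. Real studies apply their own,
# more careful criteria.
DEAD_STATUSES = {404, 410}


def is_dead(status: Optional[int]) -> bool:
    """Return True if a recorded check result counts as a dead link."""
    return status is None or status in DEAD_STATUSES


def link_rot_rate(results: Mapping[str, Optional[int]]) -> float:
    """Fraction of checked URLs that are dead (0.0 for an empty sample)."""
    if not results:
        return 0.0
    dead = sum(1 for status in results.values() if is_dead(status))
    return dead / len(results)


# Hypothetical check results: URL -> HTTP status, or None if no response.
sample = {
    "https://example.com/a": 200,
    "https://example.com/b": 404,
    "https://example.com/c": None,
    "https://example.com/d": 301,
}

print(f"{link_rot_rate(sample):.1%}")  # prints 50.0%
```

Redirects (such as the 301 above) are counted as alive here; a stricter survey might follow them and check whether the destination still carries the original content.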

Key turning points in addressing this issue have involved the establishment and growth of dedicated digital preservation organizations. The Internet Archive, with its Wayback Machine, remains a cornerstone, reaching the milestone of one trillion archived web pages in October 2025. Collaborative initiatives like the End of Term Web Archive, involving partners such as the Library of Congress and the National Archives and Records Administration, continue to preserve U.S. government websites during presidential transitions. The 2024/2025 crawl collected over 500 terabytes of material, which is being stored on the Filecoin network for long-term access. The National Digital Stewardship Alliance (NDSA) also plays a crucial role, uniting a consortium of organizations committed to digital heritage preservation.
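
For readers who want to check whether a dead page survives in the Wayback Machine, the Internet Archive offers a public availability lookup. The sketch below only builds the query URL and reads a response of the documented shape, so it runs without network access. The helper names and the sample response are illustrative assumptions; consult the Internet Archive's own API documentation before relying on any field.

```python
from typing import Optional
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build a lookup URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived copy's URL from a decoded JSON response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Illustrative response, hand-written to mirror the documented structure.
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
}

print(availability_query("example.com", "20130101"))
print(closest_snapshot(sample_response))
print(closest_snapshot({"archived_snapshots": {}}))  # no capture found
```

To use it against the live service, fetch the URL returned by `availability_query` with any HTTP client and pass the decoded JSON to `closest_snapshot`.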

However, the landscape of digital preservation is evolving rapidly, presenting new challenges in 2026. The growing influence of Artificial Intelligence (AI) has introduced a significant tension. AI is being adopted to improve archiving workflows, metadata quality, and content discoverability, yet concerns about AI companies scraping archived data to train models have led major news publishers, including The New York Times and The Guardian, to block Internet Archive crawlers in 2025-2026. This creates a dilemma between open access for historical preservation and content creators' desire to control and monetize their data. The 'treadmill problem' also persists: digital content requires continuous, expensive migration to new formats to combat technological obsolescence, and the U.S. National Archives tracks over 700 file formats and revises its preservation plans quarterly. Cyberattacks pose a constant threat as well, as shown by the severe attacks on the Internet Archive in October 2024, which compromised user databases and support systems.
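
One common way for a site to restrict a crawler is a rule in its robots.txt file. The sketch below shows how such a rule reads to a well-behaved crawler, using Python's standard `urllib.robotparser`. The robots.txt content, the user-agent token `archive.org_bot`, and the example.com URL are illustrative assumptions, not any publisher's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one archiving crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


story = "https://example.com/news/story"
print(allowed("archive.org_bot", story))    # prints False
print(allowed("ExampleBrowserBot", story))  # prints True
```

robots.txt is advisory: it only stops crawlers that choose to honor it, which is part of why publishers and archives are negotiating over access rather than treating it as a purely technical matter.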

Despite these hurdles, innovation continues. The Archiving 2026 conference in Boston focused on advancing emerging technologies in digitization, AI, and sustainable preservation practices. Interest is growing in novel storage media for ultra-long-term archiving, such as high-density optical and synthetic DNA solutions, though DNA storage still faces cost and speed barriers to commercial viability. AI is also being used to assist content refresh workflows for active websites, helping brands maintain relevance and visibility in the AI search era of 2026. These efforts underscore a critical point: digital preservation is not a one-time task but a continuous, complex endeavor that requires technological innovation, collaborative initiatives, and robust strategies to safeguard our collective digital heritage for future generations.

❓People Also Ask

What is 'The Dead Web'?

'The Dead Web' refers to digital content that has become inaccessible or lost over time due to various factors like broken links, deleted pages, website shutdowns, and outdated technologies. It represents a significant challenge to preserving our digital heritage.

How much digital content is lost or inaccessible?

Studies indicate a substantial loss of digital content. For example, a 2024 Pew Research Center study found that 38% of webpages from 2013 were no longer accessible a decade later, and an Ahrefs study in 2026 reported that 71.2% of links acquired before 2020 were dead or removed.

What are the main causes of digital content loss?

Key causes include technological obsolescence (outdated hardware, software, and file formats), data degradation (bit rot), website shutdowns, content deletion by users or platforms, and 'link rot' where hyperlinks no longer point to their intended content.

Who is working to recover and preserve lost digital content?

Major organizations like the Internet Archive (with its Wayback Machine), the National Digital Stewardship Alliance (NDSA), national libraries (e.g., Library of Congress), and academic institutions are actively involved in web archiving and digital preservation initiatives.

How is AI impacting digital preservation in 2026?

In 2026, AI is both a challenge and a solution. While AI tools are being integrated into preservation workflows to improve efficiency and metadata, concerns about AI companies scraping archived data have led some news publishers to block web archiving crawlers.