What Happened to Recovering the Dead Web?

Recovering the Dead Web refers to the ongoing global effort to combat 'link rot' (the phenomenon of web content disappearing or becoming inaccessible over time) and to preserve digital information for future generations. This work is primarily undertaken by institutions like the Internet Archive, academic researchers, and open-source communities, who develop tools and strategies to capture ephemeral online content before it disappears. In recent years the effort has also intersected with concerns about the 'Dead Internet Theory,' as the proliferation of AI-generated content raises new questions about the authenticity and value of the preserved web.

Quick Answer

Recovering the Dead Web addresses the pervasive problem of 'link rot,' where a significant portion of online content vanishes over time due to website changes, domain expirations, or content removal. The Internet Archive's Wayback Machine, the largest public web archive, had preserved over 1 trillion web pages as of October 2025, though a substantial amount of data remains lost or endangered. As of 2026, preservation efforts face new challenges from increasingly dynamic websites, anti-bot measures, and the growing influx of AI-generated content, which complicates the definition of what constitutes valuable 'human' web history.

📊 Key Facts

Percentage of URLs from 1996-2021 found dead in 2023: 65%. Source: Old Dominion University (ODU) study, 2023
Percentage of webpages from 2013 no longer accessible in 2023: 38%. Source: Pew Research Center, 2024
Percentage of dead pages rescued by Wayback Machine (from Pew study): 15%. Source: Internet Archive analysis, 2026
Total web pages archived by Wayback Machine (as of Oct 2025): over 1 trillion. Source: Wikipedia, 2025
Percentage of top 10 million domains genuinely dead (as of June 2026): 14.2%. Source: Crawlora study, 2026
Percentage of new websites AI-generated or AI-assisted (May 2025): 35.3%. Source: Stanford, Imperial College, Internet Archive study, 2026

📅 Complete Timeline (15 events)

1
May 1996 (Major)

Internet Archive Begins Archiving Web Pages

The Internet Archive, founded by Brewster Kahle, begins archiving cached web pages, laying the groundwork for large-scale digital preservation.

2
1996 (Critical)

Internet Archive Founded

Brewster Kahle establishes the Internet Archive, a non-profit organization dedicated to building a digital library of Internet sites and other cultural artifacts in digital form.

3
October 25, 2001 (Critical)

Wayback Machine Launched Publicly

The Internet Archive launches the Wayback Machine, providing public access to its vast collection of archived web pages, allowing users to view historical versions of websites.

4
October 2011 (Notable)

NDSA Web Archiving Working Group Established

The National Digital Stewardship Alliance (NDSA) establishes its Web Archiving Working Group to conduct surveys and analyze changes in the field of web archiving.

5
October 9, 2015 (Notable)

Hiberlink Project Measures Reference Rot

The Hiberlink project, a collaboration including the University of Edinburgh and Los Alamos National Laboratory, begins working to measure 'reference rot' in online academic articles.

6
2021 (Major)

Jonathan Zittrain Publishes 'The Internet Is Rotting'

Jonathan Zittrain's article in The Atlantic highlights the pervasive issue of link rot, reporting that 25% of deep links from New York Times articles had rotted, with 72% of older links from 1998 being dead.

7
March 22, 2022 (Major)

Saving Ukrainian Cultural Heritage Online (SUCHO) Initiative Launched

In response to Russia's invasion of Ukraine, the SUCHO initiative, supported by the Internet Archive, mobilizes volunteers to archive Ukrainian cultural heritage websites, saving terabytes of data.

8
2023 (Major)

ODU Study Reports 65% of Sampled URLs Dead

A longitudinal study from Old Dominion University (ODU) finds that about 65% of 27.3 million URLs sampled from the Wayback Machine between 1996 and 2021 were dead on the live web when checked in 2023.

9
2024 (Major)

Pew Research Center Publishes 'When Online Content Disappears'

The Pew Research Center releases a significant study revealing that 38% of webpages from 2013 were no longer accessible a decade later, and a quarter of all pages sampled from 2013-2023 were inaccessible.

10
May 2025 (Critical)

Study Finds Over a Third of New Websites Are AI-Generated/Assisted

A collaborative study by Stanford University, Imperial College London, and the Internet Archive reports that 35.3% of all new websites published between 2022 and 2025 were AI-generated or AI-assisted, with 17.6% entirely AI-generated.

11
October 2025 (Major)

Wayback Machine Archives Over 1 Trillion Web Pages

The Wayback Machine reaches a significant milestone, having archived over 1 trillion web pages and more than 99 petabytes of data, solidifying its role as the world's largest public web archive.

12
April 23, 2026 (Major)

Internet Archive Blog Post 'Gone but Not Forgotten: Recovering the Dead Web'

Sawood Alam of the Internet Archive publishes a blog post summarizing recent link-rot studies and highlighting the Wayback Machine's role in rescuing dead web pages, while also discussing ongoing challenges and future directions.

13
May 27, 2026 (Major)

Internet Archive Switzerland Launched to Archive AI Models

The Internet Archive Switzerland is launched, focusing on preserving endangered archives globally and embarking on the 'Gen AI Archive' project to begin archiving AI models, an emerging frontier for preservation.

14
June 5, 2026 (Major)

IIPC Conference Discusses Web Archiving Challenges

The International Internet Preservation Consortium (IIPC) Web Archiving Conference in Brussels addresses the sustainability of open-source tools and the increasing difficulty of archiving a 'closing web' due to anti-bot measures and dynamic content.

15
June 14, 2026 (Major)

Crawlora Study Revises 'Genuinely Dead' Web Percentage

A Crawlora study re-scans the top 10 million domains and reports that 14.2% are 'genuinely dead,' distinguishing them from sites that merely block bots. The figure revises earlier, higher estimates downward.

🔍 Deep Dive Analysis

The concept of 'Recovering the Dead Web' emerged from the inherent ephemerality of the internet, a phenomenon widely known as 'link rot' or 'reference rot.' This issue, where hyperlinks become inactive or lead to inaccessible content, has been a concern since the early days of the World Wide Web. Studies over the decades have consistently highlighted the alarming rate of digital decay; for instance, a 2023 study by Old Dominion University (ODU) found that approximately 65% of URLs sampled from the Wayback Machine between 1996 and 2021 were dead on the live web.

The primary drivers of link rot include website redesigns, content migration, domain name expirations, server issues, and deliberate content removal. The consequences are profound: loss of cultural heritage, compromised academic research, and diminished credibility for online information. A key turning point in addressing this challenge was the establishment of the Internet Archive in 1996 and the public launch of its flagship service, the Wayback Machine, in 2001. This non-profit organization has become the largest public web archive, aiming to provide 'universal access to all knowledge,' and had preserved over 1 trillion web pages as of October 2025.
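
These causes look different to an automated link checker: an expired domain usually fails DNS resolution, removed content usually returns HTTP 404 or 410, and a server that refuses crawlers usually answers 401, 403, or 429. The Python sketch below shows one way a checker might separate 'dead' from 'blocked' results. The names `LinkState`, `classify_link`, and `dead_share`, and the status-code mapping itself, are illustrative assumptions. They are not the methodology of any study cited here. The sketch also cannot detect 'soft 404s' (error pages served with a 200 status).

```python
from enum import Enum
from typing import Optional, Sequence


class LinkState(Enum):
    ALIVE = "alive"
    DEAD = "dead"
    BLOCKED = "blocked"   # server answered but refused the request
    UNKNOWN = "unknown"   # ambiguous or transient; worth retrying later


# Assumed mapping, chosen for illustration only.
DEAD_STATUSES = {404, 410}
BLOCKED_STATUSES = {401, 403, 429}


def classify_link(status: Optional[int], dns_resolved: bool = True) -> LinkState:
    """Classify one probe result.

    `status` is the final HTTP status code, or None if no response arrived.
    `dns_resolved` is False when the hostname could not be resolved.
    """
    if not dns_resolved:
        return LinkState.DEAD          # e.g. an expired domain
    if status is None:
        return LinkState.UNKNOWN       # timeout or connection error
    if 200 <= status < 400:
        return LinkState.ALIVE         # assumption: redirects count as alive
    if status in DEAD_STATUSES:
        return LinkState.DEAD
    if status in BLOCKED_STATUSES:
        return LinkState.BLOCKED
    return LinkState.UNKNOWN           # 5xx and anything else


def dead_share(states: Sequence[LinkState]) -> float:
    """Fraction of probes classified as DEAD (0.0 for an empty sample)."""
    if not states:
        return 0.0
    return sum(s is LinkState.DEAD for s in states) / len(states)


if __name__ == "__main__":
    # Hypothetical probe results: (status, dns_resolved)
    probes = [(200, True), (404, True), (403, True), (None, False)]
    states = [classify_link(status, dns) for status, dns in probes]
    print(f"{dead_share(states):.0%} dead")
```

Keeping `BLOCKED` and `UNKNOWN` out of the dead count is a design choice: a checker that treats every non-200 response as dead will overstate link rot whenever sites turn away automated visitors.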

In recent years, the efforts to recover the dead web have intensified and diversified. The 2024 Pew Research Center study, 'When Online Content Disappears,' reported that 38% of webpages from 2013 were no longer accessible a decade later, and about a quarter of all pages sampled between 2013 and 2023 were inaccessible. The Internet Archive's analysis of this data showed that the Wayback Machine had rescued roughly 15% of those otherwise dead pages. Beyond large-scale archiving, open-source tools like ArchiveBox and Webrecorder, along with initiatives like Harvard Law School's Perma.cc, empower individuals and institutions to create their own archives.
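
To check whether a dead link has an archived copy, the Wayback Machine exposes a public availability endpoint that returns the closest snapshot for a URL as JSON. The sketch below builds the query and parses the response using only Python's standard library. The helper names (`build_query`, `closest_snapshot`, `lookup`) are hypothetical, and the response shape assumed in the comments should be checked against the Internet Archive's current documentation.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the availability query; `timestamp` is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the archived URL from a response payload, or None if no snapshot exists.

    Assumed shape: {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    with an empty "archived_snapshots" object when nothing is archived.
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timestamp: Optional[str] = None, timeout: float = 10.0) -> Optional[str]:
    """Query the endpoint over the network and return the closest snapshot URL, if any."""
    with urlopen(build_query(url, timestamp), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

A call such as `lookup("example.com", "20130101")` asks for the capture nearest to January 1, 2013. A link checker could fall back to this lookup whenever the live URL no longer resolves.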

As of 2026, the landscape of web preservation is evolving rapidly. The International Internet Preservation Consortium (IIPC) Web Archiving Conference in June 2026 highlighted how dynamic, JavaScript-heavy pages, anti-bot measures, login walls, and paywalls make archiving increasingly difficult, in part because such defenses treat archival crawlers like commercial scrapers. There is a growing call for sustained investment in open-source web archiving infrastructure and for treating digital preservation as a collective responsibility. Furthermore, the rise of generative AI has introduced new complexities, fueling discussions around the 'Dead Internet Theory,' which posits that much of the web is becoming dominated by AI-generated content rather than authentic human interaction. A May 2026 study by Stanford University, Imperial College London, and the Internet Archive found that over a third (35.3%) of new websites published between 2022 and 2025 were AI-generated or AI-assisted. This raises new questions about what content should be prioritized for preservation and how to distinguish human cultural output from automated 'slop.'

Despite these challenges, organizations like the Internet Archive continue to innovate, integrating feeds from sources like MediaCloud and GDELT, and joining initiatives like IndexNow for better link discovery. New regional efforts, such as the launch of Internet Archive Switzerland in May 2026, are focusing on endangered archives and the emerging frontier of archiving AI models. The ongoing work underscores that recovering the dead web is not just about technical solutions but also a continuous, collaborative effort to safeguard our digital cultural heritage in an increasingly complex online environment.

People Also Ask

What is 'link rot'?
Link rot refers to the phenomenon where hyperlinks on the internet become obsolete or broken, leading to inaccessible content. This can happen due to website redesigns, content removal, domain expirations, or server issues.
How much of the web is 'dead'?
Estimates vary depending on the methodology and timeframe. A 2023 study found 65% of URLs from 1996-2021 were dead. A 2024 Pew Research study noted 38% of 2013 webpages were inaccessible a decade later. A June 2026 study by Crawlora found 14.2% of the top 10 million domains were genuinely dead.
What is the role of the Internet Archive in recovering the dead web?
The Internet Archive, through its Wayback Machine, is the largest public web archive, actively preserving billions of web pages. It plays a crucial role in rescuing otherwise lost content and providing historical access to websites.
What is the 'Dead Internet Theory' and how does it relate?
The 'Dead Internet Theory' suggests that online spaces are increasingly dominated by bots and AI-generated content, rather than human interaction. This relates to 'recovering the dead web' by raising concerns about the authenticity and value of content being preserved, especially as AI-generated content proliferates.
What are the current challenges in web archiving?
Current challenges include the increasing complexity of dynamic, JavaScript-heavy websites, anti-bot measures that hinder archival crawlers, resource limitations, and the need for sustained investment in open-source archiving tools. The rise of AI-generated content also poses new questions for preservation strategies.