What Happened to The Dead Web / Digital Preservation?

The 'Dead Web' refers to the pervasive loss of digital content due to technological obsolescence, link rot, and website disappearances, a phenomenon exacerbated by the sheer volume of data and the rise of AI-generated content. Digital preservation efforts, led by institutions like the Internet Archive and the Library of Congress, are actively combating this decay through web archiving, format migration, and the development of new storage solutions. As of mid-2026, the field is grappling with the dual impact of AI—both as a tool for preservation and a source of content authenticity challenges—while also focusing on environmental sustainability and collaborative infrastructure.

Quick Answer

The 'Dead Web' describes the alarming rate at which digital content vanishes, with recent studies showing a significant portion of older webpages are no longer accessible. This digital decay is driven by technological obsolescence, bit rot, and content deletion. In response, digital preservation initiatives are working to archive the web, migrate formats, and develop resilient storage solutions like DNA and advanced optical media. However, these efforts are increasingly challenged by cyberattacks, legal disputes, and news publishers blocking archiving crawlers due to concerns about AI scraping. As of July 2026, the digital preservation community is actively integrating AI into workflows while simultaneously addressing its threats to content authenticity and prioritizing environmental sustainability in long-term strategies.

📊 Key Facts

Webpages from 2013-2023 no longer accessible (as of Oct 2023): 25% (Source: Pew Research Center, May 2024)
Webpages from 2013 no longer accessible (as of Oct 2023): 38% (Source: Pew Research Center, May 2024)
News pages with at least one broken link: 23% (Source: Pew Research Center, May 2024)
Government websites with at least one broken link: 21% (Source: Pew Research Center, May 2024)
Tweets that may vanish within a few months: up to 20% (Source: Pew Research Center, May 2024)
Bot traffic as a share of all web traffic (2024): 51% (Source: Grokipedia, 2024)
Data collected by the Internet Archive's 2024/2025 End of Term Web Archive: 500+ terabytes, 100+ million unique web pages (Source: Internet Archive, Feb 2025)
File formats tracked by the U.S. National Archives (NARA): 742+ (Source: YouTube, April 2026)
Local news sites blocking Internet Archive crawlers (as of May 2026): 340+ (Source: Nieman Lab, May 2026)

📅 Complete Timeline (15 events)

1
1996 (Critical)

Internet Archive Founded

The Internet Archive is founded by Brewster Kahle, beginning its mission to build a digital library of Internet sites and other cultural artifacts in digital form. Its archived web pages are accessed through the Wayback Machine. (Source: Wikipedia)

2
2003-2005 (Major)

PREMIS Data Dictionary Developed

The PREservation Metadata Implementation Strategies (PREMIS) Data Dictionary for Preservation Metadata is developed, becoming an international standard for metadata needed when digital objects are stored in repositories. (Source: University of Montana, 2019)

3
2007 (Notable)

Library of Congress Initiates DSA Meetings

The Library of Congress organizes its first 'Designing Storage Architectures for Digital Collections (DSA)' meeting to discuss unique data storage requirements for digital preservation. (Source: Library of Congress, 2026)

4
2016-2017 (Major)

Emergence of 'Dead Internet Theory' Concerns

The 'Dead Internet Theory' begins circulating, positing that the internet is increasingly populated by bots and AI-generated content, diminishing genuine human interaction. (Source: Grokipedia, 2024; Elm Marketing, 2025)

5
March 2023 (Major)

Court Ruling Against Internet Archive's Digital Lending

A court rules against the Internet Archive in a lawsuit filed by major publishers (Hachette, HarperCollins, Penguin Random House, and Wiley) over its digital book lending program, Open Library. (Source: Web Archive in 2026, 2026)

6
May 17, 2024 (Critical)

Pew Research Center Report on Disappearing Web Content

Pew Research Center publishes 'When Online Content Disappears,' revealing that 25% of webpages from 2013-2023 are no longer accessible, and 38% of 2013 pages are gone. (Source: Pew Research Center, 2024)

7
September 2024 (Major)

Appeals Court Upholds Ruling Against Internet Archive

The appeals court upholds the March 2023 ruling against the Internet Archive regarding its digital book lending, leading to the removal of over 500,000 books from Open Library. (Source: Web Archive in 2026, 2026)

8
October 2024 (Major)

Internet Archive Cyberattacks

The Internet Archive experiences a series of severe cyberattacks, leading to the theft of a user database affecting 31 million accounts. (Source: Web Archive in 2026, 2026)

9
February 6, 2025 (Major)

2024/2025 End of Term Web Archive Update

The Internet Archive announces that its 2024/2025 End of Term Web Archive project has collected over 500 terabytes of U.S. government material, including 100 million unique web pages, with plans to upload to the Filecoin network. (Source: Internet Archive, 2025)

10
September 2025 (Major)

Sam Altman's 'Dead Internet Theory' Tweet

OpenAI CEO Sam Altman posts on X (formerly Twitter) acknowledging the 'Dead Internet Theory,' stating 'it seems like there are really a lot of LLM-run twitter accounts now,' sparking wider discussion. (Source: Wikipedia, 2024)

11
January 21, 2026 (Notable)

Developments in DNA and Optical Storage for Archiving

Forbes reports on 2025-2026 developments in synthetic DNA storage (e.g., Atlas Data Storage, Iridia) and new optical storage archiving systems, projecting 1PB raw capacity optical cartridges by the 2030s as contenders for digital archives. (Source: Forbes, 2026)

12
January 28, 2026 (Major)

AI's Impact on Digital Preservation in 2026

Preservica publishes an analysis on AI's impact, stating it's no longer experimental in archiving and is becoming part of daily workflows for metadata quality and content discovery, while also posing new threats to authenticity. (Source: Preservica, 2026)

13
March 9-10, 2026 (Major)

Library of Congress Hosts 20th DSA Meeting

The Library of Congress hosts its 20th 'Designing Storage Architectures for Digital Collections' meeting, bringing together experts to discuss digital storage advancements, challenges, and solutions, including the impact of AI on storage demand. (Source: Library of Congress, 2026)

14
March 24, 2026 (Major)

NDSA Releases Levels of Digital Preservation v2.1 with Sustainability Focus

The National Digital Stewardship Alliance (NDSA) releases version 2.1 of its 'Levels of Digital Preservation,' incorporating environmental sustainability considerations into its recommendations and supporting resources. (Source: NDSA, 2026)

15
May 20, 2026 (Critical)

News Publishers Block Internet Archive Crawlers

Nieman Lab reports that over 340 local news sites, owned by major publishers, are limiting the Internet Archive's ability to access and preserve their stories due to concerns about AI scraping, threatening the historical record. (Source: Nieman Lab, 2026)

🔍 Deep Dive Analysis

The concept of the 'Dead Web' highlights the fragility of digital information, a stark contrast to the once-held belief that 'the internet is forever.' This phenomenon, often called 'digital decay' and most visible as 'link rot,' describes the widespread loss of online content over time. A significant portion of older webpages, including news articles, government documents, and social media posts, has become inaccessible. For instance, a May 2024 Pew Research Center study found that approximately 25% of webpages sampled from 2013-2023 were no longer accessible as of October 2023, rising to 38% for pages that existed in 2013 (Source: Pew Research Center, 2024). This loss is attributed to several factors, including website discontinuation, individual page deletion, and technical issues like server failures (Source: PPC Land, 2024).
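To make the idea of a link rot rate concrete, here is a minimal Python sketch that turns a batch of URL check results into a percentage. The `CheckResult` type, the `DEAD_STATUSES` set, and the sample URLs are hypothetical; this is not the methodology Pew used, and real studies define "inaccessible" in their own ways.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these status codes mean the page is gone. This list is
# illustrative only and is not taken from any published study.
DEAD_STATUSES = {404, 410}


@dataclass
class CheckResult:
    url: str
    status: Optional[int]  # None means no HTTP response (e.g. DNS failure)


def is_dead(result: CheckResult) -> bool:
    """Treat a missing response or a 'gone' status code as a dead link."""
    return result.status is None or result.status in DEAD_STATUSES


def link_rot_rate(results: list[CheckResult]) -> float:
    """Return the share of checked URLs that are dead, as a percentage."""
    if not results:
        return 0.0
    dead = sum(1 for r in results if is_dead(r))
    return 100.0 * dead / len(results)


# Hypothetical sample: two of the four links are dead.
sample = [
    CheckResult("https://example.org/a", 200),
    CheckResult("https://example.org/b", 404),
    CheckResult("https://example.org/c", None),
    CheckResult("https://example.org/d", 301),
]
print(f"{link_rot_rate(sample):.0f}% of sampled links are dead")
```

The results are passed in rather than fetched so that the classification rule stays separate from the network code and can be adjusted, for example to count redirects to a homepage as dead.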

Digital preservation emerged as a formal process to counteract this decay, aiming to ensure long-term access and usability of digital information. Its history dates back to the early days of digital record-keeping, with key milestones including the development of standards like the Open Archival Information System (OAIS) Reference Model and PREMIS (PREservation Metadata Implementation Strategies) (Source: University of Montana, 2019). Institutions like the Internet Archive, founded in 1996, and national libraries such as the Library of Congress, have been at the forefront of web archiving and digital stewardship, collecting vast amounts of born-digital and digitized content (Source: Wikipedia, Internet Archive).
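One practical way to see web archiving at work is to ask whether a page already has an archived copy. The Python sketch below builds a query for the Internet Archive's public Wayback Machine availability endpoint and reads the closest snapshot from a response. The `SAMPLE` JSON is an illustrative stand-in, not a recorded response, and the field names are assumed from the Archive's published documentation; check them against the live service before relying on them.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is an optional YYYYMMDD-style prefix."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest snapshot URL, or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Illustrative response shape (an assumption, not a captured response).
SAMPLE = """{"archived_snapshots": {"closest": {
    "available": true,
    "url": "http://web.archive.org/web/20130919044612/http://example.com/",
    "timestamp": "20130919044612",
    "status": "200"}}}"""

print(availability_query("example.com", "2013"))
print(closest_snapshot(SAMPLE))
```

The sketch makes no network request; the string from `availability_query` can be fetched with any HTTP client and the body handed to `closest_snapshot`.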

However, the challenges to digital preservation are escalating. Technological obsolescence remains a 'treadmill problem,' requiring continuous migration of files to new formats, with the U.S. National Archives tracking over 700 different file formats (Source: YouTube, 2026). 'Bit rot,' the gradual corruption of digital files, also poses a constant threat (Source: Revolution Data Systems, 2025). More recently, the rise of generative AI has introduced new complexities. While AI is being integrated into preservation workflows to improve metadata quality and content discovery, it also presents a significant threat to the authenticity of digital records, with concerns about AI-generated fakes and the proliferation of new, specialized file formats (Source: Preservica, 2026; YouTube, 2026). The 'Dead Internet Theory,' a concept gaining renewed interest in the 2020s, posits that much of the internet's content and activity is now generated by bots and AI, further blurring the lines of authenticity (Source: Grokipedia, 2024; Forbes, 2024).
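Bit rot is typically caught through fixity checking: a checksum is recorded when a file enters the archive and recomputed later to confirm nothing has changed. The following Python sketch shows the idea with SHA-256 on a temporary file. The file name and helper functions are hypothetical, and a production repository would store the checksums in preservation metadata and verify them on a schedule.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded: str) -> bool:
    """Compare the file's current checksum with the one recorded at ingest."""
    return sha256_of(path) == recorded


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "record.bin"  # hypothetical archived file
    target.write_bytes(b"archived content")
    recorded = sha256_of(target)  # checksum stored at ingest
    intact_before = fixity_ok(target, recorded)

    # Simulate bit rot by flipping a single bit in the first byte.
    data = bytearray(target.read_bytes())
    data[0] ^= 0x01
    target.write_bytes(bytes(data))
    intact_after = fixity_ok(target, recorded)

print(intact_before, intact_after)  # prints: True False
```

A checksum only detects corruption. Recovery still depends on keeping independent copies to restore from.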

Recent years have seen critical turning points. In October 2024, the Internet Archive suffered severe cyberattacks, resulting in the theft of a user database affecting 31 million accounts (Source: Web Archive in 2026, 2026). In 2025-2026, major news publishers, including The New York Times and The Guardian, began blocking Internet Archive crawlers, fearing that AI companies would scrape their content to train language models (Source: Web Archive in 2026, 2026; Nieman Lab, 2026). By May 2026, over 340 local news sites owned by major publishers were limiting the Archive's access, threatening the historical record of journalism (Source: Nieman Lab, 2026). Legal battles further complicate preservation efforts: a court ruled against the Internet Archive's digital book lending program in March 2023, and an appeals court upheld that ruling in September 2024 (Source: Web Archive in 2026, 2026).
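Crawler blocking of this kind is commonly expressed in a site's robots.txt file. The Python sketch below uses the standard library's `urllib.robotparser` to show how one rule set can shut out an archiving crawler while leaving other agents unaffected. The robots.txt content, the example URL, and the choice of `archive.org_bot` as the crawler's user-agent token are assumptions for illustration; publishers may use other tokens or block at the network level instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one archiving crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


story = "https://example.com/news/story"  # placeholder URL
print(can_crawl(ROBOTS_TXT, "archive.org_bot", story))  # prints: False
print(can_crawl(ROBOTS_TXT, "SomeBrowser", story))      # prints: True
```

Because robots.txt is advisory, a rule like this only stops crawlers that choose to honor it. An archive that complies can therefore lose access to a site even though non-compliant scrapers are unaffected.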

As of July 2026, the digital preservation landscape is characterized by both innovation and heightened urgency. The Library of Congress continues to host its 'Designing Storage Architectures' meetings (March 2026) and updates its Recommended Formats Statement (2025-2026) to guide sustainable practices (Source: Library of Congress, 2026; Digital Preservation Coalition, 2025). New storage technologies, including synthetic DNA storage and advanced optical media, are being explored for their potential long-term archival capabilities (Source: Forbes, 2026). Environmental sustainability has also become a key focus, with the National Digital Stewardship Alliance (NDSA) releasing version 2.1 of its 'Levels of Digital Preservation' in March 2026, incorporating environmental considerations (Source: NDSA, 2026). Collaborative initiatives, such as the Coalition for Content Provenance and Authenticity (C2PA), are working on embedding provenance metadata into digital assets to combat misinformation and deepfakes (Source: UC3 California Digital Library, 2026). The ongoing fight against the 'Dead Web' requires a multi-faceted approach, balancing technological advancements with robust policies, sustained funding, and international collaboration to safeguard our collective digital heritage.

People Also Ask

What is the 'Dead Web'?
The 'Dead Web' refers to the phenomenon where a significant portion of online content becomes inaccessible or disappears over time. This includes broken links, deleted webpages, and entire websites going offline, leading to a loss of historical and cultural information.
Why is digital content disappearing?
Digital content disappears due to several factors, including technological obsolescence (outdated file formats, hardware, and software), 'bit rot' (data corruption), website owners deleting content or shutting down sites, and the dynamic nature of the internet leading to broken links.
What is digital preservation?
Digital preservation is a formal process involving policies, strategies, and actions to ensure that digital information of continuing value remains accessible and usable for the long term. It addresses challenges like media failure and technological change to maintain the authenticity and integrity of digital content.
How does AI impact digital preservation in 2026?
In 2026, AI is both a tool and a threat to digital preservation. It's being adopted to improve metadata quality and streamline archiving workflows. However, generative AI also raises concerns about content authenticity, the creation of deepfakes, and the proliferation of new file formats, complicating the 'single source of truth'.
Are major news sites blocking web archiving efforts?
Yes, as of 2025-2026, major news publishers, including The New York Times and The Guardian, have begun blocking web archiving crawlers like those used by the Internet Archive. This is primarily due to concerns that AI companies are scraping their content for training language models, which threatens the preservation of journalistic history.