Athena Chapekis et al. at Pew:
The internet is an unimaginably vast repository of modern life, with hundreds of billions of indexed webpages. But even as users across the world rely on the web to access books, images, news articles and other resources, this content sometimes disappears from view.
A new Pew Research Center analysis shows just how fleeting online content actually is:
- A quarter of all webpages that existed at one point between 2013 and 2023 are no longer accessible, as of October 2023. In most cases, this is because an individual page was deleted or removed on an otherwise functional website.
- For older content, this trend is even starker. Some 38% of webpages that existed in 2013 are not available today, compared with 8% of pages that existed in 2023.
This “digital decay” occurs in many different online spaces. We examined the links that appear on government and news websites, as well as in the “References” section of Wikipedia pages as of spring 2023. This analysis found that:
- 23% of news webpages contain at least one broken link, as do 21% of webpages from government sites. News sites with a high level of site traffic and those with less are about equally likely to contain broken links. Local-level government webpages (those belonging to city governments) are especially likely to have broken links.
- 54% of Wikipedia pages contain at least one link in their “References” section that points to a page that no longer exists.
To see how digital decay plays out on social media, we also collected a real-time sample of tweets during spring 2023 on the social media platform X (then known as Twitter) and followed them for three months. We found that:
- Nearly one-in-five tweets are no longer publicly visible on the site just months after being posted. In 60% of these cases, the account that originally posted the tweet was made private, suspended or deleted entirely. In the other 40%, the account holder deleted the individual tweet, but the account itself still existed.
- Certain types of tweets tend to go away more often than others. More than 40% of tweets written in Turkish or Arabic are no longer visible on the site within three months of being posted. And tweets from accounts with the default profile settings are especially likely to disappear from public view.