The internet is deciding what to forget

Do digital bananas in Hawaiian shirts chatting up pineapples need to be saved for posterity? Probably not.

Increasingly, online searches are likely to hit a dead end as the number of web pages indexed by search engines has fallen, says the writer.

PHOTO: ADOBE STOCK

Elaine Moore

The internet is so vast and all-consuming that it is easy to forget how fragile it can be. Do something embarrassing online and there is a good chance it will live there forever, shared without your consent. But not everything that is posted is permanent.

The last big study of webpages found that more than a third of those available in 2013 were no longer accessible – leaving a trail of “link rot” in their wake.

Maybe you think this is a good thing. If you have ever scrolled back far enough to see your very first Facebook status update, you will probably wish that link was broken.

Right now, there is a trend for AI-generated videos of Love Island starring cartoon fruit that regularly get millions of views. Do digital bananas in Hawaiian shirts chatting up pineapples need to be saved for posterity? Probably not. But disentangling what will and will not matter to our collective cultural memory is proving difficult.

Efforts to save absolutely everything have not gone very well. There is too much and a lot of it is nonsense.

In 2010, the Library of Congress took the view that Twitter was a crucial source of modern history and decided to archive every single tweet. It “may prove to be one of this generation’s most significant legacies to future generations”, the library wrote.

That “may” seems over-optimistic. To most people, the repository is both unwieldy and uninteresting. As of 2017, the library seems to agree. It now opts to save just a few select posts.

The risk in being selective, of course, is missing something important. Dutch consultant Maurice de Kunder has been following the number of webpages indexed by search engines for over a decade and found that it has fallen from 4.7 billion to 3.98 billion.

Some deletions are more deliberate than others. In 2025, Mr Elon Musk’s “Department of Government Efficiency” launched a project to eliminate up to 20 per cent of US federal websites. Particular words, such as climate change, also evaporated.

A couple of months later, large companies began rewriting their own sites to also remove references to climate change. The only reason we know this is that third parties were keeping track – the organisations themselves did not flag changes.

Because online content is regularly overwritten, what the historian Abby Smith Rumsey calls modern memory technology has a significantly shorter lifespan than pre-digital versions.

There is neither a single record of everything posted online nor an agreed-upon way to save it. This has become more noticeable with the death of digital publications.

You can see newspaper editions printed in 1665, the year the Great Plague of London began, but you can no longer visit a modern news site like Wales’ The National, which launched in 2021 and was then taken offline. Some sites, like Gawker, have been archived while others have disappeared into 404 errors (the status code that indicates a server cannot find a webpage).

A few have entered into a strange afterlife. When cult site The Hairpin was shut down in 2018, its domain was purchased by a Serb entrepreneur called Nebojsa Vujinovic, who specialises in buying old news sites and filling them with AI-generated clickbait. Now, it just redirects readers to an online gambling site.

Despite relying heavily on digital data, we have left its preservation to a mishmash of individual efforts.

The best known is the Wayback Machine, an initiative from the American non-profit Internet Archive. This takes snapshots of websites (it has preserved over a trillion so far) but it does not have everything.

Copyright owners can seek content removal and some sites have begun to blacklist the Wayback Machine, suspecting that AI companies are using it as a way to scrape content without permission. A report by the Nieman Lab found that the volume of snapshots dipped in the second half of 2025.

A second popular option is archive.today, a mysterious site operating under multiple domain names. How long it will last is anyone’s guess.

In 2025, the Federal Bureau of Investigation subpoenaed the unknown registrar behind it and Wikipedia recently asked editors to stop linking to it “due to concerns about botnets, link spamming, and how the site is run”.

There is, of course, a sort of immortality in the fact that much of what exists online has been used to train AI models. But this is not much help if you want to trace something’s original form. Even online snapshots of webpages may prove less durable than physical archives.

We treat the internet as if it is limitless and permanent, but transience is inbuilt. If you see something online worth saving, you had better do it yourself.

Financial Times