this post was submitted on 03 Feb 2024
122 points (95.5% liked)

top 15 comments
[–] BreakDecks@lemmy.ml 52 points 9 months ago

Google never made backups of the Internet, so why are we pretending they ever did? Cached webpages were a basic workaround for third-party website downtime: a guarantee that you could reliably see the information you searched for, even if the linked site was down. The cache was nothing more than a snapshot of the webpage their crawlers saw, and the older copy was permanently deleted with every new crawl of the page.

It was never an archival effort, it was a rotating cache. If you were under the impression for all these years that Google was preserving Internet history, I don't know why, because Google never claimed to be doing that. Maybe it's time to reevaluate any other altruistic things you're assuming that mega corporations are up to...
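To make the cache-vs-archive distinction concrete, here is a toy sketch in Python. It is not how Google or the Internet Archive actually store anything; the class and method names (`RotatingCache`, `Archive`, `crawl`, `history`) are made up for illustration. The only point is that a rotating cache overwrites the old copy on each crawl, while an archive appends.

```python
from dataclasses import dataclass, field


@dataclass
class RotatingCache:
    """Hypothetical model: keeps only the most recent snapshot per URL."""
    snapshots: dict[str, str] = field(default_factory=dict)

    def crawl(self, url: str, content: str) -> None:
        # A new crawl overwrites the previous copy, so the old one is lost.
        self.snapshots[url] = content

    def history(self, url: str) -> list[str]:
        return [self.snapshots[url]] if url in self.snapshots else []


@dataclass
class Archive:
    """Hypothetical model: keeps every snapshot ever taken of a URL."""
    snapshots: dict[str, list[str]] = field(default_factory=dict)

    def crawl(self, url: str, content: str) -> None:
        # A new crawl is appended; earlier copies stay available.
        self.snapshots.setdefault(url, []).append(content)

    def history(self, url: str) -> list[str]:
        return list(self.snapshots.get(url, []))
```

After crawling the same page three times, the cache can only return the last version, while the archive can return all three.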

[–] Sheeple@lemmy.world 52 points 9 months ago

Enshittification marches on

[–] BoisZoi@lemmy.ml 29 points 9 months ago (1 children)

If possible, please use the Internet Archive extension and archive pages that have never been archived, or haven't been archived in the last year.

Likewise, if you know or use another service, archive it there too!
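For anyone who would rather script the "never archived, or not in the last year" rule than click through the extension, here is a rough Python sketch. It assumes you already have the timestamp of the latest Wayback Machine snapshot in the 14-digit `YYYYMMDDhhmmss` form the Wayback Machine uses, and that `https://web.archive.org/save/<url>` is the Save Page Now address. The function names and the 365-day default are my own choices, not anything official.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_archiving(last_snapshot: Optional[str], now: datetime,
                    max_age_days: int = 365) -> bool:
    """Return True if the page has no snapshot or the latest one is too old.

    last_snapshot is a Wayback-style timestamp such as "20240203120000",
    or None when the page has never been archived.
    """
    if last_snapshot is None:
        return True
    taken = datetime.strptime(last_snapshot, "%Y%m%d%H%M%S")
    taken = taken.replace(tzinfo=timezone.utc)
    return now - taken > timedelta(days=max_age_days)


def save_page_url(page_url: str) -> str:
    """Build the Save Page Now link for a page (opening it requests a capture)."""
    return "https://web.archive.org/save/" + page_url
```

Nothing here touches the network; it only decides whether a capture is worth requesting and builds the link to open.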

[–] Cheradenine@sh.itjust.works 11 points 9 months ago

Use SearXNG, which still gives you a cached option (via the Internet Archive). If no snapshot exists, you get the option to make a new one.

https://searx.space/
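If you want the same "cached link" lookup without a search engine in the middle, the Wayback Machine has an availability endpoint at `https://archive.org/wayback/available`. The Python sketch below is not SearXNG's code, just the general idea: build the query URL, then pull the closest snapshot out of the JSON reply. The response layout (`archived_snapshots` → `closest` → `available`/`url`) is as I remember it, so check it against the Internet Archive docs before relying on it. The function names are made up.

```python
from typing import Optional
from urllib.parse import urlencode


def availability_query(page_url: str) -> str:
    """Build the Wayback availability lookup URL for a page."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


def cached_link(response: dict) -> Optional[str]:
    """Return the closest snapshot URL from a decoded JSON reply, or None.

    Assumes the reply looks like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

When `cached_link` returns `None`, there is no snapshot to show, which is the case where you would offer to make a new one instead.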

[–] BudgieMania@kbin.social 10 points 9 months ago* (last edited 9 months ago)

Well surely this means that archive.org will be allowed to exist in peace, since it would be ridiculous to make the information and culture produced in the year of our lord 20fucking24 the most ephemeral it has ever been in human history, right?

Right?

[–] TrickDacy@lemmy.world 7 points 9 months ago

Keeping records of things bad people say and do would be considered not being evil, so it makes sense.

[–] Willie@kbin.social 5 points 9 months ago

I feel like this is so they can deny that they fed all the webpages that they cached to their 'AI' training datasets later when someone accuses them of that. Now when asked about the copies of webpages that they have they can be like "What copies?" and end the conversation there.

[–] ivanafterall@kbin.social 5 points 9 months ago
[–] linearchaos@lemmy.world 4 points 9 months ago

I wonder if this is related to why their searches have been going to hell. Like they changed how the engine indexes or something.

[–] astanix@lemmy.world 3 points 9 months ago

I noticed this yesterday when I tried to load a cached version of a site. How disappointing.

[–] autotldr@lemmings.world 1 points 9 months ago

This is the best summary I could come up with:


Google Search's "cached" links have long been an alternative way to load a website that was down or had changed, but now the company is killing them off.

The feature has been appearing and disappearing for some people since December, and currently, we don't see any cache links in Google Search.

Cached links used to live under the drop-down menu next to every search result on Google's page.

As the Google web crawler scoured the Internet for new and updated webpages, it would also save a copy of whatever it was seeing.

That quickly led to Google having a backup of basically the entire Internet, using what was probably an uncountable number of petabytes of data.

In 2020, Google switched to mobile-by-default, so for instance, if you visit that cached Ars link from earlier, you get the mobile site.


The original article contains 438 words, the summary contains 139 words. Saved 68%. I'm a bot and I'm open source!

[–] wizardbeard@lemmy.dbzer0.com -1 points 9 months ago (2 children)

Three guesses as to whether they even attempted to donate this data to the Internet Archive/Wayback Machine, and the first two don't count.

[–] BreakDecks@lemmy.ml 7 points 9 months ago

Google cached content is pruned down into a space-saving format and rotated/deleted after less than a year, so it would be pretty worthless to the IA.

[–] Chozo@kbin.social 1 points 9 months ago

The Internet Archive likely wouldn't be able to handle it. They're already struggling as it is, and dumping a few petabytes of caches of the entire internet onto them probably won't help.

[–] TCB13@lemmy.world -2 points 9 months ago

You can't cache stuff; politicians and the media need ways to be able to delete content whenever they please.