When Online Content Disappears
www.pewresearch.org
A quarter of all webpages that existed at one point between 2013 and 2023 are no longer accessible.

cross-posted from: https://lemy.lol/post/25166889

Blackout

If the information was important wouldn’t it already be passed around and expanded upon? The Internet is probably 99% junk, at least the posts I’ve made. Only the good stuff like goatse survives.

Problem is, people rarely realize the importance until they're lost. Plenty of posts from the 90s and 2000s containing valuable insights are probably lost forever. Remember that not everything online is in English, either.

Snot Flickerman

Finding sources about Bush and Cheney fuckery from 2000-2008 is getting increasingly difficult. Their crimes are getting memory-holed.

EDIT: Does anyone else remember the specific incident Bush wanted to hit Quakers with terrorism charges over? I remember it being a bunch of Quakers in kayaks blockading a naval ship to keep it from leaving port for Iraq. I can't find a fucking word on it anymore, and I can barely even find sources on Bush wanting to hit Quakers with terrorism charges, other than some broken links at the ACLU. Quakers, as a reminder, are the only religious group in the USA that are default conscientious objectors because violence is 100% antithetical to their religion. These are the kind of people they wanted to use "terrorism" charges against.

JackGreenEarth

Dunno about the rest of your comment, but there are definitely other nonviolent religions apart from Quakers, such as Jains.

From a historical or intellectual archaeological perspective, no one in 2000 BC Babylon thought their pottery would be of historical significance, but 4000 years later, it is. These websites, particularly ones independently created and maintained by hobbyists, are snapshots of the ideas of the time and people that created them. These websites may not have been intensely popular, but they were in many ways a foundational part of the inchoate tapestry of the internet that would eventually become the “modern web.”

TehPers

On the flip side, nobody can be expected to keep their website up for 4000 years. Hosting costs money and time, and at some point, the thing you’re hosting will fall out of relevance enough to no longer be worth the cost.

This is why archiving is important. Hopefully most of the content that was lost was archived at some point. Getting a good chunk of that content onto long term storage would do future generations a favor (even if it’s just a bunch of tape storage locked away in a warehouse or something).

This is true. Right now the OG internet is sort of kept alive by oral history, but we have the technology to save these websites in perpetuity as historical artifacts. That might be a good coding project: a robust archiving system that lets you point it at a URL, scrape everything under that domain, and keep a static collection of its contents. The issue, though, is that this doesn't truly "capture" many web pages. A lot of the backend data that might have been served dynamically from a database isn't retrievable, so the experience of actually using the page is potentially non-archivable.
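A rough sketch of the "point it at a URL and scrape everything under its domain" idea, assuming a plain breadth-first crawl that follows only `<a href>` links on the same host. The names `crawl_domain`, `LinkExtractor`, `fetch`, and `max_pages` are all hypothetical, and `fetch` is injected by the caller so the crawl logic stays testable offline. As the comment above warns, this only saves the HTML each URL returned at crawl time; database-backed behaviour behind the page is not captured, and a real archiver would also need to save images/CSS/JS, respect robots.txt, and rate-limit itself.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def crawl_domain(start_url: str, fetch: Callable[[str], str],
                 max_pages: int = 1000) -> Dict[str, str]:
    """Breadth-first crawl of pages reachable from start_url on the same host.

    `fetch` (hypothetical, caller-supplied) returns a page's HTML or raises.
    Returns a static {url: html} snapshot of everything fetched.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    snapshot: Dict[str, str] = {}
    while queue and len(snapshot) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # dead or unreachable link: skip it and keep crawling
        snapshot[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and drop "#fragment" so one page = one URL.
            absolute, _fragment = urldefrag(urljoin(url, link))
            # Stay on the starting domain; off-site links are not followed.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return snapshot
```

For a live crawl, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`. Injecting it rather than hard-coding network calls keeps the traversal logic separate from how pages are retrieved and stored.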
