Walled Culture has already written about the two-pronged attack by the copyright industry against the Internet Archive, which was founded by Brewster Kahle, whose Kahle/Austin Foundation supports this blog. The Intercept has an interesting article that reveals another reason why some newspaper publishers are not great fans of the site: The New York Times tried …

It exists, it’s called a robots.txt file that the developers can put into place, and then bots like the webarchive crawler will ignore the content.

And therein lies the issue: if you place a robots.txt out for the content, all bots will ignore the content, including search engine indexers.

So huge publishers want it both ways: they want to be indexed, but they don’t want the content to be archived.

If the NYT is serious about not wanting its content on the webarchive but still wants humans to see it, the solution is simple: put that content behind a login! But the NYT doesn’t want to do that, since then it would lose out on the ad revenue from regular people loading its website.

I think in the case of the article here, though, the motivation is a bit more nefarious: the NYT et al. simply don’t want to be held accountable. So they have a choice: either retain the privilege of being regarded as serious journalism, or act like a bunch of hacks that can’t be relied upon.

pootriarch

“It exists, it’s called a robots.txt file that the developers can put into place, and then bots like the webarchive crawler will ignore the content.”

the internet archive doesn’t respect robots.txt:

“Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes.”

the only way to stay out of the internet archive is to follow the process they created and hope they agree to remove you. or firewall them.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
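For anyone wondering how robots.txt rules are actually evaluated, here is a minimal sketch using Python's standard `urllib.robotparser`. It shows that rules can be scoped to a single user agent instead of applying to every bot. The agent names `ExampleArchiveBot` and `ExampleSearchBot`, the URL, and the helper `allowed_agents` are all hypothetical, made up for illustration. Note that this only models a crawler that chooses to check the file; nothing in robots.txt forces a crawler to obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) archive crawler,
# allow every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each agent, whether the rules permit fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as an iterable of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["ExampleArchiveBot", "ExampleSearchBot"],  # hypothetical agent names
    "https://example.com/article",              # placeholder URL
)
print(result)
```

With these rules the archive bot is told to stay out while the search bot is allowed in, so per-agent blocking is possible on paper. Whether it works in practice depends entirely on the crawler honouring the file, which is the point made above.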
