fedilink
279
GitHub - wasi-master/13ft: My own custom 12ft.io replacement
github.com
external-link
My own custom 12ft.io replacement. Contribute to wasi-master/13ft development by creating an account on GitHub.

A site similar to 12ft.io but is self hosted and works with websites that 12ft.io doesn’t work with.

How does it work?

It pretends to be GoogleBot (Google’s web crawler) and gets the same content that google will get. Google gets the whole page so that the content of the article can be indexed properly and this takes advantage of that.

link: https://github.com/wasi-master/13ft

@redcalcium@lemmy.institute
link
fedilink
English
28
edit-2
1Y

It amazes me that all it takes is just changing user agent to Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) and it can bypass paywalls on many sites? I thought those sites would try harder (e.g. checking if the ip address is truly belong to google), but apparently not.

Aniki 🌱🌿
link
fedilink
English
11Y

removed by mod

Checking ip ownership is a moving target more likely to result in outcomes these sites don’t want (accidentally blocking google bots and preventing results from appearing on google).

Checking useragent is cheap, easier, unlikely to break (for this purpose, anyway) and the percentage of folks who know how to bypass this check is relatively slim, with a pretty small financial impact.

It’s not necessarily a moving target when entire blocks can be associated with Google.

@andrew@radiation.party
link
fedilink
English
31Y

Unless they are permanently only using specific addresses or blocks and will never change that up, I’d consider it a moving target.

@efstajas@lemmy.world
link
fedilink
English
11Y

Google literally has an official list of IP ranges for their crawlers, complete with an API that returns the current IP ranges that you can use to automate a check. Hardly a moving target, and even if it is, it doesn’t matter if you know exactly where the target is at all times.

Create a post

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don’t control.

Rules:

  1. Be civil: we’re here to support and learn from one another. Insults won’t be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it’s not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don’t duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

  • 1 user online
  • 243 users / day
  • 641 users / week
  • 1.41K users / month
  • 3.93K users / 6 months
  • 1 subscriber
  • 3.78K Posts
  • 76.6K Comments
  • Modlog