Kudurru, the new tool from the creator of Have I Been Trained?, can help artists block web scrapers and even “poison” the scraping by sending back the wrong image.
hedge (creator), 1Y

Oops, sorry, forgot: https://archive.ph/ylJHc

FaceDeer, 1Y

For those who can’t get through the paywall, this is an article about a system called Kudurru that is monitoring a bunch of websites with images listed in the LAION-5B metadata set. When it sees the same IP address downloading images from those websites simultaneously, it assumes that it must be a bot that’s scraping the data in order to train an AI with it and either blocks them or “poisons” the scrape by sending incorrect images back.
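To make the detection idea described above concrete, here is a minimal sketch of how such a check might work. It is not Kudurru's actual code: the class name `ScrapeDetector`, the `respond` helper, the 10-second window, and the three-site threshold are all assumptions made up for illustration. The sketch flags an IP address once it has requested images from several distinct monitored sites within a short time window, then either blocks the request or returns a decoy image.

```python
from collections import defaultdict, deque


class ScrapeDetector:
    """Hypothetical detector: flags an IP that hits many monitored sites at once."""

    def __init__(self, window_seconds=10.0, min_sites=3):
        # Both thresholds are illustrative guesses, not values from the article.
        self.window_seconds = window_seconds
        self.min_sites = min_sites
        self._recent = defaultdict(deque)  # ip -> deque of (timestamp, site)

    def observe(self, ip, site, timestamp):
        """Record one image request; return True if the IP now looks like a scraper.

        Assumes timestamps arrive in non-decreasing order.
        """
        events = self._recent[ip]
        events.append((timestamp, site))
        # Drop requests that have fallen out of the time window.
        while events and timestamp - events[0][0] > self.window_seconds:
            events.popleft()
        distinct_sites = {s for _, s in events}
        return len(distinct_sites) >= self.min_sites


def respond(flagged, real_image, decoy_image=None):
    """Serve the real image normally; for flagged IPs, block or 'poison'.

    Returning None stands for blocking; returning decoy_image stands for
    poisoning the scrape with an incorrect image.
    """
    if not flagged:
        return real_image
    return decoy_image
```

For example, three requests from one address to `a.example`, `b.example`, and `c.example` within two seconds would trip the detector on the third request, while the same three requests spread 20 seconds apart would not.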

Frankly, I don’t see much likely impact from this. AI training has moved beyond simply using LAION-5B; we’re discovering that a smaller, higher-quality dataset is better than just throwing mountains of data at the AI in training. So anything a trainer downloads is going to be extensively curated before being used for training, and this sort of obstruction will be fixed or filtered out.

Thanks

But the main result is achieved anyway, right? The picture that the system tried to download did not make it into the training set.

FaceDeer, 1Y

Unless the “this sort of obstruction will be fixed” part means the image is downloaded anyway. This is the weakest sort of DRM.


