cross-posted from: https://lemmy.world/post/1330512

Below are direct quotes from the filings.

OpenAI

As noted in Paragraph 32, supra, the OpenAI Books2 dataset can be estimated to contain about 294,000 titles. The only “internet-based books corpora” that have ever offered that much material are notorious “shadow library” websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik. The books aggregated by these websites have also been available in bulk via torrent systems. These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset published in December 2020 by EleutherAI called “Books3” includes a recreation of the Bibliotik collection and contains nearly 200,000 books. On information and belief, the OpenAI Books2 dataset includes books copied from these “shadow libraries,” because those are the most sources of trainable books most similar in nature and size to OpenAI’s description of Books2.

Meta

Bibliotik is one of a number of notorious “shadow library” websites that also includes Library Genesis (aka LibGen), Z-Library (aka B-ok), and Sci-Hub. The books and other materials aggregated by these websites have also been available in bulk via torrent systems. These shadow libraries have long been of interest to the AI-training community because of the large quantity of copyrighted material they host. For that reason, these shadow libraries are also flagrantly illegal.

This article from Ars Technica covers a few more details. Filings are viewable at the law firm’s site here.

@rustic_tiddles@lemm.ee

No, but this isn’t really limiting sales of the book in any way. I buy real used books, and I buy new books sometimes. I go through a few Audible credits a month. I also pirate books if I feel like it. I’ve bought books and gotten rid of them, then years later decided to pirate them and read them again. Anyway, used books are so ridiculously cheap that it’s very rare for me to buy a book new; when I do, it’s often a gift for a friend.

I also use ChatGPT almost every day, and while I have asked it for a summary of a book I didn’t feel like reading, it has never once replaced “reading a book” in my life. You can also get a summary of most books on Wikipedia if that’s all you want.

@DieterParker@feddit.de

Exactly that. Old CDs and books change owners for little to no money all the time. I have accumulated hundreds of CDs that were about to get thrown away, without spending anything. I will rip and share them on Soulseek eventually.

