Meta Admits Use of 'Pirated' Book Dataset to Train AI * TorrentFreak
torrentfreak.com
Meta admits in court that it used portions of the Books3 dataset to train its Llama models. This dataset includes many pirated books.
@rufus@discuss.tchncs.de

AI is just overhyped. Every company invests millions in AI, and every new product needs to “have AI”. And then everybody also needs to file lawsuits. Rightly so if Meta just pirated the books, but that’s not a problem with AI; it’s plain old piracy.

I was pretty sure OpenAI and Meta didn’t properly license gigabytes of books for use in their commercial products. Nice that Meta has now admitted it. I hope their “Fair Use” argument works and in the future we can all “train AI” with our “research dataset” of 40 GB of ebooks. Maybe I’ll even buy another hard disk and see if I can train an AI on 6 TB of TV series, all the Marvel movies and a broad MP3 collection.

Btw, there was no denying it anyway. Meta wrote a scientific paper about their LLaMA model in March of last year, and they clearly listed all of their sources, including Books3. Other companies aren’t that transparent, and even less so today.

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
!piracy@lemmy.dbzer0.com