Meta Admits Use of 'Pirated' Book Dataset to Train AI | TorrentFreak
torrentfreak.com
Meta admits in court that it used portions of the Books3 dataset to train its Llama models. This dataset includes many pirated books.

OK, fair, but do consider the context: the models are open-weight. You can download them and use them for free.

There is a slight catch, though, which I'm very annoyed at: it's not actually Apache-licensed. It's this weird license where you can use the model commercially until you reach 700 million monthly active users, at which point you have to request a custom license from Meta. OK, I kind of understand them not wanting companies like ByteDance or Google using their models just like that, but Mistral releases its open-weight models under Apache 2.0, so the context should definitely be reconsidered, especially for Llama 3.

It's kind of a thing right now: publishers don't want models trained on their books "because it breaks copyright," even though the model doesn't actually store the books verbatim. Many arguments hinge on publishers being mad that you can prompt the model to repeat a copyrighted passage, which it can do. IMO this is a bullshit reason.

Anyway, it will be an interesting two years as (hopefully) copyright gets turned inside out :)

I really have to thank you for an educated response.

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
!piracy@lemmy.dbzer0.com
⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.

Rules

1. Posts must be related to the discussion of digital piracy

2. Don’t request invites, trade, sell, or self-promote

3. Don’t request or link to specific pirated titles, including DMs

4. Don’t submit low-quality posts, be entitled, or harass others