Meta Admits Use of 'Pirated' Book Dataset to Train AI * TorrentFreak
torrentfreak.com
Meta admits in court that it used portions of the Books3 dataset to train its Llama models. This dataset includes many pirated books.

I do wonder how it shakes out. If the case establishes that copyrighted material needs a license before it can be used for training, then maybe the license I’m setting on my comments could land commercial AI companies in hot water too, which I’d love. Open-source AI models FTW.

CC BY-NC-SA 4.0

@jarfil@beehaw.org

That license would require the AI model to output content only under the same license. Not sure if you realize, but allowing commercial use is part of the Open Source Definition:

https://opensource.org/osd/

Your content would just get filtered out from any training dataset.

As for going up against commercial companies… unless you are a lawyer, good luck paying the fees.
