Meta Admits Use of 'Pirated' Book Dataset to Train AI * TorrentFreak
torrentfreak.com
Meta admits in court that it used portions of the Books3 dataset to train its Llama models. This dataset includes many pirated books.

oh no, my copyright!!! How will the publisher megacorps now make a record quarter??? Think of the shareholders!

That’s not the takeaway you should have here. It’s that a megacorp felt it should be allowed to create new content from someone else’s work, both without their permission and without paying.

@trebuchet@lemmy.ml · score -10 · 8 months ago

Lemmy sure loves copyright and intellectual property once you change who the pirate is.

@eskimofry@lemmy.world · score 1 · 8 months ago

Ralph Waldo Emerson:

"A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." His point was that only small-minded men refused to rethink their prior beliefs.

@trebuchet@lemmy.ml · score -3 · 8 months ago

So what you’re saying is this episode has caused you/others here on /c/piracy to rethink your prior beliefs, and now you see some value in the copyright legal regime?

@eskimofry@lemmy.world · score 0 · 8 months ago

Not really. We believe in what we believe. You’re the goblin who sticks to consistency.

Piracy is a service problem. If you want it to disappear, corporate greed has to disappear.

@cecilkorik@lemmy.ca · score 30 · 8 months ago

Almost like the context matters and the world isn’t entirely made up of black and white binary choices because we’re not robots or computers and discrete logic does not apply to human moral arguments.

@Steve@startrek.website · score 2 · 8 months ago

Preposterous

@trebuchet@lemmy.ml · score -5 · 8 months ago

Conveniently, these moral arguments that are freed from the confines of discrete logic also allow people on /c/piracy to ignore the rules when justifying their own piracy, and still condemn others they already happen to dislike when they do piracy.

sour · score 6 · 8 months ago

because company and individual are same

@trebuchet@lemmy.ml · score 1 · 8 months ago

So IP law for individuals = bad, but IP law for corporations = good is the general argument here?

Is there a principled basis for this argument?

It seems like a lot of artists, like musicians or novelists, rely almost entirely on earnings from selling their works to individuals. Wouldn’t a legal regime like the one you’re advocating basically make producing art for real people a lot less lucrative by comparison, and drive those artists into making corporate art and marketing materials?

sour · score 2 · 8 months ago

does only selling to individual prevent company from pirating

@eskimofry@lemmy.world · score 6 · 8 months ago

That’s like saying everyone should let people enjoy their kinks, and then you come in and say "aha, then pedophilia is allowed, ya?"

FaceDeer · score -3 · 8 months ago

The current top whipping boy is AI, apparently. “AI must be bad” is the highest level assumption, so apparently even in this piracy community that overrides the usual “copyright must be bad” assumption.

Or is it actually “Meta must be bad?” I’ve lost track of who the Five Minutes Hate is supposed to be directed at lately.

AdmiralShat · score 5 · 8 months ago

You have a very small pool of thinking capacity

FaceDeer · score -4 · 8 months ago

I’ve lost track because I don’t care who the whipping boy is supposed to be. I form my own opinions.

AdmiralShat · score 2 · 8 months ago

Wow, lol, that one went way over your head.

I called you stupid because of what you said. There is no universal whipping boy; you also struggle with reading comprehension, pretty severely.

I always find it so weird how the people who scream “I FORM MY OWN OPINIONS” are usually the dumbest, with the least-formed opinions. You need to use that as a buffer because you don’t have a thought-out opinion, but you’re afraid of not being a part of the conversation.

sour · score 1 · 8 months ago

facebook is bad

ok, fair; but do consider the context that the models are open weight. You can download them and use them for free.

There is a slight catch, though, which I’m very annoyed at: it’s not actually Apache. It’s this weird license where you can use the model commercially up until you have 700M monthly users, after which you have to request a custom license from Meta. OK, I kinda understand them not wanting companies like ByteDance or Google using their models just like that, but Mistral releases its models as open weights under Apache-2.0, so the context should definitely be reconsidered, especially for Llama 3.

It’s kind of a thing right now: publishers don’t want models trained on their books “because it breaks copyright”, even though the model doesn’t actually remember copyrighted passages from the book. Many arguments hinge on the publishers being mad that you can prompt the model to repeat a copyrighted passage, which it can do. IMO this is a bullshit reason.

anyway, will be an interesting two years as (hopefully) copyright will get turned inside out :)

I really have to thank you for an educated response

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
!piracy@lemmy.dbzer0.com
⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.

Rules

1. Posts must be related to the discussion of digital piracy

2. Don’t request invites, trade, sell, or self-promote

3. Don’t request or link to specific pirated titles, including DMs

4. Don’t submit low-quality posts, be entitled, or harass others
