OpenAI says it’s “impossible” to create useful AI models without copyrighted material

@explodicle@local106.com

Having read through these comments, I wonder if we’ve reached the logical conclusion of copyright itself.

@sanzky@beehaw.org

Copyright has become a tool of oppression. Individual authors’ copyright is constantly being violated, with few resources for them to fight back, while big tech abuses others’ work and big media uses theirs to the point of it being censorship.

frog 🐸

Perhaps a fair compromise would be doing away with copyright in its entirety, from the tiny artists trying to protect their artwork all the way up to Disney, no exceptions. Basically, either every creator has to be protected, or none of them should be.

@zaphod@lemmy.ca

IMO the right compromise is to return copyright to its original 14-year term. OpenAI can freely train on anything up to 2009, which is still a gigantic amount of material, while artists continue to be protected and incentivized.

frog 🐸

I’m increasingly convinced of that myself, yeah (although I’d favour 15 or 20 years personally, just because they’re neater numbers than 14). The original purpose of copyright was to promote innovation by ensuring a creator gets a good length of time in which to benefit from their creation, which a 14-20 year term achieves. Both extremes - a complete lack of copyright and the exceedingly long terms we have now - suppress innovation.

@jarfil@beehaw.org

I’d favour 15 or 20 years personally, just because they’re neater numbers than 14

Another neat number is: 4.

That’s it, if you don’t make money on your creation in 4 years, then it’s likely trash anyway.

@averyminya@beehaw.org

I’ve said it before and I’ll say it again! (My apologies if it happens to be to the same person, lol)

Early access developers in shambles!

@sanzky@beehaw.org

that would mean governments prosecuting all offences, which is not going to happen. I doubt any country would have enough resources for doing that

@explodicle@local106.com

Apparently they’re going to just make only the little guy’s copyrights effectively meaningless, so yeah.

jlow (he/him)

It’s also “impossible” to have multiple terabytes of media on my homeserver without copyright infringement, so piracy is ok, right!?

Oh no, wait, it actually is possible, it’s just more expensive and more work to do it legally (and leaves a lot of plastic trash in the form of Blu-rays and DVDs), just like with AI. But laws are just for poor people, I guess.

@AVincentInSpace@pawb.social

Even if it was impossible, would that make it okay?

@randomaside@lemmy.dbzer0.com

OpenAI now needs to go to court and argue fair use forever. That’s the burden of our system. Private ownership is valued higher than anything else so … Good luck we’re all counting on you (unfortunately).

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆

Then pay for the material like everyone else who can’t do things without someone else’s copyrighted materials.

@ky56@aussie.zone

All the AI race has done is surface the long-standing issue of how broken copyright is for the internet era. Artists should be compensated, but trying to do that using the traditional model, which was originally designed with physical, non-infinitely-copyable goods in mind, is just asinine.

One such model could be to make the copyright owner automatically assigned by first upload on any platform that supports the API. An API provided and enforced by the US copyright office. A percentage of the end use case can be paid back as royalties. I haven’t really thought out this model much further than this.
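A very rough sketch of what that could look like, just to make the idea concrete. Everything here is hypothetical: the `CopyrightRegistry` class, its method names and the 5% default rate are made up for illustration, and no such copyright office API exists as far as I know.

```python
import hashlib


class CopyrightRegistry:
    """Hypothetical registry: the first uploader of a work becomes its owner."""

    def __init__(self):
        # Maps a content hash to the account that uploaded it first.
        self.owners = {}

    @staticmethod
    def fingerprint(content: bytes) -> str:
        # Identify a work by the SHA-256 hash of its exact bytes.
        return hashlib.sha256(content).hexdigest()

    def register(self, content: bytes, uploader: str) -> str:
        # setdefault keeps the existing owner if the work was already uploaded.
        return self.owners.setdefault(self.fingerprint(content), uploader)

    def royalty(self, content: bytes, revenue_cents: int, rate_percent: int = 5):
        # Return (owner, amount owed in cents), or None for unregistered works.
        owner = self.owners.get(self.fingerprint(content))
        if owner is None:
            return None
        return owner, revenue_cents * rate_percent // 100


registry = CopyrightRegistry()
registry.register(b"my drawing", "alice")
registry.register(b"my drawing", "bob")  # a later re-upload does not take ownership
print(registry.royalty(b"my drawing", 100_000))
```

An exact hash only matches byte-identical copies, so a real system would need some way to recognise edited or re-encoded versions, which this sketch does not attempt.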

Machine learning is here to stay and is a useful tool that can be used for good and evil things alike.

@Kichae@lemmy.ca

Nah. Copyright is broken, but it’s broken because it lasts too long, and it can be held by constructs. People should still reserve the right to not have the things they’ve made incorporated into projects or products they don’t want to be associated with.

The right to refusal is important. Consent is important. The default permission should not be shifted to “yes” in anybody’s mind.

The fact that a not insignificant number of people seem to think the only issue here is money points to some pretty fucking entitled views among the would-be-billionaires.

@ky56@aussie.zone

My major issue with copyright is how published works can have major cultural significance. How they can shift ideas and shape minds. But you’re not allowed to have some fun with them on a personal level. How can it be the norm that the most important scientific knowledge and other culturally significant material is locked behind such restrictive measures? Essentially ensuring that middle class and especially poor people are locked out.

If you publish something, even if it’s paid, you don’t deserve such restrictive rights. You deserve to be compensated for your work but you don’t deserve to make it into an extortion racket.

My view on your second point is that if you have posted it publicly with no paywall, maybe you should still get some percentage of revenue, but you don’t have a say in what it can be used for. To place restrictions on what it can be used for when posting it publicly is academic, as it’s basically unenforceable.

We live in a society which revolves around the discovery and sharing of ideas. We are all entitled to a certain amount of the sharing of that information. That’s the whole point. To have some business man who was in the right place at the right time create an extortion racket out of something culturally significant they almost certainly didn’t create is wrong.

Sorry if this is all over the place. I’m writing this while tired.

@Pratai@lemmy.ca

I stand by my opinion that AI will be the worst thing humans ever created, and that means it ranks just a bit above religion.

@Allero@lemmy.today

I’d argue the issue is not the AI but capitalism.

AI is good, AI companies are evil.

@sculd@beehaw.org

This is very likely to be true.

Nacktmull

The problem is not the use of copyrighted material. The problem is doing so without permission and without paying for it.

@SilentStorms@lemmy.dbzer0.com

It’s crazy how everyone is suddenly in favour of IP law.

@t3rmit3@beehaw.org

IP law used to stop corporations from profiting off of creators’ labor without compensation? Yeah, absolutely.

IP law used to stop individuals from consuming media where purchases wouldn’t even go to the creators, but some megacorp? Fuck that.

I’m against downloading movies by indie filmmakers without compensating them. I’m not against downloading films from Universal and Sony.

I’m against stealing food from someone’s garden. I’m not against stealing food from Safeway.

If you stop looking at corporations as being the same as individuals, it’s a very simple and consistent viewpoint.

IP law shouldn’t exist, but if it does it should only exist to protect individuals from corporations. When that’s how it’s being used, like here, I accept it as a necessary evil.

@jarfil@beehaw.org

IP law used to compensate creators “until their death + 70 years”… you can spin it however you want, that’s just plain wrong.

If you stop looking at corporations as being the same as individuals

That’s a separate bonkers legislation. Two wrongs don’t make one right.

@t3rmit3@beehaw.org

I never said I like IP law. I explicitly said it shouldn’t exist. I wish they’d strip out any posthumous ownership, absolutely. But I’m fine beating OpenAI over the head with that or any other law. Whether I advocate for or against copyright law will ultimately have no impact on its existence, so I may as well cheer it on when it’s used to hurt corporations, and condemn it when it’s used to protect corporations over individuals.

That’s a separate bonkers legislation

I’m not talking about the legislation, I’m talking about the mindset, which is very prevalent in the pro-AI tech spaces. Go to HackerNews and see just how hard the AI-bros there will fellate each other over “corporate rights”.

My whole point is that there is nothing logically inconsistent with being against IP law while also understanding that, since its existence is reality, it should be leveraged as best as possible (i.e. to hurt corporations).

jlow (he/him)

Word.

@interdimensionalmeme@lemmy.ml

I still think IP needs to eat shit and die. Always has, always will.

I recently found out we could have had 3D printing 20 years earlier but patents stopped that. Cocks!

Mnglw

I’m not so much in favor of IP law as I am in favor of informed consent in every aspect of the word.

When posting photos, art and text content years ago, I was not able to imagine it might be used to train an AI. As such I was not able to make a decision based on informed consent about whether I agreed to that or not.

Even though quotes such as “once you post it, it’s on the internet forever” were around, I was not aware of the extent to which this reached, or that, had my art been vacuumed up by a generative AI model (it hasn’t, luckily), people could create art that pretends to be created by me. Thus I could not consent.

I think this goes for a lot of artists actually, especially those who exist far more publicly than I do, who are in those databases and who are a keyword to be used in prompts. There is no possible way they could have given informed consent to that at the time they posted art/at the time they started that social media profile/youtube channel etc.

To me, this is the real problem. I couldn’t care less about corporations.

JokeDeity

I’m the detractor here, I couldn’t give less of a shit about anything to do with intellectual property and think all copyright is bad.

@Daxtron2@startrek.website

It’s almost like most people are idiots who don’t understand the thing they’re against and are just parroting what they hear/read.

@noorbeast@lemmy.zip

I will repeat what I have proffered before:

If OpenAI stated that it is impossible to train leading AI models without using copyrighted material, then, unpopular as it may be, the preemptive pragmatic solution should be pretty obvious: enter into commercial arrangements for access to said copyrighted material.

Claiming a failure to do so in circumstances where the subsequent commercial product directly competes in a market seems disingenuous at best, given what I assume is the purpose of copyrighted material, that being to set the terms under which public facing material can be used. Particularly if regurgitation of copyrighted material seems to exist in products inadequately developed to prevent such a simple and foreseeable situation.

Yes, I am aware of the US concept of fair use, but the test of that should be manifestly reciprocal. For example, would Meta allow what it did to MySpace (hack and allow easy user transfer), or would Google allow scraping of YouTube?

To me it seems Big Tech wants to have its cake and eat it, where investor $$$ are used to corrupt open markets, undermine fundamental democratic State social institutions, manipulate legal processes, and undermine basic consumer rights.

@sculd@beehaw.org

Agreed.

There is nothing “fair” about the way OpenAI steals other people’s work. ChatGPT is being monetized all over the world and the large number of people who have not been compensated for their work will never see a cent of that money.

At the same time the LLM will be used to replace (at least some of) the people who created those works in the first place.

Tech bros are disgusting.

nicetriangle

At the same time the LLM will be used to replace (at least some of) the people who created those works in the first place.

This right here is the core of the moral issue when it comes down to it, as far as I’m concerned. These text and image models are already killing jobs and applying downward pressure on salaries. I’ve seen it happen multiple times now, not just anecdotally from some rando on an internet comment section.

These people losing jobs and getting pay cuts are who created the content these models are siphoning up. People are not going to like how this pans out.

@vexikron@lemmy.zip

The flip side of this is that many artists who simply copy very popular art styles are now functionally irrelevant, as it is now literally proven that this kind of basically-plagiarism AI is entirely capable of reproducing established styles to a high degree of fidelity.

While many aspects of this whole situation are very bad for very many reasons, I am actually glad that many artists will be pressured to actually be more creative than an algorithm, though I admit this comes from a basically personally petty standpoint of having known many, many, many mediocre artists who are treated like gods by themselves and their fans because they can emulate some other established style.

nicetriangle

Literally every artist copies, it’s how we all learn. The difference is that every artist out there does not have an enterprise-class-data-center-powered-super-human ability to absorb <ALL THE ART> and then be able to spit out anything instantly. It still takes time and hard work and dedication. And through the years of hard work people put into learning how their heroes do X, Y, and Z, they develop a style of their own.

It’s how artists cut their teeth and work their way into the profession. What you’re welcoming in is a situation where nobody can find any success whatsoever until they are absolutely original, and of course that is an impossible moving target when every original idea and design and image can just be instantly siphoned back up into the AI model.

Nobody could survive that way. Nobody can break into the artistic industry that way. Except for the wealthy. All the low level work people get earlier in their careers that helps keep them afloat while they learn is gone now. You have to be independently wealthy to become a high level artist capable of creating truly original work. Because there’s no other way to subsidize the time and dedication that takes when all the work for people honing their craft has been hoovered up by machines.

@vexikron@lemmy.zip

No, I am not welcoming an artist apocalypse, that would obviously be bad.

I am noting that I find it amusing, on a level I already acknowledged was petty and personal, that many, many mediocre artists who are absolutely awful to other people socially would have their little cults of fandom dampened by the fact that a machine can more or less do what they do, and their cult leader status is utterly unwarranted.

I do not have a nice and neat solution to the problem you bring up.

I do believe you are being somewhat hyperbolic, but, so was I.

Yep, being an artist in a capitalist hellscape world with modern AI algorithms is not a very reliable way to earn a good living, and such a society is not likely to produce many artists who do not have either a lot of free time or money, unless they get really lucky.

At this point we are talking about completely reorganizing society in fairly large and comprehensive ways to achieve significant change on this front.

Also this problem applies to far, far more people than just artists. One friend of mine wanted her dream job of running a little bakery! Had to set her prices too high, couldn’t afford a good location, supply chain problems, taxes, didn’t work out.

Maybe someone’s passion is teaching! Welp, that situation is all fucked too.

My point here is: Ok, does anyone have an actual plan that can actually transform the world into somewhere that allows the average person to be far more likely to be able to live the life they want?

Would that plan have more to do with the minutiae of regulating a specific kind of ever advancing and ever changing technology in some kind of way that will be irrelevant when the next disruptive tech proliferates in a few years, or maybe more like an actual total overhaul of our entire society from the ground up?

@MagicShel@programming.dev

Any company replacing humans with AI is going to regret it. AI just isn’t that good and probably won’t ever be, at least in its current form. It’s all an illusion and is destined to go the way of Bitcoin, which is to say it will shoot up meteorically and seem like the answer to all kinds of problems, and then the reality will sink in and it will slowly fade to obscurity and irrelevance. That doesn’t help anyone affected today, of course.

nicetriangle

I mostly disagree (especially on the long term), but hope you’re right

@MagicShel@programming.dev

It’s garbage for programming. A useful tool but not one that can be used by a non-expert. And I’ve already had to have a conversation with one of my coworkers when they tried to submit absolutely garbage code.

This isn’t even the first attempt at a smart system that enables non-programmers to write code. They’ve all been garbage. So, too, will the next one be but every generation has to try it for themselves. AGI might have some potential some day, but that’s a long long way off. Might as well be science fiction.

Other disciplines are affected differently, but I constantly play with image and text generation and they are all some flavor of garbage. There are some areas where AI can excel but they are mostly professional tools and not profession replacements.

nicetriangle

It was of no use whatsoever for programming or image generation or writing a few years ago. This thing has developed very quickly and will continue to. Give it 5 years and I think things will look very different.

@vexikron@lemmy.zip

OpenAI, please generate your own source code but optimized and improved in all possible ways.

Not how programming works, but tech illiterate people seem to think so.

@Omega_Haxors@lemmy.ml

Tech bros are disgusting.

That’s not even getting into the fraternity behavior at work, hyper-reactionary politics and, er, concerning age preferences.

@sculd@beehaw.org

Yup. I said it in another discussion before but I think it’s relevant here.

Tech bros are more dangerous than Russian oligarchs. Oligarchs understand the people hate them so they mostly stay low and enjoy their money.

Tech bros think they are the saviors of the world while destroying millions of people’s livelihoods, as well as destroying democracy with their right wing libertarian politics.

@vexikron@lemmy.zip

Yep, completely agree.

Case in point: Steam has recently clarified their policies on using such AI-generated material that draws on essentially billions of both copyrighted and non-copyrighted text and images.

To publish a game on Steam that uses AI gen content, you now have to verify that you as a developer are legally authorized to use all training material for the AI model for commercial purposes.

This also applies to code and code snippets generated by AI tools that function similarly, such as Copilot.

So yeah, sorry, either gotta use MIT-licensed open source code or write your own, and you gotta do your own art.

I imagine this would also prevent you from using AI-generated voice lines where you trained the model on basically anyone who did not explicitly consent to this as well, but voice gen software that doesn’t use the ‘train the model on human speakers’ approach would probably be fine, assuming you have the relevant legal rights to use such software commercially.

Not 100% sure this is Steam’s policy on voice gen stuff, they focused mainly on art, dialogue and code in their latest policy update, but the logic seems to work out to this conclusion.

@TheFreezinSteven@beehaw.org

With your logic all artists will have to pay copyright fees just to learn how to draw. All musicians will have to pay copyright fees just to learn their instrument.

I guess I should clarify by saying I’m a professional musician.

Chahk

Do musicians not buy the music that they want to listen to? Should they be allowed to torrent any MP3 they want just because they say it’s for their instrument learning?

I mean I’d be all for it, but that’s not what these very same corporations (including Microsoft when it comes to software) wanted back during Napster times. Now they want a separate set of rules just for themselves. No! They get to follow the same laws they force down our throats.

@TheFreezinSteven@beehaw.org

Everything you said was completely irrelevant to what I mentioned and just plain ignorant.

Since when do you buy all the music you have ever listened to?

@redcalcium@lemmy.institute

I suspect the US government will allow OpenAI to continue doing as it pleases to keep their competitive advantage in AI over China (which doesn’t have a problem with using copyrighted materials to train its models). They already limit selling AI-related hardware to keep their competitive advantage, so why stop there? Might as well allow OpenAI to continue using copyrighted materials to keep the competitive advantage.

DaDragon

So why is so much information (data) freely available on the internet? How do you expect a human artist to learn drawing, if not looking at tutorials and improving their skills through emulating what they see?

Haus

Try to train a human comedian to make jokes without ever allowing him to hear another comedian’s jokes, never watching a movie, never reading a book or magazine, never watching a TV show. I expect the jokes would be pretty weak.

luciole (he/him)

There’s this linguistic problem where, when one word is used for two different things, it becomes difficult to tell them apart. “Training” or “learning” is a very poor choice of word to describe the calibration of a neural network. The actor and action are both fundamentally different from the accepted meaning. To start with, human learning is active whereas machine learning is strictly passive: it’s something done by someone with the machine as a tool. Teachers know very well that’s not how it happens with humans.

When I compare training a neural network with how I trained to play clarinet, I fail to see any parallel. The two are about as close as a horse and a seahorse.

@intensely_human@lemm.ee

Not sure what you mean by passive. It takes a hell of a lot of electricity to train one of these LLMs so something is happening actively.

I often interact with ChatGPT 4 as if it were a child. I guide it through different kinds of mental problems, having it take notes and evaluate its own output, because I know our conversations become part of its training data.

It feels very much like teaching a kid to me.

luciole (he/him)

I mean passive in terms of will. Computers want and do nothing. They’re machines that function according to commands.

The way you feel like you’re teaching a child when you feed input in natural language to an LLM until you’re satisfied with the output is known as the ELIZA effect. To quote Wikipedia:

In computer science, the ELIZA effect is the tendency to project human traits — such as experience, semantic comprehension or empathy — into computer programs that have a textual interface. The effect is a category mistake that arises when the program’s symbolic computations are described through terms such as “think”, “know” or “understand.”

Phanatik

A comedian isn’t forming a sentence based on which word is most likely to appear after the previous one. This is such a bullshit argument that reduces human competency to “monkey see thing to draw thing” and completely overlooks the craft and intent behind creative works. Do you know why ChatGPT uses certain words over others? Probability. It decided as a result of its training that one word would appear after the previous in certain contexts. It absolutely doesn’t take into account things like “maybe this word would be better here because the sound and syllables maintain the flow of the sentence”.

Baffling takes from people who don’t know what they’re talking about.
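For anyone who wants to see what "picking the most probable next word" means mechanically, here is a toy sketch. It is a plain bigram counter with a made-up corpus and made-up function names, and it is far simpler than ChatGPT, which is a neural network that looks at the whole preceding context rather than just the last word. It only illustrates the general idea.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def most_probable_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny illustrative corpus, not real training data.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)
print(most_probable_next(model, "the"))  # "cat" follows "the" most often here
```

Nothing in this counter knows anything about rhythm, flow or intent. It only knows which word followed which in the text it was given.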

DaDragon

That’s what humans do, though. Maybe not probability directly, but we all know that some words should be put in a certain order. We still operate within standard norms that apply to a particular group of people. LLMs just go about it in a different way, but they achieve the same general result. If I’m drawing a human, that means there’s a ‘hand’ here, and a ‘head’ there. ‘Head’ is a weird combination of pixels that mostly look like this, ‘hand’ looks kinda like that. All depends on how the model is structured, but tell me that’s not very similar to a simplified version of how humans operate.

Phanatik

Yeah but the difference is we still choose our words. We can still alter sentences on the fly. I can think of a sentence and understand verbs go after the subject but I still have the cognition to alter the sentence to have the effect I want. The thing lacking in LLMs is intent and I’m yet to see anyone tell me why a generative model decides to draw more than 6 fingers. As humans we know hands generally have five fingers and that there’s a group of people whose hands don’t, so if we wanted to draw a person with a different number of fingers, we could. A generative art model can’t help itself from drawing too many fingers because all it understands is that “finger + finger = hand” but it has no concept of when to stop.

DaDragon

And that’s the reason why LLM generated content isn’t considered creative.

I do believe that the person using the device has a right to copyright the unique method they used to generate the content, but the content itself isn’t anything worth protecting.

Phanatik

You say that yet I initially responded to someone who was comparing an LLM to what a comedian does.

There is no unique method because there’s hardly anything unique you can do. Two people using Stable Diffusion to produce an image are putting in the same amount of work. One might put more time into crafting the right prompt but that’s not work you’re doing.

If 90% of the work is handled by the model, and you just layer on whatever extra thing you wanted, that doesn’t mean you created the thing. That would also imply you have much control over the output, when you’re effectively negotiating with this machine to produce what you want.

DaDragon

Wouldn’t that lead to the same argument as originally brought against photography, though?

A photographer is effectively negotiating with the sun, the sky and everything else to hopefully get the result they are looking for on their device.

Phanatik

One difference is that the photographer has to go to the places they’re taking pictures of.

Another is that photography isn’t comparable to paintings and it never has been. I’m willing to bet photography and paintings have never coexisted in a contest. Except, when people say their generative art is comparable to what artists have been producing by hand, they are admitting that generative art has more in common with photography than it does with hand-crafted art but they want the prestige and recognition those artists get for their work.

Nyfure

more time into crafting the right prompt

That’s not work to you? My company pays me to spend time doing the right thing, even though the computer does most of the work.

I see where you are going with this, but your argument also invalidates other forms of human interaction and creation.

In my country copyright can only be granted if a certain amount of (human) work went into something. Any work.
The difficult part is finding out what’s enough and what kind of work qualifies for some kind of protection, even if partial.
The difficult part was not to create something, but to prove someone did or didn’t put enough work into it.
I think we can hold generated or assisted goods to the same standard.

Putting a simple prompt together should probably not be granted protection as no significant work went into it. But refining it, editing the result… maybe that’s enough, that’s really up to society to decide.

At the same time we have to balance the power of machines against human work, so the human work doesn’t get totally invalidated, but rather shifted and treated as a sub-type.
Machines have already replaced a lot of work, including creative work. Book-printing, forging, producing food… the scary part about generative AI is mainly the speed at which it is spreading.

Phanatik

So as a data analyst a lot of my work is done through a computer, but I can apply the same skills if someone hands me a piece of paper with data printed on it and tells me to come up with solutions to the problems with it. I don’t need the computer to do what I need to do, it makes it easier to manipulate data but the degree of problem solving required needs to be done by a human and that’s why it’s my job. If a machine could do it, then they would be doing it but they aren’t because contrary to what people believe about data analysis, you have to be somewhat creative to do it well.

Crafting a prompt is an exercise in trial and error. It’s work but it’s not skilled work. It doesn’t take talent or practice to do. Despite the prompt, you are still at the mercy of the machine.

Even by the case you’ve presented, I have to ask, at what point does a human editing the output of a generative model make it your own work and not the machine’s? How much do you have to change? Can you give me a %?

Machines were intended to automate the tedious tasks that we all have to suffer to free up our brains for more engaging things which might include creative pursuits. Automation exists to make your life easier, not to rob you of life’s pursuits or your livelihood. It never should’ve been used to produce creative work and I find the attempts to equate this abomination’s outputs to what artists have been doing for years, utterly deplorable.

@intensely_human@lemm.ee

I don’t choose my words man. I get a vague sense of the meaning I want to convey and the words just form themselves.

@ParsnipWitch@feddit.de

As an artist you draw with an understanding of the human body, though. An understanding current models don’t have because they aren’t actually intelligent.

Maybe when a human is an absolute beginner in drawing they will think about the different lines and replicate even how other people draw stuff that then looks like a hand.

But eventually they will realise (hopefully, otherwise they may get frustrated and stop drawing) that you need to understand the hand to draw one. Its mass, its concept or the idea of what a hand is.

This may sound very abstract and strange but creative expression is more complex than replicating what we have seen a million times. It’s a complex function unique to the human brain, an organ we don’t even scientifically understand yet.

@SuperSaiyanSwag@lemmy.zip

Am I a moron? How do you have more upvotes than the parent comment, is it because you’re being more aggressive with your statement? I feel like you didn’t quite refute what the parent comment said. You’re just explaining how ChatGPT works, but you’re not really saying how it shouldn’t use our established media (copyrighted material) as a reference.

Phanatik

I don’t control the upvotes so I don’t know why that’s directed at me.

The refutation was based around a misunderstanding of how LLMs generate their outputs and how the training data assists the LLM in doing what it does. The article itself tells you ChatGPT was trained off of copyrighted material they were not licensed for. The person I responded to suggested that comedians do this with their work but that’s equating the process an LLM uses when producing an output to a comedian writing jokes.

Edit: Apologies if I do come across aggressive. Since the plagiarism machine has been in full swing, the whole discourse around it has gotten on my nerves. I’m a creative person, I’ve written poems and short stories, I’m writing a novel and I also do programming and a whole host of hobbies so when LLMs are used to put people like me out of a job using my own work, why wouldn’t that make me angry? What makes it worse is that I’m having to explain concepts to people regarding LLMs that they continue to defend. I can’t stand it so yes, I will come off aggressive.

@SuperSaiyanSwag@lemmy.zip

Sorry, I was essentially emphasizing my initial point “am I a moron?”, lol, because I legitimately didn’t get your point at first like others do in this thread.

I get what you mean now after reading it a couple more times.

@hascat@programming.dev

That’s not the point though. The point is that the human comedian and the AI both benefit from consuming creative works covered by copyright.

Phanatik

Yeah except a machine is owned by a company and doesn’t consume the same way. It breaks down copyrighted works into data points so it can find the best way of putting those data points together again. If you understand anything at all about how these models work, they do not consume media the same way we do. It is not an entity with a thought process or consciousness (despite what the misleading marketing of “AI” would have you believe), it’s an optimisation algorithm.

Chahk

It’s a glorified autocomplete.

Phanatik

It’s so funny that this is something new. This was Grammarly’s whole schtick since before ChatGPT so how different is Grammarly AI?

@vexikron@lemmy.zip

Here is the bigger picture: The vast majority of tech illiterate people think something is AI because duh it’s called AI.

It’s literally just the power of branding and marketing on the minds of poorly informed humans.

Unfortunately this is essentially a reverse Turing Test.

The vast majority of humans do not know anything about AI, and a huge majority of them can also barely tell the difference between actual human-created content and, currently in some but not all forms, the output of what is basically brute-force total-internet plagiarism and synthesis software.

To me this basically just means that about 99% of the time, most humans are actually literally NPCs, and they only do actual creative and unpredictable things very very rarely.

@intensely_human@lemm.ee

I call it AI because it’s artificial and it’s intelligent. It’s not that complicated.

The thing we have to remember is how scary and disruptive AI is. Given that fear, it is scary to acknowledge that we have AI emerging into our world. Because it is scary, that pushes us to want to ignore it.

It’s called denial, and it’s the best explanation for why people aren’t willing to acknowledge that LLMs are AI.

@vexikron@lemmy.zip

And human comedians regularly get called out when they outright steal others’ material and present it as their own.

The word for this is plagiarism.

And in OpenAI’s framework, when used in a relevant commercial context, they are functionally operating and profiting off of the world’s most comprehensive plagiarism software.

@intensely_human@lemm.ee

They get called out when they use others’ work as a template, not as training data.

tryptaminev 🇵🇸 🇺🇦 🇪🇺

You do know that comedians are copying each other’s material all the time though? Either making the same joke, or slightly adapting it.

So in the context of copyright vs. model training i fail to see how the exact process of the model is relevant? At the end copyrighted material goes in and material based on that copyrighted material goes out.

frog 🐸

I wish I could upvote this more than once.

What people always seem to miss is that a human doesn’t need billions of examples to be able to produce something that’s kind of “eh, close enough”. Artists don’t look at billions of paintings. They look at a few, but do so deeply, absorbing not just the most likely distribution of brushstrokes, but why the painting looks the way it does. For a basis of comparison, I did an art and design course last year and looked at about 300 artworks in total (course requirement was 50-100). The research component on my design-related degree course is one page a week per module (so basically one example from the field the module is about, plus some analysis). The real bulk of the work humans do isn’t looking at billions of examples: it’s looking at a few, and then practicing the skill and developing a process that allows them to convey the thing they’re trying to express.

If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.

Phanatik

Exactly! You can glean so much from a single work, not just about the work itself but who created it and what ideas were they trying to express and what does that tell us about the world they live in and how they see that world.

This doesn’t even touch the fact that I’m learning to draw not by looking at other drawings but by looking at what exactly I’m trying to draw. I know at a base level, a drawing is a series of shapes made by hand whether it’s through a digital medium or traditional pen/pencil and paper. But the skill isn’t being able to replicate other drawings, it’s being able to convert something I can see into a drawing. If I’m drawing someone sitting in a wheelchair, then I’ll get the pose of them sitting in the wheelchair but I can add details I want to emphasise or remove details I don’t want. There’s so much that goes into creative work and I’m tired of arguing with people who have no idea what it takes to produce creative works.

frog 🐸

It seems that most of the people who think what humans and AIs do is the same thing are not actually creatives themselves. Their level of understanding of what it takes to draw goes no further than “well anyone can draw, children do it all the time”. They have the same respect for writing, of course, equating the ability to string words together to write an email, with the process it takes to write a brilliant novel or script. They don’t get it, and to an extent, that’s fine - not everybody needs to understand everything. But they should at least have the decency to listen to the people that do get it.

@intensely_human@lemm.ee

Well, that’s not me. I’m a creative, and I see deep parallels between how LLMs work and how my own mind works.

frog 🐸

Either you’re vastly overestimating the degree of understanding and insight AIs possess, or you’re vastly underestimating your own capabilities. :)

Veloxization

This whole AI craze has just shown me that people are losing faith in their own abilities and their ability to learn things. I’ve heard so many who use AI to generate “artwork” argue that they tried to do art “for years” without improving, and hence have come to the conclusion that creativity is a talent that only some have, instead of a skill you can learn and hone. Just because they didn’t see results as fast as they’d have liked.

@jarfil@beehaw.org

Alternatively, you might be vastly overestimating human “understanding and insight”, or how much of it is really needed to create stuff.

@teawrecks@sopuli.xyz

What you count as “one” example is arbitrary. In terms of pixels, you’re looking at millions right now.

The ability to train faster using fewer examples in real time, similar to what an intelligent human brain can do, is definitely a goal of AI research. But right now, we may be seeing from AI what a below average human brain could accomplish with hundreds of lifetimes to study.

If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.

I mean, no, if you only ever look at public domain stuff you literally wouldn’t know the state of the art, which is historically happening for profit. Even the most untrained artist “doing their own thing” watches Disney/Pixar movies and listens to copyrighted music.

@Bene7rddso@feddit.de

Humans learn mostly from real life. Go touch some grass

frog 🐸

If we’re going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.

And humans don’t require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just… go outside and draw things they see themselves, because the sky above them and the tree across the street aren’t copyrighted. And in fact, I’d argue that a good artist should go out and find real things to draw.

OpenAI’s argument is literally that their AI cannot learn without using copyrighted materials in vast quantities - too vast for them to simply compensate all the creators. So it genuinely is not comparable to a human, because humans can, in fact, learn without using copyrighted material. If OpenAI’s argument is actually that their AI can’t compete commercially with modern art without using copyrighted works, then they should be honest about that - but then they’d be showing their hand, wouldn’t they?

@teawrecks@sopuli.xyz

Sure, if they want to compete with modern artists, they would need to look at modern artists

Which is the literal goal of Dall-E, SD, etc.

But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works

They could definitely learn some amount of skill, I agree. I’d be very interested to see the best that an AI could achieve using only PD and CC content. It would be interesting. But you’d agree that it would look very different from modern art, just as an alien who has only been consuming earth media from 100+ years ago would be unable to relate to us.

the sky above them and the tree across the street aren’t copyrighted.

Yeah, I’d consider that PD/CC content that such an AI would easily have access to. But obviously the real sky is something entirely different from what is depicted in Starry Night, Star Wars, or H.P. Lovecraft’s description of the cosmos.

OpenAI’s argument is literally that their AI cannot learn without using copyrighted materials in vast quantities

Yeah, I’d consider that a strong claim on their part; what they really mean is, it’s the easiest way to make progress in AI, and we wouldn’t be anywhere close to where we are without it.

And you could argue “convenient that it both saves them money, and generates money for them to do it this way”, but I’d also point out that the alternative is they keep the trained models closed source, never using them publicly until they advance the tech far enough that they’ve literally figured out how to build/simulate a human brain that is able to learn as quickly and human-like as you’re describing. And then we find ourselves in a world where one or two corporations have this incredible proprietary ability that no one else has.

Personally, I’d rather live in the world where the information about how to do all of this isn’t kept for one or two corporations to profit from, I would rather live in the version where they publish their work publicly, early, and often, show that it works, and people are able to reproduce it, open source it, train their own models, and advance the technology in a space where anyone can use it.

You could hypothesize a middle ground where they do the research, but aren’t allowed to profit from it without licensing every bit of data they train on. But the reality of AI research is that it only happens to the extent that it generates revenue. It’s been that way for the entire history of AI. Douglas Hofstadter has been asking deep important questions about AI as it relates to consciousness for like 60 years (ex. GEB, I Am a Strange Loop), but there’s a reason he didn’t discover LLMs and tech companies did. That’s not to say his writings are meaningless, in fact I think they’re more important than ever before, but he just wasn’t ever going to get to this point with a small team of grad students, a research grant, and some public domain datasets.

So, it’s hard to disagree with OpenAI there, AI definitely wouldn’t be where it is without them doing what they’ve done. And I’m a firm believer that unless we figure our shit out with energy generation soon, the earth will be an uninhabitable wasteland. We’re playing a game of climb the Kardashev scale, we opted for the “burn all the fossil fuels as fast as possible” strategy, and now we’re at the point where either we spend enough energy fast enough to figure out the tech needed to survive this, or we suffocate on the fumes. The clock is ticking, and AI may be our best bet at saving the human race that doesn’t involve an inordinate number of people dying.

frog 🐸

OpenAI are not going to make the source code for their model accessible to all to learn from. This is 100% about profiting from it themselves. And using copyrighted data to create open source models would seem to violate the very principles the open source community stands for - namely that everybody contributes what they agree to, and everything is published under a licence. If the basis of an open source model is a vast quantity of training data from a vast quantity of extremely pissed off artists, at least some of the people working on that model are going to have a “are we the baddies?” moment.

The AI models are also never going to produce a solution to climate change that humans will accept. We already know what the solution is, but nobody wants to hear it, and expecting anyone to listen to ChatGPT and suddenly change their minds about using fossil fuels is ludicrous. And an AI that is trained specifically on knowledge about the climate and technologies that can improve it, with the purpose of innovating some hypothetical technology that will fix everything without humans changing any of their behaviour, categorically does not need the entire contents of ArtStation in its training data. AIs that are trained to do specific tasks, like the ones trained to identify new antibiotics, are trained on a very limited set of data, most of which is not protected by copyright and any that is can be easily licenced because the quantity is so small - and you don’t see anybody complaining about those models!

@teawrecks@sopuli.xyz

OpenAI are not going to make the source code for their model accessible to all to learn from

OpenAI isn’t the only company doing this, nor is their specific model the knowledge that I’m referring to.

The AI models are also never going to produce a solution to climate change that humans will accept.

It is already being used to further fusion research beyond anything we’ve been able to do with standard algorithms

We already know what the solution is, but nobody wants to hear it

Then it’s not a solution. That’s like telling your therapist, “I know how to fix my relationship, my partner just won’t do it!”

expecting anyone to listen to ChatGPT and suddenly change their minds about using fossil fuels is ludicrous

Lol. Yeah, I agree, that’s never going to work.

categorically does not need the entire contents of ArtStation in its training data.

That’s a strong claim to make. Regardless of the ethics involved, or the problems the AI can solve today, the fact is we are seeing rapid advances in AI research as a direct result of these ethically dubious models.

In general, I’m all for the capitalist method of artists being paid their fair share for the work they do, but on the flip side, I see a very possible mass extinction event on the horizon, which could cause suffering the likes of which humanity has never seen. If we assume that is the case, and we assume AI has a chance of preventing it, then I would prioritize that over people’s profits today. And I think it’s perfectly reasonable to say I’m wrong.

And then there’s the problem of actually enforcing any sort of regulation, which would be so much more difficult than people here are willing to admit. There’s basically nothing you can do even if you wanted to. Your Carlin example is exactly the defense a company would use: “I guess our AI just happened to create a movie that sounds just like Paul Blart, but we swear it’s never seen the film. Great minds think alike, I guess, and we sell only the greatest of minds”.

@Even_Adder@lemmy.dbzer0.com

It isn’t wrong to use copyrighted works for training. Let me quote an article by the EFF here:

First, copyright law doesn’t prevent you from making factual observations about a work or copying the facts embodied in a work (this is called the “idea/expression distinction”). Rather, copyright forbids you from copying the work’s creative expression in a way that could substitute for the original, and from making “derivative works” when those works copy too much creative expression from the original.

Second, even if a person makes a copy or a derivative work, the use is not infringing if it is a “fair use.” Whether a use is fair depends on a number of factors, including the purpose of the use, the nature of the original work, how much is used, and potential harm to the market for the original work.

and

Even if a court concludes that a model is a derivative work under copyright law, creating the model is likely a lawful fair use. Fair use protects reverse engineering, indexing for search engines, and other forms of analysis that create new knowledge about works or bodies of works. Here, the fact that the model is used to create new works weighs in favor of fair use as does the fact that the model consists of original analysis of the training images in comparison with one another.

What you want would swing the doors open for corporate interference like hindering competition, stifling unwanted speech, and monopolization like nothing we’ve seen before. There are very good reasons people have these rights, and we shouldn’t be trying to change this. Ultimately, it’s apparent to me, you are in favor of these things. That you believe artists deserve a monopoly on ideas and non-specific expression, to the detriment of anyone else. If I’m wrong, please explain to me how.

If we’re going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.

Humans benefit from years of evolutionary development and corporeal bodies to explore and interact with their world before they’re ever expected to produce complex art. AI need huge datasets to understand patterns to make up for this disadvantage. Nobody pops out of the womb with fully formed fine motor skills, pattern recognition, understanding of cause and effect, shapes, comparison, counting, vocabulary related to art, and spatial reasoning. Datasets are huge and filled with image-caption pairs to teach models all of this from scratch. AI isn’t human, and we shouldn’t judge it against them, just like we don’t judge boats on their rowing ability.

And humans don’t require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just… go outside and draw things they see themselves, because the sky above them and the tree across the street aren’t copyrighted. And in fact, I’d argue that a good artist should go out and find real things to draw.

AI don’t require most modern art in order to learn to make images either, but the range of expression would be limited, just like a human’s in this situation. You can see this in cave paintings and early sculptures. They wouldn’t be limited to this same degree, but you would still be limited.

It took us 100,000 years to get from cave drawings to Leonardo da Vinci. This is just another step for artists, like the camera obscura was in the past. It’s important to remember that early man was as smart as we are, they just lacked the interconnectivity to exchange ideas that we have.

@ParsnipWitch@feddit.de

I think the difference in artistic expression between modern humans and humans in the past comes down to the material available (like the actual material to draw with).

Humans can draw without seeing any image ever. Blind people can create art and draw things because we have a different understanding of the world around us than AI has. No human artist needs to look at a thousand or even at 1 picture of a banana to draw one.

The way AI sees and “understands” the world and how it generates an image is fundamentally different from how the human brain conveys the object banana into an image of a banana.

@Even_Adder@lemmy.dbzer0.com

I think the difference in artistic expression between modern humans and humans in the past comes down to the material available (like the actual material to draw with).

That is definitely a difference, but even that is a kind of information shared between people, and information itself is what gives everyone something to build on. That gives them a basis on which to advance understanding, instead of wasting time coming up with the same things themselves every time.

Humans can draw without seeing any image ever. Blind people can create art and draw things because we have a different understanding of the world around us than AI has. No human artist needs to look at a thousand or even at 1 picture of a banana to draw one.

Humans don’t need representations of things in images because they have the opportunity to interact with the genuine article, and in situations when that is impractical, they can still fall back on images to learn. Someone without sight from birth can’t create art the same way a sighted person can.

The way AI sees and “understands” the world and how it generates an image is fundamentally different from how the human brain conveys the object banana into an image of a banana.

That’s the beauty of it all, despite that, these models can still output bananas.

Quokka

Children learn by watching others. We are trained from millions of examples starting from before birth.

@intensely_human@lemm.ee

When you look at one painting, is that the equivalent of one instance of the painting in the training data? There is an infinite amount of information in the painting, and each time you look you process more of that information.

I’d say any given painting you look at in a museum, you process at least a hundred mental images of aspects of it. A painting on your wall could be seen ten thousand times easily.

@Even_Adder@lemmy.dbzer0.com

When people say that the “model is learning from its training data”, it means just that, not that it is human, and not that it learns exactly like humans. It doesn’t make sense to judge boats on how well they simulate human swimming patterns, just how well they perform their task.

As a baby, every human has the benefit of training on things around them and being trained by those around them, building a foundation for all later skills. Generative models rely on many text and image pairs to describe things to them because they lack the ability to poke, prod, rotate, and disassemble for themselves.

For example, when a model takes in a thousand images of circles, it doesn’t “learn” a thousand circles. It learns what a circle GENERALLY is like, the concept of it. That representation, along with random noise, is how you create images with them. The same happens for every concept the model trains on. Everything from “cat” to more complex things like color relationships and reflections or lighting. Machines are not human, but they can learn despite that.
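The circle example above can be sketched as a toy in Python. This is only an analogy under loud assumptions: the "model" here is two numbers (a mean and a spread of radius), and the names `make_circle`, `learn_radius_stats` and `sample_circle` are hypothetical helpers, not part of any real image generator. Real generative models are vastly more complex, and the sketch takes no position on whether any of this counts as understanding.

```python
import math
import random

def make_circle(radius, n_points, rng):
    """One training example: slightly noisy points on a circle of the given radius."""
    points = []
    for _ in range(n_points):
        angle = rng.uniform(0, 2 * math.pi)
        r = radius + rng.gauss(0, 0.05)  # small per-point jitter
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

def learn_radius_stats(examples):
    """Compress every example into two numbers: mean and std of the radius."""
    radii = [math.hypot(x, y) for pts in examples for (x, y) in pts]
    mean = sum(radii) / len(radii)
    variance = sum((r - mean) ** 2 for r in radii) / len(radii)
    return mean, math.sqrt(variance)

def sample_circle(mean, std, n_points, rng):
    """Generate a new circle from the learned summary plus random noise."""
    radius = rng.gauss(mean, std)
    return make_circle(radius, n_points, rng)

rng = random.Random(0)

# A thousand training circles with radii scattered around 1.0 (assumed data).
training_set = [make_circle(rng.uniform(0.8, 1.2), 20, rng) for _ in range(1000)]

# The learned summary is two floats; the thousand circles are not stored in it.
mean_r, std_r = learn_radius_stats(training_set)

# A new circle drawn from the summary, not copied from any training example.
new_circle = sample_circle(mean_r, std_r, 20, rng)
print(f"learned radius: {mean_r:.2f} +/- {std_r:.2f}")
```

After training, `training_set` could be deleted and `sample_circle` would still work, which is the narrow point being illustrated: what is kept is a summary of the examples rather than the examples themselves.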

@ParsnipWitch@feddit.de

In general I agree with you, but AI doesn’t learn the concept of what a circle is. AI reproduces the most fitting representation of what we call a circle. But there is no understanding of the concept of a circle. This may sound like nitpicking, but I think it’s important to make the distinction.

That is why current models aren’t regarded as actual intelligence, although people already call them that…

@Even_Adder@lemmy.dbzer0.com

I understand. I didn’t mean to imply any sort of understanding with the language I used.

@Eccitaze@yiffit.net

It makes sense to judge how closely LLMs mimic human learning when people are using it as a defense to AI companies scraping copyrighted content, and making the claim that banning AI scraping is as nonsensical as banning human learning.

But when it’s pointed out that LLMs don’t learn very similarly to humans, and require scraping far more material than a human does, suddenly AIs shouldn’t be judged by human standards? I don’t know if it’s intentional on your part, but that’s a pretty classic example of a motte-and-bailey fallacy. You can’t have it both ways.

@Even_Adder@lemmy.dbzer0.com

I don’t understand what you mean, can you elaborate?

@intensely_human@lemm.ee

Text prediction seems to be sufficient to explain all verbal communication to me. Until someone comes up with a use case that humans can do that LLMs cannot, and I mean a specific use case not general high level concepts, I’m going to assume human verbal cognition works the same way as an LLM.

We are absolutely basing our responses on what words are likely to follow which other ones. It’s literally how a baby learns language from those around them.

@chaos@beehaw.org

If you ask an LLM to help you with a legal brief, it’ll come up with a bunch of stuff for you, and some of it might even be right. But it’ll very likely do things like make up a case that doesn’t exist, or misrepresent a real case, and as has happened multiple times now, if you submit that work to a judge without a real lawyer checking it first, you’re going to have a bad time.

There’s a reason LLMs make stuff up like that, and it’s because they have been very, very narrowly trained when compared to a human. The training process is almost entirely getting good at predicting what words follow what other words, but humans get that and so much more. Babies aren’t just associating the sounds they hear, they’re also associating the things they see, the things they feel, and the signals their body is sending them. Babies are highly motivated to learn and predict the behavior of the humans around them, and as they get older and more advanced, they get rewarded for creating accurate models of the mental state of others, mastering abstract concepts, and doing things like make art or sing songs. Their brains are many times bigger than even the biggest LLM, their initial state has been primed for success by millions of years of evolution, and the training set is every moment of human life.

LLMs aren’t nearly at that level. That’s not to say what they do isn’t impressive, because it really is. They can also synthesize unrelated concepts together in a stunningly human way, even things that they’ve never been trained on specifically. They’ve picked up a lot of surprising nuance just from the text they’ve been fed, and it’s convincing enough to think that something magical is going on. But ultimately, they’ve been optimized to predict words, and that’s what they’re good at, and although they’ve clearly developed some impressive skills to accomplish that task, it’s not even close to human level. They spit out a bunch of nonsense when what they should be saying is “I have no idea how to write a legal document, you need a lawyer for that”, but that would require them to have a sense of their own capabilities, a sense of what they know and why they know it and where it all came from, knowledge of the consequences of their actions and a desire to avoid causing harm, and they don’t have that. And how could they? Their training didn’t include any of that, it was mostly about words.

One of the reasons LLMs seem so impressive is that human words are a reflection of the rich inner life of the person you’re talking to. You say something to a person, and your ideas are broken down and manipulated in an abstract manner in their head, then turned back into words forming a response which they say back to you. LLMs are piggybacking off of that a bit, by getting good at mimicking language they are able to hide that their heads are relatively empty. Spitting out a statistically likely answer to the question “as an AI, do you want to take over the world?” is very different from considering the ideas, forming an opinion about them, and responding with that opinion. LLMs aren’t just doing statistics, but you don’t have to go too far down that spectrum before the answers start seeming thoughtful.

Pup Biru

you know how the neurons in our brain work, right?

because if not, well, it’s pretty similar… unless you say there’s a soul (in which case we can’t really have a conversation based on fact alone), we’re just big ol’ probability machines with tuned weights based on past experiences too

@ParsnipWitch@feddit.de

“Soul” is the word we use for something we don’t scientifically understand yet. Unless you did discover how human brains work, in that case I congratulate you on your Nobel prize.

You can abstract a complex concept so much it becomes wrong. And abstracting how the brain works to “it’s a probability machine” definitely is a wrong description. Especially when you want to use it as an argument of similarity to other probability machines.

Pup Biru

“Soul” is the word we use for something we don’t scientifically understand yet

that’s far from definitive. another definition is

A part of humans regarded as immaterial, immortal, separable from the body at death

but since we aren’t arguing semantics, it doesn’t really matter exactly, other than the fact that it’s important to remember that just because you have an experience, belief, or view doesn’t make it the only truth

of course i didn’t discover categorically how the human brain works in its entirety, however most scientists i’m sure would agree that the method by which the brain performs its functions is by neurons firing. if you disagree with that statement, the burden of proof is on you. the part we don’t understand is how it all connects up - the emergent behaviour. we understand the basics; that’s not in question, and you seem to be questioning it

You can abstract a complex concept so much it becomes wrong

it’s not abstracted; it’s simplified… if what you’re saying were true, then simplifying complex organisms down to a petri dish for research would be “abstracted” so much it “becomes wrong”, which is categorically untrue… it’s an incomplete picture, but that doesn’t make it either wrong or abstract

*edit: sorry, it was another comment where i specifically said belief; the comment you replied to didn’t state that, however most of this still applies regardless

i laid out an a leads to b leads to c and stated that it’s simply a belief, however it’s a belief that’s based in logic and simplified concepts. if you want to disagree that’s fine but don’t act like you have some “evidence” or “proof” to back up your claims… all we’re talking about here is belief, because we simply don’t know - neither you nor i

and given that all of this is based on belief rather than proof, the only thing that matters is what we as individuals believe about the input and output data (because the bit in the middle has no definitive proof either way)

if a human consumes media and writes something and it looks different, that’s not a violation

if a machine consumes media and writes something and it looks different, you’re arguing that is a violation

the only difference here is your belief that a human brain somehow has something “more” than a probabilistic model going on… but again, that’s far from certain

Phanatik

You are spitting out basic points and attempting to draw similarities because our brains are capable of something similar. The difference between what you’ve said and what LLMs do is that we have experiences that we are able to glean a variety of information from. An LLM sees text and all it’s designed to do is say “x is more likely to appear before y than z”. If you fed it nonsense, it would regurgitate nonsense. If you feed it text from racist sites, it will regurgitate that same language because that’s all it has seen.

You’ll read this and think “that’s what humans do too, right?” Wrong. A human can be fed these things and still reject them. Someone else in this thread has made some good points regarding this but I’ll state them here as well. An LLM will tell you information but it has no cognition on what it’s telling you. It has no idea that it’s right or wrong, its job is to convince you that it’s right because that’s the success state. If you tell it it’s wrong, that’s a failure state. The more you speak with it, the more fail states it accumulates and the more likely it is to cut off communication because it’s not reaching a success, it’s not giving you what you want. The longer the conversation goes on, the more crazy LLMs get as well because it’s too much to process at once, holding those contexts in its memory while trying to predict the next one. Our brains do this easily and so much more. To claim an LLM is intelligent is incredibly misguided, it is merely the imitation of intelligence.

Pup Biru

but that’s just a matter of complexity, not fundamental difference. the way our brains work and the way an artificial neural network works aren’t that different; just that our brains are many orders of magnitude bigger

there’s no particular reason why we can’t feed artificial neural networks an enormous amount of … let’s say tangentially related experiential information … as well, but in order to be efficient and make them specialise in the things we want, we only feed them information that’s directly related to the specialty we want them to perform

there’s some… “pre training” or “pre-existing state” that exists with humans too that comes from genetics, but i’d argue that’s as relevant to the actual task of learning, comprehension, and creating as a BIOS is to running an operating system (that is, a necessary precondition to ensure the correct functioning of our body with our brain, but not actually what you’d call the main function)

i’m also not claiming that an LLM is intelligent (or rather i’d prefer to use the term self aware because intelligent is pretty nebulous); just that the structure it has isn’t that much different to our brains, just on a level that’s so much smaller and so much more generic that you can’t expect it to perform as well as a human - you wouldn’t expect to cut out 99% of a human’s brain and have them be able to continue to function at the same level either

i guess the core of what i’m getting at is that the self awareness that humans have is definitely not present in an LLM, however i don’t think that self-awareness is necessarily a pre-requisite for most things that we call creativity. i think that it’s entirely possible for an artificial neural net that’s fundamentally the same technology that we use today to be able to ingest the same data that a human would from birth, and to have very similar outcomes… given that belief (and i’m very aware that it certainly is just a belief - we aren’t close to understanding our brains, but i don’t fundamentally think there’s anything other than neurons firing that results in the human condition), just because you simplify and specialise the input data doesn’t mean that the process is different. you could argue that it’s lesser, for sure, but to rule out that it can create a legitimately new work is definitely premature

@teawrecks@sopuli.xyz

A comedian isn’t forming a sentence based on what the most probable word is going to appear after the previous one.

Neither is an LLM. What you’re describing is a primitive Markov chain.
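
For anyone unfamiliar with the term, a first-order Markov chain picks the next word by looking only at the current word and at how often other words followed it in the training text. Below is a minimal Python sketch of that idea. The function names and the toy corpus are made up for illustration and come from nobody in this thread. A transformer-based LLM differs in that it conditions each prediction on the whole preceding context window, not just the last word.

```python
import random
from collections import defaultdict


def build_bigram_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain: each step depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)  # repeated followers are proportionally likelier
        output.append(word)
    return " ".join(output)


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat ran"
chain = build_bigram_chain(corpus)
sentence = generate(chain, "the", 6)
print(sentence)
```

Every pair of adjacent words in the output already appears somewhere in the corpus, which is why this kind of model can only reshuffle what it has seen one step at a time.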

You may not like it, but brains really are just glorified pattern recognition and generation machines. So yes, “monkey see thing to draw thing”, except a really complicated version of that.

Think of it this way: if your brain wasn’t a reorganization and regurgitation of the things you have observed before, it would just generate random noise. There’s no such thing as “truly original” art or it would be random noise. Every single word either of us is typing is the direct result of everything you and I have observed before this moment.

Baffling takes from people who don’t know what they’re talking about.

Ironic, to say the least.

The point you should be making, is that a corporation will make this above argument up to, but not including the point where they have to treat AIs ethically. So that’s the way to beat them. If they’re going to argue that they have created something that learns and creates content like a human brain, then they should need to treat it like a human, ensure it is well compensated, ensure it isn’t being overworked or enslaved, ensure it is being treated “humanely”. If they don’t want to do that, if they want it to just be a well built machine, then they need to license all the proprietary data they used to build it. Make them pick a lane.

Phanatik

Neither is an LLM. What you’re describing is a primitive Markov chain.

My description might’ve been indicative of a Markov chain but the actual framework uses matrices because you need to be able to store and compute a huge amount of information at once which is what matrices are good for. Used in animation if you didn’t know.

What it actually uses is irrelevant, how it uses those things is the same as a regression model, the difference is scale. A regression model looks at how related variables are in giving an outcome and computing weights to give you the best outcome. This was the machine learning boom a couple of years ago and TensorFlow became really popular.

LLMs are an evolution of the same idea. I’m not saying it’s not impressive because it’s very cool what they were able to do. What I take issue with is the branding, the marketing and the plagiarism. I happen to be in the intersection of working in the same field, an avid fan of classic Sci-Fi and a writer.

It’s easy to look at what people have created throughout history and think “this looks like that” and on a point by point basis you’d be correct but the creation of that thing is shaped by the lens of the person creating it. Someone might make a George Carlin joke that we’ve heard recently but we’ll read about it in newspapers from 200 years ago. Did George Carlin steal the idea? No. Was he aware of that information? I don’t know. But Carlin regularly calls upon his own experiences so it’s likely that he’s referencing an event from his past that is similar to that of 200 years ago. He might’ve subconsciously absorbed the information.

The point is that the way these models have been trained is unethical. They used material they had no license to use and they’ve admitted that it couldn’t work as well as it does without stealing other people’s work. I don’t think they’re taking the position that it’s intelligent because from the beginning that was a marketing ploy. They’re taking the position that they should be allowed to use the data they stole because there was no other way.

Pup Biru

branding

okay

the marketing

yup

the plagiarism

woah there! that’s where we disagree… your position is based on the fact that you believe that this is plagiarism - inherently negative

perhaps it’s best not to use loaded language. if we want to have a good faith discussion, it’s best to avoid emotive arguments and language that’s designed to evoke negativity simply by their use, rather than the argument being presented

I happen to be in the intersection of working in the same field, an avid fan of classic Sci-Fi and a writer

it’s understandable that it’s frustrating, but just because a machine is now able to do a similar job to a human doesn’t make it inherently wrong. it might be useful for you to reframe these developments - it’s not taking away from humans, it’s enabling humans… the less a human has to have skill to get what’s in their head into an expressive medium for someone to consume the better imo! art and creativity shouldn’t be about having an ability - the closer we get to pure expression the better imo!

the less you have to worry about the technicalities of writing, the more you can focus on pure creativity

The point is that the way these models have been trained is unethical. They used material they had no license to use and they’ve admitted that it couldn’t work as well as it does without stealing other people’s work

i’d question why it’s unethical, and also suggest that “stolen” is another emotive term here not meant to further the discussion by rational argument

so, why is it unethical for a machine but not a human to absorb information and create something based on its “experiences”?

Phanatik

First of all, we’re not having a debate and this isn’t a courtroom so avoid the patronising language.

Second of all, my “belief” on the models’ plagiarism is based on technical knowledge of how the models work and not how I think they work.

a machine is now able to do a similar job to a human

This would be impressive if it was true. An LLM is not intelligent simply through its appearance of intelligence.

It’s enabling humans

It’s a chat bot that’s automated Google searches, let’s be clear about what this can do. It’s taken natural language processing and applied it through an optimisation algorithm to produce human-like responses.

No, I disagree at a fundamental level. Humans need to compete against each other and ourselves to improve. Just because an LLM can write a book for you, doesn’t mean you’ve written a book. You’re just lazy. You don’t want to put in the work any other writer in existence has done, to mull over their work and consider the emotions and effect they want to have on the reader. To what extent can an LLM replicate the way George RR Martin describes his world without entirely ripping off his work?

i’d question why it’s unethical, and also suggest that “stolen” is another emotive term here not meant to further the discussion by rational argument

If I take a book you wrote from you without buying it or paying you for it, what would you call that?

Pete Hahnloser

A comedian walks on stage and says, “Why is there a mic here?”

@sculd@beehaw.org

AIs are not humans. Humans cannot read millions of texts in seconds and cannot spit out millions of outputs at the same time.

sub_o

Try to train a human comedian to make jokes without ever allowing him to hear another comedian’s jokes, never watching a movie, never reading a book or magazine, never watching a TV show. I expect the jokes would be pretty weak.

@teri@discuss.tchncs.de

OpenAI’s notion of “fair use”: military and weapons

Those types of companies are getting so f*****g disgusting.

https://techcrunch.com/2024/01/12/openai-changes-policy-to-allow-military-applications/

https://www.theverge.com/2024/1/12/24036397/openai-is-softening-its-stance-on-military-use

@sculd@beehaw.org

Yup, I saw that too. There is also another thread on this board that is discussing this issue.

One interesting thing I noticed is how the AI apologists in this thread seem to be quiet on the other.

@sculd@beehaw.org

Some relevant comments from Ars:

leighno5

The absolute hubris required for OpenAI here to come right out and say, ‘Yeah, we have no choice but to build our product off the exploitation of the work others have already performed’ is stunning. It’s about as perfect a representation of the tech bro mindset that there can ever be. They didn’t even try to approach content creators in order to do this, they just took what they needed because they wanted to. I really don’t think it’s hyperbolic to compare this to modern day colonization, or worker exploitation. ‘You’ve been working pretty hard for a very long time to create and host content, pay for the development of that content, and build your business off of that, but we need it to make money for this thing we’re building, so we’re just going to fucking take it and do what we need to do.’

The entitlement is just…it’s incredible.

4qu4rius

20 years ago, high school kids were sued for millions & threatened with years in jail for downloading a single Metallica album (if I remember correctly minimum damage in the US was something like 500k$ per song).

All of a sudden, just because they are the dominant ones doing the infringement, they should be allowed to scrape the entire (digital) human knowledge? Funny (or not) how the law always benefits the rich.

@DavidGarcia@feddit.nl

ip protections are a spook anyway

@vexikron@lemmy.zip

Or, or, or, hear me out:

Maybe their particular approach to making an AI is flawed.

It’s like people do not know that there are many different approaches to building AI.

Many of them do not rely on basically a training set that is the cumulative sum of all human generated content of every imaginable kind.

@zagaberoo@beehaw.org

What ways do you mean? More than just expert-systems, I’d imagine.

@vexikron@lemmy.zip

Well, off the top of my head:

Whole Brain Emulation, attempting to model a human brain as physically accurately as possible inside a computer.

Genetic Iteration (not the correct term for it but it escapes me at the moment), where you set up a simulated environment for digital actors, then simulate quasi-neurons, quasi-body parts dictated by quasi-dna, in a way that mimics actual biological natural selection and evolution, and then you run the simulation millions of times until your digital creature develops a stable survival strategy.

Similar approaches to this have been used to do things like teach an AI humanoid how to develop its own winning martial arts style via many many iterations, starting from not even being able to stand up, much less do anything to an opponent.

Both of these approaches obviously have drawbacks and strengths, and could possibly be successful at far more than what they have achieved to date, or maybe not, due to known or existing problems, but neither of them rely on a training set of essentially the entirety of all content on the internet.
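
The second approach described above (simulate, select, mutate, repeat) is usually called an evolutionary or genetic algorithm. Here is a minimal Python sketch of the loop. Everything in it is an illustrative assumption: the "genome" is just a list of bits and the fitness function simply counts ones, standing in for the far richer simulated environment and quasi-DNA the comment describes.

```python
import random


def fitness(genome):
    """Toy stand-in for 'how well the creature survives': count the 1 bits."""
    return sum(genome)


def mutate(genome, rate, rng):
    """Flip each bit with a small probability."""
    return [1 - g if rng.random() < rate else g for g in genome]


def crossover(a, b, rng):
    """Child takes the head of one parent and the tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(pop_size=30, genome_len=20, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)
    ]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        # Selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction: refill the population with mutated offspring.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rate, rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return initial_best, max(map(fitness, population))


initial_best, final_best = evolve()
print(initial_best, final_best)
```

Note that no external training set appears anywhere: the only "data" is the feedback from the fitness function, which is the property the comment is pointing at.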

HumbleHobo

That sounds like a great idea for making an intelligent agent inside a video game, where you control all aspects of its environment. But what about an AI that you want to be able to interact with our current shared reality? If I want to know something that involves synthesis of multiple modalities of knowledge how should that information be conveyed? Do humans grow up inside test tubes that only consume content that they themselves have created? Can you imagine the strange society we would have if people were unleashed upon the world without having any shared experiences until they were fully adults?

I think the OpenAI people have a point here, but I think where they go off the rails is that they expect all of this copyrighted information to be granted to them at zero cost and with zero responsibility to the creators of said content.
