In the movie industry, everyone usually signs a work for hire contract that specifies who will have the rights to the completed film.
However, in a recent case the director (Alex Merkin) did not sign a contract and then tried to claim copyright afterwards. The court said that directors have no inherent copyright over film:
We answer that question in the negative on the facts of the present case, finding that the Copyright Act's terms, structure, and history support the conclusion that Merkin's contributions to the film do not themselves constitute a "work of authorship" amenable to copyright protection. … As a general rule, the author is the party who actually creates the work, that is, the person who translates an idea into a fixed, tangible expression entitled to copyright protection. … But a director's contribution to an integrated "work of authorship" such as a film is not itself a "work of authorship" subject to its own copyright protection.
Simple question:
If you are a college student, learning to write professionally, is it fair use to download copyrighted books from Z-Library in order to become a better writer? If you are a musician, is it fair use to download mp3s from The Pirate Bay in order to learn about musical styles? And what about film students: can they torrent Disney movies as part of their education?
I’m certain that every court in the US would rule that this is not fair use. It’s not fair use even if pirated content ultimately teaches a student how to create original, groundbreaking works of writing, music, and film.
Simply being a student does not give someone a free pass to pirate content. The same is true of training an AI, and there are already reports that pirated material is in the OpenAI training set.
If OpenAI could claim fair use, then almost by definition The Pirate Bay could claim fair use too.
Again, it’s not a question of reproducing books in an LLM. The allegation is that the OpenAI developers downloaded books illegally to train their AI.
You need to pay for your copy of a book. That’s true if you are a student teaching yourself to write, and it’s also true if you are an AI developer training an AI to write. In the latter case, you might also need to pay for a special license.
Is it possible that the OpenAI developers can bring the receipts showing they paid for each and every book and/or license they needed to train their AI? Sure, it’s possible. If so, the lawyers who brought the suit would look pretty silly for not even bothering to check.
But OpenAI used a whole lot of books, which cost a whole lot of money. So I wouldn’t hold my breath.
the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market.
Yes, and I named three of those factors:
the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered.
And while you don’t need to meet all the criteria, the odds are pretty long when you fail three of the four (commercial nature, copying complete work rather than a portion, and negative effect on the market for the original).
Think of it this way: if it were legal to download books in order to train an AI, then it would also be legal to download books in order to train a human student. After all, why would a human have fewer rights than an AI?
Do you really think courts are going to decide that it’s ok to download books from The Pirate Bay or Z-Library, provided they are being read by the next generation of writers?
If a musician doesn’t have the right to their own work, it’s because someone offered to pay them for the rights and they accepted.
Is that in their favor? I think so, considering the alternative is to not get paid and not have rights to their work.
And not to go too far off topic, but publicly funded research is generally not aimed at drug development, it is aimed at discovering the basic science behind how the body works (human body or otherwise).
If you want a clinical trial that proves a particular drug can actually help patients, you will need to find a company to pay for it. The government almost never pays for clinical trials (I think the COVID vaccine might have been an exception). Clinical trials are far more expensive than basic science, and patents are the carrot to get the private sector to pay for them.
Yes, it absolutely hinges on fair use. That’s why the very first page of the lawsuit alleges:
“Defendants’ LLMs endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create”
If the court agrees with that claim, it will basically kill the fair use defense.
No, it doesn’t.
It defends web scraping (downloading copyrighted works) as legal if necessary for fair use. But fair use is not a foregone conclusion.
In fact, there was a recent case in which a company was sued for scraping images and text from Facebook users. Their goal was to analyze them and create a database of advertising trackers, in competition with Facebook. The case settled, but not before the judge noted that the scraping was not fair use and very likely infringed IP.
I know the model doesn’t contain a copy of the training data, but it doesn’t matter.
If the copyrighted data is downloaded at any point during training, that’s an IP violation. Even if it is immediately deleted after being processed by the model.
As an analogy, if you illegally download a Disney movie, watch it, write a movie review, and then delete the file … then you still violated copyright. The movie review doesn’t contain the Disney movie and your computer no longer has a copy of the Disney movie. But at one point it did, and that’s all that matters.
If they bought physical books then the lawsuit might happen, but it would be much harder to win.
If they bought e-books, then it might not have helped the AI developers. When you buy an e-book you are just buying a license, and the license might restrict what you can do with the text. If an e-book license prohibits AI training (and they will in the future, if they don’t already) then buying the e-book makes no difference.
Anyway, I expect that in the future publishers will make sets of curated data available for AI developers who are willing to pay. Authors who want to participate will get royalties, and developers will have a clear license to use the data they paid for.
When determining whether something is fair use, the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered.
Search engine scrapers are fair use, because they only copy a snippet of a work and a search result cannot substitute for the work itself. Likewise if you copy an excerpt of a movie in order to critique it, because consumers don’t watch reviews as a substitute for watching movies.
On the other hand, OpenAI is accused of copying entire works, and OpenAI’s products are explicitly intended as a replacement for hiring actual writers. I think it is unlikely to be considered fair use.
And in practice, fair use is not easy to establish.
The question “what is sufficient” basically amounts to convincing an official that the final work reflects some form of your creative expression.
So for instance, if you are hired to take AI-generated output and crop it to a 29:10 image, that probably won’t be eligible for copyright. You aren’t expressing your creativity, you are doing something anyone else could do.
On the other hand, if you take AI-generated output and edit it in Photoshop to the point that everyone says “Hey, that looks like a ThunderingJerboa image”, then you would almost certainly be eligible for copyright.
Everyone else falls in between, trying to convince someone that they are more like the latter case. Which is good, because it means actual artists will be rewarded.
After you’ve spun enough brushes or popped enough balloons, the results will be fairly predictable. And some elements, for example the color of paint in the brushes/balloons, would be under full control.
Even if the final result is not completely predictable, an artist only needs to establish that a significant part of it is a form of creative expression.
You put as much effort into it as you would anything else.
Copyright is not meant to reward effort. This is a common misconception. Thirty years ago there was a landmark SCOTUS case (Feist v. Rural Telephone) about copyrighting a phone book. Back then, collecting and verifying phone numbers and addresses took a tremendous amount of effort. Somebody immediately copied the phone book, and its creators argued that their effort should be rewarded with copyright protection.
The courts shot that down. Copyright is not about effort, it’s about creative expression. Creative expression can require major effort (Sistine Chapel) or take very little effort (duck lips photo). Either way, it’s rewarded with a copyright.
Assembling a database is not creative expression. Neither is judging whether an AI generated work is suitable. Nor pointing out what you’d like to see in a new AI generated work. So no matter how much effort one puts into those activities, they are not eligible for copyright.
To the extent that an artist takes an AI generated work and adds their own creative expression to it, they can claim copyright over the final result. But in all the cases in which AI generated works were ruled ineligible, the operator was simply providing prompts and/or approving the final result.
It’s not actually called “theft” or “stealing”, it’s called “infringement” or “violation”. Infringement is to intellectual property as trespassing is to real estate. The owners are still able to use their property, but their rights to it have nevertheless been violated.
Also, corporations cannot create intellectual property. They can only offer to buy it from the natural persons who created it. Without IP protection, creators would lose the only protections they have against corporations and other entrenched interests.
Imagine seeing all your family photos plastered on a McDonald’s billboard, or in a political ad for a candidate you despise. Imagine being told, “Sorry, you can’t stop them from using your photos however they want”. That’s a world without IP protection.
You need direct control over some elements of a work to claim copyright. Not necessarily all of them.
So even if the microtexture is out of your control, you still have complete control of the framing, color, etc. That’s sufficient to claim copyright.
If you lose control of every element by replacing them all with prompts and/or chance, then you lose the copyright. Which is what happened in the “monkey selfie” photo.
No, under copyright law it would be your work and your work alone.
Someone who is providing suggestions or prompts to you is not eligible to share the copyright, no matter how detailed they are. They must actually create part of the work themselves.
So for instance if you are in a recording studio then you will have the full copyright over music that you record. No matter how much advice or suggestions you get from other people in the studio with you. Your instruments/voice/lyrics, your copyright.
Otherwise copyright law would be a constant legal quagmire with those who gave you suggestions/prompts/feedback! Remember, an idea cannot be copyrighted, and prompts are ideas.
In the case of Stable Diffusion, the copyright would go to Stable Diffusion alone if it were a human. But Stable Diffusion is not a human, so there is no copyright at all.
And arguably, neither did the image generator. The ones who did were the artists.
In which case, neither the image generator nor its operator are eligible for copyright.
The same reasoning still applies to Stable Diffusion etc., given that you can heavily tweak the output through your prompt.
The point is that the AI generator (or, if you prefer, its training data) exercised direct control over the image, not you. Providing additional prompts does not change this, just as rerolling the dice wouldn’t make the dice the author.
For that matter, giving extensive prompts or other artistic direction to a human artist would not make you eligible for copyright, either. Even if the artist was heavily influenced by your suggestions.
Finally, choosing one among many completed works is not a creative process, even if it requires artistic judgment. Choosing repeatedly does not transform it into a creative process. That’s why choosing your favorite song does not mean you are a song creator.
There are two separate issues here. First, can you copyright art that is completely AI-generated? The answer is no. So OpenAI cannot claim a copyright for its output, no matter how it was trained.
The other issue is whether OpenAI violated a copyright. It’s true that if you write a book in the style of another author, then you aren’t violating copyright. And the same is true of OpenAI.
But that’s not really what the OpenAI lawsuit alleges. The issue is not what it produces today, but how it was originally trained. The authors point out that in the process of training OpenAI’s models, the developers illegally downloaded their works. You can’t illegally download copyrighted material, period. It doesn’t matter what you do with it afterwards. And AI developers don’t get a free pass.
Illegally downloading copyrighted books for pleasure reading is illegal. Illegally downloading copyrighted books for training an AI is equally illegal.
In most experimental work, the artist does make a direct contribution to some key elements of the work, for example framing or background. Which is all that’s necessary, you can still obtain copyright over something that is only partially under your control.
If an artist gives up all direct control over an experimental work - such as the infamous monkey selfie - then I think they should no longer be able to copyright it.
The output is still fully predictable by the artist.
The dice didn’t make the eyes, after all. They just showed 21, now it’s your job to actually make 21mm eyes. In doing so, you could mess up and/or intentionally make 22mm eyes. If someone asks, “Why are these eyes 21mm?”, you can answer “I decided to do what the dice asked”.
The dice are more like a client who asks you to draw a portrait with 21mm eyes. In other words, they are giving you a prompt. Nobody but you knows if they will get what they asked for.
The copyright office has been pretty clear that if an artist is significantly involved in creating an image but then adjusts it with AI, or vice versa, then the work is still eligible for copyright.
In all of the cases where copyright was denied, the artist made no significant changes to AI output and/or provided the AI with nothing more than a prompt.
Photographers give commands to their camera just as a traditional artist gives commands in Photoshop. The results in both cases are completely predictable. This is where they diverge from AI-generated art.
It’s not a matter of intelligence or sentience. The key question is whether the output of a prompt is fully predictable by the person who gave the prompt.
The behavior of a paintbrush, mouse, camera, or robot arm is predictable. The output of a prompt is not (at least, not predictable by the person who gave the prompt).
A prompt is more than a command. It is a command with an immediate output that is not fully predictable by the prompt-giver.
So for example the copyright office might ask, “This image includes a person whose left eye is a circle with radius 2.14 cm. Why is it 2.14 cm?”
Traditional artist: because I chose to move the paintbrush (or mouse) 2.14 cm. The paintbrush (mouse) can only go where I move it.
Photographer: because I chose to stand 3 meters from the subject and use an 85mm lens on my camera. The magnification (size) of the eye depends only on those factors.
AI-assisted artist: because I asked for larger eyes. I did not specify precisely 2.14 cm, but I approved of it.
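The photographer’s answer is checkable with simple optics: for a thin lens, on-sensor magnification is determined entirely by focal length and subject distance. A quick sketch of that arithmetic (the 85 mm lens and 3 m distance come from the example above; the 24 mm eye width and the thin-lens simplification are my own assumptions, not the original poster’s):

```python
# Thin-lens magnification: m = f / (d - f) for an object at distance d.
# This is a simplified model; real lens formulas differ slightly.

def magnification(focal_length_mm: float, distance_mm: float) -> float:
    """On-sensor magnification for a thin lens."""
    return focal_length_mm / (distance_mm - focal_length_mm)

m = magnification(85, 3000)        # 85 mm lens, subject 3 m away
eye_width_mm = 24                  # assumed width of a human eye
on_sensor_mm = m * eye_width_mm    # size of the eye on the sensor

print(round(m, 4))            # → 0.0292
print(round(on_sensor_mm, 2)) # → 0.7
```

Any further enlargement from the sensor image to a final printed size is likewise a deliberate choice of the photographer, so the result remains fully traceable to decisions they controlled.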
In your example, if you can fully predict the output of the vacuum by your voice command, then it is no different than using a paintbrush or mouse.
But it does matter whether your input is a brush or a prompt.
If you physically paint something with a paintbrush, you have a copyright over your work.
If someone asks you to physically paint something by describing what they want, you still have copyright over the work. No matter how picky they are, no matter how many times they review your progress and tell you to start over. Their prompts do not allow them to claim copyright, because prompts in general are not sufficient to claim copyright.
The argument relies a lot on an analogy to photographers, which misunderstands the nature of photography. A photographer does not give their camera prompts and then evaluate the output.
A better analogy would be giving your camera to a passerby and asking them to take your photo, with prompts about what you want in the background, lighting, etc. No matter how detailed your instructions, you won’t have a copyright on the photo.
A key issue, often overlooked, is that US law imposes significant restrictions on the export and sale of military hardware.
Starlink is currently not considered military hardware. SpaceX is desperately trying to keep it that way, their ultimate goal is to sell subscriptions to civilians. Thus they get anxious when it is openly used for military purposes.
In this regard Starlink is somewhat similar to civilian GPS receivers, which automatically shut down at 1200 mph so they can’t be used in missiles.
[citation needed]
As I already said, fair use is generally not granted when it entails competition with the original work. See above regarding movie reviews vs copying an entire film.
It has nothing in common with it.
Legally, property is something that has an owner. IP has an owner, and like other types of property it can be transferred to another owner and become the subject of contracts.
IP cannot be “stolen”, and I never said it could be. Real estate cannot be “stolen” either, yet real estate is still property.
That’s all an AI needs in order to get trained on something. They just need to see it.
For someone who thinks other people are “weird” about legal language, you keep making the same mistakes.
When people “see” something, they do not need to create a copy of it in the legal sense. When I look at an old photograph, legally I do not create a copy of the photograph.
AIs do not “just see” data. They need access to an electronic copy of the data. An AI cannot “see” an old photograph unless it first creates a local copy of the photograph. There is a significant legal difference.
There are situations where permission is not required to use copyrighted material, mainly “fair use”. But AI training often does not qualify as fair use.
Otherwise, intellectual property is treated similarly to other types of property. For example, the person who owns a car can give you permission to use it. That doesn’t mean you can do whatever you want with it. The owner gets to set the rules. They aren’t “kings”, but as owners they do have near complete control over what you can do with their car.
When you upload something to social media, you (the content owner) give the host permission to display your content. That does not mean users who view your content have permission to do whatever they want with it.
There is plenty of open source code posted into repositories that are extensively mirrored, yet the code has lengthy conditions attached to it. If you use it in violation of the stated license, you could find yourself on the losing end of a lawsuit.
There are plenty of photographs posted onto Instagram, which is also designed to display them to anyone who asks. If a professional photographer finds that you’ve used one of their Instagram photos without permission, you could find yourself on the losing end of a lawsuit.
And the Fediverse may be a non-commercial decentralized platform, but copyright protection doesn’t magically disappear here. You give servers a license to display what you wrote, but you may reserve the same rights over your IP as software developers and photographers do over their own.
Copyright holders can prohibit all use of their work. If they can prohibit all use, then they can prohibit any specific use.
And they rarely give a wide-ranging “permission for their posts to be viewed by the public.” That’s like saying “I can do whatever I want with this source code, the developer gave permission for the code to be viewed by the public.” The legal language is often far more specific. There may be prohibitions on commercial usage, requirements for attribution, or even requirements that future code be licensed in a similar manner. There is almost always fine print when content appears to be “free”.
Of course, it’s possible that you could find a creative way around the legalese. Pointing a camera at a browser may work, until the fine print changes to prohibit that too. But anyway, that’s not what AI developers have done in the past. So one way or another, they may be forced to change their ways.
Hollywood and other big businesses will still find a way to train AI as usual. They are already used to paying musicians when a song is used in a movie. They can easily pay copyright holders for a license to use content as training data. It’s far safer - and more efficient - to pay up than try to get around the rules with a camera pointed at a screen. As a bonus, content creators who contribute training data may benefit from royalties.
Nevertheless, I think it will become more difficult for people who think they can easily get “free” training data from the web, just like 20 years ago when people thought they could easily get “free” music from Napster.
That’s a public performance, which is a form of redistribution. That’s not relevant to AI training.
Copyright law defines whether or not you can make a copy of a work. The person who owns the copyright can deny permission to make any copies, or grant you permission to make a copy only under certain conditions. Those conditions are completely up to the copyright holder. They might prohibit public performance, but by no means is public performance the only thing the copyright holder can prohibit. It’s simply a very common prohibition.
You are trying to generalize from a specific right, viewing the content on a browser, to a general right to “look” at the content, to the right to train an AI. But legally those are not the same at all. You may be granted some, all, or none of those rights.
Suppose you are in a modern art gallery. You have been given the right to “look” at someone’s art. You can nevertheless be prohibited from making a photograph of the art, even if the camera is also “looking” at it. The owner of the art can attach whatever conditions they want to your photo, including how long you can keep it and exactly what you do with it.
For example you could be allowed to photograph the art for home use but not for wider distribution. You could be allowed to photograph the art for classroom use, but not for AI training. If you are not willing to follow all of the conditions, then you can’t make a photo of the art at all.
The same is true of text. Websites give permission to make a copy of their text for use on your browser. And they can set whatever rules they like for how else your copy may be used.
Except for being banned from using public data that non-American AIs are able to use.
Sure. Of course, America could also ban those non-American AIs from being used in the US. Just as America bans other products that infringe patents/IP.
Democrats didn’t vote to vacate because they like to watch chaos. They simply will not support McCarthy unless he offers something in return. Their vote is a bargaining chip and they aren’t throwing it away.