Stephen King, Zadie Smith, and Michael Pollan are among thousands of writers whose copyrighted works are being used to train large language models.

I’m talking about using copyrighted material to train AI; you’re talking about using AI to replace authors, which is a separate, related issue.

If someone uses Stephen King’s books to train an AI, how many sales of those books are lost? Because it kinda looks like “zero” since the AI isn’t replacing those books.

I think it’s two sides of the same point; the downstream effect of LLMs is devaluing writing, and they’re trained on copyrighted works.

So, for instance, if you train an LLM on everything written by Stephen King, then ask the LLM to generate stories “in the style of Stephen King”, then you could potentially create verbatim text from his books (probabilistically, it’s bound to happen with the way LLMs chain words) and/or create books similar enough to his style to be direct competition to his writing.

It’s up to the courts to decide if that argument has any legal weight, and legislators (and the public voting for them) to decide if the laws should change.

And, based on the mess that is Bill C-18 in Canada, I have absolutely no confidence in new copyright laws having a lick of sense.

If it generates verbatim output, then we have a good old copyright violation, which courts could latch onto for standing.

But if I hire people to write books in the style of Stephen King and then train an AI with them, where’s King’s recourse?

And the AI could be trained on public domain data and still be a competitor to authors. It seems like the plaintiffs would have to be equally against this usage if they’re worried about their jobs.

But in those two cases, I don’t think any laws are broken.

I just think, aside from a plain old piracy violation, it’s going to be a tricky one in court. Sure, you can’t just copy the book, but running a copy of a book through an algorithm is tougher to ban, and it’s not something that necessarily should be illegal.
