Sarah Silverman Sues ChatGPT Creator for Copyright Infringement
www.thedailybeast.com
external-link
“If a user prompts ChatGPT to summarize a copyrighted book, it will do so,” the suit claims.
@Moonrise2473@feddit.it
link
fedilink
English
01Y

Seems very improbable that they scraped a pirate website with forced registration and tight daily download limits (10 books a day max?) to get content that’s often mislabeled and not presented in an homogeneous way.

Probably it’s just using the excerpt from Amazon (which instead with paid API access is much more easy to access) as a prompt and build on it

luciole (he/him)
link
fedilink
English
21Y

There’s been ongoing suspicions that pirated content was used to train popular LLMs simply because popular datasets used for training LLMs do include such content. The Washington Post did an article about it.

Google’s C4 dataset used for research included illegal websites. What remains to be seen is if it was cleaned up before training Bard as we know it today. OpenAI as revealed nothing on its dataset.

Create a post

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

Subcommunities on Beehaw:


This community’s icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

  • 1 user online
  • 59 users / day
  • 169 users / week
  • 619 users / month
  • 2.31K users / 6 months
  • 1 subscriber
  • 3.28K Posts
  • 67K Comments
  • Modlog