this post was submitted on 29 Nov 2023
258 points (96.4% liked)

Technology

ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.

[–] NevermindNoMind@lemmy.world 19 points 11 months ago (2 children)

This is interesting in terms of copyright law. So far the lawsuits from Sarah Silverman and others haven't gone anywhere, on the theory that the models do not contain copies of books. Copyright law hinges on whether you have a right to make copies of a work. So the theory has been that the models learned from the books but didn't retain exact copies, like how a human reads a book and learns its contents but does not store an exact copy in their head. If the models "memorized" training data, including copyrighten works, OpenAI and others may have a problem (note the researchers said they did this same thing on other models).

For the Silicon Valley drama addicts, I find it curious that the researchers apparently didn't do this test on Bard or Anthropic's Claude, at least the article didn't mention them. Curious.

[–] Excrubulent@slrpnk.net 13 points 11 months ago* (last edited 11 months ago) (2 children)

"Copyrighten" is an interesting grammatical construction that I've never seen before. I'd assume it would come from a second language speaker.

It looks like a mix of "written" and "righted".

"Copywritten" isn't a word I've ever heard, but it would be a past tense form of "copywriting", which is usually about writing text for advertisements. It's a pretty niche concept.

"Copyrighted" is the typical form for works that have copyright.

I'm not a grammar nazi - what's right and wrong is about what gets used, which is why I talk about the "usual" form and not the "correct" form - but "copyrighted" is the clearest way to express that idea.

[–] LukeMedia@lemmy.world 7 points 11 months ago* (last edited 11 months ago)

Copyrighten is just how they say it out in the country.

"I dun been copyrighten all damn day"

[–] Karyoplasma@discuss.tchncs.de 1 points 11 months ago* (last edited 11 months ago)

"Copyrightened" could mean explicit consent to use your material.

[–] BeatTakeshi@lemmy.world 2 points 11 months ago

So their angle should be plagiarism rather than copyright?