Technology

2 authors say OpenAI 'ingested' their books to train ChatGPT. Now they're suing, and a 'wave' of similar court cases may follow. (www.businessinsider.com)

submitted 1 year ago by L4s@lemmy.world to c/technology@lemmy.world

154 comments

Two authors sued OpenAI, accusing the company of violating copyright law. They say OpenAI used their work to train ChatGPT without their consent.

[–] dhork@lemmy.world 12 points 1 year ago (1 children)

There's an additional question: who holds the copyright on the output of an algorithm? I don't think it is copyrightable at all. The bot doesn't really add anything to the output; it's just a fancy search engine. In the US in particular, the agency in charge of copyrights has been quite insistent that copyright can only be granted to the output of a human.

So when an AI incorporates parts of copyrighted works into its output, how can that not be infringement?

[–] cerevant@lemmy.world 3 points 1 year ago (1 children)

How can you write a blog post reviewing a book you read without copyright infringement? How can you post a plot summary to Wikipedia without copyright infringement?

I think these blanket conclusions about AI consuming content being automatically infringing are wrong. What is important is whether or not the output is infringing.

[–] dhork@lemmy.world 8 points 1 year ago* (last edited 1 year ago) (2 children)

You can write that blog post because you are a human, and your summary qualifies for copyright protection because it is the unique output of a human based on reading the copyrighted material.

But the US authorities are quite clear that a work that is purely AI generated can never qualify for copyright protection. Yet since it is based on the synthesis of works under copyright, it can't really be considered public domain either. Otherwise you could ask the AI "Write me a summary of this book that has exactly the same number of words", and likely get a direct copy of the book which is clear of copyright.

I think that these AI companies are going to face a reckoning when it is ruled that they misappropriated all this content that they didn't explicitly license for use, and that all their output is infringing by definition.

[–] Whimsical@lemmy.world 2 points 1 year ago

I'm expecting a much messier "resolution" that'll look a lot like YouTube's copyright situation - their product can be used for copyright infringement, and they'll be required by law to try and take appropriate measures to prevent it, but will otherwise not be held liable as long as they can claim such measures are being taken.

Having an AI recite a long text to bypass copyright seems equivalent in my mind to uploading a full movie to YouTube. In both cases, some amount of moderation (itself increasingly algorithmic) is required to not only be applied, but actively developed and advanced to thwart efforts to bypass it. For instance, YouTube pirates will upload things with some superficial changes, like a filter applied or the movie shown at a weird angle or mirrored, to bypass copyright bots, which means the bots need to be stricter and better trained, or else YouTube once again becomes liable for knowing about these pirates and not stopping them.

The end result, just like with YouTube, will probably be that AI models have to have big, clunky algorithms applied against their outputs to recalculate or otherwise make copyright-safe anything that might remotely be an infringement. It'll suck for normal users, pirates will still dig for ways to bypass it, and everyone will be unhappy. If YouTube is any indicator, this situation can somehow remain stable for over a decade - long enough for AI devs to release a new-generation bot to restart the whole issue.

Yaaaaaaaaay

[–] cerevant@lemmy.world 1 points 1 year ago

> But the US authorities are quite clear that a work that is purely AI generated can never qualify for copyright protection.

Which law says this? The government is certainly discussing the problem, but I wasn't aware of any legislation.

If there is such a law, it seems to overlook an important point: an algorithm - an AI - is itself an expression of human intelligence. Having a computer carry out an algorithm for summarizing content can be indistinguishable from a person having a pattern they follow for writing summaries.