this post was submitted on 26 Jul 2023

483 points (96.0% liked)

Thousands of authors demand payment from AI companies for use of copyrighted works (www.cnn.com)

submitted 1 year ago by L4s@lemmy.world to c/technology@lemmy.world

350 comments

Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.

[–] cerevant@lemmy.world 31 points 1 year ago (9 children)

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.

Folks, this isn’t a new problem, and it doesn’t need new laws.

[–] Dark_Arc@lemmy.world 39 points 1 year ago (24 children)

It's 100% a new problem. There's established precedent for things costing different amounts depending on their intended use.

For example, buying a consumer copy of a song doesn't give you the right to play that song in a stadium or a restaurant.

Training an entire AI to make potentially an infinite number of derived works from your work is 100% worthy of requiring a special agreement. This even goes beyond simple payment to consent; a climate expert might not want their work in an AI which might severely mischaracterize the conclusions, or might want to require that certain queries are regularly checked by a human, etc.

[–] scarabic@lemmy.world 21 points 1 year ago (7 children)

"When you sell a book, you don’t get to control how that book is used."

This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.

Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.

This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.

[–] Zormat@lemmy.blahaj.zone 5 points 1 year ago (15 children)

I completely fail to see how it wouldn't be considered transformative work

[–] scarabic@lemmy.world 6 points 1 year ago (2 children)

It fails the transcendence criterion. Transformative works go beyond the original purpose of their source material to produce a whole new category of thing or benefit that would otherwise not be available.

Taking 1000 fan paintings of Sauron and using them in combination to create 1 new painting of Sauron in no way transcends the original purpose of the source material. The AI painting of Sauron isn’t some new and different thing. It’s an entirely mechanical iteration on its input material. In fact, the derived work competes directly with the source material, which should show that it’s not transcendent.

We can disagree on this and still agree that it’s debatable and should be decided in court. The person above that I’m responding to just wants to say “bah!” and dismiss the whole thing. If we can litigate the issue right here, a bar I believe this thread has already met, then judges and lawmakers should litigate it in our institutions. After all, the potential scale of this far-reaching issue is enormous. I think it’s incredibly irresponsible to say “feh, nothing new here, move on.”

[–] volkhavaar@lemmy.world 8 points 1 year ago (1 children)

This is a little off: when you quote a book, you put the name of the book you’re quoting. When you refer to a book, you, um, refer to the book?

I think the gist of these authors' complaints is that a sort of “technology laundered plagiarism” is occurring.

[–] cloudless@feddit.uk 8 points 1 year ago (1 children)

I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:

"He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’"

It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?

[–] cerevant@lemmy.world 8 points 1 year ago* (last edited 1 year ago) (3 children)

Then this is a copyright violation - it violates any standard for such, and the AI should be altered to account for that.

What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem (assuming it was legally acquired or publicly available). Only the output can be problematic.

[–] assassin_aragorn@lemmy.world 8 points 1 year ago (15 children)

"However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement."

It's an algorithm that's been trained on numerous pieces of media by a company looking to make money off it. I see no reason to give them a pass on fairly paying for that media.

You can see this if you reverse the comparison, and consider what a human would do to accomplish the task in a professional setting. That's all an algorithm is. An execution of programmed tasks.

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I'd get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn't an AI have to do the same?

[–] novibe@lemmy.ml 29 points 1 year ago (6 children)

You know what would solve this? We all collectively agree this fucking tech is too important to be in the hands of a few billionaires, start an actual public free open source fully funded and supported version of it, and use it to fairly compensate every human being on Earth according to what they contribute, in general?

Why the fuck are we still allowing a handful of people to control things like this??

[–] Zetaphor@zemmy.cc 7 points 1 year ago (1 children)

Setting aside the obvious answer of "because capitalism", there are a lot of obstacles towards democratizing this technology. Training of these models is done on clusters of A100 GPUs, which are priced at $10,000 USD each. Then there's also the fact that a lot of the progress being made is being done by highly specialized academics, often with the resources of large corporations like Microsoft.

Additionally the curation of datasets is another massive obstacle. We've mostly reached the point of diminishing returns of just throwing all the data at the training of models; it's quickly becoming apparent that the quality of data is far more important than the quantity of the data (see TinyStories as an example). This means a lot of work and research needs to go into qualitative analysis when preparing a dataset. You need a large corpus of input, each item of which is above a quality threshold, but then also as a whole they need to represent a wide enough variety of circumstances for you to reach emergence in the domain(s) you're trying to train for.

There is a large and growing body of open source model development, but even that only exists because of Meta "leaking" the original Llama models, and now more recently releasing Llama 2 with a commercial license. Practically overnight an entire ecosystem was born creating higher quality fine-tunes and specialized datasets, but all of that was only possible because Meta invested the resources and made it available to the public.

Actually in hindsight it looks like the answer is still "because capitalism" despite everything I've just said.

[–] novibe@lemmy.ml 5 points 1 year ago (4 children)

I know the answer to pretty much all of our “why the hell don’t we solve this already?” questions is: capitalism.

But I mean, as Lrrr would say, “why does the working class, as the biggest of the classes, not just eat the other one?”

[–] Durotar@lemmy.ml 29 points 1 year ago (22 children)

How can they prove that their particular intellectual property, rather than some abstract public data, has been used to train the algorithms?

[–] squaresinger@feddit.de 44 points 1 year ago (7 children)

Well, if you ask e.g. ChatGPT for the lyrics to a song or page after page of a book, and it spits them out 1:1 correct, you could assume that it must have had access to the original.
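
A rough sketch of how you could put a number on "1:1", assuming you have the original text on hand: find the longest run of consecutive words that appears in both the original and the model's output. The function name `longest_verbatim_run` is made up for illustration, and a long shared run only suggests memorization; it doesn't prove anything on its own.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(original, output):
    """Return the longest run of consecutive words shared by both texts."""
    a = original.split()
    b = output.split()
    # autojunk=False so common words are not discarded on long inputs.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[match.a:match.a + match.size])


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    output = "he said the quick brown fox jumps and left"
    print(longest_verbatim_run(original, output))
```

Splitting on whitespace keeps it simple; punctuation or casing differences will break a run, so real text would want some normalization first.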

[–] dojan@lemmy.world 21 points 1 year ago (1 children)

Or at least excerpts from it. But even then, it's one thing for a person to put up a quote from their favourite book on their blog, and a completely different thing for a private company to use that data to train a model, and then sell it.

[–] GlowingLantern@feddit.de 14 points 1 year ago (1 children)

Even more so, if you consider that the LLMs are marketed to replace the authors.

[–] ProfessorZhu@lemmy.world 5 points 1 year ago (1 children)

Can it recreate anything 1:1? When both my wife and I tried to get them to do that they would refuse, and if pushed they would fail horribly.

[–] squaresinger@feddit.de 6 points 1 year ago (3 children)

This is what I got. Looks pretty 1:1 to me.

[–] jackie_jormp_jomp@lemmy.world 9 points 1 year ago (1 children)

Hilarious that it started with just "Buddy", like you'd be happy with only the first word.

[–] squaresinger@feddit.de 5 points 1 year ago* (last edited 1 year ago)

Yeah, for some reason it does that a lot when I ask it for copyrighted stuff.

As if it knew it wasn't supposed to output that.

[–] BrooklynMan@lemmy.ml 8 points 1 year ago

there are a lot of possible ways to audit an AI for copyrighted works, several of which have been proposed in the comments here, but what this could lead to is laws requiring an accounting log of all material that has been used to train an AI as well as all copyrights and compensation, etc.

[–] Colorcodedresistor@lemm.ee 23 points 1 year ago (1 children)

This is a good debate about copyright/ownership. On one hand, yes, the authors' works went into 'training' the AI, but we would need a scale to then grade how well a source piece is absorbed by the AI's learning. For example, did the AI learn more from the MAD magazine I just fed it, or did it learn more from Moby Dick? Who gets to determine that grading system? Sadly, musicians know this struggle. There are just so many notes and so many words; eventually overlap and similarities occur. But did that musician steal a riff, or did both musicians come to a similar riff separately? Authors don't own words or letters, so a computer that just copies those words and then uses an algo to write up something else is no different than you or I being influenced by our favorite heroes or information we have been given. Do I pay the author for reading his book? Or do I just pay the store to buy it?

[–] HiddenLayer5@lemmy.ml 23 points 1 year ago

Someone should AGPL their novel and force the AI company to open source their entire neural network.

[–] Cstrrider@lemmy.world 18 points 1 year ago (17 children)

While I am rooting for authors to make sure they get what they deserve, I feel like there is a bit of a parallel to textbooks here. As an engineer, if I learn about statics from a textbook and then go use that knowledge to help design a bridge that I and my company profit from, the textbook company can't sue. If my textbook has a detailed example for how to build a new bridge across the Tacoma Narrows, and I use all of the same design parameters for a real Tacoma Narrows bridge, the publisher may have much more of a case.

[–] minesweepermilk@lemmy.world 5 points 1 year ago

I think that these are fiction writers. The maths you'd use to design that bridge is fact and the book company merely decided how to display facts. They do not own that information, whereas The Handmaid's Tale was the creation of Margaret Atwood and was an original work.

[–] joe@lemmy.world 13 points 1 year ago (35 children)

All this copyright/AI stuff is so silly and a transparent money grab.

They're not worried that people are going to ask the LLM to spit out their book; they're worried that they will no longer be needed because an LLM can write a book for free. (I'm not sure this is feasible right now, but maybe one day?) They're trying to strangle the technology in the courts to protect their income. That is never going to work.

Notably, there is no "right to control who gets trained on the work" aspect of copyright law. Obviously.

[–] DandomRude@lemmy.world 10 points 1 year ago

There is nothing silly about that. It's a fundamental question about using content of any kind to train artificial intelligence that affects way more than just writers.

[–] randon31415@lemmy.world 7 points 1 year ago

Obligatory xkcd: https://xkcd.com/827/
