this post was submitted on 09 Jan 2024

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says (www.theguardian.com)

submitted 10 months ago by L4s@lemmy.world to c/technology@lemmy.world

Pressure grows on artificial intelligence firms over the content used to train their products

[–] S410@lemmy.ml 18 points 10 months ago (2 children)

They're not wrong, though?

Almost all information that currently exists has been created in the last century or so. Only a fraction of all that information is available to be legally acquired for use and only a fraction of that already small fraction has been explicitly licensed using permissive licenses.

Things that we don't even think about as "protected works" are in fact just that. Doesn't matter what it is: napkin doodles, writings on bathroom stall walls, letters written to friends and family. All of those things are protected, unless stated otherwise. And, I don't know about you, but I've never seen a license notice attached to a napkin doodle.

Now, imagine trying to raise a child while avoiding every piece of information like that; information that you aren't licensed to use. You wouldn't end up with a person well suited to exist in the world. They'd lack education in science and technology, they'd have no understanding of pop culture, they'd know no brand names, etc.

Machine learning models are similar. You can train them that way, sure, but they'd be basically useless for real-world applications.

[–] AntY@lemmy.world 48 points 10 months ago (5 children)

The main difference between the two in your analogy, and the one with the greatest bearing on this particular problem, is that the machine learning model is a product that is to be monetized.

[–] deweydecibel@lemmy.world 10 points 10 months ago (2 children)

And ultimately replace the humans it learned from.

[–] Zoboomafoo@slrpnk.net -1 points 10 months ago

Good, I want AI to do all my work for me

[–] afraid_of_zombies@lemmy.world -1 points 9 months ago

Yes clearly 90 years plus death of artist is acceptable

[–] BURN@lemmy.world 3 points 10 months ago (1 children)

Also an “AI” is not human, and should not be regulated as such

[–] afraid_of_zombies@lemmy.world -1 points 9 months ago (1 children)

Neither is a corporation and yet they claim first amendment rights.

[–] BURN@lemmy.world 2 points 9 months ago (1 children)

That’s an entirely separate problem, but is certainly a problem

[–] afraid_of_zombies@lemmy.world -1 points 9 months ago (1 children)

I don't think it is. We have all this non-human stuff that we are awarding more rights than we have ourselves. You can't put a corporation in jail but you can put me in jail. I don't have freedom from religion but a corporation does.

[–] BURN@lemmy.world 2 points 9 months ago (1 children)

Corporations are not people, and should not be treated as such.

If a company does something illegal, the penalty should be spread to the board. It’d make them think twice about breaking the law.

We should not be awarding human rights to non-human, non-sentient creations. LLMs and any kind of Generative AI are not human and should not in any case be treated as such.

[–] afraid_of_zombies@lemmy.world 0 points 9 months ago

Corporations are not people, and should not be treated as such.

Understand. Please tell Disney that they no longer own Mickey Mouse.

[–] LWD@lemm.ee 2 points 10 months ago

Artificial intelligence is incredible in its flexibility!

Simultaneously, it is like a human,
And yet "only a tool."

[–] testfactor@lemmy.world -1 points 10 months ago

And real children aren't in a capitalist society?

[–] Exatron@lemmy.world 11 points 10 months ago (1 children)

The difference here is that a child can't absorb and suddenly use massive amounts of data.

[–] S410@lemmy.ml 3 points 10 months ago* (last edited 10 months ago) (3 children)

The act of learning is absorbing and using massive amounts of data. Almost any child can, for example, re-create copyrighted cartoon characters in their drawing or whistle copyrighted tunes.

If you look at, pretty much, any and all human created works, you will be able to trace elements of those works to many different sources. We, usually, call that "sources of inspiration". Of course, in case of human created works, it's not a big deal. Generally, it's considered transformative and a fair use.

[–] Barbarian@sh.itjust.works 15 points 10 months ago* (last edited 10 months ago) (3 children)

I really don't understand this whole "learning" thing that everybody claims these models are doing.

A Markov chain algorithm with different inputs of text and the output of the next predicted word isn't colloquially called "learning", yet it's fundamentally the same process, just less sophisticated.

They take input, apply a statistical model to it, generate output derived from the input. Humans have creativity, lateral thinking and the ability to understand context and meaning. Most importantly, with art and creative writing, they're trying to express something.

"AI" has none of these things, just a probability for which token goes next considering which tokens are there already.
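For readers unfamiliar with the comparison, here is a minimal sketch of the kind of Markov chain text predictor described above. It is only an illustration: the toy corpus and the function names (`train_bigram_model`, `next_word_probabilities`, `generate`) are made up for this example, and it looks at a single previous word, whereas real language models condition on far more context with very different machinery.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def next_word_probabilities(model, word):
    """Turn the raw follower counts for `word` into probabilities."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(model, start, length, rng):
    """Sample up to `length` words, each chosen only from the previous word."""
    out = [start]
    for _ in range(length - 1):
        probs = next_word_probabilities(model, out[-1])
        if not probs:  # no observed follower: stop early
            break
        candidates = list(probs)
        weights = [probs[w] for w in candidates]
        out.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(out)


# Toy corpus, purely hypothetical.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram_model(corpus)
print(next_word_probabilities(model, "the"))
print(generate(model, "the", 6, random.Random(0)))
```

Everything the sketch can output is a recombination of word pairs it saw in the corpus, which is the property the comment is pointing at. With an empty corpus it has nothing to predict from at all.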

[–] sus@programming.dev 5 points 10 months ago* (last edited 10 months ago)

I don't think "learning" is a word reserved only for high-minded creativeness. Just rote memorization and repetition is sometimes called learning. And there are many intermediate states between them.

[–] testfactor@lemmy.world 3 points 10 months ago (1 children)

Out of curiosity, how far do you extend this logic?

Let's say I'm an artist who does fractal art, and I do a line of images where I take jpegs of copyright-protected art and use the data as a seed to my fractal generation function.

Have I then, in that instance, taken a copyrighted work and simply applied some static algorithm to it and passed it off as my own work, or have I done something truly transformative?

The final image I'm displaying as my own art has no meaningful visual cues to the original image, as it's just lines and colors generated using the image as a seed, but I've also not applied any "human artistry" to it, as I've just run it through an algorithm.

Should I have to pay the original copyright holder?
If so, what makes that fundamentally different from me looking at the copyrighted image and drawing something that it inspired me to draw?
If not, what makes that fundamentally different from AI images?
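To make the hypothetical concrete, here is a rough sketch of what "using the image data as a seed" could look like. Everything in it is an assumption for illustration: the placeholder bytes stand in for a real JPEG file, and `seed_from_bytes` and `julia_ascii` are invented names. It hashes the bytes into one integer, uses that to pick a Julia set parameter, and renders the fractal as text.

```python
import hashlib
import random


def seed_from_bytes(data: bytes) -> int:
    """Collapse arbitrary file bytes into one integer seed via SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def julia_ascii(seed: int, width: int = 40, height: int = 20, max_iter: int = 30) -> str:
    """Render a small Julia set whose parameter c is chosen by the seed."""
    rng = random.Random(seed)
    c = complex(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    shades = " .:-=+*#%@"
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the character grid onto the square [-1.5, 1.5] x [-1.5, 1.5].
            z = complex(-1.5 + 3.0 * x / (width - 1), -1.5 + 3.0 * y / (height - 1))
            n = 0
            while abs(z) <= 2.0 and n < max_iter:
                z = z * z + c
                n += 1
            # Shade each cell by how many iterations it took to escape.
            row.append(shades[n * (len(shades) - 1) // max_iter])
        rows.append("".join(row))
    return "\n".join(rows)


# Placeholder bytes standing in for the contents of an image file.
image_bytes = b"placeholder standing in for the bytes of a JPEG file"
art = julia_ascii(seed_from_bytes(image_bytes))
print(art)
```

In this version the source file influences the output only through a single hashed number, so nothing visually recognizable from the original survives. Whether that makes the result transformative is exactly the question being asked.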

[–] LWD@lemm.ee 2 points 10 months ago (1 children)

what makes [me looking at the copyrighted image and drawing something that it inspired me to draw] fundamentally different from AI images?

Because you can be inspired, and a machine cannot? We don't give copyrights to monkeys, let alone fancy calculators.

[–] testfactor@lemmy.world 2 points 10 months ago (1 children)

I feel like you latched on to one sentence in my post and didn't engage with the rest of it at all.

That sentence, in your defense, was my most poorly articulated, but I feel like you responded devoid of any context.

Am I to take it, from your response, that you think that a fractal image that uses a copyrighted image as a seed to its random number generator would be copyright infringement?

If so, how much do I, as the creator, have to "transform" that base binary string to make it "fair use" in your mind? Are random bit flips sufficient?
If so, how is me doing that different than having the machine do that as a tool? If not, how is that different than me editing the bits using a graphical tool?

[–] LWD@lemm.ee 2 points 10 months ago (1 children)

That's only because I thought your last sentence was the biggest difference -- everything else is all stuff you did (or theoretically would do), which is the clincher.

(And besides, on Lemmy, comments with effort are sometimes disincentivized 😉)

Art can include buying a toilet and turning it on its side and calling it a fountain. And I imagine, in your scenario, that you could process an entire comic book by flipping just one pixel on each page, print it out, arrange it in a massive mural, and get it featured in the Louvre with the title "is this fair use?" But if you started printing out comic books en masse with the intent to simply resell them in their slightly changed form, you might get in trouble, and probably rightly so. But that's a question of fair use, isn't it?

[–] testfactor@lemmy.world 0 points 10 months ago (1 children)

Fair on all counts. I guess my counter then would be, what is AI art other than running a bunch of pieces of other art through a computer system, then adding some "stuff you did" (to use your phrase) via a prompt, and then submitting the output as your own art.

That's nearly identical to my fractal example, which I think you're saying would actually be fair use?

[–] LWD@lemm.ee 2 points 10 months ago (1 children)

As far as I know, courts have basically decided that things need to be created by a person first and foremost, not by, say, a monkey (and yes there was an attempt to copyright a monkey selfie). In the flipped pixel example I personally classified as art, there was a lot more transformation than simply flipping a pixel, to the point where it hopefully transformed the original into having a new and unique intent.

You could theoretically make a piece of art with generative AI in a similar way, but it's the human element of composition that would make it art (or, at the very least, something novel and not just regurgitated). In theory, you could pull all of the works of a single comic artist, input them into generative AI and do the exact same thing, making a mural of This Is Not Wally Wood or something.

But hopping onto a generative AI that's been trained with the works of countless artists (and by no other AI networks, because AI degenerates when it trains itself) and simply typing in a phrase... Well, at that point it's closer to pushing a button on a machine that flicks paint onto a canvas, and you didn't make the machine, and it's used by thousands of other people every day. Only so much paint flicking can be done before it's not particularly interesting or unique.

I think somebody made a relatively short (to me) video about whether AI art is even art...

[–] agamemnonymous@sh.itjust.works 2 points 10 months ago (1 children)

Humans have creativity, lateral thinking and the ability to understand context and meaning

What evidence do you have that those aren't just sophisticated, recursive versions of the same statistical process?

[–] Barbarian@sh.itjust.works 2 points 9 months ago* (last edited 9 months ago) (1 children)

I think the best counter to this is to consider the zero learning state. A language model or art model without any training data at all will output static, basically. Random noise.

A group of humans socially isolated from the rest of the world will independently create art and music. It has happened an uncountable number of times. It seems to be a fairly automatic emergent property of human societies.

With that being the case, we can safely say that however creativity works, it's not merely compositing things we've seen or heard before.

[–] agamemnonymous@sh.itjust.works 1 points 9 months ago

I disagree with this analysis. Socially isolated humans aren't isolated, they still have nature to imitate. There's no such thing as a human with no training data. We gather training data our whole life, possibly from the womb. Even in an isolated group, we still have others of the group to imitate, who in turn have ancestors, and again animals and natural phenomena. I would argue that all creativity is precisely compositing things we've seen or heard before.

[–] hellothere@sh.itjust.works 5 points 10 months ago

It's a question of scale. A single child cannot replace literally all artists, for example.

[–] Exatron@lemmy.world 2 points 9 months ago (1 children)

The problem is that a human doesn’t absorb exact copies of what it learns from, and fair use doesn't include taking entire works, shoving them in a box, and shaking it until something you want comes out.

[–] S410@lemmy.ml 0 points 9 months ago (1 children)

Except for all the cases when humans do exactly that.

A lot of learning is, really, little more than memorization: spelling of words, mathematical formulas, physical constants, etc. But, of course, those are pretty small, so they don't count?

Then there's things like sayings, which are entire phrases that only really work if they're repeated verbatim. You sure can deliver the same idea using different words, but it's not the same saying at that point.

To make a cover of a song, for example, you have to memorize the lyrics and melody of the original, exactly, to be able to re-create it. If you want to make that cover in the style of some other artist, you, obviously, have to learn their style: that is, analyze and memorize what makes that style unique. (e.g. C418 - Haggstrom, but it's composed by John Williams)

Sometimes the artists don't even realize they're doing exactly that, so we end up with "subconscious plagiarism" cases, e.g. Bright Tunes Music v. Harrisongs Music.

Some people, like Stephen Wiltshire, are very good at memorizing and replicating certain things; way better than you, I, or even current machine learning systems. And for that they're praised.
