this post was submitted on 07 Dec 2023

287 points (97.4% liked)

Meta’s new AI image generator was trained on 1.1 billion Instagram and Facebook photos (arstechnica.com)

submitted 11 months ago by throws_lemy@lemmy.nz to c/technology@lemmy.world

48 comments

[–] Esqplorer@lemmy.zip 18 points 11 months ago (1 children)

I wonder how they worked around user violations of copyright... Imagine all the content uploaded to Instagram/Facebook that the poster didn't create but simply downloaded or screenshotted and then re-uploaded.

[–] Mahlzeit@feddit.de -2 points 11 months ago (1 children)

That shouldn't be an issue. If you look at an unauthorized image copy, you're not usually on the hook (unless you are intentionally pirating). It's unlikely that they needed to get explicit "consent" (i.e., license the images) in the first place.

[–] GiveMemes@jlai.lu 7 points 11 months ago (1 children)

Yeah, but is it the same thing for a human to view data as for an AI model to be trained on it? Not in my opinion, as an AI doesn't understand the concept of intellectual property and just spits out the most likely next word, whereas a person can recognize when they are copying something.

[–] Mahlzeit@feddit.de -1 points 11 months ago (1 children)

I understand. The idea would be to hold AI makers liable for contributory infringement, reminiscent of the Betamax case.

I don't think that would work in court. The argument is much weaker here than in the Betamax case, and even then it didn't convince. But yes, it's prudent to get the explicit permission, just in case of a case.

[–] GiveMemes@jlai.lu 4 points 11 months ago* (last edited 11 months ago) (1 children)

Doesn't really seem similar to me at all. One is a thing that's actively making new content. The other is a machine with the purpose of time-shifting broadcast content that's already been paid for.

It's reminiscent insofar as personal AI models on individual machines go, but completely different for corporate and monetizable usage.

Like, if somebody sold you an AI box that you had to train yourself, that would be reminiscent of the Betamax case.

[–] Mahlzeit@feddit.de 0 points 11 months ago (1 children)

Yes, if it's new content, it's obviously no copy; so no copyvio (unless derivative, like fan fiction, etc.). I was thinking of memorized training data being regurgitated.

[–] GiveMemes@jlai.lu 3 points 11 months ago* (last edited 11 months ago) (1 children)

Yeah, I just think that ingesting a bunch of novels and rearranging their contents into a new piece of work (for example) is still copyright infringement. It doesn't need to be the Lord of the Rings or Star Wars word for word to get copyright stricken. Similar to how in the music sphere it doesn't need to be the same exact melody.

Edit: Glad you downvoted instead of responding. Really shows the strength of your argument...

[–] Mahlzeit@feddit.de 1 points 11 months ago (1 children)

I didn't downvote you. (Just gave you an upvote, though.) You're reasonable and polite, so a downvote would be very inappropriate. Sorry for that.

Music is having ongoing problems with copyright litigation, like Ed Sheeran most recently. From what I have read, it's blamed on juries without the necessary musical background. As far as I know, higher courts usually strike down these cases, as with Sheeran. Hip hop was neutered, in a blow to (African-)American culture. While it was obviously wrong not to find for fair use in that case, samples are copies.

It's not so bad outside of music. You can write books on "how to write a bestseller", or "how to draw comics" without needing permission. Of course, you would study many novels and images to get material. The purpose of books is that we learn from them. That we go on to use this to make our own thing is intended (in the US).

What you're proposing there would be a great change to copyright law and probably disastrous. Even if one could limit the immediate effect to new technologies, it would severely limit authors in adopting these technologies.

[–] GiveMemes@jlai.lu 2 points 11 months ago (1 children)

I'm arguing that AI and a human are doing different things when they 'learn'. A human learns. At the end of the day, AI isn't doing anything near human intelligence and therefore isn't critically thinking and applying that information to create new ideas; instead it is directly copying it based on what it thinks is most likely to come next.

Therefore a human is actually creating new material whereas AI can only rehash old material. It's the same problem of training AI on AI generated content. It makes any faults worse and worse over time because nothing 'new' is created.

At least with current AI tech.

[–] Mahlzeit@feddit.de 0 points 11 months ago (1 children)

Well, that is a philosophical or religious argument. It's somewhat reminiscent of the claim that evolution can't add information. That can't be the basis for law.

In any case, that you see it that way doesn't matter to copyright law as it stands. The AI is the equivalent of that book on how to write bestsellers in my earlier reply. People extract information from copyrighted works to create new works, without needing permission. A closer example is programmers, who look into copyrighted references while they create.

[–] GiveMemes@jlai.lu 2 points 11 months ago (2 children)

Except that it's objectively different.

A closer example would be a programmer copying somebody else's code line for line but switching the order of some things around and calling it their own creation.

AI cannot think nor add to work. It cannot extract information in order to answer a question. It is spitting out an exact copy of what was ingested because that is the scenario the system decided was "correct".

If AI could parse information and actually create new intellectual property like a human, I'd find it reasonable, but as it stands it's just spitting out previous work.

[–] Mahlzeit@feddit.de 1 points 11 months ago (1 children)

Can we get back to this? I am confused why you believe that AIs like ChatGPT spit out "exact copies". That they spit out memorized training data is unusual in normal operation. Is there some misunderstanding here?

[–] GiveMemes@jlai.lu 1 points 11 months ago* (last edited 11 months ago)

I don't think we're really talking to each other, but more past each other so I took a break.

To answer the question, it was an analogy, and the ransomware part was meant to show the non-intelligence and lack of creativity of AI rather than to be applied to the programming analogy. Sorry if that was confusing.

It was an Ars Technica (iirc) article I read in which the author made working ransomware with GPT-4 by having it initially create a program to encrypt a file, then had it encrypt directories instead, then added flags and debugged it, all of which he claims can be done by pretty much anyone malicious with access. Nowhere along the way did ChatGPT realize what it was doing, though. A human would have.

Also, ime at least, I got completely copy-and-pasted paragraphs from GPT-3.5 a few times; dunno how much 4 has improved upon that.

I think my disagreement with you about AI copyright infringement is that you think that AI can create new things whereas I don't think that. I think the way I do because it can only ever rehash its training data. Our current AI systems can't actually create new thoughts. For example, with your 'how to write a book' author analogy, those people haven't just read people's advice and are now putting it on paper. Those people have also read tons and tons of novels. Taken classes on English and created and defended original ideas as part of that. If you trained an AI on English classes and novels it would have no idea how to write a "how to write a novel" type book while a person would. You have to have it copy something in order for it to perform, it's just the way that it works.

Furthermore it really wouldn't take a huge change to copyright law, just clear differences between the rules that apply to sentient vs non-sentient sources.

[–] Mahlzeit@feddit.de 0 points 11 months ago (1 children)

Well, that's simply not true.

[–] GiveMemes@jlai.lu 2 points 11 months ago (1 children)

You can say that without explaining but you just look like an idiot.

It's the same reason GPT-4 will write you working ransomware without ever noticing that it's writing ransomware. The AI doesn't understand what's going on. It just does what it does because of a virtual cookie based on a calculated score.

[–] Mahlzeit@feddit.de 0 points 11 months ago

Ok, where did GPT-4 copy the ransomware code? You can't reshuffle lines of code much before the program breaks. Should be easy to find.