this post was submitted on 26 Jul 2023
104 points (85.1% liked)

[–] fubo@lemmy.world 45 points 1 year ago* (last edited 1 year ago) (12 children)

The argument regarding the specific case of AI-generated images of real actors makes sense, but the headline overgeneralizes hugely.

If you write a book about carpentry, and someone checks that book out from the library, reads it, learns how to do carpentry from it, and goes into the carpentry business, they do not owe you a share of their profits.

It's nice if they give you credit. But they do not owe you a revenue stream.

If they are a robot, the same remains true.

[–] Gradually_Adjusting@lemmy.world 26 points 1 year ago* (last edited 1 year ago) (2 children)

Corollary: if a corporation scrapes the talk of the whole internet, which itself was shaped by the aggregate culture and knowledge of ten thousand years of human history, and their resultant product is an AI that can replace workers, it is morally valid to eminent-domain that shit and divert its profits to a fledgling UBI program.

Edit to add: Not a statement about how UBI should really work, just a throwaway comment about seizing means.

[–] d3Xt3r@lemmy.world 12 points 1 year ago* (last edited 1 year ago)

UBI should be a government initiative, and funding for it should be collected in the form of a tax, irrespective of AI. More and more humans are being replaced by automation and technology in general, and much of it happens so gradually that you don't notice it or think of it as a problem. Every past headline like "xx corporation has laid off hundreds/thousands of employees" had very little to do with AI, but a lot to do with technology and progress in general, plus many other factors. Every little new development can have a butterfly effect that's hard to calculate.

Neither AI nor the loss of jobs in general should be a factor in UBI funding. AI is just another technological development, maybe even a disruptive one, but it's nothing so new that we need to pick up our pitchforks against it.

As for compensating creative owners, that's a bigger discussion about IP protection and ownership in general, and the responsibility falls upon the IP owners (and maybe appropriate laws). For instance, we've seen news sites, science publishers etc. paywall their work, because they want to protect it and get compensated for viewership - and this has nothing to do with AI.

If people want compensation for their work, they should take appropriate measures to protect it, and/or come up with alternate revenue streams if paywalling is impossible (for instance, how some YouTubers seek sponsorships or Patreon donations). If people want to prevent their work from being stolen and redistributed, appropriate action should be taken against the persons/sites stealing it (e.g. via DMCA). It's not the AI's fault for eating up copyrighted content on public sites like pastebin.com or Scribd; it's the fault of the people uploading it.

[–] FaceDeer@kbin.social 6 points 1 year ago

UBI should not be dependent on its specific sources and specific destinations. It's universal, it's right in the name. It should be funded by a tax on the wealthy - regardless of how that wealth is obtained - and be issued to everyone.

The goal is not to "level the playing field" so that human employees can continue to labor and companies can't afford to hire robots to replace them. The goal is to make it so that if companies replace all their employees with robots those employees don't have to find some other job to continue living.

[–] pulaskiwasright@lemmy.ml 3 points 1 year ago (1 children)

If you write a book about carpentry, and someone checks that book out from the library, reads it

AI is not a person. That’s why its works aren’t eligible for copyright. You’re arguing that AI should have the same rights as a person in this regard and that’s not an established right, nor should it be.

[–] ForgetReddit@lemmy.world 11 points 1 year ago (1 children)

Also the analogy makes zero sense. It’s more accurate to say someone checks out a book about carpentry, reads it, then writes another book on carpentry by moving the words around a bit despite knowing nothing about carpentry.

[–] neblem@lemmy.world 2 points 1 year ago* (last edited 1 year ago) (1 children)

More accurately, it's like someone who knows nothing about German, writing, or carpentry learning German and carpentry by reading hundreds of thousands of books, and then deciding to write a book about carpentry in German.

[–] pulaskiwasright@lemmy.ml 5 points 1 year ago* (last edited 1 year ago)

The AI still doesn't learn carpentry. It just knows how books about carpentry generally read.

[–] phillaholic@lemm.ee -2 points 1 year ago (1 children)

I’m not sure that’s a fair comparison. You wouldn’t instantly ingest that information and know it. It’s more like photocopying a book and including it in another book that you sell. It’s a paradigm shift, and I’m not sure what the answer is.

[–] dorkian_gray@lemmy.world 3 points 1 year ago* (last edited 1 year ago) (1 children)

It's nothing like photocopying a book. It is very, very similar to the analogy given above, of someone learning the information and profiting from it. For the AI model to "learn" the information during training, it takes the text apart one word fragment at a time and reorganises what it learns for quick access. The learned information is stored as numbers, not text: OpenAI's embedding models, for example, represent a piece of text as a vector with roughly 1536 dimensions - something like 1536 "tags", so to speak, categorising what was learned.
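To make the "takes the text apart one word fragment at a time" step concrete, here's a minimal sketch using OpenAI's open-source tiktoken tokenizer (the same byte-pair-encoding scheme GPT models use); the example string is mine, not from the thread:

```python
# Minimal tokenization sketch using OpenAI's open-source tiktoken library.
# Shows how text is split into sub-word fragments (integer token IDs), which
# is the form a model "reads" during training - not whole words or pages.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/4-era models

text = "Copyright law protects the order of words."
token_ids = enc.encode(text)
fragments = [enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
             for t in token_ids]

print(token_ids)   # integer IDs, not text
print(fragments)   # sub-word pieces like 'Copyright', ' law', ' protects', ...
```

The model's weights only encode statistical relationships between fragments like these; the original passage itself is not stored.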

Copyright of words has the order of those words as an integral part of the legal standard, and the standards for what infringes are actually pretty strict (https://fairuse.stanford.edu/2003/09/09/copyright_protection_for_short/). Training an AI is definitively transformative work which does not retain the order of the words in the finished product, merely a weighted likelihood of what word fragment will come next in a given context, so it's protected under Fair Use.

[–] phillaholic@lemm.ee -2 points 1 year ago (1 children)

I don’t think it’s that simple. Like I said, it’s a paradigm shift; it doesn’t fit into existing laws well. My point is that what we consider fair use now - a human summarizing a book or movie - is based on the limited abilities of humans. When you have AI with near-limitless capacity, that will change things. The same rules and considerations may have to be rethought.

[–] dorkian_gray@lemmy.world 3 points 1 year ago

Au contraire, it is that simple, and it is covered by existing law just fine in the very specific case we're talking about: whether training a model is "transformative work" by the definition in IP law. It is. The law looks very specifically at the facts of the case, not hand-waving masquerading as an argument.

You are making this technology out to be something it isn't; there's no mystery to how AI works, and it does not have "limitless abilities". In fact, it is very limited, but that isn't relevant. What the law considers "fair use" isn't based on human ability at all, it's based on how completely the work is reproduced and the context the original work is being used in. You clearly have access to the internet, you can verify the standards required to show breach of copyright yourself if you don't believe me.

[–] scarabic@lemmy.world -4 points 1 year ago

Analogies to humans are not relevant, and yours is a bad one anyway. LLMs don’t read a carpentry book and then go build houses. They chew up carpentry books and spit out carpentry books.

Your final line remains to be established in court.

[–] silence7@slrpnk.net -4 points 1 year ago (3 children)

A key difference is that AI models tend to contain actual pieces of the training data, and on occasion regurgitate it. Kind of like randomly reproducing parts of the book during the course of your career as a carpenter. That's the kind of thing that actually results in copyright lawsuits and damages when real people do it. AI shouldn't be getting a pass here.

[–] fubo@lemmy.world 6 points 1 year ago (13 children)

Oh sure, if a copyright holder can demonstrate that a specific work is reproduced. Not just "I think your AI read my book and that's why it's so good at carpentry."

[–] FaceDeer@kbin.social 4 points 1 year ago

That article doesn't show what you think it shows. There was a lot of discussion of it when it first came out and the examples of overfitting they managed to dig up were extreme edge cases of edge cases that took them a huge amount of effort to find. So that people don't have to follow a Reddit link, from the top comment:

They identified images that were likely to be overtrained, then generated 175 million images to find cases where overtraining ended up duplicating an image.

We find 94 images are extracted. [...] [We] find that a further 13 (for a total of 109 images) are near-copies of training examples

They're purposefully trying to generate copies of training images using sophisticated techniques to do so, and even then fewer than one in a million of their generated images is a near copy.
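For scale, that "fewer than one in a million" figure follows directly from the paper's own numbers quoted above; a quick back-of-the-envelope check:

```python
# Sanity check on the "fewer than one in a million" claim, using the numbers
# quoted above: 109 near-copies found among 175 million adversarial generations.
near_copies = 109
generated   = 175_000_000
print(near_copies / generated)        # ≈ 6.2e-07
print(near_copies / generated * 1e6)  # ≈ 0.62 near-copies per million images
```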

And that's on an older version of Stable Diffusion trained on only 160 million images. They actually generated more images than were used to train the model.

Overfitting is an error state. Nobody wants to overfit on any of the input data, and so the input data is sanitized as much as possible to remove duplicates to prevent it. They had to do this research on an early Stable Diffusion model that was already obsolete when they did the work because modern Stable Diffusion models have been refined enough to avoid that problem.
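Deduplicating the training set is the standard defence against this kind of memorization. A toy sketch of the idea (exact-hash matching only - real pipelines also catch *near*-duplicates such as resized or re-encoded copies, using perceptual hashing or embedding similarity; the directory name is hypothetical):

```python
# Toy illustration of training-set deduplication: drop files whose raw bytes
# hash identically, so no image is seen twice during training.
import hashlib
from pathlib import Path

def dedupe(image_dir: str) -> list[Path]:
    seen: set[str] = set()
    kept: list[Path] = []
    for path in sorted(Path(image_dir).iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:  # first time we've seen these exact bytes
            seen.add(digest)
            kept.append(path)
    return kept

# kept = dedupe("training_images/")  # hypothetical directory name
```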

If I were to read a carpentry book and then publish my own, "regurgitating" most of the original text, that would be plagiarism and I should be sued. Furthermore, if I were to write a song using the same melody as another copyrighted song, I'd get sued and lose, even if I could somehow prove I'd never heard the original.

I think the same rules should apply to AI-generated content. One rule I would like to see (and the US Copyright Office has in fact taken this position) is that AI-generated content cannot be copyrighted. Otherwise AI could truly replace humans from a creative perspective, and it would be a race to generate as much content as possible.

[–] Taleya@aussie.zone -5 points 1 year ago (1 children)

AI isn't learning how to do carpentry though. It's simply including my work in an aggregate pool that it now claims as its own.

[–] FaceDeer@kbin.social 7 points 1 year ago

It is not. The AI's model does not contain a copy of your work; there is no "aggregate pool." AI is not some sort of magical compression algorithm that's able to somehow crush whole images down to less than a byte of data. The only things it "includes" in itself are the concepts it learned from your work. Those are ideas, which are not copyrightable.
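That "less than a byte" point is a genuine back-of-the-envelope calculation, not rhetoric. Using approximate public figures for Stable Diffusion v1.x (a ~2 GB half-precision checkpoint, trained on roughly 2.3 billion LAION images - both numbers approximate and used only for scale):

```python
# Rough ratio of model size to training-set size for Stable Diffusion v1.x.
checkpoint_bytes = 2 * 1024**3     # ~2 GB half-precision checkpoint (approx.)
training_images  = 2_300_000_000   # ~2.3 billion images, LAION-scale (approx.)

print(checkpoint_bytes / training_images)  # ≈ 0.93 bytes per training image
```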

[–] Ocelot@lemmies.world 22 points 1 year ago (1 children)

If I post some work publicly on the internet (like open source code) so that an AI is able to scrape it, why in the hell should I expect to get paid for it?

[–] ParsnipWitch@feddit.de -1 points 1 year ago (1 children)

Artists need to show their works to find clients.

[–] fedev@lemmy.world 1 points 1 year ago

Maybe they have to do it the Twitter way and showcase their work behind a registration page - or, even better, there could be an implementation of the robots.txt file but for AI crawlers.
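Something close to this exists in practice: major AI crawlers identify themselves with user-agent strings that an ordinary robots.txt can target. A sketch (GPTBot and CCBot are the crawler names documented by OpenAI and Common Crawl respectively; note that compliance is voluntary on the crawler's part):

```
# robots.txt - ask known AI training crawlers not to fetch anything
User-agent: GPTBot    # OpenAI's web crawler
Disallow: /

User-agent: CCBot     # Common Crawl, a frequent source of training data
Disallow: /
```

Unlisted user agents (e.g. ordinary search engines) are unaffected, since the default for agents without a matching rule is full access.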

Still, there are countless ways in which a reproduction could be leaked. I could buy a painting, which I then own, take a picture of it, and upload it to a public location. Same for a book.

But I tend to agree that if the model generates an image or text that bears only traces of the original work, then no compensation should be needed.

[–] tal@kbin.social 11 points 1 year ago

I don't see the argument for it. The same bar doesn't apply to humans who train their minds on other human works.

[–] iegod@lemm.ee 2 points 1 year ago (1 children)
[–] AgentCorgi@lemmy.world 4 points 1 year ago (1 children)

Is your work worth paying for? Put it behind a paywall if it’s that valuable. They will reach out to you.

[–] ParsnipWitch@feddit.de -1 points 1 year ago

As an artist you need to show your works to get commissions and clients.

[–] Imgonnatrythis@lemmy.world 2 points 1 year ago

Lemmy: this. Also Lemmy: what is y'allz favorite Warez site?

[–] Asudox@lemmy.world 1 points 1 year ago

Imagine if AI humans became real and they had to work for a lifetime to pay off their debt to all the people whose effort and property their training data was made from.

[–] l0v9ZU5Z@feddit.de -1 points 1 year ago* (last edited 1 year ago) (1 children)

Did you pay the author of every book you read?

[–] BobKerman3999@feddit.it 6 points 1 year ago (2 children)

Yes, that's what buying a book is

[–] l0v9ZU5Z@feddit.de 2 points 1 year ago

I use the library or libgen

[–] scarabic@lemmy.world 1 points 1 year ago (1 children)

Yep.

“dO yOu PaY fOR boOkS?”

It’s like… tell me you didn’t go to college without telling me you didn’t go to college.

[–] l0v9ZU5Z@feddit.de 2 points 1 year ago* (last edited 1 year ago)

I went to university but never had to buy a book. My university library offered all the books I needed, plus online access to research articles from Springer, Elsevier, and so on. You can get access as a regular person without being an enrolled student.

[–] candio@lemmy.world -1 points 1 year ago (1 children)
[–] Imgonnatrythis@lemmy.world 1 points 1 year ago

Do you think they require your email address and phone number and don't plan to target you with ads?

[–] Bishma@discuss.tchncs.de -2 points 1 year ago* (last edited 1 year ago)

It should also be taxed as labor.

Edit: To clarify, I mean the company in control of the AI should pay some equivalent to income tax for using AI instead of a person.
