this post was submitted on 23 Dec 2024

229 points (100.0% liked)

Technologist: 'Fining Big Tech isn't working, make them give away illegally trained LLMs as public domain' (www.theregister.com)

submitted 2 days ago by thelucky8@beehaw.org to c/technology@beehaw.org

Opinionated article by Alexander Hanff, a computer scientist and privacy technologist who helped develop Europe's GDPR (General Data Protection Regulation) and ePrivacy rules.

We cannot allow Big Tech to continue to ignore our fundamental human rights. Had such an approach been taken 25 years ago in relation to privacy and data protection, arguably we would not have the situation we have today, where some platforms routinely ignore their legal obligations to the detriment of society.

Legislators did not understand the impact of weak laws or weak enforcement 25 years ago, but we have enough hindsight now to ensure we don’t make the same mistakes moving forward. The time to regulate unlawful AI training is now, and we must learn from mistakes past to ensure that we provide effective deterrents and consequences to such ubiquitous law breaking in the future.

[–] GenderNeutralBro@lemmy.sdf.org 11 points 2 days ago (1 children)

I guess the idea is that the models themselves are not infringing copyright, but the training process DID. Some of the big players have admitted to using pirated material in training data. The rest obviously did even if they haven't admitted it.

While language models have the capacity to produce infringing output, I don't think the models themselves are infringing (though there are probably exceptions). I mean, gzip can reproduce infringing material too with the correct input. If producing infringing work requires both the algorithm AND specific, intentional user input, then I don't think you should put the blame solely on the algorithm.
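To make the gzip comparison concrete, here is a minimal sketch using Python's standard `gzip` module. The `round_trip` helper and the `sample` text are hypothetical names made up for illustration. It only shows that a lossless compressor hands back exactly the bytes the user supplied; it says nothing about how a language model behaves.

```python
import gzip


def round_trip(data: bytes) -> bytes:
    """Compress and then decompress; gzip is lossless, so output equals input."""
    return gzip.decompress(gzip.compress(data))


# Hypothetical placeholder for whatever text a user feeds in.
sample = b"Any text the user supplies, copyrighted or not. " * 20

restored = round_trip(sample)
compressed_size = len(gzip.compress(sample))

# The algorithm ships with none of this text; it reproduces only its input.
print(restored == sample)
print(compressed_size, "compressed bytes from", len(sample), "original bytes")
```

Whatever comes out was determined entirely by what went in, which is the sense in which the tool on its own is not the infringing party.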

Either way, I don't think existing legal frameworks are suitable to answer these questions, so I think it's more important to think about what the law should be rather than what it currently is.

I remember stories about the RIAA suing individuals for many thousands of dollars per mp3 they downloaded. If you applied that logic to OpenAI — maximum fine for every individual work used — it'd instantly bankrupt them. Honestly, I'd love to see it. But I don't think any copyright holder has the balls to try that against someone who can afford lawyers. They're just bullies.

[–] p03locke@lemmy.dbzer0.com 6 points 2 days ago* (last edited 2 days ago) (1 children)

I guess the idea is that the models themselves are not infringing copyright, but the training process DID.

I'm still not understanding the logic. Here is a copyrighted picture. I can search for it, download it, view it, see it with my own eyeballs. My browser already downloaded the image for me, in order for me to see it in the browser. I can take that image and edit it in a photo editor. I can do whatever I want with the image on my own computer, as long as I don't publish the image elsewhere on the internet. All of that is legal. None of it infringes on copyright.

Hell, it could be argued that if I transform the image to a significant degree, I can still publish it under Fair Use. But, that still gets into a gray area for each use case.

What is not a gray area is what AI training does. They download the image and use it in training, which is like me looking at a picture in a browser. The image isn't republished, or stored in the published model, or represented in any way that could be reconstructed back to the source image in any reasonable form. It just changes a bunch of weights in a model. It's mathematically impossible for a 4GB model to somehow store the many, many terabytes of images on the internet.
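The size argument can be put as back-of-the-envelope arithmetic. This is a rough sketch: the 4 GB figure is from the comment, while the image count and the average image size are hypothetical round numbers chosen only for illustration. It gives an average storage budget per image and does not rule out a model memorizing some individual items.

```python
def bytes_per_item(model_bytes: int, n_items: int) -> float:
    """Average number of model bytes available per training item."""
    return model_bytes / n_items


MODEL_BYTES = 4 * 10**9        # 4 GB model, the figure used in the comment
N_IMAGES = 2 * 10**9           # hypothetical: 2 billion training images
AVG_IMAGE_BYTES = 100 * 10**3  # hypothetical: 100 kB per image

budget = bytes_per_item(MODEL_BYTES, N_IMAGES)
dataset_bytes = N_IMAGES * AVG_IMAGE_BYTES

print(budget)                  # 2.0 bytes of model per image, on average
print(dataset_bytes / 10**12)  # 200.0 TB of raw training images
```

Under these assumed numbers the model is about 50,000 times smaller than the data it was trained on.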

Where is the copyright infringement?

I remember stories about the RIAA suing individuals for many thousands of dollars per mp3 they downloaded. If you applied that logic to OpenAI — maximum fine for every individual work used — it’d instantly bankrupt them. Honestly, I’d love to see it. But I don’t think any copyright holder has the balls to try that against someone who can afford lawyers. They’re just bullies.

You want to use the same bullshit tactics and unreasonable math that the RIAA used in their court cases?

[–] Semjaza@lemmynsfw.com 3 points 2 days ago (1 children)

If you take that image, copy it and then try to resell it for profit you'll find you're quickly in breach of copyright.

The LLM is, in most cases, being licensed out to users for a profit off of the input data without which it could not exist in its current form.

You could see it as akin to plagiarism if you think ctrl+c, ctrl+v is too extreme.

[–] p03locke@lemmy.dbzer0.com 1 points 1 day ago (1 children)

If you take that image, copy it and then try to resell it for profit you’ll find you’re quickly in breach of copyright.

That's not what's happening. Did you even read my comment?

[–] Semjaza@lemmynsfw.com 1 points 1 day ago (1 children)

OK, if you ignore the hyperbole of my aggressive, pre-Christmas-stress start, how much of the rest do you disagree with?

Less combatively, my stance is: just make AI-generated materials exempt from copyright, and you'll at least limit mass adoption in public-facing things by big money. Doesn't address all the issues, though.

[–] p03locke@lemmy.dbzer0.com 3 points 1 day ago* (last edited 1 day ago) (1 children)

AI-generated materials are already exempt from copyright. They fall under the same arguments as the monkey selfie. Which is great.

Crack copyright like a fucking egg. It only benefited the rich, anyway.

[–] Semjaza@lemmynsfw.com 1 points 1 day ago

That's good, and I'm glad to have been informed of it.

Thank you.

My preferred copyright change is a term of 17 years from first publication. Feels maybe still a little long, but much better than what we have now.