this post was submitted on 24 Feb 2024
15 points (89.5% liked)

sh.itjust.works Main Community

Home of the sh.itjust.works instance.


With the latest announcement that Google is allegedly paying Reddit $60 million per year for access to user-created content to train its AI, what is stopping companies from using the freely available information on the Lemmyverse to do the same for free?

How does everyone feel about the likelihood of this already happening and should something be done about it?

[–] Ziggurat@sh.itjust.works 4 points 8 months ago (4 children)

Technically, copyright stops them. I know the whole copyright debate on AI training hasn't been settled. But when you sign a contract with Reddit or Dropbox, I assume it includes a licence to use the content to train AI.

Here on Lemmy, I never gave my instance a licence to reuse my content, and I keep full copyright on it.

Well, I know nobody cares about copyright, but there is a difference between OP downloading a torrent of My Little Pony and a company making tons of money out of it. Remember that the Pirate Bay founder got jail time.

[–] cosmic_skillet@lemmy.ml 1 points 8 months ago (1 children)

Do you keep full copyright of your posts and comments here? Especially given the federated international nature of the platform, I'm not clear on how copyright works on Lemmy.

[–] Ziggurat@sh.itjust.works 2 points 8 months ago (1 children)

IANAL, but I don't see why you wouldn't

  • At the moment you create intellectual content, you have copyright on it.

  • Many Lemmy instances haven't filled in the legal terms-of-service section, and the ones that did don't include a "You grant us a perpetual commercial licence on your content" clause. Therefore, they don't have the right to share your content without your consent. A picky lawyer might even argue that you never agreed to your content being federated (but could also argue that it's implicit when publishing to the federation).

So, with my limited understanding of copyright, an AI company scraping Lemmy's data would potentially be infringing copyright (well, there is an ongoing legal case against OpenAI, so we'll find out whether AI training counts as re-using copyrighted data). That said, I have no doubt that it's occurring. Not only would I struggle to identify my content in an AI model (someone writing some Frenglish while forgetting the plural s and mixing up some letters on the keyboard? That could be a lot of people), but lawyers are expensive, and I have better things to do with my money.

Judging by the kind of content we have on the fedi, I can't wait to see AI saying stuff like "Eat the rich", "Blahaj is so cuuuuuuuuttte ewewewew", "There is no OS but GNU and Stallman is the prophet", or "Capitalism is the problem, we need to re-establish the dictatorship of the proletariat". It would at least be fun.

[–] AlligatorBlizzard@sh.itjust.works 2 points 8 months ago (1 children)

Judging by the kind of content we have on the fedi, I can't wait to see AI saying stuff like "Eat the rich", "Blahaj is so cuuuuuuuuttte ewewewew", "There is no OS but GNU and Stallman is the prophet", or "Capitalism is the problem, we need to re-establish the dictatorship of the proletariat". It would at least be fun.

If someone did create an LLM using fedi content and let it loose in the comments, I wonder how long it would take for people to realize it's a bot? I'm sure not flagging it as a bot is a violation of most instances' rules, and its existence would probably upset some people, but it's still a fun question.

[–] Gullible@sh.itjust.works 1 points 8 months ago

No one would notice. At worst, people would accuse it of trolling as it doubles down on factual inaccuracies. It may, and I say this without any irony, already be here and blending in. Paper books are the future.
