this post was submitted on 14 Dec 2023
Fediverse
A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc.).
If you wanted to get help with moderating your own community then head over to !moderators@lemmy.world!
Rules
- Posts must be on topic.
- Be respectful of others.
- Cite the sources used for graphs and other statistics.
- Follow the general Lemmy.world rules.
Learn more at these websites: Join The Fediverse Wiki, Fediverse.info, Wikipedia Page, The Federation Info (Stats), FediDB (Stats), Sub Rehab (Reddit Migration), Search Lemmy
You can scrape Lemmy instances for training data without even running an instance.
Yeah, sorry if I'm not great at communicating. That's exactly what I was trying to point out when I said:
That's the thing, anything public is fair game. This is why Reddit is ruining their API.
It's not fair game for for-profit businesses training LLMs. That's part of why Reddit made the move: so that companies would need to pay Reddit for access to the data in order to train models legally.
They changed the terms and made the API paid for high-volume use. People using it to train models have already pillaged what they need, and you can get the data from before APIgeddon elsewhere.
Sure, but it's still true that there are legal protections we can add that make it not fair game for Lemmy. At best it would be unfair game (illegal scraping of Lemmy).
A rule for one Lemmy instance, or even for the Lemmy app, doesn't mean the same rule applies across ActivityPub federation. If your data federated to my instance, it's mine too.
It can apply across all of them; for example, that's how copyleft works.
In other words, fair game.
What? I'm saying every federated copy must legally carry the usage restrictions. Just because it's copied doesn't mean it can go into a for-profit LLM.
There is no licensing in the protocol so anything you put out there is free.
https://www.w3.org/TR/2018/REC-activitypub-20180123/
If we serve licensed content over SSH or HTTPS, it's still licensed. Protocols don't change the legal requirements attached to the data. Warner Bros. will still sue if one of their movies is hosted on a server using the ActivityPub protocol.