this post was submitted on 01 Sep 2023

Technology

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago
AdminWorker@lemmy.ca 19 points 1 year ago

I said this in a different post's comments about Facebook scraping data:

Can ActivityPub change its terms to say that all crawlers using it must be GNU-licensed open source, and that all crawled information must be made available to the public through GNU-licensed open-source software (no crawling on behalf of a private enterprise)?

My understanding is that all the big tech companies are scared of what happened with router software (OpenWrt), and they don't want GNU licensing to force them into letting a FOSS community become their competition.

Radiant_sir_radiant@beehaw.org 9 points 1 year ago

I share your sentiment, but personally I don't like the GPL's Borg-like assimilation of anything it touches.

How about "every crawler using the API must provide the same API free of charge for "?

anlumo@feddit.de 2 points 1 year ago

Meta has no problems providing API access free of charge, since their income comes from other sources.

PrincipleOfCharity@0v0.social 3 points 1 year ago

I have also thought this is a good idea. I think the ActivityPub standard should have a required field that lists a copyright license. Then a copyleft-style license should be created that allows storing and indexing for distribution via open-source standards but disallows use for AI training and data scraping. If every single post carried a copyleft license, it would be risky for big tech to repurpose that content, because if a whistleblower called them out, it could lead to a huge class-action suit.

A good question is whether a single post can be copyrighted. I think it could. Perhaps you could consider each post a collaborative work of art: people keep adding to it, and at the end of the day the whole chain could function as a "work", especially since some post threads contain a lot of useful value and knowledge.

tesseract@beehaw.org 1 points 1 year ago

If that worked, we could easily have prevented AI companies from vacuuming up data from personal websites and separately hosted Git repos. We could add a condition that if they train their models on our data, then the model and its weights automatically fall under the same license as our content. Of course, those psychopaths are going to use their money to defeat such arguments in court.