this post was submitted on 09 Jun 2023
127 points (100.0% liked)

Chat

Give me a hive five. Beehaw, pardners!

tetris11@lemmy.ml 3 points 1 year ago (1 child)

They might, but you will still be helping people. And if a court later mandates that the authors of the training data be compensated for their contributions, or if the corpus is released into open-source repositories, then I'd still call that a win for humanity.

4bh1j47@beehaw.org 4 points 1 year ago

That is a fair point.

Personally, in an ideal world, I would like to export all of my data from Reddit before leaving. Then, if someone later wants to host the whole dataset under a permissive open license, as I believe Stack Overflow and Wikipedia do, in a form that is accessible to search engines, I would scrub and anonymize my data and upload it there.

Obviously, the issue with something like this is that people could upload doctored data to poison the models trained on it.