this post was submitted on 09 Jun 2023
127 points (100.0% liked)
Chat
Relaxed section for discussion and debate that doesn't fit anywhere else. Whether it's advice, how your week is going, a link that's at the back of your mind, or something like that, it can likely go here.
This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
They might, but you will still be helping people. And if a court later mandates that the authors of the training data be compensated for their contributions, or if the corpus is released into open source repositories, then I'd still call that a win for humanity.
That is a fair point.
Personally, in an ideal world, I would like to export all of my data from Reddit before leaving. Then, if someone later wants to host the whole dataset under a permissive open source license (as I believe Stack Overflow and Wikipedia do), in a form that is accessible to search engines, I would scrub and anonymize my data and upload it there.
The obvious issue with something like this is people uploading doctored data to poison model training, among other problems.