This post was submitted on 02 Jul 2023
476 points (96.5% liked)

 

cross-posted from: https://lemmy.intai.tech/post/43759

cross-posted from: https://lemmy.world/post/949452

OpenAI's ChatGPT and Sam Altman are in massive trouble. OpenAI is being sued in the US for illegally using content from the internet to train its LLMs, or large language models.

[–] archomrade@midwest.social 1 points 1 year ago

They do curate the data somewhat, though it's not easy to verify whether they did, since they don't share their dataset (likely because they expect a legal challenge).

There's no evidence they have "personal data" beyond textual data scraped directly from platforms such as Reddit (much of which is divorced from other metadata). I care FAR more about the data Google, Facebook, or Microsoft has leaking than I do about text written on my old Reddit or Twitter account, and somehow we're not wringing our hands about that data collection.

I watched most of that video, and I'm frankly not moved by much of it. The video seems primarily (if not entirely) written in response to generative image models and image data that may actually be protected under existing copyright, unlike the textual data at issue in this particular lawsuit. Even so, I think his hand-wavy interpretation of "derivative work" is flimsy at best, and it relies on a materialist perspective that I just can't identify with (a pragmatic framework might be more persuasive to me). Treating copyright infringement by AI tools on a case-by-case basis is the most solid argument he makes, but I'm just not persuaded that all AI is theft because publicly accessible data was used as training data. And I just don't think copyright law is an ideal solution to a growing problem of technological automation, ever-increasing productivity, and stagnating demand.

I'm open to being wrong, but I think copyright law doesn't address the long-term problems introduced by AI and is instead a shortcut to maintaining a status quo that is destined to fail regardless.