this post was submitted on 26 Jul 2023
Technology
I’m not sure that’s a fair comparison. You wouldn’t instantly ingest that information and know it. It’s more like photocopying a book and including it in another book that you sell. It’s a paradigm shift, and I’m not sure what the answer is.
It's nothing like photocopying a book. It is very, very similar to the analogy given above, of someone learning the information and profiting from it. For the AI model to "learn" the information during training, it takes the text apart one piece of a word (a token) at a time and reorganises it for quick access. Information is categorised along learned numeric dimensions rather than explicit metadata like topic, source, or date; there are 1536 such dimensions, or "tags", so to speak, in OpenAI's text-embedding-ada-002 embeddings.
Copyright in text treats the order of the words as an integral part of the legal standard, and the standards for what infringes are actually pretty strict (https://fairuse.stanford.edu/2003/09/09/copyright_protection_for_short/). Training an AI is definitely transformative work: the finished product does not retain the order of the words, merely a weighted likelihood of which word fragment will come next in a given context, so it's protected under Fair Use.
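To make the "weighted likelihood of the next word fragment" idea concrete, here is a toy sketch. It is not how ChatGPT or any real model is built: the fixed four-character `fragments` splitter and the simple pair-counting `train` function are assumptions made up for illustration (real models use a learned subword vocabulary and a neural network, not a lookup table of counts).

```python
from collections import Counter, defaultdict


def fragments(text, size=4):
    """Naively split each word into chunks of at most `size` characters.

    Hypothetical stand-in for a real subword tokenizer.
    """
    pieces = []
    for word in text.lower().split():
        pieces.extend(word[i:i + size] for i in range(0, len(word), size))
    return pieces


def train(text):
    """Count how often each fragment is followed by each other fragment."""
    counts = defaultdict(Counter)
    toks = fragments(text)
    for prev, nxt in zip(toks, toks[1:]):
        counts[prev][nxt] += 1
    return counts


def next_fragment_weights(counts, prev):
    """Convert the raw counts after `prev` into weights that sum to 1."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # fragment never seen, so nothing to predict
    return {frag: n / total for frag, n in followers.items()}


model = train("the cat sat on the mat and the cat ran")
weights = next_fragment_weights(model, "the")
# "the" is followed by "cat" twice and "mat" once,
# so "cat" gets weight 2/3 and "mat" gets weight 1/3.
print(weights)
```

What this toy model keeps after training is a table of fragment-to-fragment weights, which is the kind of thing the comment above is describing, just at a vastly smaller scale.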
I don’t think it’s that simple. Like I said, it’s a paradigm shift. It doesn’t fit into existing laws well. My point is that what we consider fair use now, such as a human summarizing a book or movie, is based on the limited abilities of humans. When you have AI with limitless abilities, that will change things. The same rules and considerations may have to be rethought.
Au contraire, it is that simple, and it is covered by existing law just fine in the very specific case we're talking about, which is whether training a model is "transformative work" by the definition in IP law. It is. The law looks very specifically at the facts of the case, not hand-waving masquerading as an argument.
You are making this technology out to be something it isn't; there's no mystery to how AI works, and it does not have "limitless abilities". In fact, it is very limited, but that isn't relevant. What the law considers "fair use" isn't based on human ability at all; it's based on how completely the work is reproduced and the context in which the original work is being used. You clearly have access to the internet, so you can verify the standards required to show breach of copyright yourself if you don't believe me.