this post was submitted on 29 Sep 2023

416 points (94.1% liked)

Authors Are Furious After Finding Their Works on List of Books Used To Train AI (www.themarysue.com)

submitted 1 year ago by stopthatgirl7@kbin.social to c/technology@lemmy.world

143 comments

Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.

[–] elbarto777@lemmy.world 21 points 1 year ago (4 children)

These are machines, though, not human beings.

I guess I'd have to be an author to find out how I'd feel about it, to be fair.

[–] Touching_Grass@lemmy.world 8 points 1 year ago (1 children)

Machines that aren't reproducing or distributing works

[–] FaceDeer@kbin.social 4 points 1 year ago

If an AI "reproduces" a work it was trained on it is a failure of an AI. Why would anyone want to spend millions of dollars and devote oodles of computing power to build something that just does what a simple copy/paste operation can accomplish?

When an AI spits out something that's too close to an item in its original training set, that's called "overfitting" and it is considered an error to be corrected. Most overfitting that's been detected has been a result of duplication in the training set: when you hammer an AI image generator in training with thousands of copies of the Mona Lisa, it eventually goes "alright, I get it already, when you say 'Mona Lisa' you want that exact pattern!" and will try its best to replicate that pattern when you ask for it later. That's why training sets need to be de-duplicated.

AIs are meant to produce new things.
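To make the de-duplication point concrete, here is a minimal sketch of exact-match de-duplication on a toy text corpus. The function names (`normalize`, `deduplicate`) and the sample data are made up for illustration; this is not how any particular model's training pipeline does it, and real pipelines generally also need near-duplicate detection, which this does not attempt.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(samples: list[str]) -> list[str]:
    """Keep the first occurrence of each sample, comparing normalized hashes.

    Hypothetical helper: catches exact duplicates after normalization only.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


# Toy corpus (invented): the same caption repeated with small formatting changes.
corpus = ["Mona Lisa", "mona  lisa", "MONA LISA", "Starry Night"]
print(deduplicate(corpus))
```

With the repeats collapsed, the over-represented item appears once in the toy corpus instead of three times.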

[–] dutchkimble@lemy.lol 3 points 1 year ago (1 children)

But terminator said neural networks

[–] elbarto777@lemmy.world 3 points 1 year ago

Damn.

[–] kromem@lemmy.world 2 points 1 year ago (1 children)

Did you write a comment on Reddit before 2015? If so, your copyrighted content was used without your permission to train today's LLMs, so you absolutely get to feel one way or another about it.

The idea that these authors were somehow the backbone of the models when any individual contribution was like spitting in the ocean and model weights would have considered 100 pages of Twilight fan fiction equivalent to 100 pages from Twilight is honestly one of the negative impacts of the extensive coverage these suits are getting.

Pretty much everyone who has ever written anything indexed online is a tiny part of today's LLMs.

[–] elbarto777@lemmy.world 2 points 1 year ago* (last edited 1 year ago) (1 children)

Thank you for your reply.

On a completely separate note, it's funny to think that there exists Twilight fan fiction when ~~Twilight itself started as fan fiction work.~~

Edit: I dun goofed.

[–] kromem@lemmy.world 2 points 1 year ago (1 children)

Pretty sure it's the other way around.

Fifty Shades of Grey started out as Twilight fanfiction before becoming its own thing.

AFAIK Twilight was always just its own pulp fiction.

[–] elbarto777@lemmy.world 2 points 1 year ago

Oh true! My memory was fuzzy on the details. Thanks for the correction.

[–] Shurimal@kbin.social -2 points 1 year ago (4 children)

> These are machines, though, not human beings.

What's the difference? On the most fundamental level it's all the same.

[–] brygphilomena@lemmy.world 10 points 1 year ago (2 children)

A human, regardless of how many books they read, will have personal experiences that are undeniably unique to themselves. They will interpret the works they read differently from each other based on their worldly experiences. Their writing, no matter how many books they read and get inspired on, will always be influenced by their own personal lives. They can experience love, hate, heartbreak, empathy, sadness, and happiness.

This is something an LLM does not have, and in my opinion, it is a massive distinguishing factor. So on a "fundamental" level, it is not the same. It is nowhere near the same.

[–] lloram239@feddit.de 1 points 1 year ago

> A human, regardless of how many books they read, will have personal experiences that are undeniably unique to themselves.

So will every AI. ChatGPT will give you different answers than Bard or WizardLM, since they are all trained on different books. And every StableDiffusion model creates different images, different styles, different topics, etc. It's all in the data they "experienced".

[–] originalucifer@moist.catsweat.com -1 points 1 year ago

do you really think we are that far off... from giving foundational memory and motivation layers to these LLMs, layers that could mimic... or even... generate the generic thoughts you're indicating?

i don't think so. you seem to imply its impossibility; i expect its inevitability. the human brain will not be a black box forever... it still exists in a world of physics we can emulate, even if only in a rudimentary way.

[–] AnonStoleMyPants@sopuli.xyz 9 points 1 year ago (1 children)

The same thing as with tooooooons of things: scale.

Nobody cares if one dude steals office supplies at work. Now, if everyone starts doing it, or if the single guy steals everything, then action is taken.

Nobody cares if a random person draws in the same style and with the same characters as you, but if they start to sell their drawings, or god forbid, out-sell you, then there is a problem.

Nobody cares (except police I guess) if a random driver drives double the speed limit and annoys people living next to the road on the weekends, but when tons of people do it, you get speed bumps.

Nobody cares if a few people pirate movies, but when it goes mainstream and companies notice that they might be losing money, you get whatever we have now.

Nobody cares if the mudhill behind your house erodes a bit and you get mud on your shoes. Have a bunch of that erode and you realise the danger...

If you have been fine-tuning your own writing style for a decade and a random schmuck starts to write similarly, you probably don't care. No harm done. Now, get an AI to write 10 000 books in a weekend and someone starts to sell them... well now you have a completely different problem.

On a fundamental level the exact same thing is happening, yet action is only taken after a certain threshold is crossed.

[–] sab@lemmy.world 2 points 1 year ago

Bingo.

[–] Wander@kbin.social 8 points 1 year ago

Unless you think there's no difference between killing a person and closing a program, I think we can agree they should be treated differently in the eyes of the law.

And so there's a difference between a person reading a book and being inspired by it, and someone writing a program that automatically transforms the book into data that can create new books.

[–] elbarto777@lemmy.world 3 points 1 year ago (2 children)

Wait. Are human beings machines?

[–] lloram239@feddit.de 4 points 1 year ago

Biological machines, yes.

[–] jennraeross@lemmy.world 4 points 1 year ago

Please do not take this as support of ai use of copyrighted works (I don’t), but as far as I can tell, yes we are machines. This rant is just me being aspie atm, so feel free to ignore it.

We are thinking machines programmed by our genetics, predispositions, experiences, and circumstances. A two-part explanation of how humans are merely products of their circumstances was once put forward to me. The first part is that humans can do anything, but only the thing we want to do most.

For instance, a common rebuttal is that people can choose to go to the gym even when they find the experience of exercise undesirable. However, when that happens, it’s merely a case of other wants outweighing the want to not go to the gym, typically the want to be fit.

We want to not spend money, but we want to not risk going to jail for stealing more, usually. We want to not work overtime, but sometimes we want the extra cash more than that.

The second part of the argument is that we can’t choose what we want. When someone talks themselves out of the slice of cheesecake, they aren’t changing what they want, they’re resolving said want against the larger want they have to lose weight.

And if we make decisions by our wants, while said wants are not decided by us, then despite appearances we are little more than complex automata.