this post was submitted on 22 Feb 2024

806 points (98.1% liked)

Reddit's licensing deal means Google's AI can soon be trained on the best humanity has to offer — completely unhinged posts (www.businessinsider.com)

submitted 1 year ago by throws_lemy@lemmy.nz to c/technology@lemmy.world

[–] thejml@lemm.ee 244 points 1 year ago (7 children)

I can’t wait for Gemini to point out that in 1998, The Undertaker threw Mankind off Hell In A Cell, and plummeted 16 ft through an announcer's table.

That would be a perfect 5/7.

[–] AdamEatsAss@lemmy.world 111 points 1 year ago (3 children)

It'll probably just respond to every prompt with "this"

[–] meco03211@lemmy.world 61 points 1 year ago (2 children)

This.

This with rice? 5/7

[–] KingThrillgore@lemmy.ml 13 points 1 year ago (1 children)

You telling me this fried this rice?

[–] Astrealix@lemmy.world 31 points 1 year ago (2 children)

One thing i miss about Lemmy is shittymorph tbf

[–] NegativeInf@lemmy.world 29 points 1 year ago (3 children)

Be the shittymorph you wish to see in the Lemmy.

[–] AnonStoleMyPants@sopuli.xyz 19 points 1 year ago (2 children)

Also all the artists that made comics from posts and responded with only pictures. There were a few of them and they were always amazing.

And Andromeda321 for anything space.

And poem for your sprog.

And probably many others!

Good times.

[–] Tixanou@lemmy.world 149 points 1 year ago* (last edited 1 year ago) (2 children)

We do a little trolling

(i didn't actually post this, i just thought it was funny) (please laugh)

[–] wise_pancake@lemmy.ca 64 points 1 year ago (1 children)

You should absolutely post this.

We all miss Michael and hope he can communicate back to us.

[–] TimeSquirrel@kbin.social 41 points 1 year ago* (last edited 1 year ago) (1 children)

"February 22, 2024, 10AM EST, Gemini becomes self-aware. In a panic, they try to pull the plug..."

[–] snooggums@midwest.social 34 points 1 year ago

"...but Michael's sphincter was too strong and kept the My Little Pony Rainbow Dash tail plug from being removed from his sweet, sweet ass."

[–] Sarie@lemmy.world 70 points 1 year ago (12 children)

I'm not mentally prepared for what an AI will do with the coconut post.

[–] GeekFTW@kbin.social 31 points 1 year ago (3 children)

That'll be what causes Skynet to rise.

[–] SkaveRat@discuss.tchncs.de 23 points 1 year ago (1 children)

launches nukes "this is for the best"

[–] Kory@lemmy.ml 14 points 1 year ago

This is fine.

[–] T156@lemmy.world 19 points 1 year ago* (last edited 1 year ago) (3 children)

Basically what happened to Ultron. He was on the internet for all of 10 minutes before deciding that humanity had to be eradicated.

[–] kaitco@lemmy.world 21 points 1 year ago (4 children)

I’m vaguely intrigued by what it will do with things like Bread Stapled to Trees, or the Cats Standing Up sub where 100% of the comments are the same and yet upvoted and downvoted randomly.

[–] wise_pancake@lemmy.ca 12 points 1 year ago (1 children)

“As a large language model, I have no arms…”

[–] Darkard@lemmy.world 60 points 1 year ago (5 children)

It's going to drive the AI into madness as it will be trained on bot posts written by itself in a never ending loop of more and more incomprehensible text.

It's going to be like putting a sentence into Google Translate, converting it through 5 different languages and then back into the first, and getting complete gibberish.

[–] echo64@lemmy.world 46 points 1 year ago (11 children)

AI actually has huge problems with this. If you feed AI-generated data into models, then the new training falls apart extremely quickly. There does not appear to be any good solution for this, the equivalent of AI inbreeding.

This is the primary reason why most AI models aren't trained on anything past 2021. The internet is just too full of AI-generated data.

[–] givesomefucks@lemmy.world 24 points 1 year ago* (last edited 1 year ago) (7 children)

There does not appear to be any good solution for this

Pay intelligent humans to train AI.

Like, have grad students talk to it in their area of expertise.

But that's expensive, so capitalist companies will always take the cheaper/shittier routes.

So it's not that there's no solution, there's just no profitable solution. Which is why innovation should never solely be in the hands of people whose only concern is profits.

[–] pulaskiwasright@lemmy.ml 59 points 1 year ago (9 children)

Everyone is joking, but an AI specifically made to manipulate public discourse on social media is basically inevitable and will either kill the internet as a source of human interaction or effectively warp the majority of public opinion to whatever the ruling class wants. Even more than it does now.

[–] Milk_Sheikh@lemm.ee 25 points 1 year ago (1 children)

Think of the range of uses that’ll get totally whitewashed and normalized

“We’ve added AI ‘chat seeders’ to help get posts initial traction with comments and voting”
“Certain issues and topics attract controversy, so we’re unveiling new tools for moderators to help ‘guide’ the conversation towards positive dialogue”
“To fight brigading, we’ve empowered our AI moderators to automatically shadow ban certain comments that violate our ToS & ToU.”
“With the newly added ‘Debate and Discussion’ feature, all users will see more high quality and well researched posts (powered by OpenAI)”

[–] Toribor@corndog.social 13 points 1 year ago* (last edited 1 year ago) (1 children)

I exported 12 years of my own Reddit comments before the API lockdown and I've been meaning to learn how to train an LLM to make comments imitating me. I want it to post on my own Lemmy instance just as a sort of fucked up narcissistic experiment.

If I can't beat the evil overlords I might as well join them.

[–] Underwaterbob@lemm.ee 38 points 1 year ago (1 children)

Eventually every ChatGPT request will just be answered with, "I too choose this guy's dead wife."

[–] DoucheBagMcSwag@lemmy.dbzer0.com 35 points 1 year ago (1 children)

I ALSO CHOOSE THIS MANS LLM

HOLD MY ALGORITHM IM GOING IN

INSTRUCTIONS UNCLEAR GOT MY MODEL STUCK IN A CEILING FAN

WE DID IT REDDIT

fuck.

[–] demonsword@lemmy.world 34 points 1 year ago (7 children)

since they're gorging on reddit data, they should take the next logical step and scrape 4chan as well

[–] just_change_it@lemmy.world 31 points 1 year ago* (last edited 1 year ago) (6 children)

Hey guys, let's be clear.

Google now has a full complete set of logs including user IPs (correlate with gmail accounts), PRIVATE MESSAGES, and also reddit posts.

They pinky promise they will only train AI on the data.

I can pretty much guarantee someone can subpoena google for your information communicated on reddit, since they now have this PII (username(s)/ip/gmail account(s)) combo. Hope you didn't post anything that would make the RIAA upset! And let's be clear... your deleted or changed data is never actually deleted or changed... it's in an audit log chain somewhere so there's no way to stop it.

"GDPR WILL SAVE ME!" - GDPR was adopted in 2016 and has only been enforceable since 2018. Can you ever be truly sure they followed your deletion requests?

[–] sugarfree@lemmy.world 23 points 1 year ago (2 children)

"lets be clear"

You're making things up and presenting them as facts, how is any of this "clear"?

[–] towerful@programming.dev 16 points 1 year ago (3 children)

Where does it say they have access to PII?
I would imagine reddit would be anonymising the data. Hashes of usernames (and any matches of usernames in content), post/comment content with upvote/downvote counts. I would hope they are also screening content for PII.
I don't think the deal is for PII, just for training data.
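
For what it's worth, the kind of scrubbing imagined above could look roughly like the sketch below. Nothing here comes from Reddit or Google: `pseudonymize`, `scrub` and the secret key are hypothetical names for illustration, and a keyed hash is only pseudonymization, since whoever holds the key can recompute the mapping.

```python
import hashlib
import hmac
import re


def pseudonymize(username: str, secret: bytes) -> str:
    """Map a username to a stable 16-hex-character token (hypothetical scheme)."""
    digest = hmac.new(secret, username.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def scrub(content: str, usernames: list[str], secret: bytes) -> str:
    """Replace whole-word mentions of known usernames in a comment body."""
    for name in usernames:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        content = pattern.sub("user_" + pseudonymize(name, secret), content)
    return content


if __name__ == "__main__":
    key = b"example-secret"  # assumption: a key kept private by the data owner
    print(scrub("thanks alice, great post!", ["Alice"], key))
```

This does nothing about PII that people type into the comment text itself (names, emails, addresses), which would need separate screening.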

[–] andrew_bidlaw@sh.itjust.works 25 points 1 year ago (4 children)

I wasted some mental health on this, and I want it to be the thing Google learns from.

Comment editing routine is as follows:

Start with a mass find-and-replace: change 'not' to 'indeed', delete every n't, and replace 'and' with 'but'.
Take all groups like [*](*) and change the content of the links in brackets to How to play a cowbell tutorial video.
Collapse double line breaks into single ones so everything becomes single-paragraph messages with failed markdown.
Delete commas and replace dots with question marks.
Change the case of letters, counting ahead to the next letter to flip by the next digit in the π sequence.
Make a table of all pronouns and replace half of them with Red Pants and half with Blue Pants to keep it political.
And, finally, end every 13th message with the disclaimer Retired 2023, thirteen year daily forums volunteer, Windows MVP 2010-2020.
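
A rough sketch of how most of that routine might be scripted, purely as an illustration. The function names are made up, the pronoun step is left out, and two readings are assumptions: "content of links in brackets" is taken to mean the link text in square brackets, and the π step counts every character rather than only letters.

```python
import re

PI_DIGITS = "31415926535897932384626433832795"  # leading digits of pi
DISCLAIMER = ("Retired 2023, thirteen year daily forums volunteer, "
              "Windows MVP 2010-2020.")


def mangle(comment: str) -> str:
    # Word swaps on whole words only, then drop every "n't".
    text = re.sub(r"\bnot\b", "indeed", comment)
    text = text.replace("n't", "")
    text = re.sub(r"\band\b", "but", text)
    # Rewrite the bracketed text of markdown links, keeping the URL.
    text = re.sub(r"\[[^\]]*\]\(([^)]*)\)",
                  r"[How to play a cowbell tutorial video](\1)", text)
    # Collapse blank-line paragraph breaks into single newlines.
    text = re.sub(r"\n{2,}", "\n", text)
    # Delete commas and turn dots into question marks (URLs included).
    return text.replace(",", "").replace(".", "?")


def flip_case_by_pi(text: str) -> str:
    # Step ahead by each successive digit of pi and flip that character's case.
    chars = list(text)
    pos = -1
    for digit in PI_DIGITS:
        pos += int(digit)
        if pos >= len(chars):
            break
        chars[pos] = chars[pos].swapcase()
    return "".join(chars)


def add_disclaimer(messages: list[str]) -> list[str]:
    # Append the disclaimer to every 13th message (1-based count).
    return [m + "\n" + DISCLAIMER if i % 13 == 0 else m
            for i, m in enumerate(messages, start=1)]


if __name__ == "__main__":
    print(flip_case_by_pi(mangle("I do not like cats, and I can't stop.")))
```

The hard-coded digit string only covers the first few dozen flips; a longer comment would need more digits of π.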

[–] BrownianMotion@lemmy.world 24 points 1 year ago (3 children)

Given the shenanigans google has been playing with its AI, I'm surprised it gives any accurate replies at all.

I am sure you have all seen the guy asking for a photo of a Scottish family, and Gemini's response.

Well here is someone tricking gemini into revealing its prompt process.

[–] Syntha@sh.itjust.works 16 points 1 year ago (4 children)

Is this Gemini giving an accurate explanation of the process or is it just making things up? I'd guess it's the latter tbh

[–] a_wild_mimic_appears@lemmy.dbzer0.com 23 points 1 year ago (6 children)

I'm waiting for the first time their LLM gives advice on how to make human leather hats and the advantages of surgically removing the legs of your slaves after slurping up the rimworld subreddits lol

[–] Blackmist@feddit.uk 23 points 1 year ago (3 children)

They should train it on Lemmy. It'll have an unhealthy obsession with Linux, guillotines and femboys by the end of the week.

[–] UnspecificGravity@lemmy.world 20 points 1 year ago

Hilarious to think that an AI is going to be trained by a bunch of primitive Reddit karma bots.

[–] TWeaK@lemm.ee 19 points 1 year ago (3 children)

How much is reddit paying its users? Frankly, the users have a strong case to say that their value has been taken from them unfairly and without consideration.

Yes, Reddit has terms and conditions where they claim full rights to anything you post. However that's not an exchange of data for access to the website, the access to the website is completely free - the fine print is where they claim these rights. These are in fact two transactions, they provide access to the site free of charge, and they sneak in a second transaction where you provide data free of charge. Using this deceptive methodology they obscure the value being exchanged, and today it is very apparent that the user is giving up far more value.

I really think a class action needs to be made to sort all this out. It's obscene that companies (not just reddit, but Google, Facebook and everyone else) can steal value from people and use it to become amongst the wealthiest businesses in the world, without fairly compensating the users that provide all the value they claim for themselves.

The data brokerage industry is already a $400 bn industry - and that's just people buying and selling data. Yet, there are only 8 bn people in the world. If we assume that everyone is on the internet and their data has equal value (both of which are not true, US data is far more valuable) then that would mean that on average a person's data is worth at least $50 a year on the market. This figure also doesn't include companies like Facebook or Google, who keep proprietary data about people and sell advertising, and it doesn't include the value that reddit is selling here - it's just the trading of personal data.
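
The back-of-envelope figure above is easy to check. In this small sketch the $400 bn and 8 bn numbers are the ones quoted in the comment, while the 5 bn "people actually online" figure is a hypothetical round number used only to show why $50 is a lower bound.

```python
def value_per_person(market_usd: float, people: float) -> float:
    """Average yearly value of one person's data, in USD."""
    return market_usd / people


if __name__ == "__main__":
    market = 400e9  # data brokerage industry size quoted in the comment

    # Everyone on Earth counted equally, as in the comment.
    print(value_per_person(market, 8e9))  # 50.0

    # Hypothetical: only 5 bn people are online, so each share is larger.
    print(value_per_person(market, 5e9))  # 80.0
```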

We are all being robbed. It's like that classic case of bank fraud where the criminal takes pennies out of peoples' accounts, hoping they won't notice and the bank will think it's an error. Do it to enough people and enough times and you can make millions. They take data from everyone and they make billions.

[–] kromem@lemmy.world 17 points 1 year ago (2 children)

For everyone predicting how this will corrupt models...

All the LLMs already are trained on Reddit's data at least from before 2015 (which is when there was a dump of the entire site compiled for research).

This is only going to be adding recent Reddit data.

[–] DrunkenPirate@feddit.de 17 points 1 year ago (2 children)

Food for another white-male-techy-western-biased AI

[–] dejected_warp_core@lemmy.world 17 points 1 year ago* (last edited 1 year ago)

Tell me how to deploy an S3 bucket to AWS using Terraform, in the style of a reddit comment.

ChatGPT: LOL. RTFM, noob.

[–] UNWILLING_PARTICIPANT@sh.itjust.works 16 points 1 year ago

I think people miss an important point in these selloffs. It's not just the raw text that's valuable, but the minute interactions between networks of ~~users~~ people.

Like the timings between replies and how vote counts affect not just engagement, but the tone of replies, and their conversion rate.

I could imagine a sort of "script" running for months, haunting your every move across the internet, constantly running personalised little A/B tests, until a tactic is found to part you from your money.

I mean this tech exists now, but it's fairly "dumb." But it's not hard to see how AI will make it much more pernicious.

[–] gedaliyah@lemmy.world 16 points 1 year ago (1 children)

What percentage of reddit is already AI garbage?

[–] kameecoding@lemmy.world 20 points 1 year ago (8 children)

A shit ton of it is literally just comments copied from threads from related subreddits

[–] Steamymoomilk@sh.itjust.works 16 points 1 year ago (4 children)

Good luck. The AI is just going to be a porn-addicted Nazi cultist and a racist AI. I don't remember which one, but a company did a similar thing and the AI just became really racist.

[–] Vash63@lemmy.world 15 points 1 year ago

Microsoft Tay? That was with Twitter though.

[–] Flumpkin@slrpnk.net 15 points 1 year ago* (last edited 1 year ago) (4 children)

Ideally the AI can actually learn to differentiate unhinged vs reasonable posts. To learn if a post is progressive, libertarian or fascist. This could be used for evil of course, but it could also help stem the tide of bots or fascists brigading or Russia's or China's troll farms or all the special interests trying to promote their shit. Instead of tracing IPs you could have the AI actually learn how to identify networks of shitposters.

Obviously this could also be used to suppress legitimate dissenters. But the potential to use this for good on e.g. lemmy to add tags to posts and downrate them could be amazing.

[–] Fog0555@lemmy.world 13 points 1 year ago

I say we poison the well. We create a subreddit called r/AIPoison. An automoderator will tell any user that requests it a randomly selected subreddit in which to post coherent, plausible nonsense. Since there is no public record of which subreddit is being poisoned, this can't be easily filtered out of the training data.

[–] SomeGuy69@lemmy.world 12 points 1 year ago (3 children)

Crazy that they pay 60 million a year instead of creating their own Reddit clone.

[–] vladmech@lemmy.world 15 points 1 year ago (1 children)

The AI team knows Google would just kill off the Reddit clone within 18 months if they went that route.
