Technology

Largest Dataset Powering AI Images Removed After Discovery of ‘Suspected’ Child Sexual Abuse Material (www.404media.co)

submitted 10 months ago by BlackEco@lemmy.blackeco.com to c/technology@lemmy.world

[–] LainOfTheWired@lemy.lol 9 points 10 months ago (2 children)

On a different note, how do these big companies train AIs to detect CSAM without using a bunch of illegal CSAM to train them?

[–] FaceDeer@kbin.social 15 points 10 months ago

It's perverse how the laws are so ultra-strict that you can break them by making an attempt to comply with them. The article describes how at several points the researchers had to "outsource" part of their work to people in less-strict jurisdictions. And LAION itself is based in Germany, which adds yet another jurisdiction to the situation.

CSAM always turns into a ridiculous minefield. So many different jurisdictions and different definitions, and everyone is ultra adamant about theirs being the one that must be enforced globally.

[–] fishos@lemmy.world 5 points 10 months ago* (last edited 10 months ago) (1 children)

I've heard there are specific data sets you can download that have the training data, but not the images themselves. Someone else already ran the images through a training model and you're just grabbing the processed data and plugging it into your model. I'm sure I'm missing some nuance and haven't looked into it myself, but I've seen that given as the answer when someone asked before.
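To make the idea above concrete, here is a rough sketch of training on pre-extracted feature vectors rather than on the images themselves. Everything in it is an assumption for illustration: the two-number "features", the labels, and the nearest-centroid classifier are made up, and this is not a description of any real dataset or of how any company actually does it.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]

def centroid(vectors: List[Vector]) -> List[float]:
    """Average a list of equal-length feature vectors, column by column."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train(features_by_label: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Build one centroid per label from precomputed features (no images needed)."""
    return {label: centroid(vs) for label, vs in features_by_label.items()}

def predict(model: Dict[str, List[float]], x: Vector) -> str:
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], x))

# Hypothetical precomputed features; in practice these would come from
# whoever ran the original images through a feature extractor.
features = {
    "flagged": [[0.9, 0.8], [1.0, 0.7]],
    "benign": [[0.1, 0.2], [0.0, 0.3]],
}

model = train(features)
print(predict(model, [0.8, 0.9]))  # flagged
```

The point is only that the training step consumes numbers derived from the images, so whoever trains the downstream model never has to hold the originals.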

[–] piecat@lemmy.world 3 points 10 months ago

IIRC from a previous thread, different law enforcement agencies will release hashes or similar so the image can be detected without distributing the original.
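A minimal sketch of the hash-matching idea, assuming a plain SHA-256 hash list. The hash list and byte strings here are hypothetical stand-ins. Real systems reportedly use perceptual hashes (PhotoDNA is the usual example) so that resized or re-encoded copies still match; a cryptographic hash like the one below only catches byte-identical files.

```python
import hashlib
from typing import AbstractSet

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_known(data: bytes, known_hashes: AbstractSet[str]) -> bool:
    """Check a file's bytes against a list of known hashes, without the originals."""
    return sha256_hex(data) in known_hashes

# Hypothetical hash list, standing in for one distributed by an agency.
# Only the digests are shared; the underlying files never are.
known_hashes = {sha256_hex(b"flagged-file-bytes")}

print(is_known(b"flagged-file-bytes", known_hashes))  # True
print(is_known(b"some-other-file", known_hashes))     # False
```

The scanner only ever needs the set of digests, which is why the list can be handed out without distributing the material itself.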