Technology


This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech-related content.
Be excellent to each other!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts; OK to post as comments.
Only approved bots from the list below are allowed; to ask whether your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

Approved Bots


OpenAI confirms that AI writing detectors don’t work (arstechnica.com)

submitted 1 year ago by fne8w2ah@lemmy.world to c/technology@lemmy.world

111 comments


Leate_Wonceslace@lemmy.dbzer0.com 14 points 1 year ago

I feel like this must stem from a misunderstanding of what 26% accuracy means, but for the life of me, I can't figure out what it would be.

dartos@reddthat.com 11 points 1 year ago (last edited 1 year ago)

Looks like they got that number from this quote in another Ars Technica article: “…OpenAI admitted that its AI Classifier was not "fully reliable," correctly identifying only 26 percent of AI-written text as "likely AI-written" and incorrectly labeling human-written works 9 percent of the time”.

It seems like it mostly wasn’t confident enough to make a judgement: it correctly flagged 26% of AI-written text as AI, and it incorrectly flagged 9% of human-written text as AI. It doesn’t tell us how the remaining 74% of AI-written text split between being labeled human and being left as unsure.

EDIT: the quote is from this article: https://arstechnica.com/information-technology/2023/07/openai-discontinues-its-ai-writing-detector-due-to-low-rate-of-accuracy/

schzztl@lemmy.nz 1 point 1 year ago

Specificity vs sensitivity, no?
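Framed as sensitivity and specificity, the two figures from the quoted article map onto a confusion matrix. Below is a minimal sketch in Python, assuming the 26% is the share of AI-written texts labeled "likely AI-written" and the 9% is the share of human-written texts wrongly given that label. The helper `detector_rates` is hypothetical, and the counts (per 100 texts of each kind) are illustrative numbers chosen to match those percentages, not OpenAI's data.

```python
def detector_rates(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Sensitivity and specificity from confusion-matrix counts.

    tp: AI-written texts labeled "likely AI-written"
    fn: AI-written texts NOT labeled that way (labeled human or left unsure)
    fp: human-written texts labeled "likely AI-written"
    tn: human-written texts NOT labeled that way
    """
    sensitivity = tp / (tp + fn)  # true positive rate: share of AI text caught
    specificity = tn / (tn + fp)  # true negative rate: share of human text left alone
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
    }


# Illustrative counts per 100 texts of each kind (assumed, matching the quoted
# percentages: 26% of AI text caught, 9% of human text wrongly flagged).
rates = detector_rates(tp=26, fn=74, fp=9, tn=91)
print(rates)
```

Note that "sensitivity" here lumps "labeled human" and "unsure" together as misses, which is exactly the split the article doesn't report.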

cmfhsu@lemmy.world 1 point 1 year ago (last edited 1 year ago)

In statistics, everything is based on probability/likelihood, even binary yes-or-no decisions. For example, you might say "this predictive algorithm must be at least 95% statistically confident of an answer, else you default to unknown or another safe answer".
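The "default to unknown" rule can be sketched as a classifier that only commits to an answer when its probability estimate clears a confidence threshold. This is a minimal sketch in Python, assuming the model outputs a probability `p_ai` that a text is AI-written; the function name `label_text` and the 0.95 default (taken from the 95% example above) are illustrative, not OpenAI's actual settings.

```python
def label_text(p_ai: float, threshold: float = 0.95) -> str:
    """Return a label only when the model is confident enough, else "unknown".

    p_ai: hypothetical model estimate that the text is AI-written (0.0 to 1.0).
    threshold: minimum confidence required to commit to either answer.
    """
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability between 0 and 1")
    if p_ai >= threshold:
        return "AI-written"
    if 1.0 - p_ai >= threshold:
        return "human-written"
    return "unknown"  # the "safe" default when neither side is confident enough


for p in (0.99, 0.80, 0.50, 0.03):
    print(p, label_text(p))
```

With a 95% bar, anything in the wide middle band between 0.05 and 0.95 comes back as "unknown", which is how a detector can end up committing on only a small fraction of texts.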

What this likely means is that only 26% of the AI-written texts scored confidently enough to get a "yes" (because falsely accusing somebody of cheating is much worse than giving them the benefit of the doubt), and those answers were correct.

A large portion of the remaining answers could likely have been predicted correctly if the company had been willing to risk more false positives (potentially getting students mistakenly expelled).
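The trade-off described above can be made concrete by sweeping the threshold: lowering it catches more AI-written text (higher sensitivity) but also flags more human-written text (more false positives). This is a minimal sketch in Python; the score lists are made-up toy values for illustration, not measurements from OpenAI's classifier, and `rates_at` is a hypothetical helper.

```python
def rates_at(threshold: float, ai_scores: list[float],
             human_scores: list[float]) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) when every text
    scoring at or above `threshold` is flagged as AI-written."""
    tpr = sum(s >= threshold for s in ai_scores) / len(ai_scores)
    fpr = sum(s >= threshold for s in human_scores) / len(human_scores)
    return tpr, fpr


# Toy scores (hypothetical): the detector's estimated probability of "AI-written".
ai_scores = [0.97, 0.91, 0.85, 0.72, 0.66, 0.58, 0.44, 0.35]
human_scores = [0.88, 0.61, 0.52, 0.40, 0.31, 0.22, 0.15, 0.05]

for t in (0.95, 0.80, 0.60, 0.40):
    tpr, fpr = rates_at(t, ai_scores, human_scores)
    print(f"threshold={t:.2f}  caught AI text={tpr:.0%}  flagged human text={fpr:.0%}")
```

Where to set the threshold is a policy choice: a detector whose output might be used to accuse students of cheating would plausibly sit at the high end, catching less AI text in exchange for fewer false accusations.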