OpenAI has built a text watermarking method to detect ChatGPT-written content
(www.tomshardware.com)
They can cycle through some biases (dozens?) and test them all. Detokenization is super cheap to run; it's not AI or anything.
I'm trying to think of a good analogy for how this would work, and I kinda came up with one. This would be kinda like an image encoder that biases itself towards coding RGB values (0-255) as even numbers. Subtly, say 30% odd 70% even.
That's totally imperceptible to humans. And even a "small" sample of the image would carry this bias if pasted into a larger image verbatim, since the sample size is so large (just as the sample size for a bunch of tokens in text is pretty big).
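To make the image analogy concrete, here's a rough sketch of it in Python. To be clear, this is just the toy even/odd example, not whatever OpenAI actually does; the function names, the 70/30 split, and the sizes are all made up for illustration. It nudges channel values toward even numbers, then checks whether a small slice still shows the bias by comparing the even count against the 50/50 split you'd expect with no watermark.

```python
import math
import random


def encode_pixel_values(n, p_even=0.7, seed=0):
    """Generate n channel values (0-255), nudged so roughly p_even of them are even."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        v = rng.randrange(256)
        want_even = rng.random() < p_even
        if want_even and v % 2 == 1:
            v -= 1  # odd -> nearest lower even, stays within 0-254
        elif not want_even and v % 2 == 0:
            v += 1  # even -> next odd, stays within 1-255
        values.append(v)
    return values


def even_bias_z_score(values):
    """z-score of the even count against the 50/50 split expected with no bias."""
    n = len(values)
    evens = sum(1 for v in values if v % 2 == 0)
    return (evens - 0.5 * n) / math.sqrt(0.25 * n)


# Hypothetical sizes: 30,000 values is roughly a 100x100 RGB image.
watermarked = encode_pixel_values(30_000)
# A contiguous run of 1,200 values stands in for a small patch pasted elsewhere.
crop = watermarked[12_000:13_200]
# Same amount of data with no bias at all, for comparison.
unmarked = encode_pixel_values(1_200, p_even=0.5, seed=1)

print("crop z-score:    ", round(even_bias_z_score(crop), 2))
print("unmarked z-score:", round(even_bias_z_score(unmarked), 2))
```

The crop should score far above anything chance would produce, while the unmarked data should land near zero. The same counting trick is the reason a few hundred tokens of text could be enough, assuming the text bias works anything like this.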
And I'm not saying it's foolproof... but if that's indeed what they're doing, I think it's a decent way to detect "lazy" OpenAI abusers who aren't working hard to scramble and defeat it.