(edit: I accidentally dropped a word and didn't realize you wrote 'auto-report instead of deleting them'. Read the following with a grain of salt.)
I've played (briefly) with automated moderation bots on forums, and the main thing stopping me from going much past known-bad profiles (e.g. accounts that arrived at the site from a literal spamlist) isn't just false positives but malicious abuse. I wanted to add a feature that would immediately censor an image behind a warning if it was reported for, say, porn, shock imagery, or other extreme content. But if users noticed this, they could file false reports to censor legitimate content until a staff member dismissed the report.
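One common mitigation is to weight reports by each reporter's track record, so a first report from a trusted user can still hide content immediately while serial false reporters lose that power. This is a minimal sketch of that idea, not anything from an actual bot; all names (`ModQueue`, `Reporter`, the 0.5 threshold) are hypothetical:

```python
from dataclasses import dataclass

# hypothetical set of report reasons that trigger immediate censoring
EXTREME = {"porn", "shock", "gore"}

@dataclass
class Reporter:
    upheld: int = 0      # reports staff confirmed
    dismissed: int = 0   # reports staff rejected

    def weight(self) -> float:
        # New reporters get partial trust (0.5); repeated dismissals
        # push the weight toward 0, upheld reports push it toward 1.
        total = self.upheld + self.dismissed
        return (1 + self.upheld) / (2 + total)

@dataclass
class Post:
    hidden: bool = False
    report_score: float = 0.0

class ModQueue:
    def __init__(self, hide_threshold: float = 0.5):
        self.hide_threshold = hide_threshold
        self.reporters = {}   # reporter_id -> Reporter
        self.pending = []     # (reporter_id, post) awaiting staff review

    def report(self, reporter_id: str, post: Post, reason: str) -> None:
        rep = self.reporters.setdefault(reporter_id, Reporter())
        if reason in EXTREME:
            post.report_score += rep.weight()
            if post.report_score >= self.hide_threshold:
                post.hidden = True  # censor now, pending staff review
        self.pending.append((reporter_id, post))

    def resolve(self, post: Post, upheld: bool) -> None:
        # Staff decision: update every reporter's record, unhide on dismissal.
        for reporter_id, p in self.pending:
            if p is post:
                rep = self.reporters[reporter_id]
                if upheld:
                    rep.upheld += 1
                else:
                    rep.dismissed += 1
        self.pending = [(r, p) for r, p in self.pending if p is not post]
        if not upheld:
            post.hidden = False
            post.report_score = 0.0
```

With these numbers, a troll's first false report still hides a post (weight 0.5 meets the threshold), but once staff dismiss it their weight drops to 1/3 and their next solo report no longer censors anything, which caps the damage one account can do.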
Could an external brigade of trolls get legitimate users banned, or their posts hidden, just by gaming your bot? That's a serious issue: it could get real users' work deleted, and in my experience, users take that very personally.