After this year of testing them, I don't act on any detection result unless it's over 90% confidence AND it comes from at least two independent tools. The noise below that band just isn't actionable. Anyone using a tighter or looser threshold and seeing different results?
Tighter, basically. Our internal threshold is 95% on the primary tool, with secondary tool agreement, AND a sentence-level smoking gun. Nothing below that triggers action. Anything else is information only.
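For anyone who wants that rule written down, here's a rough sketch of the three-part gate in Python. The names (`Detection`, `triggers_action`) are made up for illustration, and it assumes that "secondary tool agreement" means the second tool also flagged the text and that "smoking gun" means at least one sentence-level match. Whether exactly 95% counts is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical record for one submission; field names are illustrative."""
    primary_score: float     # primary tool confidence, 0.0 to 1.0
    secondary_flagged: bool  # assumption: True if the second tool also flagged it
    sentence_matches: int    # assumption: count of sentence-level matches


def triggers_action(d: Detection, primary_threshold: float = 0.95) -> bool:
    """Return True only when all three conditions hold.

    Assumption: a score exactly at the threshold counts as meeting it.
    Anything that returns False is information only.
    """
    return (
        d.primary_score >= primary_threshold
        and d.secondary_flagged
        and d.sentence_matches > 0
    )
```

With this, a 97% primary score without secondary agreement stays information only: `triggers_action(Detection(0.97, False, 2))` is `False`.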
Operationally, I treat any score under roughly 85% as ‘have a conversation’ rather than ‘accusation’. Above 85, version history review. Above 95 with sentence-level matches, formal process.
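Those tiers can be sketched as a small function. This is only an illustration: `triage` is a hypothetical name, the cutoffs are the rough ones from the post, and two edge cases are assumptions, namely that a score of exactly 85% stays in the conversation bucket and that a score above 95% without sentence-level matches falls back to version history review.

```python
def triage(score: float, has_sentence_matches: bool) -> str:
    """Map a detector score (0.0 to 1.0) to a next step.

    Tiers follow the post above; the handling of boundary values
    and of high scores without sentence matches is assumed.
    """
    if score > 0.95 and has_sentence_matches:
        return "formal process"
    if score > 0.85:
        return "version history review"
    return "conversation"
```

For example, `triage(0.90, False)` gives `"version history review"`, and `triage(0.97, True)` gives `"formal process"`.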
Good thread. The absence of any standardized confidence framework across detectors is the actual problem. Each one’s ‘85%’ means something different.
@nightOwl_Reborn yeah this. nobody publishes how their confidence is computed.