Writing a methods section for my thesis on this and need real-world data. Teachers, lecturers, professors - what’s the false positive rate threshold where you would stop using a particular AI detection tool? I’m finding very little published research on this specific question.
Personally, if 1 in 20 of my class would be wrongly flagged, that’s already too high for me, so I need under 5% false positives. Anything higher and the cost of wrongful accusations outweighs the benefit of catching cheaters. And 5% is the absolute ceiling; in practice I’d want 1-2%.
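For anyone who wants to see what those percentages mean at classroom scale, here is a rough back-of-envelope sketch. It assumes every student wrote their own work and that flags are independent across students, which is a simplification. The function names and the class size of 20 are made up for illustration, not taken from any detector's documentation.

```python
# Back-of-envelope: what a given false positive rate (FPR) means for one class.
# Assumption: flags on honest students are independent of each other.

def expected_false_flags(n_honest: int, fpr: float) -> float:
    """Average number of honest students wrongly flagged per assignment."""
    return n_honest * fpr


def prob_at_least_one_false_flag(n_honest: int, fpr: float) -> float:
    """Chance that at least one honest student is wrongly flagged."""
    return 1.0 - (1.0 - fpr) ** n_honest


CLASS_SIZE = 20  # hypothetical class, chosen to match the "1 in 20" framing

for fpr in (0.05, 0.02, 0.01):
    expected = expected_false_flags(CLASS_SIZE, fpr)
    p_any = prob_at_least_one_false_flag(CLASS_SIZE, fpr)
    print(f"FPR {fpr:.0%}: expected false flags = {expected:.2f}, "
          f"P(at least one) = {p_any:.1%}")
```

Under those assumptions, a 5% FPR in a class of 20 works out to about one wrongful flag per assignment on average, and the chance of at least one wrongful flag adds up quickly over a term.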
I’d echo Jonah on the percentage. From a policy standpoint though my department’s threshold isn’t just FPR, it’s also explainability. A detector that’s wrong 2% of the time but can show me which specific sentences triggered the flag is more useful than one that’s wrong 1% with a single page-level score. We can investigate the first and have nothing actionable from the second.
The other thing worth measuring in your methods: false positive rate is not constant across demographics. ESL writers, students with certain disabilities, and STEM writers all show higher baseline FPRs on the major detectors. A single aggregate FPR hides those subgroup differences. Definitely worth a multi-segment analysis if you can stratify your sample.
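A minimal sketch of what that multi-segment analysis could look like, in case it helps with the methods write-up. It takes only texts known to be human-written, so every flag is a false positive, and reports FPR per group with a Wilson 95% interval (small subgroups give wide intervals, which is worth showing). The group labels, the record layout, and all the counts below are invented purely for illustration; they are not real detector results.

```python
from collections import defaultdict
from math import sqrt


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion k/n (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def stratified_fpr(records):
    """records: iterable of (group, flagged) pairs for HUMAN-WRITTEN texts only."""
    counts = defaultdict(lambda: [0, 0])  # group -> [false positives, total]
    for group, flagged in records:
        counts[group][1] += 1
        if flagged:
            counts[group][0] += 1
    return {
        group: {"n": n, "false_positives": k, "fpr": k / n,
                "ci95": wilson_interval(k, n)}
        for group, (k, n) in counts.items()
    }


# Made-up toy sample: 100 human-written texts per group, NOT real measurements.
toy = (
    [("native", True)] * 2 + [("native", False)] * 98
    + [("esl", True)] * 9 + [("esl", False)] * 91
    + [("stem", True)] * 5 + [("stem", False)] * 95
)

by_group = stratified_fpr(toy)
aggregate = sum(flagged for _, flagged in toy) / len(toy)

print(f"aggregate FPR: {aggregate:.1%}")
for group, stats in by_group.items():
    low, high = stats["ci95"]
    print(f"{group}: FPR {stats['fpr']:.1%} (95% CI {low:.1%} to {high:.1%}, n={stats['n']})")
```

In this toy sample the aggregate sits a little above 5% while the per-group rates range from 2% to 9%, which is the kind of spread a single headline number would hide.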
@marc_Delrieu that’s exactly the kind of stratification I’m planning: ESL, native, and STEM-heavy disciplines. Thanks all, this is really useful.
Plus one to the demographic stratification piece. I’d also be curious to see whether writing instruction style correlates with FPR - schools that teach ‘concise academic prose’ as the standard may show higher FPR because that’s exactly what detectors pattern-match.
From the publishing side we’d accept a higher FPR than K-12/uni because we manually review every flagged piece anyway. For us it’s a triage signal, not a verdict. Teachers operating at classroom scale have a totally different calculus, and I think that’s why this question doesn’t have one industry-wide answer.