Do AI detectors actually work, or is everyone just guessing?

so i'm halfway through my dissertation and my advisor casually mentioned that the department is now running all submissions through detection software. i wrote every single word myself but now i'm paranoid.

i tested a few paragraphs from my literature review in a couple of free tools online and two of them flagged sections as “likely AI generated.” these are paragraphs i spent weeks refining.

do AI detectors actually work? or are we all just trusting tools that have no real accountability? genuinely asking because my entire academic career could be affected by a false flag.

This is a documented problem in the field. The core issue is that many detectors rely on perplexity and burstiness metrics, which are statistical proxies rather than evidence of authorship. Perplexity measures how “surprising” text is to a language model at the token level; burstiness measures how much that surprise varies from sentence to sentence. Highly polished academic writing tends to have lower perplexity because it follows disciplined structure, and that overlaps with the patterns these tools associate with AI.
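To make the perplexity and burstiness idea concrete, here is a minimal sketch in Python. It is a toy under loud assumptions: real detectors score text with a large language model over subword tokens, whereas this uses a smoothed unigram word model built from a tiny reference string. The function names (`unigram_model`, `perplexity`, `burstiness`) and the sample sentences are made up for illustration and do not come from any actual detector.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase word tokenizer (assumption: real tools use subword tokens)."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_model(reference_text):
    """Return an add-one-smoothed unigram probability function.

    This is a toy stand-in for the language model a real detector would use.
    """
    counts = Counter(tokenize(reference_text))
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves a bucket for unseen words

    def prob(token):
        return (counts[token] + 1) / (total + vocab)

    return prob


def perplexity(tokens, prob):
    """exp of the average negative log-probability per token."""
    if not tokens:
        raise ValueError("need at least one token")
    avg_nll = -sum(math.log(prob(t)) for t in tokens) / len(tokens)
    return math.exp(avg_nll)


def burstiness(text, prob):
    """Spread (population std dev) of per-sentence perplexity."""
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    scores = [perplexity(tokenize(s), prob) for s in sentences]
    return pstdev(scores) if len(scores) > 1 else 0.0


# Hypothetical reference text and samples, purely for illustration.
prob = unigram_model("the cat sat on the mat. the dog sat on the rug.")
predictable = "the cat sat on the mat."
surprising = "quantum marmalade negotiates treaties."

print("predictable:", perplexity(tokenize(predictable), prob))
print("surprising: ", perplexity(tokenize(surprising), prob))
print("burstiness: ", burstiness(predictable + " " + surprising, prob))
```

The text the model finds familiar gets the lower perplexity, and mixing a familiar sentence with an unfamiliar one produces non-zero burstiness. A detector then thresholds numbers like these, which is why uniformly polished prose can land on the "AI-like" side without any AI being involved.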

Sadasivan et al. demonstrated in 2023 that simple paraphrasing can reduce detector accuracy to near-chance levels. The fundamental limitation is that these tools cannot verify provenance. They can only make probabilistic guesses about text characteristics.

I would recommend keeping detailed version history of your drafts. That is stronger evidence than any detection score.

As a lecturer, I will be direct about this: most of us who actually understand the tools know they are not definitive. A detection flag is not an accusation. At my institution, a flag triggers a conversation with the student, not a disciplinary process.

The bigger concern is institutions that automate the process without human review. That is where students get hurt. If your university uses Turnitin’s AI detection module, know that even Turnitin themselves recommend treating results as a starting point, not a verdict.

literally same situation. i ran my own essay through GPTZero and it said 47% AI. i wrote the whole thing at 2am fueled by energy drinks and desperation lol. nothing AI about that experience

kept my google docs version history just in case though. that's probably the best protection any of us have right now

I work with writers every day and this is something we discuss constantly. The accuracy problem is real, but what people miss is that “accuracy” itself is poorly defined in this context. Accurate compared to what ground truth? There is no labeled dataset of all human-written vs AI-written text that covers every writing style, education level, and domain.
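One way to see why a single "accuracy" figure is slippery: the chance that a flagged text is actually AI-written depends on how much AI text is in the pool being checked, not just on the detector. The sketch below applies Bayes' rule. The function name `flag_precision` and every rate in it are hypothetical, chosen only to show the arithmetic; they are not measurements of any real tool.

```python
def flag_precision(true_positive_rate, false_positive_rate, ai_share):
    """P(text is AI-written | detector flagged it), via Bayes' rule.

    true_positive_rate:  fraction of AI-written texts that get flagged
    false_positive_rate: fraction of human-written texts that get flagged
    ai_share:            fraction of all submitted texts that are AI-written
    """
    for name, value in (
        ("true_positive_rate", true_positive_rate),
        ("false_positive_rate", false_positive_rate),
        ("ai_share", ai_share),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1.0 - ai_share)
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0:
        raise ValueError("detector never flags anything under these rates")
    return flagged_ai / total_flagged


# Hypothetical detector: catches 90% of AI text, wrongly flags 5% of human text.
for share in (0.5, 0.1, 0.01):
    p = flag_precision(0.9, 0.05, share)
    print(f"AI share {share:>4}: P(AI | flagged) = {p:.2f}")
```

With these made-up rates, the same detector looks very reliable when half the pool is AI-written and much less so when almost everyone wrote their own work. In that second case a large share of flags land on human authors.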

Detectors work better on casual, unedited text. They struggle with anything that has been through a rigorous editing process. Academic writing, professional copywriting, legal documents - these all get flagged at higher rates because the editing process removes the very irregularities detectors look for.