Guide · 12 min read

AI detection, explained —
how detectors work and where they fail.

Per-detector breakdown, what accuracy claims actually mean, where false positives concentrate, and how to interpret a score responsibly.

Published: 20 Apr 2026 · Updated: 20 Apr 2026

A detector says your text is "98% AI." A different detector says the same text is "7% AI." A teacher treats the first number as proof. An institution builds policy around it.

Something is wrong with this picture, and the thing that's wrong is the assumption that AI detectors produce verdicts. They don't. They produce probabilistic classifier scores, and understanding what that means — what detectors measure, how they measure it, and where they fail — is the difference between using detector output well and treating it like a lie detector.

This guide walks through, in plain English, how the major detectors actually work, what their published accuracy claims mean (and don't mean), where false positives concentrate, and how to interpret a score responsibly.

The short version of how detectors work

Every major AI detector is a classifier. You give it text. It outputs a probability that the text was generated by an AI model. Behind the probability is a statistical model, usually a machine-learning classifier trained on labelled examples of human-written and AI-generated text, that has learned to distinguish the two based on a handful of measurable properties of the prose.

The most commonly used properties are:

  • Perplexity: how "surprising" each next word is given the words before it, measured against a reference language model. AI text tends to have low, smooth perplexity because it is generated by sampling from the high-probability end of a model's next-word distribution.
  • Burstiness: the variance of sentence length and complexity across a passage. Human writers burst: short sentence, long sentence, fragment. AI writers flatline.
  • Token-level patterns: frequencies of specific phrases and collocations that models overuse, such as "delve into," "intricate," "multifaceted," "it is important to note."
  • Structural features: list density, transition-phrase density, paragraph-length uniformity.
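
Two of these properties are simple enough to sketch without a language model. The following is a minimal illustration, not any vendor's implementation: it assumes burstiness can be approximated by the coefficient of variation of sentence length, and it uses the four marker phrases listed above as a stand-in for a learned phrase list. The function names and the naive sentence splitter are hypothetical.

```python
import re
from statistics import mean, pstdev

# Illustrative marker phrases only; real detectors learn such features from data.
MARKER_PHRASES = ("delve into", "intricate", "multifaceted", "it is important to note")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length in words (0.0 = perfectly uniform)."""
    lengths = [len(s.split()) for s in split_sentences(text)]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def marker_rate(text: str) -> float:
    """Marker-phrase occurrences per 100 words."""
    lowered = text.lower()
    word_count = len(lowered.split())
    if word_count == 0:
        return 0.0
    hits = sum(lowered.count(phrase) for phrase in MARKER_PHRASES)
    return 100.0 * hits / word_count
```

A passage of three four-word sentences scores a burstiness of 0.0 here, while a passage that mixes a one-word fragment with a thirteen-word sentence scores well above it. That is the "flatline versus burst" contrast in numeric form.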

Some detectors add further layers: an ensemble of classifiers, a language-model-based scorer, stylometric features, or a proprietary transformer trained specifically for AI/human discrimination. The branding differs. The underlying mechanism is more similar than the marketing suggests.
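
To make "probabilistic classifier score" concrete, here is a minimal sketch of how measured features might be combined into a single probability using a logistic function. The weights, bias, and feature names are invented for illustration; a real detector learns its parameters from labelled data and is likely far more complex.

```python
import math

# Hypothetical weights for illustration only; not taken from any real detector.
WEIGHTS = {"perplexity": -0.08, "burstiness": -2.0, "marker_rate": 0.9}
BIAS = 3.0


def ai_probability(features: dict[str, float]) -> float:
    """Logistic score: sigmoid of a weighted sum of the features."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))
```

The output is always strictly between 0 and 1, and it moves smoothly as the features change. Nothing in the mechanism produces a yes-or-no verdict; any threshold is a choice applied on top of the score.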

How each of the main detectors actually works

GPTZero

GPTZero was the first widely-deployed public detector, built on perplexity and burstiness as its two main signals. It computes a perplexity score for the input text using a reference language model (originally GPT-2-based), computes the variance of perplexity across sentences, and outputs a classification and a per-sentence highlight.
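
The two signals described above can be sketched with a toy reference model. This is an assumption-heavy illustration, not GPTZero's code: an add-one-smoothed unigram model stands in for a neural reference model such as GPT-2, and the class and function names are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import pvariance


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class UnigramModel:
    """Add-one-smoothed unigram model; a toy stand-in for a real reference LM."""

    def __init__(self, corpus: str):
        tokens = tokenize(corpus)
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab = len(self.counts) + 1  # one extra slot for unseen words

    def prob(self, word: str) -> float:
        return (self.counts[word] + 1) / (self.total + self.vocab)


def perplexity(model: UnigramModel, text: str) -> float:
    """exp of the average negative log-probability per token."""
    tokens = tokenize(text)
    if not tokens:
        return float("nan")
    avg_neg_log = -sum(math.log(model.prob(t)) for t in tokens) / len(tokens)
    return math.exp(avg_neg_log)


def perplexity_profile(model: UnigramModel, text: str):
    """Return (overall perplexity, variance across sentences, per-sentence scores)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if tokenize(s)]
    scores = [perplexity(model, s) for s in sentences]
    spread = pvariance(scores) if len(scores) > 1 else 0.0
    return perplexity(model, text), spread, scores
```

Low overall perplexity combined with low variance across sentences is the pattern this family of detectors reads as AI-like. The per-sentence scores are what would drive sentence-level highlighting.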

Strengths: interpretable, fast, publicly documented. Weaknesses: it was trained primarily on pre-2024 text distributions; newer models produce text with higher perplexity than GPT-3 did, and some human writing (particularly formal or academic) has lower burstiness than the approach assumes. As of 2026, GPTZero publishes a false-positive rate of around 1% on its internal benchmark, but third-party evaluation puts the rate higher on non-native-English text.

Turnitin AI writing indicator

Turnitin doesn't publish its methodology in detail. What's known: it's a proprietary classifier trained on a large corpus of academic writing, integrated into the same workflow as plagiarism detection. It produces a percentage score for "AI-generated content."

Turnitin's public position is that its detector is "designed to achieve a false positive rate of less than 1%." In practice, third-party audits have found higher rates, particularly on:

  • Non-native English writers (false-positive rate estimated 2–5× higher)
  • Short passages (under 300 words)
  • Heavily edited or revised text of any origin

Turnitin's own documentation warns that its AI indicator "should not be used as the sole basis for any academic action." Many institutions use it that way anyway, which is a policy failure, not a detection failure.

Originality.ai

Originality.ai targets the content-marketing and SEO world rather than education. It produces a binary "AI" / "Original" label plus a confidence percentage, and markets itself on high accuracy against current-generation models.

Its classifier is regularly updated, and the company publishes benchmarks showing high recall on GPT-4 and Claude-generated text. It is one of the detectors most sensitive to paraphrased AI text: standard synonym-swap humanisers rarely fool it, while rhythm-level rewrites often do.

The tradeoff is that Originality.ai's aggressive calibration produces more false positives on formal human writing, particularly marketing copy written in an editorial style. They do not publish a single headline false-positive rate; third-party evaluation places it in the 2–4% range on typical content, rising sharply on highly formal prose.

Copyleaks

Copyleaks positions itself across both education and enterprise, with a classifier that ensembles multiple signals (perplexity, stylometry, token patterns) and adds a "source similarity" layer that compares input text against known AI output patterns.

Copyleaks publishes a claimed false-positive rate of 0.2%. Third-party reproduction has been difficult to run at scale; the available published evaluations place its real-world false-positive rate closer to 1–3%, varying substantially by input domain.

Winston AI

Winston AI is a newer entrant, competing on speed and on a claim of 99.98% accuracy. The mechanism appears to be an ensemble classifier with a transformer-based scorer. It produces a percentage score and per-sentence highlighting similar to GPTZero.

The 99.98% headline, like most detector accuracy claims, is measured on the vendor's internal benchmark and has not been independently reproduced at that figure. Winston's scores on in-the-wild text are often more moderate and less clear-cut than the marketing suggests.

What "99% accuracy" actually means (and doesn't)

Detector accuracy claims almost always come from internal benchmarks measured on balanced synthetic datasets — roughly equal amounts of human-written and AI-written text, often drawn from known sources. These are not the distributions detectors face in production.

In the real world:

  • Most submitted text is human-written. If 99% of submissions are human and the detector has a 1% false-positive rate on human text, then even if it catches every AI submission, roughly half of everyone flagged as AI is a human who was misclassified. The base-rate math works against the detector, and most users never run it.
  • Human prose distributions differ across populations. A classifier trained largely on native-English university writing will flag non-native English writers more often. The 2023 Stanford study quantified this, and the effect is real.
  • AI text distributions are moving targets. GPT-4o, Claude 3.5, Claude 4, and Gemini 2 all produce prose that's shaped differently from GPT-3's and from each other's. A detector trained on last year's outputs degrades on this year's models.
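
The base-rate arithmetic in the first point is short enough to run. The sketch below applies Bayes' rule to a population of submissions; the function and parameter names are hypothetical, and `recall` (the share of AI text the detector catches) defaults to a perfect 1.0, which is the most generous assumption for the detector.

```python
def flagged_breakdown(ai_share: float, false_positive_rate: float, recall: float = 1.0) -> dict[str, float]:
    """Split the flagged submissions into real AI text and misclassified humans."""
    true_flags = ai_share * recall
    false_flags = (1.0 - ai_share) * false_positive_rate
    flagged = true_flags + false_flags
    if flagged == 0:
        raise ValueError("nothing is flagged under these rates")
    return {
        "flag_rate": flagged,
        "precision": true_flags / flagged,
        "human_share_of_flags": false_flags / flagged,
    }


def accuracy_on_balanced_benchmark(false_positive_rate: float, recall: float = 1.0) -> float:
    """Accuracy when the test set is half human-written and half AI-written."""
    return 0.5 * recall + 0.5 * (1.0 - false_positive_rate)
```

With `ai_share=0.01` and `false_positive_rate=0.01`, `human_share_of_flags` comes out at about 0.497, while the same detector scores about 0.995 on a 50/50 benchmark. Both numbers describe one detector; only the mix of incoming text differs.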

The result: published accuracy numbers from detector vendors are internal, best-case measurements. Real-world performance is meaningfully worse, and the false-positive pattern is not random — it concentrates on the people least able to defend themselves against the accusation.

Where false positives land

Several categories of writing reliably look more AI-like to detectors than they are:

Non-native English writers. Detectors look for uniform cadence, moderate vocabulary, and predictable structure — features that often describe careful second-language writing. Multiple studies (most prominently the 2023 Stanford paper "GPT detectors are biased against non-native English writers") have documented false-positive rates several times higher on this population.

Formal or technical prose. Academic writing, legal drafting, policy documents, and scientific abstracts all use restrained, cadence-uniform styles because the genre demands it. These are precisely the features detectors flag.

Heavily edited text. Prose that's been through many rounds of copy-editing loses the burstiness that early drafts have. Polished writing flattens. Detectors see a flat sentence-length variance and classify it as AI-shaped.

Short passages. Under 300 words, detectors have too little statistical signal to classify reliably. Most vendors acknowledge this in their documentation. In production use, many institutions still run detectors on three-paragraph answers.

Text that was AI-assisted but then substantially rewritten. If the underlying ideas were AI-suggested but the prose was rewritten by a human, classifiers often flag the residual structural patterns even though the final text is, in any meaningful sense, the human's.

If you're in any of those categories, the detector score on your text has a higher-than-average chance of being wrong. That's not an opinion; it's the measured pattern.
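
Anyone with a set of texts known to be human-written can check this pattern for themselves. The sketch below is one possible way to do it, under the assumption that each record carries a group label and a boolean saying whether the detector flagged it; the function name and record shape are hypothetical.

```python
from collections import defaultdict


def false_positive_rate_by_group(records) -> dict[str, float]:
    """records: iterable of (group, flagged_as_ai) pairs for text KNOWN to be human-written.

    Because every text is human-written, every flag is a false positive.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, was_flagged in records:
        total[group] += 1
        flagged[group] += int(was_flagged)
    return {group: flagged[group] / total[group] for group in total}
```

Comparing the per-group rates, rather than one pooled figure, is what exposes a concentration of errors: a low overall false-positive rate can hide a much higher rate in a small subgroup.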

How to interpret a detector score

A few rules that hold up across detectors:

  1. A single score is one data point. Run the text through two or three detectors. Meaningful disagreement between them — common at the margins — tells you the score is uncertain.
  2. The per-sentence highlights are more useful than the headline percentage. If the highlighted "AI-like" sentences are all list items, transition-phrase-heavy sentences, or short-generic-opener sentences, you've found a style problem, not evidence of AI authorship.
  3. False-positive risk goes up with formality and down with specificity. Text that names specific people, events, numbers, and places is harder to misclassify as AI than text that stays abstract.
  4. A "98% AI" score on a 200-word answer is unreliable. The statistical base is too thin. Demand longer samples before making judgments.
  5. A score is not a confession. No detector can prove authorship; it can only estimate a probability. A high score is a signal to look more carefully, not a verdict that justifies a penalty.
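
Rules 1 and 4 can be turned into a small pre-check that runs before anyone acts on a score. This is a sketch under stated assumptions: scores are expressed as probabilities between 0 and 1, the 300-word floor comes from the short-passage guidance in this guide, and the 0.30 disagreement threshold is an arbitrary illustrative value, not a published standard.

```python
from statistics import median

MIN_WORDS = 300          # below this, scores are treated as unreliable
DISAGREEMENT_GAP = 0.30  # hypothetical threshold for "meaningful disagreement"


def summarise_scores(scores: dict[str, float], word_count: int) -> dict:
    """scores maps a detector name to its AI probability in [0, 1]."""
    if not scores:
        raise ValueError("need at least one detector score")
    values = list(scores.values())
    spread = max(values) - min(values)
    notes = []
    if word_count < MIN_WORDS:
        notes.append("sample too short to classify reliably")
    if len(values) < 2:
        notes.append("single score: one data point only")
    elif spread >= DISAGREEMENT_GAP:
        notes.append("detectors disagree: treat the score as uncertain")
    return {"median": median(values), "spread": spread, "notes": notes}
```

Fed the opening example of this guide (0.98 from one detector, 0.07 from another) on a 200-word answer, the summary carries two warnings. An empty `notes` list does not mean the text is AI-written; it only means these two checks found no reason to discount the score further.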

What institutions should be doing

The useful institutional response to AI detection is not "more detection." It's policy clarity.

  • Define what AI assistance is and isn't acceptable, in writing, per assignment or per context.
  • Make it normal to disclose AI assistance honestly, and don't penalise disclosure when the assistance is within the allowed range.
  • Use detectors as one input among many (alongside process evidence, drafts, oral defence, revision history) — never as a single basis for penalty.
  • Train graders on detector limitations, especially on non-native-English bias.
  • When flagging a submission as AI-generated, require evidence that holds up outside the detector: process artefacts, inconsistent knowledge, citation failures.

The institutions that handle AI well treat the tool as a teaching question, not a policing one. The ones that lean hard on detectors as arbiters tend to produce the worst outcomes — false accusations against the most vulnerable students, and no effect on actual bad-faith AI submissions.

What we do about all this

At humanise.ai, we test our rewrite quality against all five major detectors on every release. We publish the measured median pass-rate and stand behind it. We don't promise 100% detector bypass, because nothing honest promises that — detectors update, models update, and an arms race guarantees variance.

We also think the detector arms race is downstream of the real problem, which is that institutions haven't decided what they actually want to happen when AI is part of the writing process. Until that policy question is answered, detector scores will remain what they are: noisy signals being treated as clean verdicts, with human writers — especially the ones least able to push back — bearing the cost of the error rate.

TL;DR

  • Detectors are probabilistic classifiers, not proof. They measure perplexity, burstiness, and token patterns.
  • Vendor "99% accuracy" claims come from internal benchmarks on balanced synthetic datasets. Real-world performance is meaningfully worse.
  • False positives concentrate on non-native English writers, formal/technical prose, heavily-edited text, and short passages.
  • A single detector score is one data point. Run multiple detectors; look at per-sentence highlights, not just the headline percentage.
  • Institutions that treat detector scores as verdicts make policy errors that harm real people.
  • humanise.ai tests against GPTZero, Turnitin, Originality.ai, Copyleaks, and Winston AI on every release; our median pass-rate is >85% and we publish the number rather than promising miracles.

Next: try the humaniser on a draft you have now, or read Natural AI writing for the stylistic markers detectors look for and how to remove them without gutting the prose.
