Students are using ChatGPT, Claude, and Gemini to write essays, answer take-home prompts, and work through problem sets. You already know this.
That’s why so many teachers turned to AI detection tools like Turnitin and GPTZero: software that scans student work, returns a percentage, and claims to tell you whether a human wrote it.
One teacher did exactly that. Ran a batch of essays through a detector, saw three flagged at 95% or higher, and reported all three for academic integrity violations. Turned out every single one was a false positive. The students had their drafts, their peer review comments, all of it. One of them broke down crying in the teacher’s office.
This post covers why that keeps happening, who these tools hurt most, and what’s replacing them at schools that have moved on.
How do AI detection tools work?
AI detection tools scan writing for statistical patterns. According to a 2026 technical analysis by EyeSift, the two core signals most detectors rely on are “perplexity” (how predictable your word choices are to a language model) and “burstiness” (how much sentence length and complexity vary across a document). AI-generated text tends to score low on both: predictable words, uniform rhythm. If a piece of writing hits those patterns, the tool flags it.
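To make those two signals concrete, here is a minimal Python sketch. It is not how Turnitin, GPTZero, or any commercial detector is implemented. It assumes burstiness can be approximated by how much sentence length varies, and it stands in a toy word-frequency model for the neural language model a real detector would use. The function names are made up for this example.

```python
import math
import re
from collections import Counter


def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Approximate burstiness as std dev of sentence length divided by the mean.

    0.0 means every sentence has the same length; higher means more variation.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return math.sqrt(variance) / mean


def unigram_perplexity(text, reference):
    """Perplexity of `text` under a word-frequency model built from `reference`.

    Toy stand-in for a real language model. Add-one smoothing gives unseen
    words a small nonzero probability. Lower perplexity = more predictable.
    """
    ref_words = reference.lower().split()
    counts = Counter(ref_words)
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    total = len(ref_words)
    words = text.lower().split()
    if not words:
        return float("inf")
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


if __name__ == "__main__":
    uniform = "The cat sat down. The dog sat down. The bird sat down."
    varied = "Stop. The cat, which had been sleeping all afternoon, finally sat down. Why?"
    print("uniform burstiness:", burstiness(uniform))
    print("varied burstiness:", burstiness(varied))

    reference = "the cat sat on the mat the dog sat on the rug"
    print("predictable:", unigram_perplexity("the cat sat on the mat", reference))
    print("surprising:", unigram_perplexity("quantum zebras juggle violet theorems", reference))
```

Even in this toy version, three short sentences of identical length score a burstiness of zero. Nothing in the math knows who wrote them, which is the core of the problem.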
The two most common tools in schools are Turnitin, which added an AI detection feature in 2023 on top of its existing plagiarism checker, and GPTZero, which according to its own site serves over 10 million users including 380,000 educators. Both generate a percentage score representing how likely the text is AI-written.
But detecting a pattern is not the same as detecting AI. These tools don’t know whether a student used ChatGPT. They estimate whether the writing looks like something an AI might produce. And plenty of human writing triggers the same patterns.
Are AI detection tools accurate?
No. Not at the level needed to make academic integrity decisions.
These tools output a probability score, not a verdict. A score of 85% does not mean “this student cheated.” It means the tool’s model rates the writing as an 85% match for the patterns it learned from AI-generated text. There is no way for a teacher to verify that number independently, and no way for a student to disprove it.
Bassett et al. published a 2026 study in the Journal of Higher Education Policy and Management that made this point directly:
“[AI detection is] conceptually broken, procedurally unfair, and methodologically indefensible.”
Their conclusion was not to improve the tools but to abolish their use in academic integrity decisions entirely.
Even OpenAI, the company behind ChatGPT, shut down its own AI text classifier in July 2023, about six months after launch, because the accuracy was too low to be useful.
And yet adoption keeps growing. An NPR investigation in December 2025 found that more than 40% of surveyed 6th- to 12th-grade teachers used AI detection tools during the previous school year. School districts from Utah to Ohio to Alabama were spending thousands on the software. Broward County Public Schools in Florida alone spent over $550,000 on a three-year Turnitin contract.

Who gets hurt when AI detectors get it wrong
False positives are not random. They cluster around specific groups of students, and the pattern is consistent enough that researchers have started calling it a fairness problem, not just an accuracy problem.
ELL and multilingual students
Students writing in a second language tend to use simpler sentence structures, more formulaic phrasing, and less varied vocabulary. Those are natural features of developing proficiency in a new language. They are also the exact features AI detectors interpret as machine-generated.
A 2025 fairness audit published in the International Journal of Teaching, Learning and Education tested four commercial AI detectors on over 1,200 text samples, including ESL graduate student writing. The detectors performed well on AI-generated text but “disproportionately flagged ESL writing with false positives.”
How do we know it’s a language bias and not an AI signal? Because Stanford researchers measured a 61% false positive rate on TOEFL (Test of English as a Foreign Language) essays written by non-native English speakers. When they enhanced the vocabulary in those same essays to sound more native-like, the rate dropped to 11.6%. The detectors weren’t detecting AI. They were detecting non-native English.
Neurodivergent students
Students with autism, ADHD, or dyslexia often write in structured, literal, or repetitive patterns. Not because they are using AI, but because that is how they process and organize language.
Moira Olmsted, an autistic student at Central Methodist University, had her essay flagged at 100% AI-generated by Turnitin. She received a zero. Her professor deferred to the software’s judgment instead of evaluating the work. Olmsted eventually had the grade reversed, but the warning stayed on her record, with a note that another flag would trigger a formal plagiarism charge. As The Gillnetter reported, many neurodivergent students learn through pattern recognition rather than prose, which produces writing that detectors are structurally biased against.
Students who write well
This one is counterintuitive. Students who write formal, precise, well-organized prose can trigger the same low-perplexity signals that detectors associate with AI. Their writing is “too clean.”
As Techdirt put it in March 2026:
“Students who don’t use AI are punished for writing too well.”
The tools are not catching cheaters. They are penalizing competence.
The chilling effect

When students know their writing will be scanned, they change how they write. Not to improve it, but to avoid getting flagged.
A 2026 survey covered by Inside Higher Ed found that 75% of U.K. students who use AI reported significant stress over being wrongly flagged for plagiarism. More than half of all students surveyed cited “being accused of cheating when I did nothing wrong” as a major source of anxiety.
Some students respond by writing worse on purpose. Others go further. A professor writing in The Chronicle of Higher Education in 2026 described a student who began using AI tools only after learning that certain stylistic features were rumored to trigger detectors. She started running her writing through AI to see how it would register. Not to cheat, but to protect herself from a false accusation.
That is the opposite of what education is supposed to do.
What schools and universities are doing instead
Some institutions have already made the call. They stopped using AI detection tools and shifted their policies toward AI literacy and assessment redesign.
Washington State University terminated its Turnitin AI detection contract in early 2026. The reason: a conservative estimate of 1,485 false positives in a single semester (Fall 2024), based on Turnitin’s own self-reported 1-2% error rate. Between 2023 and 2025, 33% of all Academic Integrity Hearing Board cases involving AI allegations ended in acquittals. Vice Provost Bill Davis said the decision aligns WSU with peer R1 institutions and acknowledged that “AI detectors and tools designed to circumvent them are currently in a cat and mouse game.”
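The arithmetic behind an estimate like that is easy to sketch. The Python below is illustrative only. The submission counts, the share of AI-written work, and the detection rate are hypothetical inputs chosen for the example, not figures from WSU or Turnitin. Only the 1-2% false positive range comes from the paragraph above.

```python
def expected_false_positives(human_written, false_positive_rate):
    """Expected count of human-written submissions that get wrongly flagged."""
    return human_written * false_positive_rate


def wrong_flag_share(submissions, ai_share, false_positive_rate, detection_rate):
    """Fraction of all flags that land on human-written work."""
    ai_written = submissions * ai_share
    human_written = submissions - ai_written
    false_flags = human_written * false_positive_rate
    true_flags = ai_written * detection_rate
    flagged = false_flags + true_flags
    return false_flags / flagged if flagged else 0.0


if __name__ == "__main__":
    # Hypothetical campus: 100,000 human-written submissions in a semester.
    for rate in (0.01, 0.02):
        print(rate, round(expected_false_positives(100_000, rate)))

    # Hypothetical mix: 5% of 10,000 submissions are AI-written, the
    # detector catches 90% of those, and it wrongly flags 2% of the rest.
    print(round(wrong_flag_share(10_000, 0.05, 0.02, 0.90), 3))
```

Under those assumed inputs, a 1-2% error rate means 1,000 to 2,000 wrongly flagged submissions, and roughly 3 in 10 of all flags point at a student who did nothing wrong. A small error rate stops being small once it is multiplied across a whole campus.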
WSU is not alone. A 2025 investigation by GradPilot into 66 top U.S. universities found that at least 12 elite institutions had disabled Turnitin’s AI detection entirely, including Yale, Johns Hopkins, Northwestern, the University of Pittsburgh, Georgetown, and NYU. The University of Texas at Austin went further and banned purchasing AI detection software with procurement cards or personal credit cards, citing student intellectual property and FERPA concerns.
The University of Waterloo discontinued Turnitin’s AI detection in September 2025, citing unreliability, bias against non-native English speakers, and the fact that internal testing found “the advantages of the detection tool were inconclusive.” Their recommendation to faculty:
“Time and effort are best spent on education rather than policing misuse of GenAI.”
The pattern across all these institutions is the same. Detection policies are being replaced by AI literacy policies. The focus is shifting from catching violations to teaching responsible use.
How to prevent AI cheating without detection tools
If detection tools are unreliable and the institutions leading in education are dropping them, the question becomes: what actually works?
The answer is not a better detector. It is a better approach to assessment, policy, and the tools students are allowed to use.
Four shifts that replace detection
1. Make the process visible, not just the product. Grade the steps, not just the final essay. Outline, rough draft, revision with notes. Even one checkpoint between “assigned” and “due” makes AI-only submissions much harder.
2. Set AI expectations before the assignment, not after. A one-paragraph policy on your assignment sheet is enough: whether AI is allowed, what counts as acceptable use, and a requirement that students submit a short process statement. No guessing for students, no suspicion from teachers.
3. Rethink how students use AI, not whether they use it. Students are already using ChatGPT, Claude, and Gemini outside of class. Instead of spending on detection software, put that money and energy into choosing AI tools that actually support learning.
4. Choose AI tools built for the classroom, not against it. Instead of detectors that police student work, look for tools that guide how students learn with AI. The right tool gives you control over what students can and cannot do with it. That is a fundamentally different approach, and it leads to a bigger question.
For practical guidance on identifying AI use through your own judgment, including specific indicators, process checks, and how to have the conversation with a student, read AI Indicators in Student Writing.
Is using AI in school cheating?
It depends on the assignment and the policy. A student who brainstorms with ChatGPT is doing something very different from a student who copies a full response and submits it. But without clear guidelines, both get treated the same way. The problem is not that students use AI. The problem is that many classrooms have not defined what responsible use of AI looks like.
The tool also matters. Generic chatbots like ChatGPT, Claude, and Gemini will write an entire essay if a student asks. That is where the cheating concern comes from, and it is valid.
But teachers can also give students AI tools that are designed to teach, not to do the work. Edcafe AI lets teachers create interactive learning activities that students actually use, from creation to assignment and response collection, all in one place.
Its custom chatbot feature lets you set the instructions yourself and decide how it responds. It can walk a student through a concept step-by-step, offer hints before answers, and check understanding along the way. The student still does the thinking. The chatbot just keeps them on track.
Want to see what this looks like in practice? Our guide on AI Chatbots for Education breaks down how teachers are using classroom chatbots, what guardrails to set, and how to pick the right tool for your subject.
What comes next
Students are going to use AI. That is not changing.
What you can change is how your classroom responds to it. The schools that have moved on from detection are not ignoring the problem. They are solving it differently. They are writing clearer policies so students know where the line is. They are designing assignments where the process is the proof. And they are choosing AI tools that keep students learning instead of handing them shortcuts.
None of that requires a percentage from a scanner. It requires teachers who are willing to rethink assessment in a world where AI is already in the room.
That is a harder shift than installing software. But it is the one that actually works.
FAQs
What is the difference between AI detection and plagiarism detection?
Plagiarism detection compares text against a database of existing sources and flags matches. AI detection tools do something different: they estimate the probability that writing was generated by a language model based on statistical patterns. A paper can pass plagiarism checks and still be flagged by AI detection, because the text is original but the writing style triggers the tool’s model.
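The matching side of that difference can be sketched in a few lines of Python. This is a simplified illustration, not how Turnitin or any other plagiarism checker is actually built. It assumes matching is done on overlapping three-word sequences, and the function names are invented for the example.

```python
def word_ngrams(text, n=3):
    """Return the set of n-word sequences in the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(submission, source, n=3):
    """Share of the submission's n-word sequences that also appear in the source."""
    submitted = word_ngrams(submission, n)
    if not submitted:
        return 0.0
    return len(submitted & word_ngrams(source, n)) / len(submitted)


if __name__ == "__main__":
    source = "the quick brown fox sleeps"
    print(overlap_score("the quick brown fox jumps", source))
    print(overlap_score("completely original words here", source))
```

A match here points to a specific source that a teacher can open and check. An AI detection score points to nothing a teacher can look up.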
Can AI detection tools catch paraphrased or lightly edited AI content?
Not reliably. Detection rates drop to 60-85% when AI text has been manually edited or paraphrased. Tools like Undetectable.ai and Sapling are built specifically to rewrite AI output past detection thresholds. For every detector update, a workaround appears within weeks. That is why assessment redesign works better than chasing evasion tools.
Should I trust Turnitin’s AI detection score?
Not as evidence on its own. Turnitin itself states its AI detection “should not be used as the sole basis for adverse actions against a student.” The score is a probability estimate, not proof. Researcher Mike Perkins evaluated 14 AI detection tools and found none scored above 80% accuracy, Turnitin included.
What should I do if a student’s work is flagged by AI detection?
Start with a conversation, not an accusation. Ask the student to explain their argument, describe their process, or identify the hardest part of the assignment. A student who did the work can answer. Compare the flagged piece against their earlier writing for shifts in voice or complexity. For a full framework, read AI Indicators in Student Writing.
Why do different AI detection tools give different results for the same text?
Each tool uses a different model, different training data, and different scoring thresholds. A “65% AI” score on one tool does not mean the same thing as “65%” on another because they define and measure “AI-generated” differently. If two AI detection tools cannot agree on the same text, neither is reliable enough to act on.
