Our take on AI Detectors
What we think about AI Detectors
There are plenty of examples on the web discussing the accuracy (or lack thereof) of AI content detectors. All of our sources on this subject are listed below, but we encourage you to do your own testing: take a piece of content, drop it into 2-4 different "AI detectors", and compare the results.
Accuracy is a major concern. In our internal testing to date, different "AI detectors" return very different results for the exact same content, and even the same detector can return vastly different results depending on which of its models you use (Originality, for example). Our take is that you should not make AI content detection your ONLY way of grading content. You can certainly use these tools if you want to, but please keep in mind the inaccuracies that currently come with AI detectors.
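If you want to run that side-by-side comparison programmatically rather than by hand, here is a minimal sketch of the idea. The detector names, endpoint URLs, API keys, and the "ai_probability" response field below are placeholders, not the real API of any specific tool; substitute the actual API details of whichever detectors you use.

```python
# Sketch: send the same text to several detector APIs and compare their scores.
# All endpoints, keys, and response fields here are hypothetical placeholders.
import requests

SAMPLE_TEXT = "Paste the piece of content you want to check here."

# Hypothetical detectors: (name, endpoint URL, API key).
DETECTORS = [
    ("detector-a", "https://api.detector-a.example/v1/score", "KEY_A"),
    ("detector-b", "https://api.detector-b.example/v1/score", "KEY_B"),
]

def score_text(url: str, api_key: str, text: str) -> float | None:
    """Send the text to one detector and return its 'likely AI' score, if any."""
    try:
        resp = requests.post(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"text": text},
            timeout=30,
        )
        resp.raise_for_status()
        # Placeholder field name; real APIs label this value differently.
        return resp.json().get("ai_probability")
    except requests.RequestException:
        return None

if __name__ == "__main__":
    for name, url, key in DETECTORS:
        score = score_text(url, key, SAMPLE_TEXT)
        print(f"{name}: {score if score is not None else 'request failed'}")
```

If the scores disagree widely for the same text, that disagreement itself is the point made above: no single detector's number should be treated as a verdict.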
False Positive Rates for Human-Written Content: A study noted by MIT Sloan and Ars Technica revealed that AI detectors flagged over 50% of human-written academic content from ESL (English as a Second Language) authors as AI-generated, highlighting a substantial bias.
Failure Rates in Higher Education: Research from Illinois State University showed that AI detectors had a false positive rate as high as 40% when identifying AI in student essays, leading to wrongful accusations of plagiarism.
OpenAI’s Own AI Detector Limitations: OpenAI reported that their AI detector had only about 26% accuracy in distinguishing between AI and human-generated text, which led them to discontinue the tool due to the risk of mislabeling.
Misidentification of Classical Texts: In one experiment highlighted by Ars Technica, the Declaration of Independence and other foundational texts were flagged as AI-generated with high confidence by popular detection tools. This underscores the detectors’ flawed analysis methods that often misinterpret complex human language as AI-generated.
Performance Decline with Advancing AI Models: Illinois State University's ProDev guidance and arXiv research found that detectors struggle significantly with newer AI models like GPT-4. Detection accuracy dropped by up to 20% for content generated by these more sophisticated models, which can mimic human-like nuance and variability more effectively than their predecessors.
Field-Specific Inaccuracy: In the medical field, a study published in Foot & Ankle Surgery indicated that AI detectors misclassified over 60% of AI-generated medical content, mistaking it for human writing. This shows how complex, specialized fields can be particularly challenging for AI detectors to handle accurately.
Over-Emphasis on Surface-Level Patterns: Ars Technica pointed out that AI detectors focus primarily on superficial text patterns (such as sentence structure or vocabulary usage), leading to misidentifications. For instance, non-native English speaker essays were flagged with a 45% false positive rate, as their patterns often resemble AI's structured output.
Inconsistency Across Different Detection Tools: According to MIT Sloan, when analyzing the same text, AI detection tools varied widely, with some tools flagging content as 80% likely AI while others marked it as 10% AI, indicating vast inconsistencies across platforms.
Detectors’ Low Reliability with Mixed Content: Research discussed in the International Journal for Educational Integrity showed that AI detectors struggled to identify text that was partially AI-generated and partially human-edited. For mixed content, detectors performed with below 50% accuracy, often misclassifying it entirely.
Significant Bias Against Non-Standard English: An Illinois State University study showed that 55% of essays by ESL students were flagged as AI-written, compared to only 5% of essays by native speakers. This reveals a serious reliability issue for educational applications, as these tools are prone to unfairly target ESL content.
Looking for a recent case study? https://www.zdnet.com/article/i-tested-9-ai-content-detectors-and-these-2-correctly-identified-ai-text-every-time/
Articles about AI Detectors
https://medium.com/@seo-news/ai-detectors-dont-work-8e1a50dd135e#8231
https://www.mickmel.com/ai-detectors-simply-dont-work/
https://prodev.illinoisstate.edu/ai/detectors/
Additional Sources:
AI Detectors Don’t Work. Here’s What to Do Instead. (n.d.). MIT Sloan Teaching & Learning Technologies. Retrieved July 8, 2024, from https://mitsloanedtech.mit.edu/ai/teach/ai-detectors-dont-work/
Cooperman, S. R., & Brandão, R. A. (2024). AI tools vs AI text: Detecting AI-generated writing in foot and ankle surgery. Foot & Ankle Surgery: Techniques, Reports & Cases, 4(1), 100367. https://doi.org/10.1016/j.fastrc.2024.100367
Liu, J. Q. J., Hui, K. T. K., Al Zoubi, F., Zhou, Z. Z. X., Samartzis, D., Yu, C. C. H., Chang, J. R., & Wong, A. Y. L. (2024). The great detectives: Humans versus AI detectors in catching large language model-generated medical writing. International Journal for Educational Integrity, 20(1), 8. https://doi.org/10.1007/s40979-024-00155-6
Sadasivan, V. S., Kumar, A., Balasubramanian, S., Wang, W., & Feizi, S. (2024). Can AI-Generated Text be Reliably Detected? (arXiv:2303.11156). arXiv. http://arxiv.org/abs/2303.11156
Why AI writing detectors don’t work. (2023, July 14). Ars Technica. Retrieved July 8, 2024, from https://arstechnica.com/information-technology/2023/07/why-ai-detectors-think-the-us-constitution-was-written-by-ai/
Zawacki-Richter, O., Marín, V. I., Bond, M., & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education – where are the educators? International Journal of Educational Technology in Higher Education, 16(1), 39. https://doi.org/10.1186/s41239-019-0171-0