GPTZero Case Study (Exploring False Positives)

Introduction

In this case study, I will show how many false positives current AI detection software produces, using GPTZero as the example. I want to thank the supposed “Healthcare professional” who brought this to my attention via my contact link. It has motivated me to look deeper into this issue rather than just posting bypasses for these popular AI detection programs. Highlighting how usable these tools actually are will be more beneficial in general.

What is GPTZero?

GPTZero is a tool that uses specific characteristics of AI-generated text to identify when AI has been used. It looks at how difficult it is for a language model to predict certain words or phrases (the text's perplexity) and uses this information to tell whether AI was involved in creating a piece of text. It is like a “fingerprint” for AI that can detect when it has been used.
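To make the idea of perplexity more concrete, here is a minimal sketch. It is not GPTZero's actual implementation, which is not described here. It assumes a toy unigram word model with add-one smoothing, and the function name `unigram_perplexity` is hypothetical. The only point is that text the model finds predictable gets a lower score than text it finds surprising.

```python
import math
from collections import Counter


def unigram_perplexity(text, reference):
    """Perplexity of `text` under an add-one-smoothed unigram model fit on `reference`.

    Toy illustration only: real detectors score text with a large language
    model, not with simple word counts.
    """
    ref_tokens = reference.lower().split()
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")

    counts = Counter(ref_tokens)
    # The vocabulary covers both texts so unseen words still get a small probability.
    vocab = set(ref_tokens) | set(tokens)
    total = len(ref_tokens) + len(vocab)

    # Average negative log-probability per token, then exponentiate.
    log_prob = sum(math.log((counts[t] + 1) / total) for t in tokens)
    return math.exp(-log_prob / len(tokens))


reference = "the patient was admitted to the hospital the patient was discharged"
predictable = unigram_perplexity("the patient was admitted", reference)
surprising = unigram_perplexity("quantum zebra juggles marmalade", reference)
print(predictable < surprising)
```

A detector built on this idea treats low perplexity as a sign of AI writing. That is also why formal, formulaic human writing, such as a medical abstract, can plausibly be flagged by mistake.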

The False Positives

From my testing, done before I even thought about writing this case study, I have found that GPTZero gives very mixed results. OpenAI’s AI Classifier, by comparison, is usually more consistent but less informative about which text is AI-written and which is human-written. For this test, I used the medical paper "Severe Outcomes Among Patients with Coronavirus Disease 2019 (COVID-19) — United States, February 12–March 16, 2020" (Source).

GPTZero states that more than 50% of this paper is AI-written, and remember that this is only the first paragraph taken from the paper. This is an official medical paper written in 2020 by multiple CDC contributors. It is not AI-written, so this is clearly a false positive.

Note: Even after testing the maximum input length (5,000 characters) against GPTZero, the text still scored above 50% AI-written.

GPTZero Result

Thank you to the supposed "Healthcare professional" for sharing this link

Pie Chart

The following pie chart follows the rules below.

  • All papers tested will be under the “neurology” topic.
  • Every document flagged as AI-written or as having low perplexity counts as a false positive.
  • Only the abstracts of the papers will be tested.

The results are shocking: GPTZero marks 11 of the 20 medical papers (55%) as having AI-written portions. Keep in mind that I only tested the abstracts of the papers.

GPTZero False Positives
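The pie chart figure can be reproduced with a few lines of arithmetic. The sketch below is only an illustration. The function names are hypothetical, and the Wilson score interval is an extra step that was not part of my original test. I include it to show how much uncertainty a sample of only 20 abstracts carries.

```python
import math


def false_positive_rate(flagged, total):
    """Share of human-written documents that the detector flagged as AI."""
    if total <= 0:
        raise ValueError("total must be positive")
    return flagged / total


def wilson_interval(flagged, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    p = flagged / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Counts from this case study: 11 of 20 abstracts were flagged.
rate = false_positive_rate(11, 20)
low, high = wilson_interval(11, 20)
print(f"False positive rate: {rate:.0%}")
print(f"Approximate 95% interval: {low:.0%} to {high:.0%}")
```

With only 20 abstracts the interval is wide, roughly 34% to 74%. Even the low end would still mean about one in three human-written abstracts being flagged.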

Pie Chart Resources

False Positives:

Positives:

Wayback Machine Screenshot Link

This screenshot shows the order in which these medical papers were tested. The order runs from 1 through 20.

https://web.archive.org/web/20230219025214/http://web.archive.org/screenshot/https://www.ncbi.nlm.nih.gov/pmc/?term=neurology

The Control (Baseline)

I will use the AI Classifier from OpenAI (ChatGPT's parent company) as a control for this case study.

Note: OpenAI's AI classifier is a "work-in-progress"

The image below shows the first paragraph from the medical paper "Severe Outcomes Among Patients with Coronavirus Disease 2019 (COVID-19) — United States, February 12–March 16, 2020" (Source).

OpenAI’s Classifier Result

"The classifier considers the text to be unclear if it is AI-generated." - OpenAI's AI Classifier

Given that this paper was published in 2020, and given its academic level, it is extremely unlikely that any AI-based software was used to write it. We can run one more check to rule this out by looking at the paper's archived copies on the Internet Archive's Wayback Machine.

The Wayback Machine snapshot at https://web.archive.org/web/20210823160148/https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7725513/ is dated August 23, 2021, which is well before ChatGPT was released in November 2022. The wording has not changed since publication.

The Wayback Machine is a community-based internet archive tool that takes timestamped snapshots of the web, with the goal of preserving the web and its content.
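The date of a snapshot can be read straight from the archived link, because Wayback Machine URLs embed a 14-digit timestamp (year, month, day, hour, minute, second) before the original address. Here is a minimal sketch of that check. The helper names `parse_wayback_url` and `same_wording` are hypothetical, and the comparison assumes you have already copied the two versions of the text by hand.

```python
import re
from datetime import datetime

# Matches links of the form https://web.archive.org/web/<14-digit timestamp>/<original URL>
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(archived_url):
    """Return (capture time, original URL) for a Wayback Machine snapshot link."""
    match = WAYBACK_PATTERN.match(archived_url)
    if match is None:
        raise ValueError("not a timestamped Wayback Machine URL")
    timestamp, original_url = match.groups()
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return captured_at, original_url


def same_wording(text_a, text_b):
    """Compare two passages while ignoring differences in whitespace."""
    return " ".join(text_a.split()) == " ".join(text_b.split())


captured_at, original_url = parse_wayback_url(
    "https://web.archive.org/web/20210823160148/"
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7725513/"
)
print(captured_at.strftime("%B %d, %Y"))
print(original_url)
```

Pasting the paragraph from the archived page and from the live page into `same_wording` is a quick way to confirm that nothing was rewritten after the snapshot was taken.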

OpenAI AI Classifier Limitations

Current limitations:

  • Requires a minimum of 1,000 characters, approximately 150 – 250 words.
  • The classifier isn’t always accurate; it can mislabel AI-generated and human-written text.
  • AI-generated text can be edited easily to evade the classifier.
  • “The classifier is likely to get things wrong on text written by children and on text not in English because it was primarily trained on English content written by adults.” – Source

The Problem

Inaccurate reports from a software program that is about to go commercial are a big red flag and bring along many problems, far too many to cover in a single blog post. The biggest problem I can see, and one that relates to my own life, is plagiarism in education. If a student is wrongfully accused of plagiarism because the institution relied on a flawed AI detection program, the consequences for that student could be severe. At my institution, plagiarism has a huge impact on your character. My university's policy states: “Academic dishonesty, therefore, is a serious breach of university standards and will result in substantial penalties.”

