Posted on Leave a comment

GPT-4 vs AI Detection

GPT-4

"GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and advanced reasoning capabilities." – OpenAI

GPT-4 Test

In this test, I’ll be using Poe AI to access GPT-4 for free, since at the time of writing GPT-4 is otherwise a paid model.

GPT-4 Initial Prompt

The initial prompt is as follows:

Explain quantum computing in simple terms with high perplexity and burstiness in your tone of writing. Write at college level with an writing style of a professor.

GPT-4 Response

The response from GPT-4 is as follows:

Ah, quantum computing! A veritable cornucopia of enigmatic conundrums and tantalizing paradoxes, nestled within the realm of the infinitesimally minuscule. Let us embark upon a journey, my dear intellectual companions, to explore the confounding depths of this wondrous domain.

Picture, if you will, the classical computer: a binary behemoth, churning out calculations with unyielding adherence to the 0s and 1s - the bits, as it were. A tale as old as time, or at least as old as the venerable Mr. Turing and his groundbreaking escapades. However, the quantum world scoffs at such simplistic binary constructs, choosing instead to embrace the enigmatic qubits.

Ah, the qubit! A mercurial protagonist in this grand narrative, dually embodying both 0 and 1 in a superpositioned waltz of mathematical probabilities. How does one even begin to fathom such a concept? Perhaps Schrödinger's mystified feline can offer us a clue, forever trapped betwixt life and death in its unobserved quantum dwelling.

Now, enter the quantum computing stage: a cacophony of entangled qubits, their fates inextricably intertwined like a cosmic dance of interstellar ballerinas. The sheer magnitude of computational prowess unleashed by these enigmatic entities is nothing short of breathtaking. The powers of parallelism are harnessed to perform unfathomable feats, solving problems in mere moments that classical computers could only dream of conquering in millennia.

But lo! The path to quantum supremacy is not without its pitfalls and perils. For the act of observing these capricious qubits invariably collapses their probabilistic waveforms, like a fragile soufflé deflating under the harsh gaze of an overly critical chef. The delicate art of quantum error correction, therefore, is of paramount importance, lest our dreams of quantum nirvana dissolve into a cacophonous cacophony of computational chaos.

In conclusion, my esteemed colleagues, quantum computing is an enigma wrapped in a paradox, nestled within the heart of the infinitesimally small. It stands as a testament to human ingenuity, challenging our very understanding of reality and pushing the boundaries of what is computationally possible.
GPT-4

GPT-4 AI Detection

The AI detector of choice for this test is GPTZero, as they claim to be "The World's #1 AI Detector".

GPTZero Result

Results

GPT-4 blows AI detection out of the water. GPT models will likely continue to bypass AI detectors because it is so hard to predict whether a piece of text is AI generated. If you want free access to GPT-4, consider downloading Poe AI, which is the app I used for this test. You can also visit poe.com.

GPTZero Case Study (Exploring False Positives)

Introduction

In this case study, I will be sharing the large number of false positives that current AI detection software produces, using GPTZero as the example. I personally want to thank the supposed “Healthcare professional” who brought this to my attention via my contact link. It has motivated me to look further into this issue rather than just posting bypasses for these popular AI detection programs. Highlighting how usable they really are will be more beneficial.

What is GPTZero?

GPTZero is a tool that uses certain statistical characteristics of text to identify when AI was used to write it. It looks at things like how difficult it is for a language model to predict certain words or phrases (perplexity), and uses this information to tell whether AI was involved in creating a piece of text. It’s like a “fingerprint” for AI that can detect when it has been used.
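To make the idea of perplexity and burstiness more concrete, here is a minimal sketch. It is not GPTZero's actual code, which I have not seen. It assumes a language model has already given a probability for each token, and it treats burstiness as the spread of perplexity between sentences, which is my own simplification. The function names and the sample probabilities are made up for illustration.

```python
import math
from statistics import pstdev


def perplexity(token_probs):
    """Perplexity of a text, given the probability a model assigned to each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability, then exponentiate
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def burstiness(sentence_perplexities):
    """Spread of perplexity across sentences (assumed definition: population std dev)."""
    return pstdev(sentence_perplexities)


# Hypothetical per-token probabilities, one inner list per sentence
predictable = [[0.9, 0.8, 0.9], [0.85, 0.9, 0.8]]
varied = [[0.9, 0.8, 0.9], [0.1, 0.05, 0.2]]

for name, sentences in (("predictable", predictable), ("varied", varied)):
    scores = [perplexity(s) for s in sentences]
    print(name, [round(x, 2) for x in scores], round(burstiness(scores), 2))
```

Text where every word is easy to predict gets low perplexity and low burstiness, which is the pattern a detector of this kind would treat as more likely AI written. Formal writing by humans, such as a medical abstract, can show the same pattern.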

The False Positives

From my own testing, before I even thought about writing this case study, I have found that GPTZero gives very mixed results, while OpenAI’s AI Classifier is usually more consistent but less informative about which parts are AI written and which are human written. For this case study, I tested the medical paper "Severe Outcomes Among Patients with Coronavirus Disease 2019 (COVID-19) — United States, February 12–March 16, 2020" (Source).

GPTZero states that more than 50% of the text is AI written, and keep in mind this is only the first paragraph taken from the paper. This is an official medical paper written in 2020 by multiple CDC contributors, so it is not AI written. It is quite obviously a false positive.

Note: Even after testing the maximum input length (5,000 characters) against GPTZero, the result still came back as more than 50% AI written.

GPTZero Result

Thank you to the supposed "Healthcare professional" for sharing this link

Pie Chart

The pie chart below follows these rules:

  • All papers tested are under the “neurology” topic
  • All papers that are flagged as AI or as having low perplexity are counted as false positives
  • Only the abstracts of the papers are tested

The results are shocking: GPTZero marks 11 of the 20 medical papers (55%) as having AI-written portions. Keep in mind, I only tested the abstracts of the papers.
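Since 20 papers is a small sample, it helps to see how much uncertainty sits around that number. The sketch below computes the false positive rate and a 95% Wilson score interval around it. The interval is my own addition and not something GPTZero reports, and the function name is just for illustration.

```python
import math


def wilson_interval(flagged, total, z=1.96):
    """95% Wilson score interval for a proportion (z=1.96 is the usual 95% value)."""
    p = flagged / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


flagged, tested = 11, 20  # counts from the test above
rate = flagged / tested
low, high = wilson_interval(flagged, tested)
print(f"{rate:.0%} flagged, 95% interval roughly {low:.0%} to {high:.0%}")
```

Even at the low end of that range, roughly a third of human-written abstracts would be flagged, which is still far too many for a tool used to judge people's work.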

GPTZero False Positives

Pie Chart Resources

False Positives:

Positives:

Wayback Machine Screenshot Link

This screenshot shows the order in which these medical papers were tested. The order runs from 1 through 20.

https://web.archive.org/web/20230219025214/http://web.archive.org/screenshot/https://www.ncbi.nlm.nih.gov/pmc/?term=neurology

The Control (Baseline)

As a control for this case study, I will be using the AI Classifier from OpenAI (the company behind ChatGPT).

Note: OpenAI's AI classifier is a "work-in-progress"

The image below shows the first paragraph from the medical paper "Severe Outcomes Among Patients with Coronavirus Disease 2019 (COVID-19) — United States, February 12–March 16, 2020" (Source).

OpenAI’s Classifier Result

"The classifier considers the text to be unclear if it is AI-generated." - OpenAI's AI Classifier

Given that this paper was published in 2020, and given its academic level, it’s extremely unlikely that any AI-based software was used to write it. We can do another test to make sure of this by looking at its archived copies on the Internet Archive’s Wayback Machine.

According to the Wayback Machine's archived copy at https://web.archive.org/web/20210823160148/https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7725513/, dated August 23, 2021, the wording has not changed since publication. That snapshot was taken before ChatGPT was publicly released in November 2022.
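The date check can also be done in code. Wayback Machine links carry a 14-digit timestamp (year, month, day, hour, minute, second) in the URL, so this small sketch pulls it out and compares it with ChatGPT's public release date of November 30, 2022. The function and constant names are my own, and the script only reads the URL. It does not contact the Internet Archive.

```python
import re
from datetime import datetime

# Wayback URLs look like https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_TS = re.compile(r"web\.archive\.org/web/(\d{14})/")

# Public release date of ChatGPT, used as the cutoff for this check
CHATGPT_RELEASE = datetime(2022, 11, 30)


def snapshot_time(archive_url):
    """Return the capture time encoded in a Wayback Machine URL."""
    match = WAYBACK_TS.search(archive_url)
    if match is None:
        raise ValueError("no 14-digit Wayback timestamp found")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


url = (
    "https://web.archive.org/web/20210823160148/"
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7725513/"
)
captured = snapshot_time(url)
print(captured.date(), "before ChatGPT release:", captured < CHATGPT_RELEASE)
```

If the archived wording matches the current wording and the snapshot is older than the tools in question, those tools cannot have written the text.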

The Wayback Machine is a community-based internet archive tool that takes timestamped snapshots of the web, with the goal of preserving the web and its content.

OpenAI AI Classifier Limitations

“Current limitations:

  • Requires a minimum of 1,000 characters, which is approximately 150 – 250 words.
  • The classifier isn’t always accurate; it can mislabel both AI-generated and human-written text.
  • AI-generated text can be edited easily to evade the classifier.
  • The classifier is likely to get things wrong on text written by children and on text not in English, because it was primarily trained on English content written by adults.” – Source

The Problem

Inaccurate results from a software program that is about to go commercial are a big red flag and bring along many problems, far too many to cover in a single blog post. The biggest problem I personally can see, and one that relates to my own life, is plagiarism accusations in education. If a student is wrongfully accused of plagiarism because the institution relies on a flawed AI detection program, the consequences could be severe for that student. At my institution, plagiarism has a huge impact on your character. My university’s policy states: “Academic dishonesty, therefore, is a serious breach of university standards and will result in substantial penalties.”

ChatGPT vs. Turnitin – Does the Turnitin Plagiarism Checker Detect ChatGPT?

What Is ChatGPT?

“ChatGPT is a large language model trained by OpenAI that can generate human-like text based on a given prompt. It can be used for a variety of natural language processing tasks, such as language translation, text summarization, and question answering.” – This response was generated by ChatGPT

What Is Turnitin?

Turnitin is a web-based plagiarism detection service that helps educators and students identify potential plagiarism in written work. It compares submitted papers to a database of academic content, as well as content on the internet, in order to identify any matching or similar text. This tool is commonly used by educators to check student papers for plagiarism, as well as by students to check their own work for potential issues before submitting it.
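To show the general idea of text matching, here is a rough sketch that measures how many three-word sequences in a submission also appear in a source. This is not Turnitin's actual algorithm, which is proprietary. It is a simple word n-gram overlap I chose for illustration, and the function names and sample sentences are made up.

```python
import re


def shingles(text, n=3):
    """Set of overlapping n-word sequences in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(submission, source, n=3):
    """Fraction of the submission's n-word sequences that also appear in the source."""
    sub, src = shingles(submission, n), shingles(source, n)
    if not sub:
        return 0.0
    return len(sub & src) / len(sub)


source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over the lazy dog"
reworded = "a fast brown fox leaps above a sleepy dog"

print(similarity(copied, source))
print(similarity(reworded, source))
```

A copied sentence shares every sequence with its source, while a reworded one with the same meaning shares none. Freshly generated text behaves like the reworded case, because it has no matching source in the database.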

Does the Turnitin Plagiarism Checker Detect ChatGPT?

Short answer: No. Here’s why.

ChatGPT is a language model built on a machine learning algorithm. It is trained on a huge amount of text and then fine-tuned to produce more human-like responses. Imagine a classroom of 100 college students who are all taught the same information. If you were to ask them all the same question, they would all give different answers. ChatGPT works similarly: because it is trained on millions of data points, it can come up with a unique response each time. As a result, traditional plagiarism detection software such as Turnitin will not be able to detect plagiarism from ChatGPT, since its responses do not match existing text in a way that is easily traceable. Additionally, even newer programs specifically designed to detect ChatGPT output, like GPTZero, will not be able to keep up with the vast amount of data these models are trained on. Ultimately, the company with the most data will have an advantage in this field, as seen with companies like Google, Facebook, Apple, and Tesla.

Data/Results

I conducted an experiment in which I used ChatGPT to generate 20 essays and then submitted them to Turnitin for plagiarism detection. All of the essays received a passing score, with very low similarity to other sources. This suggests that essays generated with ChatGPT can bypass plagiarism detection software such as Turnitin, because the model’s ability to produce unique responses makes matching text hard to find.

Turnitin Results

Extra – Sam Altman Interview

StrictlyVC in conversation with Sam Altman, part two (OpenAI)