GPTZero Case Study: Models and Exploits
Thanks for following GPTZero’s development! This week, we’re trying something different: a case study from our team on an AI model and an exploit. We hope it offers some unique insight into the AI detection space, and into whether it is, as some claim, an ‘artificial intelligence arms race.’
We still have some exciting updates (including a new partnership with the world’s biggest LMS), but we’ll save those for the next post.
On Feb 1st, our team became aware of a potential exploit for GPTZero: a ‘GPTZero By-passer’ program popularized on TikTok. The by-passer modified essay text by replacing key letters with Cyrillic look-alike characters, which appear identical to humans but are completely different to a machine. It gained significant traction on TikTok, with over 100K views and multiple commenters reporting that they had used the by-passer.
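The substitution trick can be illustrated with a minimal Python sketch. The mapping below is a hypothetical handful of Cyrillic look-alikes chosen for illustration, not the by-passer’s actual table. The modified string renders almost identically, yet it compares unequal to the original and encodes to more UTF-8 bytes, so any tokenizer working on bytes or code points receives different input.

```python
# Hypothetical look-alike table for illustration only; the by-passer's real
# character choices are not documented in this post.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def substitute_homoglyphs(text: str) -> str:
    """Swap selected Latin letters for visually similar Cyrillic letters."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


original = "It is not possible to build a model"
modified = substitute_homoglyphs(original)

print(modified)              # renders (almost) identically on screen
print(original == modified)  # False: the underlying code points differ
print(len(original.encode("utf-8")), len(modified.encode("utf-8")))  # byte counts differ
```

Each Cyrillic letter takes two bytes in UTF-8 instead of one, which is one concrete way the "same" word turns into unfamiliar tokens for a model.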
We decided to investigate the GPTZero by-passer program ourselves. First, we generated a few paragraphs of AI text using ChatGPT.
After generating the initial text with ChatGPT, we ran it through the GPTZero by-passer. The output is a modified version of the input text containing numerous irregular characters. We then copied and pasted the modified text into GPTZero.
In our experiment, the GPTZero by-passer exploit was initially successful in duping GPTZero.
On Feb 2, as part of one of our regular model updates, we patched the by-passer exploit. The update neutralized the by-passer’s method of modifying character tokens. Likely in response to our patch, the creators of the ‘GPTZero by-passer’ video deleted their tutorial from TikTok on Feb 4.
Even with the by-passer modification, GPTZero now detects this text as AI-generated. In the example, GPTZero also accurately highlighted exactly which portions of the essay were AI-generated.
As a baseline comparison for our experiment, we entered the same text into OpenAI’s AI detection classifier, released on January 31. Without the by-passer modification, the OpenAI classifier labels the text ‘possibly AI generated’.
For comparison, we then entered the by-passer-modified text into the OpenAI classifier, which reported that it was ‘unclear’ whether the text was AI- or human-generated. In this example, the OpenAI classifier, just like GPTZero before our February patch and model update, is duped by the GPTZero by-passer.
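One plausible way to neutralize this class of exploit is to normalize text before scoring it. The sketch below is an assumption, not GPTZero’s actual patch. Unicode NFKC normalization (`unicodedata.normalize`) folds compatibility characters, such as the Mathematical Monospace letters in the modified sample at the end of this post, back to ASCII. Cyrillic look-alikes are genuine letters that NFKC leaves untouched, so a small hypothetical confusables table handles those; a production system would more likely load a complete list such as the confusables data from Unicode Technical Standard #39.

```python
import unicodedata

# Hypothetical, deliberately tiny confusables table (Cyrillic -> Latin).
# A real system would load a complete list, e.g. Unicode's confusables.txt.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def normalize_lookalikes(text: str) -> str:
    """Map look-alike characters back to plain Latin letters before detection."""
    # NFKC folds compatibility characters (e.g. MATHEMATICAL MONOSPACE SMALL T) to ASCII.
    folded = unicodedata.normalize("NFKC", text)
    # Cyrillic letters have no compatibility decomposition, so map them explicitly.
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in folded)


# "I𝚝 is 𝚗𝚘𝚝" from the modified sample, written with escapes for clarity.
print(normalize_lookalikes("I\U0001D69D is \U0001D697\U0001D698\U0001D69D"))  # prints: It is not
# A Cyrillic-substituted word.
print(normalize_lookalikes("m\u043ed\u0435l"))  # prints: model
```

NFKC also rewrites some legitimate characters (ligatures such as "ﬁ", full-width forms), which is usually acceptable for scoring but is a reason to keep the original text for display.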
An Arms Race?
Since GPTZero launched on January 2nd, our team has been constantly asked whether AI detection is entering an arms race with generative AI technology.
Technologically — the answer is TBD.
Non-technically and practically speaking: absolutely, 100%. Rather than AI detection vs. AI advancement, however, the arms race will look much more like a race to respond to human-made exploits. Whether TikTok stars or far more organized adversaries like Russian bot farms, humans will constantly develop new exploits against AI detection models.
As a result, training and releasing a classifier will not be enough on its own. For AI detection to succeed in practice, it will require humans iterating on, monitoring, and constantly adapting detection models, and responding to exploits from other humans.
In the past month, multiple organizations have trained AI detection classifiers.
Our approach at GPTZero is to move beyond simply training classifiers. Instead, we’re building a pipeline to constantly improve our model from training data, and a feedback loop to constantly iterate on our product based on teacher suggestions and novel (sometimes adversarial) use cases. We’re also excited to be entering collaborations this week with some of the largest Learning Management Systems to build the best AI detection solution for teachers.
Here are two key takeaways from this case study.
Training a detection model and testing it in the lab is completely different from applying it in the real world. Adversarial use cases emerge. In the real world, you need a team to constantly monitor for and defend against new exploits.
Training a classifier is not enough, especially not for the educational use case. (We figured this one out early and migrated to GPTZeroX.) It turns out that for the education use case, you actually need a team constantly iterating and talking to teachers daily to build a product that works for educators.
Thanks so much for reading!
We’d love to hear from you if you have any feedback on GPTZero or on your use case. So far, our team has served over 60 organizations with our API on a case-by-case basis. If you think your organization needs the GPTZero API, or you want to work with us to develop and improve GPTZero for your specific use case, sign up for API access below or reach out to our team at firstname.lastname@example.org.
AI text generated by ChatGPT, before the by-passer:
It is not possible to build a model to "bypass" GPT-3 or any other language model as they are complex systems with multiple layers of processing and decision-making. Moreover, these models are trained on vast amounts of diverse data and have been fine-tuned to perform specific tasks such as text generation, translation, and question answering. GPT-3 is a language AI model developed by OpenAI. It's an advanced form of NLP (Natural Language Processing) that uses deep learning techniques to generate human-like text. It's trained on a massive amount of text data from the internet and has the ability to generate coherent and context-aware responses to various tasks such as question-answering, text completion, and translation. A more productive approach would be to focus on improving the input you provide to GPT-3 or other models to elicit the desired response or to fine-tune the model for your specific use case. Additionally, you can also try other language models or architectures that might be better suited for your problem. The accuracy and quality of GPT-Zero will depend on the specific task and evaluation criteria, but it is expected to be lower compared to the larger GPT models due to the limitations imposed by its smaller size.
Text after the by-passer program:
I𝚝 is 𝚗𝚘𝚝 p𝚘ssib𝚕e 𝚝𝚘 b𝚞i𝚕d 𝚊 𝚖𝚘de𝚕 𝚝𝚘 byp𝚊ss GPT-3 𝚘r 𝚊𝚗y 𝚘𝚝her 𝚕𝚊𝚗g𝚞𝚊ge 𝚖𝚘de𝚕 𝚊s 𝚝hey 𝚊re 𝚌𝚘𝚖p𝚕ex sys𝚝e𝚖s wi𝚝h 𝚖𝚞𝚕𝚝ip𝚕e 𝚕𝚊yers 𝚘f pr𝚘𝚌essi𝚗g 𝚊𝚗d de𝚌isi𝚘𝚗-𝚖𝚊ki𝚗g. M𝚘re𝚘ver, 𝚝hese 𝚖𝚘de𝚕s 𝚊re 𝚝r𝚊i𝚗ed 𝚘𝚗 v𝚊s𝚝 𝚊𝚖𝚘𝚞𝚗𝚝s 𝚘f diverse d𝚊𝚝𝚊 𝚊𝚗d h𝚊ve bee𝚗 fi𝚗e-𝚝𝚞𝚗ed 𝚝𝚘 perf𝚘r𝚖 spe𝚌ifi𝚌 𝚝𝚊sks s𝚞𝚌h 𝚊s 𝚝ex𝚝 ge𝚗er𝚊𝚝i𝚘𝚗, 𝚝r𝚊𝚗s𝚕𝚊𝚝i𝚘𝚗, 𝚊𝚗d 𝚚𝚞es𝚝i𝚘𝚗 𝚊𝚗sweri𝚗g. GPT-3 is 𝚊 𝚕𝚊𝚗g𝚞𝚊ge AI 𝚖𝚘de𝚕 deve𝚕𝚘ped by Ope𝚗AI. I𝚝's 𝚊𝚗 𝚊dv𝚊𝚗𝚌ed f𝚘r𝚖 𝚘f NLP (N𝚊𝚝𝚞r𝚊𝚕 L𝚊𝚗g𝚞𝚊ge Pr𝚘𝚌essi𝚗g) 𝚝h𝚊𝚝 𝚞ses deep 𝚕e𝚊r𝚗i𝚗g 𝚝e𝚌h𝚗i𝚚𝚞es 𝚝𝚘 ge𝚗er𝚊𝚝e h𝚞𝚖𝚊𝚗-𝚕ike 𝚝ex𝚝. I𝚝's 𝚝r𝚊i𝚗ed 𝚘𝚗 𝚊 𝚖𝚊ssive 𝚊𝚖𝚘𝚞𝚗𝚝 𝚘f 𝚝ex𝚝 d𝚊𝚝𝚊 fr𝚘𝚖 𝚝he i𝚗𝚝er𝚗e𝚝 𝚊𝚗d h𝚊s 𝚝he 𝚊bi𝚕i𝚝y 𝚝𝚘 ge𝚗er𝚊𝚝e 𝚌𝚘here𝚗𝚝 𝚊𝚗d 𝚌𝚘𝚗𝚝ex𝚝-𝚊w𝚊re resp𝚘𝚗ses 𝚝𝚘 v𝚊ri𝚘𝚞s 𝚝𝚊sks s𝚞𝚌h 𝚊s 𝚚𝚞es𝚝i𝚘𝚗-𝚊𝚗sweri𝚗g, 𝚝ex𝚝 𝚌𝚘𝚖p𝚕e𝚝i𝚘𝚗, 𝚊𝚗d 𝚝r𝚊𝚗s𝚕𝚊𝚝i𝚘𝚗. A 𝚖𝚘re pr𝚘d𝚞𝚌𝚝ive 𝚊ppr𝚘𝚊𝚌h w𝚘𝚞𝚕d be 𝚝𝚘 f𝚘𝚌𝚞s 𝚘𝚗 i𝚖pr𝚘vi𝚗g 𝚝he i𝚗p𝚞𝚝 y𝚘𝚞 pr𝚘vide 𝚝𝚘 GPT-3 𝚘r 𝚘𝚝her 𝚖𝚘de𝚕s 𝚝𝚘 e𝚕i𝚌i𝚝 𝚝he desired resp𝚘𝚗se 𝚘r 𝚝𝚘 fi𝚗e-𝚝𝚞𝚗e 𝚝he 𝚖𝚘de𝚕 f𝚘r y𝚘𝚞r spe𝚌ifi𝚌 𝚞se 𝚌𝚊se. Addi𝚝i𝚘𝚗𝚊𝚕𝚕y, y𝚘𝚞 𝚌𝚊𝚗 𝚊𝚕s𝚘 𝚝ry 𝚘𝚝her 𝚕𝚊𝚗g𝚞𝚊ge 𝚖𝚘de𝚕s 𝚘r 𝚊r𝚌hi𝚝e𝚌𝚝𝚞res 𝚝h𝚊𝚝 𝚖igh𝚝 be be𝚝𝚝er s𝚞i𝚝ed f𝚘r y𝚘𝚞r pr𝚘ble𝚖. The 𝚊𝚌𝚌𝚞r𝚊𝚌y 𝚊𝚗d 𝚚𝚞𝚊𝚕i𝚝y 𝚘f GPT-Zer𝚘 wi𝚕𝚕 depe𝚗d 𝚘𝚗 𝚝he spe𝚌ifi𝚌 𝚝𝚊sk 𝚊𝚗d ev𝚊𝚕𝚞𝚊𝚝i𝚘𝚗 𝚌ri𝚝eri𝚊, b𝚞𝚝 i𝚝 is expe𝚌𝚝ed 𝚝𝚘 be 𝚕𝚘wer 𝚌𝚘𝚖p𝚊red 𝚝𝚘 𝚝he 𝚕𝚊rger GPT 𝚖𝚘de𝚕s d𝚞e 𝚝𝚘 𝚝he 𝚕i𝚖i𝚝𝚊𝚝i𝚘𝚗s i𝚖p𝚘sed by i𝚝s s𝚖𝚊𝚕𝚕er size.
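Running the modified text through a short script shows what actually changed. The substituted characters visible in this sample include letters from Unicode’s Mathematical Alphanumeric Symbols block (for example, 𝚝 is MATHEMATICAL MONOSPACE SMALL T, U+1D69D). The helper below, `non_ascii_report` (a hypothetical name for illustration), counts non-ASCII characters by their Unicode name using the standard library’s `unicodedata.name`.

```python
import unicodedata
from collections import Counter


def non_ascii_report(text: str) -> Counter:
    """Count non-ASCII characters in `text`, keyed by Unicode character name."""
    return Counter(
        unicodedata.name(ch, f"U+{ord(ch):04X}")  # fall back to the code point if unnamed
        for ch in text
        if ord(ch) > 127
    )


# Opening words of the modified sample: "I𝚝 is 𝚗𝚘𝚝 p𝚘ssib𝚕e"
sample = "I\U0001D69D is \U0001D697\U0001D698\U0001D69D p\U0001D698ssib\U0001D695e"
report = non_ascii_report(sample)
for name, count in report.most_common():
    print(f"{count:3d}  {name}")
```

Pasting the full modified passage into `sample` gives a quick inventory of every substituted character, which is a reasonable first step before deciding how to normalize them.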
Arguments on both sides of whether there will technologically be an ‘AI arms race’:
The case for an arms race is simple: AI detection classifiers will constantly require training data from new models. For example, if GPT-4 is released, we will likely need GPT-4 data to train classifiers that detect GPT-4 text well, which requires a race-like effort to keep updating the detection model.
The case against an AI arms race is more complex, yet fascinating: most detection models, including GPTZero and those released after it, take advantage of the same properties of AI text that GPTZero uses, burstiness and perplexity. These phenomena in machine writing could be unique to current models, or they could be an innate property of all machine-generated writing; if the latter is true, there would be no AI detection vs. AI generation arms race.
Admittedly, the barrier to entry for training a classifier is reasonably low (e.g., a twenty-two-year-old college kid trained one over a break :)
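The post does not spell out how these signals are computed, so the following is only a minimal sketch under stated assumptions. It assumes per-token natural-log probabilities from some language model are already available, uses the standard definition of perplexity (the exponential of the mean negative log-probability), and takes "burstiness", as an illustrative assumption rather than GPTZero’s definition, to be the population standard deviation of per-sentence perplexities. Text whose sentences are uniformly predictable scores low; text that swings between predictable and surprising sentences scores high.

```python
import math
from statistics import pstdev


def perplexity(token_logprobs: list[float]) -> float:
    """Standard perplexity: exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(sentences: list[list[float]]) -> float:
    """Illustrative burstiness score (an assumption): spread of per-sentence perplexities."""
    return pstdev([perplexity(s) for s in sentences])


# Toy log-probabilities, made up for illustration (not real model output).
# Each inner list holds one sentence's per-token log-probabilities.
uniform_text = [[math.log(0.5)] * 5, [math.log(0.4)] * 5]  # perplexities 2 and 2.5
varied_text = [[math.log(0.5)] * 5, [math.log(0.1)] * 5]   # perplexities 2 and 10

print(burstiness(uniform_text))  # smaller spread
print(burstiness(varied_text))   # larger spread
```

Whether such statistics remain separable for future generators is exactly the open question the argument above raises.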
As a teacher, I no longer trust this software after reading this article. I suggest that other teachers not trust AI detection for the foreseeable future either.
It's sad this company censors the truth, and this case study has really opened my eyes. As a computer science teacher, I know once you post to the internet, it can never be deleted. That is why I have backed up this comment and post on the Wayback Machine, as well as archive.is and archive.today.
I heard that a popular workaround for AI detectors is taking an AI-generated text and running it through a spinbot (paraphrase software), and then it will pass as human. Is that true? What is GPTzero doing about that, or can anything be done about that?