We pitted ChatGPT against tools for detecting AI-written text and the results are troubling
from LawNews – Issue 5
Armin Alimardani & Emma J Jane
As the “chatbot wars” rage in Silicon Valley, the growing proliferation of artificial intelligence (AI) tools specifically designed to generate human-like text has left many baffled. Educators in particular are scrambling to adjust to the availability of software that can produce a moderately competent essay on any topic at a moment’s notice. Should we go back to pen-and-paper assessments? Increase exam supervision? Ban the use of AI entirely?
All these and more have been proposed. However, none of these less-than-ideal measures would be needed if educators could reliably distinguish AI-generated from human-written text.
We dug into several proposed methods and tools for recognising AI-generated text. None is foolproof, all are vulnerable to workarounds, and it’s unlikely they will ever be as reliable as we’d like.
Why can’t the world’s leading AI companies reliably distinguish the products of their own machines from the work of humans? The reason is ridiculously simple: the corporate mission in today’s high-stakes AI arms race is to train natural language processing (NLP) AIs to produce outputs that are as similar to human writing as possible.
Indeed, public demands for an easy means to spot such AIs in the wild might seem paradoxical, like we’re missing the whole point of the program.
A mediocre effort
OpenAI – the creator of ChatGPT – launched a “classifier for indicating AI-written text” in late January.
The classifier was trained on external AIs as well as the company’s own text-generating engines. In theory, this means it should be able to flag essays generated by BLOOM AI or similar, not just those created by ChatGPT.
We give this classifier a C– grade at best. OpenAI admits it accurately identifies only 26% of AI-generated text (true positives), while incorrectly labelling human prose as AI-generated 9% of the time (false positives).
OpenAI has not shared its research on the rate at which AI-generated text is incorrectly labelled as human-generated text (false negative).
A promising contender
A more promising contender is a classifier created by a Princeton University student during his Christmas break.
Edward Tian, a computer science major minoring in journalism, released the first version of GPTZero in January.
This app identifies AI authorship based on two factors: perplexity and burstiness. Perplexity measures how complex a text is, while burstiness compares the variation between sentences. The lower the values for these two factors, the more likely it is that a text was produced by an AI.
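For readers curious about the mechanics, here is a minimal Python sketch of this kind of scoring. It is not Tian’s actual code – GPTZero’s internals are not public – and it makes two assumptions for illustration: that GPT-2 (via the Hugging Face transformers library) can stand in for the underlying language model, and that burstiness can be approximated as the spread of per-sentence perplexities.

```python
# A rough GPTZero-style scorer, for illustration only.
# Assumption: GPT-2 stands in for whatever model GPTZero uses internally.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity under GPT-2: exp of the mean per-token loss.
    Lower values mean the model finds the text less surprising."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy
    return torch.exp(loss).item()

def burstiness(sentences: list[str]) -> float:
    """Approximated here as the spread (standard deviation) of
    sentence-level perplexities; human writing tends to vary more."""
    scores = torch.tensor([perplexity(s) for s in sentences])
    return scores.std().item()

essay = [
    "Justice is a foundational principle of any society.",
    "It requires that individuals be treated fairly under the law.",
    "Without it, public trust in institutions quickly erodes.",
]
print("average perplexity:", sum(map(perplexity, essay)) / len(essay))
print("burstiness:", burstiness(essay))
```

A text whose sentences all score low and similar on both measures would, on this logic, be flagged as likely AI-written.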
We pitted this modest David against the goliath of ChatGPT.
First, we prompted ChatGPT to generate a short essay about justice. Next, we copied the essay – unchanged – into GPTZero. Tian’s tool correctly determined that the text was likely to have been written entirely by an AI, because its average perplexity and burstiness scores were very low.
Fooling the classifiers
An easy way to mislead AI classifiers is simply to replace a few words with synonyms. Websites offering tools that paraphrase AI-generated text for this purpose are already cropping up all over the internet.
Many of these tools display their own set of AI giveaways, such as peppering human prose with “tortured phrases” (for example, using “counterfeit consciousness” instead of “AI”).
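The synonym trick itself is simple enough to sketch in a few lines of Python. This toy version is not how GPT-Minus1 actually works – its method has not been published – and it leans on NLTK’s WordNet purely for illustration, swapping a random fraction of words (14% by default, roughly the proportion altered in our test below) for dictionary synonyms.

```python
# A toy synonym "scrambler", not GPT-Minus1's actual method.
# Crude swaps like these often produce exactly the "tortured phrases"
# described above, but they can still shift detector scores.
import random
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def scramble(text: str, fraction: float = 0.14) -> str:
    """Replace roughly `fraction` of words with WordNet synonyms."""
    words = text.split()
    for i in random.sample(range(len(words)), k=int(len(words) * fraction)):
        synonyms = {lemma.name().replace("_", " ")
                    for synset in wordnet.synsets(words[i])
                    for lemma in synset.lemmas()} - {words[i]}
        if synonyms:
            words[i] = random.choice(sorted(synonyms))
    return " ".join(words)

print(scramble("Justice requires that individuals be treated fairly under the law."))
```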
To test GPTZero further, we copied ChatGPT’s justice essay into GPT-Minus1 – a website offering to “scramble” ChatGPT text with synonyms. It altered about 14% of the text.
We then copied the GPT-Minus1 version of the justice essay back into GPTZero. Its verdict?
“Your text is most likely human written but there are some sentences with low perplexities.”
It highlighted just one sentence it thought had a high chance of having been written by an AI, along with a report on the essay’s overall perplexity and burstiness scores, which were much higher.
Tools such as Tian’s show great promise but they aren’t perfect and are also vulnerable to workarounds. For instance, a recently released YouTube tutorial explains how to prompt ChatGPT to produce text with high degrees of – you guessed it – perplexity and burstiness.
Watermarking
Another proposal is for AI-written text to contain a “watermark” that is invisible to human readers but can be picked up by software.
Natural language models work on a word-by-word basis. They select which word to generate based on statistical probability.
However, they do not always choose words with the highest probability of appearing together. Instead, from a list of probable words, they select one randomly (though words with higher probability scores are more likely to be selected).
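That selection step is easy to picture in code. In the sketch below, the candidate words and their probabilities are invented purely for illustration:

```python
# Word-by-word generation: sample the next word by probability,
# rather than always taking the single most likely candidate.
import random

def pick_next_word(candidates: dict[str, float]) -> str:
    words = list(candidates)
    probabilities = list(candidates.values())
    return random.choices(words, weights=probabilities, k=1)[0]

# Hypothetical probabilities for the word after "The court found the":
next_word = pick_next_word({
    "defendant": 0.55,
    "evidence": 0.25,
    "claim": 0.15,
    "zebra": 0.05,  # improbable words still get a small chance
})
print(next_word)
```

A watermarking scheme could quietly bias this random choice towards a secret subset of words, leaving a statistical fingerprint that detection software can later test for.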