Did you know that 72 leading linguistics experts were fooled by AI-generated content?
A 2023 study by Matthew Kessler had experts from the world’s top linguistics journals examine four writing samples and determine which ones were written by AI. They were only able to identify the AI content 38.9% of the time, and not one of the 72 experts correctly identified all four.
When they were asked to explain the factors behind their decisions, Kessler also noticed that their reasoning was either inaccurate, inconsistent, or both.
This study tells us several things (and raises several concerns). Since AI generators are obviously upping their game, guidelines and security measures surrounding the use of AI for content creation need to be improved as well.
Now, one might think: “No, we’re good. Just run the content through Copyleaks or ZeroGPT!”
But if even bona fide human experts can’t sniff out artificially written text, it’s possible that AI detection programs won’t fare much better. After all, these detection tools work by analyzing and predicting patterns. Once you learn the pattern, as large language models are trained to do, you can figure out how to bypass it.
This is exactly what Undetectable.ai’s “Humanize” feature does.
So, what about bypassing AI detection? Undetectable.ai (and platforms like it) claim they can rewrite AI content so that it reads like a human wrote it. But is this really the case? And if so, how?
The short answer is yes, but let’s look at how it works. I spent a few hours with the tool to learn how it works and how it measures up against detectors.
How Reliable Are AI Detection Tools?
Honestly speaking? Not very.
They’re getting better, sure, but so is AI.
As I mentioned earlier, AI detectors rely on patterns. They use three components for this process: training, analysis, and feedback loops. Training depends on the data fed to the detectors. Feedback loops depend on end users. The analysis depends on patterns.
Programs can be built to address one or more of these components specifically and create human-like text. This is how Undetectable.ai works. It analyzes AI text and rewrites it so that it “has the qualities and markers of human written content.”
Here’s a quick step-by-step overview of Undetectable.ai’s Humanize tool behind the scenes:
- Step 1: Break down AI text into its components (syntax, grammar, sentence structure, jargon, etc.) for analysis.
- Step 2: Use advanced algorithms to compare components to an existing data set of human-written content.
- Step 3: Change factors like tone, style, and readability to match human-written content without losing the essence of the original text.
It’s obviously a little more complicated than that, but you get the picture. Tools like Undetectable.ai help AI-generated content pass AI detectors by modifying the text so that it’s more human-like.
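To make those three steps a bit more concrete, here’s a toy sketch in Python. To be clear, this is not how Undetectable.ai is actually implemented. The function names, the `HUMAN_PROFILE` number, and the `PLAIN_SWAPS` table are all assumptions I made up for illustration; a real tool would use trained models rather than a lookup table.

```python
import re

# Hypothetical stand-in for "a data set of human-written content".
# The number is invented for this sketch.
HUMAN_PROFILE = {"avg_sentence_words": 15.0}

# Hypothetical formal -> plain swaps, for illustration only.
PLAIN_SWAPS = {
    "which are responsible for": "which create",
    "accumulate": "build up",
    "specific variety of apple": "apple type",
}


def analyze(text):
    """Step 1: break the text into sentences and measure a simple feature."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    counts = [len(s.split()) for s in sentences]
    avg = sum(counts) / len(counts) if counts else 0.0
    return {"sentences": sentences, "avg_sentence_words": avg}


def compare(features, profile=HUMAN_PROFILE):
    """Step 2: measure how far the text sits from the reference profile."""
    return features["avg_sentence_words"] - profile["avg_sentence_words"]


def rewrite(text, swaps=PLAIN_SWAPS):
    """Step 3: swap formal phrases for plainer ones, keeping the meaning."""
    for formal, plain in swaps.items():
        text = text.replace(formal, plain)
    return text


def humanize(text):
    """Run the three steps; only rewrite when sentences run long."""
    gap = compare(analyze(text))
    return rewrite(text) if gap > 0 else text


sample = (
    "Apples contain pigments which are responsible for the red color of the "
    "skin, and these pigments accumulate as the specific variety of apple "
    "ripens on the tree in the sun."
)
print(humanize(sample))
```

The point of the sketch is the shape of the process (measure, compare, adjust), not the specific rules.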
What is Human-Like Text?
It’s difficult to explain how text can be “human-like.” I mean, if we refer to Kessler’s study, people can have differing opinions on what seems genuine and what seems AI-generated. What seems robotic and stilted to me may read normal or conversational to you.
So to give us a frame of reference, let’s consider three factors: perplexity, burstiness, and redundancy and coherence. A detector would use these three factors to determine if a piece of content is written by man or machine.
Perplexity
Perplexity measures how hard a sequence of words is to predict from the preceding context. If a machine can easily predict what the next word or words will be in a sentence, perplexity is low, which indicates that the author has a very textbook grasp of the language.
They use words correctly, yes, but the execution is simple and – as stated – predictable.
Because human minds are so complex, different people have different thought processes, speaking patterns, and writing styles. This means that human-written content would likely be more complex and harder to predict than AI text.
- High Perplexity Score: Human
- Low Perplexity Score: AI
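If you’d like to see the idea in numbers, here’s a minimal sketch. It assumes a tiny bigram model with add-one smoothing as a stand-in for the much larger language models a real detector would use; the training sentence and the function names are mine, purely for illustration.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count single words and word pairs in a (tiny) training corpus."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, set(tokens)


def perplexity(tokens, model):
    """Perplexity of a token sequence under an add-one smoothed bigram model."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    unigrams, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot for words never seen in training
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + v)
        log_prob += math.log(p)
    # Exponentiated average negative log-probability per predicted word.
    return math.exp(-log_prob / (len(tokens) - 1))


corpus = "the cat sat on the mat the dog sat on the rug".split()
model = train_bigram(corpus)

predictable = "the cat sat on the rug".split()
surprising = "rug the on sat cat the".split()

print(perplexity(predictable, model))
print(perplexity(surprising, model))
```

The word order the model has seen before scores lower; the scrambled version scores higher. A detector applies the same logic with a far bigger model.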
Burstiness
Burstiness is a fun one. It’s used to determine how varied sentences and paragraphs are in terms of length, structure, and flow. Text with a low burstiness score is more uniform than text with high burstiness. Low burstiness content often feels stiff and stilted.
Take the following paragraphs as examples.
Tina drives to work. She works at a bank. The bank is 8 miles away. So she takes her car. Tina’s work starts at 9. She leaves at 8. She is always on time.
Alternatively:
Tina’s a bank teller. The branch she works at is a few miles from where she lives. So she takes her car. Since her work starts at 9, she makes sure she’s out of the house by 8. So far, she’s never been late!
Both paragraphs say the same thing. But the first paragraph feels like a metronome. Each sentence is between four and six words long. It’s so uniform, it’s monotonous.
The second paragraph, on the other hand, is a little easier on the eyes and ears. It’s a mix of short phrases followed by longer sentences. There’s time to pause and chances to flow along. It feels a lot more natural – like how people actually talk to each other.
- High Burstiness Score: Human
- Low Burstiness Score: AI
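Here’s a rough way to put a number on that, using the two Tina paragraphs. I’m assuming burstiness can be approximated by the coefficient of variation of sentence length (standard deviation divided by mean). That’s my simplification, not an official definition, and detectors may well measure it differently.

```python
import re
import statistics


def sentence_lengths(text):
    """Word count of each sentence, splitting on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence length (std dev / mean)."""
    lengths = sentence_lengths(text)
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = (
    "Tina drives to work. She works at a bank. The bank is 8 miles away. "
    "So she takes her car. Tina’s work starts at 9. She leaves at 8. "
    "She is always on time."
)
varied = (
    "Tina’s a bank teller. The branch she works at is a few miles from "
    "where she lives. So she takes her car. Since her work starts at 9, "
    "she makes sure she’s out of the house by 8. So far, she’s never been late!"
)

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), burstiness(varied))
```

The first paragraph’s sentences all land between four and six words, so its score comes out low. The second swings from 4 words to 16, so it scores noticeably higher.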
Redundancy & Coherence
You can usually tell that something’s written by AI if it feels redundant.
If you’ve ever had to stretch your closing paragraph out just to reach the 500-word requirement on an essay, then you probably know what I’m talking about.
AI writers like ChatGPT are notorious for repeating phrases and concepts for the sake of stretching an argument. Here’s a juicy example:
Notice how it tells us, in almost every sentence, how the apple’s color indicates its quality. Each sentence is more than ten words, and all three essentially say the same thing: an apple’s color can be used to tell its sweetness, variety, and nutritional value. The color can also indicate the apple’s growing conditions.
I’ll admit, this one’s on me: I asked ChatGPT to write 200 words on a very simple topic. But it’s a good example of how AI writers write: repeat the same concept several times, just present it in different ways.
- Low Redundancy, High Coherence Score: Human
- High Redundancy, Low Coherence Score: AI
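One simple way to approximate redundancy is to check how many words each pair of sentences shares. The sketch below assumes average pairwise Jaccard overlap of content words as the measure. The stopword list and the two sample texts are made up for illustration (they are not the ChatGPT output discussed above), and the sketch doesn’t attempt to measure coherence at all.

```python
import re
from itertools import combinations

# Small, hand-picked stopword list; an assumption for this sketch.
STOPWORDS = {
    "a", "an", "the", "of", "and", "is", "its", "it", "to",
    "in", "can", "be", "by", "you", "that",
}


def content_words(sentence):
    """Lowercased words in a sentence, minus stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def redundancy(text):
    """Average Jaccard overlap of content words across sentence pairs."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    sets = [content_words(s) for s in sentences]
    overlaps = [
        len(a & b) / len(a | b)
        for a, b in combinations(sets, 2)
        if a | b
    ]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


repetitive = (
    "The color of an apple indicates its quality. "
    "You can tell the quality of an apple by its color. "
    "Quality is indicated by the color of the apple."
)
distinct = (
    "Tina drives to work. The bank is a few miles away. "
    "She has never been late."
)

print(redundancy(repetitive))
print(redundancy(distinct))
```

Sentences that keep circling the same few words push the score up; sentences that each add something new keep it near zero.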
Turning Text Human
With the knowledge of the three indicators in mind, let’s see Undetectable.ai’s Humanizer in action.
The original AI text:
Does it ping as AI?
Undetectable.ai says yes. And so do Copyleaks, Sapling, and ZeroGPT.
Now, to Humanize via Undetectable.ai. For the sake of starting strong, I set it to the More Human option, which is great for aggressive AI detectors.
The verdict?
Sapling says 98.5% Human. ZeroGPT says 100% Human. Only Copyleaks still sees it as 100% AI.
And by comparing both pieces of text using the three factors (perplexity, burstiness, redundancy), we can see significant differences. Here are some of my personal observations:
- Redundancy & Coherence: Even with just the first two sentences of each variation, we can see that the humanized version uses simple words that are more likely to be heard in general everyday conversations. Some examples:
  - “Because” versus “due to the presence of”
  - “Which create” versus “which are responsible for”
  - “Build up” versus “accumulate”
  - “Apple type” versus “specific variety of apple”
- Burstiness: The first two sentences of the original AI content are made up of 32 words each. The third sentence drops to a more reasonable 20. Meanwhile, the first sentence of the humanized version has 22 words and the second has 27. The third sentence gives us time to rest with just 14.
- Miscellaneous: ChatGPT’s content uses perfect grammar and punctuation. The sentences, though long, are properly structured. The Humanized version, on the other hand, has run-on sentences, a bit of inconsistent punctuation, and even some missing articles. In other words, the Humanized version is clearly the less polished of the two.
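For the curious, here’s the arithmetic behind the burstiness observation, using only the word counts quoted in that bullet. The `summarize` helper is my own, and treating the coefficient of variation as a proxy for burstiness is an assumption. Three sentences is also a very small sample, so take the result as a rough indication only.

```python
import statistics

# Word counts for the first three sentences, as reported above.
original = [32, 32, 20]
humanized = [22, 27, 14]


def summarize(lengths):
    """Mean, population standard deviation, and their ratio (CV)."""
    mean = statistics.mean(lengths)
    spread = statistics.pstdev(lengths)
    return {"mean": mean, "stdev": spread, "cv": spread / mean}


print("original ", summarize(original))
print("humanized", summarize(humanized))
```

On these counts, the average sentence drops from 28 words to 21, and the variation relative to the average goes up slightly in the humanized version. The clearest change is that the sentences got shorter.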
Here are a few more examples of Humanized text. If you read them with perplexity, redundancy, and burstiness in mind, you can say that Undetectable.ai has the right idea.
Original AI Text:
Humanized:
Original AI Text:
Humanized:
Using AI Programs to Humanize AI
So, can Undetectable.ai make AI content more human-like?
If we’re going to base our answer on the three metrics that characterize human-written content, I would have to say that, yes, the technology is there.
However, the technology isn’t perfect.
Despite its best efforts, Undetectable.ai can’t always fool Copyleaks. Its Humanized content also reads a bit awkwardly due to the inconsistent grammar and word usage.
But we can see that the platform – and other similar AI detection bypassing programs – knows what to do (at least on a technical level).
It knows to use simple, conversational language for higher perplexity.
It tries to vary the sentence lengths for better rhythm and burstiness.
It also keeps the sentences simple and to the point to avoid redundancy.
So, while they’re not perfect, humanizing programs are definitely off to a strong start. I would recommend trying out Undetectable.ai if you want to make your text as human-like as possible.