Last week, some voters in New Hampshire received an AI-generated robocall impersonating President Biden, telling them not to vote in the state’s primary election. It’s not clear who was responsible for the call, but two separate teams of audio experts tell WIRED it was likely created using technology from voice-cloning startup ElevenLabs.
ElevenLabs markets its AI tools for uses like audiobooks and video games; it recently achieved “unicorn” status by raising $80 million at a $1.1 billion valuation in a new funding round co-led by venture firm Andreessen Horowitz. Anyone can sign up for the company’s paid service and clone a voice from an audio sample. The company’s safety policy says it is best to obtain someone’s permission before cloning their voice, but that permissionless cloning can be OK for a variety of non-commercial purposes, including “political speech contributing to public debates.” ElevenLabs did not respond to multiple requests for comment.
Pindrop, a security company that develops tools to identify synthetic audio, claimed in a blog post on Thursday that its analysis of audio from the call pointed to ElevenLabs’ technology or a “system using similar components.” The Pindrop research team checked patterns in the audio clip against more than 120 different voice synthesis engines looking for a match, but wasn’t expecting to find one because identifying the provenance of AI-generated audio can be difficult. The results were surprisingly clear, says Pindrop CEO Vijay Balasubramaniyan. “It came back well north of 99 percent that it was ElevenLabs,” he says.
The Pindrop team worked on a 39-second clip the company obtained of one of the AI-generated robocalls. To verify its results, it also analyzed audio samples known to have been created with ElevenLabs’ technology, as well as samples made with another voice synthesis tool, as a check on its methodology.
ElevenLabs offers its own AI speech detector on its website that it says can tell whether an audio clip was created using the company’s technology. When Pindrop ran its sample of the suspect robocall through that system, it came back as 84 percent likely to be generated using ElevenLabs tools. WIRED independently got the same result when checking Pindrop’s audio sample with the ElevenLabs detector.
Hany Farid, a digital forensics specialist at the UC Berkeley School of Information, was initially skeptical of claims that the Biden robocall came from ElevenLabs. “When you hear the audio from a cloned voice from ElevenLabs, it’s really good,” he says. “The version of the Biden call that I heard was not particularly good, but the cadence was really funky. It just didn’t sound of the quality that I would have expected from ElevenLabs.”
But when Farid had his team at Berkeley conduct its own independent analysis of the audio sample obtained by Pindrop, it reached the same conclusion. “Our model says with high confidence that it is AI-generated and likely to be ElevenLabs,” he claims.
This is not the first time that researchers have suspected ElevenLabs tools were used for political propaganda. Last September, NewsGuard, a company that tracks online misinformation, claimed that TikTok accounts spreading conspiracy theories with AI-generated voices, including a clone of Barack Obama’s voice, had used ElevenLabs’ technology. “Over 99 percent of users on our platform are creating interesting, innovative, useful content,” ElevenLabs said in an emailed statement to The New York Times at the time, “but we recognize that there are instances of misuse, and we’ve been continually developing and releasing safeguards to curb them.”