ok so for my machine learning class project i decided to test some of the ai voice detection tools that are available right now. figured i'd share results here since there's not a lot of real-world testing data out there
setup: 20 audio clips total. 10 real human voice recordings (a mix of podcast clips, voice memos, and lecture recordings) and 10 generated using elevenlabs, play.ht, and bark
results by tool:
- tool A: 14/20 correct (70%). flagged 2 real recordings as ai, missed 4 ai clips
- tool B: 12/20 correct (60%). on the elevenlabs clips specifically it was actually worse than a coin flip
- tool C: 16/20 correct (80%). best of the three but still missed some obvious ones
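in case anyone wants to reproduce the scoring: each clip gets a ground-truth label plus the tool's verdict, and accuracy / false positives / false negatives fall out of a simple count. a minimal sketch, where the hardcoded pairs just mirror tool A's line above (2 real recordings flagged as ai, 4 ai clips missed), not a new dataset:

```python
def score(pairs):
    """pairs: list of (truth, verdict) tuples, each "real" or "ai"."""
    correct = sum(t == v for t, v in pairs)
    false_pos = sum(t == "real" and v == "ai" for t, v in pairs)   # real flagged as ai
    false_neg = sum(t == "ai" and v == "real" for t, v in pairs)   # ai clips missed
    return correct / len(pairs), false_pos, false_neg

# tool A's results reconstructed from the summary above
tool_a = [("real", "real")] * 8 + [("real", "ai")] * 2 \
       + [("ai", "ai")] * 6 + [("ai", "real")] * 4

acc, fp, fn = score(tool_a)
print(acc, fp, fn)  # 0.7 2 4
```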
the biggest pattern i noticed: the tools were decent at catching older tts voices (the robotic-sounding ones) but really struggled with emotional, expressive ai voices. the elevenlabs voice clone with emotion was the hardest for every tool
also, compression killed accuracy. when i ran the same clips through whatsapp compression first, every tool performed worse, which matters because that's how most voice clips are actually shared in the real world
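for anyone wanting to replicate the compression pass without a phone: whatsapp voice notes use opus at a low bitrate, so transcoding with ffmpeg is a rough stand-in. the exact bitrate and sample rate below are my assumptions, not whatsapp's actual pipeline:

```python
def voice_note_cmd(src, dst, bitrate="16k"):
    """build an ffmpeg command approximating messaging-app voice compression:
    mono, 16 kHz, low-bitrate opus. settings are assumptions, not whatsapp's."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1", "-ar", "16000",          # downmix to mono, 16 kHz
            "-c:a", "libopus", "-b:a", bitrate,  # opus at a voice-note-ish bitrate
            dst]

# run it with, e.g.:
#   subprocess.run(voice_note_cmd("clip.wav", "clip_compressed.ogg"), check=True)
```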
This is really useful data, thanks for sharing. the compression finding is especially important because it mirrors what we see with image detection: real-world conditions (compression, noise, format conversion) destroy the subtle signals that detection tools rely on
80% from the best tool is better than i expected, honestly. but for any kind of enforcement or legal use that's nowhere near reliable enough
Excellent methodology. The emotional-voice finding is consistent with the literature: tools trained on neutral TTS struggle with expressive synthesis. If you could test at different bitrates, I suspect there's a quality threshold below which detection becomes random.
this is why the detection approach to voice authenticity is probably doomed long term. if 80% is the ceiling with clean audio, and it drops from there under real-world conditions, you can't build reliable systems on top of that
we need provenance-based solutions for audio just like we need them for images and video: sign it at creation, verify the chain. detection as an afterthought will always lag behind generation
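to make "sign it at creation, verify the chain" concrete, here's a toy sketch using an HMAC over the raw audio bytes. real provenance schemes use public-key signatures plus an embedded manifest rather than a shared secret, so treat the key and byte strings here as purely illustrative:

```python
import hashlib
import hmac

# toy model: a capture device holds DEVICE_KEY and tags audio at record time;
# a verifier with the same key can check the clip wasn't swapped or edited.
DEVICE_KEY = b"example-device-secret"  # hypothetical key, for illustration only

def sign_clip(audio_bytes: bytes) -> str:
    return hmac.new(DEVICE_KEY, audio_bytes, hashlib.sha256).hexdigest()

def verify_clip(audio_bytes: bytes, tag: str) -> bool:
    return hmac.compare_digest(sign_clip(audio_bytes), tag)

tag = sign_clip(b"fake pcm samples")
print(verify_clip(b"fake pcm samples", tag))      # True
print(verify_clip(b"tampered pcm samples", tag))  # False
```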
@Marc_Delrieu great suggestion, i actually have the data for different bitrates but haven't analyzed it yet. my initial impression is that 128 kbps mp3 is roughly where accuracy starts tanking, but i'll do the proper analysis
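the analysis will basically be a sweep like this, looking for the highest bitrate where accuracy falls under some threshold. the accuracy values in the dict are made-up placeholders, NOT measured results:

```python
# placeholder numbers only; the real values come from re-scoring each tool per bitrate
per_bitrate = {
    "320k": 0.80,
    "192k": 0.80,
    "128k": 0.65,
    "64k":  0.55,
}

def first_bitrate_below(per_bitrate, threshold=0.6):
    """return the highest bitrate where accuracy drops under `threshold`
    (0.5 would be chance on a balanced real/ai set)."""
    for br, acc in per_bitrate.items():  # dict keeps insertion order, high -> low
        if acc < threshold:
            return br
    return None

print(first_bitrate_below(per_bitrate))  # "64k" with these placeholder numbers
```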
@HugoNomad yeah, the provenance approach makes more sense, but how do you implement it for phone calls or live audio? it's not like a photo where you can embed metadata at capture time