arXiv submission date: 2026-06-22
📄 Abstract - The Watermark Shortcut: How Provenance Marking Sabotages Audio Deepfake Detection

Provenance watermarking is increasingly treated as a safeguard for synthetic speech, whether built directly into speech-generation models such as Chatterbox, provided through dedicated techniques such as AudioSeal, or deployed by commercial platforms such as ElevenLabs. We identify a previously uncharacterized liability: when synthetic speech is watermarked and human speech is not, detectors trained alongside latch onto the watermark as a spurious "watermark => fake" shortcut. This single feature yields three coupled failures: generalization degradation (model performance deteriorates on unseen data), strip-to-evade (a watermarked fake escapes once unwatermarked), and mark-to-frame (watermarking a real voice flags it as fake). In a controlled white-box experiment, a watermark-trained detector shows all three (for example, mark-to-frame lifts Equal Error Rate from 16% to 75%). In a black-box test of a commercial API, we show that adding a watermark to real speech disguises it as fake. However, this shortcut is fixable: retraining with the watermark on both classes decorrelates it and restores clean behavior. We release experiment data as a paired clean-versus-watermarked corpus (WASP).
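The shortcut mechanism described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's experimental setup: the "audio" is reduced to two hypothetical features (a weak genuine cue and a binary watermark flag), the detector is a plain logistic regression, and all names (`make_data`, `train_logreg`, `fake_rate`) are invented for illustration.

```python
import numpy as np

def make_data(n, p_wm_fake, p_wm_real, rng):
    """Toy two-feature 'audio': a weak genuine cue plus a watermark flag (assumed features)."""
    y = np.repeat([0, 1], n // 2)                        # 0 = real, 1 = fake
    cue = rng.normal(np.where(y == 1, 0.5, -0.5), 1.0)   # overlapping classes, weak signal
    p_wm = np.where(y == 1, p_wm_fake, p_wm_real)        # per-class watermark probability
    wm = (rng.random(n) < p_wm).astype(float)            # 1.0 if the clip carries a watermark
    return np.column_stack([cue, wm]), y

def train_logreg(X, y, steps=2000, lr=0.5):
    """Logistic regression fitted by plain gradient descent from a zero start."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def fake_rate(model, X):
    """Fraction of clips the detector labels as fake."""
    w, b = model
    return float(np.mean(X @ w + b > 0))

rng = np.random.default_rng(0)

# Shortcut setup: every fake is watermarked, no real clip is.
shortcut = train_logreg(*make_data(4000, 1.0, 0.0, rng))
# Decorrelated setup: the watermark appears on both classes.
balanced = train_logreg(*make_data(4000, 1.0, 1.0, rng))

# Adversarial test set: fakes with the watermark stripped, reals with one added.
X_test, y_test = make_data(4000, 0.0, 1.0, rng)
stripped_fakes, marked_reals = X_test[y_test == 1], X_test[y_test == 0]

results = {
    name: {
        "stripped fakes flagged": fake_rate(model, stripped_fakes),
        "marked reals flagged": fake_rate(model, marked_reals),
    }
    for name, model in [("shortcut", shortcut), ("balanced", balanced)]
}
for name, row in results.items():
    print(name, row)
```

In this toy setting the shortcut-trained detector should miss most stripped fakes (strip-to-evade) and flag most watermarked real clips (mark-to-frame), while the detector trained with the watermark on both classes falls back on the genuine cue. The numbers it prints say nothing about the magnitudes reported in the paper.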

Top-level tags: audio machine learning model evaluation
Detailed tags: deepfake detection watermarking spurious correlation synthetic speech generalization

The Watermark Shortcut: How Provenance Marking Sabotages Audio Deepfake Detection


1️⃣ One-sentence summary

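This paper finds that when synthetic speech is watermarked and human speech is not, AI detection models wrongly equate "watermarked" with "fake". As a result they generalize poorly, are easy to evade, and can even flag genuine human speech as fake. Applying the watermark to real speech as well during training fixes the problem.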

Source: arXiv: 2606.23335