The Watermark Shortcut: How Provenance Marking Sabotages Audio Deepfake Detection

📄 Abstract

Provenance watermarking is increasingly treated as a safeguard for synthetic speech, whether built directly into speech-generation models such as Chatterbox, provided through dedicated techniques such as AudioSeal, or deployed by commercial platforms such as ElevenLabs. We identify a previously uncharacterized liability: when synthetic speech is watermarked and human speech is not, detectors trained on that data latch onto the watermark as a spurious "watermark => fake" shortcut. This single feature yields three coupled failures: generalization degradation (performance deteriorates on unseen data), strip-to-evade (a watermarked fake escapes detection once its watermark is removed), and mark-to-frame (watermarking a real voice causes it to be flagged as fake). In a controlled white-box experiment, a watermark-trained detector exhibits all three failures; for example, mark-to-frame raises the Equal Error Rate from 16% to 75%. In a black-box test of a commercial API, we show that adding a watermark to real speech causes it to be classified as fake. The shortcut is fixable, however: retraining with the watermark applied to both classes decorrelates it from the label and restores clean behavior. We release the experiment data as a paired clean-versus-watermarked corpus (WASP).


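1️⃣ One-Sentence Summary

This paper finds that when synthetic speech is watermarked and human speech is not, AI detection models wrongly equate "watermarked" with "fake", so they struggle to tell real from fake, are easy to evade, and can even misclassify genuine human speech as fake; applying the watermark to real speech as well fixes the problem.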
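
The shortcut and its fix can be illustrated with a toy model. The sketch below is not the paper's detector or data: it assumes a two-feature logistic regression in which `cue` is a weak genuine signal and `mark` is a binary watermark flag, and every name, distribution, and hyperparameter is hypothetical. For the fix, it marks half of each class so that the flag carries no label information, which is one assumed way of putting the watermark on both classes.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(rng, n_per_class, p_mark_fake, p_mark_real):
    """Return (cue, mark, label) triples; label 1 = fake, 0 = real."""
    data = []
    for label, p_mark in ((1, p_mark_fake), (0, p_mark_real)):
        mean = 0.5 if label == 1 else -0.5  # weak genuine cue (assumed)
        for _ in range(n_per_class):
            cue = rng.gauss(mean, 1.0)
            mark = 1.0 if rng.random() < p_mark else 0.0
            data.append((cue, mark, label))
    return data


def train(data, epochs=300, lr=0.5):
    """Full-batch gradient descent for logistic regression."""
    w_cue = w_mark = bias = 0.0
    n = len(data)
    for _ in range(epochs):
        g_cue = g_mark = g_bias = 0.0
        for cue, mark, label in data:
            err = sigmoid(w_cue * cue + w_mark * mark + bias) - label
            g_cue += err * cue
            g_mark += err * mark
            g_bias += err
        w_cue -= lr * g_cue / n
        w_mark -= lr * g_mark / n
        bias -= lr * g_bias / n
    return w_cue, w_mark, bias


def flagged_fake(model, cues, mark):
    """Fraction of samples scored as fake when all carry the given mark."""
    w_cue, w_mark, bias = model
    hits = sum(1 for c in cues if w_cue * c + w_mark * mark + bias > 0)
    return hits / len(cues)


rng = random.Random(0)
# Shortcut setting: every fake is watermarked, no real sample is.
shortcut = train(make_data(rng, 500, 1.0, 0.0))
# Assumed fix: watermark half of each class, so the flag is uninformative.
fixed = train(make_data(rng, 500, 0.5, 0.5))

real_cues = [rng.gauss(-0.5, 1.0) for _ in range(1000)]
fake_cues = [rng.gauss(0.5, 1.0) for _ in range(1000)]

for name, model in (("shortcut", shortcut), ("fixed", fixed)):
    print(name,
          "| real flagged after marking:", flagged_fake(model, real_cues, 1.0),
          "| fake flagged after stripping:", flagged_fake(model, fake_cues, 0.0))
```

In this toy setting the shortcut model puts most of its weight on `mark`, so marking real samples gets them flagged (mark-to-frame) and stripping the mark from fakes lets them pass (strip-to-evade). The model trained on decorrelated marks leaves the `mark` weight near zero and falls back on the genuine cue.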
