arXiv submission date: 2026-02-18
📄 Abstract - How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection

Since text-to-speech (TTS) systems typically do not produce waveforms directly, recent spoof detection studies use waveforms resynthesized by vocoders and neural audio codecs to simulate an attacker. Unlike vocoders, which are designed specifically for speech synthesis, neural audio codecs were originally developed to compress audio for storage and transmission. However, their ability to discretize speech has also sparked interest in language-modeling-based speech synthesis. Because of this dual functionality, codec-resynthesized data may be labeled as either bonafide or spoof. So far, very little research has addressed this issue. In this study, we present a challenging extension of the ASVspoof 5 dataset constructed for this purpose. We examine how different labeling choices affect detection performance and provide insights into labeling strategies.
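To make the labeling question concrete, here is a minimal Python sketch, assuming a hypothetical corpus in which each utterance is tagged with its origin (`original`, `vocoder`, or `codec`). The names `Utterance`, `label_utterance`, and `label_counts` are illustrative only and do not come from the paper or from any ASVspoof 5 tooling. Original recordings are always bonafide and vocoder output is always spoof; only the treatment of codec-resynthesized audio changes between the two strategies.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Literal

# Hypothetical origin categories; not the paper's actual dataset schema.
Source = Literal["original", "vocoder", "codec"]
Label = Literal["bonafide", "spoof"]


@dataclass(frozen=True)
class Utterance:
    utt_id: str
    source: Source


def label_utterance(utt: Utterance, codec_as: Label) -> Label:
    """Assign a binary label; `codec_as` decides how codec output is treated."""
    if utt.source == "original":
        return "bonafide"
    if utt.source == "vocoder":
        # Vocoders exist to synthesize speech, so their output simulates an attack.
        return "spoof"
    # Codec output is ambiguous: compressed genuine speech (bonafide)
    # or a building block of LM-based synthesis (spoof).
    return codec_as


def label_counts(utts: Iterable[Utterance], codec_as: Label) -> Counter:
    """Count how many utterances receive each label under one strategy."""
    return Counter(label_utterance(u, codec_as) for u in utts)


utts = [
    Utterance("u1", "original"),
    Utterance("u2", "vocoder"),
    Utterance("u3", "codec"),
    Utterance("u4", "codec"),
]

for strategy in ("bonafide", "spoof"):
    print(strategy, dict(label_counts(utts, strategy)))
```

Switching `codec_as` flips the labels of the same codec-resynthesized audio, which illustrates why the labeling choice can change what a detector is trained to treat as spoofed.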

Top-level tags: audio, aigc, model evaluation
Detailed tags: audio deepfake detection, neural audio codec, asvspoof, data labeling, speech synthesis

How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection


1️⃣ One-sentence summary

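This paper examines the dual role of neural audio codecs in audio deepfake detection and, using a challenging extension of the ASVspoof 5 dataset, studies how different labeling strategies for codec-resynthesized audio affect detection performance.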

Source: arXiv 2602.16343