A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection
1️⃣ One-sentence summary
This paper introduces Spoof-SUPERB, a benchmark that systematically evaluates 20 self-supervised speech models on audio deepfake detection. It finds that large-scale discriminative models (e.g., XLS-R) perform best and are the most robust to acoustic perturbations, offering practical guidance for choosing reliable speech-security technology.
Self-supervised learning (SSL) has transformed speech processing, with benchmarks such as SUPERB establishing fair comparisons across diverse downstream tasks. Despite its security-critical importance, audio deepfake detection has remained outside these efforts. In this work, we introduce Spoof-SUPERB, a benchmark for audio deepfake detection that systematically evaluates 20 SSL models spanning generative, discriminative, and spectrogram-based architectures on multiple in-domain and out-of-domain datasets. Our results reveal that large-scale discriminative models such as XLS-R, UniSpeech-SAT, and WavLM Large consistently outperform other models, benefiting from multilingual pretraining, speaker-aware objectives, and model scale. We further analyze the robustness of these models under acoustic degradations, showing that generative approaches degrade sharply while discriminative models remain resilient. This benchmark establishes a reproducible baseline and provides practical insights into which SSL representations are most reliable for securing speech systems against audio deepfakes.
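The abstract does not name the evaluation metric, but spoofing-detection benchmarks conventionally report equal error rate (EER): the operating point where the rate of spoofs accepted equals the rate of bona fide utterances rejected. A minimal sketch of computing EER from per-utterance detector scores (higher score = more likely bona fide); the function name and score convention are illustrative assumptions, not taken from the paper:

```python
def compute_eer(bona_scores, spoof_scores):
    """Equal error rate: the threshold at which the false-acceptance rate
    (spoofs scoring >= threshold) equals the false-rejection rate
    (bona fide utterances scoring < threshold).

    Assumes higher scores indicate bona fide speech.
    """
    n_bona, n_spoof = len(bona_scores), len(spoof_scores)
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bona_scores) | set(spoof_scores))
    best_gap, eer = 1.0, 1.0
    for t in thresholds:
        far = sum(s >= t for s in spoof_scores) / n_spoof  # spoofs accepted
        frr = sum(s < t for s in bona_scores) / n_bona     # bona fide rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

For example, perfectly separated scores give an EER of 0, while heavily overlapping score distributions push the EER toward 0.5 (chance level for a two-class decision).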
Source: arXiv:2603.01482