Supervised Post-training of Speech Foundation Models for Robust Adaptation in Speech Deepfake Detection

📄 Abstract

Large speech foundation models have shown strong potential for speech deepfake detection, but direct fine-tuning is limited by a mismatch between self-supervised pre-training objectives and spoof-specific artifacts. To address this, we propose a mix-frame post-training strategy that creates localized, spoof-oriented perturbations and uses frame-level supervision to encourage the SSL model to learn the local inconsistencies that are critical for robust spoof detection. On ASVspoof5, we achieve a state-of-the-art EER of 4.50% for a single model without data augmentation. On ASVspoof2021 LA/DF, the method further achieves an absolute EER gap of only 0.16% between LA and DF, indicating strong and balanced robustness across distinct distortion conditions. These results show that supervised post-training provides an effective and practical way to adapt speech foundation models for robust deepfake detection.
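The abstract does not spell out how the mix-frame perturbation is constructed. As a rough illustration only, the sketch below assumes one plausible reading: some frames of a bona fide waveform are swapped for the time-aligned frames of a spoofed waveform, and each frame receives a label that could serve as a frame-level training target. The function name `mix_frames`, the parameters `frame_len` and `spoof_ratio`, and the random frame selection are all hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def mix_frames(
    bona_fide: Sequence[float],
    spoof: Sequence[float],
    frame_len: int,
    spoof_ratio: float,
    rng: random.Random,
) -> Tuple[List[float], List[int]]:
    """Hypothetical mix-frame sketch (an assumption, not the paper's method).

    Swaps a random subset of frames in a bona fide waveform for the
    time-aligned frames of a spoofed waveform. Returns the mixed waveform
    and one label per frame (0 = bona fide frame, 1 = spoofed frame).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    if not 0.0 <= spoof_ratio <= 1.0:
        raise ValueError("spoof_ratio must be in [0, 1]")

    # Assumption: both signals are truncated to a whole number of frames.
    n_frames = min(len(bona_fide), len(spoof)) // frame_len
    n_spoof = round(spoof_ratio * n_frames)
    spoof_frames = set(rng.sample(range(n_frames), n_spoof))

    mixed: List[float] = []
    labels: List[int] = []
    for i in range(n_frames):
        start, end = i * frame_len, (i + 1) * frame_len
        source = spoof if i in spoof_frames else bona_fide
        mixed.extend(source[start:end])
        labels.append(1 if i in spoof_frames else 0)
    return mixed, labels


# Toy usage: constant signals make the swapped frames easy to see.
toy_mixed, toy_labels = mix_frames(
    bona_fide=[0.0] * 8,
    spoof=[1.0] * 8,
    frame_len=2,
    spoof_ratio=0.5,
    rng=random.Random(0),
)
```

In a setup like this, the per-frame labels would supervise a frame-level classification head on top of the SSL encoder, which is one way to push the model toward the local inconsistencies the abstract mentions. The real frame length, mixing policy, and loss are left unspecified here because the abstract does not state them.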


1️⃣ One-sentence summary

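This study proposes a supervised post-training method that introduces frame-level mixed perturbations into a speech foundation model, making it better at capturing local anomalies in spoofed speech and thereby significantly improving the robustness and balance of deepfake detection without data augmentation.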
