arXiv submission date: 2026-06-24
📄 Abstract - Supervised Post-training of Speech Foundation Models for Robust Adaptation in Speech Deepfake Detection

Large speech foundation models have shown strong potential for speech deepfake detection, but direct fine-tuning is limited by a mismatch between self-supervised pre-training objectives and spoof-specific artifacts. To address this, we propose a mix-frame post-training strategy that creates localized spoof-oriented perturbations and uses frame-level supervision to encourage the SSL model to learn the local inconsistencies that are critical for robust spoof detection. On ASVspoof5, we achieve a state-of-the-art EER of 4.50% for a single model without data augmentation. On ASVspoof2021 LA/DF, the method shows an absolute EER gap of only 0.16% between LA and DF, indicating strong and balanced robustness across distinct distortion conditions. These results show that supervised post-training provides an effective and practical way to adapt speech foundation models for robust deepfake detection.
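
The abstract does not say how frames are mixed, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that a random subset of fixed-length frames in a bona fide waveform is replaced by the time-aligned frames of a spoofed waveform, and that each frame receives a binary label. The function name `mix_frames` and the parameters `frame_len` and `mix_ratio` are hypothetical.

```python
import numpy as np


def mix_frames(bona_fide, spoof, frame_len=320, mix_ratio=0.3, rng=None):
    """Swap a random subset of frames in `bona_fide` for frames from `spoof`.

    Hypothetical illustration only: frame_len=320 (20 ms at 16 kHz) and
    mix_ratio=0.3 are assumed values, not taken from the paper.
    Returns the mixed waveform and per-frame labels (1 = spoofed frame).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Trim both signals to the same whole number of frames.
    n_frames = min(len(bona_fide), len(spoof)) // frame_len
    usable = n_frames * frame_len
    real = np.asarray(bona_fide[:usable], dtype=np.float32).reshape(n_frames, frame_len)
    fake = np.asarray(spoof[:usable], dtype=np.float32).reshape(n_frames, frame_len)

    # Choose which frames to replace, without repetition.
    n_swap = int(round(mix_ratio * n_frames))
    swap_idx = rng.choice(n_frames, size=n_swap, replace=False)

    # Frame-level supervision targets: 1 where the frame came from `spoof`.
    labels = np.zeros(n_frames, dtype=np.int64)
    labels[swap_idx] = 1

    # Take the spoofed frame where labelled 1, the bona fide frame otherwise.
    mixed = np.where(labels[:, None] == 1, fake, real)
    return mixed.reshape(-1), labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real_wave = np.zeros(3200, dtype=np.float32)  # stand-in for bona fide audio
    fake_wave = np.ones(3200, dtype=np.float32)   # stand-in for spoofed audio
    mixed, labels = mix_frames(real_wave, fake_wave, rng=rng)
    print(mixed.shape, labels)
```

Under this reading, the per-frame labels could serve as targets for a frame-level classification loss on the SSL model's frame outputs, which is one way to push the model toward the local inconsistencies the abstract mentions.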

Top-level tags: audio model training
Detailed tags: speech foundation model, deepfake detection, post-training, spoof detection, robust adaptation

Supervised Post-training of Speech Foundation Models for Robust Adaptation in Speech Deepfake Detection


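1️⃣ One-sentence summary

This study proposes a supervised post-training method that introduces frame-level mixed perturbations into a speech foundation model so that it better captures local anomalies in spoofed speech, which markedly improves the robustness and cross-condition balance of deepfake detection without using data augmentation.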

Source: arXiv: 2606.25328