arXiv submission date: 2026-06-13
📄 Abstract - Teacher-Student Structure for Domain Adaptation in Ensemble Audio-Visual Video Deepfake Detection

The rapid advancement of generative AI models is producing increasingly realistic deepfake media, in which audio, video, or both are manipulated. This raises severe privacy and societal concerns. Numerous studies in this area have yielded promising intra-domain results; however, these models frequently lose efficacy when faced with data from dissimilar domains. Consequently, recent deepfake detection approaches focus on improving generalization through techniques that incorporate all input modalities, including audio, images, and their interactions. In this regard, we propose EAV-DFD, a generalized deep ensemble audio-visual model combined with a domain adaptation mechanism that uses a teacher-student framework to enhance the model's ability to perform and generalize effectively across unseen domains. To evaluate the model's performance, we used the FakeAVCeleb dataset as the primary domain and the DFDC, Deepfake_TIMIT, and PolyGlotFake datasets as unseen domains. Our experimental results demonstrate that the proposed framework is efficient in domain adaptation, improving the model's AUC by 4.09%, 17.94%, and 0.5% on the three unseen datasets while using only a small portion of each to train the student model. This leads to a novel deepfake detection model capable of adapting to new domains and interpreting which modality has been manipulated, highlighting the potential of our approach for real-world applications.
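
As a rough illustration of the teacher-student idea named in the abstract, the sketch below shows a generic knowledge-distillation objective and a simple per-modality fusion rule. The abstract does not specify the loss, the fusion rule, or any hyperparameters, so the function names (`bce`, `student_loss`, `ensemble_score`), the weighting `alpha`, and the max-based fusion are all assumptions made for illustration, not the authors' implementation.

```python
import math

def bce(p, target, eps=1e-7):
    """Binary cross-entropy between a predicted probability p and a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def student_loss(student_p, teacher_p, label, alpha=0.5):
    """Hypothetical distillation objective (assumed, not from the paper):
    a weighted sum of the hard-label loss on the new-domain sample and a
    term that pulls the student toward the teacher's soft prediction."""
    return alpha * bce(student_p, label) + (1.0 - alpha) * bce(student_p, teacher_p)

def ensemble_score(audio_p, visual_p):
    """Hypothetical fusion rule (assumed): a clip is as fake as its most
    suspicious modality, which also indicates which modality was manipulated."""
    return max(audio_p, visual_p)

# A student that agrees with both the label and the teacher is penalized less
# than one that contradicts them.
good = student_loss(student_p=0.9, teacher_p=0.9, label=1.0)
bad = student_loss(student_p=0.1, teacher_p=0.9, label=1.0)
print(good < bad)
```

In this sketch, keeping separate audio and visual scores is what would let a detector report which modality looks manipulated; a real system would replace the scalar probabilities with network outputs.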

Top-level tags: multi-modal audio video
Detailed tags: deepfake detection domain adaptation teacher-student ensemble generalization

Teacher-Student Structure for Domain Adaptation in Ensemble Audio-Visual Video Deepfake Detection


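1️⃣ One-sentence summary

This paper proposes an ensemble audio-visual deepfake detection model (EAV-DFD) combined with a teacher-student framework. By training the student model on only a small amount of data from a new domain, the model adapts effectively to unseen data domains, significantly improves detection performance on several cross-domain test sets, and can determine whether a forgery originates in the audio or the video.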

Source: arXiv: 2606.15117