arXiv submission date: 2026-01-05
📄 Abstract - ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors

Detecting unknown deepfake manipulations remains one of the most challenging problems in face forgery detection. Current state-of-the-art approaches fail to generalize to unseen manipulations, as they primarily rely on supervised training with existing deepfakes or pseudo-fakes, which leads to overfitting to specific forgery patterns. In contrast, self-supervised methods offer greater potential for generalization, but existing work struggles to learn discriminative representations from self-supervision alone. In this paper, we propose ExposeAnyone, a fully self-supervised approach based on a diffusion model that generates expression sequences from audio. The key idea is that, once the model is personalized to specific subjects using reference sets, it can compute identity distances between suspected videos and the personalized subjects via diffusion reconstruction errors, enabling person-of-interest face forgery detection. Extensive experiments demonstrate that 1) our method outperforms the previous state-of-the-art method by 4.22 percentage points in average AUC on the DF-TIMIT, DFDCP, KoDF, and IDForge datasets, 2) our model is also capable of detecting Sora2-generated videos, where previous approaches perform poorly, and 3) our method is highly robust to corruptions such as blur and compression, highlighting its applicability to real-world face forgery detection.
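As a rough illustration of this reconstruction-error scoring idea, the sketch below noises a suspected expression track at several noise levels, asks a subject-personalized denoiser to predict the noise conditioned on aligned audio features, and averages the prediction error as an identity distance. All names here (`ExprDenoiser`, `identity_distance`, the toy network, and the cosine noise schedule) are hypothetical stand-ins for illustration only, not the paper's actual architecture or score definition.

```python
# Minimal sketch of diffusion-reconstruction-error scoring.
# Hypothetical names and a toy denoiser; the paper's real model,
# conditioning, and scoring details may differ substantially.
import torch
import torch.nn as nn

class ExprDenoiser(nn.Module):
    """Toy stand-in for a personalized audio-to-expression denoiser.
    Predicts the noise added to an expression sequence, conditioned on audio."""
    def __init__(self, expr_dim=64, audio_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(expr_dim + audio_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, expr_dim),
        )

    def forward(self, noisy_expr, audio, t):
        # noisy_expr: (B, T, expr_dim), audio: (B, T, audio_dim), t: (B,)
        t_emb = t[:, None, None].expand(-1, noisy_expr.shape[1], 1)
        return self.net(torch.cat([noisy_expr, audio, t_emb], dim=-1))

@torch.no_grad()
def identity_distance(model, expr_seq, audio_feats, n_steps=10):
    """Average denoising error over several noise levels: a proxy for how
    well the subject-personalized model 'explains' the suspected track."""
    errors = []
    for step in range(1, n_steps + 1):
        t = torch.full((expr_seq.shape[0],), step / n_steps)
        alpha = torch.cos(t * torch.pi / 2)[:, None, None]  # toy noise schedule
        noise = torch.randn_like(expr_seq)
        noisy = alpha * expr_seq + (1 - alpha**2).sqrt() * noise
        pred = model(noisy, audio_feats, t)
        errors.append(((pred - noise) ** 2).mean().item())
    return sum(errors) / len(errors)  # higher => less consistent with subject

# Usage: score a suspected clip against a model personalized to one subject.
model = ExprDenoiser()
expr_seq = torch.randn(1, 100, 64)      # placeholder expression coefficients
audio_feats = torch.randn(1, 100, 128)  # placeholder aligned audio features
score = identity_distance(model, expr_seq, audio_feats)
print(f"identity distance: {score:.4f}")  # compare to a calibrated threshold
```

In this framing, a genuine video of the personalized subject should be well reconstructed (low error), while a forged or substituted identity should yield a higher distance; the actual threshold calibration is not specified here.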

Top-level tags: computer vision, audio, model evaluation
Detailed tags: face forgery detection, diffusion models, audio-to-expression, zero-shot detection, self-supervised learning

ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors


1️⃣ One-Sentence Summary

This paper proposes a new method called ExposeAnyone, which personalizes an audio-to-expression diffusion model to specific subjects and detects unknown or novel face forgeries by comparing video reconstruction errors; it outperforms existing techniques across multiple benchmarks and can also effectively identify Sora2-generated videos.

Source: arXiv:2601.02359