Interpretable facial dynamics as behavioral and perceptual traces of deepfakes
1️⃣ One-sentence summary
This study proposes an interpretable deepfake-video detection method based on bio-behavioral features of facial motion. It finds that manipulated videos exhibit pronounced higher-order temporal irregularities during emotional expression, and that machine and human detection judgments converge for emotive videos but diverge for non-emotive ones, suggesting that interpretable computational features and human perception are complementary rather than redundant.
Deepfake detection research has largely converged on deep learning approaches that, despite strong benchmark performance, offer limited insight into what distinguishes real from manipulated facial behavior. This study presents an interpretable alternative grounded in bio-behavioral features of facial dynamics and evaluates how computational detection strategies relate to human perceptual judgments. We identify core low-dimensional patterns of facial movement, from which temporal features characterizing spatiotemporal structure are derived. Traditional machine learning classifiers trained on these features achieved modest but significantly above-chance deepfake classification, driven by higher-order temporal irregularities that were more pronounced in manipulated than in real facial dynamics. Notably, detection was substantially more accurate for videos containing emotive expressions than for those without. An emotional valence classification analysis further indicated that emotive signals are systematically degraded in deepfakes, explaining the differential impact of emotive dynamics on detection. Furthermore, we provide an additional and often overlooked dimension of explainability by assessing the relationship between model decisions and human perceptual detection. Model and human judgments converged for emotive but diverged for non-emotive videos, and even where outputs aligned, underlying detection strategies differed. These findings demonstrate that face-swapped deepfakes carry a measurable behavioral fingerprint, most salient during emotional expression. Additionally, model-human comparisons suggest that interpretable computational features and human perception may offer complementary rather than redundant routes to detection.
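The pipeline the abstract describes (low-dimensional facial-movement patterns, higher-order temporal features, a traditional classifier) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' actual implementation: the synthetic landmark data, PCA dimensionality, derivative-based feature choices, and logistic-regression classifier are all assumptions made for illustration.

```python
# Hedged sketch (NOT the paper's exact pipeline): reduce facial-landmark
# trajectories to low-dimensional movement components with PCA, derive
# higher-order temporal features (velocity/acceleration/jerk statistics),
# and train a simple classifier to separate "real" from "fake" dynamics.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def temporal_features(traj):
    """Summarize one low-dim trajectory (T x k) with higher-order
    temporal statistics: std of velocity, acceleration, and jerk."""
    feats = []
    d = traj
    for _ in range(3):          # 1st, 2nd, 3rd temporal derivatives
        d = np.diff(d, axis=0)
        feats.append(d.std())
    return np.array(feats)

def make_video(fake, T=120, n_landmarks=20):
    """Toy landmark time series: smooth sinusoidal motion; 'fake' videos
    get extra high-frequency jitter, mimicking the temporal
    irregularities the paper reports in manipulated facial dynamics."""
    t = np.linspace(0, 4 * np.pi, T)[:, None]
    base = np.sin(t + rng.uniform(0, 2 * np.pi, n_landmarks))
    if fake:
        base += 0.3 * rng.standard_normal((T, n_landmarks))
    return base

videos = [make_video(fake=i % 2 == 1) for i in range(60)]
labels = np.array([i % 2 for i in range(60)])

# Fit PCA on stacked frames to get shared low-dimensional movement axes,
# then describe each video by temporal statistics of its projection.
pca = PCA(n_components=3).fit(np.vstack(videos))
X = np.array([temporal_features(pca.transform(v)) for v in videos])

clf = LogisticRegression().fit(X[:40], labels[:40])
acc = clf.score(X[40:], labels[40:])
print(f"held-out accuracy: {acc:.2f}")
```

On this toy data the jitter makes jerk statistics sharply higher for "fake" videos, so the classifier separates the classes easily; the paper's real-video setting is far harder, yielding only modestly above-chance accuracy.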
Source: arXiv: 2604.21760