SemanticFace: Semantic Facial Action Estimation via Semantic Distillation in Interpretable Space
1️⃣ One-Sentence Summary
This paper proposes a new framework called SemanticFace, which uses a two-stage semantic distillation method to convert facial expressions in images into muscle-movement parameters that are both accurate and easy to interpret, enabling better control of digital avatars and more natural human-computer interaction.
Facial action estimation from a single image is often formulated as predicting or fitting parameters in compact expression spaces, which lack explicit semantic interpretability. However, many practical applications, such as avatar control and human-computer interaction, require interpretable facial actions that correspond to meaningful muscle movements. In this work, we propose SemanticFace, a framework for facial action estimation in the interpretable ARKit blendshape space that reformulates coefficient prediction as structured semantic reasoning. SemanticFace adopts a two-stage semantic distillation paradigm: it first derives structured semantic supervision from ground-truth ARKit coefficients and then distills this knowledge into a multimodal large language model to predict interpretable facial action coefficients from images. Extensive experiments demonstrate that language-aligned semantic supervision improves both coefficient accuracy and perceptual consistency, while enabling strong cross-identity generalization and robustness to large domain shifts, including cartoon faces.
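The first stage of the paradigm described above derives structured semantic supervision from ground-truth ARKit blendshape coefficients. A minimal sketch of what such a coefficient-to-language step might look like is shown below; the bucket thresholds, phrasing, and the `coefficients_to_semantics` helper are illustrative assumptions, not the paper's actual recipe (the blendshape names themselves are real ARKit identifiers).

```python
# Hypothetical sketch: turn ARKit blendshape coefficients in [0, 1] into
# ranked textual descriptions usable as language-aligned supervision.
# Thresholds and wording are assumptions for illustration only.

def coefficients_to_semantics(coeffs, threshold=0.1):
    """Map blendshape coefficients to intensity-ranked text descriptions."""
    def intensity(v):
        if v >= 0.6:
            return "strongly"
        if v >= 0.3:
            return "moderately"
        return "slightly"

    # Keep only meaningfully active coefficients, strongest first.
    active = sorted(
        ((name, v) for name, v in coeffs.items() if v >= threshold),
        key=lambda item: -item[1],
    )
    return [f"{name} is {intensity(v)} activated ({v:.2f})" for name, v in active]

# Example with real ARKit blendshape names:
descriptions = coefficients_to_semantics(
    {"jawOpen": 0.72, "mouthSmileLeft": 0.35, "browInnerUp": 0.05}
)
```

Supervision of this form would then be distilled into the multimodal LLM in stage two, which learns to predict the interpretable coefficients directly from images.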
Source: arXiv: 2603.14827