SemanticFace: Semantic Facial Action Estimation via Semantic Distillation in Interpretable Space
1️⃣ One-Sentence Summary
This paper proposes a new framework called SemanticFace, which uses a two-stage semantic distillation method to convert facial expressions in images into muscle-movement parameters that are both accurate and easy to interpret, enabling better control of digital avatars and more natural human-computer interaction.
Facial action estimation from a single image is often formulated as predicting or fitting parameters in compact expression spaces, which lack explicit semantic interpretability. However, many practical applications, such as avatar control and human-computer interaction, require interpretable facial actions that correspond to meaningful muscle movements. In this work, we propose SemanticFace, a framework for facial action estimation in the interpretable ARKit blendshape space that reformulates coefficient prediction as structured semantic reasoning. SemanticFace adopts a two-stage semantic distillation paradigm: it first derives structured semantic supervision from ground-truth ARKit coefficients and then distills this knowledge into a multimodal large language model to predict interpretable facial action coefficients from images. Extensive experiments demonstrate that language-aligned semantic supervision improves both coefficient accuracy and perceptual consistency, while enabling strong cross-identity generalization and robustness to large domain shifts, including cartoon faces.
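The first stage of the paradigm described above derives structured semantic supervision from ground-truth ARKit blendshape coefficients. A minimal sketch of what such a coefficient-to-language step might look like is shown below; the bucket thresholds, phrasing, and the `coefficients_to_semantics` helper are illustrative assumptions, not the paper's actual recipe (the blendshape names themselves are real ARKit identifiers).

```python
# Hypothetical sketch: turn ARKit blendshape coefficients in [0, 1] into
# ranked textual descriptions usable as language-aligned supervision.
# Thresholds and wording are assumptions for illustration only.

def coefficients_to_semantics(coeffs, threshold=0.1):
    """Map blendshape coefficients to intensity-ranked text descriptions."""
    def intensity(v):
        if v >= 0.6:
            return "strongly"
        if v >= 0.3:
            return "moderately"
        return "slightly"

    # Keep only meaningfully active coefficients, strongest first.
    active = sorted(
        ((name, v) for name, v in coeffs.items() if v >= threshold),
        key=lambda item: -item[1],
    )
    return [f"{name} is {intensity(v)} activated ({v:.2f})" for name, v in active]

# Example with real ARKit blendshape names:
descriptions = coefficients_to_semantics(
    {"jawOpen": 0.72, "mouthSmileLeft": 0.35, "browInnerUp": 0.05}
)
```

Supervision of this form would then be distilled into the multimodal LLM in stage two, which learns to predict the interpretable coefficients directly from images.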
Source: arXiv: 2603.14827