Towards Seamless Interaction: Causal Turn-Level Modeling of Interactive 3D Conversational Head Dynamics
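1️⃣ One-sentence summary
This paper proposes a new method called TIMAR that generates the head motion and facial expressions of virtual characters or robots in real time and coherently, as in a real human conversation, making the interaction look more natural.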
Human conversation involves continuous exchanges of speech and nonverbal cues such as head nods, gaze shifts, and facial expressions that convey attention and emotion. Modeling these bidirectional dynamics in 3D is essential for building expressive avatars and interactive robots. However, existing frameworks often treat talking and listening as independent processes or rely on non-causal full-sequence modeling, which hinders temporal coherence across turns. We present TIMAR (Turn-level Interleaved Masked AutoRegression), a causal framework for 3D conversational head generation that models dialogue as interleaved audio-visual contexts. It fuses multimodal information within each turn and applies turn-level causal attention to accumulate conversational history, while a lightweight diffusion head predicts continuous 3D head dynamics that capture both coordination and expressive variability. Experiments on the DualTalk benchmark show that TIMAR reduces Fréchet Distance and MSE by 15-30% on the test set, and achieves similar gains on out-of-distribution data. The source code will be released in a GitHub repository.
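The idea of turn-level causal attention can be illustrated with a small attention mask. The sketch below is not taken from the TIMAR code; it assumes one plausible reading of the abstract, in which tokens attend freely within their own turn and to all earlier turns, but never to later ones. The function name `turn_causal_mask` and the `turn_lengths` parameter are hypothetical names chosen for this illustration.

```python
from typing import List


def turn_causal_mask(turn_lengths: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Assumption (not confirmed by the paper): attention is bidirectional
    inside a turn and causal across turns.
    """
    # Map every token position to the index of the turn it belongs to.
    turn_of = [t for t, n in enumerate(turn_lengths) for _ in range(n)]
    # A token sees its own turn in full plus every earlier turn.
    return [[tj <= ti for tj in turn_of] for ti in turn_of]


# Three turns holding 2, 3, and 1 tokens respectively.
mask = turn_causal_mask([2, 3, 1])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The printed pattern is block lower-triangular: the two tokens of the first turn see only each other, while the single token of the last turn sees the whole history. This contrasts with a token-level causal mask, which would also hide later tokens inside the same turn.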
Source: arXiv: 2512.15340