arXiv submission date: 2026-02-11
📄 Abstract - 3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars

Audio-driven 3D talking avatar generation is increasingly important in virtual communication, digital humans, and interactive media, where avatars must preserve identity, synchronize lip motion with speech, express emotion, and exhibit lifelike spatial dynamics, which collectively define a broader objective of expressivity. Achieving this remains challenging, however, due to insufficient training data with limited subject identities, narrow audio representations, and restricted explicit controllability. In this paper, we propose 3DXTalker, an expressive 3D talking avatar framework built on data-curated identity modeling, audio-rich representations, and controllable spatial dynamics. 3DXTalker enables scalable identity modeling via a 2D-to-3D data curation pipeline and disentangled representations, alleviating data scarcity and improving identity generalization. We then introduce frame-wise amplitude and emotional cues beyond standard speech embeddings, ensuring superior lip synchronization and nuanced expression modulation. These cues are unified by a flow-matching-based transformer to produce coherent facial dynamics. Moreover, 3DXTalker enables natural head-pose motion generation while supporting stylized control via prompt-based conditioning. Extensive experiments show that 3DXTalker integrates lip synchronization, emotional expression, and head-pose dynamics within a unified framework, and achieves superior performance in 3D talking avatar generation.
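The abstract describes a flow-matching-based transformer that regresses facial dynamics conditioned on audio cues (speech embeddings plus frame-wise amplitude and emotion). The sketch below illustrates the standard conditional flow-matching training target on a toy scale; the dimensions, the linear stand-in for the transformer, and all variable names are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): 64-dim facial-motion frames,
# 32-dim audio condition (speech embedding + amplitude + emotion cues).
MOTION_DIM, COND_DIM = 64, 32

def flow_matching_targets(x1, rng):
    """Build one conditional flow-matching training pair.

    x1 : (B, MOTION_DIM) clean facial-motion frames.
    Returns the interpolated sample x_t, the time t, and the target
    velocity v = x1 - x0 along the straight-line (linear) path.
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))   # per-sample time in [0, 1)
    xt = (1.0 - t) * x0 + t * x1             # point on the straight path
    v_target = x1 - x0                       # constant velocity of that path
    return xt, t, v_target

def toy_velocity_model(xt, t, cond, W):
    """Stand-in for the conditioning transformer: one linear map over
    the concatenated (x_t, t, audio-condition) features."""
    h = np.concatenate([xt, t, cond], axis=1)
    return h @ W

batch = 8
x1 = rng.standard_normal((batch, MOTION_DIM))            # fake motion data
cond = rng.standard_normal((batch, COND_DIM))            # fake audio cues
W = rng.standard_normal((MOTION_DIM + 1 + COND_DIM, MOTION_DIM)) * 0.01

xt, t, v_target = flow_matching_targets(x1, rng)
v_pred = toy_velocity_model(xt, t, cond, W)
loss = np.mean((v_pred - v_target) ** 2)  # flow-matching regression loss
print(float(loss))
```

Training minimizes this velocity-regression loss; at inference, integrating the learned velocity field from noise yields motion frames consistent with the audio condition.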

Top tags: computer vision, multi-modal, AIGC
Detailed tags: 3D talking avatar, audio-driven generation, lip synchronization, facial animation, virtual humans

3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars


1️⃣ One-sentence summary

This paper proposes a new method called 3DXTalker, which, through innovative data curation, rich audio features, and controllable spatial-dynamics generation, addresses several key challenges of 3D talking avatar generation within one unified framework: identity preservation, lip synchronization, emotional expression, and natural head-pose motion, significantly improving avatar expressivity.

Source: arXiv 2602.10516