Toward Phonology-Guided Sign Language Motion Generation: A Diffusion Baseline and Conditioning Analysis
1️⃣ One-Sentence Summary
This paper establishes a strong diffusion-model baseline and systematically studies how phonological attributes such as handshape and location can guide text-conditioned generation of more natural, accurate sign language motion, finding that translating symbolic attributes into natural-language descriptions effectively improves generation quality.
Generating natural, correct, and visually smooth 3D avatar sign language motion conditioned on text inputs remains challenging. In this work, we train a generative model of 3D body motion and explore the role of phonological attribute conditioning for sign language motion generation, using ASL-LEX 2.0 annotations such as hand shape, hand location, and movement. We first establish a strong diffusion baseline using an MDM-style (Human Motion Diffusion Model) diffusion model with the SMPL-X body representation, which outperforms SignAvatar, a state-of-the-art CVAE method, on gloss discriminability metrics. We then systematically study the role of text conditioning across text encoders (CLIP vs. T5), conditioning modes (gloss-only vs. gloss + phonological attributes), and attribute notation formats (symbolic vs. natural language). Our analysis reveals that translating symbolic ASL-LEX notations into natural language is necessary for effective CLIP-based attribute conditioning, whereas T5 is largely unaffected by this translation. Furthermore, our best-performing variant (CLIP with mapped attributes) outperforms SignAvatar across all metrics. These findings highlight input representation as a critical factor for text-encoder-based attribute conditioning and motivate structured conditioning approaches in which gloss and phonological attributes are encoded through independent pathways.
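The symbolic-to-natural-language attribute mapping described above can be sketched as a simple prompt-construction step. This is a minimal, hypothetical illustration: the attribute codes, phrase tables, and the `map_to_natural_language` helper below are invented for the example and are not the paper's actual ASL-LEX mapping.

```python
# Hypothetical symbolic ASL-LEX-style annotations for one gloss
# (the codes and phrasings are illustrative placeholders).
attributes = {
    "Handshape": "flat_b",
    "MajorLocation": "head",
    "Movement": "straight",
}

# Invented code-to-phrase lookup tables for this sketch.
PHRASES = {
    "Handshape": {"flat_b": "a flat B handshape"},
    "MajorLocation": {"head": "located near the head"},
    "Movement": {"straight": "a straight movement"},
}

def map_to_natural_language(gloss: str, attrs: dict) -> str:
    """Turn a gloss plus symbolic attributes into a natural-language prompt
    suitable for a text encoder such as CLIP."""
    parts = [PHRASES[key][value] for key, value in attrs.items()]
    return f"the sign {gloss}, signed with " + ", ".join(parts)

prompt = map_to_natural_language("BOOK", attributes)
print(prompt)
# → the sign BOOK, signed with a flat B handshape, located near the head, a straight movement
```

The design point is that a CLIP-style encoder, trained on natural image captions, benefits from seeing attributes phrased as ordinary language rather than as raw symbolic codes, whereas the paper reports T5 is largely insensitive to this choice.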
Source: arXiv: 2603.17388