arXiv submission date: 2026-04-23
📄 Abstract - Encoder-Free Human Motion Understanding via Structured Motion Descriptions

The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-language alignment through dedicated encoders that project motion features into the LLM's embedding space, and so remain constrained by cross-modal representation and alignment. Inspired by biomechanical analysis, where joint angles and body-part kinematics have long served as a precise descriptive language for human movement, we propose Structured Motion Description (SMD), a rule-based, deterministic approach that converts joint position sequences into structured natural language descriptions of joint angles, body part movements, and global trajectory. By representing motion as text, SMD enables LLMs to apply their pretrained knowledge of body parts, spatial directions, and movement semantics directly to motion reasoning, without requiring learned encoders or alignment modules. We show that this approach achieves state-of-the-art results on both motion question answering (66.7% on BABEL-QA, 90.1% on HuMMan-QA) and motion captioning (R@1 of 0.584, CIDEr of 53.16 on HumanML3D), surpassing all prior methods. SMD additionally offers practical benefits: the same text input works across different LLMs with only lightweight LoRA adaptation (validated on 8 LLMs from 6 model families), and its human-readable representation enables interpretable attention analysis over motion descriptions. Code, data, and pretrained LoRA adapters are available at this https URL.
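To make the idea of a rule-based, deterministic motion-to-text conversion concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `joint_angle` and `describe_angle`, the 10-degree tolerance, the "flexes"/"extends" wording, and the output format are all hypothetical choices made for illustration. The sketch computes the angle at a joint from three 3D joint positions and turns the change between the first and last frame into a short text line.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by the segments b->a and b->c.

    a, b, c are 3D points (x, y, z). Assumes both segments have
    non-zero length (hypothetical helper, not from the paper).
    """
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))


def describe_angle(name, angles, tol=10.0):
    """Turn a per-frame angle sequence into one deterministic text line.

    The threshold `tol` and the vocabulary are assumptions for this sketch.
    """
    start, end = angles[0], angles[-1]
    delta = end - start
    if abs(delta) < tol:
        trend = "stays roughly constant"
    elif delta < 0:
        trend = "flexes"  # angle between the two segments decreases
    else:
        trend = "extends"  # angle between the two segments increases
    return f"{name}: {trend} ({start:.0f} deg -> {end:.0f} deg)"


# Two frames of (hip, knee, ankle) positions: a straight leg, then a bent knee.
frames = [
    ((0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.0)),
    ((0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.5, -0.5)),
]
knee_angles = [joint_angle(hip, knee, ankle) for hip, knee, ankle in frames]
print(describe_angle("left knee", knee_angles))
```

Because every step is a fixed rule, the same joint positions always produce the same text, which is the property that lets plain-text LLMs consume the output without a learned encoder. A fuller description in the spirit of the abstract would also cover body part movements and global trajectory.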

Top-level tags: llm motion understanding
Detailed tags: motion question answering motion captioning structured motion description encoder-free human motion

Encoder-Free Human Motion Understanding via Structured Motion Descriptions


1️⃣ One-sentence summary

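This paper proposes a new method that automatically converts human motion data (such as joint position sequences) into structured text descriptions, so that large language models can understand motion directly without dedicated encoders or alignment modules, and it surpasses all existing methods on motion question answering and captioning tasks.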

Source: arXiv: 2604.21668