📄 Abstract - Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos

Despite rapid advances in video generative models, robust metrics for evaluating the visual and temporal correctness of complex human actions remain elusive. Critically, existing pure-vision encoders and Multimodal Large Language Models (MLLMs) are strongly appearance-biased, lack temporal understanding, and thus struggle to discern intricate motion dynamics and anatomical implausibilities in generated videos. We tackle this gap by introducing a novel evaluation metric derived from a learned latent space of real-world human actions. Our method first captures the nuances, constraints, and temporal smoothness of real-world motion by fusing appearance-agnostic human skeletal geometry features with appearance-based features. We posit that this combined feature space provides a robust representation of action plausibility. Given a generated video, our metric quantifies its action quality by measuring the distance between its underlying representations and this learned real-world action distribution. For rigorous validation, we develop a new multi-faceted benchmark specifically designed to probe temporally challenging aspects of human action fidelity. Through extensive experiments, we show that our metric achieves a substantial improvement of more than 68% over existing state-of-the-art methods on our benchmark, performs competitively on established external benchmarks, and correlates more strongly with human perception. Our in-depth analysis reveals critical limitations in current video generative models and establishes a new standard for advanced research in video generation.
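The abstract does not say how the skeletal and appearance features are fused or which distance is measured against the real-action distribution. The snippet below is only a minimal sketch of the general scoring idea, assuming (1) fusion by simple concatenation and (2) a diagonal Gaussian fitted to features of real videos, scored with a diagonal Mahalanobis distance. All names (`fuse_features`, `fit_real_distribution`, `action_distance`, `DiagonalGaussian`) are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagonalGaussian:
    """Hypothetical model of real-world action features: per-dimension mean and variance."""
    mean: List[float]
    var: List[float]


def fuse_features(skeleton: Sequence[float], appearance: Sequence[float]) -> List[float]:
    # Assumption: fusion is plain concatenation of the appearance-agnostic
    # skeletal features and the appearance-based features.
    return list(skeleton) + list(appearance)


def fit_real_distribution(real_feats: Sequence[Sequence[float]]) -> DiagonalGaussian:
    # Fit a diagonal Gaussian to fused features extracted from real videos.
    n = len(real_feats)
    dim = len(real_feats[0])
    mean = [sum(f[d] for f in real_feats) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in real_feats) / n for d in range(dim)]
    return DiagonalGaussian(mean, var)


def action_distance(feat: Sequence[float], dist: DiagonalGaussian, eps: float = 1e-6) -> float:
    # Mahalanobis distance under a diagonal covariance: how far a generated
    # video's fused features lie from the real-action distribution.
    # Larger values suggest less plausible motion; eps avoids division by zero.
    return math.sqrt(
        sum((x - m) ** 2 / (v + eps) for x, m, v in zip(feat, dist.mean, dist.var))
    )


if __name__ == "__main__":
    real = [
        fuse_features([0.0, 1.0], [0.5]),
        fuse_features([0.2, 0.8], [0.4]),
        fuse_features([-0.2, 1.2], [0.6]),
    ]
    dist = fit_real_distribution(real)
    plausible = fuse_features([0.1, 0.9], [0.5])
    implausible = fuse_features([3.0, -2.0], [0.5])
    print(action_distance(plausible, dist) < action_distance(implausible, dist))  # True
```

The diagonal covariance keeps the sketch dependency-free; a full-covariance Mahalanobis distance or a Fréchet-style distance between feature distributions would be natural alternatives, and the paper's actual choice may differ from either.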

Top-level tags: video generation, model evaluation, multi-modal
Detailed tags: human motion evaluation, video quality metric, skeletal geometry, temporal understanding, action plausibility

Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos


1️⃣ One-Sentence Summary

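This paper proposes a new evaluation metric that learns a latent space of real-world human actions by fusing human skeletal geometry features with appearance features. This lets it judge more accurately whether human motion in AI-generated videos is natural and smooth, addressing the difficulty existing methods have in assessing complex motion dynamics and physiological plausibility.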

