arXiv submission date: 2026-04-28
📄 Abstract - HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation

Video generation models have developed rapidly in recent years, and generating natural human motion plays a pivotal role in their progress. However, accurately evaluating the quality of generated human motion videos remains a significant challenge. Existing evaluation metrics primarily focus on global scene statistics, often overlooking fine-grained human details and consequently failing to align with human subjective preferences. To bridge this gap, we propose HuM-Eval, a novel human-centric evaluation framework that adopts a coarse-to-fine strategy. Specifically, our framework first utilizes a Vision Language Model to perform a coarse assessment of global video quality. It then proceeds to a fine-grained analysis, using 2D pose to verify anatomical correctness and 3D human motion to evaluate motion stability. Extensive experiments demonstrate that HuM-Eval achieves an average human correlation of 58.2%, outperforming state-of-the-art baselines. Furthermore, we introduce HuM-Bench, a comprehensive benchmark comprising 1,000 diverse prompts, and conduct a detailed evaluation of existing text-to-video models, paving the way for next-generation human motion generation.

Top-level tags: video generation model evaluation
Detailed tags: human motion coarse-to-fine benchmark vision language model pose analysis

HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation


1️⃣ One-sentence summary

The paper proposes HuM-Eval, a novel video evaluation framework that first uses a Vision Language Model to quickly judge overall video quality, then inspects the details by analyzing human pose and motion stability, thereby evaluating AI-generated human motion videos more accurately and aligning closely with human subjective preferences.
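The coarse-to-fine pipeline described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual implementation: every function name, threshold, and weight below is a hypothetical assumption, with stubs standing in for the real VLM, 2D pose, and 3D motion analyzers.

```python
def coarse_vlm_score(video) -> float:
    """Stage 1 (stub): a Vision Language Model rates global video quality."""
    return video["vlm_quality"]  # stand-in for a real VLM call

def pose_anatomy_score(video) -> float:
    """Stage 2a (stub): 2D-pose check of anatomical correctness,
    e.g. plausible joint angles and limb-length ratios."""
    return video["pose_validity"]

def motion_stability_score(video) -> float:
    """Stage 2b (stub): 3D-motion check of temporal stability,
    e.g. low jitter in recovered joint trajectories."""
    return video["motion_smoothness"]

def hum_eval_score(video, coarse_threshold=0.3, weights=(0.4, 0.3, 0.3)) -> float:
    """Coarse-to-fine scoring (illustrative): videos failing the coarse
    pass are scored cheaply; the rest get the fine-grained analysis."""
    w_c, w_p, w_m = weights
    coarse = coarse_vlm_score(video)
    if coarse < coarse_threshold:  # early exit on clearly low quality
        return w_c * coarse
    return (w_c * coarse
            + w_p * pose_anatomy_score(video)
            + w_m * motion_stability_score(video))

good = {"vlm_quality": 0.9, "pose_validity": 0.8, "motion_smoothness": 0.7}
bad = {"vlm_quality": 0.2, "pose_validity": 0.5, "motion_smoothness": 0.5}
print(round(hum_eval_score(good), 2))  # 0.4*0.9 + 0.3*0.8 + 0.3*0.7 = 0.81
print(round(hum_eval_score(bad), 2))   # early exit: 0.4*0.2 = 0.08
```

The early exit is the point of the coarse stage: expensive pose and motion analysis is only run on videos that already pass a basic quality bar.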

Source: arXiv 2604.25361