Quantitative Video World Model Evaluation for Geometric-Consistency

📄 Abstract - Quantitative Video World Model Evaluation for Geometric-Consistency

Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce physically plausible 3D structure and motion remains challenging. Most existing video evaluation pipelines rely heavily on human judgment or learned graders, which can be subjective and only weakly diagnostic of geometric failures. We introduce PDI-Bench (Perspective Distortion Index), a quantitative framework for auditing geometric coherence in generated videos. Given a generated clip, we obtain object-centric observations via segmentation and point tracking (e.g., SAM 2, MegaSaM, and CoTracker3), lift them to 3D world-space coordinates via monocular reconstruction, and compute a set of projective-geometry residuals capturing three failure dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity. To support systematic evaluation, we build PDI-Dataset, which covers diverse scenarios designed to stress these geometric constraints. Across state-of-the-art video generators, PDI reveals consistent geometry-specific failure modes that common perceptual metrics do not capture, and provides a diagnostic signal for progress toward physically grounded video generation and physical world models. Our code and dataset can be found at this https URL.
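To make the scale-depth alignment dimension concrete, here is a minimal sketch of one way such a residual could be computed. It is not the PDI-Bench definition, which the abstract does not spell out. It assumes a pinhole camera and an object of fixed physical size, so that apparent size times depth should stay constant across frames. The function name `scale_depth_residual` and its inputs are hypothetical.

```python
import numpy as np

def scale_depth_residual(apparent_sizes, depths):
    """Illustrative scale-depth residual (an assumption, not the paper's metric).

    Under a pinhole camera, a rigid object of fixed physical size has
    apparent size proportional to 1 / depth, so log(size) + log(depth)
    should be constant over time. The standard deviation of that sum
    is returned as the residual.
    """
    s = np.asarray(apparent_sizes, dtype=float)  # per-frame size in pixels
    z = np.asarray(depths, dtype=float)          # per-frame object depth
    log_product = np.log(s) + np.log(z)
    return float(np.std(log_product))

# An object that shrinks exactly as 1 / depth gives a residual near zero.
depths = np.array([2.0, 4.0, 8.0])
consistent = scale_depth_residual(200.0 / depths, depths)

# An object that keeps the same pixel size while receding violates perspective.
inconsistent = scale_depth_residual([100.0, 100.0, 100.0], depths)
```

Working in log space makes the residual independent of the object's absolute size and of the focal length, which only shift the constant.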


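1️⃣ One-Sentence Summary

This paper proposes PDI-Bench, a quantitative evaluation framework that segments objects in generated videos, tracks points on them, and maps them into 3D space in order to automatically detect geometric errors in scale-depth alignment, motion consistency, and object rigidity, giving a more objective measure of whether a video generation model has a world understanding consistent with physical laws.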
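The object rigidity check can be illustrated in a similar spirit. The sketch below assumes tracked points have already been lifted to 3D world coordinates as an array of shape `(T, N, 3)`, and measures how much the pairwise distances between points drift over time. The function name `rigidity_residual` and the choice of the temporal median as reference are assumptions made for illustration, not the metric defined by PDI-Bench.

```python
import numpy as np

def rigidity_residual(points):
    """Illustrative rigidity residual (an assumption, not the paper's metric).

    points: array of shape (T, N, 3), N tracked 3D points over T frames.
    For a rigid object every pairwise distance is constant in time, so we
    return the mean relative deviation of each pairwise distance from its
    temporal median.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise difference vectors per frame: shape (T, N, N, 3).
    diffs = pts[:, :, None, :] - pts[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)          # shape (T, N, N)
    # Keep each unordered pair once (upper triangle, no diagonal).
    i, j = np.triu_indices(pts.shape[1], k=1)
    d = dists[:, i, j]                              # shape (T, num_pairs)
    ref = np.median(d, axis=0)                      # reference distance per pair
    return float(np.mean(np.abs(d - ref) / ref))

base = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# Pure translation preserves all pairwise distances.
shifts = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 1.0], [1.0, 0.0, 2.0]])
rigid = rigidity_residual(base[None, :, :] + shifts[:, None, :])

# Uniform stretching over time changes every pairwise distance.
scales = np.array([1.0, 1.5, 2.0])
stretched = rigidity_residual(base[None, :, :] * scales[:, None, None])
```

Pairwise distances are invariant to rotation and translation, so this residual responds to deformation only and not to ordinary object or camera motion.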
