arXiv submission date: 2026-05-11
📄 Abstract - Is Your Driving World Model an All-Around Player?

Today's driving world models can generate remarkably realistic dash-cam videos, yet no single model excels universally. Some generate photorealistic textures but violate basic physics; others maintain geometric consistency but fail when subjected to closed-loop planning. This disconnect exposes a critical gap: the field evaluates how real generated worlds appear, but rarely whether they behave realistically. We introduce WorldLens, a unified benchmark that measures world-model fidelity across the full spectrum, from pixel quality and 4D geometry to closed-loop driving and human perceptual alignment, through five complementary aspects and 24 standardized dimensions. Our evaluation of six representative models reveals that no existing approach dominates across all axes: texture-rich models violate geometry, geometry-aware models lack behavioral fidelity, and even the strongest performers achieve only 2-3 out of 10 on human realism ratings. To bridge algorithmic metrics with human perception, we further contribute WorldLens-26K, a 26,808-entry human-annotated preference dataset pairing numerical scores with textual rationales, and WorldLens-Agent, a vision-language evaluator distilled from these judgments that enables scalable, explainable auto-assessment. Together, the benchmark, dataset, and agent form a unified ecosystem for assessing generated worlds not merely by visual appeal, but by physical and behavioral fidelity.

Top-level tags: computer vision benchmark
Detailed tags: world model driving simulation evaluation human perception 4d geometry

Is Your Driving World Model an All-Around Player?


1️⃣ One-Sentence Summary

This paper points out that although today's driving world models can generate realistic dash-cam videos, each one is lopsided: some produce high-quality imagery yet violate physical laws, while others are geometrically accurate but unusable for closed-loop planning. The authors therefore propose WorldLens, a unified benchmark that evaluates models comprehensively across dimensions ranging from pixel quality and 4D geometry to closed-loop driving performance and human perceptual judgment. It is accompanied by a dataset of roughly 26K human preference annotations and an explainable automated evaluation AI, helping researchers assess not merely whether a video looks good, but whether the world the model simulates is genuinely believable.

Source: arXiv 2605.10858