arXiv submission date: 2026-02-08
📄 Abstract - MIND: Benchmarking Memory Consistency and Action Control in World Models

World models aim to understand, remember, and predict dynamic visual environments, yet a unified benchmark for evaluating their fundamental abilities remains lacking. To address this gap, we introduce MIND, the first open-domain closed-loop revisited benchmark for evaluating Memory consIstency and action coNtrol in worlD models. MIND contains 250 high-quality videos at 1080p and 24 FPS, including 100 (first-person) + 100 (third-person) video clips under a shared action space and 25 + 25 clips across varied action spaces covering eight diverse scenes. We design an efficient evaluation framework to measure two core abilities: memory consistency and action control, capturing temporal stability and contextual coherence across viewpoints. Furthermore, we design various action spaces, including different character movement speeds and camera rotation angles, to evaluate the action generalization capability across different action spaces under shared scenes. To facilitate future performance benchmarking on MIND, we introduce MIND-World, a novel interactive Video-to-World baseline. Extensive experiments demonstrate the completeness of MIND and reveal key challenges in current world models, including the difficulty of maintaining long-term memory consistency and generalizing across action spaces. Code: this https URL.
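The abstract describes a closed-loop "revisited" evaluation: when the agent returns to a previously seen viewpoint, the newly generated frame should match what was observed on the first visit. The paper does not specify its metric here, so the following is only a hypothetical sketch of how such a memory-consistency score could be computed, using mean cosine similarity between flattened frame features on first visit versus revisit (all function names and inputs are illustrative assumptions, not the paper's actual framework):

```python
# Hypothetical sketch (NOT the paper's actual metric): score memory
# consistency by comparing frames generated on a revisit against the
# frames recorded when the same poses were first observed.

def cosine_similarity(a, b):
    """Cosine similarity between two flattened frame feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def memory_consistency(first_visit_frames, revisit_frames):
    """Mean pairwise similarity between first-visit frames and the
    frames the model generates when the same viewpoints are revisited.
    Higher is better; identical frames score close to 1.0."""
    assert len(first_visit_frames) == len(revisit_frames)
    scores = [cosine_similarity(f, r)
              for f, r in zip(first_visit_frames, revisit_frames)]
    return sum(scores) / len(scores)

# Toy example: if the model reproduces the scene perfectly on revisit,
# the score is close to 1.0 (up to floating-point rounding).
frames = [[0.2, 0.5, 0.9], [0.1, 0.4, 0.8]]
print(memory_consistency(frames, frames))
```

A real evaluation would aggregate such per-frame scores over the benchmark's 250 clips and could use perceptual metrics (e.g. SSIM or learned features) rather than raw cosine similarity.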

Top-level tags: benchmark, computer vision, agents
Detailed tags: world models, memory consistency, action control, video generation, evaluation framework

MIND: Benchmarking Memory Consistency and Action Control in World Models


1️⃣ One-sentence summary

This paper introduces MIND, a new benchmark for evaluating whether world models, while understanding and predicting dynamic visual environments, can maintain long-term memory consistency and respond reliably to different action commands; its experiments expose key weaknesses of current models and aim to drive progress in the field.

Source: arXiv 2602.08025