arXiv submission date: 2026-05-13
📄 Abstract - Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning

Embodied agents in household environments must plan under partial observation: they need to remember objects, track state changes, and recover when actions fail. Existing benchmarks only partially test this ability. Egocentric video datasets capture realistic human activities but remain passive, while interactive simulators support execution but rely on synthetic scenes and hand-crafted dynamics, introducing a sim-to-real gap and often assuming fully observable state. We introduce Ego2World, a benchmark that compiles egocentric cooking videos into executable symbolic worlds governed by graph-transition rules. Built on HD-EPIC, Ego2World derives reusable transition rules from video annotations and executes them in a hidden symbolic world graph. During evaluation, the simulator maintains the hidden world graph, while the agent plans over its own partial belief graph using only local observations and execution feedback. This separation forces agents to update memory and replan without observing the true world state. Experiments show that action-overlap scores overestimate physical-state success, and that persistent belief memory improves task completion while reducing repeated visual exploration -- suggesting that belief maintenance should be a first-class target of embodied-agent evaluation.
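The simulator/agent separation described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: all class and attribute names (`Simulator`, `Agent`, `world`, `belief`, the precondition/effect rule format) are hypothetical, standing in for the hidden world graph, the graph-transition rules, and the agent's partial belief graph.

```python
# Hypothetical sketch of the hidden-world / belief-graph separation:
# the simulator applies a transition rule to its hidden symbolic world,
# and the agent only sees success/failure plus a local observation.
from dataclasses import dataclass, field


@dataclass
class Simulator:
    """Owns the hidden world graph; the agent never reads it directly."""
    world: dict  # e.g. {"pan": {"location": "stove", "state": "dirty"}}

    def execute(self, obj, precondition, effect):
        """Apply a transition rule if its precondition holds in the hidden world."""
        attrs = self.world.get(obj, {})
        if all(attrs.get(k) == v for k, v in precondition.items()):
            attrs.update(effect)
            # Feedback on success: a local observation of the touched object only.
            return True, {obj: dict(attrs)}
        return False, {}  # failure reveals only that the precondition did not hold


@dataclass
class Agent:
    """Maintains a partial belief graph updated from feedback, not ground truth."""
    belief: dict = field(default_factory=dict)

    def update(self, success, observation):
        if success:
            for obj, attrs in observation.items():
                self.belief.setdefault(obj, {}).update(attrs)


sim = Simulator(world={"pan": {"location": "stove", "state": "dirty"}})
agent = Agent()
# Washing succeeds because the hidden precondition (pan is dirty) holds;
# the agent's belief about the pan now reflects the observed local state.
ok, obs = sim.execute("pan", {"state": "dirty"}, {"state": "clean"})
agent.update(ok, obs)
# Repeating the action fails (pan is no longer dirty), so the agent must
# replan from its belief rather than from the true world state.
```

The key design point this mirrors is that execution failure carries information: the agent learns a precondition was wrong without ever observing the hidden graph, which is what forces belief maintenance.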

Top-level tags: agents benchmark embodied ai
Detailed tags: belief-state planning egocentric video symbolic worlds task completion partial observability

Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning


1️⃣ One-sentence summary

This paper proposes Ego2World, a new benchmark that compiles real egocentric cooking videos into executable symbolic worlds, testing an agent's ability to plan actions in a partially observable environment by maintaining beliefs (memory); experiments show that strategies based on persistent belief memory are more effective than evaluation by action-overlap scores, which overestimate true physical-state success.

Source: arXiv:2605.13335