Executable World Models for ARC-AGI-3 in the Era of Coding Agents

📄 Abstract - Executable World Models for ARC-AGI-3 in the Era of Coding Agents

We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results on the 25 public ARC-AGI-3 games. Each recorded playthrough uses a fresh agent instance with no access to previous playthrough-specific files or conversation state. Most games have a single recorded playthrough; for a few games, we report multiple independent fresh-agent playthroughs to expose run-to-run variability. The agent fully solved 7 games, achieved a Relative Human Action Efficiency (RHAE) greater than 75% on 6 games, and obtained a mean per-game RHAE of 32.58%. Because the system uses no game-specific code, it can serve as a game-general baseline for ARC-AGI-3. Performance on the private validation set remains to be tested. Overall, the results provide preliminary evidence that verifier-driven executable world models are a promising approach for ARC-AGI-3 agents.
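
The loop described in the abstract (keep an executable model, check it against logged observations, then plan through it) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy one-dimensional game, the names `toy_world_model`, `verify`, and `plan`, and the choice of breadth-first search are hypothetical and are not taken from the paper's system or from the ARC-AGI-3 interface.

```python
from collections import deque
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

State = Hashable
Action = str
# One logged observation: (state, action taken, state observed afterwards).
Transition = Tuple[State, Action, State]
Model = Callable[[State, Action], State]


def toy_world_model(state: int, action: str) -> int:
    """Hypothetical executable model: a marker moving on cells 0..9."""
    if action == "right":
        return min(state + 1, 9)
    if action == "left":
        return max(state - 1, 0)
    return state  # unknown actions are assumed to do nothing


def verify(model: Model, history: Sequence[Transition]) -> List[Transition]:
    """Return every logged transition that the model fails to reproduce."""
    return [(s, a, s2) for (s, a, s2) in history if model(s, a) != s2]


def plan(
    model: Model,
    start: State,
    is_goal: Callable[[State], bool],
    actions: Sequence[Action],
    max_depth: int = 20,
) -> Optional[List[Action]]:
    """Breadth-first search through the model; returns a shortest plan or None."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = model(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


# Made-up observation log for the toy game (not data from the paper).
history = [(0, "right", 1), (1, "right", 2), (2, "left", 1), (0, "left", 0)]

mismatches = verify(toy_world_model, history)
# Only plan once the model reproduces everything observed so far.
actions_to_goal = (
    plan(toy_world_model, 1, lambda s: s == 4, ["left", "right"])
    if not mismatches
    else None
)
print(mismatches, actions_to_goal)
```

In this sketch a non-empty `mismatches` list would be the signal to revise the model before acting. The simplicity-driven refactoring step mentioned in the abstract is not modeled here.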

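1️⃣ One-sentence summary

This paper presents a general agent system that needs no game-specific code: it builds an executable Python world model and combines it with verifiers and a planner to understand and solve ARC-AGI-3 visual reasoning tasks. In public testing it fully solved 7 games, and it shows potential as a general baseline.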
