Explore Before You Solve: The Speed--Depth Trade-off in Epistemic Agents for ARC-AGI-3

📄 Abstract - Explore Before You Solve: The Speed--Depth Trade-off in Epistemic Agents for ARC-AGI-3

We systematically investigate all 25 public ARC-AGI-3 games and find that every one is reachable through non-intelligent strategies: 10 in a single blind step, 5 after one probing action, 1 via repeated ACTION1 presses, 1 via diverse exploration, and 8 via single repeated actions with sufficient budget (50-200 steps). A library-level null-coordinate vulnerability additionally bypasses 18 games in 1 step. This benchmark critique implies the public evaluation set cannot discriminate intelligent exploration from trivial heuristics - the private 55-game evaluation is the only genuine intelligence test. Against this backdrop, we present AERA (Adaptive Epistemic Reasoning Agent), a three-phase (EXPLORE / VERIFY / PLAN) agent achieving RHAE=0.2116 (4/25 solved) on these 25 games with Qwen2.5-0.5B, while random and no-explore baselines score 0.0000. We formalise AERA through a Speed--Depth trade-off framework: under a convexity assumption (proved for a class of environments in the Appendix), RHAE's quadratic form emerges as a second-order penalty for deviating from the Pareto frontier between action efficiency and information gain. Contributions: (i) a benchmark validity analysis showing that current interactive reasoning benchmarks fail to measure the exploration they claim to require, and (ii) the EXPLORE-before-PLAN framework and model-capability x exploration interaction. The linked code track entry achieves RHAE=0.30 on the full 55-game private evaluation. Code: CC0.

探索再求解：面向ARC-AGI-3认知智能体的速度与深度权衡 / Explore Before You Solve: The Speed--Depth Trade-off in Epistemic Agents for ARC-AGI-3

1️⃣ 一句话总结

本文揭示ARC-AGI-3公开测试集存在严重漏洞：大部分题目无需智能推理，仅凭简单试探步骤即可通过；为解决此问题，作者提出一个分三阶段（探索/验证/规划）的认知智能体AERA，并通过速度与探索深度的权衡理论，证明高效智能体必须优先进行信息探索，才能在真正的智能测试中取得好成绩。

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要