菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-06-08
📄 Abstract - Structural Grid Descriptors Predict Within-Task Solver Success on ARC-AGI

We ask whether structural properties of intermediate grid states predict whether a symbolic ARC-AGI solver will succeed, framed as a test of conditional mutual information I(X;Y|task) > 0. Across 44,800 runs spanning two architecturally distinct solvers (beam search and Stochastic DFS), 400 ARC tasks, 28 configurations per solver, and both training and evaluation splits, hand-crafted grid descriptors measured at 50% trajectory completion discriminate successful from failed runs within the same task (mean within-task best-feature AUC = 0.885, p < 0.001 under within-task label permutation). Most predictive content lies along a single grid-complexity axis. The result generalizes across solver architectures: a feature selected on one solver predicts success on the other with AUC 0.747-0.762 in all four transfer directions (p < 0.001, leakage controlled). On a pre-registered held-out set of 41 reliable tasks, the frozen feature n_components_final achieves AUC = 0.765 (95% CI [0.717, 0.810], p < 0.001), robust under task-clustered bootstrap resampling and cross-solver task collapsing. The signal is not explained by solver capacity (configuration-residualized AUC = 0.927 and 0.896 for beam search and SDFS, p < 0.001) and is only weakly coupled to score trajectories (R^2 approximately 0). Early stopping at 50% completion reduces beam-search compute by 33.6% while retaining 98.9% of solves; degenerate-trajectory detection reduces SDFS compute by 65.3% with no solve loss. Finally, on 229 of 400 evaluation tasks the DSL primitive library produces no valid transition from the input grid. This 0-step collapse is invariant to search budget and universally failed by beam search, indicating a DSL coverage limitation rather than a search-budget effect.

顶级标签: machine learning model evaluation
详细标签: arc-agi grid descriptors solver success prediction conditional mutual information early stopping 或 搜索:

结构网格描述符预测ARC-AGI任务内求解器的成功与否 / Structural Grid Descriptors Predict Within-Task Solver Success on ARC-AGI


1️⃣ 一句话总结

本文通过分析中间网格状态的结构特性(如复杂度等),发现这些特征在任务内部能够高精度区分求解器是否成功,并且这一规律在不同算法和任务上普遍适用,进而可用于提前终止无效搜索以大幅节省计算资源。

源自 arXiv: 2606.09026