Learning the ARTS of Search for Automated Discovery
1️⃣ One-sentence summary
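This paper proposes ARTS, a new method that uses a reasoning language model to improve the search process in scientific discovery. By separating hypothesis quality from experimental execution quality, and by using test-time training to fold knowledge of the search tree into the model weights, it outperforms existing algorithms on multiple tasks at lower cost and even rediscovers a top solution that heuristic methods missed.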
Scientific discovery can be formulated as an iterative search over the space of hypotheses and experiments. Contemporary methods navigate this space using heuristics such as MCTS. These algorithms conflate the merit of a hypothesis with the quality of its experimental execution, so a promising hypothesis with a preliminary execution is ranked below a modest hypothesis whose execution has been refined. Moreover, prior methods prune the search logs as the search progresses, because the accumulated history outgrows the context window. We propose Agentic Reasoning for Tree Search (ARTS), in which a reasoning language model navigates this space. The model inspects prior execution logs, diagnoses whether earlier failures arose from faulty implementations or from bad hypotheses, and selects the hypothesis to build on next. To mitigate context-length limits, ARTS uses test-time training to instill knowledge of the search tree in the model weights. Across 22 tasks from MLGym and MLEBench, we show that ARTS outperforms leading algorithms, with a relative improvement of over 15.3% in the normalized score. With test-time training, we show that a Qwen3-4B agent can match the performance of closed-source frontier models such as Gemini-3 Pro and GPT o3-reasoning at up to 5x lower inference cost. We further observe that on partially observable RL tasks, the test-time trained Qwen3-4B scientist surpasses ARTS with the o3 scientist by rediscovering the human-best recurrent-memory solution that heuristic methods prune away.
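The conflation described in the abstract can be illustrated with a toy sketch. This is not the ARTS implementation: the `Node` fields, the `diagnosis` labels, and the rule in `select_with_diagnosis` are all hypothetical stand-ins for the judgment that, per the abstract, a reasoning language model makes after reading execution logs. The scores are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One search-tree node (hypothetical structure, for illustration only)."""
    hypothesis: str
    score: float      # observed score of the current implementation
    refinements: int  # how many times the implementation has been refined
    diagnosis: str    # assumed label: "implementation", "hypothesis", or "none"


def select_by_score(nodes):
    """Score-only heuristic: the observed score stands in for hypothesis merit."""
    return max(nodes, key=lambda n: n.score)


def select_with_diagnosis(nodes):
    """Toy rule that separates hypothesis merit from execution quality.

    Nodes whose failure is attributed to the hypothesis itself are set aside;
    among the rest, nodes held back by a faulty implementation are preferred,
    with the observed score as a tie-breaker.
    """
    viable = [n for n in nodes if n.diagnosis != "hypothesis"]
    pool = viable or nodes  # fall back to all nodes if none are viable
    return max(pool, key=lambda n: (n.diagnosis == "implementation", n.score))


# Invented example values, not results from the paper.
nodes = [
    Node("recurrent-memory policy", score=0.41, refinements=0, diagnosis="implementation"),
    Node("tuned feed-forward baseline", score=0.62, refinements=5, diagnosis="none"),
    Node("random-feature policy", score=0.10, refinements=3, diagnosis="hypothesis"),
]

by_score = select_by_score(nodes)
by_diagnosis = select_with_diagnosis(nodes)
print(by_score.hypothesis)
print(by_diagnosis.hypothesis)
```

In this sketch the score-only rule keeps refining the already-polished baseline, while the diagnosis-aware rule returns to the under-executed idea. In ARTS the diagnosis comes from a model reasoning over logs, not from a fixed label as assumed here.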
Source: arXiv: 2606.21891