arXiv submission date: 2026-04-16
📄 Abstract - Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis

Does reinforcement learning genuinely expand what LLM agents can do, or merely make them more reliable? For static reasoning, recent work answers the latter: base and RL pass@k curves converge at large k. We ask whether this holds for agentic tool use, where T rounds of interaction enable compositional strategies that re-sampling cannot recover. We introduce PASS@(k,T), a two-dimensional metric that jointly varies sampling budget k and interaction depth T, separating capability expansion from efficiency improvement. Our main finding is that, contrary to the static-reasoning result, tool-use RL genuinely enlarges the capability boundary: the RL agent's pass curve rises above the base model's, and the gap widens at large k rather than converging. The expansion is specific to compositional, sequential information gathering; on simpler tasks RL behaves as prior work predicts. Under matched training data, supervised fine-tuning regresses the boundary on the same compositional tasks, isolating self-directed exploration as the causal factor. Mechanism analysis shows RL reweights the base strategy distribution toward the subset whose downstream reasoning more often yields a correct answer, with the improvement concentrated on how the agent integrates retrieved information. These results reconcile optimistic and pessimistic readings of RL for LLMs: both are correct, on different task types.

Top-level tags: llm agents, model evaluation
Detailed tags: reinforcement learning, capability analysis, tool use, agent evaluation, pass@(k,t)

Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis


1️⃣ One-Sentence Summary

By introducing a new evaluation metric, PASS@(k,T), this paper finds that on complex tool-use tasks requiring multi-round interaction and compositional strategies, reinforcement learning genuinely expands the capability boundary of LLM agents rather than merely improving their reliability; the key factor is that RL fosters the agent's self-directed exploration and its ability to integrate retrieved information.
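The paper does not spell out the estimator here, but a natural reading of PASS@(k,T) is the standard unbiased pass@k estimator applied to rollouts capped at T interaction rounds, then averaged over tasks. The sketch below illustrates that interpretation; `pass_at_k_T`, its `results` layout, and the depth-capping convention are assumptions for illustration, not the paper's definition.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that
    at least one of k samples drawn without replacement from n rollouts
    (of which c are correct) succeeds."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some sample must succeed
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_T(results: dict[int, list[tuple[int, int]]], k: int, T: int) -> float:
    """Hypothetical PASS@(k,T): mean pass@k over tasks, where each task's
    (n, c) counts come from rollouts allowed at most T interaction rounds.
    `results` maps interaction depth T -> per-task (n_rollouts, n_correct)."""
    per_task = results[T]
    return sum(pass_at_k(n, c, k) for n, c in per_task) / len(per_task)
```

Under this reading, sweeping k at fixed T traces one pass curve per interaction depth; boundary expansion shows up when the RL curve stays above the base curve as k grows, e.g. `pass_at_k_T({4: [(10, 3), (10, 0)]}, k=5, T=4)`.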

Source: arXiv 2604.14877