arXiv submission date: 2026-04-16
📄 Abstract - Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis

Does reinforcement learning genuinely expand what LLM agents can do, or merely make them more reliable? For static reasoning, recent work answers the latter: base and RL pass@k curves converge at large k. We ask whether this holds for agentic tool use, where T rounds of interaction enable compositional strategies that re-sampling cannot recover. We introduce PASS@(k,T), a two-dimensional metric that jointly varies sampling budget k and interaction depth T, separating capability expansion from efficiency improvement. Our main finding is that, contrary to the static-reasoning result, tool-use RL genuinely enlarges the capability boundary: the RL agent's pass curve rises above the base model's, and the gap widens at large k rather than converging. The expansion is specific to compositional, sequential information gathering; on simpler tasks RL behaves as prior work predicts. Under matched training data, supervised fine-tuning regresses the boundary on the same compositional tasks, isolating self-directed exploration as the causal factor. Mechanism analysis shows RL reweights the base strategy distribution toward the subset whose downstream reasoning more often yields a correct answer, with the improvement concentrated on how the agent integrates retrieved information. These results reconcile optimistic and pessimistic readings of RL for LLMs: both are correct, on different task types.

Top-level tags: llm agents, model evaluation
Detailed tags: reinforcement learning, capability analysis, tool use, agent evaluation, pass@(k,t)

Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis


1️⃣ One-Sentence Summary

By introducing a new evaluation metric, PASS@(k,T), this paper finds that on complex tool-use tasks requiring multi-round interaction and compositional strategies, reinforcement learning genuinely expands the capability boundary of LLM agents rather than merely improving their reliability; the key factor is that RL fosters the agent's self-directed exploration and its ability to integrate retrieved information.
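The paper does not spell out the estimator here, but a natural reading of PASS@(k,T) is the standard unbiased pass@k estimator applied to rollouts capped at T interaction rounds, then averaged over tasks. The sketch below illustrates that interpretation; `pass_at_k_T`, its `results` layout, and the depth-capping convention are assumptions for illustration, not the paper's definition.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that
    at least one of k samples drawn without replacement from n rollouts
    (of which c are correct) succeeds."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some sample must succeed
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_T(results: dict[int, list[tuple[int, int]]], k: int, T: int) -> float:
    """Hypothetical PASS@(k,T): mean pass@k over tasks, where each task's
    (n, c) counts come from rollouts allowed at most T interaction rounds.
    `results` maps interaction depth T -> per-task (n_rollouts, n_correct)."""
    per_task = results[T]
    return sum(pass_at_k(n, c, k) for n, c in per_task) / len(per_task)
```

Under this reading, sweeping k at fixed T traces one pass curve per interaction depth; boundary expansion shows up when the RL curve stays above the base curve as k grows, e.g. `pass_at_k_T({4: [(10, 3), (10, 0)]}, k=5, T=4)`.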

Source: arXiv 2604.14877