arXiv submission date: 2026-01-15
📄 Abstract - MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to distinguish effective tool calls from redundant or erroneous ones, particularly in long-horizon multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation. Specifically, we formulate credit assignment as a bipartite matching problem between predicted and ground-truth traces, utilizing two assignment strategies to derive dense turn-level rewards. Furthermore, to balance local step precision with global task success, we introduce a dual-level advantage estimation scheme that integrates turn-level and trajectory-level signals, assigning distinct advantage values to individual interaction turns. Extensive experiments on three benchmarks demonstrate the superiority of MatchTIR. Notably, our 4B model surpasses the majority of 8B competitors, particularly in long-horizon and multi-turn tasks. Our code is available at this https URL.
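
The abstract describes two ideas that a small example may make more concrete: deriving turn-level rewards by matching predicted tool calls against ground-truth calls, and combining turn-level with trajectory-level signals into per-turn advantages. The following Python sketch illustrates both under stated assumptions and is not the authors' implementation. The similarity function `call_similarity`, the brute-force one-to-one matching, the z-score normalization, and the mixing parameter `weight` are all hypothetical choices made for illustration; the abstract does not specify the paper's two assignment strategies or its exact advantage formula.

```python
from itertools import permutations
from statistics import mean, pstdev


def call_similarity(pred, gold):
    """Hypothetical score in [0, 1]: 0 if the tool names differ,
    otherwise 0.5 plus 0.5 times the fraction of gold arguments reproduced."""
    if pred["name"] != gold["name"]:
        return 0.0
    gold_args = gold["args"]
    if not gold_args:
        return 1.0
    hits = sum(1 for k, v in gold_args.items() if pred["args"].get(k) == v)
    return 0.5 + 0.5 * hits / len(gold_args)


def turn_rewards(pred_calls, gold_calls):
    """Assumed strategy: one-to-one maximum-weight bipartite matching.
    Brute force over permutations, which is only practical for a few calls."""
    n_pred, n_gold = len(pred_calls), len(gold_calls)
    sim = [[call_similarity(p, g) for g in gold_calls] for p in pred_calls]
    size = max(n_pred, n_gold)  # indices >= n_gold act as "unmatched" slots
    best_total, best_perm = -1.0, ()
    for perm in permutations(range(size)):
        total = sum(sim[i][perm[i]] for i in range(n_pred) if perm[i] < n_gold)
        if total > best_total:
            best_total, best_perm = total, perm
    # A predicted call left unmatched (e.g. a redundant one) gets reward 0.
    return [sim[i][best_perm[i]] if best_perm[i] < n_gold else 0.0
            for i in range(n_pred)]


def zscores(values):
    """Standardize values; return zeros when there is no spread."""
    if not values:
        return []
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def dual_level_advantages(group_turn_rewards, group_outcomes, weight=0.5):
    """Assumed combination: trajectory-level z-score of the outcome reward
    plus `weight` times the z-score of each turn reward across the group."""
    traj_adv = zscores(group_outcomes)
    flat = zscores([r for turns in group_turn_rewards for r in turns])
    result, pos = [], 0
    for k, turns in enumerate(group_turn_rewards):
        result.append([traj_adv[k] + weight * flat[pos + j]
                       for j in range(len(turns))])
        pos += len(turns)
    return result


# Toy data: the third predicted call repeats the first and is redundant.
gold = [{"name": "search", "args": {"q": "a"}},
        {"name": "calc", "args": {"expr": "1+1"}}]
pred = [{"name": "search", "args": {"q": "a"}},
        {"name": "calc", "args": {"expr": "1+1"}},
        {"name": "search", "args": {"q": "a"}}]

rewards = turn_rewards(pred, gold)  # [1.0, 1.0, 0.0]
advantages = dual_level_advantages([rewards, [0.0]], [1.0, 0.0])
```

In this toy rollout the redundant third call receives a turn reward of 0 and therefore a lower advantage than the two useful calls in the same trajectory, which is the kind of distinction that a uniform trajectory-level advantage cannot express.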

Top-level tags: llm, agents, model training
Detailed tags: tool-integrated reasoning, credit assignment, bipartite matching, reinforcement learning, multi-turn tasks

MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching


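1️⃣ One-Sentence Summary

This paper proposes MatchTIR, a framework that uses bipartite matching to give large language models finer-grained, turn-level supervision over their use of external tools. The approach improves performance on complex multi-step tasks and lets smaller models match or surpass larger ones: the 4B model outperforms most 8B competitors.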

Source: arXiv:2601.10712