Behavior Knowledge Merge in Reinforced Agentic Models
1️⃣ One-Sentence Summary
For agentic models trained with reinforcement learning, this paper proposes RAM, a distribution-aware merging framework. By distinguishing shared from task-specific parameter updates and handling them separately, RAM resolves the performance degradation that conventional model-merging methods suffer when integrating multiple task experts, yielding a generalist agent that outperforms the individual specialists.
Reinforcement learning (RL) is central to post-training, particularly for agentic models that require specialized reasoning behaviors. In this setting, model merging offers a practical mechanism for integrating multiple RL-trained agents from different tasks into a single generalist model. However, existing merging methods are designed for supervised fine-tuning (SFT), and they are suboptimal at preserving task-specific capabilities in RL-trained agentic models. The root cause is a task-vector mismatch between RL and SFT: on-policy RL induces task vectors that are highly sparse and heterogeneous, whereas SFT-style merging implicitly assumes dense and globally comparable task vectors. When standard global averaging is applied under this mismatch, the non-overlapping components of RL task vectors, which encode critical task-specific behaviors, are shrunk and parameter updates are diluted. To address this issue, we propose Reinforced Agent Merging (RAM), a distribution-aware merging framework explicitly designed for RL-trained agentic models. RAM disentangles shared and task-specific unique parameter updates, averaging shared components while selectively preserving and rescaling unique ones to counteract parameter update dilution. Experiments across multiple agent domains and model architectures demonstrate that RAM not only surpasses merging baselines, but also unlocks synergistic potential among agents to achieve performance superior to that of specialized agents in their domains.
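The core mechanism described in the abstract can be sketched in code: treat each agent's task vector as its parameter delta from the base model, split each entry into "shared" (updated by several tasks) versus "unique" (updated by only one task), average the shared part, and preserve/rescale the unique part so it is not diluted by global averaging. The sketch below is a minimal illustration under assumed conventions (a sparsity threshold `eps`, a `unique_scale` factor, and a per-entry overlap test), not the paper's reference implementation of RAM.

```python
import numpy as np

def merge_task_vectors(task_vectors, eps=0.0, unique_scale=1.0):
    """Merge per-task parameter deltas (fine-tuned weights minus base weights).

    task_vectors: list of dicts {param_name: np.ndarray}, one dict per agent.
    eps:          magnitude below which an update is treated as zero (RL deltas are sparse).
    unique_scale: factor applied to entries updated by only one task, so they are
                  preserved rather than diluted by 1/n as under plain averaging.
    """
    n = len(task_vectors)
    merged = {}
    for name in task_vectors[0]:
        deltas = np.stack([tv[name] for tv in task_vectors])  # shape (n, *param_shape)
        active = np.abs(deltas) > eps                         # which tasks touch each entry
        support = active.sum(axis=0)                          # number of tasks per entry
        summed = np.where(active, deltas, 0.0).sum(axis=0)    # sum over active tasks only
        out = np.zeros_like(summed)
        shared = support > 1                                   # overlapping (shared) entries
        unique = support == 1                                  # task-specific entries
        out[shared] = summed[shared] / support[shared]         # average shared components
        out[unique] = unique_scale * summed[unique]            # keep and rescale unique ones
        merged[name] = out
    return merged
```

The merged deltas would then be added back onto the base model's weights (e.g. `generalist[k] = base[k] + merged[k]`) to obtain the single generalist agent.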
Source: arXiv: 2601.13572