Dynamic Dual-Granularity Skill Bank for Agentic RL
1️⃣ One-Sentence Summary
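This paper proposes D2Skill, a dynamic dual-granularity skill bank that organizes training experience into task-level and step-level skills and uses performance differences to automatically update and refine the bank, significantly improving agent success rates on complex tasks.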
Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507 show that D2Skill consistently improves success rates over skill-free baselines by 10-20 points. Further ablations and analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.
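To make the maintenance loop in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a skill's utility is a running average of the return gap between a skill-injected rollout and a baseline rollout under the same policy, that retrieval ranks purely by utility, and that pruning keeps the top `capacity` skills. The names `Skill`, `SkillBank`, `lr`, and `capacity` are hypothetical. The actual D2Skill retrieval and update rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    text: str
    granularity: str      # "task" (high-level guidance) or "step" (fine-grained support)
    utility: float = 0.0  # running estimate of how much the skill helps (assumed form)
    uses: int = 0


@dataclass
class SkillBank:
    capacity: int = 4     # hypothetical size limit enforced by pruning
    lr: float = 0.5       # hypothetical smoothing rate for utility updates
    skills: list = field(default_factory=list)

    def add(self, text, granularity):
        """Store a new skill, e.g. one produced by reflection on a rollout."""
        skill = Skill(text, granularity)
        self.skills.append(skill)
        return skill

    def retrieve(self, granularity, k=1):
        """Return the k highest-utility skills of one granularity."""
        pool = [s for s in self.skills if s.granularity == granularity]
        return sorted(pool, key=lambda s: s.utility, reverse=True)[:k]

    def update(self, skill, baseline_return, injected_return):
        """Move the skill's utility toward the paired-rollout performance gap."""
        gap = injected_return - baseline_return
        skill.utility += self.lr * (gap - skill.utility)
        skill.uses += 1
        return gap

    def prune(self):
        """Drop the lowest-utility skills once the bank exceeds capacity."""
        if len(self.skills) > self.capacity:
            self.skills.sort(key=lambda s: s.utility, reverse=True)
            del self.skills[self.capacity:]
```

In this sketch, a skill whose injected rollout beats the baseline gains utility and survives pruning, while one that hurts performance drifts negative and is eventually dropped. The same gap could also feed the policy update, which is omitted here.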
Source: arXiv: 2603.28716