arXiv submission date: 2026-03-30
📄 Abstract - Dynamic Dual-Granularity Skill Bank for Agentic RL

Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507 show that D2Skill consistently improves success rates over skill-free baselines by 10-20 points. Further ablations and analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.
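
The update loop the abstract describes (paired baseline and skill-injected rollouts, a utility signal from their performance gap, then utility-aware retrieval and pruning) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Skill` and `SkillBank` classes, the smoothing rate `lr`, the moving-average update rule, and the pruning thresholds are all hypothetical choices made for illustration, and retrieval here ranks by utility alone, whereas a real system would presumably also score relevance to the current task.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    text: str
    granularity: str      # "task" (high-level guidance) or "step" (fine-grained support)
    utility: float = 0.0  # running estimate of how much the skill helps
    uses: int = 0         # number of paired rollouts that evaluated this skill


@dataclass
class SkillBank:
    skills: list = field(default_factory=list)
    lr: float = 0.5  # hypothetical smoothing rate, not taken from the paper

    def add(self, text, granularity):
        # Expansion step: store a newly reflected skill.
        skill = Skill(text, granularity)
        self.skills.append(skill)
        return skill

    def retrieve(self, granularity, k=1):
        # Utility-aware retrieval: top-k skills of the requested granularity.
        pool = [s for s in self.skills if s.granularity == granularity]
        return sorted(pool, key=lambda s: s.utility, reverse=True)[:k]

    def update(self, skill, baseline_return, injected_return):
        # Hindsight utility signal: skill-injected return minus baseline return,
        # both obtained from rollouts under the same policy.
        gap = injected_return - baseline_return
        skill.utility += self.lr * (gap - skill.utility)  # assumed moving average
        skill.uses += 1
        return gap

    def prune(self, min_utility=0.0, min_uses=1):
        # Keep untried skills; drop tried skills whose utility is not above threshold.
        self.skills = [
            s for s in self.skills
            if s.uses < min_uses or s.utility > min_utility
        ]
```

In this sketch, a skill whose injected rollout beats the baseline gains utility and survives `prune()`, while a skill that hurts performance is removed once it has been evaluated at least `min_uses` times.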

Top-level tags: reinforcement learning agents model training
Detailed tags: skill learning experience reuse hindsight utility dual-granularity dynamic memory

Dynamic Dual-Granularity Skill Bank for Agentic RL


1️⃣ One-sentence summary

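This paper proposes D2Skill, a dynamic dual-granularity skill bank that organizes training experience into task-level and step-level skills and uses performance gaps to automatically update and refine the skill bank, significantly improving agent success rates on complex tasks.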

Source: arXiv: 2603.28716