
arXiv submission date: 2025-12-18
📄 Abstract - Reinforcement Learning for Self-Improving Agent with Skill Library

Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions but struggle to continuously improve and adapt when deployed in new environments. One promising approach is implementing skill libraries that allow agents to learn, validate, and apply new skills. However, current skill library approaches rely primarily on LLM prompting, making consistent skill library implementation challenging. To overcome these challenges, we propose a Reinforcement Learning (RL)-based approach to enhance agents' self-improvement capabilities with a skill library. Specifically, we introduce Skill Augmented GRPO for self-Evolution (SAGE), a novel RL framework that systematically incorporates skills into learning. The framework's key component, Sequential Rollout, iteratively deploys agents across a chain of similar tasks for each rollout. As agents navigate through the task chain, skills generated from previous tasks accumulate in the library and become available for subsequent tasks. Additionally, the framework enhances skill generation and utilization through a Skill-integrated Reward that complements the original outcome-based rewards. Experimental results on AppWorld demonstrate that SAGE, when applied to a supervised fine-tuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.
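The Sequential Rollout idea described in the abstract (one rollout walks the agent through a chain of similar tasks, and skills produced on earlier tasks become available on later ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Skill`, `TaskResult`, `SkillLibrary`, `sequential_rollout`, and `toy_agent`, the agent signature, and the rule that only skills from successful tasks are kept (a stand-in for the skill validation the abstract mentions) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    name: str
    code: str         # hypothetical: a reusable routine the agent wrote
    source_task: int  # index of the task that produced the skill


@dataclass
class TaskResult:
    success: bool
    new_skills: list[Skill]
    used_skills: list[str]


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def names(self) -> list[str]:
        return sorted(self.skills)


# Hypothetical agent interface: (task id, names of available skills) -> result.
Agent = Callable[[int, list[str]], TaskResult]


def sequential_rollout(agent: Agent, task_chain: list[int]) -> tuple[list[TaskResult], SkillLibrary]:
    """Run one rollout over a chain of similar tasks, carrying skills forward."""
    library = SkillLibrary()  # assumption: each rollout starts with an empty library
    results: list[TaskResult] = []
    for task_id in task_chain:
        result = agent(task_id, library.names())
        # Assumption: only skills from successful tasks are kept (stand-in for validation).
        if result.success:
            for skill in result.new_skills:
                library.add(skill)
        results.append(result)
    return results, library


# Toy agent: writes one skill per task and reuses everything already in the library.
def toy_agent(task_id: int, available: list[str]) -> TaskResult:
    return TaskResult(
        success=True,
        new_skills=[Skill(f"skill_{task_id}", "pass", task_id)],
        used_skills=list(available),
    )


results, library = sequential_rollout(toy_agent, [0, 1, 2])
print(library.names())         # ['skill_0', 'skill_1', 'skill_2']
print(results[2].used_skills)  # ['skill_0', 'skill_1']
```

Starting each rollout with an empty library is itself an assumption; it simply makes the within-chain accumulation explicit, so the third task sees the skills from the first two.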

Top-level tags: reinforcement learning, agents, LLM
Detailed tags: skill library, self-improving agent, policy optimization, sequential deployment, reward shaping

SAGE: A Reinforcement-Learning-Based Self-Evolution Framework for Skill-Library Agents / Reinforcement Learning for Self-Improving Agent with Skill Library


1️⃣ One-Sentence Summary

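This paper proposes SAGE, a new reinforcement learning framework that uses sequential deployment (Sequential Rollout) and a skill-integrated reward to let LLM-based agents continually learn, accumulate, and reuse skills in new environments, enabling self-improvement and efficient adaptation.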


2️⃣ Key Contributions

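1. The SAGE reinforcement learning framework

2. The sequential deployment mechanism (Sequential Rollout)

3. The skill-integrated reward mechanism

4. A unified interaction framework for skill-library agents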
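The abstract states only that the Skill-integrated Reward complements the original outcome-based reward inside a GRPO-based framework; its exact form is not given here. The sketch below assumes a hypothetical bonus (`skill_bonus`) paid when a task succeeds while reusing a skill from an earlier task in the chain, then feeds the per-rollout rewards into the standard GRPO group-relative advantage, (r_i - mean) / (std + eps). The function names and the bonus value are illustrative assumptions, not the paper's definitions.

```python
import statistics


def skill_integrated_reward(success: bool, reused_earlier_skill: bool,
                            skill_bonus: float = 0.5) -> float:
    """Hypothetical shaping: binary outcome reward plus a bonus for successful skill reuse."""
    outcome = 1.0 if success else 0.0
    # Assumption: the bonus is paid only when the task also succeeds,
    # so reusing a skill is never rewarded on its own.
    bonus = skill_bonus if (success and reused_earlier_skill) else 0.0
    return outcome + bonus


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against its rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# A group of four rollouts for the same task chain: (success, reused an earlier skill).
group = [(True, True), (True, False), (False, False), (False, True)]
rewards = [skill_integrated_reward(s, u) for s, u in group]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.5, 1.0, 0.0, 0.0]
print([round(a, 3) for a in advantages])
```

Under these assumptions, group normalization ranks a rollout that succeeds by reusing a skill above one that succeeds without reuse, which illustrates the kind of pressure toward skill generation and use that the abstract attributes to the reward.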


3️⃣ Main Results and Value

Result Highlights

Practical Value


4️⃣ Glossary

Source: arXiv:2512.17102