Meta-RL Induces Exploration in Language Agents
1️⃣ One-sentence summary
This paper proposes a Meta-RL framework called LaMer that enables large language model agents to explore their environment more actively and learn from feedback while carrying out tasks, achieving better performance and stronger adaptability than conventional reinforcement learning methods across a range of complex tasks.
Reinforcement learning (RL) has enabled the training of large language model (LLM) agents to interact with an environment and solve multi-turn, long-horizon tasks. However, RL-trained agents often struggle on tasks that require active exploration and fail to adapt efficiently from trial-and-error experience. In this paper, we present LaMer, a general Meta-RL framework that enables LLM agents to actively explore and learn from environment feedback at test time. LaMer consists of two key components: (i) a cross-episode training framework that encourages exploration and long-term reward optimization; and (ii) in-context policy adaptation via reflection, allowing the agent to adapt its policy from task feedback signals without gradient updates. Experiments across diverse environments show that LaMer significantly improves performance over RL baselines, with 11%, 14%, and 19% gains on Sokoban, MineSweeper, and Webshop, respectively. Moreover, LaMer demonstrates better generalization to more challenging or previously unseen tasks than RL-trained agents. Overall, our results show that Meta-RL provides a principled approach to inducing exploration in language agents, enabling more robust adaptation to novel environments through learned exploration strategies.
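As a rough illustration of how the two components described in the abstract could fit together, here is a minimal Python sketch of a cross-episode rollout with in-context adaptation via reflection. All names (`SimpleEnv`, `call_llm`, `reflect`) are hypothetical placeholders, not the paper's actual interfaces; the point is the control flow: several attempts at the same task, a reflection on each episode's feedback carried in the prompt for the next attempt, and a return summed across episodes.

```python
# Hypothetical sketch of cross-episode rollout with in-context reflection.
# None of these names come from the LaMer paper; they are stand-ins that
# show the control flow described in the abstract.
from dataclasses import dataclass
from typing import List


@dataclass
class EpisodeResult:
    actions: List[str]
    reward: float
    feedback: str


def call_llm(prompt: str) -> str:
    """Placeholder language-model call: 'adapts' by reading the reflections in the prompt."""
    return "move_left" if "hit a wall" in prompt else "move_right"


def reflect(result: EpisodeResult) -> str:
    """Placeholder reflection step: summarize the episode's outcome in words."""
    return f"Last attempt got reward {result.reward}; feedback: {result.feedback}"


class SimpleEnv:
    """Toy stand-in environment: rewards the agent once it tries 'move_left'."""

    def run_episode(self, policy_prompt: str) -> EpisodeResult:
        action = call_llm(policy_prompt)
        reward = 1.0 if action == "move_left" else 0.0
        feedback = "reached goal" if reward > 0 else "hit a wall"
        return EpisodeResult(actions=[action], reward=reward, feedback=feedback)


def cross_episode_rollout(env: SimpleEnv, task_prompt: str, num_episodes: int = 3) -> float:
    """Attempt the same task several times, adapting in context via reflections.

    The reward summed over all episodes is what a cross-episode trainer would
    optimize, so an early exploratory (low-reward) episode can still pay off
    if it makes later episodes succeed.
    """
    reflections: List[str] = []
    total_reward = 0.0
    for _ in range(num_episodes):
        prompt = task_prompt + "\n".join(reflections)
        result = env.run_episode(prompt)
        total_reward += result.reward
        # No gradient update: the policy "adapts" only through the reflection
        # text prepended to the next episode's prompt.
        reflections.append(reflect(result))
    return total_reward


if __name__ == "__main__":
    print(cross_episode_rollout(SimpleEnv(), "Solve the maze.\n"))
```

In this toy run, the first episode fails, the reflection records the failure, and later episodes succeed because the stub "policy" reads that reflection from its context, which is the kind of gradient-free, cross-episode adaptation the abstract describes.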
Source: arXiv 2512.16848