arXiv submission date: 2026-03-05
📄 Abstract - Reward-Conditioned Reinforcement Learning

RL agents are typically trained under a single, fixed reward function, which makes them brittle to reward misspecification and limits their ability to adapt to changing task preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains a single agent to optimize a family of reward specifications while collecting experience under only one nominal objective. RCRL conditions the agent on reward parameterizations and learns multiple reward objectives from shared replay data entirely off-policy, enabling a single policy to represent reward-specific behaviors. Across single-task, multi-task, and vision-based benchmarks, we show that RCRL not only improves performance under the nominal reward parameterization, but also enables efficient adaptation to new parameterizations. Our results demonstrate that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.
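The conditioning-and-relabeling mechanism described in the abstract can be sketched with tabular Q-learning: the Q-function takes the reward parameterization `w` as an extra input, behavior is collected under the nominal objective (`w = 1.0`) only, and every replayed transition is relabeled under each `w` in the family, entirely off-policy. Everything below — the chain environment, the scalar reward family, the hyperparameters — is an illustrative assumption, not the paper's implementation.

```python
import random
from collections import defaultdict

# Toy sketch of reward-conditioned off-policy learning on a chain MDP.
# The Q-table is indexed by (state, w, action), so one "policy" represents
# a behavior per reward parameterization w. (Illustrative, not RCRL's code.)

N_STATES = 5                      # chain 0..4; state 4 is terminal
ACTIONS = (0, 1)                  # 0 = stay, 1 = move right
REWARD_PARAMS = (0.5, 1.0, 2.0)   # family of reward scales; nominal w = 1.0
GAMMA, ALPHA = 0.9, 0.5

def reward(state, action, w):
    """Parameterized reward: a w-scaled bonus for stepping into the goal."""
    return w if (state == N_STATES - 2 and action == 1) else 0.0

def train(episodes=300, batch=64, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)        # q[(state, w, action)]
    replay = []                   # shared replay buffer of (s, a, s') tuples
    for _ in range(episodes):
        s = 0
        while s < N_STATES - 1:
            # Behavior policy: epsilon-greedy under the NOMINAL objective only.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q[(s, 1.0, b)])
            s2 = min(s + a, N_STATES - 1)
            replay.append((s, a, s2))
            s = s2
        # Off-policy updates: relabel each sampled transition under every w.
        for (s0, a0, s1) in rng.sample(replay, min(len(replay), batch)):
            for w in REWARD_PARAMS:
                target = reward(s0, a0, w)
                if s1 < N_STATES - 1:             # bootstrap unless terminal
                    target += GAMMA * max(q[(s1, w, b)] for b in ACTIONS)
                q[(s0, w, a0)] += ALPHA * (target - q[(s0, w, a0)])
    return q
```

Even though all experience comes from behaving under `w = 1.0`, the shared replay buffer lets the agent learn value estimates for every parameterization in the family, which is the off-policy reuse the abstract describes.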

Top-level tags: reinforcement learning · model training · agents
Detailed tags: reward conditioning · off-policy learning · multi-task learning · policy adaptation · robust policies

Reward-Conditioned Reinforcement Learning


1️⃣ One-Sentence Summary

This paper proposes a new method called Reward-Conditioned Reinforcement Learning, which trains a single agent to handle a family of different task objectives rather than being limited to the single reward specification fixed at training time, thereby improving the agent's adaptability and robustness.

Source: arXiv:2603.05066