Representation Learning Enables Scalable Multitask Deep Reinforcement Learning
1️⃣ One-Sentence Summary
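This paper argues that the key to efficient multitask reinforcement learning is not complex planning or model-based control, but learning better state representations through auxiliary predictive tasks. On this basis it evaluates MR.Q, a simple model-free algorithm, which outperforms an existing world-model method across a suite of multitask continuous control tasks while combining strong performance with high computational efficiency.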
Scaling reinforcement learning (RL) to diverse multitask settings remains a central challenge. While recent advances in model-based RL achieve strong performance, they rely on planning and complex training pipelines, making it unclear which components are essential for scalability. We revisit this question and argue that the primary driver of scalable multitask RL is not model-based control but representation learning. In particular, we show that combining predictive, model-based representations with high-capacity value function approximation is sufficient to achieve strong performance, even without planning. We evaluate a simple model-free algorithm, MR.Q, which integrates auxiliary predictive objectives into a scalable actor-critic architecture. This approach outperforms a recent world-model-based method and a range of deep RL baselines across a diverse suite of multitask continuous control tasks, while significantly reducing computational overhead and improving wall-clock efficiency. We observe consistent improvements with increased model capacity and show through ablations that predictive representation learning is critical for performance.
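The abstract describes the recipe only at a high level: a shared embedding trained with auxiliary predictive objectives, plus a value function that reads that embedding. As a rough illustration of what such objectives can look like, here is a toy sketch. It is not MR.Q's implementation. The linear encoders, the choice of losses (MSE on the next-state embedding, squared reward error, binary cross-entropy on termination), the linear critic, `gamma = 0.99`, and every name (`ToyEncoder`, `auxiliary_loss`, `q_value`, `td_target`) are hypothetical assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ToyEncoder:
    """Hypothetical linear encoder plus prediction heads; all names are illustrative."""
    w_state: Matrix         # s -> z_s, shape (d_z, d_s)
    w_state_action: Matrix  # [z_s, a] -> z_sa, shape (d_z, d_z + d_a)
    w_dynamics: Matrix      # z_sa -> predicted next z_s, shape (d_z, d_z)
    w_reward: Vector        # z_sa -> predicted reward, length d_z
    w_done: Vector          # z_sa -> terminal logit, length d_z

    def encode_state(self, s: Vector) -> Vector:
        return matvec(self.w_state, s)

    def encode_state_action(self, zs: Vector, a: Vector) -> Vector:
        # Concatenate the state embedding with the raw action, then project.
        return matvec(self.w_state_action, zs + a)


def auxiliary_loss(enc: ToyEncoder, s: Vector, a: Vector, r: float,
                   s_next: Vector, done: float, eps: float = 1e-8) -> float:
    """Sum of dynamics (MSE), reward (squared error) and terminal (BCE) losses."""
    zs = enc.encode_state(s)
    zsa = enc.encode_state_action(zs, a)
    # Dynamics target: embedding of the next state. A real implementation would
    # typically compute this with a slowly updated target encoder (stop-gradient).
    target_next = enc.encode_state(s_next)
    pred_next = matvec(enc.w_dynamics, zsa)
    dyn = sum((p - t) ** 2 for p, t in zip(pred_next, target_next)) / len(target_next)
    rew = (dot(enc.w_reward, zsa) - r) ** 2
    p_done = sigmoid(dot(enc.w_done, zsa))
    term = -(done * math.log(p_done + eps) + (1.0 - done) * math.log(1.0 - p_done + eps))
    return dyn + rew + term


def q_value(w_q: Vector, zsa: Vector) -> float:
    """Hypothetical linear critic on top of the learned state-action embedding."""
    return dot(w_q, zsa)


def td_target(r: float, done: float, q_next: float, gamma: float = 0.99) -> float:
    """One-step TD target: r + gamma * (1 - done) * Q_target(z_s', a')."""
    return r + gamma * (1.0 - done) * q_next


if __name__ == "__main__":
    d_s, d_a, d_z = 3, 2, 4

    def zeros(rows: int, cols: int) -> Matrix:
        return [[0.0] * cols for _ in range(rows)]

    enc = ToyEncoder(zeros(d_z, d_s), zeros(d_z, d_z + d_a), zeros(d_z, d_z),
                     [0.0] * d_z, [0.0] * d_z)
    loss = auxiliary_loss(enc, [1.0, 0.5, -0.2], [0.1, -0.3], r=1.0,
                          s_next=[0.9, 0.4, -0.1], done=0.0)
    # All-zero weights: dynamics 0, reward (0 - 1)^2 = 1, terminal -ln(0.5).
    print(round(loss, 4))  # 1.6931
    print(round(td_target(r=1.0, done=0.0, q_next=2.0), 4))  # 2.98
```

The point of the split is that the encoder is shaped by predicting rewards, terminations and future embeddings, while the critic only regresses onto TD targets computed from those embeddings. The gradient-based training loops for both parts are omitted here.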
Source: arXiv: 2606.05555