ARROW: Augmented Replay for RObust World models
1️⃣ One-sentence summary
This paper proposes ARROW, a continual reinforcement learning algorithm that trains a world model using a neuroscience-inspired, memory-efficient dual replay buffer, allowing an agent to learn new tasks while substantially reducing forgetting of old tasks and preserving knowledge transfer.
Continual reinforcement learning challenges agents to acquire new skills while retaining previously learned ones, with the goal of improving performance on both past and future tasks. Most existing approaches rely on model-free methods with replay buffers to mitigate catastrophic forgetting; however, these solutions often face significant scalability challenges due to large memory demands. Drawing inspiration from neuroscience, where the brain replays experiences to a predictive world model rather than directly to the policy, we present ARROW (Augmented Replay for RObust World models), a model-based continual RL algorithm that extends DreamerV3 with a memory-efficient, distribution-matching replay buffer. Unlike standard fixed-size FIFO buffers, ARROW maintains two complementary buffers: a short-term buffer for recent experiences and a long-term buffer that preserves task diversity through intelligent sampling. We evaluate ARROW in two challenging continual RL settings: tasks without shared structure (Atari), and tasks with shared structure, where knowledge transfer is possible (Procgen CoinRun variants). Compared to model-free and model-based baselines with replay buffers of the same size, ARROW demonstrates substantially less forgetting on tasks without shared structure, while maintaining comparable forward transfer. Our findings highlight the potential of model-based RL and bio-inspired approaches for continual reinforcement learning, warranting further research.
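The abstract's dual-buffer idea can be sketched in a few lines. The class below is a minimal illustration, not ARROW's actual implementation: the abstract does not specify the "intelligent sampling" rule, so reservoir sampling is used here as one plausible distribution-matching stand-in (it keeps a uniform sample over all transitions seen so far), and all names and the 50/50 batch mix are hypothetical.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Illustrative two-part replay buffer: a FIFO short-term buffer for
    recent transitions, plus a fixed-size long-term buffer filled by
    reservoir sampling to preserve diversity across old tasks.
    (Reservoir sampling is an assumed stand-in for ARROW's
    distribution-matching scheme, which the abstract does not detail.)"""

    def __init__(self, short_capacity, long_capacity, seed=0):
        self.short = deque(maxlen=short_capacity)  # recent experiences (FIFO)
        self.long = []                             # diverse older experiences
        self.long_capacity = long_capacity
        self.seen = 0                              # total transitions observed
        self.rng = random.Random(seed)

    def add(self, transition):
        self.short.append(transition)
        self.seen += 1
        if len(self.long) < self.long_capacity:
            self.long.append(transition)
        else:
            # Reservoir sampling: keep each past transition in the
            # long-term buffer with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.long_capacity:
                self.long[j] = transition

    def sample(self, batch_size, short_fraction=0.5):
        # Mix recent and long-term experiences in each world-model batch.
        n_short = min(int(batch_size * short_fraction), len(self.short))
        n_long = min(batch_size - n_short, len(self.long))
        batch = self.rng.sample(list(self.short), n_short)
        batch += self.rng.sample(self.long, n_long)
        return batch
```

In this sketch, memory cost stays fixed regardless of how many tasks have been seen, while the long-term buffer retains examples from early tasks that a plain FIFO buffer of the same size would have evicted.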
Source: arXiv:2603.11395