arXiv submission date: 2026-04-16
📄 Abstract - RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of corrective negative feedback when trained purely with imitation learning. To address these issues, we propose RAD-2, a unified generator-discriminator framework for closed-loop planning. Specifically, a diffusion-based generator is used to produce diverse trajectory candidates, while an RL-optimized discriminator reranks these candidates according to their long-term driving quality. This decoupled design avoids directly applying sparse scalar rewards to the full high-dimensional trajectory space, thereby improving optimization stability. To further enhance reinforcement learning, we introduce Temporally Consistent Group Relative Policy Optimization, which exploits temporal coherence to alleviate the credit assignment problem. In addition, we propose On-policy Generator Optimization, which converts closed-loop feedback into structured longitudinal optimization signals and progressively shifts the generator toward high-reward trajectory manifolds. To support efficient large-scale training, we introduce BEV-Warp, a high-throughput simulation environment that performs closed-loop evaluation directly in Bird's-Eye View feature space via spatial warping. RAD-2 reduces the collision rate by 56% compared with strong diffusion-based planners. Real-world deployment further demonstrates improved perceived safety and driving smoothness in complex urban traffic.
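The generate-then-rerank loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `sample_candidates` is a random stand-in for the diffusion-based generator, `toy_score` is a hand-written stand-in for the RL-optimized discriminator, and `group_relative_advantages` shows only the standard group-normalized advantage used in plain GRPO, not the paper's temporally consistent variant, whose details the abstract does not give.

```python
import random
from typing import Callable, List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]  # hypothetical (x, y) waypoints in the ego frame


def sample_candidates(num_candidates: int, horizon: int, rng: random.Random) -> List[Trajectory]:
    """Stand-in for a generator: straight paths with a random lateral drift rate."""
    candidates = []
    for _ in range(num_candidates):
        lateral_rate = rng.uniform(-0.5, 0.5)
        candidates.append([(float(t), lateral_rate * t) for t in range(1, horizon + 1)])
    return candidates


def toy_score(traj: Trajectory) -> float:
    """Stand-in for a learned discriminator: penalise lateral offset at the last waypoint."""
    return -abs(traj[-1][1])


def rerank(
    candidates: Sequence[Trajectory],
    score_fn: Callable[[Trajectory], float],
) -> Tuple[Trajectory, List[float]]:
    """Score every candidate and return the highest-scoring one plus all scores."""
    scores = [score_fn(c) for c in candidates]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style advantage: normalise each reward by its group's mean and std."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = sample_candidates(num_candidates=8, horizon=5, rng=rng)
    best, scores = rerank(candidates, toy_score)
    print("selected final waypoint:", best[-1])
    print("advantages:", group_relative_advantages(scores))
```

Keeping the scorer separate from the sampler mirrors the decoupling the abstract argues for: the reward signal only has to shape a scalar ranking over a handful of candidates, not the full high-dimensional trajectory output.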

Top-level tags: reinforcement learning, robotics, model training
Detailed tags: autonomous driving, diffusion models, closed-loop planning, policy optimization, simulation environment

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework


1️⃣ One-sentence summary

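This paper proposes RAD-2, a new planning framework for autonomous driving that uses a diffusion model to generate multiple candidate driving trajectories and an RL-optimized discriminator to pick the one with the best long-term driving quality, markedly improving driving safety and stability while preserving trajectory diversity.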

Source: arXiv: 2604.15308