arXiv submission date: 2026-03-03
📄 Abstract - Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving

Reinforcement learning (RL) is a fundamental methodology in autonomous driving systems, where generative policies exhibit considerable potential by leveraging their ability to model complex distributions to enhance exploration. However, their inherently high inference latency severely impedes their deployment in real-time decision-making and control. To address this issue, we propose diffusion actor-critic with entropy regulator via flow matching (DACER-F), which introduces flow matching into online RL and enables the generation of competitive actions in a single inference step. By leveraging Langevin dynamics and gradients of the Q-function, DACER-F dynamically optimizes actions drawn from experience replay toward a target distribution that balances high Q-value information with exploratory behavior. The flow policy is then trained to efficiently learn a mapping from a simple prior distribution to this dynamic target. In complex multi-lane and intersection simulations, DACER-F outperforms the baselines diffusion actor-critic with entropy regulator (DACER) and distributional soft actor-critic (DSAC), while maintaining ultra-low inference latency. DACER-F further demonstrates its scalability on the standard RL benchmark DeepMind Control Suite (DMC), achieving a score of 775.8 on the humanoid-stand task and surpassing prior methods. Collectively, these results establish DACER-F as a high-performance and computationally efficient RL algorithm.
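The core refinement step described in the abstract, moving replayed actions toward high-Q regions with Langevin dynamics, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact procedure: `toy_q` and all hyperparameters (`steps`, `step_size`) are assumptions for demonstration, and a real implementation would use a learned critic network.

```python
import torch

def langevin_refine(q_fn, state, actions, steps=200, step_size=0.01):
    """Refine actions toward high-Q regions via Langevin dynamics.

    Each step follows the Q-gradient (exploitation) plus Gaussian
    noise (exploration): a <- a + (eps/2) * grad_a Q(s, a) + sqrt(eps) * xi.
    """
    a = actions.clone()
    for _ in range(steps):
        a = a.detach().requires_grad_(True)
        q = q_fn(state, a).sum()
        (grad,) = torch.autograd.grad(q, a)
        noise = torch.randn_like(a)
        a = a + 0.5 * step_size * grad + (step_size ** 0.5) * noise
    return a.detach()

# Toy quadratic Q-function peaked at action == state (an assumption
# standing in for a learned critic).
def toy_q(state, action):
    return -((action - state) ** 2).sum(dim=-1)

torch.manual_seed(0)
state = torch.zeros(4, 2)
init = torch.randn(4, 2) * 3.0          # poor initial actions from replay
refined = langevin_refine(toy_q, state, init)
```

In the paper's setup the refined actions then serve as regression targets for the flow policy, so that a single-step flow map can reproduce the result of this iterative refinement at inference time.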

Top tags: reinforcement learning, agents, robotics
Detailed tags: autonomous driving, generative policy, flow matching, real-time inference, diffusion RL

Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving


1️⃣ One-sentence summary

This paper proposes a new reinforcement learning algorithm called DACER-F, which combines flow matching with Langevin dynamics so that an autonomous driving system can generate decision actions in a single inference step while retaining strong exploration capability, achieving a balance between high performance and ultra-low latency.

Source: arXiv 2603.02613