OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
1️⃣ One-Sentence Summary
This paper proposes OpenREAD, an end-to-end autonomous driving framework that uses a large language model as a critic to quantify the reasoning quality of open-ended questions and jointly optimizes the whole system with reinforcement learning, achieving state-of-the-art performance on reasoning and planning tasks.
Recently, two-stage fine-tuning strategies, e.g., acquiring essential driving knowledge through supervised fine-tuning (SFT) and further enhancing decision-making and planning via reinforcement fine-tuning (RFT), have shown strong potential in advancing the knowledge-driven autonomous driving (AD) paradigm. However, the supervised, imitation-style learning of SFT limits how well the acquired reasoning generalizes, which in turn caps driving performance. Meanwhile, current RFT approaches are applied primarily to downstream tasks, since scene understanding is an open-ended problem whose rewards are difficult to quantify. To address these limitations, we propose OpenREAD, an OPEN-ended REasoning reinforced vision-language model (VLM)-based AD framework that enables end-to-end RFT across the full spectrum from high-level reasoning to low-level trajectory planning. Specifically, we begin by constructing large-scale Chain-of-Thought (CoT) annotations on open-source driving-related knowledge datasets, and we employ the Qwen3 large language model (LLM) as the critic in RFT to quantify reasoning quality for open-ended questions during reward modeling. Extensive experiments confirm that joint end-to-end RFT yields substantial improvements in both upstream and downstream tasks, enabling OpenREAD to achieve state-of-the-art performance on reasoning and planning benchmarks.
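The paper itself does not include code, but the LLM-as-critic reward-modeling step can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration: the prompt template, the `query_llm_critic` stub (a stand-in for an actual Qwen3 inference call), the 0-5 score scale, and the `alpha` weighting between the reasoning reward and a trajectory reward are all hypothetical, not details taken from the paper.

```python
import re

# Hypothetical stand-in for a real Qwen3 inference call (API or local model);
# in an actual pipeline this would return the critic's text response.
def query_llm_critic(prompt: str) -> str:
    return "Score: 4"  # canned output so the sketch is runnable

# Assumed prompt template: the critic grades a model's reasoning against a
# reference answer on a 0-5 scale and replies in a fixed, parseable format.
CRITIC_TEMPLATE = (
    "You are grading the reasoning of a driving model.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Model reasoning: {reasoning}\n"
    "Rate the reasoning quality from 0 (wrong) to 5 (excellent). "
    "Reply with 'Score: <number>'."
)

def reasoning_reward(question: str, reference: str, reasoning: str) -> float:
    """Query the LLM critic and map its 0-5 rating to a reward in [0, 1]."""
    reply = query_llm_critic(CRITIC_TEMPLATE.format(
        question=question, reference=reference, reasoning=reasoning))
    match = re.search(r"Score:\s*([0-5](?:\.\d+)?)", reply)
    score = float(match.group(1)) if match else 0.0  # unparsable -> zero reward
    return min(max(score / 5.0, 0.0), 1.0)

def total_reward(r_reasoning: float, r_trajectory: float,
                 alpha: float = 0.5) -> float:
    """Blend the open-ended reasoning reward with a trajectory reward
    (e.g., one based on displacement error); alpha is an assumed weight."""
    return alpha * r_reasoning + (1.0 - alpha) * r_trajectory
```

A scalar reward of this form is what makes joint end-to-end RFT possible: once open-ended reasoning quality is reduced to a number, it can be combined with the trajectory reward and fed to a standard policy-gradient RFT loop, so the upstream reasoning and downstream planning objectives are optimized together rather than in separate stages.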