arXiv submission date: 2026-04-30
📄 Abstract - Judge, Then Drive: A Critic-Centric Vision Language Action Framework for Autonomous Driving

Recent advances in vision language action (VLA) models have shown remarkable potential for autonomous driving by directly mapping multimodal inputs to control signals. However, previous VLA-based methods have not explicitly exploited the critic capability of VLAs to refine driving decisions, even though such capability has been well demonstrated in other LLM-based domains, thereby limiting their performance in complex closed-loop scenarios. In this work, we present a theoretically inspired two-stage framework, CriticVLA, which extends the role of VLAs from acting to judging. CriticVLA first generates a rough trajectory and then refines it through multimodal evaluation and single-step optimization guided by a VLA-based critic, yielding higher-quality driving behaviors. To support this process, we construct a large-scale synthetic dataset of 12.9 million annotated trajectories covering diverse driving scenarios, which enhances the critic's reasoning and refinement abilities. Extensive closed-loop experiments on the Bench2Drive benchmark show that CriticVLA significantly surpasses state-of-the-art baselines, achieving a 73.33% total success rate and delivering about 30% improvement in challenging scenarios.

Top-level tags: machine learning, model training, agents
Detailed tags: vision language action, critic framework, autonomous driving, trajectory refinement, bench2drive

Judge, Then Drive: A Critic-Centric Vision Language Action Framework for Autonomous Driving


1️⃣ One-Sentence Summary

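This paper proposes CriticVLA, a new autonomous driving method in which the model first acts as a critic, evaluating its own initial rough trajectory, and then refines that trajectory based on the evaluation. This significantly raises driving success in complex scenarios: experiments report a total success rate of 73.33% and an improvement of about 30% over existing methods in challenging scenarios.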
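
The generate-then-refine idea from the abstract can be illustrated with a toy sketch. The abstract does not specify the critic's form or the optimizer, so everything here is an assumption: `rough_trajectory`, `toy_critic`, and `refine_once` are hypothetical names, the critic is a hand-written obstacle-clearance cost standing in for a VLA-based critic, and the "single-step optimization" is modeled as one finite-difference gradient step.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def rough_trajectory(start: Point, goal: Point, n: int = 5) -> Trajectory:
    """Stage 1 stand-in (assumption): n waypoints on a straight line."""
    return [
        (start[0] + (goal[0] - start[0]) * i / (n - 1),
         start[1] + (goal[1] - start[1]) * i / (n - 1))
        for i in range(n)
    ]


def toy_critic(traj: Trajectory,
               obstacle: Point = (5.0, -0.5),
               safe_dist: float = 2.0) -> float:
    """Hypothetical critic: lower is better.

    Penalizes every waypoint closer than safe_dist to the obstacle.
    A real critic would be a learned model, not this formula.
    """
    cost = 0.0
    for x, y in traj:
        d = ((x - obstacle[0]) ** 2 + (y - obstacle[1]) ** 2) ** 0.5
        cost += max(0.0, safe_dist - d) ** 2
    return cost


def refine_once(traj: Trajectory,
                critic: Callable[[Trajectory], float],
                step: float = 0.5,
                eps: float = 1e-4) -> Trajectory:
    """Stage 2 stand-in: one gradient step on the critic's score.

    Gradients are estimated by forward finite differences; the first
    and last waypoints are held fixed.
    """
    base = critic(traj)
    refined = []
    for i, (x, y) in enumerate(traj):
        if i == 0 or i == len(traj) - 1:
            refined.append((x, y))
            continue
        gx = (critic(traj[:i] + [(x + eps, y)] + traj[i + 1:]) - base) / eps
        gy = (critic(traj[:i] + [(x, y + eps)] + traj[i + 1:]) - base) / eps
        refined.append((x - step * gx, y - step * gy))
    return refined


rough = rough_trajectory((0.0, 0.0), (10.0, 0.0))
refined = refine_once(rough, toy_critic)
print("critic score before:", toy_critic(rough))
print("critic score after: ", toy_critic(refined))
```

In this toy setup the middle waypoint starts too close to the obstacle, and the single step pushes it away, so the critic's score drops. The sketch only shows the control flow (propose, judge, adjust once); it makes no claim about how CriticVLA implements any of these stages.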

Source: arXiv: 2604.27366