UniUGP: Unifying Understanding, Generation, and Planning for End-to-end Autonomous Driving
1️⃣ One-sentence summary
This paper proposes UniUGP, an end-to-end autonomous driving framework that integrates scene understanding, future video generation, and trajectory planning. Using specialized datasets and a staged training strategy, it substantially improves perception, reasoning, and decision-making in complex and rare road scenarios.
Autonomous driving (AD) systems struggle in long-tail scenarios due to limited world knowledge and weak visual dynamic modeling. Existing vision-language-action (VLA)-based methods cannot leverage unlabeled videos for visual causal learning, while world model-based methods lack reasoning capabilities from large language models. In this paper, we construct multiple specialized datasets providing reasoning and planning annotations for complex scenarios. Then, a unified Understanding-Generation-Planning framework, named UniUGP, is proposed to synergize scene reasoning, future video generation, and trajectory planning through a hybrid expert architecture. By integrating pre-trained VLMs and video generation models, UniUGP leverages visual dynamics and semantic reasoning to enhance planning performance. Taking multi-frame observations and language instructions as input, it produces interpretable chain-of-thought reasoning, physically consistent trajectories, and coherent future videos. We introduce a four-stage training strategy that progressively builds these capabilities across multiple existing AD datasets, along with the proposed specialized datasets. Experiments demonstrate state-of-the-art performance in perception, reasoning, and decision-making, with superior generalization to challenging long-tail situations.
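To make the abstract's data flow concrete, below is a minimal, hypothetical Python sketch of the interface it describes: multi-frame observations plus a language instruction go in; chain-of-thought reasoning, a planned trajectory, and a coherent future video come out. All class and method names (`UniUGP`, `reason`, `rollout`, `plan`) are illustrative assumptions, not the authors' actual API; the paper only specifies the three experts and their inputs/outputs.

```python
# Hypothetical sketch of the Understanding-Generation-Planning pipeline the
# abstract describes. The three experts (VLM, video generator, planner) are
# injected as opaque objects; nothing here reflects the authors' real code.
from dataclasses import dataclass
import numpy as np


@dataclass
class UniUGPOutput:
    reasoning: str            # interpretable chain-of-thought text
    trajectory: np.ndarray    # (T, 2) future ego waypoints, e.g. in meters
    future_video: np.ndarray  # (F, H, W, 3) predicted future frames


class UniUGP:
    """Assumed wrapper around the hybrid experts named in the abstract:
    a pre-trained VLM for scene reasoning, a video generation model for
    visual dynamics, and a trajectory planner."""

    def __init__(self, vlm, video_generator, planner):
        self.vlm = vlm
        self.video_generator = video_generator
        self.planner = planner

    def forward(self, frames: np.ndarray, instruction: str) -> UniUGPOutput:
        # 1) Understanding: the VLM produces chain-of-thought reasoning
        #    conditioned on the observed frames and the instruction.
        reasoning = self.vlm.reason(frames, instruction)
        # 2) Generation: the video expert rolls out coherent future
        #    frames, supplying the visual-dynamics signal.
        future_video = self.video_generator.rollout(frames, reasoning)
        # 3) Planning: the planner outputs a physically consistent
        #    trajectory conditioned on reasoning and predicted dynamics.
        trajectory = self.planner.plan(frames, reasoning, future_video)
        return UniUGPOutput(reasoning, trajectory, future_video)
```

This is one plausible reading of how the hybrid expert architecture synergizes the three tasks; the paper's four-stage training strategy would build up these capabilities progressively rather than training the composed pipeline end to end from scratch.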
Source: arXiv: 2512.09864