arXiv submission date: 2026-03-16
📄 Abstract - FAR-Drive: Frame-AutoRegressive Video Generation in Closed-Loop Autonomous Driving

Despite rapid progress in autonomous driving, reliable training and evaluation of driving systems remain fundamentally constrained by the lack of scalable and interactive simulation environments. Recent generative video models achieve remarkable visual fidelity, yet most operate in open-loop settings and fail to support fine-grained frame-level interaction between agent actions and environment evolution. Building a learning-based closed-loop simulator for autonomous driving poses three major challenges: maintaining long-horizon temporal and cross-view consistency, mitigating autoregressive degradation under iterative self-conditioning, and satisfying low-latency inference constraints. In this work, we propose FAR-Drive, a frame-level autoregressive video generation framework for autonomous driving. We introduce a multi-view diffusion transformer with fine-grained structured control, enabling geometrically consistent multi-camera generation. To address long-horizon consistency and iterative degradation, we design a two-stage training strategy consisting of adaptive reference horizon conditioning and blend-forcing autoregressive training, which progressively improves consistency and robustness under self-conditioning. To meet low-latency interaction requirements, we further integrate system-level efficiency optimizations for inference acceleration. Experiments on the nuScenes dataset demonstrate that our method achieves state-of-the-art performance among existing closed-loop autonomous driving simulation approaches, while maintaining sub-second latency on a single GPU.

Top-level tags: computer vision, video generation, systems
Detailed tags: autonomous driving, closed-loop simulation, multi-view diffusion, autoregressive video generation, low-latency inference

FAR-Drive: Frame-AutoRegressive Video Generation in Closed-Loop Autonomous Driving


1️⃣ One-sentence summary

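This paper proposes FAR-Drive, a closed-loop autonomous driving simulator that uses a novel frame-autoregressive video generation technique to produce multi-view driving-scene video in real time, consistently and in response to vehicle actions, addressing key limitations of existing simulation environments in interactivity, long-horizon consistency, and real-time performance.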
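
To make "closed-loop, frame-level autoregression" concrete, the toy sketch below shows only the control flow: an agent picks an action from the latest frame, a generator produces the next frame from a bounded window of its own earlier outputs plus that action, and the loop repeats. It is not the FAR-Drive model. `toy_generator`, `toy_policy`, `closed_loop_rollout`, and the `horizon` parameter are hypothetical names chosen for illustration, and a frame is reduced to a short list of floats.

```python
from collections import deque
from typing import Callable, List

Frame = List[float]  # toy stand-in for one multi-view frame


def toy_generator(history: List[Frame], action: float) -> Frame:
    """Hypothetical stand-in for a video model: shift the latest frame by the action."""
    last = history[-1]
    return [pixel + action for pixel in last]


def toy_policy(frame: Frame) -> float:
    """Hypothetical driving agent: push the mean pixel value toward zero."""
    mean = sum(frame) / len(frame)
    return -0.5 * mean


def closed_loop_rollout(
    first_frame: Frame,
    generator: Callable[[List[Frame], float], Frame],
    policy: Callable[[Frame], float],
    steps: int,
    horizon: int,
) -> List[Frame]:
    """Alternate agent actions and frame generation, one frame per step."""
    # Bounded window of reference frames; older frames fall out automatically.
    history = deque([first_frame], maxlen=horizon)
    frames = [first_frame]
    for _ in range(steps):
        action = policy(history[-1])                   # agent reacts to the latest frame
        next_frame = generator(list(history), action)  # model conditions on its own outputs
        history.append(next_frame)
        frames.append(next_frame)
    return frames


frames = closed_loop_rollout([4.0, 4.0], toy_generator, toy_policy, steps=3, horizon=2)
print(frames[-1])
```

Because every generated frame is fed back as conditioning, any error in one frame can carry into the next, which is the self-conditioning degradation the abstract says the two-stage training strategy targets.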

Source: arXiv: 2603.14938