arXiv submission date: 2026-03-19
📄 Abstract - FASTER: Rethinking Real-Time Flow VLAs

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction by tenfold (e.g., in $\pi_{0.5}$ and X-VLA) into a single step, while preserving the quality of long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
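The paper's key idea is a Horizon-Aware Schedule: during flow sampling, near-term actions in the chunk get a heavily compressed denoising schedule (down to a single step) while later actions keep the full schedule. The paper itself does not include code here, so the following is a minimal, hypothetical sketch under assumed names (`horizon_aware_schedule`, `sample_chunk`, the `fast_frac` parameter, and a toy velocity field are all illustrative, not the authors' implementation):

```python
import numpy as np

def horizon_aware_schedule(horizon, total_steps=10, fast_frac=0.25):
    """Assign a per-action number of flow integration steps.

    Hypothetical policy: the earliest `fast_frac` of the chunk is
    denoised in a single step (immediate reaction), the rest keeps
    the full schedule (long-horizon quality).
    """
    steps = np.full(horizon, total_steps, dtype=int)
    steps[: max(1, int(horizon * fast_frac))] = 1
    return steps

def sample_chunk(velocity_fn, horizon, dim, total_steps=10, rng=None):
    """Euler-integrate a flow model per action, with horizon-aware step counts."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal((horizon, dim))  # start each action from noise
    steps = horizon_aware_schedule(horizon, total_steps)
    for h in range(horizon):
        n = int(steps[h])
        dt = 1.0 / n
        t = 0.0
        for _ in range(n):  # coarse (n=1) for near-term, fine for later actions
            x[h] = x[h] + dt * velocity_fn(x[h], t, h)
            t += dt
    return x
```

With a toy velocity field `v(x, t, h) = target - x`, a single Euler step over the full interval lands the first action exactly on the target, which is the sense in which the immediate reaction becomes available after one step while later actions are refined over the full schedule.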

Top-level tags: robotics, model training, agents
Detailed tags: vision-language-action, real-time execution, action sampling, reaction latency, trajectory generation

FASTER: Rethinking Real-Time Flow VLAs


1️⃣ One-Sentence Summary

This paper proposes FASTER, a method that optimizes the timing of action generation so that robot vision-language models can react to environmental changes as quickly as humans do, significantly reducing reaction latency and enabling unprecedented real-time responsiveness in dynamic tasks.

Source: arXiv 2603.19199