Uncertainty Quantification for Flow-Based Vision-Language-Action Models
1️⃣ One-Sentence Summary
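The paper proposes a method that quantifies predictive uncertainty through velocity-field disagreement across a small ensemble of models, and builds on it the SAVE framework, which detects failure risk in robotic manipulation and requires at least 22% fewer costly expert demonstrations than baselines to adapt to new tasks.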
Vision-language-action models (VLAs) combine vision-language backbones with expressive generative action heads trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictions and to detect when their actions may be unreliable. This presents a critical limitation for real-world deployment in non-stationary environments, where models inevitably encounter scenarios outside their pretraining distribution and may fail without warning. To address this, we derive an efficient method for quantifying epistemic uncertainty in flow-matching models by leveraging velocity-field disagreement (VFD) across a small ensemble. We successfully use this uncertainty estimate for failure detection during deployment and active fine-tuning of flow-based VLAs. To this end, we propose SAVE, a framework for uncertainty-guided active multitask fine-tuning that reduces the number of costly expert demonstrations required to adapt VLAs to new tasks. Through extensive experiments on the LIBERO benchmark, we demonstrate that VFD yields better-calibrated uncertainty estimates predictive of downstream performance, that VFD achieves strong performance in detecting failures, and that uncertainty-guided data acquisition with SAVE requires at least 22% fewer samples than baselines. In summary, our work shows that quantifying epistemic uncertainty in flow-based VLAs improves both failure awareness and adaptation. Project website: this http URL.
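The abstract does not give the exact VFD formula, so the following is only a minimal sketch under stated assumptions. Each ensemble member is treated as a velocity field v_k(a, t), with observation and language conditioning folded into the function. An action is sampled by Euler-integrating the ensemble-mean velocity from noise at t = 0 to t = 1, and disagreement at each step is the across-member variance of the predicted velocities, summed over action dimensions and averaged over steps. The helpers `flag_failure` and `rank_tasks_by_uncertainty` are hypothetical stand-ins for threshold-based failure detection and uncertainty-guided task selection; they are not the paper's SAVE algorithm.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a velocity field maps (action, time) -> velocity, with the
# observation/instruction conditioning captured inside the callable.
VelocityField = Callable[[List[float], float], List[float]]


def velocity_field_disagreement(
    fields: Sequence[VelocityField],
    noise: List[float],
    num_steps: int = 10,
) -> Tuple[List[float], float]:
    """Euler-integrate the ensemble-mean flow from t=0 to t=1.

    Returns (final action, mean per-step disagreement), where disagreement
    at a step is the variance of the members' velocities around their mean,
    summed over action dimensions. Requires num_steps >= 1.
    """
    k = len(fields)
    dims = len(noise)
    dt = 1.0 / num_steps
    a = list(noise)
    total = 0.0
    for step in range(num_steps):
        t = step * dt
        # Query every ensemble member at the same point of the shared path.
        vs = [f(a, t) for f in fields]
        mean_v = [sum(v[d] for v in vs) / k for d in range(dims)]
        # Population variance across members, summed over dimensions.
        total += sum(
            sum((v[d] - mean_v[d]) ** 2 for v in vs) / k for d in range(dims)
        )
        # Advance along the ensemble-mean velocity (explicit Euler step).
        a = [a[d] + dt * mean_v[d] for d in range(dims)]
    return a, total / num_steps


def flag_failure(score: float, threshold: float) -> bool:
    """Hypothetical detector: flag a step whose VFD exceeds a threshold."""
    return score > threshold


def rank_tasks_by_uncertainty(scores: Dict[str, float]) -> List[str]:
    """Hypothetical acquisition rule: most uncertain tasks come first."""
    return sorted(scores, key=scores.get, reverse=True)
```

For example, two members that predict constant velocities (1, 0) and (-1, 0) give a disagreement of 1.0 at every step, while identical members give 0.0. In this sketch the failure threshold is a free parameter; one plausible choice (an assumption, not from the source) is to calibrate it on held-out rollouts.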
Source: arXiv 2606.18043