arXiv submission date: 2026-04-20
📄 Abstract - Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models

Vision-Language-Action models (VLAs) achieve remarkable performance in sequential decision-making but remain fragile to subtle environmental shifts, such as small changes in object pose. We attribute this brittleness to trajectory overfitting, where VLAs over-attend to spurious correlations between actions and entities and then reproduce memorized action patterns. We propose Perturbation learning with Delayed Feedback (PDF), a verifier-free test-time adaptation framework that improves decision performance without fine-tuning the base model. PDF mitigates the spurious correlation through uncertainty-based data augmentation and action voting, while an adaptive scheduler allocates augmentation budgets to balance performance and efficiency. To further improve stability, PDF learns a lightweight perturbation module that retrospectively adjusts action logits guided by delayed feedback, correcting overconfidence. Experiments on LIBERO (+7.4% success rate) and Atari (+10.3 human-normalized score) demonstrate consistent gains in task success for PDF over both vanilla VLAs and VLAs with test-time adaptation, establishing a practical path toward reliable test-time adaptation in multimodal decision-making agents. The code is available at this https URL.
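To make the interplay of uncertainty-based augmentation, action voting, and the adaptive scheduler more concrete, here is a minimal Python sketch. It is not the authors' implementation: the entropy-based uncertainty measure, the linear `augmentation_budget` rule, the Gaussian-noise `gaussian_jitter` augmentation, and the toy policy (whose logits are simply its input features) are all hypothetical stand-ins for the frozen VLA and for whatever choices the paper actually makes.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence

Obs = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(logits: Sequence[float]) -> float:
    """Entropy of softmax(logits) divided by log(n_actions): 0 = certain, 1 = uniform.
    Assumes at least two actions."""
    probs = softmax(logits)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def augmentation_budget(uncertainty: float, min_views: int = 1, max_views: int = 8) -> int:
    """Hypothetical linear scheduler: spend more views when the policy is unsure."""
    return min_views + round((max_views - min_views) * uncertainty)


def gaussian_jitter(obs: Obs, rng: random.Random, std: float = 0.1) -> Obs:
    """Hypothetical augmentation: small Gaussian noise on a feature vector."""
    return [x + rng.gauss(0.0, std) for x in obs]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def vote_action(policy: Callable[[Obs], List[float]], obs: Obs, rng: random.Random,
                augment: Callable[[Obs, random.Random], Obs] = gaussian_jitter,
                min_views: int = 1, max_views: int = 8) -> int:
    """Majority vote over the clean view plus an uncertainty-dependent number of augmented views."""
    base_logits = policy(obs)
    base_action = argmax(base_logits)
    budget = augmentation_budget(normalized_entropy(base_logits), min_views, max_views)
    votes = Counter([base_action])  # the unaugmented view always casts one vote
    for _ in range(budget - 1):  # confident steps (budget == 1) skip augmentation entirely
        votes[argmax(policy(augment(obs, rng)))] += 1
    # Highest vote count wins; ties are broken in favor of the unaugmented prediction.
    return max(votes, key=lambda a: (votes[a], a == base_action))


if __name__ == "__main__":
    toy_policy = lambda o: list(o)  # stand-in for a frozen VLA: logits = features
    print(vote_action(toy_policy, [10.0, 0.0, 0.0], random.Random(0)))
    print(vote_action(toy_policy, [0.0, 0.5, 0.0], random.Random(0)))
```

Because a confident step receives a budget of one view, it costs a single forward pass, while uncertain steps pay for extra augmented passes; this is one plausible way to realize the performance/efficiency trade-off the abstract attributes to the scheduler.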

Top-level tags: agents model evaluation multi-modal
Detailed tags: test-time adaptation vision-language-action models decision-making uncertainty delayed feedback

Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models


1️⃣ One-sentence summary

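This paper proposes PDF, a verifier-free test-time adaptation framework that corrects overfitting in vision-language-action models under environmental shifts through uncertainty-based data augmentation, action voting, and lightweight perturbation learning, substantially improving task success rates without fine-tuning the base model.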
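The lightweight perturbation learning named in this summary is described in the abstract as a module that retrospectively adjusts action logits using delayed feedback. The Python sketch below illustrates one way such a module could work, under assumptions of ours rather than the paper's: the module is a single per-action bias vector (`LogitPerturbation`, a hypothetical name), feedback is one scalar reward that arrives at episode end, and the update is a policy-gradient-style step on the log-probability of each chosen action, scaled by reward minus a fixed baseline.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LogitPerturbation:
    """Hypothetical lightweight perturbation module: one learnable bias per action,
    added to the frozen VLA's logits. The base model itself is never updated."""

    def __init__(self, n_actions: int, lr: float = 0.5, baseline: float = 0.5):
        self.bias = [0.0] * n_actions
        self.lr = lr
        self.baseline = baseline  # reward level treated as "neutral"
        self.buffer: List[Tuple[List[float], int]] = []  # (base logits, chosen action) per step

    def act(self, base_logits: Sequence[float]) -> int:
        """Pick the greedy action under perturbed logits and remember the step."""
        adjusted = [x + b for x, b in zip(base_logits, self.bias)]
        action = max(range(len(adjusted)), key=adjusted.__getitem__)
        self.buffer.append((list(base_logits), action))
        return action

    def feedback(self, reward: float) -> None:
        """Delayed feedback: called once the episode outcome is known.
        Applies a policy-gradient-style step to every buffered decision in hindsight."""
        if not self.buffer:
            return
        advantage = reward - self.baseline
        grad = [0.0] * len(self.bias)
        for logits, action in self.buffer:
            probs = softmax([x + b for x, b in zip(logits, self.bias)])
            # d/d bias_i of log softmax(logits + bias)[action] = 1[i == action] - p_i
            for i, p in enumerate(probs):
                grad[i] += (1.0 if i == action else 0.0) - p
        scale = self.lr * advantage / len(self.buffer)
        self.bias = [b + scale * g for b, g in zip(self.bias, grad)]
        self.buffer.clear()


if __name__ == "__main__":
    module = LogitPerturbation(n_actions=2)
    print(module.act([1.0, 0.8]))  # greedy choice under the unperturbed logits
    module.feedback(reward=0.0)    # the episode failed
    print(module.bias, module.act([1.0, 0.8]))
```

In this sketch a failed episode pushes probability mass away from the actions that were taken, which lowers the confidence of the base model's preferred action, and a successful episode does the reverse. Because the gradient terms sum to zero across actions, the bias only shifts logits relative to one another. A real module would presumably condition on observation features rather than use a single global bias.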

Source: arXiv 2604.18107