BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames
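1️⃣ One-sentence summary
This paper proposes BPP, a new method that uses a vision-language model to automatically identify the keyframes of a task so that the robot attends only to these meaningful past moments. This addresses the tendency of conventional history-conditioned policies to make errors and generalize poorly to new scenarios, and it yields substantially better results across several real-world and simulated tasks.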
Many robot tasks require attending to the history of past observations. For example, finding an item in a room requires remembering which places have already been searched. However, the best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. Naively conditioning on past observations often fails due to spurious correlations: policies latch onto incidental features of training histories that do not generalize to out-of-distribution trajectories upon deployment. We analyze why policies latch onto these spurious correlations and find that this problem stems from limited coverage over the space of possible histories during training, which grows exponentially with horizon. Existing regularization techniques provide inconsistent benefits across tasks, as they do not fundamentally address this coverage problem. Motivated by these findings, we propose Big Picture Policies (BPP), an approach that conditions on a minimal set of meaningful keyframes detected by a vision-language model. By projecting diverse rollouts onto a compact set of task-relevant events, BPP substantially reduces distribution shift between training and deployment, without sacrificing expressivity. We evaluate BPP on four challenging real-world manipulation tasks and three simulation tasks, all requiring history conditioning. BPP achieves 70% higher success rates than the best comparison on real-world evaluations.
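The idea of projecting diverse rollouts onto a compact set of task-relevant events can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `Frame` type, the function names, the `max_keyframes` cap, the choice to keep the most recent keyframes, and above all the `is_key_event` predicate, which is a simple string check standing in for the vision-language model detector described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Frame:
    """Hypothetical observation record; `label` stands in for an image."""
    t: int
    label: str


def select_keyframes(
    history: Sequence[Frame],
    is_key_event: Callable[[Frame], bool],
    max_keyframes: int,
) -> List[Frame]:
    """Keep only frames flagged as key events, capped to the most recent ones.

    `is_key_event` is a placeholder for a VLM-based detector (assumption).
    """
    if max_keyframes <= 0:
        return []
    keyframes = [frame for frame in history if is_key_event(frame)]
    return keyframes[-max_keyframes:]


def build_policy_input(
    history: Sequence[Frame],
    current: Frame,
    is_key_event: Callable[[Frame], bool],
    max_keyframes: int = 4,
) -> List[Frame]:
    """Policy context = selected keyframes followed by the current observation."""
    return select_keyframes(history, is_key_event, max_keyframes) + [current]


def opened_something(frame: Frame) -> bool:
    """Toy key-event rule (assumption): any frame whose label starts with 'opened_'."""
    return frame.label.startswith("opened_")


# Two rollouts that differ in length and timing but share the same key event.
rollout_a = [Frame(0, "idle"), Frame(1, "opened_drawer_1"), Frame(2, "idle")]
rollout_b = [Frame(0, "idle"), Frame(1, "idle"), Frame(2, "idle"),
             Frame(3, "opened_drawer_1"), Frame(4, "idle"), Frame(5, "idle")]

context_a = build_policy_input(rollout_a, Frame(3, "now"), opened_something)
context_b = build_policy_input(rollout_b, Frame(6, "now"), opened_something)

labels_a = [frame.label for frame in context_a]
labels_b = [frame.label for frame in context_b]
```

In this toy setup both rollouts reduce to the same sequence of labels even though their raw histories differ, which is the sense in which keyframe conditioning shrinks the space of histories the policy has to cover. A real system would pass images rather than string labels and would likely encode timing separately.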
Source: arXiv: 2602.15010