Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics
1️⃣ One-sentence summary
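This paper proposes Ambient Diffusion Policy, a simple and principled method for robot imitation learning. By using suboptimal data only at the high-noise and low-noise stages of the diffusion process, it extracts the useful features of that data while filtering out the harmful ones. This makes low-quality and task-mismatched demonstrations considerably more usable, and the method outperforms existing co-training approaches across four types of suboptimal data and six tasks.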
We propose Ambient Diffusion Policy, a simple and principled method for imitation learning from suboptimal data in robotics. High-quality, task-specific robot data is expensive and time-consuming to collect, while suboptimal datasets with lower-quality or out-of-distribution demonstrations are abundant. Existing methods that co-train on both data sources in robotics often fail to separate the meaningful and the harmful features in the suboptimal samples. In contrast, our method extracts only the useful features by introducing a new axis to co-training in robotics: noise-dependent data usage. Ambient Diffusion Policy restricts the contribution of suboptimal data during training to only the high and low diffusion times. To rigorously justify our approach, we first observe that robot action data exhibits a spectral power law. This induces two important properties on the optimal Diffusion Policy that we exploit: a global-to-local hierarchy and locality. We theoretically formalize this discussion using a simplified model. Our experiments validate Ambient Diffusion Policy on four types of suboptimal action data (noisy trajectories, sim-to-real gap, task mismatch, and large-scale data mixtures) across six tasks. The results show that it effectively learns from arbitrary sources of suboptimal data. Notably, it outperforms existing co-training baselines by up to 33% when scaled to Open X-Embodiment - a large dataset with heterogeneous data quality and unstructured distribution shifts. Overall, Ambient Diffusion Policy increases the utility of suboptimal demonstrations and expands the set of usable data sources in robotics.
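The noise-dependent data usage described in the abstract can be illustrated with a small gating sketch. This is not the authors' implementation: the function names, the thresholds `t_low` and `t_high`, and the convention that diffusion time runs from 0 (clean) to 1 (pure noise) are all assumptions made for illustration. The sketch only shows the idea of letting clean demonstrations contribute at every diffusion time while suboptimal ones contribute only at low or high times.

```python
def ambient_loss_mask(t, is_suboptimal, t_low=0.2, t_high=0.6):
    """Return a 0/1 weight per sample for the denoising loss.

    t: diffusion times in [0, 1]; 0 = clean, 1 = pure noise (assumed convention).
    is_suboptimal: True for samples drawn from the suboptimal dataset.
    t_low, t_high: hypothetical thresholds, not values from the paper.
    """
    mask = []
    for ti, sub in zip(t, is_suboptimal):
        # Suboptimal samples are allowed only at low or high diffusion times.
        in_allowed_band = ti <= t_low or ti >= t_high
        mask.append(1.0 if (not sub or in_allowed_band) else 0.0)
    return mask


def masked_denoising_loss(per_sample_loss, t, is_suboptimal, t_low=0.2, t_high=0.6):
    """Average the per-sample losses over the samples that the mask keeps."""
    mask = ambient_loss_mask(t, is_suboptimal, t_low, t_high)
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(m * l for m, l in zip(mask, per_sample_loss)) / kept


# Three suboptimal samples at low, intermediate, and high diffusion times,
# plus one clean sample at an intermediate time.
times = [0.1, 0.4, 0.9, 0.4]
suboptimal = [True, True, True, False]
losses = [1.0, 4.0, 9.0, 16.0]

mask = ambient_loss_mask(times, suboptimal)
loss = masked_denoising_loss(losses, times, suboptimal)
```

In this toy batch, only the suboptimal sample at the intermediate time is dropped; the clean sample at the same time still contributes. In a real training loop the per-sample losses would come from the policy's denoising objective, and the thresholds would have to be chosen for the data at hand.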
Source: arXiv: 2606.12365