Flow-Enabled Generalization to Human Demonstrations in Few-Shot Imitation Learning
1️⃣ One-Sentence Summary
This paper proposes a new method called SFCrP, which predicts scene flow and combines it with cropped point clouds to guide the robot policy, enabling a robot to learn from only a few of its own demonstrations and generalize to new task scenarios that appear only in human videos, thereby significantly reducing imitation learning's dependence on large amounts of robot demonstration data.
Imitation Learning (IL) enables robots to learn complex skills from demonstrations without explicit task modeling, but it typically requires large numbers of demonstrations, incurring significant collection costs. Prior work has investigated flow as an intermediate representation that allows human videos to substitute for robot data, thereby reducing the number of robot demonstrations required. However, most prior work defines flow either on the object or on specific points of the robot or hand, which cannot describe the motion of the interaction between them. Moreover, relying on flow alone to generalize to scenarios observed only in human videos remains limited, since flow cannot capture precise motion details. Conversely, conditioning on scene observations to produce precise actions may cause a flow-conditioned policy to overfit to training tasks and weaken the generalization the flow provides. To address these gaps, we propose SFCrP, which comprises a Scene Flow prediction model for Cross-embodiment learning (SFCr) and a Flow and Cropped point cloud conditioned Policy (FCrP). SFCr learns from both robot and human videos and predicts trajectories for any point in the scene. FCrP follows the coarse motion indicated by the flow and adjusts actions based on observations for precision tasks. Our method outperforms state-of-the-art baselines across various real-world task settings, while also exhibiting strong spatial and instance generalization to scenarios seen only in human videos.
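To make the two-stage design concrete, below is a minimal PyTorch-style sketch of the pipeline the abstract describes: a scene-flow predictor produces per-point trajectories, a heuristic crops the point cloud around the predicted motion, and a policy conditions on both the flow and the crop. The per-point MLP backbones, the cropping heuristic, and all shapes and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the SFCr -> crop -> FCrP pipeline. Every module
# body, shape, and heuristic here is an assumption for illustration only.
import torch
import torch.nn as nn


class SFCr(nn.Module):
    """Scene-flow predictor: maps a scene point cloud to per-point
    future trajectories ("any point" flow over a short horizon)."""

    def __init__(self, horizon: int = 8, hidden: int = 128):
        super().__init__()
        self.horizon = horizon
        # A per-point MLP stands in for whatever point-cloud backbone
        # the paper actually uses (an assumption, not the real model).
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * 3),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) -> flow: (N, horizon, 3) displacements per step
        return self.net(points).view(-1, self.horizon, 3)


class FCrP(nn.Module):
    """Flow- and cropped-point-cloud-conditioned policy: follows the
    coarse flow and refines the action from local geometry."""

    def __init__(self, horizon: int = 8, hidden: int = 128,
                 action_dim: int = 7):
        super().__init__()
        self.flow_enc = nn.Sequential(nn.Linear(horizon * 3, hidden), nn.ReLU())
        self.crop_enc = nn.Sequential(nn.Linear(3, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, action_dim)

    def forward(self, flow: torch.Tensor, crop: torch.Tensor) -> torch.Tensor:
        # Max-pool point-wise features into global descriptors, then
        # decode one action (e.g., end-effector delta + gripper command).
        f = self.flow_enc(flow.flatten(1)).max(dim=0).values
        c = self.crop_enc(crop).max(dim=0).values
        return self.head(torch.cat([f, c]))


def crop_around_flow(points: torch.Tensor, flow: torch.Tensor,
                     radius: float = 1.0) -> torch.Tensor:
    # Assumed heuristic: keep points near the region the flow predicts
    # will move, so the policy sees only task-relevant local geometry.
    moving = flow.norm(dim=-1).sum(dim=-1) > 1e-3  # (N,) moving-point mask
    center = points[moving].mean(dim=0)
    return points[(points - center).norm(dim=-1) < radius]


if __name__ == "__main__":
    pts = torch.randn(1024, 3)          # one observed scene point cloud
    sfcr, fcrp = SFCr(), FCrP()
    flow = sfcr(pts)                    # (1024, 8, 3) predicted trajectories
    crop = crop_around_flow(pts, flow)  # local points near predicted motion
    crop_flow = sfcr(crop)              # flow restricted to the crop
    action = fcrp(crop_flow, crop)      # (7,) action for the controller
    print(action.shape)
```

The split mirrors the abstract's argument: the flow stage carries the cross-embodiment generalization learned from human videos, while the policy stage, seeing only a crop rather than the full scene, is less able to overfit to training-task appearance.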
Source: arXiv 2602.10594