arXiv submission date: 2026-02-09
📄 Abstract - Dexterous Manipulation Policies from RGB Human Videos via 4D Hand-Object Trajectory Reconstruction

Multi-finger robotic hand manipulation and grasping are challenging due to the high-dimensional action space and the difficulty of acquiring large-scale training data. Existing approaches largely rely on human teleoperation with wearable devices or specialized sensing equipment to capture hand-object interactions, which limits scalability. In this work, we propose VIDEOMANIP, a device-free framework that learns dexterous manipulation directly from RGB human videos. Leveraging recent advances in computer vision, VIDEOMANIP reconstructs explicit 4D robot-object trajectories from monocular videos by estimating human hand poses and object meshes, and retargets the reconstructed human motions to robotic hands for manipulation learning. To make the reconstructed robot data suitable for dexterous manipulation training, we introduce hand-object contact optimization with interaction-centric grasp modeling, as well as a demonstration synthesis strategy that generates diverse training trajectories from a single video, enabling generalizable policy learning without additional robot demonstrations. In simulation, the learned grasping model achieves a 70.25% success rate across 20 diverse objects using the Inspire Hand. In the real world, manipulation policies trained from RGB videos achieve an average 62.86% success rate across seven tasks using the LEAP Hand, outperforming retargeting-based methods by 15.87%. Project videos are available at this http URL.
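The abstract describes a staged pipeline: reconstruct 4D hand-object trajectories from monocular RGB video, retarget them to a robot hand, refine hand-object contacts, and synthesize additional demonstrations for policy learning. The sketch below only illustrates that staged structure under stated assumptions; it is not the authors' implementation, and every function name, placeholder body, and data shape here (e.g. `reconstruct_4d_trajectory`, a 51-D hand parameter vector, 16 robot joints) is hypothetical.

```python
# Minimal, hypothetical sketch of a VIDEOMANIP-style pipeline (not the authors' code).
# Stage names follow the abstract; every function body below is a placeholder.
from dataclasses import dataclass
import numpy as np


@dataclass
class FrameRecon:
    hand_pose: np.ndarray    # per-frame human hand parameters (assumed 51-D)
    object_pose: np.ndarray  # 6-DoF object pose (assumed [x, y, z, roll, pitch, yaw])


def reconstruct_4d_trajectory(video_frames):
    """Estimate per-frame hand pose and object pose from monocular RGB frames."""
    # Placeholder: the real system would run hand-pose and object-mesh estimators here.
    return [FrameRecon(hand_pose=np.zeros(51), object_pose=np.zeros(6))
            for _ in video_frames]


def retarget_to_robot_hand(recon, num_robot_joints=16):
    """Map human hand poses to robot hand joint targets (e.g. LEAP or Inspire Hand)."""
    # Placeholder mapping: real retargeting solves for joint angles matching fingertips.
    return [np.zeros(num_robot_joints) for _ in recon]


def optimize_contacts(joint_traj, recon):
    """Refine joint targets so fingertips respect hand-object contact constraints."""
    # Placeholder: the paper describes contact optimization with grasp modeling.
    return joint_traj


def synthesize_demonstrations(joint_traj, recon, num_variants=10, noise=0.02):
    """Generate diverse training trajectories from a single reconstructed demo."""
    rng = np.random.default_rng(0)
    return [[q + rng.normal(0.0, noise, size=q.shape) for q in joint_traj]
            for _ in range(num_variants)]


def build_training_data(video_frames):
    recon = reconstruct_4d_trajectory(video_frames)
    traj = retarget_to_robot_hand(recon)
    traj = optimize_contacts(traj, recon)
    return synthesize_demonstrations(traj, recon)


if __name__ == "__main__":
    fake_video = [None] * 100  # stand-in for 100 RGB frames
    demos = build_training_data(fake_video)
    print(len(demos), "synthesized demonstrations of", len(demos[0]), "steps each")
```

The point of the sketch is the data flow: a single RGB video yields one reconstructed trajectory, and the synthesis stage expands it into many training trajectories before any policy learning happens.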

Top-level tags: robotics, computer vision, multi-modal
Detailed tags: dexterous manipulation, motion retargeting, human video learning, hand-object reconstruction, policy learning

Dexterous Manipulation Policies from RGB Human Videos via 4D Hand-Object Trajectory Reconstruction


1️⃣ One-Sentence Summary

This paper proposes a new method, VIDEOMANIP, that learns dexterous robot manipulation policies directly from ordinary RGB videos of humans, without relying on complex wearable devices or specialized sensors, thereby lowering the cost of data acquisition and improving policy generalization.

Source: arXiv 2602.09013