PAWS: Perception of Articulation in the Wild at Scale from Egocentric Videos
1️⃣ One-Sentence Summary
This paper proposes a new method called PAWS that learns and extracts the motion and structure of articulated objects (e.g., drawers and cabinet doors) directly from massive, unannotated egocentric videos by analyzing hand-object interactions. It removes the bottleneck of prior methods that depend on large amounts of manually annotated data, and demonstrates practical value in downstream tasks such as robot manipulation.
Articulation perception aims to recover the motion and structure of articulated objects (e.g., drawers and cupboards), and is fundamental to 3D scene understanding in robotics, simulation, and animation. Existing learning-based methods rely heavily on supervised training with high-quality 3D data and manual annotations, limiting scalability and diversity. To address this limitation, we propose PAWS, a method that directly extracts object articulations from hand-object interactions in large-scale in-the-wild egocentric videos. We evaluate our method on public datasets, including HD-EPIC and Arti4D, achieving significant improvements over baselines. We further demonstrate that the extracted articulations benefit downstream tasks, including fine-tuning 3D articulation prediction models and enabling robot manipulation. See the project website at this https URL.
Source: arXiv: 2603.25539