Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions
1️⃣ One-sentence summary
This paper introduces GraG, a new method that can quickly and stably reconstruct the dynamic 3D interaction between a human hand and an object from a video shot with an ordinary phone. Its core idea is a lightweight "Sum-of-Gaussians" representation for efficient motion tracking, making it over 6x faster than prior methods while also improving accuracy.
We present Grasp in Gaussians (GraG), a fast and robust method for reconstructing dynamic 3D hand-object interactions from a single monocular video. Unlike recent approaches that optimize heavy neural representations, our method focuses on tracking the hand and the object efficiently once initialized from pretrained large models. Our key insight is that accurate and temporally stable hand-object motion can be recovered using a compact Sum-of-Gaussians (SoG) representation, revived from the classical tracking literature and integrated with generative Gaussian-based initializations. We initialize object pose and geometry using a video-adapted SAM3D pipeline, then convert the resulting dense Gaussian representation into a lightweight SoG via subsampling. This compact representation enables efficient and fast tracking while preserving geometric fidelity. For the hand, we adopt a complementary strategy: starting from off-the-shelf monocular hand pose initialization, we refine hand motion using simple yet effective 2D joint and depth alignment losses, avoiding per-frame refinement of a detailed 3D hand appearance model while maintaining stable articulation. Extensive experiments on public benchmarks demonstrate that GraG reconstructs temporally coherent hand-object interactions on long sequences 6.4x faster than prior work, while improving object reconstruction by 13.4% and reducing the hand's per-joint position error by over 65%.
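The dense-to-compact conversion described in the abstract (subsampling a dense Gaussian set into a lightweight SoG) can be sketched roughly as follows. This is an illustrative sketch only: the array layout, the weight-proportional sampling rule, and the `subsample_to_sog` name are assumptions for exposition, not the paper's exact procedure.

```python
import numpy as np

def subsample_to_sog(means, covs, weights, k, seed=0):
    """Subsample a dense Gaussian mixture into a compact Sum-of-Gaussians.

    means:   (N, 3) Gaussian centers
    covs:    (N, 3, 3) covariance matrices
    weights: (N,) non-negative mixing weights
    k:       target number of Gaussians in the compact SoG

    NOTE: illustrative sketch; the paper's actual subsampling
    strategy may differ.
    """
    rng = np.random.default_rng(seed)
    # Sample proportionally to weight so high-mass Gaussians are kept
    # more often, preserving the coarse shape of the mixture.
    p = weights / weights.sum()
    idx = rng.choice(len(means), size=k, replace=False, p=p)
    sub_w = weights[idx]
    # Renormalize so the compact mixture carries the same total mass
    # as the dense one.
    sub_w = sub_w * (weights.sum() / sub_w.sum())
    return means[idx], covs[idx], sub_w
```

Tracking against the compact SoG then only has to evaluate k Gaussians per step instead of the full dense set, which is where the speedup over heavy per-frame neural optimization would come from.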
Source: arXiv:2604.12929