Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation
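1️⃣ One-sentence summary
This paper proposes a hybrid visual representation that combines point-cloud latents with a point-wise variational autoencoder to learn compact latent points that retain coarse shape and semantic information while carrying structural priors; it also designs a lightweight rendering pipeline, and it significantly improves success rate, sample efficiency, and robustness on robotic manipulation tasks.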
Current 3D-aware pretraining methods for embodied perception and manipulation are largely built on differentiable rendering frameworks, producing either fully implicit neural fields or fully explicit geometric primitives. Implicit representations are expressive but lack explicit structural cues, whereas explicit ones preserve geometry but suffer from resolution limits and weak generalization. To address these limitations, we propose a novel pretraining framework that learns a hybrid representation: structural latent points. Specifically, we insert a point-wise latent variational autoencoder into the latent space of a point-cloud autoencoder, jointly regularizing point-wise features and coordinates toward a Gaussian prior. The resulting compact latent preserves coarse structural tendencies: it does not encode precise geometry, but it captures rough shape and semantic information, effectively combining the expressiveness of implicit representations with the structural priors of explicit ones. In addition, informed by shared design choices in prior work, we develop a streamlined 3DGS-based rendering pipeline that is deliberately kept lightweight, improving efficiency while leaving greater representational capacity to the front-end latent module. Extensive evaluations on RLBench, ManiSkill2, and a real-robot platform demonstrate consistent gains in task success, sample efficiency, and robustness to viewpoint and scene variations over strong baselines. Ablation studies further confirm that each component of our framework contributes to overall performance.
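To make the "jointly regularizing point-wise features and coordinates toward a Gaussian prior" step concrete, the following is a minimal sketch of a per-point KL term and reparameterized sampling. It is an illustration under assumptions, not the paper's implementation: the dictionary layout (`xyz_mu`, `xyz_log_var`, `feat_mu`, `feat_log_var`), the diagonal-Gaussian posterior, the standard-normal prior, and all function names are hypothetical choices made here.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a single latent vector."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), independently per dimension."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def structural_latent_kl(latent_points):
    """Mean KL over all latent points.

    Assumption: each point carries a posterior over its 3-D coordinate and
    over its feature vector; both are concatenated so that coordinates and
    features are regularized jointly toward the same Gaussian prior.
    """
    total = 0.0
    for p in latent_points:
        mu = p["xyz_mu"] + p["feat_mu"]
        log_var = p["xyz_log_var"] + p["feat_log_var"]
        total += kl_to_standard_normal(mu, log_var)
    return total / len(latent_points)


def sample_latent_points(latent_points, rng):
    """Sample one (xyz, feature) pair per latent point via reparameterization."""
    return [
        (
            reparameterize(p["xyz_mu"], p["xyz_log_var"], rng),
            reparameterize(p["feat_mu"], p["feat_log_var"], rng),
        )
        for p in latent_points
    ]


# Toy input: two latent points with 3-D coordinates and 2-D features.
points = [
    {"xyz_mu": [0.0, 0.0, 0.0], "xyz_log_var": [0.0, 0.0, 0.0],
     "feat_mu": [0.0, 0.0], "feat_log_var": [0.0, 0.0]},
    {"xyz_mu": [1.0, 0.0, 0.0], "xyz_log_var": [0.0, 0.0, 0.0],
     "feat_mu": [0.0, 1.0], "feat_log_var": [0.0, 0.0]},
]

kl = structural_latent_kl(points)  # 0.5 for this toy input
samples = sample_latent_points(points, random.Random(0))
```

In a training loop, a term like `kl` would typically be added to the reconstruction or rendering loss with a weighting coefficient; the abstract does not state the weighting or the latent dimensions, so none is assumed here.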
Source: arXiv: 2605.21258