arXiv submission date: 2026-03-26
📄 Abstract - Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds

Recent advances in self-supervised learning (SSL) for point clouds have substantially improved 3D scene understanding without human annotations. Existing approaches emphasize semantic awareness by enforcing feature consistency across augmented views or by masked scene modeling. However, the resulting representations transfer poorly to instance localization, and often require full finetuning for strong performance. Instance awareness is a fundamental component of 3D perception, so bridging this gap is crucial for progressing toward true 3D foundation models that support all downstream tasks on 3D data. In this work, we introduce PointINS, an instance-oriented self-supervised framework that enriches point cloud representations through geometry-aware learning. PointINS employs an orthogonal offset branch to jointly learn high-level semantic understanding and geometric reasoning, yielding instance awareness. We identify two consistent properties essential for robust instance localization and formulate them as complementary regularization strategies: Offset Distribution Regularization (ODR), which aligns predicted offsets with empirically observed geometric priors, and Spatial Clustering Regularization (SCR), which enforces local coherence by regularizing offsets with pseudo-instance masks. Through extensive experiments across five datasets, PointINS achieves on average +3.5% mAP improvement for indoor instance segmentation and +4.1% PQ gain for outdoor panoptic segmentation, paving the way for scalable 3D foundation models.
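The abstract does not spell out how the offset branch or its regularizers are computed. As a rough illustration of the general idea behind regularizing per-point offsets with pseudo-instance masks, the sketch below uses a common formulation from supervised 3D instance segmentation: each point's target offset is the vector to the centroid of its (pseudo-)instance, and predictions are penalized with an L1 loss. This is an assumption for illustration only, not PointINS's actual SCR or ODR; the function names `centroid_offset_targets` and `offset_l1_loss` are hypothetical.

```python
import numpy as np


def centroid_offset_targets(points, instance_ids):
    """Return, for each point, the vector from the point to the centroid
    of the (pseudo-)instance it belongs to.

    Assumption: centroid-directed offsets are used as the target; the
    source abstract does not confirm this choice.
    """
    points = np.asarray(points, dtype=np.float64)
    instance_ids = np.asarray(instance_ids)

    # Map arbitrary instance ids to contiguous indices 0..K-1.
    _, inverse = np.unique(instance_ids, return_inverse=True)
    inverse = inverse.reshape(-1)
    num_instances = int(inverse.max()) + 1

    # Accumulate per-instance coordinate sums and point counts.
    sums = np.zeros((num_instances, points.shape[1]))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=num_instances).astype(np.float64)

    centroids = sums / counts[:, None]
    return centroids[inverse] - points


def offset_l1_loss(pred_offsets, points, instance_ids):
    """Mean per-point L1 distance between predicted and target offsets."""
    targets = centroid_offset_targets(points, instance_ids)
    diff = np.abs(np.asarray(pred_offsets, dtype=np.float64) - targets)
    return float(diff.sum(axis=1).mean())


if __name__ == "__main__":
    # Two points in pseudo-instance 7, one point in pseudo-instance 3.
    pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
    ids = np.array([7, 7, 3])
    print(centroid_offset_targets(pts, ids))
    print(offset_l1_loss(np.zeros((3, 3)), pts, ids))
```

Under this formulation, points shifted by their offsets collapse toward one location per instance, which is what makes a simple spatial clustering step sufficient to separate instances. In a self-supervised setting the `instance_ids` would come from pseudo-instance masks rather than annotations.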

Top-level tags: computer vision, model training, machine learning
Detailed tags: 3D scene understanding, self-supervised learning, point clouds, instance segmentation, geometric reasoning

Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds


1️⃣ One-sentence summary

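This paper proposes PointINS, a self-supervised learning framework that uses geometry-aware learning so that point cloud models not only understand object categories but can also automatically identify and localize individual object instances, marking a key step toward general-purpose 3D foundation models.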

Source: arXiv: 2603.25165