arXiv submission date: 2026-03-11
📄 Abstract - Pointy - A Lightweight Transformer for Point Cloud Foundation Models

Foundation models for point cloud data have recently grown in capability, often leveraging extensive representation learning from language or vision. In this work, we take a more controlled approach by introducing a lightweight transformer-based point cloud architecture. In contrast to the heavy reliance on cross-modal supervision, our model is trained on only 39k point clouds, yet it outperforms several larger foundation models trained on over 200k samples. Interestingly, our method approaches the state-of-the-art results of models that have seen over a million point clouds, images, and text samples, demonstrating the value of a carefully curated training setup and architecture. To ensure rigorous evaluation, we conduct a comprehensive replication study that standardizes the training regime and benchmarks across multiple point cloud architectures. This unified experimental framework isolates the impact of architectural choices, allowing for transparent comparisons and highlighting the benefits of our design and other tokenizer-free architectures. Our results show that simple backbones can deliver results competitive with more complex or data-rich strategies. The implementation, including code, pre-trained models, and training protocols, is available at this https URL.
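The abstract describes a "tokenizer-free" design, i.e. one that feeds raw points to a transformer without a separate patch-grouping or tokenization stage. The paper's actual architecture is not given here, so the following is only a minimal, hypothetical sketch of that general idea: each raw xyz point is linearly embedded and mixed by a single self-attention layer, then pooled into a global feature. All names, sizes, and the one-layer depth are illustrative assumptions, not Pointy itself.

```python
import numpy as np

# Illustrative sketch of a "tokenizer-free" point-cloud transformer:
# raw 3D points are embedded directly (no tokenizer/grouping stage)
# and processed by self-attention. Hypothetical, not the paper's code.

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the n points.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

def pointy_like_forward(points, d=32):
    """points: (n, 3) raw xyz coordinates -> (d,) global shape feature."""
    w_embed = 0.1 * rng.standard_normal((3, d))            # per-point linear embedding
    wq, wk, wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
    x = points @ w_embed                                   # (n, d); no tokenizer stage
    x = x + self_attention(x, wq, wk, wv)                  # residual attention block
    return x.mean(axis=0)                                  # permutation-invariant pooling
```

Because the per-point embedding and the mean pooling are both permutation-invariant, the output does not depend on the order of the input points, which is the key property any point-cloud backbone, tokenized or not, must preserve.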

Top-level tags: computer vision, model training, model evaluation
Detailed tags: point cloud, transformer, lightweight architecture, foundation models, benchmarking

Pointy - A Lightweight Transformer for Point Cloud Foundation Models


1️⃣ One-sentence summary

This paper introduces Pointy, a lightweight transformer architecture that, despite being trained on only a small amount of point cloud data, outperforms many large foundation models trained on massive multimodal datasets, showing that a carefully designed architecture and training scheme can be more effective than simply scaling up data.

Source: arXiv:2603.10963