LitePT: Lighter Yet Stronger Point Transformer
1️⃣ One-sentence summary
This paper proposes a new model for 3D point cloud processing. It applies convolutions in the shallow layers of the network to extract geometric detail and attention in the deep layers to capture semantic context, and introduces a training-free positional encoding to preserve spatial structure. The result is a large reduction in parameter count, runtime, and memory consumption, while matching or even surpassing the performance of current state-of-the-art models.
Modern neural architectures for 3D point cloud processing contain both convolutional layers and attention blocks, but the best way to assemble them remains unclear. We analyse the role of different computational blocks in 3D point cloud networks and find an intuitive behaviour: convolution is adequate for extracting low-level geometry at high resolution in early layers, where attention is expensive and brings no benefit; attention captures high-level semantics and context more efficiently in low-resolution, deep layers. Guided by this design principle, we propose a new, improved 3D point cloud backbone that employs convolutions in early stages and switches to attention for deeper layers. To avoid losing spatial layout information when discarding redundant convolution layers, we introduce a novel, training-free 3D positional encoding, PointROPE. The resulting LitePT model has $3.6\times$ fewer parameters, runs $2\times$ faster, and uses $2\times$ less memory than the state-of-the-art Point Transformer V3, yet matches or even outperforms it across a range of tasks and datasets. Code and models are available at: this https URL.
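The abstract describes PointROPE only as a training-free 3D positional encoding; the paper's exact formulation is not given here. A natural way to realize such an encoding is to extend rotary position embeddings (RoPE) from 1D token indices to 3D coordinates: split the channels into three groups and rotate channel pairs in each group by angles proportional to the point's x, y, or z coordinate. The sketch below illustrates this idea under that assumption; the function name `rope_3d` and all details are hypothetical, not taken from the paper.

```python
import numpy as np

def rope_3d(feats, coords, base=10000.0):
    """Hypothetical 3D rotary positional encoding sketch (PointROPE-style;
    details assumed, not from the paper). No learned parameters: channel
    pairs are rotated by angles proportional to each point's coordinate,
    so dot products between encoded features depend only on relative
    positions.

    feats:  (N, C) point features, C divisible by 6
    coords: (N, 3) point coordinates
    """
    N, C = feats.shape
    assert C % 6 == 0, "need an even number of channel pairs per axis"
    d = C // 3                                    # channels per axis group
    out = np.empty((N, C), dtype=np.float64)
    for a in range(3):                            # one rotation group per axis
        f = feats[:, a * d:(a + 1) * d].astype(np.float64)
        half = d // 2
        # log-spaced frequencies, as in standard 1D RoPE
        freqs = base ** (-np.arange(half) / half)          # (half,)
        ang = coords[:, a:a + 1] * freqs[None, :]          # (N, half)
        cos, sin = np.cos(ang), np.sin(ang)
        f1, f2 = f[:, :half], f[:, half:]
        # 2D rotation applied to each (f1_j, f2_j) channel pair
        out[:, a * d:(a + 1) * d] = np.concatenate(
            [f1 * cos - f2 * sin, f1 * sin + f2 * cos], axis=1)
    return out
```

Because each step is a pure rotation, the encoding preserves feature norms, and inner products between two encoded points are invariant to a global translation of the scene, which is the property that lets attention recover spatial layout without any trained embedding.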
Source: arXiv:2512.13689