FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching
1️⃣ One-Sentence Summary
This paper proposes a new method called FlashFPS, which identifies and eliminates three levels of computational redundancy in the Farthest Point Sampling operation of point cloud neural networks, achieving significant speedups and making large-scale point cloud processing faster and more efficient.
Point-based Neural Networks (PNNs) have become a key approach for point cloud processing. However, a core operation in these models, Farthest Point Sampling (FPS), often introduces significant inference latency, especially for large-scale point clouds. Despite existing CUDA- and hardware-level optimizations, FPS remains a major bottleneck due to exhaustive computations across multiple network layers in PNNs, which hinders scalability. Through systematic analysis, we identify three substantial redundancies in FPS, including unnecessary full-cloud computations, redundant late-stage iterations, and predictable inter-layer outputs that make later FPS computations avoidable. To address these, we propose \textbf{\textit{FlashFPS}}, a hardware-agnostic, plug-and-play framework for FPS acceleration, composed of \textit{FPS-Prune} and \textit{FPS-Cache}. \textit{FPS-Prune} introduces candidate pruning and iteration pruning to reduce redundant computations in FPS while preserving sampling quality, and \textit{FPS-Cache} eliminates layer-wise redundancy via cache-and-reuse. Integrated into existing CUDA libraries and state-of-the-art PNN accelerators, \textit{FlashFPS} achieves 5.16$\times$ speedup over the standard CUDA baseline on GPU and 2.69$\times$ on PNN accelerators, with negligible accuracy loss, enabling efficient and scalable PNN inference. Code is released at this https URL.
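For reference, the standard FPS operation that the abstract describes as the bottleneck can be sketched as follows. This is a minimal NumPy baseline illustrating why each sampling iteration scans the entire cloud (the "exhaustive computation" the paper targets); it is not the paper's implementation, and the function and parameter names are illustrative:

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Standard FPS baseline: iteratively pick the point farthest
    from the already-selected set. points has shape (n, 3)."""
    n = points.shape[0]
    rng = np.random.default_rng(seed)
    selected = np.empty(k, dtype=np.int64)
    selected[0] = rng.integers(n)
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for j in range(1, k):
        # Every iteration touches all n points -- O(n*k) total work.
        # This full-cloud scan is what candidate/iteration pruning
        # and layer-wise caching aim to reduce.
        selected[j] = int(np.argmax(min_dist))
        new_dist = np.linalg.norm(points - points[selected[j]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected
```

The O(n·k) distance updates per network layer explain why FPS dominates latency at scale, and why the three redundancies identified above (full-cloud computation, late-stage iterations, and repeated per-layer sampling) are natural targets for pruning and caching.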
Source: arXiv: 2604.17720