arXiv submission date: 2026-02-04
📄 Abstract - Partial Ring Scan: Revisiting Scan Order in Vision State Space Models

State Space Models (SSMs) have emerged as efficient alternatives to attention for vision tasks, offering linear-time sequence processing with competitive accuracy. Vision SSMs, however, require serializing 2D images into 1D token sequences along a predefined scan order, a factor often overlooked. We show that scan order critically affects performance by altering spatial adjacency, fracturing object continuity, and amplifying degradation under geometric transformations such as rotation. We present Partial RIng Scan Mamba (PRISMamba), a rotation-robust traversal that partitions an image into concentric rings, performs order-agnostic aggregation within each ring, and propagates context across rings through a set of short radial SSMs. Efficiency is further improved via partial channel filtering, which routes only the most informative channels through the recurrent ring pathway while keeping the rest on a lightweight residual branch. On ImageNet-1K, PRISMamba achieves 84.5% Top-1 accuracy with 3.9G FLOPs and 3,054 img/s on an A100, outperforming VMamba in both accuracy and throughput while requiring fewer FLOPs. It also maintains performance under rotation, whereas fixed-path scans drop by 1-2%. These results highlight scan-order design, together with channel filtering, as a crucial, underexplored factor for accuracy, efficiency, and rotation robustness in Vision SSMs. Code will be released upon acceptance.
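The ring idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (the code is not yet released); it is a toy under stated assumptions. The rings are square (border-distance) rings rather than whatever ring geometry the paper uses, the order-agnostic aggregation is plain mean pooling, and the "radial SSM" is replaced by a scalar-decay linear recurrence. The names `ring_indices`, `ring_aggregate`, `radial_scan`, and the `decay` parameter are all hypothetical.

```python
import numpy as np

def ring_indices(height, width):
    """Assign each pixel to a concentric square ring (0 = outermost).

    Assumption: rings are defined by distance to the nearest image border.
    """
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    return np.minimum(np.minimum(rows, height - 1 - rows),
                      np.minimum(cols, width - 1 - cols))

def ring_aggregate(features, rings):
    """Mean-pool features of shape (H, W, C) within each ring.

    The mean ignores token order inside a ring, so it is order-agnostic.
    Returns an array of shape (num_rings, C).
    """
    num_rings = int(rings.max()) + 1
    return np.stack([features[rings == r].mean(axis=0) for r in range(num_rings)])

def radial_scan(ring_tokens, decay=0.9):
    """Toy stand-in for a radial SSM: h_r = decay * h_{r-1} + x_r.

    Scans from the outermost ring to the innermost one.
    """
    state = np.zeros(ring_tokens.shape[1])
    outputs = []
    for token in ring_tokens:
        state = decay * state + token
        outputs.append(state.copy())
    return np.stack(outputs)

# Usage: an 8x8 feature map with 4 channels yields 4 ring tokens.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 4))
rings = ring_indices(8, 8)
tokens = ring_aggregate(feats, rings)
context = radial_scan(tokens)

# Rotating the square input by 90 degrees permutes pixels only within rings,
# so the pooled ring tokens are unchanged up to floating-point error.
rotated_tokens = ring_aggregate(np.rot90(feats, k=1, axes=(0, 1)), rings)
print(np.allclose(tokens, rotated_tokens))
```

In this toy setting the invariance holds exactly only for 90-degree rotations of a square grid; a fixed raster scan of the same rotated input would visit tokens in a different order and feed a different sequence to the recurrence.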

Top-level tags: computer vision, model training, systems
Detailed tags: state space models, scan order, rotation robustness, vision transformers, efficient architecture

Partial Ring Scan: Revisiting Scan Order in Vision State Space Models


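1️⃣ One-Sentence Summary

This paper proposes PRISMamba, a new method that partitions an image into concentric rings and aggregates the tokens within each ring in an order-agnostic way, which improves both accuracy and speed on image recognition and markedly strengthens robustness to geometric transformations such as image rotation.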

Source: arXiv: 2602.04170