SO3UFormer: Learning Intrinsic Spherical Features for Rotation-Robust Panoramic Segmentation
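1️⃣ One-sentence summary
This paper proposes SO3UFormer, a new model that learns spherical features that do not depend on a particular coordinate frame. It addresses the sharp performance drop that existing panoramic segmentation models suffer when the camera undergoes arbitrary rotations, and substantially improves robustness in real-world dynamic scenes.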
Panoramic semantic segmentation models are typically trained under a strict gravity-aligned assumption. However, real-world captures often deviate from this canonical orientation due to unconstrained camera motions, such as the rotational jitter of handheld devices or the dynamic attitude shifts of aerial platforms. This discrepancy causes standard spherical Transformers to overfit global latitude cues, leading to performance collapse under 3D reorientations. To address this, we introduce SO3UFormer, a rotation-robust architecture designed to learn intrinsic spherical features that are less sensitive to the underlying coordinate frame. Our approach rests on three geometric pillars: (1) an intrinsic feature formulation that decouples the representation from the gravity vector by removing absolute latitude encoding; (2) quadrature-consistent spherical attention that accounts for non-uniform sampling densities; and (3) a gauge-aware relative positional mechanism that encodes local angular geometry using tangent-plane projected angles and discrete gauge pooling, avoiding reliance on global axes. We further use index-based spherical resampling together with a logit-level SO(3)-consistency regularizer during training. To rigorously benchmark robustness, we introduce Pose35, a dataset variant of Stanford2D3D perturbed by random rotations within $\pm 35^\circ$. Under the extreme test of arbitrary full SO(3) rotations, existing SOTAs fail catastrophically: the baseline SphereUFormer drops from 67.53 mIoU to 25.26 mIoU. In contrast, SO3UFormer demonstrates remarkable stability, achieving 72.03 mIoU on Pose35 and retaining 70.67 mIoU under full SO(3) rotations.
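As an illustration of what "quadrature-consistent" attention could mean, here is a minimal NumPy sketch, not the authors' implementation. It assumes an equirectangular grid for simplicity (the abstract does not specify the sampling scheme), and the function names `equirect_area_weights` and `quadrature_attention` are hypothetical. The idea shown is that each key is weighted by the solid angle its sample covers, so densely sampled regions near the poles do not dominate the softmax.

```python
import numpy as np

def equirect_area_weights(n_lat, n_lon):
    """Solid-angle weight of each cell of an equirectangular grid.

    Assumption: cells are flattened row by row (latitude-major order).
    The weights sum to 4*pi, the area of the unit sphere.
    """
    # Latitude band edges from the north pole (+pi/2) to the south pole (-pi/2).
    edges = np.linspace(np.pi / 2, -np.pi / 2, n_lat + 1)
    # Exact solid angle of each latitude band.
    band = 2 * np.pi * (np.sin(edges[:-1]) - np.sin(edges[1:]))
    # Split each band evenly across its longitude cells.
    return np.repeat(band / n_lon, n_lon)

def quadrature_attention(q, k, v, w):
    """Softmax attention in which key j is weighted by quadrature weight w[j].

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v), w: (n_k,) with w > 0.
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])
    # Adding log(w_j) to the logit multiplies exp(logit) by w_j.
    logits = logits + np.log(w)[None, :]
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    p = p / p.sum(axis=-1, keepdims=True)
    return p @ v
```

With all-zero queries the attention scores are uniform, so the output reduces to the area-weighted mean of the values rather than a plain mean over samples. This is the discrete analogue of integrating over the sphere with a quadrature rule.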
Source: arXiv: 2602.22867