RI-Mamba: Rotation-Invariant Mamba for Robust Text-to-Shape Retrieval
1️⃣ One-sentence summary
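This paper proposes RI-Mamba, a new rotation-invariant model that extracts geometric features directly from 3D point clouds in arbitrary orientations. Combined with efficient cross-modal learning, it retrieves the matching 3D shape from a text description alone, accurately and robustly, in a large 3D model repository spanning more than 200 categories.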
3D assets have rapidly expanded in quantity and diversity due to the growing popularity of virtual reality and gaming. As a result, text-to-shape retrieval has become essential in facilitating intuitive search within large repositories. However, existing methods require canonical poses and support few object categories, limiting their real-world applicability where objects can belong to diverse classes and appear in random orientations. To address this challenge, we propose RI-Mamba, the first rotation-invariant state-space model for point clouds. RI-Mamba defines global and local reference frames to disentangle pose from geometry and uses Hilbert sorting to construct token sequences with meaningful geometric structure while maintaining rotation invariance. We further introduce a novel strategy to compute orientational embeddings and reintegrate them via feature-wise linear modulation, effectively recovering spatial context and enhancing model expressiveness. Our strategy is inherently compatible with state-space models and operates in linear time. To scale up retrieval, we adopt cross-modal contrastive learning with automated triplet generation, allowing training on diverse datasets without manual annotation. Extensive experiments demonstrate RI-Mamba's superior representational capacity and robustness, achieving state-of-the-art performance on the OmniObject3D benchmark across more than 200 object categories under arbitrary orientations. Our code will be made available at this https URL.
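As a rough illustration of how a global reference frame can separate pose from geometry, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how RI-Mamba builds its frames, so the PCA-based frame, the third-moment rule for fixing axis signs, and the names `canonicalize` and `random_rotation` are all assumptions made for this example.

```python
import numpy as np


def canonicalize(points: np.ndarray) -> np.ndarray:
    """Express an (N, 3) point cloud in a PCA-derived global frame.

    Assumption: a PCA frame stands in for the paper's (unspecified)
    global reference frame.
    """
    centered = points - points.mean(axis=0)
    # Eigenvectors of the 3x3 covariance matrix are the principal axes.
    cov = centered.T @ centered / len(centered)
    _, axes = np.linalg.eigh(cov)      # columns, ascending eigenvalue
    coords = centered @ axes           # coordinates in the PCA frame
    # Each eigenvector is only defined up to sign; fix the sign so the
    # third moment along every axis is non-negative (hypothetical rule).
    signs = np.sign((coords ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    return coords * signs


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Draw a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]             # turn a reflection into a rotation
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Skewed, anisotropic toy cloud so axes and signs are well defined.
    cloud = rng.exponential(size=(512, 3)) * np.array([3.0, 2.0, 1.0])
    rotated = cloud @ random_rotation(rng).T
    # Canonical coordinates agree regardless of the input orientation.
    print(np.allclose(canonicalize(cloud), canonicalize(rotated), atol=1e-8))
```

Since the canonical coordinates match up to floating-point error whatever the input orientation, anything computed from them, such as a space-filling-curve ordering of the points, inherits the same invariance. The sketch assumes distinct covariance eigenvalues and non-zero third moments; symmetric shapes would need a more careful disambiguation rule.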
Source: arXiv: 2602.11673