TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization
1️⃣ One-Sentence Summary
This paper proposes a new method called TrianguLang that localizes objects or parts in a 3D scene from a text description quickly and accurately, without requiring camera calibration. It maintains geometric consistency while running efficient feed-forward inference, providing a practical tool for applications such as robotics and augmented reality.
Localizing objects and parts from natural language in 3D space is essential for robotics, AR, and embodied AI, yet existing methods face a trade-off between the accuracy and geometric consistency of per-scene optimization and the efficiency of feed-forward inference. We present TrianguLang, a feed-forward framework for 3D localization that requires no camera calibration at inference. Unlike prior methods that treat views independently, we introduce Geometry-Aware Semantic Attention (GASA), which utilizes predicted geometry to gate cross-view feature correspondence, suppressing semantically plausible but geometrically inconsistent matches without requiring ground-truth poses. Validated on five benchmarks including ScanNet++ and uCO3D, TrianguLang achieves state-of-the-art feed-forward text-guided segmentation and localization, reducing user effort from $O(N)$ clicks to a single text query. The model processes each frame at 1008x1008 resolution in $\sim$57ms ($\sim$18 FPS) without optimization, enabling practical deployment for interactive robotics and AR applications. Code and checkpoints are available at this https URL.
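The key idea behind GASA, as the abstract describes it, is to use predicted geometry to gate cross-view semantic matches, so that pairs of features that look alike but disagree geometrically receive little attention weight. The sketch below is an illustrative reconstruction under stated assumptions, not the paper's actual implementation: the function name, the Gaussian distance gate, the multiplicative gating of cosine similarity, and the `tau` scale are all hypothetical choices made for clarity.

```python
import numpy as np

def geometry_gated_attention(feats_a, feats_b, pts_a, pts_b, tau=0.2):
    """Illustrative sketch of geometry-gated cross-view attention.

    feats_a: (N, D) token features from view A
    feats_b: (M, D) token features from view B
    pts_a:   (N, 3) predicted 3D points for view A tokens
    pts_b:   (M, 3) predicted 3D points for view B tokens
    tau:     hypothetical distance scale; larger 3D gaps are suppressed
    Returns (aggregated_features, attention_weights).
    """
    # Semantic affinity: cosine similarity between L2-normalized features.
    fa = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    fb = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = fa @ fb.T                                    # (N, M)

    # Geometric gate: soft penalty on the distance between predicted
    # 3D points; a match between far-apart points is down-weighted.
    dist = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    gate = np.exp(-(dist / tau) ** 2)                  # (N, M), in (0, 1]

    # Gated logits: semantically plausible but geometrically
    # inconsistent correspondences end up with small weight.
    logits = sim * gate
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # row-wise softmax
    return weights @ feats_b, weights
```

With two view-B tokens that are semantically identical but sit at different predicted 3D locations, the gate breaks the tie: the geometrically consistent token dominates the attention row, which is the "suppress semantically plausible but geometrically inconsistent matches" behavior the abstract describes.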
Source: arXiv:2603.08096