arXiv submission date: 2026-03-09
📄 Abstract - TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization

Localizing objects and parts from natural language in 3D space is essential for robotics, AR, and embodied AI, yet existing methods face a trade-off between the accuracy and geometric consistency of per-scene optimization and the efficiency of feed-forward inference. We present TrianguLang, a feed-forward framework for 3D localization that requires no camera calibration at inference. Unlike prior methods that treat views independently, we introduce Geometry-Aware Semantic Attention (GASA), which utilizes predicted geometry to gate cross-view feature correspondence, suppressing semantically plausible but geometrically inconsistent matches without requiring ground-truth poses. Validated on five benchmarks including ScanNet++ and uCO3D, TrianguLang achieves state-of-the-art feed-forward text-guided segmentation and localization, reducing user effort from $O(N)$ clicks to a single text query. The model processes each frame at 1008x1008 resolution in $\sim$57ms ($\sim$18 FPS) without optimization, enabling practical deployment for interactive robotics and AR applications. Code and checkpoints are available at this https URL.
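The abstract does not give implementation details for Geometry-Aware Semantic Attention (GASA), but the core idea it describes (using predicted geometry to gate cross-view feature correspondence, suppressing matches that are semantically similar yet geometrically inconsistent) can be sketched as follows. All names, shapes, and the Gaussian gating function here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def geometry_gated_attention(feat_a, feat_b, pts_a, pts_b, sigma=0.1):
    """Illustrative sketch of geometry-gated cross-view attention.

    feat_a: (N, D) token features from view A; feat_b: (M, D) from view B.
    pts_a:  (N, 3) predicted 3D points for view A tokens (model output,
            not ground-truth poses); pts_b: (M, 3) for view B.
    Semantic attention logits are combined with a geometric gate that
    decays with the distance between predicted 3D locations, so matches
    that look alike but disagree geometrically are suppressed.
    """
    # Semantic similarity logits (scaled dot product).
    logits = feat_a @ feat_b.T / np.sqrt(feat_a.shape[1])
    # Pairwise distances between predicted 3D points, shape (N, M).
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    # Geometric gate in (0, 1]: nearby points pass, distant points are damped.
    gate = np.exp(-(d / sigma) ** 2)
    # Apply the gate in log space, then take a numerically stable softmax.
    gated = logits + np.log(gate + 1e-9)
    w = np.exp(gated - gated.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)
```

With two view-B tokens that are semantically identical but at different predicted 3D positions, the gate concentrates attention on the geometrically consistent one, which is the behavior the abstract attributes to GASA.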

Top-level tags: computer vision robotics multi-modal
Detailed tags: 3d localization language grounding pose-free cross-view attention semantic segmentation

TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization


1️⃣ One-Sentence Summary

This paper presents TrianguLang, a method that localizes objects or parts in a 3D scene from a text description quickly and accurately, without requiring camera calibration. It maintains geometric consistency while running efficient feed-forward inference, offering a practical tool for applications such as robotics and augmented reality.

Source: arXiv 2603.08096