arXiv submission date: 2026-03-09
📄 Abstract - TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization

Localizing objects and parts from natural language in 3D space is essential for robotics, AR, and embodied AI, yet existing methods face a trade-off between the accuracy and geometric consistency of per-scene optimization and the efficiency of feed-forward inference. We present TrianguLang, a feed-forward framework for 3D localization that requires no camera calibration at inference. Unlike prior methods that treat views independently, we introduce Geometry-Aware Semantic Attention (GASA), which utilizes predicted geometry to gate cross-view feature correspondence, suppressing semantically plausible but geometrically inconsistent matches without requiring ground-truth poses. Validated on five benchmarks including ScanNet++ and uCO3D, TrianguLang achieves state-of-the-art feed-forward text-guided segmentation and localization, reducing user effort from $O(N)$ clicks to a single text query. The model processes each frame at 1008x1008 resolution in $\sim$57ms ($\sim$18 FPS) without optimization, enabling practical deployment for interactive robotics and AR applications. Code and checkpoints are available at this https URL.
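The abstract does not give implementation details for Geometry-Aware Semantic Attention (GASA), but the core idea it describes (using predicted geometry to gate cross-view feature correspondence, suppressing matches that are semantically similar yet geometrically inconsistent) can be sketched as follows. All names, shapes, and the Gaussian gating function here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def geometry_gated_attention(feat_a, feat_b, pts_a, pts_b, sigma=0.1):
    """Illustrative sketch of geometry-gated cross-view attention.

    feat_a: (N, D) token features from view A; feat_b: (M, D) from view B.
    pts_a:  (N, 3) predicted 3D points for view A tokens (model output,
            not ground-truth poses); pts_b: (M, 3) for view B.
    Semantic attention logits are combined with a geometric gate that
    decays with the distance between predicted 3D locations, so matches
    that look alike but disagree geometrically are suppressed.
    """
    # Semantic similarity logits (scaled dot product).
    logits = feat_a @ feat_b.T / np.sqrt(feat_a.shape[1])
    # Pairwise distances between predicted 3D points, shape (N, M).
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    # Geometric gate in (0, 1]: nearby points pass, distant points are damped.
    gate = np.exp(-(d / sigma) ** 2)
    # Apply the gate in log space, then take a numerically stable softmax.
    gated = logits + np.log(gate + 1e-9)
    w = np.exp(gated - gated.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)
```

With two view-B tokens that are semantically identical but at different predicted 3D positions, the gate concentrates attention on the geometrically consistent one, which is the behavior the abstract attributes to GASA.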

Top-level tags: computer vision robotics multi-modal
Detailed tags: 3d localization language grounding pose-free cross-view attention semantic segmentation

TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization


1️⃣ One-Sentence Summary

This paper presents TrianguLang, a method that localizes objects or parts in a 3D scene from a text description quickly and accurately, without requiring camera calibration. It maintains geometric consistency while running efficient feed-forward inference, offering a practical tool for applications such as robotics and augmented reality.

Source: arXiv 2603.08096