arXiv submission date: 2026-02-09
📄 Abstract - STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multimodal Robot Memory

Mobile robots are often deployed over long durations in diverse open, dynamic scenes, including indoor settings such as warehouses and manufacturing facilities, and outdoor settings such as agricultural and roadway operations. A core challenge is to build a scalable long-horizon memory that supports an agentic workflow for planning, retrieval, and reasoning over open-ended instructions at variable granularity, while producing precise, actionable answers for navigation. We present STaR, an agentic reasoning framework that (i) constructs a task-agnostic, multimodal long-term memory that generalizes to unseen queries while preserving fine-grained environmental semantics (object attributes, spatial relations, and dynamic events), and (ii) introduces a Scalable Task-Conditioned Retrieval algorithm based on the Information Bottleneck principle to extract from long-term memory a compact, non-redundant, information-rich set of candidate memories for contextual reasoning. We evaluate STaR on NaVQA (mixed indoor/outdoor campus scenes) and WH-VQA, a custom warehouse benchmark built with Isaac Sim that contains many visually similar objects and emphasizes contextual reasoning. Across the two datasets, STaR consistently outperforms strong baselines, achieving higher success rates and markedly lower spatial error. We further deploy STaR on a real Husky wheeled robot in both indoor and outdoor environments, demonstrating robust long-horizon reasoning, scalability, and practical utility. Project Website: this https URL

Top-level tags: robotics, agents, multi-modal
Detailed tags: robot memory, information retrieval, long-horizon reasoning, multimodal memory, task-conditioned retrieval

STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multimodal Robot Memory


1️⃣ One-Sentence Summary

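This paper proposes STaR, a robot memory and reasoning framework that builds a general-purpose multimodal long-term memory and combines it with a scalable retrieval algorithm based on the Information Bottleneck principle, enabling a robot in complex, changing environments to efficiently retrieve the memories relevant to a task instruction for precise navigation and decision-making; its advantage over baselines is validated in both simulation and real-world settings.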
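
To make the idea of a "compact, non-redundant, information-rich" candidate set more concrete, here is a minimal illustrative sketch. It is not the STaR algorithm: the abstract does not give the retrieval procedure, so the sketch swaps in a simple greedy relevance-minus-redundancy heuristic (similar in spirit to maximal marginal relevance) that only mirrors the trade-off the Information Bottleneck principle describes. The function names `cosine` and `select_memories`, the weight `beta`, and the toy embedding vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_memories(query, memories, k, beta=1.0):
    """Greedily pick up to k memory indices (hypothetical helper, not from the paper).

    Each step picks the memory with the highest score
        relevance to the query - beta * max similarity to already-picked memories,
    so near-duplicates of earlier picks are penalized.
    """
    selected = []
    remaining = list(range(len(memories)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, memories[i])
            redundancy = max(
                (cosine(memories[i], memories[j]) for j in selected),
                default=0.0,
            )
            return relevance - beta * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy embeddings (made up): memories 0 and 1 are near-duplicates of each other.
query = [1.0, 1.0, 0.0]
memories = [
    [1.0, 0.20, 0.0],
    [1.0, 0.25, 0.0],
    [0.1, 1.00, 0.0],
    [0.0, 0.00, 1.0],
]

with_penalty = select_memories(query, memories, k=2, beta=1.0)
without_penalty = select_memories(query, memories, k=2, beta=0.0)
print(with_penalty, without_penalty)
```

With the redundancy penalty switched on, the second pick skips the near-duplicate and takes a memory that covers a different part of the query; with `beta=0.0` the selection degenerates to plain top-k by relevance and returns both near-duplicates.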

Source: arXiv: 2602.09255