arXiv submission date: 2026-06-02
📄 Abstract - EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation

Building memory is essential for long-horizon planning in zero-shot embodied navigation. Detector-centric scene graphs often compress observations into sparse nodes, discarding fine-grained visual evidence and accumulating noise, while 3D reconstruction-based methods remain computationally prohibitive. We present EvoMemNav, an efficient, self-evolving, fine-grained memory framework for zero-shot embodied navigation. EvoMemNav constructs a Visual-Semantic Memory Graph (VSMGraph) that keeps raw views as first-class memory and organizes them with lightweight semantic cues and topological relations into a room-view-object hierarchy, preserving fine-grained details for disambiguation and Stop verification. To scale to growing memory, we introduce a budgeted coarse-to-fine policy: a coarse stage compresses the search space into promising regions, and a fine stage invokes a VLM only for targeted verification and decision. Beyond static memories, EvoMemNav performs reflection-driven write-back after each subtask, updating graph-attached priors that encode accumulated environmental knowledge to refine future decisions without retraining. Experiments on GOAT-Bench and HM3D across object, text-description, and image-goal modalities show consistent gains in SR/SPL, with better multi-instance disambiguation, fewer premature stops, and stronger zero-shot generalization.
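
The abstract describes the room-view-object hierarchy, the budgeted coarse-to-fine policy, and reflection-driven write-back only at a high level. The following is a minimal Python sketch of how those three ideas could fit together, not the authors' implementation. Every name in it (`ViewNode`, `RoomNode`, `coarse_score`, `coarse_to_fine`, `write_back`, the scalar `prior`, and the stubbed VLM callback) is a hypothetical stand-in chosen for illustration, and the scoring rule is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ViewNode:
    view_id: str
    image_path: str      # raw view kept as first-class memory (path is illustrative)
    objects: list[str]   # lightweight semantic cues attached to the view


@dataclass
class RoomNode:
    name: str
    views: list[ViewNode] = field(default_factory=list)
    prior: float = 0.0   # hypothetical graph-attached prior, changed by write_back


def coarse_score(room: RoomNode, goal: str) -> float:
    """Cheap score (assumed rule): views mentioning the goal, plus the room prior."""
    return sum(goal in v.objects for v in room.views) + room.prior


def coarse_to_fine(
    rooms: list[RoomNode],
    goal: str,
    verify: Callable[[ViewNode, str], bool],  # stand-in for a VLM verification call
    top_k: int = 1,
    vlm_budget: int = 2,
) -> tuple[Optional[ViewNode], int]:
    """Coarse stage keeps the top_k rooms; fine stage spends at most vlm_budget calls."""
    ranked = sorted(rooms, key=lambda r: coarse_score(r, goal), reverse=True)[:top_k]
    calls = 0
    for room in ranked:
        for view in room.views:
            if goal not in view.objects:
                continue              # filtered cheaply, no VLM call spent
            if calls >= vlm_budget:
                return None, calls    # budget exhausted before a match
            calls += 1
            if verify(view, goal):
                return view, calls
    return None, calls


def write_back(room: RoomNode, success: bool, step: float = 0.5) -> None:
    """Reflection after a subtask: nudge the room prior up or down, no retraining."""
    room.prior += step if success else -step


# Toy memory: two rooms, three stored views.
kitchen = RoomNode("kitchen", [
    ViewNode("v1", "views/v1.png", ["sink"]),
    ViewNode("v2", "views/v2.png", ["mug", "table"]),
])
bedroom = RoomNode("bedroom", [ViewNode("v3", "views/v3.png", ["bed", "mug"])])
rooms = [kitchen, bedroom]


def stub_vlm(view: ViewNode, goal: str) -> bool:
    """Stubbed verifier: pretends only view v3 shows the right mug instance."""
    return view.view_id == "v3"


first_view, first_calls = coarse_to_fine(rooms, "mug", stub_vlm, top_k=2)
write_back(bedroom, success=True)   # reflection raises the bedroom prior
second_view, second_calls = coarse_to_fine(rooms, "mug", stub_vlm, top_k=2)
```

In this toy run the first query spends two verification calls, because the kitchen view is checked before the bedroom view. After the write-back raises the bedroom prior, the same query reaches the correct view in one call. This is the kind of effect the abstract attributes to refining future decisions without retraining, shown here only on made-up data.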

Top-level tags: robotics, agents, computer vision
Detailed tags: embodied navigation, memory graph, zero-shot, visual-language model, hierarchical planning

EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation


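1️⃣ One-sentence summary

This paper proposes EvoMemNav, a method that builds a visual-semantic memory graph so that a robot, without task-specific training, can efficiently store and use fine-grained visual information to locate target objects more accurately in complex environments. It addresses two weaknesses of existing methods: memory that is too coarse, or computation that is too expensive.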

Source: arXiv: 2606.03509