📄 Abstract - Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for reuse. However, trajectory-based memory suffers from brevity bias, gradually losing essential domain knowledge. More critically, even in truly multimodal problem-solving settings, it records only a single-modality trace of past behavior, failing to preserve how visual attention and logical reasoning jointly contributed to the solution. This is fundamentally misaligned with human cognition: semantic memory is both multimodal and integrated, preserving visual and abstract knowledge through coordinated but distinct representational streams. We thus introduce ViLoMem, a dual-stream memory framework that constructs compact, schema-based memory. It separately encodes visual distraction patterns and logical reasoning errors, enabling MLLMs to learn from their successful and failed experiences. Following a grow-and-refine principle, the system incrementally accumulates and updates multimodal semantic knowledge -- preserving stable, generalizable strategies while avoiding catastrophic forgetting. Across six multimodal benchmarks, ViLoMem consistently improves pass@1 accuracy and substantially reduces repeated visual and logical errors. Ablations confirm the necessity of dual-stream memory with explicit distraction--hallucination separation, demonstrating the value of error-aware multimodal memory for lifelong and cross-domain agentic learning. Our project page will be available at this https URL.
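The dual-stream, grow-and-refine idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class names, the `merge_threshold` parameter, and the use of string similarity to detect near-duplicate entries are all assumptions made for illustration. The sketch keeps visual and logical guidelines in separate streams, grows a stream when a genuinely new guideline arrives, and refines an existing entry (here, by bumping a counter) when a near-duplicate arrives.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class MemoryEntry:
    """One compact guideline plus how often it has been reinforced (hypothetical schema)."""
    guideline: str
    hits: int = 1


@dataclass
class DualStreamMemory:
    """Toy two-stream memory: one list for visual errors, one for logical errors."""
    visual: list = field(default_factory=list)
    logical: list = field(default_factory=list)
    merge_threshold: float = 0.8  # assumed similarity cutoff, not taken from the paper

    def _stream(self, kind: str) -> list:
        if kind == "visual":
            return self.visual
        if kind == "logical":
            return self.logical
        raise ValueError(f"unknown stream: {kind!r}")

    def add(self, kind: str, guideline: str) -> MemoryEntry:
        """Refine a similar existing entry if one exists; otherwise grow the stream."""
        stream = self._stream(kind)
        for entry in stream:
            ratio = SequenceMatcher(
                None, entry.guideline.lower(), guideline.lower()
            ).ratio()
            if ratio >= self.merge_threshold:
                entry.hits += 1  # refine: reinforce instead of storing a duplicate
                return entry
        entry = MemoryEntry(guideline)  # grow: store a new guideline
        stream.append(entry)
        return entry

    def retrieve(self, kind: str, k: int = 3) -> list:
        """Return up to k guidelines from one stream, most reinforced first."""
        ranked = sorted(self._stream(kind), key=lambda e: -e.hits)
        return [e.guideline for e in ranked[:k]]


memory = DualStreamMemory()
memory.add("visual", "Check the axis labels before reading values")
memory.add("visual", "Count overlapping shapes separately")
memory.add("visual", "Check the axis labels before reading values")  # merged, not duplicated
memory.add("logical", "Verify units before combining quantities")
top_visual = memory.retrieve("visual", k=1)
```

A real system would presumably use learned embeddings and model-generated schemas rather than string matching; the point here is only the separation of streams and the merge-or-append update.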

Top-level tags: agents, multi-modal, model training
Detailed tags: multimodal memory, error correction, lifelong learning, visual reasoning, semantic schemas

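📄 Paper Summary

Agentic Learner with Grow-and-Refine Multimodal Semantic Memory


1️⃣ One-Sentence Summary

This paper proposes ViLoMem, a dual-stream memory framework that separately records visual distraction patterns and logical reasoning errors. This helps multimodal large language models learn from both successful and failed experiences, consistently improving accuracy across a range of tasks and reducing repeated errors.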

