MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA

📄 Abstract

Iterative retrieval-reasoning agents have recently shown promise for multimodal long-document question answering. However, most existing systems maintain a single growing context that mixes retrieval traces, observations, and intermediate reasoning. As interactions accumulate, key evidence becomes scattered and diluted, making multi-hop reasoning noisy. We propose MARDoc, a Memory-Aware Refinement Agent framework that decouples long-document QA into three specialized agents: an Explorer for multi-granularity multimodal retrieval, a Refiner for distilling interaction traces into structured evidence and reasoning memories, and a Reflector for checking evidence sufficiency and providing targeted feedback. Across iterations, the agents rely on a dynamically updated structured memory rather than a full accumulated interaction history. This design reduces context noise while preserving answer-critical facts and their logical dependencies. Experiments on MMLongBench-Doc and DocBench show that MARDoc achieves strong results, outperforming same-backbone baselines and demonstrating the effectiveness of structured memory for agentic document QA.
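The abstract describes an iterative loop in which three agents share a structured memory instead of a growing transcript. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Every name in it (`Memory`, `Evidence`, `run_loop`, the `toy_*` agents, the `DOC` pages) is hypothetical, and the three agents are keyword-matching stand-ins for what the paper describes as multimodal, model-driven roles. It shows only the control flow. The Explorer reads the memory and the Reflector's latest feedback rather than the full history. The Refiner distills raw observations into evidence and reasoning entries and drops everything else. The Reflector decides whether the evidence suffices or names what is still missing.

```python
# Hypothetical sketch of an Explorer -> Refiner -> Reflector loop over a
# structured memory. Names and toy agents are illustrative, not MARDoc's code.
import re
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Evidence:
    source: str  # page identifier (hypothetical locator format)
    fact: str    # answer-relevant statement kept by the Refiner

@dataclass
class Memory:
    """Structured memory carried between iterations in place of the raw trace."""
    evidence: list[Evidence] = field(default_factory=list)
    reasoning: list[str] = field(default_factory=list)  # intermediate conclusions
    feedback: str = ""                                   # latest Reflector hint

# Assumed role signatures; in the paper each role is a full agent.
Observation = tuple[str, str]                           # (page id, text)
Explorer = Callable[[str, Memory], list[Observation]]
Refiner = Callable[[str, list[Observation], Memory], Memory]
Reflector = Callable[[str, Memory], tuple[bool, str]]  # (sufficient?, feedback)

def run_loop(question: str, explore: Explorer, refine: Refiner,
             reflect: Reflector, max_iters: int = 5) -> Memory:
    memory = Memory()
    for _ in range(max_iters):
        observations = explore(question, memory)         # sees memory, not history
        memory = refine(question, observations, memory)  # raw observations end here
        sufficient, memory.feedback = reflect(question, memory)
        if sufficient:
            break
    return memory

# ---- Toy stand-ins for the three agents (keyword matching, no model) ----
DOC = {
    "p1": "The company was founded in 2005.",
    "p7": "Revenue in 2020 was 40 million dollars.",
    "p9": "Revenue in 2021 was 50 million dollars.",
}

def toy_explore(question: str, memory: Memory) -> list[Observation]:
    # Search for the Reflector's hint if present, else the first year in the
    # question; return the top-1 page not already stored as evidence.
    query = memory.feedback or re.findall(r"\d{4}", question)[0]
    seen = {e.source for e in memory.evidence}
    return [(pid, t) for pid, t in DOC.items() if query in t and pid not in seen][:1]

def toy_refine(question: str, observations: list[Observation],
               memory: Memory) -> Memory:
    for pid, text in observations:
        if "Revenue" in text:  # keep only answer-relevant facts
            memory.evidence.append(Evidence(pid, text))
    figures = {}
    for e in memory.evidence:
        m = re.search(r"in (\d{4}) was (\d+)", e.fact)
        if m:
            figures[m.group(1)] = int(m.group(2))
    memory.reasoning = [f"revenue[{y}] = {v}" for y, v in sorted(figures.items())]
    return memory

def toy_reflect(question: str, memory: Memory) -> tuple[bool, str]:
    # Sufficient once every year mentioned in the question has a figure.
    missing = [y for y in re.findall(r"\d{4}", question)
               if not any(y in r for r in memory.reasoning)]
    return (not missing, missing[0] if missing else "")

mem = run_loop("How much did revenue grow from 2020 to 2021?",
               toy_explore, toy_refine, toy_reflect)
print([e.source for e in mem.evidence])  # ['p7', 'p9']
print(mem.reasoning)                     # ['revenue[2020] = 40', 'revenue[2021] = 50']
```

On this two-hop toy question, the first round retrieves only the 2020 figure. The Reflector's feedback ("2021") steers the second round, after which the evidence is judged sufficient. Keeping memory as typed fields rather than a transcript is what limits each agent to distilled facts; the paper's actual memory schema and sufficiency check may differ.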


1️⃣ One-Sentence Summary

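This paper proposes MARDoc, a multi-agent framework that splits document QA into three specialized roles (retrieval, refinement, and reflection) and replaces cluttered interaction histories with a structured memory, addressing scattered evidence and easily disrupted reasoning in long documents and thereby substantially improving the accuracy and reliability of complex multi-step QA.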
