EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

📄 Abstract

Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchmark for cross-view memory reasoning over synchronized egocentric and exocentric videos. EgoExoMem contains $2.6K$ high-quality MCQs across eight temporal, spatial, and cross-view QA types. To support dual-view retrieval, we propose E$^2$-Select, a training-free frame selection method for synchronized ego-exo videos. It combines relevance-based budget allocation with per-view k-DPP sampling to handle view asymmetry and cross-view temporal consistency. Experiments show that ego and exo views provide complementary memory cues, while existing MLLMs remain far from solving the benchmark: the best model reaches only $55.3\%$. E$^2$-Select achieves state-of-the-art performance of $58.2\%$ over frame-selection and RAG-based memory baselines. Further analysis reveals systematic view-preference conflicts between question framing and answer grounding, underscoring the novelty and challenge of cross-view memory reasoning.
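The abstract describes E$^2$-Select only at a high level, so the following is a minimal sketch of the two named ingredients under stated assumptions, not the authors' implementation. It assumes each frame has a scalar relevance score to the question and a feature vector, splits the frame budget between views in proportion to summed relevance, and then picks a diverse subset within each view. A deterministic greedy MAP approximation is used here in place of true k-DPP sampling, and the cross-view temporal consistency mentioned above is not modeled. All function names and the kernel form (relevance times cosine similarity) are hypothetical.

```python
import math


def allocate_budget(total, rel_ego, rel_exo):
    """Split `total` frames (assumed >= 2) between views by summed relevance."""
    s_ego, s_exo = sum(rel_ego), sum(rel_exo)
    k_ego = round(total * s_ego / (s_ego + s_exo))
    k_ego = min(max(1, k_ego), total - 1)  # keep at least one frame per view
    return k_ego, total - k_ego


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_kernel(rel, feats):
    """Assumed DPP kernel: L[i][j] = rel[i] * rel[j] * cos(feats[i], feats[j])."""
    n = len(rel)
    return [[rel[i] * rel[j] * cosine(feats[i], feats[j]) for j in range(n)]
            for i in range(n)]


def greedy_dpp(kernel, k, eps=1e-9):
    """Greedy MAP approximation for a DPP: repeatedly add the item with the
    largest marginal gain, tracked via incremental Cholesky-style updates."""
    n = len(kernel)
    chol = [[] for _ in range(n)]              # per-item Cholesky rows
    gain = [kernel[i][i] for i in range(n)]    # current marginal gains
    selected = []
    while len(selected) < min(k, n):
        rest = [i for i in range(n) if i not in selected]
        j = max(rest, key=lambda i: gain[i])
        if gain[j] <= eps:                     # remaining frames are redundant
            break
        selected.append(j)
        root = math.sqrt(gain[j])
        for i in rest:
            if i == j:
                continue
            proj = sum(a * b for a, b in zip(chol[j], chol[i]))
            e = (kernel[j][i] - proj) / root
            chol[i].append(e)
            gain[i] -= e * e
    return sorted(selected)                    # return in temporal order


def select_frames(total, rel_ego, feats_ego, rel_exo, feats_exo):
    """Budget allocation followed by per-view diverse selection."""
    k_ego, k_exo = allocate_budget(total, rel_ego, rel_exo)
    return {
        "ego": greedy_dpp(build_kernel(rel_ego, feats_ego), k_ego),
        "exo": greedy_dpp(build_kernel(rel_exo, feats_exo), k_exo),
    }
```

In this sketch, a view whose frames score higher against the question receives more of the budget, and near-duplicate frames within a view are suppressed because selecting one drives the marginal gain of the other toward zero. A faithful k-DPP sampler would draw subsets stochastically rather than greedily.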


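1️⃣ One-Sentence Summary

This paper introduces EgoExoMem, a new benchmark that tests how well AI models combine synchronized first-person (what the wearer sees) and third-person (observer's view) videos for spatio-temporal memory reasoning, and proposes E²-Select, a training-free method for efficiently selecting frames from both views; experiments show that existing models fall well short of human-level performance and exhibit view-preference conflicts.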
