arXiv submission date: 2025-12-03
📄 Abstract - M3DR: Towards Universal Multilingual Multimodal Document Retrieval

Multimodal document retrieval systems have shown strong progress in aligning visual and textual content for semantic search. However, most existing approaches remain heavily English-centric, limiting their effectiveness in multilingual contexts. In this work, we present M3DR (Multilingual Multimodal Document Retrieval), a framework designed to bridge this gap across languages, enabling applicability across diverse linguistic and cultural contexts. M3DR leverages synthetic multilingual document data and generalizes across different vision-language architectures and model sizes, enabling robust cross-lingual and cross-modal alignment. Using contrastive training, our models learn unified representations for text and document images that transfer effectively across languages. We validate this capability on 22 typologically diverse languages, demonstrating consistent performance and adaptability across linguistic and script variations. We further introduce a comprehensive benchmark that captures real-world multilingual scenarios, evaluating models under monolingual, multilingual, and mixed-language settings. M3DR generalizes across both single dense vector and ColBERT-style token-level multi-vector retrieval paradigms. Our models, NetraEmbed and ColNetraEmbed, achieve state-of-the-art performance with ~150% relative improvements on cross-lingual retrieval.

Top-level tags: multi-modal, natural language processing, benchmark
Detailed tags: multilingual document retrieval, synthetic data generation, contrastive learning, matryoshka representation, cross-modal retrieval

M3DR: Towards Universal Multilingual Multimodal Document Retrieval


1️⃣ One-sentence summary

This paper proposes the M3DR framework, which uses synthetic data and contrastive learning to train models that perform effective cross-lingual and cross-modal document retrieval across 22 languages, and releases Nayana-IR, the first comprehensive multilingual multimodal document retrieval benchmark.
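The summary above mentions contrastive learning for aligning text with document images. The snippet below is a minimal sketch of a symmetric InfoNCE-style contrastive objective for that kind of cross-modal alignment, assuming PyTorch, precomputed paired embeddings, and an illustrative temperature value; it is not the paper's actual training code.

```python
# A minimal sketch of a symmetric InfoNCE-style contrastive objective for
# aligning text queries with document-image embeddings, as described above.
# Assumptions (not from the paper): PyTorch, precomputed paired embeddings,
# a 0.05 temperature, and illustrative names/dimensions.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(text_emb: torch.Tensor,
                               image_emb: torch.Tensor,
                               temperature: float = 0.05) -> torch.Tensor:
    """Symmetric text-to-image and image-to-text InfoNCE loss.

    text_emb, image_emb: (batch, dim) embeddings of paired text queries and
    document page images; row i of each tensor forms a positive pair.
    """
    # L2-normalize so dot products become cosine similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the positives.
    logits = text_emb @ image_emb.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the two retrieval directions.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.T, targets)
    return 0.5 * (loss_t2i + loss_i2t)


if __name__ == "__main__":
    # Toy batch: 8 positive pairs with 512-dimensional embeddings.
    text = torch.randn(8, 512)
    image = torch.randn(8, 512)
    print(contrastive_alignment_loss(text, image).item())
```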


2️⃣ Key contributions

1. The M3DR universal framework

2. A large-scale synthetic data generation pipeline

3. The Nayana-IR benchmark

4. Matryoshka representation learning for flexible deployment

5. A unified framework supporting multiple retrieval paradigms (see the sketch after this list)
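Items 4 and 5 refer to Matryoshka representation learning and to supporting both single dense-vector and ColBERT-style multi-vector retrieval. The sketch below illustrates those two ideas under stated assumptions: matryoshka_truncate, dense_score, and maxsim_score are hypothetical helper names, and the shapes and dimensions are illustrative, not the NetraEmbed / ColNetraEmbed APIs.

```python
# Hypothetical sketch of the two retrieval paradigms listed above, plus
# Matryoshka-style dimension truncation. All names, shapes, and dimensions
# are illustrative assumptions, not the paper's actual interfaces.
import torch
import torch.nn.functional as F


def matryoshka_truncate(emb: torch.Tensor, dim: int) -> torch.Tensor:
    """Keep the first `dim` dimensions and re-normalize.

    Matryoshka representation learning trains embeddings so that prefixes of
    the full vector remain useful, allowing smaller indexes at lower cost.
    """
    return F.normalize(emb[..., :dim], dim=-1)


def dense_score(query_vec: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """Single dense-vector retrieval: one cosine similarity per document.

    query_vec: (dim,), doc_vecs: (num_docs, dim) -> (num_docs,) scores.
    """
    return doc_vecs @ query_vec


def maxsim_score(query_tokens: torch.Tensor, doc_tokens: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late interaction: for each query token, take its best
    match among the document tokens, then sum over query tokens.

    query_tokens: (q_len, dim), doc_tokens: (d_len, dim) -> scalar score.
    """
    sim = query_tokens @ doc_tokens.T          # (q_len, d_len)
    return sim.max(dim=-1).values.sum()


if __name__ == "__main__":
    full_dim, small_dim = 1024, 256
    query = F.normalize(torch.randn(full_dim), dim=-1)
    docs = F.normalize(torch.randn(100, full_dim), dim=-1)

    # Dense retrieval on truncated (Matryoshka) embeddings.
    scores = dense_score(matryoshka_truncate(query, small_dim),
                         matryoshka_truncate(docs, small_dim))
    print("top dense doc:", scores.argmax().item())

    # Token-level multi-vector scoring for one query/document pair.
    q_tok = F.normalize(torch.randn(16, 128), dim=-1)
    d_tok = F.normalize(torch.randn(300, 128), dim=-1)
    print("MaxSim score:", maxsim_score(q_tok, d_tok).item())
```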


3️⃣ Main results and value

Result highlights

NetraEmbed and ColNetraEmbed achieve state-of-the-art performance across 22 typologically diverse languages, with roughly 150% relative improvement on cross-lingual retrieval, and remain consistent under monolingual, multilingual, and mixed-language evaluation.

Practical value

A single training recipe transfers across vision-language architectures, model sizes, and retrieval paradigms (single dense vector and ColBERT-style multi-vector), making multimodal document search applicable well beyond English-centric settings.


4️⃣ Glossary

Source: arXiv:2512.03514