arXiv submission date: 2026-04-22
📄 Abstract - All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

Multilingual Retrieval-Augmented Generation (mRAG) leverages cross-lingual evidence to ground Large Language Models (LLMs) in global knowledge. However, we show that current mRAG systems suffer from a language bias during reranking, systematically favoring English and the query's native language. By introducing an estimated oracle evidence analysis, we quantify a substantial performance gap between existing rerankers and the achievable upper bound. Further analysis reveals a critical distributional mismatch: while optimal predictions require evidence scattered across multiple languages, current systems systematically suppress such "answer-critical" documents, thereby limiting downstream generation performance. To bridge this gap, we propose _**L**anguage-**A**gnostic **U**tility-driven **R**eranker **A**lignment (LAURA)_, which aligns multilingual evidence ranking with downstream generative utility. Experiments across diverse languages and generation models show that LAURA effectively mitigates language bias and consistently improves mRAG performance.

Top-level tags: llm natural language processing multi-modal
Detailed tags: multilingual rag language bias reranking bias mitigation cross-lingual retrieval

All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG


1️⃣ One-Sentence Summary

This paper reveals that the reranking stage of multilingual retrieval-augmented generation (mRAG) systems carries a systematic bias toward English and the query's own language, which suppresses useful cross-lingual evidence. It proposes a new method, LAURA, that aligns the reranker directly with downstream generation quality, effectively mitigating this language bias and significantly improving multilingual question-answering accuracy.
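LAURA's core idea, as summarized above, is to rank evidence by how much it helps the generator rather than by surface similarity or language match. A minimal sketch of that utility-driven reranking step is below; `utility_fn` and the per-document scores are hypothetical stand-ins for a generator-derived signal (e.g. the generator's likelihood of the gold answer given each document), not the paper's actual implementation:

```python
# Illustrative sketch: language-agnostic, utility-driven reranking.
# A similarity-biased reranker might favor the English or query-language
# documents; scoring by downstream utility keeps the answer-critical one.

def utility_rerank(docs, utility_fn, top_k=2):
    """Order candidate documents by estimated downstream utility, descending."""
    scored = [(utility_fn(d), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

# Toy multilingual candidates with hypothetical utility scores.
docs = [
    {"lang": "en", "text": "Background only.",     "utility": 0.10},
    {"lang": "zh", "text": "Restates the query.",  "utility": 0.15},
    {"lang": "de", "text": "Contains the answer.", "utility": 0.90},
]

top = utility_rerank(docs, utility_fn=lambda d: d["utility"], top_k=1)
print(top[0]["lang"])  # -> de: the answer-critical document survives reranking
```

The design point is that the ranking signal is computed per document from generation utility, so a document's language never enters the score directly.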

Source: arXiv 2604.20199