arXiv submission date: 2026-03-17
📄 Abstract - Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency

Modern LLMs continue to exhibit significant variance in behavior across languages, such as being able to recall factual information in some languages but not others. While typically studied as a problem to be mitigated, in this work we propose leveraging this cross-lingual inconsistency as a tool for interpretability in mixture-of-experts (MoE) LLMs. Our knowledge localization framework contrasts routing between sets of languages in which the model correctly recalls a piece of information and languages in which it fails. This allows us to isolate model components that play a functional role in answering questions about that knowledge. Our method proceeds in two stages: (1) querying the model with difficult factual questions across a diverse set of languages to generate "success" and "failure" activation buckets, and then (2) applying a statistical contrastive analysis to the MoE router logits to identify experts important for the knowledge. To validate that this small number of experts is necessary for answering a knowledge question, we deactivate them and re-ask the question. We find that despite deactivating only about 20 out of 6000 experts, the model no longer answers correctly in over 40% of cases. Overall, this method provides a realistic and scalable knowledge localization approach for increasingly complex LLMs.
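The two-stage pipeline in the abstract can be sketched on synthetic data. The sketch below is illustrative, not the paper's implementation: the tensor shapes, the planted "knowledge experts", and the use of a Welch-style t-score as the contrastive statistic are all assumptions. In practice the logits would be captured from the model's MoE routers while it answers the same factual question in many languages, bucketed by whether the answer was correct.

```python
import numpy as np

rng = np.random.default_rng(0)

n_layers, n_experts = 10, 64          # hypothetical MoE shape
n_success, n_failure = 30, 30         # prompts per bucket

# Stage 1: router logits per prompt, shape (prompts, layers, experts).
# Here they are synthetic; in the paper's setting they come from
# "success" and "failure" language buckets for one factual question.
success = rng.normal(0.0, 1.0, (n_success, n_layers, n_experts))
failure = rng.normal(0.0, 1.0, (n_failure, n_layers, n_experts))

# Plant a signal: a few hypothetical "knowledge experts" that are
# routed to more strongly when the model answers correctly.
knowledge_experts = [(2, 5), (2, 6), (7, 11)]
for layer, expert in knowledge_experts:
    success[:, layer, expert] += 2.0

# Stage 2: contrastive statistic per (layer, expert) — a Welch-style
# t-score comparing router logits between the two buckets.
mean_s, mean_f = success.mean(0), failure.mean(0)
var_s, var_f = success.var(0, ddof=1), failure.var(0, ddof=1)
t = (mean_s - mean_f) / np.sqrt(var_s / n_success + var_f / n_failure)

# Rank experts by the contrast and keep the top few as candidates
# for deactivation (e.g. masking their router logits to -inf).
k = 3
flat = np.argsort(t, axis=None)[::-1][:k]
top = [tuple(map(int, np.unravel_index(i, t.shape))) for i in flat]
print(sorted(top))
```

With the planted shift, the top-ranked (layer, expert) pairs recover the planted experts, mirroring how the paper's contrast isolates a handful of experts out of thousands before the deactivation check.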

Top-level tags: llm, model evaluation, natural language processing
Detailed tags: mixture-of-experts, interpretability, cross-lingual, knowledge localization, router analysis

Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency


1️⃣ One-sentence summary

This paper proposes a new method that uses differences in a large language model's ability to answer factual questions across languages to precisely localize the internal expert modules responsible for storing specific knowledge, helping to explain how the model works.

Source: arXiv 2603.17102