arXiv submission date: 2026-05-25

📄 Abstract - On the Limits of Model Merging for Multilinguality in Pre-Training

Endowing models with consistent multilingual performance can be achieved by mixing pre-training data, or post-training approaches such as language-specific model merging. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study on the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training results in strong in-language performance, merging any combination of monolingual models leads to performance collapse due to interference. Our analysis suggests representational similarity is a prerequisite for model merging. We therefore conclude that the flexibility of merging in fine-tuning does not extend trivially to language-specific pre-training.
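To make the notion of merging concrete, here is a minimal sketch of weighted parameter averaging, one common form of model merging. The abstract does not say which merging method the authors use, so the averaging rule, the flat `Params` format, and the function name `merge_weighted` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical flat format: parameter name -> list of float values.
Params = Dict[str, List[float]]


def merge_weighted(
    models: Sequence[Params],
    weights: Optional[Sequence[float]] = None,
) -> Params:
    """Merge models by a weighted average of their parameters.

    All models must share the same parameter names and sizes.
    With weights=None, every model contributes equally.
    """
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")

    merged: Params = {}
    for name, first in models[0].items():
        # Merging is only defined for identically shaped parameters.
        for m in models:
            if name not in m or len(m[name]) != len(first):
                raise ValueError(f"parameter mismatch for {name!r}")
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(len(first))
        ]
    return merged


# Toy stand-ins for two monolingually pre-trained models.
english = {"layer.weight": [1.0, 3.0]}
german = {"layer.weight": [3.0, 5.0]}
merged = merge_weighted([english, german])  # {"layer.weight": [2.0, 4.0]}
```

Averaging of this kind treats the i-th parameter of each model as if it played the same role in every model. That holds roughly when models are fine-tuned from a shared checkpoint, and it is the assumption the abstract reports breaking down for separately pre-trained monolingual models.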

Top-level tags: llm model training

On the Limits of Model Merging for Multilinguality in Pre-Training

1️⃣ One-sentence summary

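The paper shows experimentally that directly merging models pre-trained separately on different languages causes a sharp drop in performance, because the internal representations of the language-specific models differ too much and interfere with one another; training on mixed multilingual data remains the reliable way to preserve multilingual ability.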
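One way to quantify how much two models' internal representations differ is linear centered kernel alignment (CKA), sketched below. This is an illustrative choice only: the summary does not name the similarity measure used in the paper, and the helper names and toy activation matrices here are hypothetical.

```python
import math
from typing import List

# Rows are examples, columns are hidden features.
Matrix = List[List[float]]


def _center(x: Matrix) -> Matrix:
    """Subtract each column's mean so every feature has zero mean."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def _cross_fro_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a^T b."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(p * q for p, q in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two activation matrices for the same inputs.

    Returns a value in [0, 1]; 1 means the representations match up to
    rotation and uniform scaling. Returns 0.0 for degenerate input.
    """
    if len(x) != len(y):
        raise ValueError("both matrices need one row per shared example")
    xc, yc = _center(x), _center(y)
    denom = math.sqrt(_cross_fro_sq(xc, xc) * _cross_fro_sq(yc, yc))
    return _cross_fro_sq(xc, yc) / denom if denom > 0 else 0.0


# Toy activations: model_b is model_a rotated by 90 degrees.
model_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
model_b = [[0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 0.0]]
similarity = linear_cka(model_a, model_b)
```

In practice the two matrices would hold hidden states from the same layer of two models on the same inputs. A low score between monolingual models would be consistent with the interference described above, though this sketch makes no claim about the paper's actual numbers.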


Source: arXiv: 2605.25846
