Disentangling meaning from language in LLM-based machine translation
1️⃣ One-sentence summary
By analyzing the attention mechanisms inside large language models, this paper finds that the translation task decomposes into two separate subtasks, "producing text in the target language" and "preserving the meaning of the source sentence", each handled by a distinct set of attention heads; modifying just a small fraction (about 1%) of the relevant heads is enough to achieve high-quality, instruction-free translation.
Mechanistic Interpretability (MI) seeks to explain how neural networks implement their capabilities, but the scale of Large Language Models (LLMs) has limited prior MI work in Machine Translation (MT) to word-level analyses. We study sentence-level MT from a mechanistic perspective by analyzing attention heads to understand how LLMs internally encode and distribute translation functions. We decompose MT into two subtasks: producing text in the target language (i.e. target language identification) and preserving the input sentence's meaning (i.e. sentence equivalence). Across three families of open-source models and 20 translation directions, we find that distinct, sparse sets of attention heads specialize in each subtask. Based on this insight, we construct subtask-specific steering vectors and show that modifying just 1% of the relevant heads enables instruction-free MT performance comparable to instruction-based prompting, while ablating these heads selectively disrupts their corresponding translation functions.
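The abstract describes intervening on a sparse set of attention heads with subtask-specific steering vectors. As a rough illustration of what a per-head intervention looks like mechanically, the sketch below defines a toy multi-head attention layer in PyTorch and adds a fixed vector to one head's output before the output projection. All names here (`ToyMultiHeadAttention`, the `steering` dict, the random vector) are hypothetical and for illustration only; the paper's actual models, head selection, and steering-vector construction are not reproduced.

```python
# Minimal sketch, assuming a standard multi-head self-attention layout.
# Not the paper's implementation: the steering vector here is random,
# whereas in practice it would be derived from contrastive activations.
import torch
import torch.nn as nn


class ToyMultiHeadAttention(nn.Module):
    """Single self-attention layer with an optional per-head steering vector."""

    def __init__(self, d_model: int = 64, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # steering[h] is added to head h's output before the output projection
        self.steering = {}  # head index -> tensor of shape (d_head,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):  # (B, T, D) -> (B, n_heads, T, d_head)
            return t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        head_out = attn @ v  # (B, n_heads, T, d_head)

        # Intervention: shift selected heads' activations along a steering direction.
        for h, vec in self.steering.items():
            head_out[:, h] = head_out[:, h] + vec

        merged = head_out.transpose(1, 2).reshape(B, T, D)
        return self.out(merged)


# Hypothetical usage: steer head 3 with a (random stand-in for a) "target-language" direction.
layer = ToyMultiHeadAttention()
layer.steering[3] = torch.randn(layer.d_head) * 0.1
y = layer(torch.randn(2, 5, 64))
print(y.shape)  # torch.Size([2, 5, 64])
```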
Source: arXiv 2602.04613