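Learning When to Translate for Multilingual Reasoning
1️⃣ One-Sentence Summary
This paper proposes Luar, a reinforcement learning framework that lets a reasoning language model facing a non-English input judge whether its own understanding is reliable and fall back on English translation only when necessary, substantially improving reasoning performance on low-resource languages without sacrificing accuracy.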
Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, but still exhibit substantial multilingual reasoning gaps, largely due to language-understanding failures in non-English inputs. English translation can mitigate these failures by expressing non-English inputs in a form that RLMs can more reliably interpret, yet translating every input is unnecessary when the model can reason reliably from the original query. To address this challenge, we propose Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains RLMs to selectively invoke translation when direct understanding is unreliable. Luar trains the model to choose between solving the original input directly and reasoning over its English translation, encouraging translation only when translator-augmented reasoning is expected to substantially outperform direct reasoning. Across multilingual reasoning benchmarks, Luar outperforms standard GRPO and other training-based baselines, with particularly large gains on low-resource languages. Further analysis shows that Luar avoids unnecessary translation in cases where direct reasoning is sufficient, while extending its translator-call behavior to unseen low-resource languages. Together, our work suggests a selective approach to multilingual reasoning: RLMs can learn to invoke translation only when their direct understanding is unreliable. The project will be made publicly available at this https URL.
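The abstract does not give Luar's exact reward, so the following is only a minimal sketch of the idea it describes: reward a translator call only when translator-augmented reasoning substantially beats direct reasoning on the same prompt. It assumes a binary correctness reward, a hypothetical `MARGIN` threshold for "substantially", and a hypothetical `UNNEEDED_TRANSLATION` penalty. The `Rollout` type and all function names are illustrative, not the paper's. The group-normalized advantage at the end is the standard GRPO formulation (the baseline the paper compares against), not necessarily what Luar uses.

```python
# Hypothetical sketch only: a selective-translation reward in the spirit of
# Luar, paired with GRPO-style group-normalized advantages. Rollout, MARGIN
# and UNNEEDED_TRANSLATION are illustrative assumptions, not the paper's.
from dataclasses import dataclass
from statistics import mean, pstdev

MARGIN = 0.2                # assumed accuracy gap that counts as "substantially better"
UNNEEDED_TRANSLATION = 0.5  # assumed penalty for a translator call that did not pay off


@dataclass
class Rollout:
    used_translation: bool  # True if the policy reasoned over an English translation
    correct: bool           # True if the final answer matched the reference


def accuracy(flags: list[bool]) -> float:
    """Fraction of correct rollouts; 0.0 when a strategy was never sampled."""
    return sum(flags) / len(flags) if flags else 0.0


def shaped_rewards(group: list[Rollout], margin: float = MARGIN) -> list[float]:
    """Reward correctness, but discount correct translator calls that were not needed."""
    acc_direct = accuracy([r.correct for r in group if not r.used_translation])
    acc_translated = accuracy([r.correct for r in group if r.used_translation])
    # Translation counts as worthwhile only if, within this sampled group,
    # it beats direct reasoning by at least `margin`.
    translation_helps = acc_translated - acc_direct >= margin
    rewards = []
    for r in group:
        reward = 1.0 if r.correct else 0.0
        if r.used_translation and r.correct and not translation_helps:
            reward -= UNNEEDED_TRANSLATION
        rewards.append(reward)
    return rewards


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward normalized by its group's mean and std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # A query the model already understands: translation adds nothing.
    easy = [Rollout(False, True)] * 3 + [Rollout(True, True)] * 2
    print(shaped_rewards(easy))  # [1.0, 1.0, 1.0, 0.5, 0.5]

    # A low-resource query: direct reasoning fails, translation helps.
    hard = [Rollout(False, False)] * 3 + [
        Rollout(True, True), Rollout(True, True), Rollout(True, False)
    ]
    print(shaped_rewards(hard))  # [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]

    # Direct rollouts get positive advantage on `easy`; translated ones on `hard`.
    print(group_advantages(shaped_rewards(easy)))
    print(group_advantages(shaped_rewards(hard)))
```

Estimating both accuracies from the same sampled group keeps the decision per prompt: when direct rollouts already succeed, correct translated rollouts score lower and receive a negative advantage, nudging the policy to answer directly, while on prompts where direct reasoning fails the translator call is rewarded in full.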
Source: arXiv:2606.02465