Abstract - Culture-Aware Machine Translation in Large Language Models: Benchmarking and Investigation
Large language models (LLMs) have achieved strong performance in general machine translation, yet their ability to handle culture-aware translation scenarios remains poorly understood. To bridge this gap, we introduce CanMT, a Culture-Aware Novel-Driven Parallel Dataset for Machine Translation, together with a theoretically grounded, multi-dimensional evaluation framework for assessing cultural translation quality. Leveraging CanMT, we systematically evaluate a wide range of LLMs and translation systems under different translation strategy constraints. Our findings reveal substantial performance disparities across models and demonstrate that translation strategies exert a systematic influence on model behavior. Further analysis shows that translation difficulty varies across types of culture-specific items, and that a persistent gap remains between models' recognition of culture-specific knowledge and their ability to correctly operationalize it in translation outputs. In addition, incorporating reference translations substantially improves evaluation reliability in LLM-as-a-judge settings, underscoring their essential role in assessing culture-aware translation quality. The corpus and code are available at CanMT.
Culture-Aware Machine Translation in Large Language Models: Benchmarking and Investigation
1️⃣ One-Sentence Summary
This study introduces CanMT, a dataset designed to evaluate culture-aware capability in machine translation, together with a multi-dimensional evaluation framework. Through systematic testing of a range of large language models, it finds that models show significant performance disparities when handling culture-specific items, and that while they can recognize cultural knowledge, they struggle to apply it correctly in translation outputs; incorporating reference translations significantly improves evaluation reliability.