Pearmut: Human Evaluation of Translation Made Trivial
1️⃣ One-Sentence Summary
This paper introduces Pearmut, a lightweight platform that streamlines the workflow, supports multiple standard evaluation protocols, and integrates intelligent assistance features, making the traditionally complex and time-consuming human evaluation of translation quality as easy to run as automatic evaluation, and thus positioning human evaluation to become a routine part of model development.
Human evaluation is the gold standard for multilingual NLP, but in practice it is often skipped and substituted with automatic metrics, because it is notoriously complex and slow to set up with existing tools, carrying substantial engineering and operational overhead. We introduce Pearmut, a lightweight yet feature-rich platform that makes end-to-end human evaluation as easy to run as automatic evaluation. Pearmut removes common entry barriers and supports the evaluation of multilingual tasks, with a particular focus on machine translation. The platform implements standard evaluation protocols, including DA, ESA, and MQM, and is also extensible to allow prototyping of new protocols. It features document-level context, absolute and contrastive evaluation, attention checks, ESAAI pre-annotations, and both static and active-learning-based assignment strategies. Pearmut enables reliable human evaluation to become a practical, routine component of model development and diagnosis rather than an occasional effort.
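For readers unfamiliar with the protocols named above, the following is a minimal, hypothetical sketch of what a single segment-level judgment typically records under DA, ESA, and MQM. The data structures and names here are illustrative assumptions, not Pearmut's actual API.

```python
# Hypothetical illustration of the three standard protocols (DA, ESA, MQM).
# These data structures are NOT Pearmut's API; they only sketch what a
# single segment-level judgment usually records under each protocol.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ErrorSpan:
    start: int                      # character offset in the translation
    end: int
    severity: str                   # e.g. "minor" or "major"
    category: Optional[str] = None  # MQM additionally assigns an error category


@dataclass
class SegmentJudgment:
    protocol: str                   # "DA", "ESA", or "MQM"
    source: str
    translation: str
    score: Optional[float] = None   # DA/ESA: direct 0-100 quality score
    error_spans: List[ErrorSpan] = field(default_factory=list)  # ESA/MQM: marked spans


# DA: a single 0-100 adequacy score, no error spans.
da = SegmentJudgment("DA", "Guten Morgen", "Good morning", score=97.0)

# ESA: marked error spans plus an overall 0-100 score.
esa = SegmentJudgment(
    "ESA", "Guten Morgen", "Good evening", score=40.0,
    error_spans=[ErrorSpan(5, 12, "major")],
)

# MQM: categorised, severity-weighted error spans; an overall score is
# derived from the severities rather than assigned directly.
mqm = SegmentJudgment(
    "MQM", "Guten Morgen", "Good evening",
    error_spans=[ErrorSpan(5, 12, "major", "accuracy/mistranslation")],
)
```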
Source: arXiv: 2601.02933