arXiv submission date: 2026-02-03
📄 Abstract - PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning

Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL remains challenged by noisy learning signals arising from Monte Carlo return estimation, as well as a large trajectory space that favors global exploration over fine-grained local optimization. We introduce PEGRL, a two-stage RL framework that uses post-editing as an auxiliary task to stabilize training and guide overall optimization. At each iteration, translation outputs are sampled to construct post-editing inputs, allowing return estimation in the post-editing stage to benefit from conditioning on the current translation behavior, while jointly supporting both global exploration and fine-grained local optimization. A task-specific weighting scheme further balances the contributions of translation and post-editing objectives, yielding a biased yet more sample-efficient estimator. Experiments on English→Finnish, English→Turkish, and English↔Chinese show consistent gains over RL baselines, and for English→Turkish, performance on COMET-KIWI is comparable to advanced LLM-based systems (DeepSeek-V3.2).
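The iteration described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names (`sample_translation`, `sample_post_edit`, `reward`), the group-relative advantage form, and the weight values are assumptions filled in from the GRPO family of methods the paper builds on.

```python
def group_relative_advantages(rewards):
    """GRPO-style advantage: reward minus the group mean, scaled by the group std."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

def pegrl_iteration(source, sample_translation, sample_post_edit, reward,
                    group_size=4, w_mt=1.0, w_pe=0.5):
    """One PEGRL-style iteration (illustrative sketch):
    1. sample a group of translations (global exploration),
    2. build post-editing inputs from those samples (fine-grained local refinement,
       conditioned on current translation behavior),
    3. weight the two tasks' advantages (biased but more sample-efficient)."""
    # Stage 1: translation rollouts and their group-relative advantages
    translations = [sample_translation(source) for _ in range(group_size)]
    mt_adv = group_relative_advantages([reward(source, t) for t in translations])

    # Stage 2: post-editing rollouts conditioned on the sampled translations
    edits = [sample_post_edit(source, t) for t in translations]
    pe_adv = group_relative_advantages([reward(source, e) for e in edits])

    # Task-specific weighting combines the two objectives into one update signal
    return [(t, w_mt * a_mt, e, w_pe * a_pe)
            for t, a_mt, e, a_pe in zip(translations, mt_adv, edits, pe_adv)]
```

In a real training loop, the weighted advantages would feed a clipped policy-gradient update over the translation and post-editing token sequences; here the samplers and reward are left abstract.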

Top-level tags: natural language processing, machine learning, reinforcement learning
Detailed tags: machine translation, reinforcement learning, post-editing, large language models, optimization

PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning


1️⃣ One-sentence summary

This paper proposes PEGRL, a two-stage reinforcement learning method that introduces post-editing as an auxiliary task to stabilize training, thereby guiding large language models more effectively in machine translation; it achieves better results than existing reinforcement learning methods across multiple language pairs.

Source: arXiv:2602.03352