arXiv submission date: 2026-05-13
📄 Abstract - Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization

LLMs have shown immense potential for code translation, yet they often struggle to ensure both syntactic correctness and semantic consistency. While preference-based learning offers a promising alignment strategy, it is hindered by unreliable semantic rewards derived from sparse test cases or restrictive reference translations. We argue that a robust semantic reward for code translation must be derived directly from the source code. In this paper, we propose CTO to improve code translation with syntax-guided and semantic-aware preference optimization. Through contrastive learning, we train a cross-lingual semantic model to directly assess functional equivalence between source and translated code. By formulating code translation as a multi-objective optimization problem, this robust semantic signal is seamlessly unified with compiler-based syntactic feedback within the direct preference optimization framework. Extensive experiments on C++, Java, and Python translations demonstrate that CTO significantly outperforms existing baselines and alternative preference optimization strategies.
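The abstract's core idea of unifying compiler-based syntactic feedback with a learned semantic-equivalence signal can be illustrated with a small sketch. The names, weights, and scoring scheme below are illustrative assumptions, not the paper's actual implementation: candidates are scored on both objectives and the best/worst are paired for preference optimization.

```python
# Hypothetical sketch: combine a compiler-based syntax signal with a
# semantic-equivalence score to form (chosen, rejected) preference pairs.
# All names and weights are illustrative assumptions, not the paper's code.
from dataclasses import dataclass

@dataclass
class Candidate:
    code: str
    compiles: bool         # syntax signal, e.g. from invoking the target compiler
    semantic_score: float  # in [0, 1], e.g. from a cross-lingual semantic model

def preference_pairs(candidates, syntax_weight=1.0, semantic_weight=1.0):
    """Rank candidates by a weighted multi-objective score; pair best vs. worst."""
    def score(c):
        return syntax_weight * float(c.compiles) + semantic_weight * c.semantic_score
    ranked = sorted(candidates, key=score, reverse=True)
    # Chosen = highest combined score; rejected = lowest.
    return [(ranked[0].code, ranked[-1].code)] if len(ranked) >= 2 else []

cands = [
    Candidate("def f(x): return x + 1", True, 0.92),
    Candidate("def f(x) return x + 1", False, 0.90),   # syntax error
    Candidate("def f(x): return x - 1", True, 0.35),   # semantically off
]
pairs = preference_pairs(cands)
```

In this toy example, the non-compiling candidate scores lowest despite high semantic similarity, so it becomes the rejected side of the pair while the syntactically and semantically sound translation is chosen.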

Top-level tags: llm, machine learning
Detailed tags: code translation, preference optimization, semantic reward, contrastive learning, syntax guidance

Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization


1️⃣ One-sentence summary

This paper proposes a new method named CTO that combines syntax checking with a semantic model trained via contrastive learning to jointly ensure syntactic correctness and functional equivalence in code translation, significantly improving the cross-language translation quality of large language models.

Source: arXiv 2605.13229