arXiv submission date: 2026-04-22
📄 Abstract - Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

Modern language models demonstrate impressive coding capabilities in common programming languages (PLs), such as C++ and Python, but their performance in lower-resource PLs is often limited by training data availability. In principle, however, most programming skills are universal across PLs, so a capability acquired in one PL should transfer to others. In this work, we propose the task of zero-shot cross-programming-language transfer for code RL. We find that, for Llama-3.1, RL training for code generation in a source PL fails to improve, and sometimes even degrades, performance on other target PLs. To address this, we hypothesize that effective RL transfer requires a generalizable SFT initialization before RL. We thus propose **Parallel-SFT**, an SFT strategy that incorporates "parallel programs" -- functionally equivalent code implemented in multiple PLs -- into the data mixture. We demonstrate that this improves transferability: when we subsequently perform RL on our Parallel-SFT model, we observe better generalization to unseen PLs. Analysis of the models' internal representations reveals that Parallel-SFT leads to a more functionality-centric latent space, where equivalent programs across PLs are more tightly clustered, which we hypothesize contributes to the improved transferability.
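To make the data-mixture idea concrete, here is a minimal sketch of how parallel programs could be expanded into SFT examples. The record format, field names, and the `make_sft_examples` helper are illustrative assumptions, not the paper's actual pipeline:

```python
import json
import random

# Hypothetical record format (not from the paper): each task carries
# functionally equivalent implementations ("parallel programs") in
# several programming languages.
parallel_tasks = [
    {
        "prompt": "Return the sum of squares of a list of integers.",
        "solutions": {
            "python": "def sum_squares(xs):\n    return sum(x * x for x in xs)",
            "rust": "fn sum_squares(xs: &[i64]) -> i64 {\n"
                    "    xs.iter().map(|x| x * x).sum()\n}",
            "julia": "sum_squares(xs) = sum(x^2 for x in xs)",
        },
    },
]

def make_sft_examples(task):
    """Expand one task into one SFT example per language, so the model
    sees the same functionality expressed in multiple PLs."""
    for lang, code in task["solutions"].items():
        yield {
            "instruction": f"Solve the following task in {lang}:\n{task['prompt']}",
            "response": code,
        }

# Mix the parallel-program examples into the rest of the SFT data.
sft_mixture = [ex for task in parallel_tasks for ex in make_sft_examples(task)]
random.shuffle(sft_mixture)
print(json.dumps(sft_mixture[0], indent=2))
```

Per the abstract, the key design point is only that the mixture contains multiple PL renderings of the same functionality; the exact prompt template and mixing ratio are left unspecified here.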

Top-level tags: llm, model training, machine learning
Detailed tags: code generation, reinforcement learning, zero-shot transfer, supervised fine-tuning, programming languages

Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL


1️⃣ One-Sentence Summary

This paper proposes Parallel-SFT, which adds "parallel programs" (functionally equivalent code implemented in multiple programming languages) to the supervised fine-tuning data mixture so that subsequent reinforcement learning can transfer coding ability zero-shot from common languages (e.g., Python, C++) to lower-resource ones (e.g., Rust, Julia); experiments show that the method makes the model's internal representations of functionally equivalent code in different languages cluster more tightly, thereby improving transfer.
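The clustering claim in the summary can be probed, for instance, by comparing pooled hidden states of equivalent programs across languages. The sketch below assumes a HuggingFace Transformers checkpoint; the model name, mean pooling, and cosine similarity are our illustrative choices, not necessarily the paper's analysis protocol:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative probe: how similar are a model's representations of
# functionally equivalent programs written in different PLs?
NAME = "meta-llama/Llama-3.1-8B"  # assumed checkpoint, not from the paper
tok = AutoTokenizer.from_pretrained(NAME)
model = AutoModel.from_pretrained(NAME)
model.eval()

def embed(code: str) -> torch.Tensor:
    """Mean-pool the last hidden layer over all tokens of a program."""
    inputs = tok(code, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

py_impl = "def add(a, b):\n    return a + b"
rs_impl = "fn add(a: i64, b: i64) -> i64 { a + b }"

sim = torch.nn.functional.cosine_similarity(embed(py_impl), embed(rs_impl), dim=0)
# In a more functionality-centric latent space, equivalent programs
# should score higher than unrelated ones.
print(f"cross-PL cosine similarity: {sim.item():.3f}")
```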

Source: arXiv:2604.20835