SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving
1️⃣ One-sentence summary
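This paper presents SWE-Lego, a supervised fine-tuning recipe. By building a high-quality dataset and refining the training procedure, it shows that lightweight supervised fine-tuning alone can reach top-tier performance on software engineering issue resolving, and that test-time scaling improves results further.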
We present SWE-Lego, a supervised fine-tuning (SFT) recipe designed to achieve state-of-the-art performance in software engineering (SWE) issue resolving. In contrast to prevalent methods that rely on complex training paradigms (e.g., mid-training, SFT, reinforcement learning, and their combinations), we explore how to push the limits of a lightweight SFT-only approach for SWE tasks. SWE-Lego comprises three core building blocks, with key findings summarized as follows: 1) the SWE-Lego dataset, a collection of 32k high-quality task instances and 18k validated trajectories, combining real and synthetic data to complement each other in both quality and quantity; 2) a refined SFT procedure with error masking and a difficulty-based curriculum, which demonstrably improves action quality and overall performance. Empirical results show that with these two building blocks alone, SFT can push SWE-Lego models to state-of-the-art performance among open-source models of comparable size on SWE-bench Verified: SWE-Lego-Qwen3-8B reaches 42.2%, and SWE-Lego-Qwen3-32B attains 52.6%. 3) We further evaluate and improve test-time scaling (TTS) built upon the SFT foundation. Based on a well-trained verifier, SWE-Lego models can be significantly boosted: for example, from 42.2% to 49.6% and from 52.6% to 58.8% under TTS@16 for the 8B and 32B models, respectively.
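The abstract does not spell out how verifier-based test-time scaling is implemented. As a rough illustration only, the sketch below assumes that TTS@k means sampling k candidate rollouts per task, scoring each with a verifier, and submitting the highest-scoring one. The names `Candidate`, `select_best`, `tts_resolve_rate`, and `toy_verifier` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """One rollout for a task (hypothetical structure, not from the paper)."""
    patch: str       # final patch produced by the rollout
    resolved: bool   # ground-truth outcome, known only at evaluation time


def select_best(candidates: Sequence[Candidate],
                verifier: Callable[[Candidate], float]) -> Candidate:
    """Return the candidate with the highest verifier score.

    On ties, max() keeps the first candidate in the sequence.
    """
    return max(candidates, key=verifier)


def tts_resolve_rate(tasks: Sequence[Sequence[Candidate]],
                     verifier: Callable[[Candidate], float]) -> float:
    """Fraction of tasks whose verifier-selected candidate is resolved."""
    picked = [select_best(cands, verifier) for cands in tasks]
    return sum(c.resolved for c in picked) / len(picked)


def toy_verifier(c: Candidate) -> float:
    """Stand-in for a trained verifier: favors patches containing 'fix'."""
    return 1.0 if "fix" in c.patch else 0.0


# Two toy tasks with two rollouts each (k = 2 here; the abstract reports k = 16).
tasks = [
    [Candidate("noop", False), Candidate("fix a", True)],
    [Candidate("noop", False), Candidate("noop 2", False)],
]
rate = tts_resolve_rate(tasks, toy_verifier)
print(rate)
```

In this toy setup the verifier picks the resolving rollout for the first task and has no resolving rollout to pick for the second. A real verifier would be a trained model that scores full trajectories, and its accuracy determines how much of the gap to an oracle selector is closed.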
Source: arXiv: 2601.01426