arXiv submission date: 2026-06-02
📄 Abstract - Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs

While Large Language Models (LLMs) excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), typically apply coarse-grained optimization at the sequence level. This approach often fails to address the localized nature of security flaws, where a single incorrect token choice can compromise an entire program. To bridge this gap, we introduce Tree-like Self-Play (TSP), a framework that reframes secure code generation as a fine-grained sequential decision process. Unlike standard methods that blindly maximize likelihood, TSP constructs a decision tree where the model explores branching trajectories--generating both secure "golden paths" and vulnerable variants. By treating code generation as a self-play game, the model learns to strictly discriminate against its own localized errors. This provides a dense, on-policy learning signal that forces self-correction precisely at the critical decision nodes where vulnerabilities typically emerge. Our experiments demonstrate that TSP fundamentally enhances model reliability. In Python security benchmarks, TSP boosts CodeLlama-7B's pass rate (SPR@1) to 75.8%, significantly outperforming SFT (57.0%) and unstructured self-play baselines. Crucially, TSP induces robust out-of-distribution generalization: the model not only reduces vulnerabilities in unseen categories (CWEs) by 24.5% but also successfully transfers security principles learned from C/C++ to diverse languages, including Python, Go, and JavaScript. This suggests that TSP does not merely memorize patches, but internalizes abstract, language-agnostic security logic.
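
The abstract does not say how the decision tree is built or which loss is used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a secure "golden path" and its vulnerable variants are available as token lists, and that the critical decision node is the first token at which a variant departs from the golden path. The names `PreferencePair`, `divergence_index`, and `build_pairs` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    prefix: List[str]    # tokens shared by both paths up to the decision node
    chosen: List[str]    # continuation along the secure ("golden") path
    rejected: List[str]  # continuation along the vulnerable variant


def divergence_index(a: List[str], b: List[str]) -> int:
    """Return the index of the first token at which the two sequences differ."""
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i


def build_pairs(golden: List[str], variants: List[List[str]]) -> List[PreferencePair]:
    """Assumed pairing rule: split each variant at its branch point from the golden path."""
    pairs = []
    for variant in variants:
        k = divergence_index(golden, variant)
        if k == len(golden) and k == len(variant):
            continue  # variant is identical to the golden path; nothing to contrast
        pairs.append(PreferencePair(golden[:k], golden[k:], variant[k:]))
    return pairs


# Illustrative tokens: a parameterized SQL call versus string formatting.
golden = ["cursor", ".execute(", "query", ",", "params", ")"]
variant = ["cursor", ".execute(", "query", "%", "params", ")"]

pairs = build_pairs(golden, [variant, golden])
for pair in pairs:
    print(pair.prefix, "| chosen:", pair.chosen, "| rejected:", pair.rejected)
```

In this toy case the two paths share a three-token prefix and differ in a single token, which is the kind of localized error the abstract describes. A training step could then contrast `chosen` against `rejected` conditioned on `prefix`, though the actual objective used by TSP is not stated here.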

Top-level tags: llm model training security
Detailed tags: code generation self-play secure code reinforcement learning vulnerability detection

Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs


1️⃣ One-sentence summary

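This paper proposes Tree-like Self-Play (TSP), a method in which a code generation model explores secure and insecure code paths through chess-like self-play in order to precisely correct small security errors during generation, which markedly improves the security of the generated code and lets the learned security logic generalize across languages (for example, from C/C++ to Python and Go).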

Source: arXiv: 2606.03489