ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

📄 Abstract

Code sandboxes have emerged as critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fall short on both verification accuracy and efficiency under high-concurrency workloads. We present ScaleBox, a high-fidelity and scalable system designed to address these limitations in large-scale code training. ScaleBox introduces automated special-judge generation and management, fine-grained parallel execution across test cases with seamless multi-node coordination, and a configuration-driven evaluation suite for reproducible benchmarking. A series of experiments demonstrates that ScaleBox significantly enhances code verification accuracy and efficiency. Our further RLVR experiments show that ScaleBox substantially improves both performance on LiveCodeBench and training stability, significantly outperforming heuristic-matching baselines. By providing a reliable and high-throughput infrastructure, ScaleBox facilitates more effective research and development in large-scale code training.

1️⃣ One-Sentence Summary

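ScaleBox is an efficient verification system designed for large-scale code training. By automatically generating special-judge rules, executing test cases in fine-grained parallel, and coordinating seamlessly across nodes, it significantly improves the accuracy and concurrent throughput of code verification, giving large language models more reliable feedback during training and evaluation.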
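
To make the contrast between heuristic matching and a special judge concrete, here is a minimal Python sketch. It is not ScaleBox's implementation or API: the names `exact_match`, `float_special_judge`, `verify`, and `halve` are hypothetical, the tolerance-based checker is only one simple kind of special judge, and the thread pool merely stands in for per-test-case parallelism on a single machine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def exact_match(expected: str, actual: str) -> bool:
    """Heuristic baseline (assumed): compare whitespace-separated tokens verbatim."""
    return expected.split() == actual.split()


def float_special_judge(expected: str, actual: str, tol: float = 1e-6) -> bool:
    """Hypothetical special judge: accept numeric tokens within an absolute tolerance."""
    exp_tokens, act_tokens = expected.split(), actual.split()
    if len(exp_tokens) != len(act_tokens):
        return False
    try:
        return all(abs(float(e) - float(a)) <= tol
                   for e, a in zip(exp_tokens, act_tokens))
    except ValueError:
        # A non-numeric token cannot satisfy a numeric judge.
        return False


def verify(solution: Callable[[str], str],
           tests: List[Tuple[str, str]],
           judge: Callable[[str, str], bool],
           max_workers: int = 4) -> List[bool]:
    """Run each (stdin, expected) test case in its own worker and judge the output."""
    def run_one(case: Tuple[str, str]) -> bool:
        stdin, expected = case
        try:
            return judge(expected, solution(stdin))
        except Exception:
            # A crashing solution fails that test case only.
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input test cases.
        return list(pool.map(run_one, tests))


def halve(stdin: str) -> str:
    """Toy 'submitted solution': prints half of the input number."""
    return str(float(stdin) / 2)


TESTS = [("1", "0.5"), ("3", "1.50"), ("10", "5")]

if __name__ == "__main__":
    print(verify(halve, TESTS, exact_match))
    print(verify(halve, TESTS, float_special_judge))
```

In this toy setup the solution is correct on every test, yet the string-matching baseline rejects outputs that differ only in formatting (`1.5` versus `1.50`, `5.0` versus `5`), while the tolerance-based judge accepts them. That kind of false negative is what a special judge is meant to avoid.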
