arXiv submission date: 2026-02-24
📄 Abstract - On Data Engineering for Scaling LLM Terminal Capabilities

Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed-based and skill-based task construction, and (2) a comprehensive analysis of data and training strategies, including filtering, curriculum learning, long-context training, and scaling behavior. Our pipeline yields Terminal-Corpus, a large-scale open-source dataset for terminal tasks. Using this dataset, we train Nemotron-Terminal, a family of models initialized from Qwen3 (8B, 14B, 32B) that achieve substantial gains on Terminal-Bench 2.0: Nemotron-Terminal-8B improves from 2.5% to 13.0%, Nemotron-Terminal-14B improves from 4.0% to 20.2%, and Nemotron-Terminal-32B improves from 3.4% to 27.4%, matching the performance of significantly larger models. To accelerate research in this domain, we open-source our model checkpoints and most of our synthetic datasets at this https URL.
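The paper does not disclose Terminal-Task-Gen's internals, but the idea of seed-based task construction can be illustrated with a minimal sketch: start from a seed terminal task (an instruction template paired with a shell verification command) and expand it into concrete variants by filling parameter slots. All names and structure below are illustrative assumptions, not the actual pipeline.

```python
# Hypothetical sketch of seed-based synthetic task generation.
# A "seed" pairs a natural-language instruction template with a shell
# command that checks whether the task was completed correctly.
import itertools

SEED = {
    "instruction": "Create a file named {name} containing the text '{text}'.",
    "verify": "grep -qx '{text}' {name}",
}

def expand_seed(seed, substitutions):
    """Instantiate one concrete task per combination of slot values."""
    keys = sorted(substitutions)
    tasks = []
    for values in itertools.product(*(substitutions[k] for k in keys)):
        slots = dict(zip(keys, values))
        tasks.append({
            "instruction": seed["instruction"].format(**slots),
            "verify": seed["verify"].format(**slots),
        })
    return tasks

# 2 filenames x 2 contents -> 4 concrete task variants from one seed.
tasks = expand_seed(
    SEED,
    {"name": ["notes.txt", "todo.md"], "text": ["hello", "done"]},
)
print(len(tasks))
```

In a real pipeline, the verification command would be run inside a sandboxed container after the agent acts, and generated tasks would then pass through the filtering and curriculum stages the abstract mentions.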

Top tags: llm agents model training
Detailed tags: terminal agents synthetic data generation data engineering curriculum learning benchmark evaluation

On Data Engineering for Scaling LLM Terminal Capabilities


1️⃣ One-Sentence Summary

By building an automated task-generation tool called Terminal-Task-Gen together with a set of data and training strategies, this paper produces datasets and models that substantially improve large language models' ability to operate in the command-line terminal, and open-sources these resources to advance research in the field.

Source: arXiv:2602.21193