arXiv submission date: 2026-02-05
📄 Abstract - Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better

As Large Language Models (LLMs) achieve remarkable empirical success through scaling model and data size, pretraining has become increasingly critical yet computationally prohibitive, hindering rapid development. Despite the availability of numerous pretrained LLMs developed at significant computational expense, a fundamental real-world question remains underexplored: *Can we leverage existing small pretrained models to accelerate the training of larger models?* In this paper, we propose a Late-to-Early Training (LET) paradigm that enables LLMs to explicitly learn later knowledge in earlier steps and earlier layers. The core idea is to guide the early layers of an LLM during early training using representations from the late layers of a pretrained (i.e., late training phase) model. We identify two key mechanisms that drive LET's effectiveness: late-to-early-step learning and late-to-early-layer learning. These mechanisms significantly accelerate training convergence while robustly enhancing both language modeling capabilities and downstream task performance, enabling faster training with superior performance. Extensive experiments on 1.4B and 7B parameter models demonstrate LET's efficiency and effectiveness. Notably, when training a 1.4B LLM on the Pile dataset, our method achieves up to 1.6× speedup with nearly 5% improvement in downstream task accuracy compared to standard training, even when using a pretrained model with 10× fewer parameters than the target model.
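The abstract's core idea, guiding a large model's early layers with a small pretrained model's late-layer representations, can be illustrated with a minimal numpy sketch. This is an illustrative assumption, not the paper's actual implementation: all names, shapes, and the choice of an MSE alignment term with a learned projection are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a late-to-early alignment signal.
# Shapes and the projection are placeholders, not the paper's method.
rng = np.random.default_rng(0)

seq_len, d_teacher, d_student = 8, 64, 128
# Late-layer representations of the small pretrained (teacher) model.
teacher_late = rng.standard_normal((seq_len, d_teacher))
# Early-layer representations of the large (student) model being trained.
student_early = rng.standard_normal((seq_len, d_student))

# A projection bridging the two hidden sizes (random stand-in for a
# learned matrix).
W = rng.standard_normal((d_teacher, d_student)) / np.sqrt(d_teacher)

def alignment_loss(t_late, s_early, W):
    """MSE between projected teacher late-layer reps and student early-layer reps."""
    target = t_late @ W
    return float(np.mean((s_early - target) ** 2))

lm_loss = 2.31  # placeholder next-token prediction loss of the student
lam = 0.1       # assumed weight on the auxiliary late-to-early term
total_loss = lm_loss + lam * alignment_loss(teacher_late, student_early, W)
print(total_loss)
```

In an actual training loop, the gradient of this auxiliary term would flow into the student's early layers (and the projection), while the teacher stays frozen; the weighting and where the term is applied are design choices the abstract does not specify.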

Top tags: llm, model training, machine learning
Detailed tags: knowledge transfer, training acceleration, representation learning, pretraining, parameter efficiency

Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better


1️⃣ One-sentence summary

This paper proposes a new method called Late-to-Early Training (LET), which uses an already pretrained small model to guide a new, larger model to learn deeper-layer knowledge early in training, significantly accelerating training and improving final performance.

Source: arXiv 2602.05393