ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

📄 Abstract

Schedule-Free Learning has shown promise as a practical anytime training method for machine learning, showing success across dozens of standard benchmark problems. However, strong performance for LLM training has only been demonstrated at small scales. We identify a number of fixes necessary to scale up Schedule-Free Learning to larger batch sizes and model sizes, and present a learning-rate-free and schedule-free method (ScheduleFree+) for training large language models which greatly outperforms Warmup-Stable-Decay (WSD) schedules. We also demonstrate that Schedule-Free Learning is most effective for long duration training, and at 1000 tokens per parameter, it outperforms SOTA schedules by 31%. Schedule-Free Learning provides a theoretical foundation for the use of model averaging and checkpoint merging during pretraining.
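The abstract assumes familiarity with Schedule-Free Learning. For context, here is a minimal sketch of the published Schedule-Free SGD recursion that ScheduleFree+ builds on, run on a toy quadratic. It uses a constant step size on a base sequence z, evaluates gradients at an interpolation y, and returns x, the uniform running average of z. It does not include the ScheduleFree+ fixes, which the abstract does not spell out, and it omits warmup, weight decay, and the Adam variant. The function name `schedule_free_sgd`, the toy objective, and the hyperparameters (`lr=0.5`, `beta=0.9`, `steps=500`) are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def schedule_free_sgd(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    lr: float = 0.5,
    beta: float = 0.9,
    steps: int = 500,
) -> Tuple[Vector, List[Vector]]:
    """Toy Schedule-Free SGD (illustrative): constant step, no warmup or decay.

    Returns the averaged iterate x (the weights one would evaluate) and the
    history of base iterates z, so the averaging property can be checked.
    """
    z = list(x0)  # base SGD sequence z_t
    x = list(x0)  # averaged sequence x_t, starts equal to z_1
    z_history = [list(z)]
    for t in range(1, steps + 1):
        # Gradient is taken at y_t, an interpolation between z_t and x_t.
        y = [(1.0 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
        g = grad(y)
        # Plain SGD step on z with a fixed learning rate: no decay schedule.
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        z_history.append(list(z))
        # c_{t+1} = 1/(t+1) keeps x equal to the uniform mean of z_1..z_{t+1}.
        c = 1.0 / (t + 1)
        x = [(1.0 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x, z_history


# Hypothetical toy objective f(w) = 0.5 * ||w - target||^2, gradient w - target.
target = [3.0, -2.0]


def quad_grad(w: Vector) -> Vector:
    return [wi - ti for wi, ti in zip(w, target)]


x_final, zs = schedule_free_sgd(quad_grad, [0.0, 0.0])
print(x_final)
```

Because x is exactly the running mean of the z iterates, the weights being evaluated are themselves an average over training. One reading of the abstract's claim that Schedule-Free Learning underpins model averaging starts from this property.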

1️⃣ One-Sentence Summary

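This paper proposes ScheduleFree+, an improved training method that requires neither learning-rate tuning nor a predefined learning-rate schedule. By resolving key problems that arise in large-scale training, it significantly outperforms strong conventional schedules such as WSD when training large language models, with the largest gains in long training runs (31% over state-of-the-art schedules at 1000 tokens per parameter), and it provides a theoretical basis for model averaging and checkpoint merging during pretraining.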
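In its simplest form, checkpoint merging is a weighted average of parameters across saved checkpoints. The sketch below shows that generic technique. It assumes each checkpoint is a dict that maps parameter names to flat lists of floats. `merge_checkpoints`, this representation, and the example weights are hypothetical, and the summary does not describe the paper's own merging procedure.

```python
from typing import Dict, List, Optional, Sequence

Checkpoint = Dict[str, List[float]]  # hypothetical: parameter name -> flat values


def merge_checkpoints(
    checkpoints: Sequence[Checkpoint],
    weights: Optional[Sequence[float]] = None,
) -> Checkpoint:
    """Weighted average of parameters across checkpoints (uniform by default)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("need exactly one weight per checkpoint")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalise so the weights sum to 1
    merged: Checkpoint = {}
    for name, values in checkpoints[0].items():
        # Assumes every checkpoint shares the same parameter names and sizes.
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(len(values))
        ]
    return merged


# Three hypothetical checkpoints of a tiny model.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
uniform = merge_checkpoints(ckpts)  # approximately w=[3, 4], b=[1]
latest_heavy = merge_checkpoints(ckpts, [1, 1, 2])  # w=[3.5, 4.5], b=[1.25]
```

Uniform weights correspond to a plain running average. Giving later checkpoints more weight is one common variant, shown here only as an illustrative choice.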