Unifying Local Communications and Local Updates for LLM Pretraining

📄 Abstract - Unifying Local Communications and Local Updates for LLM Pretraining

Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwidth links. Many practical methods reduce communication frequency but still rely on synchronous All-Reduce operations that maintain identical model states and tie progress to global collectives. This can become a bottleneck when bandwidth or worker speed is heterogeneous. We introduce GASLoC, a novel decentralized pre-training algorithm that generalizes the notion of communication acceleration to the recently popular "outer optimizer". The result is a practical gossip-based training framework that is compatible with adaptive optimizers, allows local optimizer steps, and can use sparse randomized peer communication. Empirically, on a number of standard LLM training tasks, GASLoC outperforms state-of-the-art decentralized algorithms in the single-step-per-communication setting across a number of topologies and, unlike existing decentralized methods in the LLM setting, achieves performance competitive with DiLoCo when using multiple local steps. In the heterogeneous-bandwidth setting, GASLoC can significantly outperform DiLoCo.

1️⃣ One-Sentence Summary

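This paper proposes GASLoC, a new decentralized pre-training algorithm that generalizes communication acceleration to the "outer optimizer", so that training in low-bandwidth, heterogeneous environments remains compatible with adaptive optimizers while also supporting local updates and sparse randomized communication. It outperforms existing state-of-the-art decentralized methods on several standard tasks and significantly outperforms DiLoCo in the heterogeneous-bandwidth setting.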
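
To make the two ingredients named above concrete (local optimizer steps and sparse randomized peer communication), here is a minimal toy sketch in plain Python. It is not GASLoC: it omits the outer optimizer, the communication acceleration, and adaptive optimizers, none of which are specified in the abstract. Every name (`local_steps`, `gossip_round`, `train`), the scalar quadratic objective, and all hyperparameters are assumptions chosen purely for illustration.

```python
import random

def local_steps(x, target, lr, num_steps):
    """Run num_steps of gradient descent on the toy loss 0.5 * (x - target)**2."""
    for _ in range(num_steps):
        grad = x - target          # gradient of the toy quadratic loss
        x -= lr * grad
    return x

def gossip_round(params, rng):
    """Randomly pair workers; each pair replaces its two values with their mean.

    No global collective is used: every worker talks to at most one peer.
    """
    order = list(range(len(params)))
    rng.shuffle(order)
    mixed = list(params)
    for a, b in zip(order[0::2], order[1::2]):
        mean = 0.5 * (params[a] + params[b])
        mixed[a] = mixed[b] = mean
    return mixed

def train(targets, rounds=200, local=4, lr=0.05, seed=0):
    """Alternate several local steps per worker with one randomized gossip round."""
    rng = random.Random(seed)
    params = [0.0] * len(targets)  # one scalar "model" per worker
    for _ in range(rounds):
        params = [local_steps(x, t, lr, local) for x, t in zip(params, targets)]
        params = gossip_round(params, rng)
    return params

# Hypothetical setup: 8 workers, each with a different local optimum.
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
final = train(targets)
consensus = sum(final) / len(final)   # average model across workers
spread = max(final) - min(final)      # residual disagreement between workers
```

In this toy, pairwise averaging preserves the mean of the workers' parameters, so the average model converges to the minimizer of the average loss even though no worker ever participates in an All-Reduce. Individual workers still disagree somewhat (`spread` stays above zero), which is the kind of gap that acceleration and outer-optimizer techniques are meant to address.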
