RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

📄 Abstract - RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

Data mixture selection is critical for large language model pretraining. Existing methods such as RegMix select a single static mixture by fitting a regression model on small-scale proxy runs. We propose RegMix-D, a simple extension of RegMix to dynamic mixing. Our key observation is that proxy runs produce not only endpoint losses but also full loss trajectories, which can be used to further improve the data mixture. By training a regression model on these trajectories, we can predict optimal mixtures at multiple training stages. RegMix-D supports two deployment modes: an offline variant that generates a complete mixture schedule before target training, and an online variant that adapts the mixture during training using the observed loss. Experiments on 25B tokens of the Pile dataset with a 1B-parameter target model show that RegMix-D consistently improves over RegMix and DoReMi across 13 downstream tasks while remaining proxy-efficient: it surpasses RegMix even with only 128 proxy models (25% of RegMix's proxy compute budget).
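The offline variant described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain linear regressor fitted per checkpoint, candidate mixtures sampled from a Dirichlet distribution, and synthetic noise-free proxy losses. The function names `fit_stage_regressors` and `offline_schedule`, the array shapes, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_stage_regressors(mixtures, trajectories):
    """Fit one linear model per checkpoint: loss at stage t ~ mixtures @ coefs[:, t].

    mixtures:     (n_proxies, n_domains), each row sums to 1
    trajectories: (n_proxies, n_stages), proxy loss at each checkpoint
    No intercept is used: rows sum to 1, so a constant term is absorbed.
    """
    coefs, *_ = np.linalg.lstsq(mixtures, trajectories, rcond=None)
    return coefs  # (n_domains, n_stages)

def offline_schedule(coefs, n_candidates=2000, seed=0):
    """For each stage, pick the sampled candidate mixture with the lowest predicted loss."""
    rng = np.random.default_rng(seed)
    n_domains = coefs.shape[0]
    candidates = rng.dirichlet(np.ones(n_domains), size=n_candidates)
    predicted = candidates @ coefs      # (n_candidates, n_stages)
    best = predicted.argmin(axis=0)     # best candidate index per stage
    return candidates[best]             # (n_stages, n_domains)

# Synthetic stand-in for proxy runs (assumption: 64 proxies, 3 domains, 2 stages).
rng = np.random.default_rng(0)
mixtures = rng.dirichlet(np.ones(3), size=64)
true_coefs = np.array([[3.0, 2.0],
                       [2.5, 2.4],
                       [2.8, 1.5]])     # hypothetical per-domain effect at each stage
trajectories = mixtures @ true_coefs    # noise-free losses, for illustration only

coefs = fit_stage_regressors(mixtures, trajectories)
schedule = offline_schedule(coefs)
print(schedule.round(3))                # one mixture per training stage
```

In this toy setup the preferred domain changes between the two stages, which is the situation a static mixture cannot capture. An online variant would, under the same assumptions, re-select the mixture at each stage after comparing predicted and observed loss.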

1️⃣ One-sentence summary

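This paper proposes RegMix-D, a dynamic data mixing method that uses the full loss trajectories of small proxy models during training to predict the optimal data mixture at different training stages. Compared with static mixing methods such as RegMix, it substantially improves large language model pretraining while using less compute.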
