Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts
1️⃣ One-Sentence Summary
This paper finds that when widening a small language model, merely preserving its pre-expansion performance is not enough: the best warm-start method depends on whether the subsequent training is deterministic or stochastic, and on how many continuation steps are budgeted.
Width expansion offers a practical route to reuse smaller causal-language-model checkpoints, but selecting a widened warm start is not solved by zero-step preservation alone. We study dense width growth as a candidate-selection problem over full training states, including copied weights, optimizer moments, and scheduler state. In a small-scale TinyStories proxy, we compare exact-copy, perturbative, asymmetric-reset, and structured non-clone warm starts under matched continuation budgets. We evaluate zero-step preservation, short-lag probe metrics, and downstream continuation utility in deterministic and stochastic regimes. The results are mixed, and are partially replicated by a reduced-pool seed-1 check. Exact-copy symmetric warm starts rank first in every completed 16-step probe and in the completed stochastic 128-step continuations at seed-0 steps 1000 and 2000, as well as in the reduced seed-1 run at step 2000. By contrast, the structured non-clone challenger wins the deterministic 128-step continuation. Early escape from the inherited cloned subspace is therefore not a universal selector: it helps in long deterministic continuation, but it misleads at short lag and under stochastic continuation. The result is narrow but useful: for dense width growth at this scale, preservation is not a universal ranking criterion, and the best replacement signal depends on both regime and lag budget.
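To make the candidate types concrete, here is a minimal PyTorch sketch of an exact-copy symmetric width expansion for one hidden layer, with a perturbative variant toggled by a noise parameter. The function name `widen_linear_pair`, the unit-duplication scheme, and `noise_std` are illustrative assumptions, not the paper's exact recipe; in particular, the paper's copying of optimizer moments and scheduler state is not shown here.

```python
import torch
import torch.nn as nn

def widen_linear_pair(fc_in: nn.Linear, fc_out: nn.Linear,
                      new_width: int, noise_std: float = 0.0):
    """Widen the hidden layer between two linear maps.

    Exact-copy symmetric warm start (a Net2WiderNet-style sketch):
    new hidden units duplicate existing ones, and the outgoing weights
    of each duplicate group are split so the network computes the same
    function at step zero. noise_std > 0 gives a perturbative variant
    that breaks the cloned-subspace symmetry.
    """
    old_width = fc_in.out_features
    assert fc_out.in_features == old_width and new_width >= old_width
    # Map each new unit to a source unit (extras duplicate early units).
    src = torch.arange(new_width) % old_width
    counts = torch.bincount(src, minlength=old_width).float()

    wide_in = nn.Linear(fc_in.in_features, new_width)
    wide_out = nn.Linear(new_width, fc_out.out_features)
    with torch.no_grad():
        wide_in.weight.copy_(fc_in.weight[src])
        wide_in.bias.copy_(fc_in.bias[src])
        # Split outgoing weights among duplicates so each group's
        # summed contribution matches the original unit's.
        wide_out.weight.copy_(fc_out.weight[:, src] / counts[src])
        wide_out.bias.copy_(fc_out.bias)
        if noise_std > 0:
            wide_in.weight.add_(noise_std * torch.randn_like(wide_in.weight))
    return wide_in, wide_out
```

With `noise_std=0`, the widened pair preserves the original function exactly (including with an elementwise nonlinearity such as ReLU between the two layers, since duplicated units produce identical activations), which is the zero-step-preservation property the abstract argues is insufficient on its own as a selection criterion.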
Source: arXiv: 2604.04281