📄 Paper Summary
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
1️⃣ One-Sentence Summary
This work proposes a method for converting existing non-recurrent pretrained language models into depth-recurrent models. Using a training curriculum that progressively increases the model's effective depth, it improves performance on mathematics tasks while reducing total computational cost.
Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematics, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.
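The abstract describes a shared block applied in a loop whose iteration count sets the model's effective depth, with that count ramped up over training. The PyTorch sketch below illustrates this idea under stated assumptions: the names (`DepthRecurrentLM`, `recurrence_curriculum`), the prelude/recurrent-block/coda split, and the linear ramp schedule are hypothetical, not the paper's exact architecture or curriculum.

```python
import torch
import torch.nn as nn

class DepthRecurrentLM(nn.Module):
    """Minimal sketch of a depth-recurrent model: fixed prelude and coda
    layers wrap a weight-shared block applied r times, so effective depth
    grows with r while the parameter count stays constant.
    (Illustrative assumption, not the paper's exact architecture.)"""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.prelude = make_layer()          # non-recurrent entry layer
        self.recurrent_block = make_layer()  # weights reused each iteration
        self.coda = make_layer()             # non-recurrent exit layer

    def forward(self, x: torch.Tensor, num_recurrences: int) -> torch.Tensor:
        h = self.prelude(x)
        for _ in range(num_recurrences):  # same weights applied r times
            h = self.recurrent_block(h)
        return self.coda(h)

def recurrence_curriculum(step: int, total_steps: int,
                          r_min: int = 1, r_max: int = 8) -> int:
    """Hypothetical linear schedule: train shallow (cheap) early on and
    ramp the recurrence count up over the course of training, raising
    effective depth while reducing total compute."""
    frac = step / max(total_steps, 1)
    return r_min + round(frac * (r_max - r_min))

# Usage: the recurrence count grows as training progresses.
model = DepthRecurrentLM(d_model=512, n_heads=8)
x = torch.randn(2, 16, 512)                            # (batch, seq, d_model)
r = recurrence_curriculum(step=500, total_steps=1000)  # mid-training -> r ~ 4
out = model(x, num_recurrences=r)
```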