📄 Paper Summary
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
1️⃣ One-Sentence Summary
This work proposes a method for converting existing non-recurrent pretrained language models into depth-recurrent models. Using a training curriculum that progressively increases the model's effective depth, it improves performance on mathematics tasks while reducing total computational cost.
Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematics, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.
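The abstract describes a shared block applied in a loop whose iteration count sets the model's effective depth, with that count ramped up over training. The PyTorch sketch below illustrates this idea under stated assumptions: the names (`DepthRecurrentLM`, `recurrence_curriculum`), the prelude/recurrent-block/coda split, and the linear ramp schedule are hypothetical, not the paper's exact architecture or curriculum.

```python
import torch
import torch.nn as nn

class DepthRecurrentLM(nn.Module):
    """Minimal sketch of a depth-recurrent model: fixed prelude and coda
    layers wrap a weight-shared block applied r times, so effective depth
    grows with r while the parameter count stays constant.
    (Illustrative assumption, not the paper's exact architecture.)"""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.prelude = make_layer()          # non-recurrent entry layer
        self.recurrent_block = make_layer()  # weights reused each iteration
        self.coda = make_layer()             # non-recurrent exit layer

    def forward(self, x: torch.Tensor, num_recurrences: int) -> torch.Tensor:
        h = self.prelude(x)
        for _ in range(num_recurrences):  # same weights applied r times
            h = self.recurrent_block(h)
        return self.coda(h)

def recurrence_curriculum(step: int, total_steps: int,
                          r_min: int = 1, r_max: int = 8) -> int:
    """Hypothetical linear schedule: train shallow (cheap) early on and
    ramp the recurrence count up over the course of training, raising
    effective depth while reducing total compute."""
    frac = step / max(total_steps, 1)
    return r_min + round(frac * (r_max - r_min))

# Usage: the recurrence count grows as training progresses.
model = DepthRecurrentLM(d_model=512, n_heads=8)
x = torch.randn(2, 16, 512)                            # (batch, seq, d_model)
r = recurrence_curriculum(step=500, total_steps=1000)  # mid-training -> r ~ 4
out = model(x, num_recurrences=r)
```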