arXiv submission date: 2026-04-07
📄 Abstract - Can Large Language Models Reinvent Foundational Algorithms?

LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science? Our *Unlearn-and-Reinvent* pipeline applies LLM unlearning to remove a specific foundational algorithm, such as Dijkstra's or Euclid's algorithm, from an LLM's pretrained knowledge, and then tests whether the model can reinvent it in a controlled environment. To enable effective unlearning, we adopt a GRPO-based, on-policy unlearning method. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, our experiments demonstrate that (1) the strongest model, Qwen3-4B-Thinking-2507, successfully reinvents 50% of the algorithms with no hint, 70% at hint level 1, and 90% at hint level 2; (2) a few high-level hints can raise the reinvention success rate, but even step-by-step hints fail for the more complicated algorithms; and (3) test-time reinforcement learning enables successful reinvention of the Strassen algorithm at hint level 2. Through analyses of output trajectories and ablation studies, we find that the generative verifier in the reinvention phase plays a critical role in sustaining the model's reasoning strength, helping to avoid the "thought collapse" phenomenon. These findings offer insights into both the potential and the current limits of LLMs' innovative thinking.
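To make concrete what kind of "foundational algorithm" the pipeline targets, here is a standard textbook sketch of Euclid's algorithm, one of the targets named in the abstract (this is illustrative background, not code from the paper):

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: gcd(a, b) equals gcd(b, a mod b),
    repeated until the remainder reaches zero."""
    while b != 0:
        a, b = b, a % b
    return a

# Example trace for gcd(48, 18):
# 48 % 18 = 12, then 18 % 12 = 6, then 12 % 6 = 0, so the result is 6.
```

An unlearned model would be expected to rediscover exactly this kind of short, self-contained procedure from first principles.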

Top-level tags: llm, model training, model evaluation
Detailed tags: algorithm reinvention, unlearning, reasoning, foundational algorithms, evaluation

Can Large Language Models Reinvent Foundational Algorithms?


1️⃣ One-Sentence Summary

Through an "unlearn-and-reinvent" experimental framework, this paper tests whether large language models have the innovative capacity to reinvent foundational computer science algorithms, finding that the strongest current model can successfully "reinvent" some algorithms given hints, but that its innovative thinking remains limited.

Source: arXiv 2604.05716