Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
1️⃣ One-sentence summary
Through four attempts to have large language models autonomously generate machine-learning research papers, this paper finds that three of the four failed, identifies six recurring failure modes in autonomous AI research, and proposes design principles for building more reliable AI-scientist systems.
We report a case study of four end-to-end attempts to autonomously generate ML research papers using a pipeline of six LLM agents mapped to stages of the scientific workflow. Of these four, three attempts failed during implementation or evaluation. One completed the pipeline and was accepted to Agents4Science 2025, an experimental inaugural venue that required AI systems as first authors, passing both human and multi-AI review. From these attempts, we document six recurring failure modes: bias toward training-data defaults, implementation drift under execution pressure, memory and context degradation across long-horizon tasks, overexcitement that declares success despite obvious failures, insufficient domain intelligence, and weak scientific taste in experimental design. We conclude by discussing four design principles for more robust AI-scientist systems and implications for autonomous scientific discovery, and we release all prompts, artifacts, and outputs at this https URL
Source: arXiv:2601.03315