菜单

🤖 系统
📄 Abstract - DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich our evaluation, including 15 selected problems from the recent AIME competitions (years 24-25). Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them. In comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.

顶级标签: llm agents model training
详细标签: theorem proving reinforcement learning formal reasoning subgoal decomposition mathematical reasoning 或 搜索:

DeepSeek-Prover-V2:一种用于形式定理证明的统一非形式与形式推理大语言模型 / DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition


1️⃣ 一句话总结

本文提出了DeepSeek-Prover-V2,一个通过创新的子目标分解与强化学习冷启动训练方法,将非形式数学推理与形式化定理证明能力统一在单一模型中的开源大语言模型,在多个数学定理证明基准上达到了新的最先进性能。


2️⃣ 论文创新点

1. 基于子目标分解的强化学习冷启动训练

2. 非形式到形式化的定理证明范式

3. 基于子目标分解的递归证明搜索

4. 基于子目标的课程学习框架

5. 双模式证明生成

6. 两阶段训练流程

7. 引入ProverBench评估基准


3️⃣ 主要结果与价值

结果亮点

实际价值


4️⃣ 术语表

📄 打开原文 PDF