菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-05-13
📄 Abstract - Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics

As automated reasoning systems advance rapidly, there is a growing need for research-level formal mathematical problems to accurately evaluate their capabilities. To address this, we present Formal Conjectures, an evolving benchmark of currently 2615 mathematical problem statements formalized in Lean 4. Sourced from areas of active mathematical research, the dataset features 1029 open research conjectures providing a zero-contamination benchmark for mathematical proof discovery, and 836 solved problems for proof autoformalization. Notably, the repository provides a structured interface connecting mathematicians who formalize and clarify problems with the AI systems and humans attempting to solve them. Demonstrating its immediate utility, the benchmark has already been leveraged to make new mathematical discoveries, including the resolution of open research conjectures. We describe our approach to ensuring the correctness of these formalizations in a collaborative open-source project where contributions stem from an active community. In this framework, AI-generated proofs and disproofs serve as a valuable auditing mechanism to iteratively improve the fidelity of the benchmark. Finally, we provide a standardized evaluation setup and report baseline results on frozen evaluation subsets, demonstrating a climbable signal that measures the current frontier of automated reasoning on research-level mathematics.

顶级标签: aigc benchmark machine learning
详细标签: automated reasoning lean 4 formal mathematics proof discovery evalution 或 搜索:

形式化猜想:一个用于数学可验证发现的开放且不断发展的基准 / Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics


1️⃣ 一句话总结

该论文提出了一个名为“形式化猜想”(Formal Conjectures)的、持续更新的数学基准测试集,包含2615个用Lean 4语言形式化的数学问题,其中既有未解决的开放猜想(用于检验AI的推理发现能力),也有已解决的难题(用于验证证明的自动形式化能力),并通过社区协作和AI生成的证明来确保问题表述的准确性,从而为推动自动化推理系统在高等数学中的应用提供了一个标准化测试平台。

源自 arXiv: 2605.13171