📄 Abstract - MMFormalizer: Multimodal Autoformalization in the Wild
Autoformalization, which translates natural language mathematics into formal statements to enable machine reasoning, faces fundamental challenges in the wild due to the multimodal nature of the physical world, where physics requires inferring hidden constraints (e.g., mass or energy) from visual elements. To address this, we propose MMFormalizer, which extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains. MMFormalizer builds formal propositions from perceptually grounded primitives through recursive grounding and axiom composition, with adaptive recursive termination ensuring that every abstraction is supported by visual evidence and anchored in dimensional or axiomatic grounding. We evaluate MMFormalizer on a new benchmark, PhyX-AF, comprising 115 curated samples from MathVerse, PhyX, Synthetic Geometry, and Analytic Geometry, covering diverse multimodal autoformalization tasks. Results show that frontier models such as GPT-5 and Gemini-3-Pro achieve the highest compile and semantic accuracy, with GPT-5 excelling in physical reasoning, while geometry remains the most challenging domain. Overall, MMFormalizer provides a scalable framework for unified multimodal autoformalization, bridging perception and formal reasoning. To the best of our knowledge, this is the first multimodal autoformalization method capable of handling classical mechanics (derived from the Hamiltonian), as well as relativity, quantum mechanics, and thermodynamics. More details are available on our project page: this http URL
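The abstract does not spell out the pipeline, so the following is a minimal Python sketch, under stated assumptions, of how recursive grounding with axiom composition and adaptive termination could work. Every name here (`Primitive`, `Proposition`, `decompose`, the toy energy-conservation axiom table) is hypothetical and only illustrates the idea that recursion terminates as soon as a goal is anchored by a perceptually grounded, dimensionally typed primitive; it is not the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical primitive: a quantity grounded in the image/text with a physical dimension.
@dataclass
class Primitive:
    name: str          # e.g. "m" (mass read off a labelled block in the figure)
    dimension: str     # e.g. "kg", used as the dimensional anchor
    evidence: str      # pointer to the visual/textual element that supports it

# Hypothetical proposition built by composing sub-terms under an axiom (e.g. a conservation law).
@dataclass
class Proposition:
    axiom: str
    children: list = field(default_factory=list)

def decompose(goal):
    """Toy axiom table standing in for the paper's axiom-composition step."""
    table = {
        "E_total": ("energy_conservation", ["K", "U"]),
        "K": ("kinetic_energy", ["m", "v"]),
        "U": ("potential_energy", ["m", "g", "h"]),
    }
    return table[goal]

def formalize(goal, primitives, depth=0, max_depth=4):
    """Recursively ground a goal: stop as soon as it is anchored by a perceptually
    grounded primitive (adaptive termination), otherwise decompose it via an axiom
    and recurse on the sub-goals."""
    for p in primitives:
        if p.name == goal:                  # adaptive termination: visual evidence found
            return p
    if depth >= max_depth:                  # safety cut-off for this sketch
        raise ValueError(f"no grounding found for {goal!r}")
    axiom, subgoals = decompose(goal)
    return Proposition(axiom, [formalize(g, primitives, depth + 1, max_depth) for g in subgoals])

if __name__ == "__main__":
    grounded = [
        Primitive("m", "kg", "label on block in figure"),
        Primitive("v", "m/s", "velocity arrow annotation"),
        Primitive("g", "m/s^2", "stated in problem text"),
        Primitive("h", "m", "height marking in figure"),
    ]
    print(formalize("E_total", grounded))
```

Running the script prints a nested `Proposition` tree whose leaves are the grounded primitives; an object of roughly this shape is what one would then serialize into a formal statement.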
MMFormalizer: Multimodal Autoformalization in the Wild
1️⃣ One-Sentence Summary
This paper proposes a new framework called MMFormalizer, which for the first time extends autoformalization (i.e., translating natural-language descriptions of mathematical and physical problems into rigorous, machine-reasonable formal statements) from pure text to the visually grounded real world: through recursive construction and adaptive termination, the machine can jointly infer hidden physical constraints from images and text and complete the formalization, and its effectiveness is validated on a benchmark spanning classical mechanics, relativity, and other domains.
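To make "machine-reasonable formal statements" concrete, here is a hedged Lean 4 sketch (assuming Mathlib and a Lean-style target language, which the abstract does not actually name) of the kind of statement an autoformalizer might emit for a uniform-acceleration problem whose initial velocity and acceleration are read off a figure; the theorem name and hypotheses are invented for illustration.

```lean
import Mathlib.Data.Real.Basic

/-- Hypothetical formalized output: with initial velocity v₀ and acceleration a
    inferred from the diagram, the velocity v reached after time t
    satisfies v - v₀ = a * t. -/
theorem velocity_after_time (v₀ a t v : ℝ) (h : v = v₀ + a * t) :
    v - v₀ = a * t := by
  rw [h]
  ring
```

The "compile accuracy" reported in the abstract would then correspond to whether statements like this one are accepted by the proof assistant's checker.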