
arXiv submission date: 2025-12-30
📄 Abstract - Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking

Complex reasoning problems often involve implicit spatial, geometric, and structural relationships that are not explicitly encoded in text. While recent reasoning models have achieved strong performance across many domains, purely text-based reasoning struggles to represent global structural constraints in complex settings. In this paper, we introduce FIGR, which integrates active visual thinking into multi-turn reasoning via end-to-end reinforcement learning. FIGR externalizes intermediate structural hypotheses by constructing visual representations during problem solving. By adaptively regulating when and how visual reasoning should be invoked, FIGR enables more stable and coherent reasoning over global structural properties that are difficult to capture from text alone. Experiments on challenging mathematical reasoning benchmarks demonstrate that FIGR outperforms strong text-only chain-of-thought baselines. In particular, FIGR improves the base model by 13.12% on AIME 2025 and 11.00% on BeyondAIME, highlighting the effectiveness of figure-guided multimodal reasoning in enhancing the stability and reliability of complex reasoning.

Top-level tags: llm, agents, multi-modal
Detailed tags: visual reasoning, multimodal reasoning, reinforcement learning, mathematical reasoning, structural reasoning

Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking


1️⃣ One-sentence summary

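This paper proposes FIGR, a method in which the model actively constructs figures to support its thinking while solving complex problems, substantially improving performance on tasks such as mathematical reasoning that require understanding spatial and structural relationships.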

Source: arXiv 2512.24297