
arXiv submission date: 2026-02-12
📄 Abstract - Thinking with Drafting: Optical Decompression via Logical Reconstruction

Existing multimodal large language models have achieved high-fidelity visual perception and exploratory visual generation. However, a precision paradox persists in complex reasoning tasks: optical perception systems transcribe symbols without capturing logical topology, while pixel-based generative models produce visual artifacts that lack mathematical exactness. To bridge this gap, we propose reconceptualizing reasoning over visual inputs as optical decompression, the process of reconstructing latent logical structures from compressed visual tokens. Guided by the axiom that Parsing is Reasoning, we introduce Thinking with Drafting (TwD), which uses a minimalist Domain-Specific Language (DSL) as a grounding intermediate representation. Unlike standard approaches that hallucinate answers directly, TwD forces the model to draft its mental model as executable code, which renders deterministic visual proofs for self-verification. To validate this approach, we present VisAlg, a visual algebra benchmark. Experiments demonstrate that TwD serves as a superior cognitive scaffold. Our work establishes a closed-loop system in which visual generation acts not as a creative output but as a logical verifier, offering a generalizable path for visual reasoning.
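The draft, render, and verify loop described in the abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's DSL or the VisAlg problem format. It assumes a hypothetical bar-model language in which each bar is a linear expression `a*u + b` in one shared unit `u`, plus a `total` constraint. The names `parse`, `solve`, `render`, and `verify` are all hypothetical. The sketch parses a draft into constraints and solves for the unit. It then renders a deterministic ASCII diagram and re-reads the diagram to check that it satisfies the drafted constraints.

```python
import re
from fractions import Fraction

# Hypothetical mini-DSL (NOT the TwD DSL from the paper):
#   bar <name> = <a>u [+ <b>]    a bar of length a*u + b
#   total = <n>                  all bars sum to n
BAR_RE = re.compile(r"^bar\s+(\w+)\s*=\s*(-?\d+)u(?:\s*\+\s*(-?\d+))?$")
TOTAL_RE = re.compile(r"^total\s*=\s*(-?\d+)$")


def parse(draft):
    """Turn a drafted program into {name: (a, b)} and the total."""
    bars, total = {}, None
    for raw in draft.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := BAR_RE.match(line):
            bars[m.group(1)] = (int(m.group(2)), int(m.group(3) or 0))
        elif m := TOTAL_RE.match(line):
            total = int(m.group(1))
        else:
            raise ValueError(f"unparseable line: {line!r}")
    if total is None:
        raise ValueError("draft has no total")
    return bars, total


def solve(bars, total):
    """sum(a_i*u + b_i) = total  =>  u = (total - sum b_i) / sum a_i."""
    coef = sum(a for a, _ in bars.values())
    const = sum(b for _, b in bars.values())
    if coef == 0:
        raise ValueError("unit coefficient is zero")
    u = Fraction(total - const, coef)
    return {name: a * u + b for name, (a, b) in bars.items()}


def render(values):
    """Deterministic ASCII 'diagram': one '#' per unit of length."""
    rows = []
    for name, v in values.items():
        if v.denominator != 1 or v < 0:
            raise ValueError(f"bar {name} has non-drawable length {v}")
        rows.append(f"{name:>4} |" + "#" * int(v) + f" {v}")
    return "\n".join(rows)


def verify(diagram, bars, total):
    """Re-read bar lengths from the diagram and check every constraint."""
    lengths = {}
    for row in diagram.splitlines():
        name, rest = row.split("|", 1)
        lengths[name.strip()] = rest.count("#")
    if set(lengths) != set(bars) or sum(lengths.values()) != total:
        return False
    # Recover the shared unit from the first bar that depends on u.
    unit = next((Fraction(lengths[n] - b, a)
                 for n, (a, b) in bars.items() if a != 0), None)
    return all(lengths[n] == (a * unit + b if a != 0 else b)
               for n, (a, b) in bars.items())


if __name__ == "__main__":
    # "B has 3 more than twice A; together they have 27."
    draft = """
    bar A = 1u
    bar B = 2u + 3
    total = 27
    """
    bars, total = parse(draft)
    diagram = render(solve(bars, total))
    print(diagram)
    print(verify(diagram, bars, total))  # True
```

The verifier reads only the rendered picture, not the solver's numbers. A diagram that drifts from the draft, for example one with a bar drawn a unit short, therefore fails the check. This is the closed-loop role that the abstract assigns to visual generation.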

Top-level tags: multi-modal llm model evaluation
Detailed tags: visual reasoning, domain-specific language, optical decompression, benchmark, self-verification

Thinking with Drafting: Optical Decompression via Logical Reconstruction


1️⃣ One-sentence summary

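This paper proposes Thinking with Drafting, a new method for visual reasoning problems. The AI first writes out its reasoning as executable code, much like drafting on scratch paper. It then renders an image from that code to check whether its logic is correct, which significantly improves precision on complex mathematical and logical problems.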

Source: arXiv: 2602.11731