Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
1️⃣ One-sentence summary
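This paper proposes a new method called Render-of-Thought, which converts the lengthy textual chain of thought that a large language model produces while reasoning into images, substantially reducing computational cost and speeding up inference while also making the model's reasoning steps visible and traceable.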
Chain-of-Thought (CoT) prompting has achieved remarkable success in unlocking the reasoning capabilities of Large Language Models (LLMs). Although CoT prompting enhances reasoning, its verbosity imposes substantial computational overhead. Recent works often focus exclusively on outcome alignment and provide no supervision of the intermediate reasoning process. These shortcomings make the latent reasoning chain difficult to analyze. To address these challenges, we introduce Render-of-Thought (RoT), the first framework to reify the reasoning chain by rendering textual steps into images, making the latent rationale explicit and traceable. Specifically, we leverage the vision encoders of existing Vision Language Models (VLMs) as semantic anchors to align the vision embeddings with the textual space. This design enables a plug-and-play implementation without incurring additional pre-training overhead. Extensive experiments on mathematical and logical reasoning benchmarks demonstrate that our method achieves 3-4x token compression and substantial inference acceleration compared to explicit CoT. Furthermore, it maintains competitive performance against other methods, validating the feasibility of this paradigm. Our code is available at this https URL
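The abstract says only that the vision encoders of existing VLMs serve as semantic anchors for aligning vision embeddings with the textual space; it does not state the training objective. As a rough illustration, here is a minimal sketch that assumes (hypothetically) that a linear projection head maps the reasoning model's latent states into the vision encoder's embedding space, and that the frozen embeddings of the rendered reasoning image act as fixed targets under a cosine-distance loss. The names `project` and `alignment_loss`, the projection direction, and the choice of cosine distance are illustrative assumptions, not the authors' implementation. Plain lists stand in for tensors so the sketch runs with only the Python standard library.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def project(h: Vector, weight: Sequence[Vector]) -> list[float]:
    """Hypothetical linear projection head: returns W @ h.

    Each row of `weight` produces one output dimension, so a
    (d_vision x d_latent) matrix maps a latent state into the
    (assumed) vision-embedding space.
    """
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in weight]


def alignment_loss(
    latent_states: Sequence[Vector],
    vision_embeddings: Sequence[Vector],
    weight: Sequence[Vector],
) -> float:
    """Mean (1 - cosine) distance between projected latent states and
    frozen vision-encoder embeddings of the rendered reasoning image.

    The vision embeddings are treated as fixed "anchors" here; this
    pairing and loss are assumptions made for illustration only.
    """
    if len(latent_states) != len(vision_embeddings):
        raise ValueError("need one vision anchor per latent state")
    if not latent_states:
        raise ValueError("need at least one latent state")
    losses = [
        1.0 - cosine_similarity(project(h, weight), v)
        for h, v in zip(latent_states, vision_embeddings)
    ]
    return sum(losses) / len(losses)
```

Keeping the vision embeddings fixed is what lets them act as anchors in this sketch: only the projection (and, in a real system, the language model) would be trained, so the vision encoder needs no extra pre-training. For example, `alignment_loss([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` returns `1.0`, because the projected latent state is orthogonal to its anchor.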
Source: arXiv:2601.14750