Selective Latent Thinking: Adaptive Compression of LLM Reasoning Chains
1️⃣ One-sentence summary
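This paper proposes a method called Selective Latent Thinking, which lets a large language model decide during reasoning which steps can be compressed into more efficient latent representations and which must remain as fully written-out reasoning, substantially shortening the reasoning chain and lowering compute cost with almost no loss of accuracy.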
Explicit chain-of-thought (CoT) reasoning substantially improves the reasoning ability of large language models (LLMs), but incurs high inference cost due to lengthy autoregressive traces. Existing latent reasoning methods offer a promising alternative, yet they often treat reasoning as uniformly compressible, so precision-critical intermediate steps are over-compressed and reasoning accuracy degrades. In this work, we propose Selective Latent Thinking (SLT), a framework that selectively compresses redundant reasoning spans into latent representations while preserving precision-critical spans as explicit CoT within the same reasoning trajectory. Specifically, SLT first uses a lightweight decoder to anticipate a short upcoming reasoning span, and then applies confidence-based gating to determine the longest span that can be reliably compressed. The accepted span is encoded into a compact latent representation to improve reasoning efficiency, while uncertain or precision-critical reasoning remains in explicit CoT form to preserve accuracy. To learn this selective compression policy, SLT adopts a three-stage training strategy that combines span-level latent compression, reliability-aware future reasoning prediction, and trajectory-level reinforcement learning to optimize the trade-off between answer correctness and reasoning cost. Extensive experiments across four mathematical reasoning benchmarks demonstrate that SLT achieves 22.7% higher accuracy than latent reasoning baselines at comparable compression ratios, while reducing reasoning chain length by 58.4% with only 2.8% accuracy degradation compared to explicit CoT. Our code can be found in this https URL.
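The confidence-based gating step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how confidence is measured, what threshold is used, or how the accepted span is chosen, so everything below is an assumption made for illustration: the function names `longest_reliable_prefix` and `gate_span`, the per-token confidence scores, the threshold of 0.9, and the rule that the accepted span is the longest prefix whose every token clears the threshold. It is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def longest_reliable_prefix(confidences: Sequence[float], threshold: float = 0.9) -> int:
    """Length of the longest prefix in which every confidence is >= threshold.

    Assumption: one confidence score per drafted token, and a fixed threshold.
    """
    accepted = 0
    for score in confidences:
        if score < threshold:
            break  # the first unreliable token ends the compressible span
        accepted += 1
    return accepted


def gate_span(
    draft_tokens: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.9,
) -> Tuple[List[str], List[str]]:
    """Split a drafted span into (to_compress, keep_explicit).

    to_compress   : tokens that would be encoded into a latent representation.
    keep_explicit : tokens left as ordinary explicit chain-of-thought.
    """
    if len(draft_tokens) != len(confidences):
        raise ValueError("draft_tokens and confidences must have the same length")
    k = longest_reliable_prefix(confidences, threshold)
    return list(draft_tokens[:k]), list(draft_tokens[k:])


# Hypothetical drafted span with made-up confidence scores.
draft = ["so", "we", "get", "x", "=", "17"]
scores = [0.99, 0.97, 0.95, 0.62, 0.98, 0.40]
to_compress, keep_explicit = gate_span(draft, scores)
```

In this toy input the first three tokens clear the threshold and would be compressed, while the span starting at the low-confidence token `x` stays explicit. A stricter threshold keeps more of the reasoning as text, which is one way to picture the accuracy versus cost trade-off the abstract refers to.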
Source: arXiv: 2605.25745