arXiv submission date: 2026-07-01
📄 Abstract - Revisiting Chain-of-Thought Reasoning under Limited Supervision: Semi-supervised Chain-of-Thought Learning

Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent reasoning capabilities in large language models. However, most existing CoT methods use reasoning chains mainly as inference-time prompts, while the generated reasoning traces are rarely reused as semi-supervised learning signals. In this report, we define Semi-supervised Chain-of-Thought Learning and propose Semi-CoT, a simple framework that uses unlabeled questions to construct pseudo reasoning supervision. Semi-CoT samples multiple pseudo-CoTs for each unlabeled question, estimates answer-level semantic entropy, and selects low-entropy reasoning chains as reliable pseudo-CoT demonstrations. This extends the self-training view of CoT from inference-time refinement to semi-supervised pseudo-supervision. Pilot experiments on AQuA, SVAMP, GSM8K, and MultiArith show that the entropy gate selects high-precision pseudo-CoTs, with pseudo-answer precision ranging from 91.36% to 100%. Semi-CoT also gives small gains on SVAMP and GSM8K, while AQuA shows negative transfer and MultiArith reaches a ceiling. These results suggest that unlabeled questions can provide reliable pseudo reasoning signals, but their effective use still requires stronger demonstration selection or student training.
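
The entropy gate described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "answer-level semantic entropy" can be approximated by the Shannon entropy of the empirical distribution over final answers, that answers are already normalized strings so that exact match stands in for semantic equivalence, and that a fixed threshold decides which questions pass. The function names `answer_entropy` and `select_pseudo_cots` and the default threshold of 0.5 nats are hypothetical.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution over final answers.

    Assumption: answers are pre-normalized strings, so identical strings
    are treated as the same semantic answer.
    """
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_pseudo_cots(samples, threshold=0.5):
    """Gate one unlabeled question on answer-level entropy.

    samples: list of (chain_text, final_answer) pairs sampled for the question.
    Returns (pseudo_answer, chains) when the entropy is at or below the
    threshold, where chains are the sampled chains that reach the majority
    answer; returns None when the question is rejected.
    The threshold value is an illustrative assumption, not taken from the paper.
    """
    if not samples:
        return None
    answers = [answer for _, answer in samples]
    if answer_entropy(answers) > threshold:
        return None
    pseudo_answer, _ = Counter(answers).most_common(1)[0]
    chains = [chain for chain, answer in samples if answer == pseudo_answer]
    return pseudo_answer, chains
```

Under these assumptions, a question whose sampled chains mostly agree on one answer passes the gate and contributes its majority-answer chains as pseudo-CoT demonstrations, while a question whose samples scatter across many answers is discarded.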

Top-level tags: llm, natural language processing
Detailed tags: chain-of-thought reasoning, semi-supervised learning, pseudo labeling, entropy filtering, reasoning evaluation

Revisiting Chain-of-Thought Reasoning under Limited Supervision: Semi-supervised Chain-of-Thought Learning


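1️⃣ One-sentence summary

This paper proposes Semi-CoT, a semi-supervised chain-of-thought learning framework that uses unlabeled questions to automatically generate reliable reasoning chains as supervision signals, aiming to improve the reasoning ability of large language models while reducing manual annotation cost; pilot experiments on several math reasoning datasets show high-precision pseudo-CoTs and small gains on SVAMP and GSM8K, but also indicate that better strategies are needed to avoid negative transfer (AQuA) and performance ceilings (MultiArith).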

Source: arXiv: 2607.01511