arXiv submission date: 2026-01-11
📄 Abstract - EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs

Improving the reasoning abilities of large language models (LLMs) has largely relied on iterative self-training with model-generated data. While effective at boosting accuracy, existing approaches primarily reinforce successful reasoning paths, incurring a substantial calibration cost: models become overconfident and lose the ability to represent uncertainty. This failure has been characterized as a form of model collapse in alignment, where predictive distributions degenerate toward low-variance point estimates. We address this issue by reframing reasoning training as an epistemic learning problem, in which models must learn not only how to reason, but also when their reasoning should be trusted. We propose epistemically-calibrated reasoning (EpiCaR) as a training objective that jointly optimizes reasoning performance and calibration, and instantiate it within an iterative supervised fine-tuning framework using explicit self-evaluation signals. Experiments on Llama-3 and Qwen-3 families demonstrate that our approach achieves Pareto-superiority over standard baselines in both accuracy and calibration, particularly in models with sufficient reasoning capacity (e.g., 3B+). This framework generalizes effectively to OOD mathematical reasoning (GSM8K) and code generation (MBPP). Ultimately, our approach enables a 3X reduction in inference compute, matching the K=30 performance of STaR with only K=10 samples in capable models.
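The abstract describes EpiCaR as a training objective that jointly optimizes reasoning performance and calibration within iterative SFT, using explicit self-evaluation signals. The page gives no implementation details, so the following is only a minimal sketch of what such a joint objective could look like: the function name `epicar_loss`, the weight `lambda_cal`, and the choice of a Brier-score calibration term are all illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a jointly-optimized reasoning + calibration objective.
# Names (epicar_loss, lambda_cal) and the Brier-score calibration term are
# assumptions for illustration; the paper's exact loss is not specified on this page.
import torch
import torch.nn.functional as F


def epicar_loss(answer_logits, answer_targets, confidence_logit, is_correct,
                lambda_cal=0.5):
    """Combine a reasoning (cross-entropy) term with a calibration term.

    answer_logits:    (batch, seq, vocab) logits over the self-generated solution tokens
    answer_targets:   (batch, seq) token ids of the solution kept for iterative SFT
    confidence_logit: (batch,) logit of the model's explicit self-evaluation signal
    is_correct:       (batch,) 1.0 if the solution's final answer was verified correct
    """
    # Reasoning term: ordinary token-level cross-entropy, as in STaR-style iterative SFT.
    ce = F.cross_entropy(answer_logits.flatten(0, 1), answer_targets.flatten())

    # Calibration term: push the self-reported confidence toward empirical correctness.
    # A Brier score is one reasonable proper scoring rule; the paper may use another.
    confidence = torch.sigmoid(confidence_logit)
    brier = ((confidence - is_correct.float()) ** 2).mean()

    return ce + lambda_cal * brier
```

Under this reading, each self-training round would score every sampled trace with both terms, so the model is rewarded not only for reproducing successful reasoning paths but also for reporting a confidence that matches how often those paths are actually correct.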

Top-level tags: llm, model training, model evaluation
Detailed tags: reasoning, calibration, uncertainty quantification, self-training, epistemic learning

EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs


1️⃣ One-Sentence Summary

This paper proposes a new training method, EpiCaR, that jointly optimizes a large language model's reasoning ability and its ability to evaluate its own answers. It addresses the overconfidence induced by existing self-training methods, so the model keeps its high accuracy while becoming better at judging when it is likely to be wrong, and ultimately cuts the compute needed at inference time substantially.

Source: arXiv:2601.06786