Self-Consistency via Marginal Sharpening
1️⃣ One-Sentence Summary
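This paper proposes a new inference method that directly optimizes the marginal probability of the answer (that is, it accounts for multiple reasoning paths that all support the same answer) rather than only the probability of the full output sequence. This improves the reasoning accuracy of large language models on mathematics and coding tasks more efficiently, while running orders of magnitude faster than existing methods.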
Inference-time sampling can elicit strong reasoning abilities from language models without additional training. Existing power-sampling methods do so by sharpening the distribution over full generated outputs, favoring completions that are individually likely under the model. We argue that this is the wrong object to target for reasoning: a completion entangles a reasoning trace with a final answer, whereas what matters is whether an answer is supported by many plausible reasoning paths. We therefore shift the target from the full-output distribution to the sharpened answer marginal, making self-consistency an inference-time objective rather than a post-hoc voting criterion. Surprisingly, this marginal target admits an efficient approximation: we propose a simple, purely autoregressive parallel sampling algorithm that approximately samples from the sharpened answer marginal, eliciting stronger performance than standard power sampling on mathematics and coding benchmarks while being orders of magnitude faster.
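To make the distinction between the two targets concrete, here is a minimal toy sketch. It is not the paper's sampling algorithm: it enumerates a tiny, made-up joint distribution over (reasoning trace, answer) pairs and compares the answer distribution obtained by sharpening full outputs against the one obtained by sharpening the answer marginal. The names `JOINT`, `sharpen`, `answer_marginal`, and the exponent `alpha` are assumptions introduced for illustration, and the probabilities are invented.

```python
from collections import defaultdict

# Hypothetical toy joint distribution p(trace, answer); the numbers are made up.
# Answer "A" has one individually likely trace; answer "B" is supported by
# five less likely traces whose total mass is larger.
JOINT = {
    ("t1", "A"): 0.30,
    ("t2", "B"): 0.14,
    ("t3", "B"): 0.14,
    ("t4", "B"): 0.14,
    ("t5", "B"): 0.14,
    ("t6", "B"): 0.14,
}


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def sharpen(dist, alpha):
    """Power-sharpen a distribution: q(x) is proportional to p(x) ** alpha."""
    return normalize({key: p ** alpha for key, p in dist.items()})


def answer_marginal(joint):
    """Sum out the reasoning trace to obtain p(answer)."""
    marginal = defaultdict(float)
    for (_trace, answer), p in joint.items():
        marginal[answer] += p
    return dict(marginal)


def full_output_target(joint, alpha):
    """Sharpen full outputs first, then read off the implied answer distribution."""
    return answer_marginal(sharpen(joint, alpha))


def marginal_target(joint, alpha):
    """Marginalize over traces first, then sharpen the answer distribution."""
    return sharpen(answer_marginal(joint), alpha)


if __name__ == "__main__":
    alpha = 4.0
    print("full-output sharpening:", full_output_target(JOINT, alpha))
    print("marginal sharpening:   ", marginal_target(JOINT, alpha))
```

In this toy setup, sharpening full outputs concentrates mass on "A" because its single trace is the most likely individual completion, while sharpening the marginal concentrates mass on "B" because more total probability supports it. With a real language model the marginal cannot be enumerated this way, which is why the abstract describes an approximate sampling procedure instead.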
Source: arXiv: 2605.28142