Abstract - MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling
Parallel test-time scaling samples many reasoning traces and majority-votes their answers, improving LLM accuracy but requiring traces to run to completion, incurring substantial computational overhead. We observe that probing partial traces at intermediate checkpoints can extract current answers without disrupting generation, revealing an evolving aggregate vote. Based on this observation, we introduce MARS, a margin-adversarial stopping rule that estimates which active traces are likely to change their answers and stops once the leader remains safe under a conservative bound on future vote movement. The rule separates two sources of uncertainty. It learns the trace-level switch probabilities that determine how much of the current margin is likely to be retained, while handling the harder question of where switching traces land through an adversarial bound calibrated from warmup traces. With true switch probabilities, MARS guarantees with high probability that the early-stopped answer matches the full-budget vote. In practice, a five-feature logistic model closely matches oracle switching behavior. Across three reasoning models and three competition-math benchmarks, MARS saves 25-47% of self-consistency tokens and 14-29% on top of DeepConf Online, a strong confidence-weighted baseline that already filters and truncates weak traces, while matching the accuracy of the corresponding full-budget baselines.
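The stopping check described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact rule, so everything here is an assumption made for illustration: the function name `should_stop`, the worst case in which every switching trace lands on the rival answer, the independence of switches across traces, and the Hoeffding-style slack term. In particular, the paper calibrates its adversarial bound from warmup traces, which this sketch does not attempt.

```python
import math
from collections import Counter


def should_stop(answers, switch_probs, delta=0.05):
    """Hypothetical margin check at one checkpoint (not the paper's exact rule).

    answers[i]      -- current probed answer of trace i
    switch_probs[i] -- estimated probability that trace i changes its answer
    delta           -- tolerated probability that the leader is overturned
    Returns (stop, leader).
    """
    counts = Counter(answers)
    leader, leader_votes = counts.most_common(1)[0]
    # Check every current rival; None stands for an answer not yet seen.
    rivals = [a for a in counts if a != leader] or [None]
    for rival in rivals:
        margin = leader_votes - counts[rival]
        expected, sq_range = 0.0, 0.0
        for ans, p in zip(answers, switch_probs):
            if ans == rival:
                continue  # assumed worst case: the rival's traces stay put
            # A leader trace that defects to the rival moves the margin by 2,
            # any other trace that lands on the rival moves it by 1.
            swing = 2 if ans == leader else 1
            expected += swing * p
            if p > 0:
                sq_range += swing ** 2
        # Hoeffding-style slack, assuming switches are independent.
        slack = math.sqrt(0.5 * sq_range * math.log(1 / delta))
        if margin <= expected + slack:
            return False, leader
    return True, leader


# Usage: 30 traces agree on "42", 2 on "7", all unlikely to switch.
stop, leader = should_stop(["42"] * 30 + ["7"] * 2, [0.02] * 32)
```

Splitting the bound into an expected term driven by the switch probabilities and a worst-case landing assumption mirrors the two sources of uncertainty the abstract separates. With few traces the slack term dominates, so this simplified check only fires when the leader's margin is large.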
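1️⃣ One-sentence summary
LLM test-time scaling methods that generate many reasoning traces in parallel and vote on their answers are computationally expensive. To address this, the paper proposes MARS, which probes the intermediate answers of partial traces during generation, conservatively estimates how the votes of unfinished traces may still change, and uses an adversarial bound to quantify the remaining uncertainty. This lets it stop most redundant reasoning early while keeping the final vote almost always identical to that of full generation, saving 25% to 47% of compute.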