📄
Abstract - One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement
Large Language Models (LLMs) often fail to utilize their latent reasoning capabilities due to a distributional mismatch between ambiguous human inquiries and the structured logic required for machine activation. Existing alignment methods either incur prohibitive $O(N)$ costs by fine-tuning each model individually or rely on static prompts that fail to resolve query-level structural complexity. In this paper, we propose ReQueR (Reinforcement Query Refinement), a modular framework that treats reasoning elicitation as an inference-time alignment task. We train a specialized Refiner policy via Reinforcement Learning to rewrite raw queries into explicit logical decompositions, treating frozen LLMs as the environment. Rooted in the classical Zone of Proximal Development from educational psychology, we introduce the Adaptive Solver Hierarchy, a curriculum mechanism that stabilizes training by dynamically aligning environmental difficulty with the Refiner's evolving competence. ReQueR yields consistent absolute gains of 1.7%–7.2% across diverse architectures and benchmarks, outperforming strong baselines by 2.1% on average. Crucially, it provides a promising paradigm for one-to-many inference-time reasoning elicitation, enabling a single Refiner trained on a small set of models to effectively unlock reasoning in diverse unseen models. Code is available at this https URL.
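The Adaptive Solver Hierarchy idea can be sketched in miniature: during RL training, pick the solver tier whose difficulty sits in the Refiner's "zone of proximal development". The selection rule and thresholds below are illustrative assumptions, not the paper's exact mechanism.

```python
# Hedged sketch of the Adaptive Solver Hierarchy: choose the hardest
# solver tier on which the Refiner's recent success rate is neither
# trivially high nor hopelessly low, so environmental difficulty
# tracks the Refiner's evolving competence.
def pick_solver_tier(success_rates, low=0.3, high=0.7):
    """success_rates: recent Refiner success per solver tier,
    ordered from easiest (index 0) to hardest. Returns the index of
    the hardest tier still inside the [low, high] competence band,
    falling back to the easiest tier if none qualifies."""
    for tier in reversed(range(len(success_rates))):
        if low <= success_rates[tier] <= high:
            return tier
    return 0

# Tier 0 is mastered (0.95), tier 1 is challenging (0.55),
# tier 2 is out of reach for now (0.10) → train against tier 1.
print(pick_solver_tier([0.95, 0.55, 0.10]))  # → 1
```

As the Refiner improves and harder tiers enter the band, the curriculum advances automatically, which is the stabilizing effect the abstract attributes to this mechanism.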
One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement
1️⃣ One-Sentence Summary
This paper proposes ReQueR, a modular framework that uses reinforcement learning to train a standalone Refiner which, during inference, automatically rewrites ambiguous human queries into clear logical steps, significantly boosting the reasoning ability of many large language models without any fine-tuning, with an average performance gain of 2.1%.
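The inference-time pipeline summarized above can be sketched as follows. Everything here is a stand-in: the real Refiner is an RL-trained policy model rather than a fixed template, and the solver functions merely represent arbitrary frozen LLM backends behind some text-in/text-out interface.

```python
from typing import Callable

REFINE_TEMPLATE = (
    "Restate the problem, list the known quantities, "
    "then solve step by step:\n{query}"
)

def refine(query: str) -> str:
    """Stand-in for the trained Refiner policy: rewrites a raw,
    possibly ambiguous query into an explicit logical decomposition.
    In ReQueR this rewrite is produced by a learned model."""
    return REFINE_TEMPLATE.format(query=query)

def elicit(query: str, solver: Callable[[str], str]) -> str:
    """One-to-many elicitation: the same refined query is sent to
    any frozen solver LLM, which is treated purely as a black box."""
    return solver(refine(query))

# Toy 'frozen solvers' representing two different LLM backends.
def solver_a(prompt: str) -> str:
    return f"[model-A answer to]\n{prompt}"

def solver_b(prompt: str) -> str:
    return f"[model-B answer to]\n{prompt}"

# A single Refiner serves every backend, including unseen ones.
for solver in (solver_a, solver_b):
    print(elicit("If 3 pens cost $6, what do 7 pens cost?", solver))
```

The modularity is the point: because the solvers stay frozen, swapping in a new model changes only the `solver` callable, which is what makes the one-to-many, no-fine-tuning claim possible.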