Self-Improving Language Models with Bidirectional Evolutionary Search
1️⃣ One-Sentence Summary
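This paper proposes Bidirectional Evolutionary Search (BES), a method that combines forward candidate evolution (for example, recombining partial solutions) with backward goal decomposition (breaking a complex task into verifiable subgoals) to overcome the limited exploration and sparse feedback of conventional search methods such as best-of-N sampling and tree search in language-model self-improvement, significantly improving model performance in both training and inference.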
Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, which restricts exploration to regions with substantial model probability mass. To address these limitations, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides the forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell while evolutionary operators can escape it, and that backward search can exponentially reduce the number of samples required to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to improve, BES enables consistent gains, and that on three open problem solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at this https URL.
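The interplay between sparse verification, subgoal checks, and recombination can be illustrated with a toy sketch. This is not the paper's implementation: the task, `TARGET`, `rollout`, `check_subgoals`, `recombine`, and both search functions are hypothetical names made up for illustration. The "model" is replaced by a uniform random guesser, and the backward decomposition is assumed to be given as one checkable subgoal per step rather than produced recursively. With 6 steps and 4 choices per step, a single rollout passes the final verifier with probability 4^-6 = 1/4096, whereas recombination only needs each step to have been solved once by some rollout.

```python
import random

# Hypothetical toy task: a solution has 6 steps, each with one right answer.
TARGET = [2, 0, 3, 1, 1, 2]
VOCAB_SIZE = 4


def rollout(rng):
    """Stand-in for one autoregressive model sample (uniform guess per step)."""
    return [rng.randrange(VOCAB_SIZE) for _ in TARGET]


def sparse_verify(candidate):
    """Final-answer verifier: a single pass/fail bit for the whole candidate."""
    return candidate == TARGET


def check_subgoals(candidate):
    """Assumed backward decomposition: one checkable subgoal per step."""
    return [c == t for c, t in zip(candidate, TARGET)]


def recombine(parent_a, parent_b):
    """Evolution operator: keep parent_a's step where it passes, else parent_b's."""
    passed = check_subgoals(parent_a)
    return [a if ok else b for a, b, ok in zip(parent_a, parent_b, passed)]


def best_of_n(rng, n):
    """Baseline: draw up to n independent rollouts, guided only by the sparse verifier."""
    for used in range(1, n + 1):
        candidate = rollout(rng)
        if sparse_verify(candidate):
            return candidate, used
    return None, n


def bidirectional_search(rng, budget):
    """Fold each new rollout into the incumbent, guided by subgoal checks."""
    best = rollout(rng)
    used = 1
    while not sparse_verify(best) and used < budget:
        best = recombine(best, rollout(rng))
        used += 1
    return best, used


if __name__ == "__main__":
    rng = random.Random(0)
    print("best-of-N:", best_of_n(rng, 200))
    print("recombination + subgoals:", bidirectional_search(rng, 200))
```

In this sketch the recombined candidate is typically one that no single rollout produced, which mirrors the abstract's point about candidates that are hard to obtain from expansion alone. The dense subgoal checks are what tell the operator which parts of each parent to keep.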
Source: arXiv: 2605.28814