arXiv submission date: 2026-05-07
📄 Abstract - PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive, and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts its optimization strategy to different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-$k$ frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms the state-of-the-art evolutionary search framework with frontier models, achieving faster convergence and stabilizing test-time training during evolutionary search.
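The phase-adaptive idea in the abstract can be illustrated with a small sketch. The abstract does not give the actual objective, so everything below is an assumption made for illustration: the z-score form of the group-relative signal, the "credit only the best of k by its gain over the frontier" rule, the spread-based phase switch, and all names (`group_relative_advantages`, `best_of_k_frontier_advantages`, `phase_adaptive_advantages`, `gap_threshold`) are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each candidate relative to its own sampling group (z-score).

    Assumed form of a group-relative signal; not the paper's formula.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def best_of_k_frontier_advantages(rewards, frontier):
    """Credit only the best of the k candidates, by its gain over the frontier.

    `frontier` is the best reward found so far (hypothetical bookkeeping).
    """
    best_idx = max(range(len(rewards)), key=lambda i: rewards[i])
    gain = max(0.0, rewards[best_idx] - frontier)
    return [gain if i == best_idx else 0.0 for i in range(len(rewards))]


def phase_adaptive_advantages(rewards, frontier, gap_threshold=0.05):
    """Choose the training signal from the reward spread within the group.

    Wide spread (early phase, assumed): group-relative feedback.
    Compressed spread (late phase, assumed): best-of-k frontier contribution.
    """
    spread = max(rewards) - min(rewards)
    if spread >= gap_threshold:
        return "group_relative", group_relative_advantages(rewards)
    return "frontier", best_of_k_frontier_advantages(rewards, frontier)


if __name__ == "__main__":
    # Early phase: rewards differ widely, so every candidate gets a relative score.
    print(phase_adaptive_advantages([0.2, 0.5, 0.8], frontier=0.6))
    # Late phase: rewards are compressed, so only a frontier improvement is credited.
    print(phase_adaptive_advantages([0.90, 0.91, 0.92], frontier=0.905))
```

The switch on reward spread is only one plausible way to detect that "reward gaps compress"; a schedule based on iteration count or on frontier stagnation would fit the abstract's description equally well.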

Top-level tags: llm, reinforcement learning, agents
Detailed tags: evolutionary search, test-time adaptation, adversarial training, policy learning, optimization

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents


1️⃣ One-sentence summary

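This paper proposes PACEvolve++, a reinforcement learning framework that adapts the search policy at test time so that evolutionary search agents converge on strong solutions faster and more stably, improving performance on tasks with expensive evaluations such as engineering design and biological computation.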

Source: arXiv: 2605.07039