Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents
1️⃣ One-sentence summary
This paper finds that different reasoning paradigms (e.g., direct answering, chain-of-thought, reflection) vary dramatically in performance across tasks, and no single paradigm wins everywhere. It therefore proposes a lightweight learned router that automatically selects the most suitable reasoning paradigm before tackling each task, significantly improving the overall performance of LLM agents.
When an LLM-based agent improves on a task, is the gain from the model itself or from the reasoning paradigm wrapped around it? We study this question by comparing six inference-time paradigms, namely Direct, CoT, ReAct, Plan-Execute, Reflection, and ReCode, across four frontier LLMs and ten benchmarks, yielding roughly 18,000 runs. We find that reasoning structure helps dramatically on some tasks but hurts on others: ReAct improves over Direct by 44pp on GAIA, while CoT degrades performance by 15pp on HumanEval. No single paradigm dominates, and oracle per-task selection beats the best fixed paradigm by 17.1pp on average. Motivated by this complementarity, we propose a select-then-solve approach: before answering each task, a lightweight embedding-based router selects the most suitable paradigm. Across four models, the router improves average accuracy from 47.6% to 53.1%, outperforming the best fixed paradigm at 50.3% by 2.8pp and recovering up to 37% of the oracle gap. In contrast, zero-shot self-routing only works for GPT-5 at 67.1% and fails for weaker models, all trailing the learned router. Our results argue that reasoning paradigm selection should be a per-task decision made by a learned router, not a fixed architectural choice.
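The "lightweight embedding-based router" in the abstract can be sketched as a nearest-centroid classifier over task embeddings: for each paradigm, average the embeddings of training tasks where that paradigm scored best, then route a new task to the paradigm whose centroid is most similar. The sketch below is illustrative only, assuming a stand-in `embed()` function (the paper's actual embedding model and training procedure are not specified here); `ParadigmRouter` and its methods are hypothetical names.

```python
# Minimal sketch of a select-then-solve paradigm router.
# `embed()` is a hypothetical stand-in for a real sentence-embedding model;
# here it is a toy bag-of-characters embedding so the sketch is self-contained.
import math
from collections import defaultdict

PARADIGMS = ["Direct", "CoT", "ReAct", "Plan-Execute", "Reflection", "ReCode"]

def embed(text):
    # Toy embedding: normalized letter-frequency vector (placeholder only).
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class ParadigmRouter:
    """Nearest-centroid router: each paradigm's centroid is the mean
    embedding of training tasks on which that paradigm performed best."""

    def fit(self, tasks, best_paradigms):
        buckets = defaultdict(list)
        for task, paradigm in zip(tasks, best_paradigms):
            buckets[paradigm].append(embed(task))
        self.centroids = {
            p: [sum(col) / len(vecs) for col in zip(*vecs)]
            for p, vecs in buckets.items()
        }
        return self

    def route(self, task):
        # Select the paradigm whose centroid is closest to the task embedding.
        e = embed(task)
        return max(self.centroids, key=lambda p: cosine(e, self.centroids[p]))
```

In use, the router is fit on tasks labeled with their best-performing paradigm, then called once per incoming task before the agent runs; the selected paradigm then wraps the actual model call.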
Source: arXiv: 2604.06753