arXiv submission date: 2026-03-15
📄 Abstract - Compute Allocation for Reasoning-Intensive Retrieval Agents

As agents operate over long horizons, their memory stores grow continuously, making retrieval critical to accessing relevant information. Many agent queries require reasoning-intensive retrieval, where the connection between query and relevant documents is implicit and requires inference to bridge. LLM-augmented pipelines address this through query expansion and candidate re-ranking, but introduce significant inference costs. We study computation allocation in reasoning-intensive retrieval pipelines using the BRIGHT benchmark and Gemini 2.5 model family. We vary model capacity, inference-time thinking, and re-ranking depth across query expansion and re-ranking stages. We find that re-ranking benefits substantially from stronger models (+7.5 NDCG@10) and deeper candidate pools (+21% from $k$=10 to 100), while query expansion shows diminishing returns beyond lightweight models (+1.1 NDCG@10 from weak to strong). Inference-time thinking provides minimal improvement at either stage. These results suggest that compute should be concentrated on re-ranking rather than distributed uniformly across pipeline stages.
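The two-stage pipeline the abstract studies can be sketched in a few lines: a lightweight model expands the query, a cheap first-stage retriever builds a deep candidate pool, and the expensive model is spent only on re-ranking that pool. This is a minimal illustration, not the paper's implementation; all model calls are stubbed with term-overlap heuristics, and names like `cheap_expand` and `strong_rerank_score` are hypothetical.

```python
def cheap_expand(query: str) -> str:
    # Stand-in for a lightweight LLM doing query expansion:
    # in practice this would append inferred related concepts.
    return query + " related terms"

def first_stage_retrieve(query: str, corpus: list[str], k: int = 100) -> list[str]:
    # Stand-in for a cheap lexical retriever (e.g. BM25),
    # approximated here by raw term overlap.
    q_terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: -len(q_terms & set(d.lower().split())))
    return ranked[:k]

def strong_rerank_score(query: str, doc: str) -> float:
    # Stand-in for the expensive re-ranker (strong LLM / cross-encoder),
    # approximated here by Jaccard similarity.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms | d_terms), 1)

def retrieve(query: str, corpus: list[str],
             pool_size: int = 100, top_n: int = 10) -> list[str]:
    expanded = cheap_expand(query)                      # cheap stage
    pool = first_stage_retrieve(expanded, corpus, k=pool_size)
    # Compute is concentrated here: the strong model scores
    # the whole deep candidate pool before truncation.
    pool.sort(key=lambda d: strong_rerank_score(query, d), reverse=True)
    return pool[:top_n]
```

The key allocation choice mirrors the paper's finding: `pool_size` (re-ranking depth) is kept large, while the expansion stage stays deliberately cheap.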

Top-level tags: llm agents, model evaluation
Detailed tags: retrieval agents, compute allocation, reasoning-intensive retrieval, query expansion, re-ranking

Compute Allocation for Reasoning-Intensive Retrieval Agents


1️⃣ One-sentence summary

This study finds that in agent retrieval systems where complex reasoning is needed to locate answers, concentrating compute on fine-grained re-ranking of candidate results improves retrieval quality more effectively than distributing it evenly across stages such as query expansion.

Source: arXiv: 2603.14635