arXiv submission date: 2026-05-07
📄 Abstract - LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacing single-step retrieval with a multi-step process, in which the large language model (LLM) acts as a search agent that generates intermediate thoughts and subqueries to iteratively interact with the retrieval system. This iterative process incurs substantial latency due to the autoregressive generation of lengthy thoughts and subqueries. To address this limitation, we propose LatentRAG, a novel framework that shifts both reasoning and retrieval from discrete language space to continuous latent space. Unlike existing explicit methods that generate natural language thoughts or subqueries token-by-token, LatentRAG produces latent tokens for thoughts and subqueries directly from the hidden states in a single forward pass. We align LLMs with dense retrieval models in the latent space, enabling retrieval over latent subquery tokens and supporting end-to-end joint optimization. To improve transparency and encourage semantically meaningful latent representations, we incorporate a parallel latent decoding mechanism that translates latent tokens back into natural language. Extensive experiments on seven benchmark datasets show that LatentRAG achieves performance comparable to explicit agentic RAG methods while reducing inference latency by approximately 90%, substantially narrowing the latency gap with traditional single-step RAG.
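The retrieval step described in the abstract, where a latent subquery token is taken from the model's hidden states and matched against a dense index, can be illustrated with a small sketch. This is not the paper's implementation: the linear `projection` that maps a hidden state into the retriever's embedding space, the function name `retrieve_with_latent_query`, and the random arrays standing in for real model outputs are all assumptions made for illustration.

```python
import numpy as np


def retrieve_with_latent_query(hidden_state, projection, doc_embeddings, k=2):
    """Score documents against a latent query and return the top-k matches.

    hidden_state:   (d_llm,) vector standing in for an LLM hidden state.
    projection:     (d_llm, d_ret) hypothetical alignment matrix.
    doc_embeddings: (n_docs, d_ret) dense retriever document embeddings.
    """
    # Map the hidden state into the retriever space (assumed linear alignment).
    query = hidden_state @ projection
    query = query / np.linalg.norm(query)

    # Normalize documents so the dot product is cosine similarity.
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = docs @ query

    # Highest-scoring documents first.
    top_k = np.argsort(-scores)[:k]
    return top_k, scores[top_k]


# Random stand-ins for model outputs; no real LLM or retriever is involved.
rng = np.random.default_rng(0)
d_llm, d_ret, n_docs = 16, 8, 5
projection = rng.normal(size=(d_llm, d_ret))
doc_embeddings = rng.normal(size=(n_docs, d_ret))
hidden_state = rng.normal(size=d_llm)

indices, scores = retrieve_with_latent_query(hidden_state, projection, doc_embeddings)
print(indices, scores)
```

The point of the sketch is that no subquery text is generated: the query vector comes straight from a hidden state, so the only per-step cost on the retrieval side is one projection and one similarity search.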

Top-level tags: llm, retrieval, natural language processing
Detailed tags: retrieval augmented generation, latent reasoning, inference latency, dense retrieval, end-to-end optimization

LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG


1️⃣ One-sentence summary

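LatentRAG proposes a method in which the model carries out its reasoning and search inside an internal "latent space" instead of generating text token by token. This preserves answer quality while speeding up inference by nearly 10x (a latency reduction of approximately 90%), addressing the slow response times of conventional agentic RAG systems.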
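The speedup figure in this summary follows from the latency reduction reported in the abstract. As a quick check, treating "approximately 90%" as exact (an assumption for illustration; the symbols below are placeholders for the latencies of explicit agentic RAG and LatentRAG):

```latex
\[
T_{\text{latent}} = (1 - 0.9)\,T_{\text{agentic}}
\quad\Longrightarrow\quad
\frac{T_{\text{agentic}}}{T_{\text{latent}}} = \frac{1}{1 - 0.9} = 10
\]
```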

Source: arXiv: 2605.06285