arXiv submission date: 2026-02-03
📄 Abstract - Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG

Retrieval-augmented generation (RAG) enables large language models (LLMs) to produce evidence-based responses, and its performance hinges on the match between the retriever and the LLM. Retriever optimization has emerged as an efficient alternative to fine-tuning LLMs. However, existing solutions suffer from an objective mismatch between retriever optimization and the goal of the RAG pipeline. Reinforcement learning (RL) provides a promising solution to address this limitation, yet applying RL to retriever optimization introduces two fundamental challenges: 1) deterministic retrieval is incompatible with RL formulations, and 2) state aliasing arises from query-only retrieval in multi-hop reasoning. To address these challenges, we replace deterministic retrieval with stochastic sampling and formulate RAG as a Markov decision process, making the retriever optimizable by RL. Further, we incorporate retrieval history into the state at each retrieval step to mitigate state aliasing. Extensive experiments across diverse RAG pipelines, datasets, and retriever scales demonstrate that our approach consistently improves RAG performance.
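To make the two ideas in the abstract concrete, here is a minimal sketch (not the authors' code) of how deterministic top-1 dense retrieval can be replaced by sampling from a softmax over similarity scores, so that a REINFORCE-style policy-gradient update applies, and how previously retrieved passages can be folded into the state at each hop. The encoder, the additive history fusion, the terminal reward, and all variable names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: stochastic retrieval as an RL policy with a history-aware state.
# All components below (encoder, reward, fusion rule) are hypothetical placeholders.
import torch

torch.manual_seed(0)

EMB_DIM, NUM_DOCS, NUM_HOPS = 64, 100, 2

# Stand-ins for a trainable query encoder and a fixed, pre-computed document index.
query_encoder = torch.nn.Linear(EMB_DIM, EMB_DIM)
doc_embeddings = torch.randn(NUM_DOCS, EMB_DIM)

def encode_state(question_vec, history_vecs):
    """History-aware state: the question plus an aggregate of previously retrieved docs."""
    if history_vecs:
        history = torch.stack(history_vecs).mean(dim=0)
        fused = question_vec + history          # simple additive fusion (assumption)
    else:
        fused = question_vec
    return query_encoder(fused)

question = torch.randn(EMB_DIM)
history, log_probs = [], []

for hop in range(NUM_HOPS):
    state = encode_state(question, history)
    scores = doc_embeddings @ state                         # dense similarity scores
    dist = torch.distributions.Categorical(logits=scores)   # stochastic retrieval policy
    doc_id = dist.sample()                                   # sample instead of argmax
    log_probs.append(dist.log_prob(doc_id))
    history.append(doc_embeddings[doc_id].detach())          # fold retrieval into next state

# Terminal reward from the downstream LLM (e.g., answer correctness) - faked here.
reward = torch.tensor(1.0)

# REINFORCE-style objective: raise the log-probability of retrievals on rewarded trajectories.
loss = -(reward * torch.stack(log_probs).sum())
loss.backward()
print("grad norm on query encoder:", query_encoder.weight.grad.norm().item())
```

The sampling step is what makes the retriever trainable by RL: each retrieval decision carries a log-probability to differentiate through, whereas a deterministic argmax would leave no policy gradient, and appending retrieved documents to the state is one way to distinguish otherwise identical queries across hops (the state-aliasing issue the abstract mentions).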

Top-level tags: llm agents model training
Detailed tags: retrieval-augmented generation reinforcement learning dense retriever fine-tuning multi-hop reasoning

Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG


1️⃣ One-sentence summary

This paper proposes a new method that uses reinforcement learning to optimize the retriever in retrieval-augmented generation systems. By introducing stochastic sampling and retrieval-history information, it resolves the objective mismatch and state aliasing problems of conventional approaches, thereby significantly improving the answer quality of the overall system.

Source: arXiv 2602.03645