Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion
1️⃣ One-Sentence Summary
This paper proposes a new method called R4T. It first trains a large language model with reinforcement learning to optimize set-level properties of retrieval results (e.g., diversity), then uses that model to generate training data, and finally trains a lightweight diffusion model that retrieves an item set satisfying complex property requirements in a single pass, substantially speeding up retrieval while preserving quality.
Many modern retrieval problems are set-valued: given a broad intent, the system must return a collection of results that optimizes higher-order properties (e.g., diversity, coverage, complementarity, coherence) while remaining grounded with respect to a fixed database. Set-valued objectives are typically non-decomposable and are not captured by existing supervised (query, content) datasets which only prioritize top-1 retrieval. Consequently, fan-out retrieval is often employed to generate diverse subqueries to retrieve item sets. While reinforcement learning (RL) can optimize set-level objectives via interaction, deploying an RL-tuned LLM for fan-out retrieval is prohibitively expensive at inference time. Conversely, diffusion-based generative retrieval enables efficient single-pass fan-out in embedding space, but requires objective-aligned training targets. To address these issues, we propose R4T (Retrieve-for-Train), which uses RL once as an objective transducer in a three-step process: (i) train a fan-out LLM with composite set-level rewards, (ii) synthesize objective-consistent training pairs, and (iii) train a lightweight diffusion retriever to model the conditional distribution of set-valued outputs. Across large-scale fashion and music benchmarks consisting of curated item sets, we show that R4T improves retrieval quality relative to strong baselines while reducing query-time fan-out latency by an order of magnitude.
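The three-step pipeline can be illustrated with a deliberately tiny sketch. Everything below is an assumption for illustration, not the paper's implementation: the RL-tuned fan-out LLM of steps (i)–(ii) is stubbed by a function that perturbs the query into `K` pseudo-subqueries, and the "lightweight diffusion retriever" of step (iii) is reduced to a single learned denoising map fit by least squares in embedding space, with generated embeddings grounded to nearest items in a fixed database.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative): d-dim embeddings, a database of D items,
# and target sets of K items per query.
d, D, K = 8, 200, 4
item_db = rng.normal(size=(D, d))

def rl_fanout_targets(query_emb):
    """Stub for steps (i)-(ii): stands in for the RL-tuned fan-out LLM
    that emits an objective-consistent target set of K items per query.
    Here: K perturbed copies of the query act as 'diverse subqueries',
    each retrieving its nearest database item."""
    subqueries = query_emb + 0.5 * rng.normal(size=(K, d))
    idx = np.argmax(subqueries @ item_db.T, axis=1)
    return item_db[idx]

# Step (ii): synthesize (query, target-set) training pairs.
queries = rng.normal(size=(64, d))
targets = np.stack([rl_fanout_targets(q) for q in queries])  # (64, K, d)

# Step (iii): a 'diffusion retriever' collapsed to one denoising step:
# learn a linear map W taking [query; noise] -> target embedding,
# fit in closed form by least squares.
noise = rng.normal(size=(64, K, d))
X = np.concatenate(
    [np.repeat(queries[:, None, :], K, axis=1), noise], axis=-1
).reshape(-1, 2 * d)                                   # (64*K, 2d)
Y = targets.reshape(-1, d)                             # (64*K, d)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def single_pass_fanout(query_emb):
    """Query time: one denoising pass produces K embeddings at once;
    each is grounded to the nearest real item in the fixed database."""
    z = rng.normal(size=(K, d))
    x = np.concatenate([np.repeat(query_emb[None], K, 0), z], -1) @ W
    return np.argmax(x @ item_db.T, axis=1)            # K item ids

ids = single_pass_fanout(rng.normal(size=d))
```

The point of the sketch is the cost asymmetry the abstract describes: the expensive RL-tuned model is called only during data synthesis, while query-time fan-out is a single cheap forward pass plus nearest-neighbor grounding.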
Source: arXiv:2603.06397