Learning to Reason for Multi-Step Retrieval of Personal Context in Personalized Question Answering
1️⃣ One-Sentence Summary
This paper proposes PR2, a reinforcement learning framework that teaches a model to decide, while answering a personalized question, when and how to retrieve information from the user's profile and weave it into its reasoning, producing answers better aligned with the user's background and preferences and significantly improving the personalization of QA systems.
Personalization in Question Answering (QA) requires answers that are both accurate and aligned with users' background, preferences, and historical context. Existing state-of-the-art methods primarily rely on retrieval-augmented generation (RAG) solutions that construct personal context by retrieving relevant items from the user's profile. These methods typically feed the user's query directly to the retriever, a strategy that often yields only surface-level personalization. We propose PR2 (Personalized Retrieval-Augmented Reasoning), a reinforcement learning framework that integrates reasoning and retrieval from personal context for personalization. PR2 learns adaptive retrieval-reasoning policies, determining when to retrieve, what evidence to retrieve from user profiles, and how to incorporate it into intermediate reasoning steps. By optimizing multi-turn reasoning trajectories under a personalized reward function, the framework reinforces reasoning paths that better align with user-specific preferences and contextual signals reflected by the reward model. Extensive experiments on the LaMP-QA benchmark using three LLMs show that PR2 consistently outperforms strong baselines, achieving an average relative improvement of 8.8%-12% in personalized QA.
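The abstract describes a multi-turn loop in which a policy decides when to retrieve from the user's profile, what evidence to take, and how to fold it into the reasoning, with trajectories scored by a personalized reward. A minimal toy sketch of that loop is below; all names (`rollout`, `personalized_reward`, the keyword-overlap retriever, the stop rule) are illustrative placeholders, not the paper's actual method or training procedure.

```python
# Toy sketch of a retrieve-then-reason trajectory with a personalized reward.
# Every component here is a stand-in: a real system would use a learned policy,
# a dense retriever, and an LLM-based reward model.
from dataclasses import dataclass

@dataclass
class Turn:
    action: str   # "retrieve" or "answer"
    content: str

def retrieve(profile: list[str], query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank profile items by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(profile, key=lambda item: -len(q & set(item.lower().split())))
    return scored[:k]

def rollout(question: str, profile: list[str], max_turns: int = 3) -> list[Turn]:
    """Multi-turn trajectory: gather personal evidence until the (toy) policy
    decides it has enough context, then emit an answer grounded in it."""
    trajectory: list[Turn] = []
    evidence: list[str] = []
    for _ in range(max_turns):
        if evidence:            # toy "when to retrieve" rule: one hop is enough
            break
        hits = retrieve(profile, question)
        evidence.extend(hits)
        trajectory.append(Turn("retrieve", hits[0]))
    answer = f"Given that {evidence[0]}, ..." if evidence else "..."
    trajectory.append(Turn("answer", answer))
    return trajectory

def personalized_reward(trajectory: list[Turn], preference: str) -> float:
    """Toy reward: 1.0 if the final answer reflects the user's preference."""
    return 1.0 if preference.lower() in trajectory[-1].content.lower() else 0.0
```

In the actual framework these rollouts would be sampled many times per question and the policy updated to reinforce high-reward trajectories; the sketch only shows the shape of one trajectory and its scoring.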
Source: arXiv: 2602.19317