arXiv submission date: 2026-05-04
📄 Abstract - SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, assigning high rewards to flawed reasoning paths. This issue is further exacerbated in knowledge graph (KG) reasoning, since multiple paths may exist between the start and end entities in a KG, and a single risky step can render the reasoning path flawed. These limitations are especially problematic in risk-sensitive tasks such as medical and legal KG reasoning. To address these issues, we propose a Schema-aware Cumulative Process Reward Model (SCPRM) that evaluates reasoning paths by conditioning on the reasoning prefix and incorporating the schema distance between the current reasoning step and the implicit target parsed from the query, providing cumulative and future rewards to guide path exploration. We further integrate SCPRM into Monte Carlo Tree Search (MCTS) as SCPRM-MCTS to conduct multi-hop reasoning on KGs for question answering (QA) tasks. Across medical and legal KGQA and CWQ, SCPRM-MCTS improves Hits@k by an average of 1.18% over strong baselines, demonstrating more accurate and risk-sensitive reasoning evaluation.
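The abstract describes two ideas: scoring each step conditioned on its reasoning prefix (so an early flawed step cannot be "compensated" by later correct ones) and adding a schema-distance term toward the query's implicit target type. The paper's actual formulation is not given here, so the following is only an illustrative sketch under assumed definitions: the prefix condition is modeled by capping each step's reward at the worst score seen so far, and the future signal decays with precomputed schema-graph hop distance. All function names, the `alpha` weight, and the distance table are hypothetical.

```python
# Illustrative sketch of an SCPRM-style cumulative score; NOT the paper's
# implementation. Names, formulas, and parameters are assumptions.

def schema_distance(current_type: str, target_type: str,
                    schema_hops: dict) -> int:
    """Hops in the schema graph from the current entity type to the implicit
    target type parsed from the query (assumed precomputed)."""
    return schema_hops.get((current_type, target_type), 10)  # large default

def cumulative_reward(step_scores: list, step_types: list,
                      target_type: str, schema_hops: dict,
                      alpha: float = 0.5) -> float:
    """Score a reasoning path prefix-cumulatively: each step's contribution is
    capped by the worst step so far (no risk compensation), plus a
    schema-distance term that rewards moving toward the target type."""
    total, prefix_min = 0.0, 1.0
    for score, step_type in zip(step_scores, step_types):
        prefix_min = min(prefix_min, score)          # conditioned on the prefix
        dist = schema_distance(step_type, target_type, schema_hops)
        future = 1.0 / (1.0 + dist)                  # closer to target -> higher
        total += prefix_min + alpha * future
    return total / len(step_scores)
```

Under this toy scoring, a path with one bad early step stays penalized even if later steps are strong, which is exactly the risk-compensation failure mode the abstract targets:

```python
hops = {("Gene", "Disease"): 2, ("Drug", "Disease"): 1}
good = cumulative_reward([0.9, 0.9], ["Gene", "Drug"], "Disease", hops)
risky = cumulative_reward([0.2, 0.9], ["Gene", "Drug"], "Disease", hops)
# risky stays well below good despite the strong second step
```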

Top-level tags: llm knowledge graph model evaluation
Detailed tags: process reward model monte carlo tree search multi-hop reasoning risk-sensitive reasoning question answering

SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering


1️⃣ One-sentence summary

This paper proposes a new model, SCPRM, which evaluates the correctness of reasoning paths in knowledge graph question answering by incorporating the schema distance between the current reasoning step and the target. This addresses the risk compensation effect that leads traditional process reward models to misjudge flawed reasoning paths, and yields more accurate and robust reasoning in high-risk domains such as medicine and law.

Source: arXiv: 2605.02819