arXiv submission date: 2026-03-30
📄 Abstract - Calibrated Fusion for Heterogeneous Graph-Vector Retrieval in Multi-Hop QA

Graph-augmented retrieval combines dense similarity with graph-based relevance signals such as Personalized PageRank (PPR), but these scores have different distributions and are not directly comparable. We study this as a score calibration problem for heterogeneous retrieval fusion in multi-hop question answering. Our method, PhaseGraph, maps vector and graph scores to a common unit-free scale using percentile-rank normalization (PIT) before fusion, enabling stable combination without discarding magnitude information. Across MuSiQue and 2WikiMultiHopQA, calibrated fusion improves held-out last-hop retrieval on HippoRAG2-style benchmarks: LastHop@5 increases from 75.1% to 76.5% on MuSiQue (8W/1L, p=0.039) and from 51.7% to 53.6% on 2WikiMultiHopQA (11W/2L, p=0.023), both on independent held-out test splits. A theory-driven ablation shows that percentile-based calibration is directionally more robust than min-max normalization on both tune and test splits (1W/6L, p=0.125), while Boltzmann weighting performs comparably to linear fusion after calibration (0W/3L, p=0.25). These results suggest that score commensuration is a robust design choice, and the exact post-calibration operator appears to matter less on these benchmarks.
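The calibration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mid-rank empirical CDF used for the percentile transform, the function names `percentile_rank` and `calibrated_fusion`, the mixing weight `alpha`, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def percentile_rank(scores):
    """Map raw scores to (0, 1) via a mid-rank empirical CDF (assumed variant)."""
    scores = np.asarray(scores, dtype=float)
    sorted_scores = np.sort(scores)
    # Number of candidates strictly below / at-or-below each score.
    below = np.searchsorted(sorted_scores, scores, side="left")
    at_or_below = np.searchsorted(sorted_scores, scores, side="right")
    # Averaging the two counts gives tied scores the same percentile.
    return (below + at_or_below) / (2.0 * scores.size)


def calibrated_fusion(vector_scores, graph_scores, alpha=0.5):
    """Linear fusion of percentile-calibrated scores (alpha is hypothetical)."""
    v = percentile_rank(vector_scores)
    g = percentile_rank(graph_scores)
    return alpha * v + (1.0 - alpha) * g


# Toy candidates: cosine similarities and PPR mass live on very different scales.
vector_scores = [0.82, 0.79, 0.75, 0.40]
graph_scores = [0.001, 0.020, 0.0005, 0.015]

fused = calibrated_fusion(vector_scores, graph_scores)
ranking = np.argsort(-fused).tolist()  # candidate indices, best first
```

In this toy case a raw sum would be dominated by the vector scores, because the PPR values are orders of magnitude smaller. After the percentile transform both signals lie on the same unit-free scale, so the second candidate, which has the highest graph score, moves to the top.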

Top-level tags: natural language processing machine learning data
Detailed tags: retrieval augmentation score calibration multi-hop qa graph retrieval fusion methods

Calibrated Fusion for Heterogeneous Graph-Vector Retrieval in Multi-Hop QA


1️⃣ One-sentence summary

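This paper proposes PhaseGraph, a calibrated fusion method that maps scores from two different sources, graph retrieval and vector retrieval, onto a common unit-free scale before fusing them, which improves last-hop retrieval accuracy in multi-hop question answering.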

Source: arXiv: 2603.28886