arXiv submission date: 2026-04-15
📄 Abstract - Hybrid Retrieval for COVID-19 Literature: Comparing Rank Fusion and Projection Fusion with Diversity Reranking

We present a hybrid retrieval system for COVID-19 scientific literature, evaluated on the TREC-COVID benchmark (171,332 papers, 50 expert queries). The system implements six retrieval configurations spanning sparse (SPLADE), dense (BGE), rank-level fusion (RRF), and a projection-based vector fusion (B5) approach. RRF fusion achieves the best relevance (nDCG@10 = 0.828), outperforming dense-only retrieval by 6.1% and sparse-only retrieval by 14.9%. Our projection fusion variant reaches nDCG@10 = 0.678 on expert queries while cutting latency by 33% (847 ms vs. 1271 ms) and producing 2.2x higher intra-list diversity (ILD@10) than RRF. Evaluation across 400 queries (expert, machine-generated, and three paraphrase styles) shows that B5 delivers the largest relative gain on keyword-heavy reformulations (+8.8%), although RRF remains best in absolute nDCG@10. On expert queries, MMR reranking increases intra-list diversity by 23.8-24.5% at a 20.4-25.4% nDCG@10 cost. Both fusion pipelines evaluated for latency stay under the 2 s target on all query sets. The system is deployed as a Streamlit web application backed by Pinecone serverless indices.
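The abstract reports RRF as the best-performing fusion for relevance. As an illustration only, here is a minimal sketch of standard Reciprocal Rank Fusion, assuming each retriever (e.g. SPLADE and BGE) returns an ordered list of document ids. The constant `k=60` is a commonly used default and is not stated in the abstract; the function name and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document's fused score is the sum over input lists of 1 / (k + rank),
    with rank starting at 1. Documents missing from a list get no score from it.
    k=60 is an assumed default, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by doc id for determinism
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


sparse_run = ["d3", "d1", "d7", "d2"]  # hypothetical SPLADE ranking
dense_run = ["d1", "d2", "d3", "d9"]   # hypothetical BGE ranking
print(reciprocal_rank_fusion([sparse_run, dense_run], top_n=3))
# prints ['d1', 'd3', 'd2']
```

Because RRF uses only ranks, it needs no score normalization between the sparse and dense retrievers, which is one reason it is a common baseline for hybrid search.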

Top-level tags: natural language processing systems benchmark
Detailed tags: information retrieval hybrid search rank fusion diversity reranking covid-19 literature



1️⃣ One-sentence summary

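This paper builds a hybrid retrieval system for COVID-19 scientific literature and compares fusion methods: rank fusion (RRF) gives the best retrieval relevance, while projection fusion is faster and returns more diverse results, and the system is deployed as a working web application.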
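The diversity comparison above is measured with intra-list diversity (ILD@10), and the abstract also evaluates MMR reranking as a way to raise it. The sketch below is a generic version of both, not the paper's implementation: it assumes cosine similarity over document embeddings, assumes query relevance scores are already available, and uses a hypothetical trade-off parameter `lam`. The abstract does not give the paper's MMR settings or its exact ILD distance function.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mmr_rerank(query_sims, doc_vectors, lam=0.7, top_n=10):
    """Maximal Marginal Relevance: greedily pick the candidate maximizing
    lam * relevance - (1 - lam) * (max similarity to already-selected docs)."""
    candidates = list(range(len(query_sims)))
    selected = []
    while candidates and len(selected) < top_n:
        def mmr_score(i):
            redundancy = max(
                (cosine(doc_vectors[i], doc_vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * query_sims[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


def intra_list_diversity(doc_vectors, ranking, cutoff=10):
    """ILD@cutoff: mean pairwise cosine distance (1 - cos) among the top results."""
    pairs = list(combinations(ranking[:cutoff], 2))
    if not pairs:
        return 0.0
    total = sum(1.0 - cosine(doc_vectors[i], doc_vectors[j]) for i, j in pairs)
    return total / len(pairs)


# Hypothetical toy data: docs 0 and 1 are near-duplicates, doc 2 is different
docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
query_sims = [0.9, 0.89, 0.5]
plain = [0, 1]  # top-2 by relevance alone
diverse = mmr_rerank(query_sims, docs, lam=0.5, top_n=2)
print(diverse)  # prints [0, 2]
print(intra_list_diversity(docs, plain), intra_list_diversity(docs, diverse))
```

The toy run mirrors the trade-off the abstract reports: MMR swaps a highly relevant near-duplicate for a less relevant but dissimilar document, so ILD rises while relevance-based metrics such as nDCG@10 can fall.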

Source: arXiv 2604.13728