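Semantic Recall: A New Evaluation Metric for Vector Search / Semantic Recall for Vector Search
1️⃣ One-sentence summary
This paper proposes a new metric, "semantic recall", for evaluating the quality of vector search algorithms more accurately. It counts only results that are semantically relevant to the query, so an algorithm is not penalized for missing nearest neighbors that are irrelevant, which makes the metric a better target when tuning the tradeoff between search quality and cost.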
We introduce Semantic Recall, a novel metric to assess the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, semantic recall does not penalize algorithms for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among the query's nearest neighbors. We demonstrate that semantic recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors, a scenario we find to be common in embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates semantic recall when semantically relevant objects cannot be identified. We empirically show that our metrics are more effective indicators of retrieval quality, and that optimizing search algorithms for these metrics can lead to improved cost-quality tradeoffs.
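To make the difference between the two metrics concrete, here is a minimal Python sketch based only on the description in the abstract. It assumes semantic recall is the fraction of objects that are both semantically relevant and in the exact top-k that the approximate search also returns. The function names, the toy IDs, and the choice to return `None` when a query has no relevant exact neighbors are illustrative assumptions, not the paper's formal definition. Tolerant Recall is not sketched because the abstract does not define it.

```python
from typing import Hashable, Iterable, Optional


def recall(retrieved: Iterable[Hashable], exact_neighbors: Iterable[Hashable]) -> float:
    """Traditional recall: share of the exact top-k that the ANN search returned."""
    exact = set(exact_neighbors)
    if not exact:
        raise ValueError("exact_neighbors must not be empty")
    return len(set(retrieved) & exact) / len(exact)


def semantic_recall(
    retrieved: Iterable[Hashable],
    exact_neighbors: Iterable[Hashable],
    relevant: Iterable[Hashable],
) -> Optional[float]:
    """Recall restricted to exact neighbors that are also semantically relevant.

    Assumption: if no exact neighbor is relevant, the query is skipped (None)
    rather than scored; the abstract does not say how this case is handled.
    """
    targets = set(exact_neighbors) & set(relevant)
    if not targets:
        return None
    return len(set(retrieved) & targets) / len(targets)


# Toy query (hypothetical IDs): only "a" and "b" among the exact top-5 are relevant.
exact_top5 = ["a", "b", "c", "d", "e"]
relevant_ids = {"a", "b", "z"}  # "z" is relevant but not reachable via exact search
ann_result = ["a", "b", "x", "y", "w"]

traditional = recall(ann_result, exact_top5)
semantic = semantic_recall(ann_result, exact_top5, relevant_ids)
print(traditional, semantic)
```

In this toy query the approximate search misses three of the five exact neighbors, so traditional recall is low, yet it returns every relevant object that exact search could have reached, so semantic recall is perfect. This is the "few relevant results among the nearest neighbors" case that the abstract highlights.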
Source: arXiv: 2604.20417