Abstract - UnIte: Uncertainty-based Iterative Document Sampling for Domain Adaptation in Information Retrieval
Unsupervised domain adaptation generalizes neural retrievers to an unseen domain by generating pseudo queries on target domain documents. The quality and efficiency of this adaptation critically depend on which documents are selected for pseudo query generation. The existing document sampling method focuses on diversity but fails to capture model uncertainty. In contrast, we propose **Un**certainty-based **Ite**rative Document Sampling (UnIte) addressing these limitations by (1) filtering documents with high aleatoric uncertainty and (2) prioritizing those with high epistemic uncertainty, maximizing the learning utility of the current model. We conducted extensive experiments on a large corpus of BEIR with small and large models, showing significant gains of +2.45 and +3.49 nDCG@10 with a smaller training sample size, 4k on average.
UnIte: Uncertainty-based Iterative Document Sampling for Domain Adaptation in Information Retrieval
1️⃣ One-Sentence Summary
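This paper proposes UnIte, a new document sampling method that distinguishes two kinds of uncertainty (high aleatoric uncertainty caused by data noise and high epistemic uncertainty caused by gaps in the model's knowledge) to select the most valuable documents for pseudo query generation, significantly improving how well an information retrieval model adapts to a new domain while using fewer training samples.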
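
The two-step selection described above (filter out documents with high aleatoric uncertainty, then prioritize those with high epistemic uncertainty) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_documents`, the threshold `max_aleatoric`, the budget `k`, and the toy scores are all hypothetical, and the sketch assumes per-document uncertainty scores have already been estimated by some means the abstract does not specify. It shows a single sampling round; in an iterative setup the scores would presumably be re-estimated with the updated model before the next round.

```python
from typing import List, Sequence


def select_documents(
    aleatoric: Sequence[float],
    epistemic: Sequence[float],
    max_aleatoric: float,
    k: int,
) -> List[int]:
    """Return indices of up to k documents for pseudo query generation.

    All parameters are illustrative assumptions, not taken from the paper.
    """
    if len(aleatoric) != len(epistemic):
        raise ValueError("score lists must have the same length")
    # Step 1: drop documents whose aleatoric (data-noise) uncertainty is too high.
    kept = [i for i, a in enumerate(aleatoric) if a <= max_aleatoric]
    # Step 2: rank the remaining documents by epistemic (model-knowledge)
    # uncertainty, highest first, and keep the top k.
    kept.sort(key=lambda i: epistemic[i], reverse=True)
    return kept[:k]


# Toy scores for four documents (made up for illustration).
aleatoric = [0.1, 0.9, 0.2, 0.3]
epistemic = [0.5, 0.95, 0.8, 0.1]
selected = select_documents(aleatoric, epistemic, max_aleatoric=0.5, k=2)
print(selected)  # [2, 0]
```

In this toy run, document 1 has the highest epistemic uncertainty but is excluded because its aleatoric uncertainty exceeds the threshold, which is the behavior the filtering step is meant to capture.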