Mind the Way You Select Negative Texts: Pursuing the Distance Consistency in OOD Detection with VLMs

📄 Abstract

Out-of-distribution (OOD) detection seeks to identify samples from unknown classes, a critical capability for deploying machine learning models in open-world scenarios. Recent research has demonstrated that Vision-Language Models (VLMs) can effectively leverage their multi-modal representations for OOD detection. However, current methods often rely on intra-modal distance during OOD detection, for example by comparing negative texts with ID labels or by comparing test images with image proxies. This design is inconsistent with the inter-modal distance that CLIP-like VLMs are optimized for, which can lead to suboptimal performance. To address this limitation, we propose InterNeg, a simple yet effective framework that consistently uses inter-modal distance from both the textual and the visual perspective. From the textual perspective, we devise an inter-modal criterion for selecting negative texts. From the visual perspective, we dynamically identify high-confidence OOD images and invert them into the textual space, generating extra negative text embeddings guided by inter-modal distance. Extensive experiments across multiple benchmarks demonstrate the superiority of our approach. Notably, InterNeg achieves state-of-the-art performance, with a 3.47% reduction in FPR95 on the large-scale ImageNet benchmark and a 5.50% improvement in AUROC on the challenging Near-OOD benchmark.
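The textual side of the idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the selection criterion, so the rule below (rank candidate texts by their maximum cosine similarity to a set of ID image embeddings and keep the least similar ones) is an assumption, as are the function names `select_negative_texts` and `ood_score`, the temperature value, and the softmax-mass scoring rule, which is of the kind commonly used by negative-text approaches. The only point it is meant to show is that both selection and scoring compare text embeddings with image embeddings, never text with text.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def select_negative_texts(candidate_text_emb, id_image_emb, k):
    """Return indices of the k candidate texts farthest from the ID images.

    Hypothetical inter-modal criterion: each candidate text is scored by its
    maximum cosine similarity to any ID image embedding (text vs. image),
    rather than by its similarity to ID label texts (text vs. text).
    """
    text = l2_normalize(candidate_text_emb)
    image = l2_normalize(id_image_emb)
    sim = text @ image.T                # shape: (num_candidates, num_id_images)
    closeness = sim.max(axis=1)         # how close each text gets to ID images
    return np.argsort(closeness)[:k]    # least similar candidates first


def ood_score(image_emb, id_text_emb, neg_text_emb, temperature=0.01):
    """Softmax mass assigned to ID labels; low values suggest an OOD image.

    Assumed scoring rule: softmax over image-to-text similarities for ID
    labels and negative texts together, summed over the ID labels.
    """
    image = l2_normalize(image_emb)
    all_text = l2_normalize(np.concatenate([id_text_emb, neg_text_emb], axis=0))
    logits = (image @ all_text.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs[:, : id_text_emb.shape[0]].sum(axis=1)


# Toy 2-D embeddings standing in for CLIP features (purely illustrative).
id_images = np.array([[1.0, 0.0], [0.9, 0.1]])
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
id_labels = np.array([[1.0, 0.0]])

neg_idx = select_negative_texts(candidates, id_images, k=2)
test_images = np.array([[1.0, 0.0], [0.0, 1.0]])
scores = ood_score(test_images, id_labels, candidates[neg_idx])
```

In this toy setup the candidate that coincides with the ID images is rejected as a negative text, and the test image aligned with a negative text receives a much lower ID score than the one aligned with the ID label. With a real VLM, the embeddings would come from its text and image encoders.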

1️⃣ One-Sentence Summary

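This paper proposes a new method called InterNeg that keeps the distance computation between text and images consistent, significantly improving the accuracy and reliability of vision-language models when identifying images from unknown classes.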
