arXiv submission date: 2026-06-11
📄 Abstract - Multi-Label Test-Time Adaptation with Bayesian Conditional Priors

Multi-label recognition with frozen Vision-Language Models (VLMs) is brittle under distribution shift: standard zero-shot inference scores labels independently, ignoring co-occurrence structure and producing incoherent label sets where dominant concepts suppress weaker but compatible labels. We introduce Bayesian Conditional Priors (BCP) Estimation, a gradient-free test-time adaptation method that injects label dependency without tuning the backbone. BCP views zero-shot logits as a proxy for marginal posteriors under a fixed image-text likelihood and attributes shift-induced errors mainly to a mismatched label prior. For each test image, it selects a high-confidence anchor label and applies an anchor-conditioned Bayesian refinement. This update is closed-form in logit space and admits a pointwise mutual information (PMI) interpretation, explicitly promoting compatible labels and suppressing incompatible ones. BCP operates without target annotations by estimating anchor-conditioned priors online from the unlabeled test stream via lightweight second-order co-occurrence statistics, adding negligible overhead beyond a single forward pass. Across standard multi-label benchmarks and multiple CLIP backbones, BCP consistently outperforms strong TTA baselines, e.g., improving RN50 average mAP from 57.31 to 69.22 and ViT-B/16 from 62.61 to 71.79.
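The anchor-conditioned refinement described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the exact update rule, so the additive PMI shift, the use of sigmoid probabilities as soft label counts, and the names `OnlinePMIRefiner`, `strength`, and `eps` are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OnlinePMIRefiner:
    """Hypothetical sketch: shift zero-shot logits by the PMI with an anchor label.

    Co-occurrence statistics are accumulated online from the unlabeled test
    stream, using sigmoid probabilities as soft label indicators (an assumption).
    """

    def __init__(self, num_labels, strength=1.0, eps=1e-6):
        self.n = 0                                            # images seen so far
        self.first = np.zeros(num_labels)                     # running sum of p_j
        self.second = np.zeros((num_labels, num_labels))      # running sum of p_i * p_j
        self.strength = strength                              # assumed scaling of the PMI shift
        self.eps = eps                                        # avoids log(0)

    def pmi_row(self, anchor):
        """PMI(j, anchor) = log P(j, anchor) - log P(j) - log P(anchor), for all j."""
        marginal = self.first / self.n
        joint = self.second[anchor] / self.n
        return (np.log(joint + self.eps)
                - np.log(marginal + self.eps)
                - np.log(marginal[anchor] + self.eps))

    def refine(self, logits):
        logits = np.asarray(logits, dtype=float)
        probs = sigmoid(logits)
        anchor = int(np.argmax(probs))        # high-confidence anchor label
        refined = logits.copy()
        if self.n > 0:                        # no statistics yet on the first image
            shift = self.strength * self.pmi_row(anchor)
            shift[anchor] = 0.0               # leave the anchor's own logit untouched
            refined += shift
        # Update the statistics with the current image after refining it.
        self.n += 1
        self.first += probs
        self.second += np.outer(probs, probs)
        return refined, anchor
```

In this sketch, labels that tend to co-occur with the anchor in the stream receive a positive shift (PMI > 0), while labels that rarely appear alongside it are pushed down (PMI < 0). The cost per image is one outer product over the label set, consistent with the abstract's claim of negligible overhead beyond the forward pass.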

Top-level tags: multi-modal model evaluation machine learning
Detailed tags: test-time adaptation multi-label recognition bayesian inference distribution shift vision-language models

Multi-Label Test-Time Adaptation with Bayesian Conditional Priors


1️⃣ One-Sentence Summary

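This paper proposes BCP, a lightweight test-time adaptation method that requires no retraining of the model: by estimating label co-occurrence online and correcting predictions with Bayesian inference, it makes frozen vision-language models markedly more robust to distribution shift in multi-label recognition.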

Source: arXiv: 2606.12925