CMAP: Cross-Modal Adaptive Prompting for Multi-Domain Task-Incremental Learning

📄 Abstract

Multi-domain task-incremental learning requires a model to sequentially acquire knowledge across visually diverse domains without forgetting prior tasks, and without access to task identity at inference. Parameter-efficient methods built on frozen vision-language models have made strong progress, yet all existing approaches rely exclusively on visual features for task routing, confidence estimation, and encoder adaptation, leaving CLIP's cross-modal text embedding space entirely unexploited. We address this gap through three contributions. Text-space task routing replaces visual Gaussian matching with cosine similarity to frozen CLIP text prototypes, giving order-independent routing robust to data scarcity at zero parameter cost. Multi-prototype visual-textual confidence replaces single-Gaussian class modeling with K-means visual prototypes and cross-modal alignment scores under task-calibrated thresholds. Symmetric cross-modal gating extends per-layer Gumbel gates to the text encoder conditioned on batch image features, preserving cross-modal alignment on out-of-distribution inputs. On the MTIL benchmark spanning 11 datasets and 1201 classes, our method achieves 74.2% Transfer, 80.5% Average, and 88.7% Last under Order-I, surpassing the prior state of the art by 5.0, 3.7, and 3.0 percentage points with only 2.5M trainable parameters and no external data.
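The text-space routing idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single text prototype per task (the abstract does not say how prototypes are built), uses toy 3-dimensional vectors in place of real CLIP embeddings, and the names `cosine_similarity` and `route_task` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_task(image_feat, text_prototypes):
    """Pick the task whose text prototype is most similar to the image feature.

    Assumption: text_prototypes holds one fixed vector per task, standing in
    for embeddings produced by a frozen text encoder. Nothing is trained here,
    so routing adds no parameters and does not depend on task order.
    """
    sims = [cosine_similarity(image_feat, proto) for proto in text_prototypes]
    best_task = max(range(len(sims)), key=lambda i: sims[i])
    return best_task, sims


# Toy stand-ins for per-task text prototypes and one image feature.
text_prototypes = [
    [1.0, 0.0, 0.0],  # task 0
    [0.0, 1.0, 0.0],  # task 1
    [0.0, 0.0, 1.0],  # task 2
]
image_feat = [0.2, 0.9, 0.1]

task_id, sims = route_task(image_feat, text_prototypes)
print(task_id)  # 1: the image feature points mostly along task 1's prototype
```

Because the prototypes are fixed and each task is scored independently, adding a new task only appends one more prototype and leaves the scores of earlier tasks unchanged, which is one way to read the abstract's "order-independent" claim.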


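1️⃣ One-sentence summary

This paper proposes CMAP, a method that exploits both the text and image information in CLIP so that a model learning new visual tasks retains prior knowledge while automatically identifying the current task; across 11 datasets it significantly improves task-incremental learning accuracy with a very small number of trainable parameters.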
