arXiv submission date: 2026-05-28
📄 Abstract - TRACER: Persistent Regularization for Robust Multimodal Finetuning

Mainstream strategies for finetuning pretrained multimodal models often degrade out-of-distribution (OOD) robustness, a phenomenon known as catastrophic forgetting. In this paper, we develop a theoretical framework for multimodal contrastive finetuning, yielding closed-form solutions and a geometric decomposition for each strategy. This framework shows that self-distillation is more effective than other regularization approaches to retain the knowledge of the pretrained model. Our analysis reveals a largely overlooked limitation: standard Exponential Moving Average (EMA) teachers, widely used in robust finetuning, suffer from collapse. To solve this, we prove that a Weighted Moving Average (WMA) teacher maintains a persistent regularizing force over finite horizons and yields bias-free convergence in the task subspace while preserving orthogonal knowledge. These insights motivate **TRACER** (**T**rajectory-**R**obust **A**nchoring for **C**ontrastive **E**ncoder **R**egularization), which combines contrastive learning with WMA-guided multi-perspective distillation. Extensive experiments on CLIP finetuning demonstrate consistent OOD accuracy and calibration gains across three backbone architectures, and comprehensive ablations confirm that TRACER is both principled and robust to hyperparameter choices. Code is available at this https URL.
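The abstract contrasts EMA and WMA teachers without giving their update rules, so the following is only a minimal sketch under stated assumptions, not the paper's method. It uses the textbook EMA update and, as a stand-in for a WMA, a generic running weighted average with uniform weights; the function names, the uniform weighting, and the toy one-dimensional "parameters" are all hypothetical. The sketch tracks how much weight each teacher still places on its starting (pretrained) parameters after a number of updates, which is one way to picture a regularizing pull that fades quickly versus one that persists.

```python
from typing import List, Tuple


def ema_update(teacher: List[float], student: List[float], decay: float) -> List[float]:
    """Standard EMA step: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def wma_update(
    teacher: List[float], total_weight: float, student: List[float], weight: float
) -> Tuple[List[float], float]:
    """Running weighted average step (assumed form, not taken from the paper).

    The teacher is the weighted mean of all snapshots seen so far;
    total_weight is the sum of the weights already folded into it.
    """
    new_total = total_weight + weight
    new_teacher = [
        (total_weight * t + weight * s) / new_total for t, s in zip(teacher, student)
    ]
    return new_teacher, new_total


def initial_weight_after(steps: int, decay: float) -> Tuple[float, float]:
    """Return the weight each teacher still gives to its starting point.

    Trick: start both teachers at 1.0 and always feed a student of 0.0,
    so the teacher's value equals the coefficient on the initial parameters.
    """
    ema = [1.0]
    wma, total = [1.0], 1.0  # the initial snapshot counts with weight 1
    for _ in range(steps):
        ema = ema_update(ema, [0.0], decay)
        wma, total = wma_update(wma, total, [0.0], 1.0)  # uniform weights (assumption)
    return ema[0], wma[0]


if __name__ == "__main__":
    ema_w, wma_w = initial_weight_after(steps=100, decay=0.9)
    print(f"EMA weight on initial parameters: {ema_w:.3e}")
    print(f"WMA weight on initial parameters: {wma_w:.3e}")
```

With these toy settings the EMA coefficient on the starting point is `decay ** steps`, which shrinks geometrically, while the uniform running average keeps `1 / (steps + 1)`, which shrinks only harmonically. Whether this matches the collapse and persistence results proved in the paper depends on the paper's actual definitions, which the abstract does not state.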

Top-level tags: multi-modal model training
Detailed tags: finetuning, contrastive learning, catastrophic forgetting, out-of-distribution robustness, knowledge distillation

TRACER: Persistent Regularization for Robust Multimodal Finetuning


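1️⃣ One-sentence summary

This paper proposes TRACER, a method that uses multi-perspective distillation guided by a weighted moving average (WMA) teacher to address the out-of-distribution performance degradation that commonly follows multimodal finetuning; it preserves pretrained knowledge while significantly improving accuracy and confidence calibration on unseen data.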

Source: arXiv: 2605.29380