arXiv submission date: 2026-06-09
📄 Abstract - Listen, Look, and Learn: Learning Without Forgetting through SAM-Audio

Class-Incremental Learning (CIL) aims to continuously learn new classes without forgetting previously acquired knowledge. While recent CIL advances have spurred significant interest across various modalities, the audio-visual setting remains underexplored. Furthermore, although foundational multimodal models like SAM-Audio encapsulate rich static priors, our empirical analysis reveals that these representations struggle in incremental settings. This work bridges this gap by integrating SAM-Audio's audio-visual priors into the CIL setting. Specifically, we leverage its dense audio and visual representations and employ a novel guided attention strategy where the audio features contextually guide the visual representations. To further mitigate catastrophic forgetting, we introduce dual-level distillation objectives at both the feature and logit levels. Extensive evaluations on audio-visual CIL benchmarks demonstrate that our approach consistently outperforms state-of-the-art methods.
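
The abstract does not give the exact form of the dual-level distillation objectives, so the following is only a minimal sketch of how feature-level and logit-level distillation are commonly combined in class-incremental learning. It assumes a mean-squared-error term between old-model and new-model features and a temperature-scaled KL divergence over the previously seen classes. All function names, the weights `lambda_feat` and `lambda_logit`, and the temperature are hypothetical and not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def feature_distillation(old_feat, new_feat):
    """Feature-level term (assumed form): mean squared error between
    the frozen old model's features and the current model's features."""
    return sum((a - b) ** 2 for a, b in zip(old_feat, new_feat)) / len(old_feat)


def logit_distillation(old_logits, new_logits, temperature=2.0):
    """Logit-level term (assumed form): KL(old || new) restricted to the
    classes the old model already knew, scaled by temperature squared."""
    n_old = len(old_logits)
    p = softmax(old_logits, temperature)
    q = softmax(new_logits[:n_old], temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def dual_level_loss(ce_loss, old_feat, new_feat, old_logits, new_logits,
                    lambda_feat=1.0, lambda_logit=1.0):
    """Classification loss plus both distillation terms.
    The weights are hypothetical hyperparameters."""
    return (ce_loss
            + lambda_feat * feature_distillation(old_feat, new_feat)
            + lambda_logit * logit_distillation(old_logits, new_logits))
```

When the current model reproduces the old model's features and old-class logits exactly, both distillation terms vanish and only the classification loss remains. Any drift on the old classes adds a penalty, which is the sense in which such objectives discourage forgetting.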

Top-level tag: multi-modal machine learning
Detailed tags: class-incremental learning audio-visual continual learning knowledge distillation sam-audio

Listen, Look, and Learn: Learning Without Forgetting through SAM-Audio


1️⃣ One-sentence summary

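The paper proposes a new method that draws on the audio-visual priors of the SAM-Audio model and combines a guided attention mechanism with dual-level distillation losses to address catastrophic forgetting in audio-visual class-incremental learning.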

Source: arXiv: 2606.10887