Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis

📄 Abstract

Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually adapt without catastrophic forgetting. Despite its practical importance, the continual learning capability of RS VLMs remains underexplored, and no dedicated benchmark currently exists. In this work, we present CLeaRS, a comprehensive benchmark for continual vision-language learning in remote sensing. CLeaRS comprises 10 curated subsets with over 207k image-text pairs, spanning diverse interpretation tasks, sensing modalities, and application scenarios. We further define three evaluation protocols: long-horizon, modality-incremental, and task-incremental settings, to systematically assess continual adaptation. Extensive benchmarking of diverse vision-language models reveals catastrophic forgetting across all settings. Moreover, representative continual learning methods, when adapted to RS VLMs, exhibit limited effectiveness in handling task, instruction, and modality transitions. Our findings underscore the need for developing continual learning methods tailored to RS VLMs.

1️⃣ One-Sentence Summary

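To address the difficulty remote sensing vision-language models have in continually learning new tasks and new data, this paper builds a dedicated benchmark called CLeaRS and shows experimentally that existing models and methods commonly suffer severe catastrophic forgetting during continual learning, underscoring the need for tailored solutions.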
