arXiv submission date: 2026-03-16
📄 Abstract - RS-WorldModel: A Unified Model for Remote Sensing Understanding and Future Scene Forecasting

Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separately, limiting cross-task transfer. We present RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we build RSWBench-1.1M, a 1.1 million sample dataset with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT) jointly trains understanding and forecasting; (3) verifiable reinforcement optimization (VRO) refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120$\times$ larger on most spatiotemporal change question-answering metrics. It achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).
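The three-stage schedule in the abstract can be sketched at a high level. The stage names (GAGP, SIT, VRO) come from the paper, but everything else below — the `Sample` fields, the `ToyModel` stand-in, and the function names — is a hypothetical illustration of the training order, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    image_pair: tuple   # (before, after) remote-sensing images
    metadata: dict      # geographic / acquisition metadata (GAGP condition)
    instruction: str    # language instruction (understanding or forecasting)
    target: str         # answer text or target future scene

@dataclass
class ToyModel:
    # Stand-in model that just records which stage touched it.
    log: list = field(default_factory=list)

    def fit_generative(self, pair, condition):
        self.log.append("GAGP")

    def fit_instruction(self, instruction, target):
        self.log.append("SIT")

    def generate(self, instruction):
        return "draft"

    def reinforce(self, output, reward):
        self.log.append("VRO")

def train_rs_worldmodel(model, data, reward_fn):
    """Run the three stages of the abstract's pipeline in order."""
    # Stage 1: Geo-Aware Generative Pre-training (GAGP) —
    # condition future-scene generation on geo/acquisition metadata.
    for s in data:
        model.fit_generative(s.image_pair, condition=s.metadata)
    # Stage 2: Synergistic Instruction Tuning (SIT) —
    # jointly tune change understanding and forecasting.
    for s in data:
        model.fit_instruction(s.instruction, s.target)
    # Stage 3: Verifiable Reinforcement Optimization (VRO) —
    # refine outputs against verifiable, task-specific rewards.
    for s in data:
        out = model.generate(s.instruction)
        model.reinforce(out, reward=reward_fn(out, s.target))
    return model
```

The point of the sketch is only the staging: generative pre-training first, then joint instruction tuning, then reward-based refinement on the model's own outputs.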

Top-level tags: multi-modal computer vision model training
Detailed tags: remote sensing world model spatiotemporal forecasting instruction tuning generative pre-training

RS-WorldModel: A Unified Model for Remote Sensing Understanding and Future Scene Forecasting


1️⃣ One-sentence summary

This paper proposes a unified model called RS-WorldModel that can both understand changes in remote sensing imagery and forecast future scenes. Through an innovative three-stage training method, it outperforms existing large open-source models, and even some closed-source models, on multiple tasks while using far fewer parameters.

Source: arXiv:2603.14941