RS-WorldModel: A Unified Model for Remote Sensing Understanding and Future Scene Forecasting
1️⃣ One-sentence summary
This paper proposes a unified model named RS-WorldModel that can both understand changes in remote sensing imagery and forecast future scenes. Thanks to a novel three-stage training procedure, it outperforms existing large open-source models, and even some closed-source ones, on multiple tasks despite having far fewer parameters.
Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separately, limiting cross-task transfer. We present RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we build RSWBench-1.1M, a 1.1-million-sample dataset with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) Synergistic Instruction Tuning (SIT) jointly trains understanding and forecasting; (3) Verifiable Reinforcement Optimization (VRO) refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120$\times$ larger on most spatiotemporal change question-answering metrics. It achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).
Source: arXiv: 2603.14941