
arXiv submission date: 2026-02-09
📄 Abstract - WorldCompass: Reinforcement Learning for Long-Horizon World Models

This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for long-horizon, interactive, video-based world models, enabling them to explore the world more accurately and consistently based on interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-Level Rollout Strategy: We generate and evaluate multiple samples for a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: We design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: We employ a negative-aware fine-tuning strategy coupled with various efficiency optimizations to enhance model capacity efficiently and effectively. Evaluations on the state-of-the-art open-source world model, WorldPlay, demonstrate that WorldCompass significantly improves interaction accuracy and visual fidelity across various scenarios.
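The abstract's first two innovations can be illustrated with a toy sketch: sample several candidate clips at one target position, score each with a blended interaction-following and visual-quality reward, and apply negative-aware weighting so below-baseline samples contribute a damped, sign-flipped learning signal. All function names, weights, and formulas below are illustrative assumptions, not WorldCompass's actual implementation.

```python
# Toy sketch of clip-level rollout scoring with complementary rewards and
# negative-aware weighting. Names, weights, and the baseline rule are
# hypothetical; the paper's exact formulation may differ.

def combined_reward(action_score, quality_score, w_action=0.7, w_quality=0.3):
    """Blend interaction-following and visual-quality rewards (assumed weights)."""
    return w_action * action_score + w_quality * quality_score

def negative_aware_weights(rewards, beta=0.5):
    """Weight above-baseline clips by +1 and below-baseline clips by -beta,
    so negative samples still shape the update instead of being discarded."""
    baseline = sum(rewards) / len(rewards)
    return [1.0 if r >= baseline else -beta for r in rewards]

# K = 3 candidate clips sampled at a single target position, each with an
# (interaction-following, visual-quality) score pair from the reward models.
candidate_scores = [(0.9, 0.8), (0.4, 0.6), (0.7, 0.2)]
rewards = [combined_reward(a, q) for a, q in candidate_scores]
weights = negative_aware_weights(rewards)
```

Because all candidates share the same generation prefix, scoring them at one clip position yields per-clip (rather than per-trajectory) reward signals, which is what the abstract refers to as fine-grained supervision.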

Top-level tags: reinforcement learning, video generation, model training
Detailed tags: world models, post-training, autoregressive video generation, reward shaping, interactive agents

WorldCompass: Reinforcement Learning for Long-Horizon World Models


1️⃣ One-Sentence Summary

This paper proposes a reinforcement learning framework called WorldCompass, which uses an innovative sampling strategy, reward functions, and an optimized training algorithm to significantly improve both the instruction-following accuracy and the visual quality of video-generation world models on long-horizon tasks.

Source: arXiv:2602.09022