arXiv submission date: 2026-02-26
📄 Abstract - GeoWorld: Geometric World Models

Energy-based predictive world models provide a powerful approach for multi-step visual planning by reasoning over latent energy landscapes rather than generating pixels. However, existing approaches face two major challenges: (i) their latent representations are typically learned in Euclidean space, neglecting the underlying geometric and hierarchical structure among states, and (ii) they struggle with long-horizon prediction, which leads to rapid degradation across extended rollouts. To address these challenges, we introduce GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds. We further introduce Geometric Reinforcement Learning for energy-based optimization, enabling stable multi-step planning in hyperbolic latent space. Extensive experiments on CrossTask and COIN demonstrate around 3% SR improvement in 3-step planning and 2% SR improvement in 4-step planning compared to the state-of-the-art V-JEPA 2. Project website: this https URL.
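The abstract describes mapping Euclidean latents onto a hyperbolic manifold but does not say how. As a rough illustration only, the sketch below assumes the Poincaré ball model with curvature -c and uses the standard exponential map at the origin together with the ball's geodesic distance. The function names `expmap0` and `poincare_distance`, the curvature parameter `c`, and the idea of reading the distance as an energy are assumptions made for this example, not details taken from GeoWorld.

```python
import numpy as np


def expmap0(v, c=1.0, eps=1e-9):
    """Map Euclidean vectors v onto the Poincare ball of curvature -c.

    Standard exponential map at the origin (illustrative; the paper's
    actual projection may differ).
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    x = np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)
    # Clip to stay strictly inside the ball, where the distance is finite.
    max_norm = (1.0 - 1e-5) / sqrt_c
    x_norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)
    return np.where(x_norm > max_norm, x / x_norm * max_norm, x)


def poincare_distance(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between points x and y on the Poincare ball."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, eps)
    return np.arccosh(arg) / np.sqrt(c)


# Hypothetical usage: score two candidate next-state latents against a goal
# latent, treating hyperbolic distance as the energy (lower is better).
goal = expmap0(np.array([0.3, 0.4]))
candidates = expmap0(np.array([[0.25, 0.45], [-1.0, 2.0]]))
energies = poincare_distance(candidates, goal)
best = int(np.argmin(energies))
```

Distances on the ball grow rapidly near the boundary, which is the property usually cited for embedding tree-like, hierarchical structure with low distortion.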

Top-level tags: computer vision, model training, reinforcement learning
Detailed tags: world models, hyperbolic embeddings, energy-based models, visual planning, geometric representation

GeoWorld: Geometric World Models


1️⃣ One-sentence summary

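This paper proposes GeoWorld, a geometric world model that maps state representations into hyperbolic space to better capture their intrinsic hierarchical and geometric structure, significantly improving the stability and accuracy of multi-step visual planning.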

Source: arXiv: 2602.23058