arXiv submission date: 2025-12-05
📄 Abstract - ProPhy: Progressive Physical Alignment for Dynamic World Simulation

Recent advances in video generation have shown remarkable potential for constructing world simulators. However, current models still struggle to produce physically consistent results, particularly when handling large-scale or complex dynamics. This limitation arises primarily because existing approaches respond isotropically to physical prompts and neglect the fine-grained alignment between generated content and localized physical cues. To address these challenges, we propose ProPhy, a Progressive Physical Alignment Framework that enables explicit physics-aware conditioning and anisotropic generation. ProPhy employs a two-stage Mixture-of-Physics-Experts (MoPE) mechanism for discriminative physical prior extraction, where Semantic Experts infer semantic-level physical principles from textual descriptions, and Refinement Experts capture token-level physical dynamics. This mechanism allows the model to learn fine-grained, physics-aware video representations that better reflect underlying physical laws. Furthermore, we introduce a physical alignment strategy that transfers the physical reasoning capabilities of vision-language models (VLMs) into the Refinement Experts, facilitating a more accurate representation of dynamic physical phenomena. Extensive experiments on physics-aware video generation benchmarks demonstrate that ProPhy produces more realistic, dynamic, and physically coherent results than existing state-of-the-art methods.
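The two-stage routing described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The class name `TwoStageMoPE`, the linear experts, the softmax gates, the additive way the prior is injected into each token, and all dimensions are assumptions made for illustration. The sketch only shows the general idea of one prompt-level (semantic) mixture followed by a per-token (refinement) mixture, and it leaves out the VLM-based alignment strategy entirely.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mix(weights, outputs):
    """Weighted sum of expert outputs, one weight per expert."""
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


class TwoStageMoPE:
    """Hypothetical toy model: semantic experts, then per-token refinement experts."""

    def __init__(self, dim, n_semantic, n_refine, seed=0):
        rng = random.Random(seed)
        self.sem_gate = rand_matrix(n_semantic, dim, rng)
        self.sem_experts = [rand_matrix(dim, dim, rng) for _ in range(n_semantic)]
        self.ref_gate = rand_matrix(n_refine, dim, rng)
        self.ref_experts = [rand_matrix(dim, dim, rng) for _ in range(n_refine)]

    def semantic_prior(self, text_emb):
        """Stage 1: one routing decision per prompt, from the text embedding."""
        weights = softmax(matvec(self.sem_gate, text_emb))
        outputs = [matvec(expert, text_emb) for expert in self.sem_experts]
        return mix(weights, outputs), weights

    def refine(self, tokens, prior):
        """Stage 2: a separate routing decision for every video token."""
        refined, all_weights = [], []
        for tok in tokens:
            # Assumed conditioning: add the prompt-level prior to the token.
            h = [t + p for t, p in zip(tok, prior)]
            weights = softmax(matvec(self.ref_gate, h))
            outputs = [matvec(expert, h) for expert in self.ref_experts]
            delta = mix(weights, outputs)
            refined.append([a + b for a, b in zip(h, delta)])  # residual update
            all_weights.append(weights)
        return refined, all_weights


if __name__ == "__main__":
    model = TwoStageMoPE(dim=4, n_semantic=3, n_refine=2)
    prior, sem_w = model.semantic_prior([0.2, -0.1, 0.4, 0.3])
    tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    _, ref_w = model.refine(tokens, prior)
    print("semantic routing:", sem_w)
    print("per-token routing:", ref_w)
```

Because the second gate is evaluated per token, different tokens can receive different expert mixtures under the same prompt, which is one way to read the "anisotropic" response the abstract contrasts with isotropic conditioning.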

Top-level tags: video generation, multi-modal, model training
Detailed tags: physics-aware generation, video simulation, mixture of experts, physical alignment, dynamic world modeling

ProPhy: Progressive Physical Alignment for Dynamic World Simulation


1️⃣ One-sentence summary

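This paper proposes ProPhy, a new framework that uses progressive physical alignment and a mixture-of-experts mechanism to significantly improve the physical realism and consistency of video generation models when they simulate complex dynamic scenes.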


Source: arXiv: 2512.05564