arXiv submission date: 2025-12-09
📄 Abstract - GimbalDiffusion: Gravity-Aware Camera Control for Video Generation

Recent progress in text-to-video generation has achieved remarkable realism, yet fine-grained control over camera motion and orientation remains elusive. Existing approaches typically encode camera trajectories through relative or ambiguous representations, limiting explicit geometric control. We introduce GimbalDiffusion, a framework that enables camera control grounded in physical-world coordinates, using gravity as a global reference. Instead of describing motion relative to previous frames, our method defines camera trajectories in an absolute coordinate system, allowing precise and interpretable control over camera parameters without requiring an initial reference frame. We leverage panoramic 360-degree videos to construct a wide variety of camera trajectories, well beyond the predominantly straight, forward-facing trajectories seen in conventional video data. To further enhance camera guidance, we introduce null-pitch conditioning, an annotation strategy that reduces the model's reliance on text content when it conflicts with camera specifications (e.g., generating grass while the camera points towards the sky). Finally, we establish a benchmark for camera-aware video generation by rebalancing SpatialVID-HQ for comprehensive evaluation under wide camera pitch variation. Together, these contributions advance the controllability and robustness of text-to-video models, enabling precise, gravity-aligned camera manipulation within generative frameworks.
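To make the contrast with relative trajectory encodings concrete, here is a minimal sketch of what absolute, gravity-aligned camera conditioning could look like. The Euler convention, the axis choice, and the function name `gravity_aligned_rotation` are illustrative assumptions, not the paper's actual parameterization.

```python
# Illustrative sketch only (assumed names and conventions, not the
# paper's implementation): encode each frame's camera pose as absolute
# gravity-aligned angles instead of frame-to-frame relative motion.
import numpy as np

def gravity_aligned_rotation(yaw, pitch, roll):
    """Camera-to-world rotation from absolute angles (radians).

    Gravity fixes the world up-axis (+z here), so pitch is measured
    against the horizon rather than against the previous frame.
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])  # yaw about the gravity axis
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])  # pitch against the horizon
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])  # roll about the view axis
    return Rz @ Ry @ Rx

# One absolute pose per frame; no initial reference frame is needed,
# so per-frame conditioning errors cannot accumulate along the clip.
trajectory = [gravity_aligned_rotation(yaw=0.1 * t, pitch=0.6, roll=0.0)
              for t in range(16)]  # camera tilted ~34 degrees toward the sky
```

Because every pose is expressed in a fixed world frame, a command such as pitch = 0.6 rad means the same thing in frame 1 and frame 16, which is what makes the control interpretable without a reference frame.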

Top-level tags: video generation, aigc, computer vision
Detailed tags: camera control, text-to-video, gravity alignment, panoramic video, conditional generation

GimbalDiffusion: Gravity-Aware Camera Control for Video Generation


1️⃣ One-Sentence Summary

This paper proposes a new framework named GimbalDiffusion that uses gravity as a global reference, enabling AI video generation to control camera motion and angles as precisely and intuitively as a real cinematographer, producing richer, more physically plausible dynamic footage.


Source: arXiv: 2512.09112