arXiv submission date: 2026-03-16
📄 Abstract - GeoNVS: Geometry Grounded Video Diffusion for Novel View Synthesis

Novel view synthesis requires strong 3D geometric consistency and the ability to generate visually coherent images across diverse viewpoints. While recent camera-controlled video diffusion models show promising results, they often suffer from geometric distortions and limited camera controllability. To overcome these challenges, we introduce GeoNVS, a geometry-grounded novel-view synthesizer that enhances both geometric fidelity and camera controllability through explicit 3D geometric guidance. Our key innovation is the Gaussian Splat Feature Adapter (GS-Adapter), which lifts input-view diffusion features into 3D Gaussian representations, renders geometry-constrained novel-view features, and adaptively fuses them with diffusion features to correct geometrically inconsistent representations. Unlike prior methods that inject geometry at the input level, GS-Adapter operates in feature space, avoiding view-dependent color noise that degrades structural consistency. Its plug-and-play design enables zero-shot compatibility with diverse feed-forward geometry models without additional training, and can be adapted to other video diffusion backbones. Experiments across 9 scenes and 18 settings demonstrate state-of-the-art performance, achieving 11.3% and 14.9% improvements over SEVA and CameraCtrl, with up to 2x reduction in translation error and 7x in Chamfer Distance.

Top-level tags: computer vision, multi-modal, model training
Detailed tags: novel view synthesis, 3d gaussian splatting, video diffusion, geometric consistency, feature adaptation

GeoNVS: Geometry Grounded Video Diffusion for Novel View Synthesis


1️⃣ One-sentence summary

This paper proposes GeoNVS, a method whose novel Gaussian Splat Feature Adapter lifts 2D image features into 3D geometric representations, substantially improving the ability to generate coherent and geometrically accurate images at new viewpoints from a single input view, while remaining compatible with a variety of existing models without additional training.
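The abstract describes the GS-Adapter as rendering geometry-constrained novel-view features and then adaptively fusing them with the diffusion features. The exact fusion rule is not given in this summary, so the sketch below illustrates one plausible form: a sigmoid-gated convex combination in feature space. The function name `gs_adapter_fuse` and the gating parameters `w_gate`/`b_gate` are hypothetical, not from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gs_adapter_fuse(diffusion_feat, rendered_feat, w_gate, b_gate):
    """Hypothetical adaptive fusion of geometry-rendered and diffusion features.

    diffusion_feat, rendered_feat: (N, C) features flattened over pixels.
    w_gate: (2C, C) and b_gate: (C,) -- assumed learned gating parameters.
    """
    # Predict a per-element gate from the concatenated feature pair.
    concat = np.concatenate([diffusion_feat, rendered_feat], axis=-1)
    gate = sigmoid(concat @ w_gate + b_gate)
    # Blend toward the geometry-constrained features where the gate is high,
    # correcting geometrically inconsistent diffusion features.
    return gate * rendered_feat + (1.0 - gate) * diffusion_feat
```

Because the gate lies in (0, 1), the fused feature is an element-wise interpolation between the two inputs, which matches the paper's claim of "correcting" rather than replacing the diffusion representation.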

Source: arXiv:2603.14965