ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation
1️⃣ One-Sentence Summary
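This paper proposes ConfCtrl, a method that lets video diffusion models work from only two input images and adaptively fuse camera pose guidance with geometric information from those images, so that they stably generate novel views under large viewpoint changes while filling occluded regions with sharp, plausible content.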
We address the challenge of novel view synthesis from only two input images under large viewpoint changes. Existing regression-based methods lack the capacity to reconstruct unseen regions, while camera-guided diffusion models often deviate from intended trajectories due to noisy point cloud projections or insufficient conditioning from camera poses. To address these issues, we propose ConfCtrl, a confidence-aware video interpolation framework that enables diffusion models to follow prescribed camera poses while completing unseen regions. ConfCtrl initializes the diffusion process by combining a confidence-weighted projected point cloud latent with noise as the conditioning input. It then applies a Kalman-inspired predict-update mechanism, treating the projected point cloud as a noisy measurement and using learned residual corrections to balance pose-driven predictions with noisy geometric observations. This allows the model to rely on reliable projections while down-weighting uncertain regions, yielding stable, geometry-aware generation. Experiments on multiple datasets show that ConfCtrl produces geometrically consistent and visually plausible novel views, effectively reconstructing occluded regions under large viewpoint changes.
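The predict-update idea in the abstract can be illustrated with a toy, per-element sketch. It rests on two assumptions that are not from the paper: a classical closed-form scalar Kalman gain stands in for ConfCtrl's learned residual corrections, and confidence is mapped to measurement variance by a hypothetical formula. `blend_init`, `confidence_to_variance`, and `kalman_update` are illustrative names, not the authors' code, and latents are flattened to plain Python lists for readability.

```python
from typing import List, Tuple

EPS = 1e-6  # avoids division by zero for zero-confidence entries


def blend_init(projected: List[float], confidence: List[float],
               noise: List[float]) -> List[float]:
    """Hypothetical initialization: convex mix of projected latent and noise.

    Each element keeps a fraction c of the projected point-cloud latent and
    fills the remaining (1 - c) with noise, so unreliable regions (c near 0)
    start out as pure noise.
    """
    return [c * p + (1.0 - c) * n for p, c, n in zip(projected, confidence, noise)]


def confidence_to_variance(c: float) -> float:
    """Hypothetical mapping: high confidence -> low measurement variance."""
    c = min(max(c, 0.0), 1.0)
    return (1.0 - c) / (c + EPS)


def kalman_update(pred: List[float], pred_var: List[float],
                  meas: List[float], confidence: List[float]
                  ) -> Tuple[List[float], List[float]]:
    """Scalar Kalman update applied independently to every latent element.

    pred, pred_var : pose-driven prediction and its variance
    meas           : projected point-cloud latent (the noisy measurement)
    confidence     : per-element reliability of the projection, in [0, 1]
    """
    fused, fused_var = [], []
    for x, p, z, c in zip(pred, pred_var, meas, confidence):
        r = confidence_to_variance(c)
        k = p / (p + r + EPS)            # Kalman gain, in [0, 1)
        fused.append(x + k * (z - x))    # move toward the measurement by k
        fused_var.append((1.0 - k) * p)  # posterior variance shrinks by (1 - k)
    return fused, fused_var


if __name__ == "__main__":
    conf = [1.0, 0.5, 0.0]  # reliable, uncertain, occluded
    print(blend_init([2.0, 2.0, 2.0], conf, [5.0, 5.0, 5.0]))  # [2.0, 3.5, 5.0]
    x, _ = kalman_update([0.0] * 3, [1.0] * 3, [1.0] * 3, conf)
    print([round(a, 3) for a in x])  # [1.0, 0.5, 0.0]
```

With equal prediction variance, a fully confident projection pulls the estimate onto the measurement, a confidence of 0.5 lands halfway, and a zero-confidence (for example, occluded) entry leaves the pose-driven prediction essentially unchanged. This mirrors the down-weighting of uncertain regions described in the abstract, although the real model learns this balance rather than computing it in closed form.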
Source: arXiv 2603.09819