
arXiv submission date: 2026-03-10
📄 Abstract - ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

We address the challenge of novel view synthesis from only two input images under large viewpoint changes. Existing regression-based methods lack the capacity to reconstruct unseen regions, while camera-guided diffusion models often deviate from intended trajectories due to noisy point cloud projections or insufficient conditioning from camera poses. To address these issues, we propose ConfCtrl, a confidence-aware video interpolation framework that enables diffusion models to follow prescribed camera poses while completing unseen regions. ConfCtrl initializes the diffusion process by combining a confidence-weighted projected point cloud latent with noise as the conditioning input. It then applies a Kalman-inspired predict-update mechanism, treating the projected point cloud as a noisy measurement and using learned residual corrections to balance pose-driven predictions with noisy geometric observations. This allows the model to rely on reliable projections while down-weighting uncertain regions, yielding stable, geometry-aware generation. Experiments on multiple datasets show that ConfCtrl produces geometrically consistent and visually plausible novel views, effectively reconstructing occluded regions under large viewpoint changes.
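The abstract names two mechanisms: a starting latent that blends a confidence-weighted point-cloud projection with noise, and a Kalman-inspired predict-update step that trusts the projection only where it is reliable. The abstract does not give formulas, so the sketch below is only a toy illustration under stated assumptions, not the authors' implementation. The function names `confidence_weighted_init` and `kalman_style_update` are hypothetical. The linear blend with noise, the mapping from confidence `c` to measurement variance `(1 - c) / c`, and the `residual` argument (a stand-in for the learned correction) are all assumptions. Arrays are laid out as (H, W, C) latents with an (H, W) confidence map in [0, 1].

```python
import numpy as np


def confidence_weighted_init(proj_latent, confidence, rng):
    """Blend a projected point-cloud latent with Gaussian noise.

    Assumption: each pixel keeps a fraction `confidence` of the projected
    latent and fills the rest with noise, so unreliable regions start
    closer to pure noise and are left for the diffusion model to complete.
    """
    noise = rng.standard_normal(proj_latent.shape)
    c = confidence[..., None]  # (H, W) -> (H, W, 1), broadcast over channels
    return c * proj_latent + (1.0 - c) * noise


def kalman_style_update(pred, pred_var, measurement, confidence,
                        residual=None, eps=1e-6):
    """Elementwise scalar Kalman update with the projection as the measurement.

    Assumption: measurement variance is (1 - c) / c, so it is near zero
    for confident pixels and very large for unconfident ones.
    `residual` is a hypothetical stand-in for a learned correction.
    """
    c = np.clip(confidence, eps, 1.0)[..., None]
    meas_var = (1.0 - c) / c + eps           # low confidence -> large variance
    gain = pred_var / (pred_var + meas_var)  # Kalman gain, between 0 and 1
    updated = pred + gain * (measurement - pred)  # pull prediction toward measurement
    updated_var = (1.0 - gain) * pred_var         # uncertainty shrinks after update
    if residual is not None:
        updated = updated + residual
    return updated, updated_var


# Toy usage: left half of the projection is reliable, right half is not.
rng = np.random.default_rng(0)
H, W, C = 4, 4, 3
proj = np.ones((H, W, C))        # "measured" latent from the point cloud
conf = np.zeros((H, W))
conf[:, :2] = 1.0
z0 = confidence_weighted_init(proj, conf, rng)
pred = np.zeros((H, W, C))       # pose-driven prediction (placeholder)
upd, var = kalman_style_update(pred, 1.0, proj, conf)
# Left columns follow the projection (upd ~ 1); right columns keep the prediction (upd ~ 0).
```

The scalar Kalman gain gives a smooth per-pixel trade-off. Where the projection is confident, the update follows the geometry. Where it is not, the pose-driven prediction dominates. In ConfCtrl the corrections are learned and act on video diffusion latents, which this sketch does not attempt to model.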

Top-level tags: computer vision, video generation, model training
Detailed tags: novel view synthesis, video diffusion, camera control, confidence-aware interpolation, geometry-aware generation

ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation


1️⃣ One-sentence summary

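This paper proposes ConfCtrl, a method that lets a video diffusion model, given only two input images, stably generate novel views under large viewpoint changes, with occluded regions filled in clearly and plausibly, by balancing the prescribed camera poses against confidence-weighted geometry from the projected point cloud.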

Source: arXiv 2603.09819