arXiv submission date: 2025-12-11
📄 Abstract - StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp & inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.
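
To make the depth-free idea concrete, below is a minimal, illustrative sketch of viewpoint-conditioned diffusion sampling. Everything here (`ViewCondDenoiser`, `view_emb`, the FiLM-style conditioning, the toy network sizes) is an assumption for illustration, not StereoSpace's actual architecture; the point is only the shape of the pipeline the abstract describes: the sampler consumes the left view plus a viewpoint embedding in a canonical rectified space, and no depth map or warp appears anywhere in the loop.

```python
# Hypothetical sketch of viewpoint-conditioned, depth-free stereo synthesis.
# Names and architecture are placeholders, NOT StereoSpace's real API.

import torch
import torch.nn as nn

class ViewCondDenoiser(nn.Module):
    """Stand-in denoiser: predicts noise from (noisy right view, left view,
    timestep, viewpoint embedding). A real model would be a UNet or DiT."""
    def __init__(self, channels=3, cond_dim=16):
        super().__init__()
        # FiLM-style shift from (viewpoint embedding, timestep)
        self.film = nn.Linear(cond_dim + 1, channels)
        self.net = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x_t, left, t, view_emb):
        cond = torch.cat([view_emb, t[:, None].float()], dim=1)
        shift = self.film(cond)[:, :, None, None]
        return self.net(torch.cat([x_t, left], dim=1)) + shift

@torch.no_grad()
def sample_right_view(model, left, view_emb, steps=50):
    """DDPM-style ancestral sampling; geometry enters only via view_emb."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(left)  # start from pure noise in the right-view slot
    for i in reversed(range(steps)):
        t = torch.full((left.shape[0],), i, dtype=torch.long)
        eps = model(x, left, t, view_emb)
        # standard DDPM posterior mean
        x = (x - betas[i] / torch.sqrt(1 - alpha_bar[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x

left = torch.randn(1, 3, 64, 64)    # input (left) view
view_emb = torch.randn(1, 16)       # canonical-space viewpoint/baseline embedding
model = ViewCondDenoiser()
right = sample_right_view(model, left, view_emb)
print(right.shape)  # torch.Size([1, 3, 64, 64])
```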

Top-level tags: computer vision, multi-modal, model training
Detailed tags: stereo synthesis, diffusion models, viewpoint conditioning, geometric consistency, monocular-to-stereo

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space


1️⃣ One-sentence summary

This paper proposes a new method called StereoSpace that generates high-quality stereo images directly from viewpoint guidance alone, without relying on depth information, and outperforms traditional methods in both geometric consistency and viewing comfort.
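
The two qualities named in this summary correspond to the abstract's evaluation metrics: iSQoE for perceptual comfort and MEt3R for geometric consistency. The sketch below illustrates the leakage-free protocol described in the abstract, where the generator sees only the left image and no ground-truth or estimated geometry enters at test time. The `score_isqoe` and `score_met3r` functions are dummy placeholders, not the real metric implementations.

```python
# Hypothetical evaluation harness illustrating the leakage-free protocol.
# score_isqoe / score_met3r are stand-ins for the real iSQoE and MEt3R
# metrics, which are not reproduced here.

import torch

def score_isqoe(left: torch.Tensor, right: torch.Tensor) -> float:
    """Placeholder: real iSQoE predicts stereoscopic viewing comfort."""
    return float(-(left - right).abs().mean())  # dummy stand-in value

def score_met3r(left: torch.Tensor, right: torch.Tensor) -> float:
    """Placeholder: real MEt3R measures cross-view geometric consistency."""
    return float(-(left - right).pow(2).mean())  # dummy stand-in value

@torch.no_grad()
def evaluate(generator, left_images):
    """End-to-end protocol: no ground-truth or proxy depth is used; each
    right view is synthesized from the left image alone, then scored."""
    isqoe_scores, met3r_scores = [], []
    for left in left_images:
        right = generator(left)  # depth-free stereo synthesis
        isqoe_scores.append(score_isqoe(left, right))
        met3r_scores.append(score_met3r(left, right))
    n = len(left_images)
    return sum(isqoe_scores) / n, sum(met3r_scores) / n

# Dummy generator (horizontal flip) just to exercise the harness.
gen = lambda left: torch.flip(left, [-1])
print(evaluate(gen, [torch.rand(3, 64, 64) for _ in range(4)]))
```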


Source: arXiv: 2512.10959