PRISM: Feed-Forward Single-Image 3D Reconstruction via Geometric Warp-Residual Modeling

📄 Abstract

Reconstructing 3D scenes from a single image is a fundamental challenge in computer vision, with broad applications in virtual reality, robotics, and content creation. Recent methods achieve outstanding performance by leveraging camera-controlled video diffusion models, but they rely on iterative diffusion sampling, which greatly limits their practical deployment. We observe that geometric forward warping alone can cover the majority of a target view directly from the input image, leaving only a compact residual for the encoder to correct. Motivated by this observation, we propose PRISM, a feed-forward framework that decomposes multi-view latent prediction into a parameter-free geometric prior and a learned residual correction, with no diffusion sampling required at inference. To enable generalization from purely synthetic training data, we devise a two-stage training strategy that combines latent-supervised distillation for geometric generalization with perceptual fine-tuning for appearance quality. Extensive experiments on three benchmarks demonstrate that PRISM achieves reconstruction quality competitive with diffusion-based methods while reducing inference time to only 36 seconds per scene.
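The observation about forward warping can be illustrated with a small sketch. This is not the PRISM implementation: the abstract describes warping as a prior for latent prediction, while the code below warps raw pixels for readability. It also assumes a per-pixel depth map, pinhole intrinsics `K`, and a relative pose `(R, t)` are available, none of which the abstract specifies. The function name `forward_warp` is hypothetical.

```python
import numpy as np

def forward_warp(image, depth, K, R, t):
    """Splat source pixels into a target view (illustrative sketch only).

    image: (h, w, c) array, depth: (h, w) positive depths,
    K: (3, 3) intrinsics, (R, t): assumed source-to-target camera transform.
    Returns the warped image and a boolean mask of covered target pixels.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates, shape (3, h*w).
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0).astype(np.float64)
    # Back-project to 3D in the source frame, then move into the target frame.
    pts_src = (np.linalg.inv(K) @ pix) * depth.ravel()
    pts_tgt = R @ pts_src + np.asarray(t, dtype=np.float64).reshape(3, 1)
    z = pts_tgt[2]
    # Project into the target image plane (guard against division by ~0).
    proj = K @ pts_tgt
    safe_z = np.where(z > 1e-6, z, 1.0)
    ut = np.round(proj[0] / safe_z).astype(int)
    vt = np.round(proj[1] / safe_z).astype(int)
    valid = (z > 1e-6) & (ut >= 0) & (ut < w) & (vt >= 0) & (vt < h)
    # Painter's-algorithm z-buffer: draw far points first so near ones overwrite.
    idx = np.flatnonzero(valid)
    idx = idx[np.argsort(-z[idx])]
    colors = image.reshape(h * w, -1)
    warped = np.zeros_like(image)
    mask = np.zeros((h, w), dtype=bool)
    for i in idx:
        warped[vt[i], ut[i]] = colors[i]
        mask[vt[i], ut[i]] = True
    return warped, mask
```

In this sketch, `mask.mean()` gives the fraction of the target view covered by warping alone, and the pixels where `mask` is `False` are the region a learned residual module would have to fill in.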


1️⃣ One-Sentence Summary

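This paper proposes PRISM, a feed-forward 3D reconstruction method that requires no iterative diffusion sampling: by decomposing multi-view prediction into a simple geometric warping prior and a lightweight residual correction, it cuts inference time to 36 seconds per scene while maintaining reconstruction quality, striking a good balance between speed and quality.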
