arXiv submission date: 2026-04-22
📄 Abstract - GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers

Relighting a person from a single photo is an attractive but ill-posed task, as a 2D image ambiguously entangles 3D geometry, intrinsic appearance, and illumination. Current methods either use sequential pipelines that suffer from error accumulation, or they do not explicitly leverage 3D geometry during relighting, which limits physical consistency. Since relighting and estimation of 3D geometry are mutually beneficial tasks, we propose a unified Multi-Modal Diffusion Transformer (DiT) that jointly solves for both: GeoRelight. We make this possible through two key technical contributions: isotropic NDC-Orthographic Depth (iNOD), a distortion-free 3D representation compatible with latent diffusion models; and a strategic mixed-data training method that combines synthetic and auto-labeled real data. By solving geometry and relighting jointly, GeoRelight achieves better performance than both sequential models and previous systems that ignored geometry.
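The abstract describes iNOD only at a high level: a distortion-free, isotropic depth encoding that fits the value range latent diffusion models expect. As a rough illustration only, the toy sketch below (the function name, the masking convention, and the normalization scheme are all assumptions, not the paper's actual formulation) maps valid orthographic depth values into [-1, 1] using a single shared scale, so no axis is stretched relative to another:

```python
import numpy as np

def inod_normalize(depth: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Toy sketch of an isotropic, NDC-style orthographic depth encoding.

    Hypothetical illustration (not the paper's formula): center the valid
    depths and divide by one shared half-range, so the result lies in
    [-1, 1] -- the range a latent-diffusion VAE typically consumes --
    without any axis-dependent (anisotropic) distortion.
    """
    d = depth[mask]
    center = (d.max() + d.min()) / 2.0
    half_range = max((d.max() - d.min()) / 2.0, 1e-8)  # guard flat depth
    ndc = np.zeros_like(depth)
    ndc[mask] = (depth[mask] - center) / half_range
    return ndc
```

Because the projection is orthographic and the scale is shared, nearby and distant regions are encoded with the same resolution, which is one plausible reading of "distortion-free" in the abstract.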

Top-level tags: computer vision, multi-modal, machine learning
Detailed tags: relighting, 3d reconstruction, diffusion transformer, single image, geometry

GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers


1️⃣ One-sentence summary

This paper proposes GeoRelight, a unified multi-modal diffusion model that reconstructs a person's 3D geometry and relights them from a single photo at the same time; by jointly solving these two mutually beneficial tasks, it avoids the error accumulation and lighting inconsistency of conventional sequential pipelines.

Source: arXiv: 2604.20715