FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation
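1️⃣ One-sentence summary
This paper proposes FLUX3D, which improves feature selection in sparse voxel representations and the cross-modal alignment mechanism to address the loss of high-frequency detail in existing image-to-3D-Gaussian techniques, thereby generating more realistic 3D scenes.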
Sparse voxel representation has emerged as a scalable foundation for image-to-3D Gaussian Splatting (3DGS) generation, yet current methods struggle to preserve the high-frequency visual details of input images because of two structural bottlenecks. First, they build sparse voxel latents from discriminative 2D features optimized for semantic abstraction; these features suppress reconstructive cues and induce a representation bottleneck. Second, in the generation stage, standard diffusion transformers lack effective mechanisms to align dense 2D image tokens with sparse 3D voxel latents, resulting in a cross-modal correspondence bottleneck. To address these issues, we propose FLUX3D, a scalable image-to-3DGS framework that improves both representation learning and cross-modal alignment during generation. We first revisit 2D feature selection for sparse-voxel-based 3D representation learning and propose Diffusion-Aligned Structured Latents (DA-SLAT), which we couple with a decoder-only architecture to improve 3DGS reconstruction fidelity. We also design a sparse-structure-aware diffusion framework that integrates the Sparse-structure Multimodal Diffusion Transformer (SMDiT) and Modal-Aware Rotary Positional Embedding (MARoPE) to achieve geometry-agnostic 2D-3D alignment. Extensive benchmark experiments demonstrate that FLUX3D yields substantial improvements in appearance fidelity and significantly outperforms all state-of-the-art (SOTA) methods in generating high-quality 3DGS assets.
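The abstract does not describe how MARoPE is constructed, so the following is only a minimal sketch of the generic rotary positional embedding idea that its name refers to, not the paper's method. It assumes, purely for illustration, that image tokens carry (row, col, 0) coordinates and voxel tokens carry (x, y, z) coordinates, and that the feature vector is split into one chunk per axis. The function names rope_1d and rope_nd are hypothetical.

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x (n, d) by angles pos * freq."""
    n, d = x.shape
    assert d % 2 == 0, "feature dimension must be even"
    freqs = base ** (-np.arange(d // 2) / (d // 2))  # (d/2,) one frequency per pair
    ang = pos[:, None] * freqs[None, :]              # (n, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def rope_nd(x, coords, base=10000.0):
    """Split features into one chunk per coordinate axis and rotate each chunk."""
    n, d = x.shape
    k = coords.shape[1]
    assert d % (2 * k) == 0, "d must be divisible by 2 * number of axes"
    chunk = d // k
    parts = [
        rope_1d(x[:, a * chunk:(a + 1) * chunk], coords[:, a].astype(x.dtype), base)
        for a in range(k)
    ]
    return np.concatenate(parts, axis=1)

rng = np.random.default_rng(0)
d = 12
# Assumption: 2x2 image tokens embedded with a zero third coordinate.
img_coords = np.array([[r, c, 0] for r in range(2) for c in range(2)])
# Assumption: three active voxels with integer grid coordinates.
vox_coords = np.array([[0, 1, 2], [3, 0, 1], [5, 5, 5]])

q = rng.standard_normal((len(vox_coords), d))  # voxel queries
k = rng.standard_normal((len(img_coords), d))  # image keys

# Attention logits between voxel queries and image keys.
scores = rope_nd(q, vox_coords) @ rope_nd(k, img_coords).T

# Shifting every coordinate by the same offset leaves the logits unchanged.
shift = np.array([7, -3, 2])
scores_shifted = rope_nd(q, vox_coords + shift) @ rope_nd(k, img_coords + shift).T
```

Because each feature pair is rotated by an angle proportional to its coordinate, the query-key dot product depends only on coordinate differences, which is why scores and scores_shifted agree up to floating-point error. How FLUX3D actually assigns positions to the two modalities is not stated in the abstract.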
Source: arXiv: 2606.24874