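Distribution Matching Variational AutoEncoder
1️⃣ One-sentence summary
This paper proposes a new variational autoencoder that explicitly matches the distribution of the encoder's output features to an arbitrary specified target distribution. Using it, the authors find that feature distributions obtained from self-supervised learning strike a better balance between image reconstruction quality and generative modeling efficiency, which significantly improves image generation results.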
Most visual generative models compress images into a latent space before applying diffusion or autoregressive modeling. Yet existing approaches, such as VAEs and foundation-model-aligned encoders, constrain the latent space only implicitly, without explicitly shaping its distribution, which leaves it unclear which types of distributions are best suited to modeling. We introduce Distribution-Matching VAE (DMVAE), which explicitly aligns the encoder's latent distribution with an arbitrary reference distribution via a distribution matching constraint. This generalizes beyond the Gaussian prior of conventional VAEs, enabling alignment with distributions derived from self-supervised features, diffusion noise, or other prior distributions. With DMVAE, we can systematically investigate which latent distributions are more conducive to modeling, and we find that SSL-derived distributions provide an excellent balance between reconstruction fidelity and modeling efficiency, reaching a gFID of 3.2 on ImageNet with only 64 training epochs. Our results suggest that choosing a suitable latent distribution structure (achieved via distribution-level alignment), rather than relying on fixed priors, is key to bridging the gap between easy-to-model latents and high-fidelity image synthesis. Code is available at this https URL.
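To make the idea of a distribution-level constraint concrete, here is a minimal sketch that scores how far a batch of latents is from a batch of reference samples. It uses a kernel maximum mean discrepancy (MMD) as a stand-in; the abstract does not specify the form of DMVAE's distribution matching constraint, so MMD, the RBF kernel, the bandwidth, and the names `rbf_kernel` and `mmd2` are all assumptions made for illustration, not the paper's method.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=2.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2(latents, reference, bandwidth=2.0):
    """Biased estimate of squared MMD between two sample sets.

    Illustrative stand-in for a distribution matching penalty
    (an assumption, not the constraint used in the paper).
    """
    k_zz = rbf_kernel(latents, latents, bandwidth).mean()
    k_rr = rbf_kernel(reference, reference, bandwidth).mean()
    k_zr = rbf_kernel(latents, reference, bandwidth).mean()
    return k_zz + k_rr - 2.0 * k_zr


rng = np.random.default_rng(0)
# Hypothetical reference samples, e.g. drawn from a chosen target distribution.
reference = rng.normal(size=(256, 8))
# Latents that follow the reference distribution vs. latents that are shifted away.
matched = rng.normal(size=(256, 8))
shifted = rng.normal(loc=2.0, size=(256, 8))

mmd_matched = mmd2(matched, reference)
mmd_shifted = mmd2(shifted, reference)
print(f"matched: {mmd_matched:.4f}  shifted: {mmd_shifted:.4f}")
```

In a training loop, a penalty of this kind would be added to the reconstruction loss so that the encoder is pushed toward the reference distribution as a whole, rather than toward a fixed per-sample Gaussian prior.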
Source: arXiv: 2512.07778