Unified Latents (UL): How to train your latents
1️⃣ One-sentence summary
This paper introduces a new framework called Unified Latents, which combines a diffusion model's prior with a diffusion decoder to learn compressed representations of image and video data at lower compute cost, achieving leading reconstruction quality and generation results on several benchmarks.
We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves a competitive FID of 1.4 with high reconstruction quality (PSNR), while requiring fewer training FLOPs than models trained on Stable Diffusion latents. On Kinetics-600, we set a new state-of-the-art FVD of 1.3.
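The core idea — tying the encoder's output noise to the prior's minimum noise level so that a prior term upper-bounds the latent bitrate — can be illustrated with a toy sketch. This is not the paper's implementation: the linear encoder/decoder, the `SIGMA_MIN` constant, the standard-normal prior, and the `prior_weight` are all illustrative assumptions standing in for the diffusion prior and diffusion decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical minimum noise level of the prior; the encoder's output
# noise is fixed to this value, linking the two as the paper describes.
SIGMA_MIN = 0.1

def encode(x, W_enc):
    """Toy linear encoder whose output noise std equals SIGMA_MIN."""
    mu = x @ W_enc
    z = mu + SIGMA_MIN * rng.standard_normal(mu.shape)
    return mu, z

def decode(z, W_dec):
    """Toy linear decoder standing in for the diffusion decoder."""
    return z @ W_dec

def ul_style_loss(x, W_enc, W_dec, prior_weight=0.5):
    """Sketch of a UL-style objective: a reconstruction term plus a
    prior-regularization term on the latent mean. With the encoder
    noise fixed at SIGMA_MIN, the KL to a standard-normal prior
    reduces (up to constants) to a penalty on ||mu||^2, which acts
    as a bitrate-like upper bound on the latent."""
    mu, z = encode(x, W_enc)
    recon = np.mean((decode(z, W_dec) - x) ** 2)
    rate = prior_weight * np.mean(mu ** 2)
    return recon + rate
```

Here the rate term plays the role of the diffusion-prior regularizer: shrinking the latent mean toward the prior lowers the bound on the latent bitrate, while the reconstruction term pulls the other way, exactly the trade-off the UL objective balances.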
Source: arXiv:2602.17270