
arXiv submission date: 2026-02-19
📄 Abstract - Unified Latents (UL): How to train your latents

We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves competitive FID of 1.4, with high reconstruction quality (PSNR) while requiring fewer training FLOPs than models trained on Stable Diffusion latents. On Kinetics-600, we set a new state-of-the-art FVD of 1.3.
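The key mechanism in the abstract is tying the noise added to the encoder's output to the diffusion prior's minimum noise level, which yields a tractable upper bound on the latent bitrate. A minimal numerical sketch of that idea is below; the linear encoder, the standard-normal stand-in for the diffusion prior, and all function names are illustrative assumptions, not the paper's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W, sigma_min):
    """Toy linear encoder whose output noise std is tied to the prior's
    minimum noise level sigma_min (the linking idea from the abstract;
    the linear map W is a hypothetical stand-in for a real encoder)."""
    z_mean = x @ W
    z = z_mean + sigma_min * rng.standard_normal(z_mean.shape)
    return z_mean, z

def rate_upper_bound(z_mean, sigma_min):
    """KL(N(z_mean, sigma_min^2 I) || N(0, I)) in nats: an upper bound on
    the latent bitrate under a standard-normal prior (used here as a
    simplified proxy for the paper's diffusion prior)."""
    var = sigma_min ** 2
    return 0.5 * np.sum(z_mean ** 2 + var - 1.0 - np.log(var))

x = rng.standard_normal((2, 8))          # a batch of toy inputs
W = rng.standard_normal((8, 4))          # hypothetical encoder weights
z_mean, z = encode(x, W, sigma_min=0.5)  # noisy latents at the prior's floor
rate = rate_upper_bound(z_mean, sigma_min=0.5)
```

Note the trade-off this exposes: raising `sigma_min` adds more noise to the latents (hurting reconstruction) but shrinks the KL term, i.e. the bitrate bound, which is exactly the knob the framework couples to the prior.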

Top-level tags: model training, computer vision, multi-modal
Detailed tags: latent representation, diffusion models, image generation, video generation, training efficiency

Unified Latents (UL): How to train your latents


1️⃣ One-sentence summary

This paper proposes a new framework called "Unified Latents" that couples a diffusion prior with a diffusion decoder to learn compressed representations of image and video data with less compute, achieving leading reconstruction quality and generation results on multiple benchmarks.

Source: arXiv:2602.17270