
arXiv submission date: 2026-02-19
📄 Abstract - Unified Latents (UL): How to train your latents

We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves competitive FID of 1.4, with high reconstruction quality (PSNR) while requiring fewer training FLOPs than models trained on Stable Diffusion latents. On Kinetics-600, we set a new state-of-the-art FVD of 1.3.
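The key mechanism in the abstract is tying the noise added to the encoder's output to the diffusion prior's minimum noise level, which yields a tractable upper bound on the latent bitrate. A minimal numerical sketch of that idea is below; the linear encoder, the standard-normal stand-in for the diffusion prior, and all function names are illustrative assumptions, not the paper's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W, sigma_min):
    """Toy linear encoder whose output noise std is tied to the prior's
    minimum noise level sigma_min (the linking idea from the abstract;
    the linear map W is a hypothetical stand-in for a real encoder)."""
    z_mean = x @ W
    z = z_mean + sigma_min * rng.standard_normal(z_mean.shape)
    return z_mean, z

def rate_upper_bound(z_mean, sigma_min):
    """KL(N(z_mean, sigma_min^2 I) || N(0, I)) in nats: an upper bound on
    the latent bitrate under a standard-normal prior (used here as a
    simplified proxy for the paper's diffusion prior)."""
    var = sigma_min ** 2
    return 0.5 * np.sum(z_mean ** 2 + var - 1.0 - np.log(var))

x = rng.standard_normal((2, 8))          # a batch of toy inputs
W = rng.standard_normal((8, 4))          # hypothetical encoder weights
z_mean, z = encode(x, W, sigma_min=0.5)  # noisy latents at the prior's floor
rate = rate_upper_bound(z_mean, sigma_min=0.5)
```

Note the trade-off this exposes: raising `sigma_min` adds more noise to the latents (hurting reconstruction) but shrinks the KL term, i.e. the bitrate bound, which is exactly the knob the framework couples to the prior.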

Top-level tags: model training, computer vision, multi-modal
Detailed tags: latent representation, diffusion models, image generation, video generation, training efficiency

Unified Latents (UL): How to train your latents


1️⃣ One-sentence summary

This paper proposes a new framework called "Unified Latents" that couples a diffusion prior with a diffusion decoder to learn compressed representations of image and video data with less compute, achieving leading reconstruction quality and generation results on multiple benchmarks.

Source: arXiv:2602.17270