arXiv submission date: 2026-01-23
📄 Abstract - iFSQ: Improving FSQ for Image Generation with 1 Line of Code

The field of image generation is currently bifurcated into autoregressive (AR) models operating on discrete tokens and diffusion models utilizing continuous latents. This divide, rooted in the distinction between VQ-VAEs and VAEs, hinders unified modeling and fair benchmarking. Finite Scalar Quantization (FSQ) offers a theoretical bridge, yet vanilla FSQ suffers from a critical flaw: its equal-interval quantization can cause activation collapse. This mismatch forces a trade-off between reconstruction fidelity and information efficiency. In this work, we resolve this dilemma by simply replacing the activation function in original FSQ with a distribution-matching mapping to enforce a uniform prior. Termed iFSQ, this simple strategy requires just one line of code yet mathematically guarantees both optimal bin utilization and reconstruction precision. Leveraging iFSQ as a controlled benchmark, we uncover two key insights: (1) The optimal equilibrium between discrete and continuous representations lies at approximately 4 bits per dimension. (2) Under identical reconstruction constraints, AR models exhibit rapid initial convergence, whereas diffusion models achieve a superior performance ceiling, suggesting that strict sequential ordering may limit the upper bounds of generation quality. Finally, we extend our analysis by adapting Representation Alignment (REPA) to AR models, yielding LlamaGen-REPA. Code is available at this https URL
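
The abstract names the fix but not the exact mapping, so here is a minimal PyTorch sketch of the idea. The `quantize`, `fsq`, and `ifsq` helpers are our own naming; the specific choice of the rescaled Gaussian CDF, `erf(z / sqrt(2))`, as the distribution-matching map assumes roughly standard-normal encoder activations and is our reading of "enforce a uniform prior", not the paper's confirmed implementation.

```python
import torch

def round_ste(x: torch.Tensor) -> torch.Tensor:
    """Rounding with a straight-through gradient estimator."""
    return x + (x.round() - x).detach()

def quantize(bounded: torch.Tensor, levels: int) -> torch.Tensor:
    """Round a (-1, 1) value to `levels` equal-width codes, FSQ-style."""
    half = (levels - 1) / 2
    z = bounded * half
    if levels % 2 == 0:
        # Even level counts need a half-bin offset so rounding
        # actually produces `levels` distinct codes.
        z = round_ste(z - 0.5) + 0.5
    else:
        z = round_ste(z)
    return z / half

def fsq(z: torch.Tensor, levels: int = 16) -> torch.Tensor:
    """Vanilla FSQ: bound with tanh, then quantize. For roughly Gaussian z,
    tanh leaves a bell-shaped density on (-1, 1), so the outer bins are
    rarely used -- the activation-collapse issue the abstract describes."""
    return quantize(torch.tanh(z), levels)

def ifsq(z: torch.Tensor, levels: int = 16) -> torch.Tensor:
    """iFSQ sketch: the one-line change swaps tanh for a distribution-
    matching map. ASSUMPTION: if z ~ N(0, 1), then erf(z / sqrt(2)),
    which equals 2*Phi(z) - 1 for the standard normal CDF Phi, is
    uniform on (-1, 1), so every bin is hit with equal probability.
    The paper only states that the mapping enforces a uniform prior;
    this concrete choice is a guess."""
    return quantize(torch.erf(z / 2 ** 0.5), levels)
```

With `levels=16` (4 bits per dimension, the equilibrium point the abstract reports), the Gaussian assumption makes every bin equally likely under `ifsq`, whereas `fsq` concentrates codes near zero.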

Top tags: model training, computer vision, AIGC
Detailed tags: image generation, quantization, representation learning, benchmark, VAE

iFSQ: Improving FSQ for Image Generation with 1 Line of Code


1️⃣ One-Sentence Summary

By replacing the activation function in the original FSQ with a distribution-matching mapping, this paper resolves the trade-off between discrete and continuous representations in image generation with just one line of code; it finds that roughly 4 bits per dimension is the optimal balance point between the two, and shows that autoregressive models converge faster while diffusion models reach a higher performance ceiling.

Source: arXiv:2601.17124