arXiv submission date: 2026-03-26
📄 Abstract - Semantic-Aware Prefix Learning for Token-Efficient Image Generation

Visual tokenizers play a central role in latent image generation by bridging high-dimensional images and tractable generative modeling. However, most existing tokenizers are still trained with reconstruction-dominated objectives, which often yield latent representations that are only weakly grounded in high-level semantics. Recent approaches improve semantic alignment, but typically treat semantic signals as auxiliary regularization rather than making them functionally necessary for representation learning. We propose SMAP, a SeMantic-Aware Prefix tokenizer that injects class-level semantic conditions into a query-based 1D tokenization framework. To make semantics indispensable during training, SMAP introduces a tail token dropping strategy, which forces semantic conditions and early latent prefixes to bear increasing responsibility under progressively reduced token budgets. To verify that the resulting latent space is useful for generation rather than reconstruction alone, we further introduce CARD, a hybrid Causal AutoRegressive--Diffusion generator. Extensive experiments on ImageNet show that SMAP consistently improves reconstruction quality across discrete and continuous tokenization settings, and that its semantically grounded latent space yields strong downstream generation performance under compact token budgets.

Top-level tags: computer vision, model training, multi-modal
Detailed tags: image generation, tokenization, semantic representation, latent space, prefix learning

Semantic-Aware Prefix Learning for Token-Efficient Image Generation


1️⃣ One-sentence summary

This paper proposes SMAP, a new visual tokenization method that injects class-level semantic information into image representation learning and pairs it with a novel tail token dropping strategy. The resulting latent space not only reconstructs images with high quality but also carries richer semantics, enabling high-quality image generation even under small token budgets.
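The tail token dropping idea described above can be sketched in a few lines: during training, a token budget is sampled and the tail of the 1D latent sequence is discarded, so the class-level semantic condition and the leading prefix tokens must carry more of the representational load. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names `tail_token_drop` and `build_decoder_input` and the `min_keep` parameter are hypothetical.

```python
import random


def tail_token_drop(tokens, min_keep=4, seed=None):
    """Truncate a 1D latent token sequence to a random-length prefix.

    Sketch of the tail-token-dropping strategy: sampling smaller
    budgets forces the early (prefix) tokens to bear increasing
    responsibility for representing the image.
    """
    rng = random.Random(seed)
    budget = rng.randint(min_keep, len(tokens))  # sampled token budget
    return tokens[:budget]  # drop the tail, keep only the prefix


def build_decoder_input(class_token, latent_tokens, min_keep=4, seed=None):
    """Prepend the class-level semantic condition to the kept prefix.

    With the tail dropped, the decoder must lean on the semantic
    condition when the remaining latent prefix is short, making the
    semantic signal functionally necessary rather than auxiliary.
    """
    prefix = tail_token_drop(latent_tokens, min_keep=min_keep, seed=seed)
    return [class_token] + prefix
```

A usage sketch: with 32 latent tokens and `min_keep=4`, each training step sees between 4 and 32 prefix tokens, always preceded by the class condition token.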

Source: arXiv:2603.25249