CPC-VAR: Continual Personalized and Compositional Generation in Visual Autoregressive Models
1️⃣ One-sentence summary
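This paper addresses two problems that visual autoregressive models face when user demands keep changing: forgetting previously learned concepts and difficulty composing multiple concepts. It proposes a new framework that prevents forgetting by identifying and adjusting only the key neurons, and that uses spatially guided branch models with cross-attention fusion to produce controllable, cleanly separated multi-concept images.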
Visual autoregressive (VAR) models have recently emerged as an efficient paradigm for text-to-image generation. Despite their strong generative capability, existing VAR-based personalization methods remain limited to static settings, failing to accommodate evolving user demands. In particular, sequential concept learning leads to severe catastrophic forgetting, while multi-concept synthesis often suffers from feature entanglement and attribute inconsistency. In this work, we present the first systematic study of continual personalized generation in VAR models. We identify two key challenges: (i) preserving previously learned concepts during sequential customization, and (ii) composing multiple personalized concepts in a controllable manner. To address these issues, we propose a unified framework with two core components. For continual single-concept learning, we introduce Gradient-based Concept Neuron Selection (GCNS), which identifies concept-relevant neurons and constrains only conflicting parameters across tasks, effectively mitigating forgetting without additional model expansion. For multi-concept synthesis, we propose a context-aware composition strategy that performs multi-branch feature modeling and localized cross-attention fusion guided by spatial conditions, enabling precise and disentangled concept composition. Extensive experiments demonstrate that our method significantly improves performance in long-sequence continual personalization while achieving superior results in multi-concept image synthesis compared to existing baselines. These findings highlight the potential of VAR models for scalable and controllable personalized generation.
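The abstract describes GCNS only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that "concept-relevant" neurons are the ones with the largest absolute gradient scores, and that "constraining conflicting parameters" means damping the update on neurons that were also selected for an earlier concept. The function names, the top-k rule, and the `damping` parameter are all hypothetical.

```python
from typing import List, Set


def select_concept_neurons(grad_scores: List[float], top_k: int) -> Set[int]:
    """Pick the top_k neurons by absolute gradient score (assumed selection rule)."""
    ranked = sorted(
        range(len(grad_scores)),
        key=lambda i: abs(grad_scores[i]),
        reverse=True,
    )
    return set(ranked[:top_k])


def constrain_conflicts(
    grads: List[float],
    current: Set[int],
    previous: Set[int],
    damping: float = 0.0,
) -> List[float]:
    """Damp gradients only on neurons claimed by both the current and earlier concepts.

    All other neurons keep their gradient unchanged, so no extra parameters
    are added and only the overlapping neurons are restricted.
    """
    conflicts = current & previous
    return [g * damping if i in conflicts else g for i, g in enumerate(grads)]


# Toy example with four neurons and two sequential concepts.
task1_scores = [0.9, 0.1, -0.8, 0.05]
task2_grads = [0.7, -0.6, 0.1, 0.2]

task1_neurons = select_concept_neurons(task1_scores, top_k=2)
task2_neurons = select_concept_neurons(task2_grads, top_k=2)
task2_update = constrain_conflicts(task2_grads, task2_neurons, task1_neurons)
```

In this toy run, neuron 0 is selected for both concepts, so its update for the second concept is zeroed while the remaining gradients pass through untouched. A real system would compute the scores from model gradients and would likely use a softer constraint than hard zeroing.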
Source: arXiv: 2605.19750