arXiv submission date: 2026-02-16
📄 Abstract - Efficient Text-Guided Convolutional Adapter for the Diffusion Model

We introduce Nexus Adapters, novel text-guided efficient adapters for diffusion-based Structure-Preserving Conditional Generation (SPCG). Recently, structure-preserving methods have achieved promising results in conditional image generation by using a base model for prompt conditioning and an adapter for structural input, such as sketches or depth maps. However, these approaches are highly inefficient, sometimes requiring as many parameters in the adapter as in the base architecture. Training such models is not always feasible, since the diffusion model is itself costly to train and doubling the parameter count is highly inefficient. Moreover, in these approaches the adapter is unaware of the input prompt, so it is optimized only for the structural input and not for the prompt. To overcome these challenges, we propose two efficient adapters, Nexus Prime and Nexus Slim, which are guided by both prompts and structural inputs. Each Nexus Block incorporates cross-attention mechanisms to enable rich multimodal conditioning, giving the proposed adapter a better understanding of the input prompt while preserving the structure. Extensive experiments demonstrate that the Nexus Prime adapter significantly enhances performance while requiring only 8M additional parameters compared to the baseline, T2I-Adapter. Furthermore, we also introduce a lightweight Nexus Slim adapter with 18M fewer parameters than the T2I-Adapter, which still achieves state-of-the-art results. Code: this https URL
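To make the cross-attention conditioning concrete, here is a minimal PyTorch sketch of what a "Nexus Block" could look like: a convolutional adapter block over structural features (e.g., a sketch or depth map) that injects prompt information via cross-attention. All names, dimensions, and wiring below are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a prompt-aware convolutional adapter block.
# Assumed, not from the paper: class name, dimensions, and residual wiring.
import torch
import torch.nn as nn


class NexusBlock(nn.Module):
    def __init__(self, channels: int, text_dim: int, num_heads: int = 4):
        super().__init__()
        # Convolutional path processes the structural input features.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Cross-attention: spatial features (queries) attend to prompt
        # embeddings (keys/values), making the adapter prompt-aware.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(
            embed_dim=channels, num_heads=num_heads,
            kdim=text_dim, vdim=text_dim, batch_first=True,
        )

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) structural features; text_emb: (B, L, text_dim)
        h = self.conv(x)
        b, c, hh, ww = h.shape
        tokens = h.flatten(2).transpose(1, 2)              # (B, H*W, C)
        attended, _ = self.attn(self.norm(tokens), text_emb, text_emb)
        tokens = tokens + attended                         # residual injection
        return tokens.transpose(1, 2).reshape(b, c, hh, ww) + x


# Usage: fuse prompt embeddings into structure features before adding them
# to the diffusion U-Net's intermediate activations (T2I-Adapter style).
block = NexusBlock(channels=320, text_dim=768)
feat = torch.randn(1, 320, 64, 64)   # structure-branch features
prompt = torch.randn(1, 77, 768)     # e.g., CLIP text embeddings
out = block(feat, prompt)            # (1, 320, 64, 64)
```

Because the block only adds one attention layer per stage on top of small convolutions, the adapter can stay lightweight relative to the base diffusion model, which is consistent with the parameter budgets the abstract reports.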

Top-level tags: computer vision, model training, aigc
Detailed tags: diffusion models, conditional generation, efficient adaptation, multimodal conditioning, parameter efficiency

Efficient Text-Guided Convolutional Adapter for the Diffusion Model


1️⃣ One-Sentence Summary

This paper proposes two efficient adapters, named Nexus, that jointly understand text prompts and structural inputs (such as sketches), preserving image structure while substantially reducing parameter count and improving the efficiency of conditional image generation with diffusion models.

Source: arXiv: 2602.14514