arXiv submission date: 2025-12-17
📄 Abstract - Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Recent visual generative models often struggle with consistency during image editing due to the entangled nature of raster images, where all visual content is fused into a single canvas. In contrast, professional design tools employ layered representations, allowing isolated edits while preserving consistency. Motivated by this, we propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers, enabling inherent editability, where each RGBA layer can be independently manipulated without affecting other content. To support variable-length decomposition, we introduce three key components: (1) an RGBA-VAE to unify the latent representations of RGB and RGBA images; (2) a VLD-MMDiT (Variable Layers Decomposition MMDiT) architecture capable of decomposing a variable number of image layers; and (3) a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer. Furthermore, to address the scarcity of high-quality multilayer training images, we build a pipeline to extract and annotate multilayer images from Photoshop documents (PSD). Experiments demonstrate that our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing. Our code and models are released at this https URL.
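The abstract mentions a pipeline for extracting multilayer training images from PSD files. As a hedged illustration only, not the authors' actual pipeline, the sketch below pulls visible pixel layers out of a PSD as RGBA images using the open-source psd-tools library; the file path and the filtering rules are placeholder assumptions.

```python
# Minimal sketch of extracting RGBA layers from a PSD file, in the
# spirit of the data pipeline the abstract describes. Uses the
# open-source psd-tools library (pip install psd-tools). This is NOT
# the authors' pipeline; "design.psd" and the filters are placeholders.
from psd_tools import PSDImage

psd = PSDImage.open("design.psd")  # placeholder path

layers = []
for layer in psd.descendants():
    # Skip hidden layers and group containers; keep only pixel layers.
    if not layer.visible or layer.is_group():
        continue
    rgba = layer.composite()  # render this layer alone as a PIL image
    if rgba is not None:
        layers.append((layer.name, rgba.convert("RGBA")))

for name, rgba in layers:
    print(name, rgba.size, rgba.mode)
```

A real pipeline would additionally need annotation and quality filtering (e.g., dropping near-empty or fully occluded layers), which the abstract alludes to but does not specify.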

Top-level tags: computer vision, model training, AIGC
Detailed tags: image decomposition, layered representation, diffusion model, image editing, rgba-vae

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition


1️⃣ One-Sentence Summary

This paper proposes Qwen-Image-Layered, a diffusion model that automatically decomposes an ordinary image into multiple independent transparent layers, letting users edit a single part of an image without affecting the rest, much as they would in professional design software.
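To make "independent transparent layers" concrete: a layered image is flattened with standard "over" alpha compositing, so editing one RGBA layer and recompositing cannot disturb pixels owned by other layers. The sketch below demonstrates this with synthetic layers in Pillow; it illustrates the layered representation itself, not the Qwen-Image-Layered model.

```python
# Minimal sketch: recompose a stack of RGBA layers into one RGB image
# with standard "over" alpha compositing. The layer contents are
# synthetic placeholders; the point is that editing one layer leaves
# all other layers' pixels untouched.
from PIL import Image, ImageDraw

W, H = 256, 256

# Bottom-to-top layer stack (placeholder content).
background = Image.new("RGBA", (W, H), (240, 240, 240, 255))

circle = Image.new("RGBA", (W, H), (0, 0, 0, 0))
ImageDraw.Draw(circle).ellipse((40, 40, 160, 160), fill=(200, 40, 40, 255))

square = Image.new("RGBA", (W, H), (0, 0, 0, 0))
ImageDraw.Draw(square).rectangle((100, 100, 220, 220), fill=(40, 80, 200, 180))

def compose(layers):
    """Flatten RGBA layers bottom-to-top with the 'over' operator."""
    canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
    for layer in layers:
        canvas = Image.alpha_composite(canvas, layer)
    return canvas.convert("RGB")

original = compose([background, circle, square])

# Edit one layer (shift the square left); the circle and background
# are reused unchanged, so they stay pixel-identical in the result.
moved_square = Image.new("RGBA", (W, H), (0, 0, 0, 0))
moved_square.paste(square, (-40, 0), square)
edited = compose([background, circle, moved_square])
```

This is exactly the consistency property the paper targets: once an image is decomposed into layers, per-layer edits are isolated by construction rather than by a model's best effort.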


Source: arXiv:2512.15603