arXiv submission date: 2025-12-17
📄 Abstract - Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Recent visual generative models often struggle with consistency during image editing due to the entangled nature of raster images, where all visual content is fused into a single canvas. In contrast, professional design tools employ layered representations, allowing isolated edits while preserving consistency. Motivated by this, we propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers, enabling inherent editability, where each RGBA layer can be independently manipulated without affecting other content. To support variable-length decomposition, we introduce three key components: (1) an RGBA-VAE to unify the latent representations of RGB and RGBA images; (2) a VLD-MMDiT (Variable Layers Decomposition MMDiT) architecture capable of decomposing a variable number of image layers; and (3) a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer. Furthermore, to address the scarcity of high-quality multilayer training images, we build a pipeline to extract and annotate multilayer images from Photoshop documents (PSD). Experiments demonstrate that our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing. Our code and models are released at this https URL.
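The abstract mentions a pipeline for extracting multilayer training images from PSD files. As a hedged illustration only, not the authors' actual pipeline, the sketch below pulls visible pixel layers out of a PSD as RGBA images using the open-source psd-tools library; the file path and the filtering rules are placeholder assumptions.

```python
# Minimal sketch of extracting RGBA layers from a PSD file, in the
# spirit of the data pipeline the abstract describes. Uses the
# open-source psd-tools library (pip install psd-tools). This is NOT
# the authors' pipeline; "design.psd" and the filters are placeholders.
from psd_tools import PSDImage

psd = PSDImage.open("design.psd")  # placeholder path

layers = []
for layer in psd.descendants():
    # Skip hidden layers and group containers; keep only pixel layers.
    if not layer.visible or layer.is_group():
        continue
    rgba = layer.composite()  # render this layer alone as a PIL image
    if rgba is not None:
        layers.append((layer.name, rgba.convert("RGBA")))

for name, rgba in layers:
    print(name, rgba.size, rgba.mode)
```

A real pipeline would additionally need annotation and quality filtering (e.g., dropping near-empty or fully occluded layers), which the abstract alludes to but does not specify.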

Top-level tags: computer vision, model training, AIGC
Detailed tags: image decomposition, layered representation, diffusion model, image editing, rgba-vae

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition


1️⃣ One-Sentence Summary

This paper proposes Qwen-Image-Layered, a diffusion model that automatically decomposes an ordinary image into multiple independent transparent layers, letting users edit a single part of an image without affecting the rest, much as they would in professional design software.
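To make "independent transparent layers" concrete: a layered image is flattened with standard "over" alpha compositing, so editing one RGBA layer and recompositing cannot disturb pixels owned by other layers. The sketch below demonstrates this with synthetic layers in Pillow; it illustrates the layered representation itself, not the Qwen-Image-Layered model.

```python
# Minimal sketch: recompose a stack of RGBA layers into one RGB image
# with standard "over" alpha compositing. The layer contents are
# synthetic placeholders; the point is that editing one layer leaves
# all other layers' pixels untouched.
from PIL import Image, ImageDraw

W, H = 256, 256

# Bottom-to-top layer stack (placeholder content).
background = Image.new("RGBA", (W, H), (240, 240, 240, 255))

circle = Image.new("RGBA", (W, H), (0, 0, 0, 0))
ImageDraw.Draw(circle).ellipse((40, 40, 160, 160), fill=(200, 40, 40, 255))

square = Image.new("RGBA", (W, H), (0, 0, 0, 0))
ImageDraw.Draw(square).rectangle((100, 100, 220, 220), fill=(40, 80, 200, 180))

def compose(layers):
    """Flatten RGBA layers bottom-to-top with the 'over' operator."""
    canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
    for layer in layers:
        canvas = Image.alpha_composite(canvas, layer)
    return canvas.convert("RGB")

original = compose([background, circle, square])

# Edit one layer (shift the square left); the circle and background
# are reused unchanged, so they stay pixel-identical in the result.
moved_square = Image.new("RGBA", (W, H), (0, 0, 0, 0))
moved_square.paste(square, (-40, 0), square)
edited = compose([background, circle, moved_square])
```

This is exactly the consistency property the paper targets: once an image is decomposed into layers, per-layer edits are isolated by construction rather than by a model's best effort.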


Source: arXiv:2512.15603