📄 Abstract - OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation

Generative models have excelled in RGB synthesis, but real-world applications require RGBA manipulation. This has led to a fragmented landscape: specialized, single-task models handle alpha but lack versatility, while unified multi-task frameworks are confined to the RGB domain. To bridge this gap, we propose OmniAlpha, the first unified, multi-task generative framework for sequence-to-sequence RGBA image generation and editing. Its architecture features MSRoPE-BiL, a novel RoPE method with a bi-directionally extendable layer axis for its Diffusion Transformer (DiT) backbone, enabling the concurrent processing of multiple input and target RGBA layers. To power this framework, we introduce AlphaLayers, a new dataset of 1,000 high-quality, multi-layer triplets, built via a novel automated synthesis and filtering pipeline. We jointly train OmniAlpha on this dataset across a comprehensive suite of 21 diverse tasks, and extensive experiments demonstrate that our unified approach consistently outperforms strong, specialized baselines. Most notably, OmniAlpha achieves an 84.8% relative reduction in SAD for mask-free matting on AIM-500 and wins over 90% of human preferences in layer-conditioned completion. Our work shows that a unified, multi-task model can learn a superior shared representation for RGBA, paving the way for more powerful, layer-aware generative systems.
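The 84.8% figure in the abstract is a relative reduction in SAD (sum of absolute differences), a standard alpha-matting error metric. As a minimal sketch of how such a number is derived, the pure-Python snippet below computes SAD between a predicted and a ground-truth alpha matte, then the relative reduction between two methods. The function names `sad` and `relative_reduction` and the toy 2×2 mattes are illustrative assumptions, not taken from the paper; matting benchmarks also often report SAD scaled by 1/1000, which is omitted here.

```python
def sad(pred, gt):
    """Sum of absolute differences between two alpha mattes.

    Both mattes are nested lists of floats in [0, 1] with the same shape.
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("alpha mattes must have the same shape")
    return sum(
        abs(p - g)
        for row_p, row_g in zip(pred, gt)
        for p, g in zip(row_p, row_g)
    )


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


# Toy data (hypothetical, for illustration only).
gt_alpha = [[0.0, 1.0], [0.5, 1.0]]
baseline_alpha = [[0.5, 0.5], [0.5, 0.5]]
improved_alpha = [[0.0, 1.0], [0.5, 0.75]]

baseline_sad = sad(baseline_alpha, gt_alpha)   # 0.5 + 0.5 + 0.0 + 0.5 = 1.5
improved_sad = sad(improved_alpha, gt_alpha)   # 0.25
reduction = relative_reduction(baseline_sad, improved_sad)
print(f"{reduction:.1%}")  # prints 83.3%
```

Under this definition, an 84.8% relative reduction means the new error is about 15.2% of the baseline error, whatever the absolute scale of the metric.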

Top-level tags: computer vision, model training, AIGC
Detailed tags: RGBA generation, image editing, multi-task learning, diffusion transformer, alpha matting

OmniAlpha: A Unified Multi-Task Framework for RGBA Image Generation and Editing / OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation


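1️⃣ One-Sentence Summary

OmniAlpha is the first unified multi-task framework for RGBA image generation and editing built on a sequence-to-sequence Diffusion Transformer; with its novel MSRoPE-BiL design and the AlphaLayers dataset, it is trained jointly on 21 tasks and outperforms specialized models.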


2️⃣ Key Innovations

1. Unified multi-task RGBA generation framework

2. MSRoPE-BiL architecture

3. AlphaLayers dataset

4. Opaque initialization strategy


3️⃣ Main Results and Value

Result highlights

Practical value


4️⃣ Glossary
