arXiv submission date: 2026-03-01
📄 Abstract - LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model

We present LLaDA-o, an effective and length-adaptive omni diffusion model for multimodal understanding and generation. LLaDA-o is built on a Mixture of Diffusion (MoD) framework that decouples discrete masked diffusion for text understanding and continuous diffusion for visual generation, while coupling them through a shared, simple, and efficient attention backbone that reduces redundant computation for fixed conditions. Building on MoD, we further introduce a data-centric length adaptation strategy that enables flexible-length decoding in multimodal settings without architectural changes. Extensive experiments show that LLaDA-o achieves state-of-the-art performance among omni-diffusion models on multimodal understanding and generation benchmarks, and reaches 87.04 on DPG-Bench for text-to-image generation, supporting the effectiveness of unified omni diffusion modeling. Code is available at this https URL.
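The abstract describes MoD as two decoupled forward processes, discrete masked diffusion for text and continuous Gaussian diffusion for image latents, joined by a shared attention backbone. A minimal sketch of that decoupling is below; the paper gives no implementation details, so every name, shape, and the one-layer "backbone" here are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16
W_shared = rng.standard_normal((DIM, DIM)) * 0.1  # shared backbone weights (assumed)

def shared_backbone(x):
    """Stand-in for the shared attention backbone: one linear map + ReLU."""
    return np.maximum(x @ W_shared, 0.0)

def text_forward_diffusion(tokens, mask_ratio, mask_id=0):
    """Discrete masked diffusion: replace tokens with a [MASK] id at mask_ratio."""
    mask = rng.random(tokens.shape) < mask_ratio
    return np.where(mask, mask_id, tokens), mask

def image_forward_diffusion(latents, t):
    """Continuous diffusion: interpolate latents toward Gaussian noise at level t."""
    noise = rng.standard_normal(latents.shape)
    return np.sqrt(1.0 - t) * latents + np.sqrt(t) * noise, noise

# --- tiny demo: both modalities share one backbone after embedding ---
tokens = np.arange(1, 9).reshape(1, 8)         # one toy text sequence
latents = rng.standard_normal((1, 8, DIM))     # one toy image-latent sequence

noised_tokens, mask = text_forward_diffusion(tokens, mask_ratio=0.5)
noised_latents, _ = image_forward_diffusion(latents, t=0.3)

embed = rng.standard_normal((10, DIM)) * 0.1   # toy token-embedding table
text_feats = shared_backbone(embed[noised_tokens])
image_feats = shared_backbone(noised_latents)
```

The point of the sketch is the factoring: each modality keeps its own noising process, while parameter sharing happens only in the backbone that both feature streams pass through.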

Top-level tags: multi-modal model training aigc
Detailed tags: diffusion models, multimodal generation, text-to-image, mixture of diffusion, length adaptation

LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model


1️⃣ One-sentence summary

This paper proposes LLaDA-o, a new diffusion model that, through a mixture framework and a data-driven length-adaptation strategy, achieves state-of-the-art performance in both understanding and generating content across modalities such as text and images, and can flexibly handle outputs of different lengths without any change to the model architecture.

Source: arXiv:2603.01068