📄 Abstract - Mixture of States: Routing Token-Level Dynamics for Multimodal Generation

We introduce MoS (Mixture of States), a novel fusion paradigm for multimodal diffusion models that merges modalities using flexible, state-based interactions. The core of MoS is a learnable, token-wise router that creates denoising timestep- and input-dependent interactions between modalities' hidden states, precisely aligning token-level features with the diffusion trajectory. This router sparsely selects the top-$k$ hidden states and is trained with an $\epsilon$-greedy strategy, efficiently selecting contextual features with minimal learnable parameters and negligible computational overhead. We validate our design with text-to-image generation (MoS-Image) and editing (MoS-Editing), which achieve state-of-the-art results. With only 3B to 5B parameters, our models match or surpass counterparts up to $4\times$ larger. These findings establish MoS as a flexible and compute-efficient paradigm for scaling multimodal diffusion models.
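
To make the routing idea concrete, here is a minimal sketch of sparse top-$k$ selection with $\epsilon$-greedy exploration for a single token. It is not the authors' implementation. The function names `route_top_k` and `mix_states`, the softmax over the selected scores, and the weighted-sum fusion are illustrative assumptions, since the abstract does not specify how the router scores are produced or how the selected states are combined. In the paper's setting the scores would also depend on the denoising timestep and the input; here they are simply passed in as a list.

```python
import math
import random


def route_top_k(logits, k, epsilon=0.0, rng=random):
    """Pick k candidate hidden states for one token; return (indices, weights).

    Hypothetical helper: `logits` stands in for router scores, one per
    candidate hidden state.
    """
    n = len(logits)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(logits)")
    if rng.random() < epsilon:
        # Exploration: ignore the scores and sample k distinct states uniformly.
        chosen = sorted(rng.sample(range(n), k))
    else:
        # Exploitation: keep the k highest-scoring states.
        ranked = sorted(range(n), key=lambda i: logits[i], reverse=True)
        chosen = sorted(ranked[:k])
    # Softmax restricted to the chosen states (shifted by the max for stability).
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


def mix_states(states, indices, weights):
    """Weighted sum of the selected hidden-state vectors (assumed fusion rule)."""
    dim = len(states[0])
    return [
        sum(w * states[i][d] for i, w in zip(indices, weights))
        for d in range(dim)
    ]


# Four toy candidate states of width 2 and their router scores.
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
logits = [0.1, 2.0, 2.0, -1.0]
idx, w = route_top_k(logits, k=2, epsilon=0.0)  # greedy selection, as at inference
fused = mix_states(states, idx, w)
```

With `epsilon=0.0` the selection is purely greedy. A positive `epsilon` during training occasionally routes to random states, which gives rarely chosen states a chance to receive gradient signal.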

Top-level tags: multi-modal model training aigc

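📄 Paper Summary

Mixture of States: Routing Token-Level Dynamics for Multimodal Generation

1️⃣ One-Sentence Summary

This paper proposes a new method called "Mixture of States" (MoS), which uses a learned routing mechanism to dynamically integrate features from different modalities (such as text and images). With significantly fewer parameters, it achieves multimodal generation and editing results that match or surpass those of much larger models.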
