MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
1️⃣ One-Sentence Summary
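This paper proposes MM-WebAgent, an agent system that uses hierarchical planning and self-reflection to coordinate the generation of content in different modalities, such as images and text. It addresses the inconsistent styles and poor overall coherence of webpages produced by existing AI tools, so that visually consistent, well-laid-out webpages can be created automatically.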
The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: this https URL.
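The abstract describes the framework only at a high level: a global plan, AIGC-based generation of individual elements, and iterative self-reflection that checks how those elements fit together. The Python sketch below illustrates that general control flow under stated assumptions and is not MM-WebAgent's actual implementation. Every name in it (`PagePlan`, `plan_page`, `generate_element`, `reflect`, `build_page`, and the `"minimal-blue"` style tag) is hypothetical. The AIGC generator is a deterministic stub that deliberately drifts off-style on its first image attempt, which mimics isolated generation and gives the reflection loop something to correct.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: not the MM-WebAgent implementation.

@dataclass
class Element:
    kind: str          # e.g. "image", "chart", "text"
    prompt: str
    style: str = ""    # style tag the (stub) generator actually produced
    content: str = ""

@dataclass
class Section:
    name: str
    elements: list = field(default_factory=list)

@dataclass
class PagePlan:
    global_style: str
    sections: list = field(default_factory=list)

def plan_page(request: str) -> PagePlan:
    """Global planning: fix one shared style, then lay out sections and their elements."""
    style = "minimal-blue"  # hypothetical: a real planner would derive this from the request
    return PagePlan(
        global_style=style,
        sections=[
            Section("hero", [Element("image", f"banner for {request}")]),
            Section("stats", [Element("chart", "monthly visitors"), Element("text", "summary")]),
        ],
    )

def generate_element(el: Element, style: str, attempt: int) -> None:
    """Stub AIGC call. On the first attempt, images ignore the requested style,
    simulating the drift that occurs when elements are generated in isolation."""
    el.style = style if (attempt > 0 or el.kind != "image") else "photoreal-orange"
    el.content = f"<{el.kind} style='{el.style}'>{el.prompt}</{el.kind}>"

def reflect(plan: PagePlan) -> list:
    """Self-reflection: return the elements whose style disagrees with the global plan."""
    return [el for sec in plan.sections for el in sec.elements if el.style != plan.global_style]

def build_page(request: str, max_rounds: int = 3) -> tuple:
    """Plan globally, generate locally, then regenerate flagged elements until consistent."""
    plan = plan_page(request)
    for sec in plan.sections:
        for el in sec.elements:
            generate_element(el, plan.global_style, attempt=0)
    rounds = 0
    while rounds < max_rounds:
        bad = reflect(plan)
        if not bad:
            break
        rounds += 1
        for el in bad:
            generate_element(el, plan.global_style, attempt=rounds)
    # Integration step: assemble the per-section elements into one page.
    html = "\n".join(
        f"<section id='{sec.name}'>" + "".join(el.content for el in sec.elements) + "</section>"
        for sec in plan.sections
    )
    return html, rounds
```

For example, `build_page("a travel blog")` returns the assembled HTML together with the number of reflection rounds used. Here a single round brings the drifted image back to the shared style. In a fuller system, the string comparison in `reflect` would presumably be replaced by a model-based judgment of style and layout coherence.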
Source: arXiv 2604.15309