arXiv submission date: 2026-04-16
📄 Abstract - MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: this https URL.
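The abstract describes the method only at a high level: a global plan coordinates per-element generation, and iterative self-reflection revises the results. As a minimal sketch of that control flow, and not the authors' implementation, the loop might look like the following. Every name here (`PagePlan`, `ElementPlan`, `generate_page`, and the `generate` and `critique` callables) is a hypothetical placeholder; in a real system the two callables would wrap AIGC tools and a model-based critic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ElementPlan:
    """Hypothetical local plan for one multimodal element."""
    kind: str    # e.g. "image", "video", "chart"
    prompt: str  # element-specific generation prompt


@dataclass
class PagePlan:
    """Hypothetical global plan: one shared style plus per-element plans."""
    style: str
    elements: List[ElementPlan] = field(default_factory=list)


def generate_page(
    plan: PagePlan,
    generate: Callable[[ElementPlan, str], str],
    critique: Callable[[str, str], List[str]],
    max_rounds: int = 3,
) -> List[str]:
    """Generate each element under the shared style, revising on critique."""
    outputs = []
    for element in plan.elements:
        current = element
        artifact = generate(current, plan.style)
        for _ in range(max_rounds):
            issues = critique(artifact, plan.style)
            if not issues:
                break  # the critic found nothing to fix
            # Fold the critique back into the prompt and regenerate.
            revised = current.prompt + " | fix: " + "; ".join(issues)
            current = ElementPlan(current.kind, revised)
            artifact = generate(current, plan.style)
        outputs.append(artifact)
    return outputs


# Stub callables standing in for real generators and critics (assumptions).
def stub_generate(element: ElementPlan, style: str) -> str:
    return f"<{element.kind} style='{style}' prompt='{element.prompt}'>"


def stub_critique(artifact: str, style: str) -> List[str]:
    return [] if "fix:" in artifact else ["palette does not match page style"]


plan = PagePlan("minimal-dark", [ElementPlan("image", "hero banner")])
page = generate_page(plan, stub_generate, stub_critique)
```

Passing the shared `style` to every generation and critique call is one simple way to express the idea that elements should not be generated in isolation; the paper's actual planning and reflection levels may be structured differently.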

Top-level tags: agents, multi-modal, aigc
Detailed tags: webpage generation, hierarchical planning, multimodal integration, ui/ux design, benchmark

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation


1️⃣ One-sentence summary

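This paper proposes MM-WebAgent, an agentic system that uses hierarchical planning and self-reflection to coordinate the generation of content across modalities such as images and text. It addresses the inconsistent style and poor overall coherence of webpages produced with existing AI tools, and automatically creates webpages that are visually consistent and sensibly laid out.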

Source: arXiv: 2604.15309