arXiv submission date: 2025-12-13
📄 Abstract - AutoMV: An Automatic Multi-Agent System for Music Video Generation

Music-to-Video (M2V) generation for full-length songs faces significant challenges. Existing methods produce short, disjointed clips that fail to align visuals with musical structure, beats, or lyrics, and they lack temporal consistency. We propose AutoMV, a multi-agent system that generates full music videos (MVs) directly from a song. AutoMV first applies music processing tools to extract musical attributes, such as structure, vocal tracks, and time-aligned lyrics, and packages these features as contextual inputs for the downstream agents. The Screenwriter Agent and Director Agent then use this information to write a short script, define character profiles in a shared external bank, and specify camera instructions. Next, these agents call an image generator for keyframes and different video generators for "story" or "singer" scenes. A Verifier Agent evaluates their output, enabling multi-agent collaboration to produce a coherent long-form MV. To evaluate M2V generation, we further propose a benchmark with four high-level categories (Music Content, Technical, Post-production, Art) and twelve fine-grained criteria. We apply this benchmark, with expert human raters, to compare commercial products, AutoMV, and human-directed MVs: AutoMV significantly outperforms current baselines across all four categories, narrowing the gap to professional MVs. Finally, we investigate using large multimodal models as automatic MV judges; while promising, they still lag behind human experts, highlighting room for future work.
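The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of how such an orchestration could be wired, assuming stubbed agents in place of the real components: music analysis output, scripts, keyframes, and video clips are all faked. Every name here (`Section`, `CharacterBank`, `screenwriter`, `director`, `render_clip`, `verifier`, `make_music_video`), the rule for choosing "story" versus "singer" scenes, and the verifier checks are hypothetical illustrations, not AutoMV's actual code or API.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names below are hypothetical; they illustrate the pipeline shape only.

@dataclass
class Section:
    label: str        # structural label from music analysis, e.g. "verse", "chorus"
    start: float      # section start time in seconds
    end: float        # section end time in seconds
    lyrics: str       # time-aligned lyrics that fall inside this section
    has_vocals: bool  # whether the separated vocal track is active here

@dataclass
class ScenePlan:
    section: Section
    scene_type: str   # "story" or "singer"
    script: str
    camera: str
    characters: list

@dataclass
class CharacterBank:
    # Shared external store so every scene reuses the same character profiles.
    profiles: dict = field(default_factory=dict)

    def define(self, name: str, description: str) -> None:
        self.profiles.setdefault(name, description)

def screenwriter(sections, bank):
    # Placeholder for an LLM call: one short script line per section.
    bank.define("lead", "the singer, same outfit and hairstyle in every scene")
    return [f"{s.label}: visualize '{s.lyrics}'" for s in sections]

def director(sections, scripts, bank):
    plans = []
    for s, script in zip(sections, scripts):
        # Assumed rule: vocal choruses become "singer" scenes, the rest "story" scenes.
        scene_type = "singer" if s.has_vocals and s.label == "chorus" else "story"
        camera = "close-up on the lead" if scene_type == "singer" else "wide tracking shot"
        plans.append(ScenePlan(s, scene_type, script, camera, sorted(bank.profiles)))
    return plans

def render_clip(plan, attempt):
    # Stand-in for keyframe image generation followed by a scene-type-specific video model.
    generator = "singer_video_model" if plan.scene_type == "singer" else "story_video_model"
    return {"section": plan.section.label, "scene_type": plan.scene_type,
            "generator": generator, "duration": plan.section.end - plan.section.start,
            "characters": plan.characters}

def verifier(plan, clip):
    # Assumed checks: correct generator, full section coverage, consistent characters.
    expected = "singer_video_model" if plan.scene_type == "singer" else "story_video_model"
    return (clip["generator"] == expected
            and abs(clip["duration"] - (plan.section.end - plan.section.start)) < 1e-6
            and clip["characters"] == plan.characters)

def make_music_video(sections, render: Callable = render_clip, max_retries: int = 2):
    bank = CharacterBank()
    scripts = screenwriter(sections, bank)
    clips = []
    for plan in director(sections, scripts, bank):
        for attempt in range(max_retries + 1):
            clip = render(plan, attempt)
            if verifier(plan, clip):
                break  # accept the first clip the verifier approves
        # Record how many attempts were used and whether the final clip passed.
        clips.append({**clip, "attempts": attempt + 1, "verified": verifier(plan, clip)})
    return clips

if __name__ == "__main__":
    song = [
        Section("verse", 0.0, 20.0, "walking through the rain", True),
        Section("chorus", 20.0, 35.0, "we light up the night", True),
        Section("outro", 35.0, 45.0, "", False),
    ]
    for clip in make_music_video(song):
        print(clip["section"], clip["scene_type"], clip["attempts"], clip["verified"])
```

The verifier sits inside a bounded retry loop so that a rejected clip triggers regeneration rather than silently entering the final video; passing a different `render` function makes it easy to simulate generators that fail on their first attempt.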

Top-level tags: multi-modal, AIGC, multi-agents
Detailed tags: music-to-video, multi-agent system, video generation, evaluation benchmark, automatic content creation

AutoMV: An Automatic Multi-Agent System for Music Video Generation


1️⃣ One-Sentence Summary

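This paper proposes AutoMV, an automated multi-agent system that takes an entire song and generates a complete music video that is structurally coherent and matched to the song's beats and lyrics, and it shows, using new evaluation criteria, that AutoMV significantly outperforms existing methods.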


Source: arXiv 2512.12196