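AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?
1️⃣ One-sentence summary
This paper proposes AnimeAgent, a framework that combines image-to-video models with multi-agent collaboration. By mimicking Disney's animation workflow, it addresses three problems that existing methods face when generating coherent, dynamic, and stylistically faithful storyboards, and it markedly improves generation quality.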
Custom Storyboard Generation (CSG) aims to produce high-quality storyboards that keep multiple characters consistent across a story. Current approaches based on static diffusion models, whether used in a one-shot manner or within multi-agent frameworks, face three key limitations: (1) Static models lack dynamic expressiveness and often resort to a "copy-paste" pattern. (2) One-shot inference cannot iteratively correct missing attributes or poor prompt adherence. (3) Multi-agent frameworks rely on non-robust evaluators that are ill-suited to assessing stylized, non-realistic animation. To address these, we propose AnimeAgent, the first Image-to-Video (I2V)-based multi-agent framework for CSG. Inspired by Disney's "Combination of Straight Ahead and Pose to Pose" workflow, AnimeAgent leverages I2V's implicit motion prior to enhance consistency and expressiveness, while a mixed subjective-objective reviewer enables reliable iterative refinement. We also collect a human-annotated CSG benchmark with ground truth. Experiments show that AnimeAgent achieves state-of-the-art (SOTA) performance in consistency, prompt fidelity, and stylization.
Source: arXiv: 2602.20664