AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?

📄 Abstract

Custom Storyboard Generation (CSG) aims to produce high-quality storytelling that stays consistent across multiple characters. Current approaches based on static diffusion models, whether used in a one-shot manner or within multi-agent frameworks, face three key limitations: (1) Static models lack dynamic expressiveness and often resort to a "copy-paste" pattern. (2) One-shot inference cannot iteratively correct missing attributes or poor prompt adherence. (3) Multi-agent frameworks rely on non-robust evaluators that are ill-suited to assessing stylized, non-realistic animation. To address these, we propose AnimeAgent, the first Image-to-Video (I2V)-based multi-agent framework for CSG. Inspired by Disney's "Combination of Straight Ahead and Pose to Pose" workflow, AnimeAgent leverages I2V's implicit motion prior to enhance consistency and expressiveness, while a mixed subjective-objective reviewer enables reliable iterative refinement. We also collect a human-annotated CSG benchmark with ground truth. Experiments show that AnimeAgent achieves SOTA performance in consistency, prompt fidelity, and stylization.


1️⃣ One-Sentence Summary

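This paper proposes AnimeAgent, a framework that combines image-to-video models with multi-agent collaboration. By mimicking Disney's animation workflow, it addresses three limitations that existing methods face when generating coherent, dynamic, and stylistically faithful storyboards, significantly improving generation quality.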
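
To make the idea of iterative refinement with a mixed subjective-objective reviewer more concrete, here is a minimal Python sketch of a generic generate-review-refine loop. It is not AnimeAgent's implementation: the abstract does not specify how scores are combined or how feedback is used, so the `Review` fields, the weighted average in `mixed_score`, the `threshold`, the `max_rounds` limit, and the way feedback is appended to the prompt are all assumptions made for illustration. The `generate` and `review` callables are hypothetical stand-ins for a generator agent and a reviewer agent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class Review:
    """Hypothetical reviewer output; field names are illustrative assumptions."""
    objective: float   # e.g. an automatic consistency metric scaled to [0, 1]
    subjective: float  # e.g. a judge model's rating scaled to [0, 1]
    feedback: str      # free-text note on what to fix in the next round


def mixed_score(review: Review, weight: float = 0.5) -> float:
    # Assumed combination rule: a simple weighted average of the two scores.
    return weight * review.objective + (1.0 - weight) * review.subjective


def refine(
    prompt: str,
    generate: Callable[[str], Any],
    review: Callable[[Any, str], Review],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[Any, float]:
    """Regenerate until the mixed score passes the threshold or rounds run out."""
    best_frame, best_score = None, float("-inf")
    current_prompt = prompt
    for _ in range(max_rounds):
        frame = generate(current_prompt)
        result = review(frame, prompt)  # always judge against the original prompt
        score = mixed_score(result)
        if score > best_score:
            best_frame, best_score = frame, score
        if score >= threshold:
            break
        # Feed the reviewer's note back into the next generation attempt.
        current_prompt = f"{prompt}\nReviewer feedback: {result.feedback}"
    return best_frame, best_score
```

The loop keeps the best-scoring candidate rather than the last one, so a later round that scores worse does not overwrite an earlier, better result. This contrasts with one-shot inference, which has no opportunity to correct a missing attribute after the first attempt.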
