📄 Abstract - CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

Cooking is a sequential and visually grounded activity, where each step, such as chopping, mixing, or frying, carries both procedural logic and visual semantics. While recent diffusion models have shown strong capabilities in text-to-image generation, they struggle with structured multi-step scenarios like recipe illustration. In addition, current recipe illustration methods cannot adapt to the natural variability in recipe length: they generate a fixed number of images regardless of the actual structure of the instructions. To address these limitations, we present CookAnything, a flexible and consistent diffusion-based framework that generates coherent, semantically distinct image sequences from textual cooking instructions of arbitrary length. The framework introduces three key components: (1) Step-wise Regional Control (SRC), which aligns textual steps with corresponding image regions within a single denoising process; (2) Flexible RoPE, a step-aware positional encoding mechanism that enhances both temporal coherence and spatial diversity; and (3) Cross-Step Consistency Control (CSCC), which maintains fine-grained ingredient consistency across steps. Experimental results on recipe illustration benchmarks show that CookAnything outperforms existing methods in both training-based and training-free settings. The proposed framework supports scalable, high-quality visual synthesis of complex multi-step instructions and holds significant potential for broad applications in instructional media and procedural content creation.
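
The abstract does not give implementation details for Step-wise Regional Control, so the following is only a minimal sketch of the general idea of tying each textual step to its own image region. It assumes, purely for illustration, that the canvas is divided into one equal-sized block of patches per step and that the alignment is expressed as a boolean text-to-patch mask. The function name `build_step_region_mask` and its parameters are hypothetical and do not come from the paper.

```python
from typing import List


def build_step_region_mask(text_lens: List[int], patches_per_region: int) -> List[List[bool]]:
    """Build a boolean mask where mask[t][p] is True when text token t
    is allowed to interact with image patch p.

    Assumption (not from the paper): step i owns the contiguous patch range
    [i * patches_per_region, (i + 1) * patches_per_region).
    """
    num_steps = len(text_lens)
    total_patches = num_steps * patches_per_region
    mask: List[List[bool]] = []
    for step, n_tokens in enumerate(text_lens):
        start = step * patches_per_region
        end = start + patches_per_region
        # Every token of this step sees only the patches of its own region.
        row = [start <= p < end for p in range(total_patches)]
        for _ in range(n_tokens):
            mask.append(list(row))
    return mask


# Three recipe steps with 2, 3, and 1 text tokens; 4 patches per region.
mask = build_step_region_mask([2, 3, 1], patches_per_region=4)
```

Because the mask is built from the list of steps, its size grows with the recipe length, which mirrors the abstract's claim of handling instructions of arbitrary length within a single denoising process.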

Top-level tags: computer vision, multi-modal, AIGC
Detailed tags: text-to-image, diffusion models, procedural generation, visual consistency, instruction following

CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation


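1️⃣ One-sentence summary

This paper proposes CookAnything, a new framework that takes textual recipe instructions of arbitrary length and generates a sequence of cooking-process images that are both coherent and clearly distinct from step to step, addressing the flexibility and consistency problems that existing AI models face when generating multi-step, structured image sequences.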

