Abstract - Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
Subject-driven image generation has advanced from single- to multi-subject composition, while neglecting distinction, the ability to identify and generate the correct subject when inputs contain multiple candidates. This limitation restricts effectiveness in complex, realistic visual settings. We propose Scone, a unified understanding-generation method that integrates composition and distinction. Scone enables the understanding expert to act as a semantic bridge, conveying semantic information and guiding the generation expert to preserve subject identity while minimizing interference. A two-stage training scheme first learns composition, then enhances distinction through semantic alignment and attention-based masking. We also introduce SconeEval, a benchmark for evaluating both composition and distinction across diverse scenarios. Experiments demonstrate that Scone outperforms existing open-source models in composition and distinction tasks on two benchmarks. Our model, benchmark, and training data are available at: this https URL.
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
1️⃣ One-Sentence Summary
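This paper proposes Scone, a new method built on a unified understanding-generation model. It lets an AI image generator not only compose multiple specified subjects naturally into a single image, but also accurately distinguish and generate the correct subject, producing more precise images with less interference in complex scenes.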