📄 Abstract - Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance

Diffusion models have demonstrated strong generative performance when using guidance methods such as classifier-free guidance (CFG), which enhance output quality by modifying the sampling trajectory. These methods typically improve a target output by intentionally degrading another, often the unconditional output, using heuristic perturbation functions such as identity mixing or blurred conditions. However, these approaches lack a principled foundation and rely on manually designed distortions. In this work, we propose Adversarial Sinkhorn Attention Guidance (ASAG), a novel method that reinterprets attention scores in diffusion models through the lens of optimal transport and intentionally disrupts the transport cost via the Sinkhorn algorithm. Instead of naively corrupting the attention mechanism, ASAG injects an adversarial cost within self-attention layers to reduce pixel-wise similarity between queries and keys. This deliberate degradation weakens misleading attention alignments and leads to improved conditional and unconditional sample quality. ASAG shows consistent improvements in text-to-image diffusion, and enhances controllability and fidelity in downstream applications such as IP-Adapter and ControlNet. The method is lightweight, plug-and-play, and improves reliability without requiring any model retraining.
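The core idea in the abstract — treating negative query-key similarity as a transport cost, perturbing that cost adversarially, and normalizing with the Sinkhorn algorithm — can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the specific adversarial term (`adv_weight * |cost|`), the entropic regularizer `eps`, and all hyperparameters are assumptions made here for clarity.

```python
import numpy as np

def sinkhorn(K, n_iters=50):
    # Sinkhorn-Knopp: alternate row/column normalization to push a
    # positive kernel toward a doubly stochastic transport plan.
    P = K.copy()
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # normalize rows
        P /= P.sum(axis=0, keepdims=True)  # normalize columns
    return P

def asag_attention(Q, K, V, eps=0.1, adv_weight=0.5):
    # Illustrative sketch of Sinkhorn attention with an adversarial cost:
    # the base transport cost is negative scaled query-key similarity;
    # the added term (an assumption, not the paper's exact form) inflates
    # the cost, weakening query-key alignment as described in the abstract.
    d = Q.shape[-1]
    cost = -Q @ K.T / np.sqrt(d)                 # base transport cost
    adv_cost = cost + adv_weight * np.abs(cost)  # assumed adversarial term
    plan = sinkhorn(np.exp(-adv_cost / eps))     # entropic OT plan
    return plan @ V                              # transport-weighted values
```

In a guidance setting, the output of this perturbed attention would play the role of the degraded branch (analogous to the unconditional branch in CFG), steering sampling away from misleading alignments.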

Top-level tags: model training, computer vision, AIGC
Detailed tags: diffusion models, attention guidance, optimal transport, text-to-image, adversarial training

📄 Paper Summary

Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance


1️⃣ One-Sentence Summary

This work proposes a new method called ASAG, which introduces an adversarial cost to perturb the attention mechanism in diffusion models, improving the quality, controllability, and reliability of generated images without retraining the model.

