arXiv submission date: 2026-06-17
📄 Abstract - SpectralDiT: Timestep-Conditioned Spectral Residual Correction for Flow-Matching DiTs

We propose SpectralDiT, a lightweight modification to flow-matching Diffusion Transformers that adds timestep-conditioned spectral correction to the MLP residual branch. The module decomposes each residual update into low- and high-frequency components on the patch-token grid, then learns a zero-initialized additive gate so the model initially matches the baseline DiT. On CIFAR-10 pixel-space generation, SpectralDiT improves FID from 20.78 to 19.71 at patch size 1 and reduces the radial Fourier spectrum gap. Furthermore, we scale our method to latent diffusion on ImageNet-100. With 0.6% additional theoretical FLOPs and 1.36% additional parameters, SpectralDiT improves latent flow-matching, achieving an 8.7% relative FID reduction under classifier-free guidance (CFG 2.0). All reported results are averaged over five seeds. Ablations and gate visualizations on CIFAR-10 reveal stable block-specific spectral correction patterns.
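
The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the low/high split is computed or how the gate is parameterized, so the hard radial FFT mask, the `cutoff` value, the linear map from the timestep embedding to per-channel gates, and the names `split_low_high` and `SpectralGate` are all assumptions made for illustration. What the sketch does take from the abstract is the overall shape: the residual update is split into two frequency bands on the patch-token grid, and a zero-initialized additive gate leaves the update unchanged at initialization.

```python
import numpy as np


def split_low_high(delta, grid_h, grid_w, cutoff=0.25):
    """Split a residual update of shape (tokens, channels) into low- and
    high-frequency parts on the (grid_h, grid_w) patch-token grid.

    Assumption: a hard radial low-pass mask in the 2D FFT domain; the
    paper's actual decomposition may differ.
    """
    n, c = delta.shape
    x = delta.reshape(grid_h, grid_w, c)
    spec = np.fft.fft2(x, axes=(0, 1))
    # Normalized frequencies in cycles per token along each grid axis.
    fy = np.fft.fftfreq(grid_h)[:, None]
    fx = np.fft.fftfreq(grid_w)[None, :]
    mask = (np.sqrt(fy ** 2 + fx ** 2) <= cutoff)[..., None]
    low = np.fft.ifft2(spec * mask, axes=(0, 1)).real
    high = x - low  # by construction, low + high reproduces the input
    return low.reshape(n, c), high.reshape(n, c)


class SpectralGate:
    """Hypothetical timestep-conditioned additive gate on the two bands."""

    def __init__(self, cond_dim, channels):
        # Zero initialization: both gates start at 0, so the corrected
        # update initially equals the baseline update.
        self.w = np.zeros((cond_dim, 2 * channels))
        self.b = np.zeros(2 * channels)

    def __call__(self, delta, t_emb, grid_h, grid_w):
        low, high = split_low_high(delta, grid_h, grid_w)
        g_low, g_high = np.split(t_emb @ self.w + self.b, 2)
        # Additive correction on top of the unmodified residual update.
        return delta + g_low * low + g_high * high


rng = np.random.default_rng(0)
delta = rng.normal(size=(64, 8))   # 8x8 token grid, 8 channels
t_emb = rng.normal(size=16)        # stand-in timestep embedding
gate = SpectralGate(cond_dim=16, channels=8)
corrected = gate(delta, t_emb, 8, 8)
```

With the gate weights at zero, `corrected` equals `delta`, which mirrors the abstract's statement that the model initially matches the baseline DiT. Once the weights are trained, each channel can amplify or damp its low and high bands differently depending on the timestep.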

Top-level tags: machine learning, computer vision, model training
Detailed tags: diffusion transformers, spectral correction, flow-matching, image generation, FID improvement

SpectralDiT: Timestep-Conditioned Spectral Residual Correction for Flow-Matching DiTs


1️⃣ One-sentence summary

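SpectralDiT is a lightweight plug-in that adds a timestep-conditioned spectral correction module to the MLP residual branch of flow-matching Diffusion Transformers: on CIFAR-10 it lowers FID from 20.78 to 19.71 at patch size 1, and on ImageNet-100 latent diffusion it achieves an 8.7% relative FID reduction under CFG 2.0 with 0.6% additional theoretical FLOPs and 1.36% additional parameters.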

Source: arXiv: 2606.18765