OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

📄 Abstract

Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships. As a result, they often produce entangled textures or physically inconsistent layering in the overlapped areas. To address this issue, we first construct SA-Z, a large-scale dataset enriched with explicit occlusion ordering and pixel-level annotations. Building upon our proposed dataset, we introduce OcclusionFormer, a novel occlusion-aware Diffusion Transformer framework that explicitly models Z-order priority by decoupling instances and compositing them via volume rendering. Furthermore, to ensure fine-grained spatial precision, we introduce a queried alignment loss that explicitly supervises individual instances and enhances semantic consistency. The proposed method effectively reduces ambiguity in overlapping regions, enforces correct occlusion dependencies, and preserves structural integrity, leading to substantial accuracy gains across diverse scenes.
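
The abstract does not spell out how instances are composited, so the following is only a minimal sketch of the general idea of Z-order compositing, assuming standard front-to-back alpha compositing (the discrete form of the volume rendering weight, where each instance contributes in proportion to its opacity times the transmittance left over by nearer instances). The names `InstanceSample` and `composite_pixel`, the per-pixel formulation, and the convention that a smaller `z` means closer to the viewer are all illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


@dataclass
class InstanceSample:
    """One instance's contribution at a single pixel (hypothetical structure)."""
    color: RGB    # RGB color, each channel in [0, 1]
    alpha: float  # opacity in [0, 1]; 0 where the instance does not cover the pixel
    z: int        # Z-order rank; assumption: smaller means closer to the viewer


def composite_pixel(samples: Sequence[InstanceSample],
                    background: RGB = (0.0, 0.0, 0.0)) -> RGB:
    """Front-to-back alpha compositing of all instances covering one pixel."""
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of the pixel not yet blocked by nearer instances
    for s in sorted(samples, key=lambda sample: sample.z):
        weight = transmittance * s.alpha  # visible share of this instance
        for c in range(3):
            out[c] += weight * s.color[c]
        transmittance *= 1.0 - s.alpha    # nearer instances occlude farther ones
    # Whatever is still unblocked shows the background.
    for c in range(3):
        out[c] += transmittance * background[c]
    return (out[0], out[1], out[2])


if __name__ == "__main__":
    red_front = InstanceSample(color=(1.0, 0.0, 0.0), alpha=1.0, z=0)
    blue_back = InstanceSample(color=(0.0, 0.0, 1.0), alpha=1.0, z=1)
    # The input order does not matter; only the z ranks decide the layering.
    print(composite_pixel([blue_back, red_front]))  # (1.0, 0.0, 0.0)
```

In this sketch, two fully opaque instances that overlap at a pixel never blend: the one with the smaller `z` takes the whole pixel, which is the kind of unambiguous layering the abstract describes for intersection regions. Swapping the two `z` values flips which instance is visible without touching the bounding boxes.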

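1️⃣ One-Sentence Summary

Overlapping objects in layout-to-image generation lead to entangled textures and incorrect layering. To address this, the paper constructs SA-Z, a dataset annotated with occlusion ordering, and proposes the OcclusionFormer framework, which explicitly models the front-to-back Z-order of objects and composites them via volume rendering. This removes the ambiguity in overlapping regions and yields physically consistent occlusion relationships.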
