arXiv submission date: 2026-01-29
📄 Abstract - Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models

Decomposing complex data into factorized representations can reveal reusable components and enable synthesizing new samples via component recombination. We investigate this in the context of diffusion-based models that learn factorized latent spaces without factor-level supervision. In images, factors can capture background, illumination, and object attributes; in robotic videos, they can capture reusable motion components. To improve both latent factor discovery and quality of compositional generation, we introduce an adversarial training signal via a discriminator trained to distinguish between single-source samples and those generated by recombining factors across sources. By optimizing the generator to fool this discriminator, we encourage physical and semantic consistency in the resulting recombinations. Our method outperforms implementations of prior baselines on CelebA-HQ, Virtual KITTI, CLEVR, and Falcor3D, achieving lower FID scores and better disentanglement as measured by MIG and MCC. Furthermore, we demonstrate a novel application to robotic video trajectories: by recombining learned action components, we generate diverse sequences that significantly increase state-space coverage for exploration on the LIBERO benchmark.
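The abstract's core idea is an adversarial recombination signal: latents are factorized per source, factors are swapped across sources, and a discriminator is trained to tell single-source samples from recombined ones while the generator is optimized to fool it. The toy sketch below illustrates only that loss structure with NumPy; the names (`recombine`, `discriminator`, `generator_adv_loss`), the linear discriminator, and the latent shapes are all illustrative assumptions, not the paper's actual diffusion-model implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def recombine(z_a, z_b, k):
    """Swap the first k latent factors of source A with those of source B.
    z_a, z_b: (num_factors, dim) factorized latents. How the factorization
    is learned (the paper's diffusion model) is not modeled here."""
    z_mix = z_a.copy()
    z_mix[:k] = z_b[:k]
    return z_mix

def discriminator(z, w):
    """Toy linear discriminator: probability that z came from a single source.
    The paper's discriminator scores generated samples; this stand-in scores
    flattened latents purely for illustration."""
    logit = float(w @ z.ravel())
    return 1.0 / (1.0 + np.exp(-logit))

def generator_adv_loss(z_mix, w):
    """Non-saturating adversarial loss for the generator side: recombined
    samples are pushed toward being classified as single-source, D(z_mix) -> 1,
    which is what encourages consistent recombinations."""
    p_single = discriminator(z_mix, w)
    return -np.log(p_single + 1e-12)

# Two sources, four factors of dimension 8 each (arbitrary toy sizes).
z_a = rng.standard_normal((4, 8))
z_b = rng.standard_normal((4, 8))
w = rng.standard_normal(32) * 0.1  # frozen toy discriminator weights

z_mix = recombine(z_a, z_b, k=2)   # first 2 factors come from source B
loss = generator_adv_loss(z_mix, w)
```

In full training this loss term would be backpropagated into the diffusion model's parameters, alternating with discriminator updates on real single-source vs. recombined samples, in the usual GAN fashion.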

Top-level tags: multi-modal model training, computer vision
Detailed tags: unsupervised learning, diffusion models, disentangled representation, adversarial training, factor recombination

Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models


1️⃣ One-sentence summary

This paper proposes a new unsupervised learning method in which a discriminator guides a diffusion model to automatically decompose complex data (such as images and robotic videos) into independent components (such as background and motion) and to recombine those components into high-quality new samples; the approach outperforms existing methods on multiple benchmarks and is successfully applied to robotic exploration.

Source: arXiv:2601.22057