ART for Diffusion Sampling: Continuous-Time Control and Actor-Critic Learning
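1️⃣ One-sentence summary
This paper proposes ART, a method that treats the speed of the diffusion sampling clock as a learnable control variable and uses an actor-critic reinforcement learning algorithm to optimize the timestep allocation automatically, improving image generation quality and adapting to different task settings while leaving the rest of the sampling pipeline unchanged.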
We study timestep allocation for score-based diffusion sampling, where a learned reverse-time dynamics is discretized on a finite grid. Uniform and hand-crafted schedules are standard choices, but they rely on fixed prescriptions and can therefore be suboptimal. To address this limitation, we propose Adaptive Reparameterized Time (ART), a continuous-time control formulation that learns a time change by treating the speed of the sampling clock as the control, so that a uniform grid on the learned clock induces adaptive timesteps in the original diffusion time. Based on a leading-order Euler error surrogate, ART provides a principled objective for allocating timesteps along the sampling trajectory. To solve this deterministic control problem, we introduce ART-RL, an auxiliary randomized formulation with Gaussian policies that turns schedule learning into a continuous-time reinforcement learning problem. We prove that the randomized ART-RL formulation is equivalent to ART at the optimizer level, in the sense that its optimal Gaussian policy recovers the optimal ART time-warping rate through its mean. We further establish policy evaluation and policy improvement characterizations and derive trajectory-based moment identities that yield implementable actor-critic updates for learning the schedule. Across experiments ranging from controlled low-dimensional settings to image generation, ART-RL can be plugged into existing diffusion samplers by changing only the timestep grid, consistently improving sample quality over strong baseline schedules at matched budgets while leaving the rest of the sampling pipeline unchanged. The learned schedules also exhibit broad generalization, transferring without retraining across sampling budgets, datasets, solvers, pipelines, and representation spaces.
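To make the time-change idea concrete, the following is a minimal sketch of how a clock speed defined on a uniform grid could induce a non-uniform timestep grid in the original diffusion time. It is not the paper's implementation: the function name `timesteps_from_clock_speed`, the piecewise-constant speed, the normalization to fixed endpoints, and the hand-picked speed values are all assumptions made for illustration. In ART-RL the speed would come from the learned policy mean rather than being chosen by hand.

```python
import numpy as np


def timesteps_from_clock_speed(speed, t_start=1.0, t_end=0.0):
    """Map a per-interval clock speed to a timestep grid (illustrative only).

    `speed[k]` is an assumed piecewise-constant rate on the k-th of N equal
    intervals of the reparameterized clock. The step taken in the original
    diffusion time over interval k is proportional to `speed[k]`, and the
    steps are rescaled so the grid runs exactly from `t_start` to `t_end`.
    """
    speed = np.asarray(speed, dtype=float)
    if speed.ndim != 1 or speed.size == 0:
        raise ValueError("speed must be a non-empty 1-D array")
    if np.any(speed <= 0):
        raise ValueError("speed must be strictly positive for a monotone grid")

    # Cumulative fraction of the total time interval covered after each step.
    cumulative = np.cumsum(speed)
    fractions = np.concatenate(([0.0], cumulative / cumulative[-1]))

    # Interpolate between the endpoints; returns N + 1 grid points.
    return t_start + (t_end - t_start) * fractions


# A constant speed recovers the uniform schedule.
uniform = timesteps_from_clock_speed(np.ones(4))

# A hypothetical speed that is fast early and slow late takes large steps at
# high noise levels and small steps close to t_end.
adaptive = timesteps_from_clock_speed([4.0, 2.0, 1.0, 1.0])
```

Because only the grid changes, the resulting array could be handed to any sampler that accepts a custom list of timesteps, which is the sense in which the abstract describes the schedule as a plug-in replacement.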
Source: arXiv: 2607.02137