Self-Refining Video Sampling
1️⃣ One-Sentence Summary
This paper proposes a method that lets an existing video generation model iteratively refine its own outputs at inference time. Without any additional training or external verifier, it significantly improves the realism and coherence of complex physical motion in generated videos.
Modern video generators still struggle with complex physical dynamics, often falling short of physical realism. Existing approaches address this using external verifiers or additional training on augmented data, which is computationally expensive and still limited in capturing fine-grained motion. In this work, we present self-refining video sampling, a simple method that uses a pre-trained video generator trained on large-scale datasets as its own self-refiner. By interpreting the generator as a denoising autoencoder, we enable iterative inner-loop refinement at inference time without any external verifier or additional training. We further introduce an uncertainty-aware refinement strategy that selectively refines regions based on self-consistency, which prevents artifacts caused by over-refinement. Experiments on state-of-the-art video generators demonstrate significant improvements in motion coherence and physics alignment, achieving over 70% human preference over both the default and guidance-based samplers.
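The inner-loop idea is compact enough to sketch. Below is a minimal, hypothetical PyTorch illustration of re-noise-then-denoise refinement with an uncertainty mask, assuming a pretrained denoiser exposed as a callable `denoise(x_t, t)`; the interpolant-style noising and the parameters `t_refine`, `n_iters`, `n_probes`, and `tau` are assumptions for illustration, not the paper's actual interface or schedule.

```python
import torch

def self_refine(x0, denoise, t_refine=0.4, n_iters=3, n_probes=2, tau=0.05):
    """Sketch of self-refining sampling (all names hypothetical).

    Treating the generator as a denoising autoencoder: pushing a finished
    sample a short way back up the noise schedule and denoising it again
    projects it toward the model's learned data manifold. Agreement across
    several stochastic probes serves as a self-consistency signal; only
    low-consistency (high-uncertainty) regions are replaced, guarding
    against over-refinement artifacts.
    """
    for _ in range(n_iters):
        probes = []
        for _ in range(n_probes):
            noise = torch.randn_like(x0)
            # Re-noise with an interpolant-style corruption (an assumption here).
            x_t = (1 - t_refine) * x0 + t_refine * noise
            probes.append(denoise(x_t, t_refine))
        probes = torch.stack(probes)                     # (n_probes, ...)
        mean = probes.mean(dim=0)
        uncertainty = probes.var(dim=0, unbiased=False)  # self-consistency proxy
        mask = (uncertainty > tau).float()               # refine only uncertain regions
        x0 = mask * mean + (1 - mask) * x0
    return x0
```

The mask is the key design choice in this sketch: regions where repeated stochastic refinements already agree are left untouched, so the loop can sharpen implausible motion without over-smoothing content that is already consistent.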
Source: arXiv:2601.18577