Generative Neural Video Compression via Video Diffusion Prior
1️⃣ One-Sentence Summary
This paper proposes GNVC-VD, a new video compression framework that, for the first time, applies an advanced video generation model to compression. Through sequence-level joint optimization it reduces the frame-to-frame flickering common in existing methods, preserving spatio-temporal coherence and high perceptual quality even at extremely low bitrates.
We present GNVC-VD, the first DiT-based generative neural video compression framework built upon an advanced video generation foundation model, where spatio-temporal latent compression and sequence-level generative refinement are unified within a single codec. Existing perceptual codecs primarily rely on pre-trained image generative priors to restore high-frequency details, but their frame-wise nature lacks temporal modeling and inevitably leads to perceptual flickering. To address this, GNVC-VD introduces a unified flow-matching latent refinement module that leverages a video diffusion transformer to jointly enhance intra- and inter-frame latents through sequence-level denoising, ensuring consistent spatio-temporal details. Instead of denoising from pure Gaussian noise as in video generation, GNVC-VD initializes refinement from the decoded spatio-temporal latents and learns a correction term that adapts the diffusion prior to compression-induced degradation. A conditioning adaptor further injects compression-aware cues into intermediate DiT layers, enabling effective artifact removal while maintaining temporal coherence under extreme bitrate constraints. Extensive experiments show that GNVC-VD surpasses both traditional and learned codecs in perceptual quality and significantly reduces the flickering artifacts that persist in prior generative approaches, even at bitrates below 0.01 bpp, highlighting the promise of integrating video-native generative priors into neural codecs for next-generation perceptual video compression.
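To make the mechanism concrete, below is a minimal PyTorch sketch of the two ideas the abstract names: flow-matching refinement that is initialized from decoded latents rather than pure Gaussian noise, and an adaptor that injects compression-aware cues into intermediate transformer layers. Everything here (`ConditioningAdaptor`, `TinyDiT`, `refine_decoded_latents`, the tensor shapes, and the Euler integration) is a hypothetical illustration under assumed interfaces, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class ConditioningAdaptor(nn.Module):
    """Hypothetical adaptor: maps a compression-aware cue (e.g. features of
    the decoded latents or a bitrate embedding) to one additive modulation
    signal per intermediate DiT layer."""

    def __init__(self, cond_dim: int, hidden_dim: int, num_layers: int):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Linear(cond_dim, hidden_dim) for _ in range(num_layers)
        )

    def forward(self, cond: torch.Tensor) -> list[torch.Tensor]:
        return [p(cond) for p in self.proj]


class TinyDiT(nn.Module):
    """Stand-in for the video diffusion transformer backbone; a real model
    would use spatio-temporal attention over patchified video latents."""

    def __init__(self, dim: int, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        self.head = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, cues: list[torch.Tensor]) -> torch.Tensor:
        for layer, cue in zip(self.layers, cues):
            x = layer(x + cue.unsqueeze(1))  # inject compression-aware cue
        return self.head(x)  # predicted correction ("velocity") field


def refine_decoded_latents(dit, adaptor, z_dec, cond, num_steps: int = 4):
    """Sequence-level refinement: start from the decoded spatio-temporal
    latents z_dec instead of Gaussian noise, then integrate the learned
    correction field with a few Euler steps along the flow."""
    z = z_dec
    cues = adaptor(cond)
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        z = z + dt * dit(z, cues)
    return z


# Toy usage: one 8-frame clip, 16 latent tokens per frame, 64-dim latents.
B, T, P, D = 1, 8, 16, 64
z_dec = torch.randn(B, T * P, D)  # decoded (degraded) spatio-temporal latents
cond = torch.randn(B, 32)         # compression-aware condition vector
z_ref = refine_decoded_latents(TinyDiT(D), ConditioningAdaptor(32, D, 4), z_dec, cond)
print(z_ref.shape)  # torch.Size([1, 128, 64])
```

The design choice the sketch highlights is the initialization: because z_dec already carries the video content, the network only needs to learn a correction toward the clean latent manifold, a much easier target than full generation from noise, which is how the abstract describes adapting the diffusion prior to compression-induced degradation.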