arXiv submission date: 2026-03-19
📄 Abstract - Efficient Video Diffusion with Sparse Information Transmission for Video Compression

Video compression aims to maximize reconstruction quality at minimal bitrates. Beyond standard distortion metrics, perceptual quality and temporal consistency are also critical. However, at ultra-low bitrates, traditional end-to-end compression models tend to produce blurry reconstructions of poor perceptual quality. Moreover, existing generative compression methods often treat video frames independently and show limitations in temporal coherence and efficiency. To address these challenges, we propose Efficient Video Diffusion with Sparse Information Transmission (Diff-SIT), which comprises the Sparse Temporal Encoding Module (STEM) and the One-Step Video Diffusion with Frame Type Embedder (ODFTE). The STEM sparsely encodes the original frame sequence into an information-rich intermediate sequence, achieving significant bitrate savings. The ODFTE then processes this intermediate sequence as a whole, exploiting temporal correlation across frames. During this process, our proposed Frame Type Embedder (FTE) guides the diffusion model to perform adaptive reconstruction according to each frame's type, optimizing overall quality. Extensive experiments on multiple datasets demonstrate that Diff-SIT establishes a new state of the art in perceptual quality and temporal consistency, particularly in the challenging ultra-low-bitrate regime. Code is released at this https URL.
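The pipeline the abstract describes (sparse encoding of a frame sequence, then frame-type-guided reconstruction of the full sequence) can be illustrated with a toy sketch. This is an assumption-laden stand-in, not the paper's method: the names (`stem_encode`, `odfte_reconstruct`, `FrameType`) are hypothetical, scalars stand in for frames, and linear interpolation stands in for the one-step video diffusion network.

```python
# Toy sketch of the Diff-SIT data flow described in the abstract.
# All names and logic here are illustrative assumptions; the real STEM and
# ODFTE are learned neural modules, not the hand-written rules below.
from enum import Enum
from typing import List, Tuple

class FrameType(Enum):
    KEY = "key"              # sparsely transmitted frame (costs bitrate)
    GENERATED = "generated"  # frame the reconstruction model must synthesize

def stem_encode(frames: List[float], stride: int = 3) -> Tuple[List[float], List[FrameType]]:
    """Stand-in for the Sparse Temporal Encoding Module (STEM):
    transmit only every `stride`-th frame, saving bitrate on the rest."""
    kept = [f for i, f in enumerate(frames) if i % stride == 0]
    types = [FrameType.KEY if i % stride == 0 else FrameType.GENERATED
             for i in range(len(frames))]
    return kept, types

def odfte_reconstruct(kept: List[float], types: List[FrameType],
                      stride: int = 3) -> List[float]:
    """Stand-in for one-step, frame-type-guided reconstruction:
    KEY frames are copied through; GENERATED frames are linearly
    interpolated between neighboring key frames (the real model uses
    a one-step video diffusion network conditioned on frame type)."""
    out = []
    for i, t in enumerate(types):
        if t is FrameType.KEY:
            out.append(kept[i // stride])
        else:
            lo = kept[i // stride]
            hi = kept[min(i // stride + 1, len(kept) - 1)]
            frac = (i % stride) / stride
            out.append(lo + frac * (hi - lo))
    return out

# Usage: a 10-"frame" sequence, transmitted at one third of the frames.
frames = [float(i) for i in range(10)]
kept, types = stem_encode(frames)
recon = odfte_reconstruct(kept, types)
```

On this linear toy signal the interpolation is exact; on real video, the gap between interpolated and true intermediate frames is precisely what the diffusion model is trained to close.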

Top-level tags: video generation, model training, multi-modal
Detailed tags: video compression, diffusion models, temporal consistency, low-bitrate, perceptual quality

Efficient Video Diffusion with Sparse Information Transmission for Video Compression


1️⃣ One-Sentence Summary

This paper proposes a method called Diff-SIT that combines sparse encoding with one-step video diffusion to substantially improve the perceptual quality and temporal coherence of compressed video at ultra-low bitrates.

Source: arXiv:2603.18501