arXiv submission date: 2026-03-23
📄 Abstract - WorldCache: Content-Aware Caching for Accelerated Video World Models

Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption, i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsistencies in dynamic scenes. We propose WorldCache, a Perception-Constrained Dynamical Caching framework that improves both when and how to reuse features. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. Our cohesive approach enables adaptive, motion-consistent feature reuse without retraining. On Cosmos-Predict2.5-2B evaluated on PAI-Bench, WorldCache achieves 2.3× inference speedup while preserving 99.4% of baseline quality, substantially outperforming prior training-free caching approaches. Our code can be accessed at World-Cache (this https URL).
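The abstract's core idea — replacing a zero-order hold with a motion-adaptive reuse decision and a first-order blend — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation; all function names, the L1 drift metric, the `base_tau` value, and the blend coefficient are assumptions for illustration.

```python
def drift(curr, cached):
    """Hypothetical drift metric: relative L1 distance between the current
    feature estimate and the cached feature vector."""
    num = sum(abs(c - k) for c, k in zip(curr, cached))
    den = sum(abs(k) for k in cached) + 1e-8
    return num / den

def adaptive_threshold(motion_score, base_tau=0.1):
    """Motion-adaptive threshold: higher motion content gets a stricter
    (smaller) reuse threshold, so dynamic scenes are recomputed more often."""
    return base_tau / (1.0 + motion_score)

def reuse_or_recompute(curr_estimate, cached, prev_cached, motion_score, recompute):
    """If drift stays below the adaptive threshold, reuse the cache via a
    first-order blend (extrapolating the cached feature trajectory) instead of
    a zero-order hold; otherwise fall back to full recomputation."""
    if drift(curr_estimate, cached) < adaptive_threshold(motion_score):
        return [c + 0.5 * (c - p) for c, p in zip(cached, prev_cached)]
    return recompute()
```

Under this sketch, a static region (low drift, low motion) reuses a blended feature cheaply, while a fast-moving region trips the stricter threshold and pays for a full recomputation — the "when and how" split the abstract describes.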

Top-level tags: video generation, model training, systems
Detailed tags: diffusion transformers, inference acceleration, feature caching, video world models, computational efficiency

WorldCache: Content-Aware Caching for Accelerated Video World Models


1️⃣ One-sentence summary

This paper proposes WorldCache, an intelligent caching framework that dynamically perceives motion and salient regions in video content, significantly speeding up video generation model inference with almost no loss in output quality.

From arXiv: 2603.22286