arXiv submission date: 2026-06-11
📄 Abstract - TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment

Autoregressive video diffusion models provide a natural formulation for streaming and variable-length video generation by conditioning newly generated frames on previously generated content. However, extending these models to minute-level generation remains challenging: the limited KV-cache budget prevents the model from retaining the full history, while repeatedly conditioning on self-generated frames induces a context distribution shift that accumulates over time, leading to visual artifacts, quality degradation, and temporal drift. In this paper, we propose TetherCache, a training-free and plug-and-play cache management strategy for drift-resistant long video generation. TetherCache organizes the cache into sink, memory, and recent regions, and introduces two complementary mechanisms. First, GRAB (Gated Recall with Attention-Diversity Balancing) selects long-range memory frames using a gated score that combines attention-based relevance with temporal diversity, preserving informative yet diverse historical context under a fixed cache budget. Second, TAME (Trusted Alignment via Memory Editing) lightly edits newly recalled memory tokens by aligning their statistics to a trusted context distribution, reducing the pollution caused by drifted historical features. Built on Self-Forcing, TetherCache consistently improves long-video generation quality on VBench-Long across 30s, 60s, and 240s settings. In particular, for 240s generation, it substantially improves overall and semantic scores while reducing quality drift from 7.84 to 1.33, demonstrating its effectiveness for stable long-horizon autoregressive video diffusion.
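To make the two mechanisms concrete, here is a minimal NumPy sketch of the ideas as the abstract describes them. It is not the authors' implementation: the abstract gives no formulas, so the linear gate, the exponential temporal-diversity term, the mean/variance matching, and every name and parameter (`partition_history`, `select_memory_frames`, `align_statistics`, `gate`, `tau`, `strength`) are assumptions made for illustration. Treating the sink as the earliest frames and the recent region as the latest frames is likewise an assumption.

```python
import numpy as np


def partition_history(num_frames, n_sink, n_recent):
    """Split frame indices into sink, memory candidates, and recent regions.

    Assumption: sink = earliest frames, recent = latest frames,
    and everything in between is a candidate for long-range memory.
    """
    sink = list(range(min(n_sink, num_frames)))
    recent_start = max(len(sink), num_frames - n_recent)
    recent = list(range(recent_start, num_frames))
    candidates = list(range(len(sink), recent_start))
    return sink, candidates, recent


def select_memory_frames(relevance, frame_ids, budget, gate=0.5, tau=10.0):
    """Greedily pick `budget` frames by a gated relevance/diversity score.

    relevance: per-frame attention-based relevance (higher = more relevant).
    frame_ids: temporal position of each candidate frame.
    gate:      hypothetical mixing weight; 1.0 = relevance only, 0.0 = diversity only.
    Returns sorted positions into the `relevance` array.
    """
    relevance = np.asarray(relevance, dtype=float)
    frame_ids = np.asarray(frame_ids, dtype=float)
    span = relevance.max() - relevance.min()
    rel = (relevance - relevance.min()) / span if span > 0 else np.zeros_like(relevance)

    chosen = []
    for _ in range(min(budget, len(rel))):
        if chosen:
            # Temporal distance from each candidate to its nearest selected frame.
            dist = np.min(np.abs(frame_ids[:, None] - frame_ids[chosen][None, :]), axis=1)
            div = 1.0 - np.exp(-dist / tau)
        else:
            div = np.ones_like(rel)
        score = gate * rel + (1.0 - gate) * div
        score[chosen] = -np.inf  # never pick the same frame twice
        chosen.append(int(np.argmax(score)))
    return sorted(chosen)


def align_statistics(tokens, trusted, strength=0.5, eps=1e-6):
    """Shift per-channel mean/std of `tokens` toward those of `trusted`.

    tokens:   (n, d) recalled memory features.
    trusted:  (m, d) features from a context assumed to be drift-free.
    strength: hypothetical blend factor; small values give a light edit.
    """
    mu_t, sd_t = tokens.mean(axis=0), tokens.std(axis=0) + eps
    mu_r, sd_r = trusted.mean(axis=0), trusted.std(axis=0) + eps
    aligned = (tokens - mu_t) / sd_t * sd_r + mu_r
    return (1.0 - strength) * tokens + strength * aligned
```

In this sketch, a cache update would call `partition_history` to find the memory candidates, `select_memory_frames` to keep only as many as the budget allows, and `align_statistics` on the newly recalled tokens before they are written back. A `strength` below 1 reflects the abstract's description of the edit as light; the actual alignment rule and the choice of trusted context are not specified in the abstract.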

Top-level tags: video generation, model evaluation
Detailed tags: autoregressive video diffusion, cache management, long video generation, drift mitigation, kv-cache

TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment


1️⃣ One-sentence summary

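This paper proposes TetherCache, a training-free, plug-and-play cache management strategy that partitions the history cache into distinct regions, uses a gated score to select important yet diverse memory frames, and corrects their data distribution, thereby mitigating the error accumulation and quality drift that arise in long video generation from conditioning on self-generated content and substantially improving the stability and quality of very long videos.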

Source: arXiv: 2606.13035