arXiv submission date: 2025-12-29
📄 Abstract - Pretraining Frame Preservation in Autoregressive Video Memory Compression

We present PFP, a neural network structure to compress long videos into short contexts, with an explicit pretraining objective to preserve the high-frequency details of single frames at arbitrary temporal positions. The baseline model can compress a 20-second video into a context of about 5k length, from which random frames can be retrieved with perceptually preserved appearances. Such pretrained models can be directly fine-tuned as memory encoders for autoregressive video models, enabling long history memory with low context cost and relatively low fidelity loss. We evaluate the framework with ablative settings and discuss the trade-offs of possible neural architecture designs.
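The pretraining recipe the abstract describes — compress a long frame sequence into a short context, then require the frame at an arbitrary temporal position to be reconstructable from that context alone — can be sketched compactly. Below is a minimal, hypothetical PyTorch sketch, not the authors' architecture: the attention-based compressor with learned query slots, all dimensions (e.g. `ctx_len=64` rather than the paper's ~5k context), and the MSE reconstruction loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PFPSketch(nn.Module):
    """Toy encoder/decoder illustrating the frame-preservation pretraining idea."""
    def __init__(self, frame_dim=512, ctx_len=64, d_model=256, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(frame_dim, d_model)                # per-frame features -> model width
        self.slots = nn.Parameter(torch.randn(ctx_len, d_model))  # learned compression slots
        self.compress = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.time_embed = nn.Linear(1, d_model)                   # embed the query timestep
        self.retrieve = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decode = nn.Linear(d_model, frame_dim)               # reconstruct frame features

    def forward(self, frames, t_query):
        # frames: (B, T, frame_dim); t_query: (B, 1), timestep normalized to [0, 1]
        x = self.embed(frames)
        slots = self.slots.unsqueeze(0).expand(x.size(0), -1, -1)
        ctx, _ = self.compress(slots, x, x)        # short context: length ctx_len << T
        q = self.time_embed(t_query).unsqueeze(1)  # (B, 1, d_model)
        out, _ = self.retrieve(q, ctx, ctx)        # decoder sees only ctx + timestep
        return self.decode(out.squeeze(1))

# One pretraining step: sample a random frame position per video and
# penalize the reconstruction error of that single frame.
model = PFPSketch()
frames = torch.randn(2, 480, 512)                  # e.g. 20 s of frame features at 24 fps
idx = torch.randint(0, 480, (2,))
t_query = idx.float().unsqueeze(1) / 480.0
target = frames[torch.arange(2), idx]              # the frame to preserve
loss = F.mse_loss(model(frames, t_query), target)
loss.backward()
```

The property the sketch mirrors is that the decoder sees only the compressed context plus a timestep embedding, so the loss can only be driven down by packing per-frame detail into the short context — which is the frame-preservation objective the abstract states.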

Top tags: video model training multi-modal
Detailed tags: video compression autoregressive models pretraining memory efficiency neural architecture

Pretraining Frame Preservation in Autoregressive Video Memory Compression


1️⃣ One-sentence summary

This paper proposes a neural network method called PFP that uses a dedicated pretraining objective to compress long videos into very short contexts while preserving the details of single frames at arbitrary temporal positions, providing an efficient, high-fidelity memory-encoding scheme for video generation models that need long-term memory.

Source: arXiv: 2512.23851