EgoLCD: Egocentric Video Generation with Long Context Diffusion
1️⃣ One-Sentence Summary
This paper introduces EgoLCD, a method that generates coherent, high-quality long egocentric (first-person) videos by carefully managing long-term and short-term memory, effectively mitigating the content drift and forgetting that existing models exhibit over long generations.
Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from content drift, where object identity and scene semantics degrade over time. To address this challenge, we introduce EgoLCD, an end-to-end framework for egocentric long-context video generation that treats long video synthesis as a problem of efficient and stable memory management. EgoLCD combines a Long-Term Sparse KV Cache for stable global context with an attention-based short-term memory, extended by LoRA for local adaptation. A Memory Regulation Loss enforces consistent memory usage, and Structured Narrative Prompting provides explicit temporal guidance. Extensive experiments on the EgoVid-5M benchmark demonstrate that EgoLCD achieves state-of-the-art performance in both perceptual quality and temporal consistency, effectively mitigating generative forgetting and representing a significant step toward building scalable world models for embodied AI. Code: this https URL. Website: this https URL.
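To make the memory-management idea in the abstract more concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a two-tier memory: a long-term *sparse* KV cache that keeps a fixed budget of high-importance tokens for global context, plus a short-term sliding window of recent tokens for local detail. All names here (`SparseKVMemory`, `long_budget`, `short_window`, the importance scores) are hypothetical stand-ins.

```python
# Toy sketch of a long-term sparse KV cache + short-term window, assuming
# chunked autoregressive generation; not EgoLCD's actual code.
import torch


class SparseKVMemory:
    """Two-tier KV memory: sparse long-term cache + recent-token window."""

    def __init__(self, long_budget: int = 64, short_window: int = 256):
        self.long_budget = long_budget    # max tokens kept in the long-term cache
        self.short_window = short_window  # recent tokens kept verbatim
        self.long: list[tuple[torch.Tensor, torch.Tensor, float]] = []   # (k, v, score)
        self.short: list[tuple[torch.Tensor, torch.Tensor, float]] = []

    def update(self, k: torch.Tensor, v: torch.Tensor, scores: torch.Tensor) -> None:
        """Add one chunk's per-token keys/values; `scores` is an importance proxy
        (e.g. cumulative attention received). Tokens that overflow the short window
        are demoted into the long-term cache, which keeps only the top-scoring ones."""
        for ki, vi, si in zip(k, v, scores):
            self.short.append((ki, vi, float(si)))
        overflow = len(self.short) - self.short_window
        if overflow > 0:
            demoted, self.short = self.short[:overflow], self.short[overflow:]
            self.long.extend(demoted)
            # Sparsify: retain only the highest-importance tokens globally.
            self.long.sort(key=lambda t: t[2], reverse=True)
            self.long = self.long[: self.long_budget]

    def context(self) -> tuple[torch.Tensor, torch.Tensor]:
        """Concatenated keys/values the next generation chunk attends over."""
        items = self.long + self.short
        ks = torch.stack([t[0] for t in items])
        vs = torch.stack([t[1] for t in items])
        return ks, vs


# Usage: feed per-chunk keys/values (e.g. from one attention block of a video model).
mem = SparseKVMemory(long_budget=8, short_window=16)
for _ in range(10):                  # ten generated chunks
    k = torch.randn(32, 64)          # 32 tokens per chunk, dim 64
    v = torch.randn(32, 64)
    scores = torch.rand(32)          # stand-in importance scores
    mem.update(k, v, scores)
ks, vs = mem.context()
print(ks.shape, vs.shape)            # context size bounded by long_budget + short_window
```

The point of the sketch is only the design choice: the attended context stays bounded regardless of video length, while the sparse long-term tier preserves globally salient tokens (e.g. object identity) so later chunks do not forget them.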