arXiv submission date: 2025-12-25
📄 Abstract - Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation

Real-time portrait animation is essential for interactive applications such as virtual assistants and live avatars, requiring high visual fidelity, temporal coherence, ultra-low latency, and responsive control from dynamic inputs like reference images and driving signals. While diffusion-based models achieve strong quality, their non-causal nature hinders streaming deployment. Causal autoregressive video generation approaches enable efficient frame-by-frame generation but suffer from error accumulation, motion discontinuities at chunk boundaries, and degraded long-term consistency. In this work, we present a novel streaming framework named Knot Forcing for real-time portrait animation that addresses these challenges through three key designs: (1) a chunk-wise generation strategy with global identity preservation via cached KV states of the reference image and local temporal modeling using sliding window attention; (2) a temporal knot module that overlaps adjacent chunks and propagates spatio-temporal cues via image-to-video conditioning to smooth inter-chunk motion transitions; and (3) a "running ahead" mechanism that dynamically updates the reference frame's temporal coordinate during inference, keeping its semantic context ahead of the current rollout frame to support long-term coherence. Knot Forcing enables high-fidelity, temporally consistent, and interactive portrait animation over infinite sequences, achieving real-time performance with strong visual stability on consumer-grade GPUs.
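To make the three designs in the abstract more concrete, the following is a minimal bookkeeping sketch in plain Python. It is not the authors' implementation: the abstract gives no chunk length, overlap size, window size, or update rule for the reference coordinate, so `chunk_len`, `knot_len`, `window`, `lead`, and all function names are hypothetical. The sketch only shows how overlapping chunk ranges, a causal sliding window, and a reference position kept ahead of the current frame could be indexed.

```python
from typing import List, Tuple


def chunk_schedule(total_frames: int, chunk_len: int, knot_len: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges for chunk-wise generation.

    Adjacent chunks share `knot_len` frames (the assumed "knot" overlap).
    All parameter values are illustrative, not taken from the paper.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= knot_len < chunk_len:
        raise ValueError("knot_len must satisfy 0 <= knot_len < chunk_len")
    stride = chunk_len - knot_len
    chunks: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + chunk_len, total_frames)
        chunks.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return chunks


def visible_frames(frame: int, window: int) -> List[int]:
    """Frames a query at `frame` may attend to under a causal sliding window.

    In the framework described above, cached reference-image KV states would
    be visible in addition to these frames; that part is omitted here.
    """
    return list(range(max(0, frame - window + 1), frame + 1))


def reference_coordinate(frame: int, lead: int) -> int:
    """Assumed "running ahead" rule: keep the reference `lead` steps ahead."""
    return frame + lead


if __name__ == "__main__":
    print(chunk_schedule(total_frames=10, chunk_len=4, knot_len=1))
    print(visible_frames(frame=5, window=3))
    print(reference_coordinate(frame=7, lead=2))
```

In a streaming setting the total length is unknown, so a real system would presumably produce chunks on demand rather than from a precomputed schedule; the fixed `total_frames` here just keeps the example short.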

Top-level tags: video generation, aigc, model training
Detailed tags: portrait animation, autoregressive diffusion, real-time generation, temporal coherence, streaming inference

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation


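1️⃣ One-sentence summary

This paper proposes a new method called "Knot Forcing" that combines chunk-wise generation, smoothing across overlapping regions, and a look-ahead update mechanism to address the coherence and latency problems of existing real-time portrait animation models, enabling high-quality, smooth, infinitely long interactive animation on consumer-grade GPUs.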

Source: arXiv: 2512.21734