arXiv submission date: 2026-05-27
📄 Abstract - DiscoForcing: A Unified Framework for Real-Time Audio-Driven Character Control with Diffusion Forcing

We study real-time audio-responsive character control as a deployment-faithful problem: strictly causal, bounded-latency streaming that must generate coherent full-body motion at interactive frame rates while the audio condition can change abruptly, including tempo shifts, drops, or user edits. Prior music-to-motion systems are largely optimized for offline generation with global context, and degrade in streaming rollouts where conditioning history becomes stale or unreliable. We introduce DiscoForcing, a streaming audio-driven diffusion framework that combines a causal music encoder that captures rhythmic structure and phase dynamics with a diffusion-forcing sequence model trained under heterogeneous noise levels across the temporal horizon. Building on this, we design a hybrid temporal schedule and a history-guided streaming sampler to explicitly trade off responsiveness against long-horizon consistency under non-stationary audio. Implemented in an end-to-end real-time interactive system with online avatar playback and humanoid deployment workflows, DiscoForcing delivers more stable long-horizon rollouts and sharper audio-motion alignment than prior baselines under matched causality and latency constraints while maintaining real-time throughput.
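The phrase "heterogeneous noise levels across the temporal horizon" can be illustrated with a small sketch of per-frame noising, the core idea behind diffusion forcing: each frame in a sequence is corrupted at its own independently sampled noise level, rather than one level for the whole clip. This is only an illustration under stated assumptions, not the authors' implementation. The linear beta schedule, the function names `make_alpha_bar` and `noise_per_frame`, and the toy two-dimensional "pose" features are all hypothetical.

```python
import math
import random


def make_alpha_bar(num_levels, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for k in range(num_levels):
        beta = beta_start + (beta_end - beta_start) * k / max(num_levels - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def noise_per_frame(motion, alpha_bar, rng):
    """Noise every frame at its own independently sampled level.

    motion: list of T frames, each a list of D floats (hypothetical pose features).
    Returns (noisy, levels, eps). A denoiser would typically be trained to
    recover eps (or the clean frame) from noisy and levels.
    """
    noisy, levels, eps = [], [], []
    for frame in motion:
        k = rng.randrange(len(alpha_bar))  # independent noise level for this frame
        a = math.sqrt(alpha_bar[k])        # weight on the clean signal
        s = math.sqrt(1.0 - alpha_bar[k])  # weight on the Gaussian noise
        e = [rng.gauss(0.0, 1.0) for _ in frame]
        noisy.append([a * x + s * n for x, n in zip(frame, e)])
        levels.append(k)
        eps.append(e)
    return noisy, levels, eps


# Usage: a toy 8-frame clip with 2 features per frame.
rng = random.Random(0)
motion = [[math.sin(0.1 * t), math.cos(0.1 * t)] for t in range(8)]
noisy, levels, eps = noise_per_frame(motion, make_alpha_bar(1000), rng)
```

Because each frame carries its own noise level, a model trained this way can, in principle, treat past frames as nearly clean and future frames as heavily noised at sampling time, which is what makes this style of training a natural fit for causal streaming rollouts.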

Top-level tags: audio, computer vision, machine learning
Detailed tags: diffusion forcing, audio-driven motion, character control, streaming generation, causal encoder

DiscoForcing: A Unified Framework for Real-Time Audio-Driven Character Control with Diffusion Forcing


1️⃣ One-sentence summary

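This paper proposes DiscoForcing, a real-time audio-driven character animation framework that combines a causal music encoder with a diffusion sequence model. It generates stable, smooth full-body motion even when the audio condition changes abruptly (for example, a tempo switch or a user edit), while balancing real-time responsiveness against long-horizon motion coherence.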

Source: arXiv: 2605.28491