arXiv submission date: 2025-12-23
📄 Abstract - Active Intelligence in Video Avatars via Closed-loop World Modeling

Current video avatar generation methods excel at identity preservation and motion alignment but lack genuine agency: they cannot autonomously pursue long-term goals through adaptive environmental interaction. We address this by introducing L-IVA (Long-horizon Interactive Visual Avatar), a task and benchmark for evaluating goal-directed planning in stochastic generative environments, and ORCA (Online Reasoning and Cognitive Architecture), the first framework enabling active intelligence in video avatars. ORCA embodies Internal World Model (IWM) capabilities through two key innovations: (1) a closed-loop OTAR cycle (Observe-Think-Act-Reflect) that maintains robust state tracking under generative uncertainty by continuously verifying predicted outcomes against actual generations, and (2) a hierarchical dual-system architecture where System 2 performs strategic reasoning with state prediction while System 1 translates abstract plans into precise, model-specific action captions. By formulating avatar control as a POMDP and implementing continuous belief updating with outcome verification, ORCA enables autonomous multi-step task completion in open-domain scenarios. Extensive experiments demonstrate that ORCA significantly outperforms open-loop and non-reflective baselines in task success rate and behavioral coherence, validating our IWM-inspired design for advancing video avatar intelligence from passive animation to active, goal-oriented behavior.
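The closed-loop idea in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not ORCA's implementation: `generate_clip` is a hypothetical stand-in for a stochastic video generator, the `Belief` class and the sub-goal names are invented for illustration, and "reflection" is reduced to comparing the predicted outcome with the generated one.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Belief:
    """Hypothetical belief state: sub-goals the agent believes are finished."""
    done: set = field(default_factory=set)


def generate_clip(action, rng, fail_prob):
    """Stand-in for a stochastic generator (an assumption, not a real API).

    Returns the outcome visible in the generated clip, which may differ
    from the requested action.
    """
    return "nothing_happened" if rng.random() < fail_prob else action


def otar_episode(goals, fail_prob=0.3, max_steps=20, seed=0, reflect=True):
    """Run one toy Observe-Think-Act-Reflect episode.

    Returns (believed_done, actually_done) so the two can be compared.
    """
    rng = random.Random(seed)
    belief = Belief()
    actually_done = set()
    for _ in range(max_steps):
        # Observe + Think: choose the first sub-goal not believed finished.
        remaining = [g for g in goals if g not in belief.done]
        if not remaining:
            break
        action = remaining[0]
        predicted = action  # the outcome the planner expects to see

        # Act: ask the (stochastic) generator for a clip.
        outcome = generate_clip(action, rng, fail_prob)
        if outcome == action:
            actually_done.add(action)

        # Reflect: update the belief only if the prediction was verified.
        # The open-loop variant skips verification and assumes success.
        if not reflect or outcome == predicted:
            belief.done.add(action)
    return belief.done, actually_done
```

With `reflect=True` the belief is updated only after a verified outcome, so it cannot drift from what was actually generated; with `reflect=False` the agent marks every attempted step as finished, which is the failure mode the abstract attributes to open-loop baselines.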

Top-level tags: agents, video generation, multi-modal
Detailed tags: interactive video avatars, world modeling, long-horizon planning, benchmark, closed-loop reasoning

From Passive Animation to Active Intelligence: Long-horizon Interactive Video Avatars via Online Reasoning and Cognitive Architecture / Active Intelligence in Video Avatars via Closed-loop World Modeling


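1️⃣ One-sentence summary

This paper proposes ORCA, the first framework aimed at giving video avatars active intelligence. Using a closed-loop OTAR reasoning cycle and a hierarchical dual-system architecture, it addresses the inability of existing methods to plan autonomously toward long-term goals in stochastic generative environments, and it establishes L-IVA, the first standardized evaluation benchmark for this kind of task.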


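2️⃣ Key contributions

1. The L-IVA task and benchmark

2. The ORCA (Online Reasoning and Cognitive Architecture) framework

3. The closed-loop OTAR (Observe-Think-Act-Reflect) cycle

4. The hierarchical dual-system architecture

5. Explicit world modeling (belief-state tracking)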
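The hierarchical dual-system split listed above can be pictured as two small functions: a strategic layer that picks the next abstract step and predicts the resulting state, and a translation layer that turns that step into a caption for a specific video model. The following is a hedged sketch only. The names `PlanStep`, `system2_plan`, `system1_caption`, the `model_a`/`model_b` keys, and the caption templates are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    verb: str       # abstract action chosen by the strategic layer
    obj: str        # object the action targets
    predicted: str  # state the planner expects after the action


# Hypothetical caption templates, one per assumed video model.
CAPTION_TEMPLATES = {
    "model_a": "The avatar {verb}s the {obj} with their right hand.",
    "model_b": "A person slowly {verb}s a {obj}, static camera.",
}


def system2_plan(goal_steps, completed):
    """Strategic layer (sketch): pick the first unfinished step and
    attach a predicted post-action state for later verification."""
    for verb, obj in goal_steps:
        if (verb, obj) not in completed:
            return PlanStep(verb, obj, predicted=f"{obj}:{verb}_done")
    return None  # every step is already complete


def system1_caption(step, model):
    """Translation layer (sketch): render the abstract step as a
    caption in the phrasing a given model is assumed to prefer."""
    return CAPTION_TEMPLATES[model].format(verb=step.verb, obj=step.obj)
```

For example, `system1_caption(system2_plan([("lift", "cup")], set()), "model_a")` renders the abstract step as a `model_a`-style caption, while the `predicted` field is what a reflection step would later compare against the generated clip.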


3️⃣ Main results and value

Result highlights

Practical value


4️⃣ Glossary

Source: arXiv: 2512.20615