Abstract - SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents
Long-horizon multimodal agents in open-world games must stay goal-directed across many low-level interactions under tight token and latency budgets. Existing approaches often trade off costly per-step reasoning against reactive execution that can drift, repeat failures, and recover poorly. Our key idea is to reuse strategic reasoning across locally stable segments and reinvoke it at event boundaries. We present SPIKE, an adaptive dual controller framework for cost-efficient long-horizon game control. Its Strategic Controller performs low-frequency global planning, failure analysis, and recovery, while its Reactive Controller handles fast local execution under a strict token budget. An Event Trigger monitors visual change, task progress, repeated actions, and failure signals to decide whether control should stay reactive or escalate to strategic reasoning. Hierarchical Memory separates short-term experience reuse in the State-Action Memory Bank (SA-MB) from structured evidence in the State-Action Knowledge Graph (SA-KG), allowing each controller to retrieve the context it needs. This design reuses strategic proposals over multiple reactive steps, supports local override when plans become stale, and reserves expensive reasoning for moments when extra deliberation is useful. On the Lite-100 split of StarDojo, SPIKE improves success rate (SR) by 5.0 percentage points (38.5% relative) over the strongest baseline and Budgeted SR by 9.3 points (75.6% relative) over the strongest budgeted baseline. It also reduces token consumption by 54.9% and latency by 40.8%. Ablations show that event triggering, reactive override, and heterogeneous memory each contribute to success and recovery, supporting selective reasoning rather than reasoning at every step.
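The escalation logic described in the abstract can be pictured with a small Python sketch. This is not the authors' implementation: the `Signals` fields, the `TriggerConfig` thresholds, and the `run_episode` helper are all hypothetical names and values chosen for illustration. The sketch only shows the general pattern of reusing one strategic plan over several reactive steps and re-invoking the strategic controller when an event fires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Signals:
    """Per-step observations (hypothetical fields, not from the paper)."""
    visual_change: float   # assumed frame-difference score in [0, 1]
    progress_delta: float  # assumed change in task progress since last step
    repeated_actions: int  # consecutive identical actions so far
    failed: bool           # explicit failure signal from the environment


@dataclass
class TriggerConfig:
    """Illustrative thresholds; the real values are not given in the abstract."""
    visual_change_max: float = 0.5
    max_repeats: int = 3
    stall_steps: int = 5


class EventTrigger:
    """Decides whether control stays reactive or escalates to strategic."""

    def __init__(self, config: Optional[TriggerConfig] = None) -> None:
        self.config = config or TriggerConfig()
        self.stalled = 0  # steps in a row without task progress

    def should_escalate(self, s: Signals) -> bool:
        # Track how long the task has gone without progress.
        self.stalled = 0 if s.progress_delta > 0 else self.stalled + 1
        c = self.config
        return (
            s.failed
            or s.visual_change > c.visual_change_max
            or s.repeated_actions >= c.max_repeats
            or self.stalled >= c.stall_steps
        )


def run_episode(signals: List[Signals], trigger: EventTrigger) -> List[str]:
    """Return which controller handles each step of the episode."""
    have_plan = False
    trace = []
    for s in signals:
        if not have_plan or trigger.should_escalate(s):
            trace.append("strategic")  # expensive: plan, analyze, recover
            have_plan = True
            trigger.stalled = 0        # a fresh plan resets the stall counter
        else:
            trace.append("reactive")   # cheap: follow the cached plan
    return trace


if __name__ == "__main__":
    ok = Signals(0.1, 0.1, 0, False)
    stuck = Signals(0.1, 0.0, 3, False)
    print(run_episode([ok, ok, ok, stuck, ok], EventTrigger()))
```

In this toy run the strategic controller is called only on the first step and on the step where the same action has repeated three times; every other step reuses the cached plan, which is where the token and latency savings would come from under these assumptions.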
SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents
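1️⃣ One-sentence summary
This paper proposes SPIKE, an adaptive dual controller framework in which a low-frequency Strategic Controller handles global planning and failure recovery, a high-frequency Reactive Controller handles fast execution, and an Event Trigger switches between the two as needed, substantially reducing compute cost and latency on long-horizon tasks in open-world games while improving task success rate.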