arXiv submission date: 2026-03-23
📄 Abstract - Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure

Autonomous agents operating in continuous environments must decide not only what to do, but when to act. We introduce a lightweight adaptive temporal control system that learns the optimal interval between cognitive ticks from experience, replacing ad hoc biologically inspired timers with a principled learned policy. The policy state is augmented with a predictive hyperbolic spread signal (shorthand: "curvature signal") derived from hyperbolic geometry: the mean pairwise Poincaré distance among n sampled futures embedded in the Poincaré ball. High spread indicates a branching, uncertain future and drives the agent to act sooner; low spread signals predictability and permits longer rest intervals. We further propose an interval-aware reward that explicitly penalises inefficiency relative to the chosen wait time, correcting a systematic credit-assignment failure of naive outcome-based rewards in timing problems. We additionally introduce a joint spatio-temporal embedding (ATCPG-ST) that concatenates independently normalised state and position projections in the Poincaré ball; spatial trajectory divergence provides an independent timing signal unavailable to the state-only variant (ATCPG-SO). This extension raises mean hyperbolic spread (kappa) from 1.88 to 3.37 and yields a further 5.8 percent efficiency gain over the state-only baseline. Ablation experiments across five random seeds demonstrate that (i) learning is the dominant efficiency factor (a 54.8 percent gain over no learning), (ii) hyperbolic spread provides a significant complementary gain (26.2 percent over a geometry-free control), (iii) the combined system achieves a 22.8 percent efficiency gain over the fixed-interval baseline, and (iv) adding spatial position information to the spread embedding yields an additional 5.8 percent.
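The hyperbolic spread signal described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the n sampled futures are assumed to be plain feature vectors, they are mapped into the unit Poincaré ball (curvature -1) with the exponential map at the origin, and the helper names `expmap0`, `poincare_distance`, `hyperbolic_spread`, and `joint_embedding` are hypothetical. The ATCPG-ST variant is approximated by rescaling the state and position blocks independently by their mean norm before concatenating them; the abstract does not specify the exact normalisation or projection.

```python
import itertools

import numpy as np


def expmap0(x, eps=1e-9):
    """Assumed embedding: exponential map at the origin of the unit Poincare ball."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    if norm < eps:
        return np.zeros_like(x)
    # tanh(norm) < 1, so the result lies inside the open unit ball.
    return np.tanh(norm) * x / norm


def poincare_distance(u, v, eps=1e-9):
    """Standard Poincare-ball distance: arcosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    # Clamp the denominator in case a point rounds onto the boundary.
    return float(np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps)))


def hyperbolic_spread(futures):
    """Mean pairwise Poincare distance (kappa) among sampled future vectors."""
    points = [expmap0(f) for f in futures]
    pairs = list(itertools.combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(poincare_distance(u, v) for u, v in pairs) / len(pairs)


def joint_embedding(states, positions, eps=1e-9):
    """Hypothetical ATCPG-ST-style vectors: normalise each block independently, then concatenate."""
    states = np.asarray(states, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Divide each block by its own mean row norm so neither block dominates.
    s = states / (np.linalg.norm(states, axis=1).mean() + eps)
    p = positions / (np.linalg.norm(positions, axis=1).mean() + eps)
    return np.concatenate([s, p], axis=1)


# Identical predicted states, but the agent's position trajectories diverge.
states = [[1.0, 0.0]] * 3
positions = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(hyperbolic_spread(states))                              # state-only: 0.0
print(hyperbolic_spread(joint_embedding(states, positions)))  # joint: positive
```

In this toy case the state-only spread is zero while the joint embedding gives a positive spread, mirroring the claim that spatial divergence supplies a timing signal the state-only variant cannot see. In the described system, kappa is added to the learned policy's state rather than converted into an interval by a fixed rule.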

Top-level tags: reinforcement learning, agents, theory
Detailed tags: temporal control, adaptive timing, hyperbolic geometry, credit assignment, decision timing

Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure


1️⃣ One-sentence summary

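This paper proposes a new decision-making method for agents that decides not only "what to do" but also learns, on its own, the best moment for "when to act" by predicting how uncertain future states are, which substantially improves action efficiency.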

Source: arXiv:2603.22384