Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control
1️⃣ One-Sentence Summary
This paper proposes a new method called the Test-Time Control (TTC) layer, which models the reasoning process as an optimal control problem and pairs it with a hardware-efficient solver. Integrated as a plug-in into large language models, it significantly improves performance on complex tasks such as mathematical reasoning.
Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. While prior work uses reinforcement learning or test-time training, planning remains external to the model architecture. We formulate reasoning as optimal control and introduce the Test-Time Control (TTC) layer, which performs finite-horizon LQR planning over latent states at inference time, represents a value function within the neural architecture, and uses it as a nested objective to enable planning before prediction. To ensure scalability, we derive a hardware-efficient LQR solver based on a symplectic formulation and implement it as a fused CUDA kernel, enabling parallel execution with minimal overhead. Integrated as an adapter into pretrained LLMs, TTC layers improve mathematical reasoning accuracy by up to +27.8% on MATH-500 and yield 2-3x Pass@8 gains on AMC and AIME, demonstrating that embedding optimal control as an architectural component provides an effective and scalable mechanism for reasoning beyond test-time training.
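The abstract's core primitive is finite-horizon LQR planning over latent states. To make that concrete, here is a minimal NumPy sketch of the standard backward Riccati recursion that a finite-horizon LQR solver implements; it is an illustration of the textbook algorithm under assumed toy dynamics, not the paper's symplectic formulation or fused CUDA kernel, and all names and matrices below are illustrative.

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, Qf, T):
    """Backward Riccati recursion for discrete-time finite-horizon LQR.

    Dynamics: x_{t+1} = A x_t + B u_t
    Cost: sum_t (x_t' Q x_t + u_t' R u_t) + x_T' Qf x_T
    Returns per-step feedback gains K_t so that u_t = -K_t x_t.
    """
    P = Qf
    gains = []
    for _ in range(T):
        # Gain: K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[t] is the gain applied at step t
    return gains

# Toy example: regulate a 2-state double-integrator "latent state" to zero.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gains = lqr_finite_horizon(A, B, Q, R, Qf=10 * np.eye(2), T=5)

x = np.array([1.0, 0.0])
for K in gains:
    u = -K @ x          # closed-loop control from the planned gains
    x = A @ x + B @ u   # roll the dynamics forward one step
```

Each step of the recursion is a small batch of matrix products and one linear solve, which is why the paper can fuse the whole horizon into a single GPU kernel and run it with minimal overhead per token.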
Source: arXiv:2603.09221