World-Task Factorization for Robot Learning
1️⃣ One-sentence summary
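This paper proposes a structured method that explicitly separates the "world model" (the properties of the environment and of the robot itself) from the "task logic" (the specific goal to be achieved) in robot learning, using a differentiable compositional graph (AICON) to carry world information and a small policy network to learn the task, so that robots can generalize zero-shot across tasks, environments, and hardware without retraining.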
Robot learning must produce policies that generalize to new combinations of constraints, teammates, and environments. To achieve this, we must structurally factor the policy, a choice that dictates what generalizes, what requires retraining, and what remains entangled. Existing methods span a wide spectrum, from expecting structure to emerge from data scaling to hand-designing it via hierarchies, skill libraries, or learned specializations. In this paper, we study what we argue is the most fundamental factorization in robotics: separating the world from the task. We investigate the conditions under which this factorization is principled. World factors are properties of the embodied system and the environment; they exist independently of intent. Task factors are defined by the task's logic over what the world admits. We formalize this asymmetry through Bayesian model evidence: the factorization aligns with the data-generating process, maintains high likelihood through an analytical world model, and reduces the Occam's razor penalty on task parameters. We instantiate this factorization by pairing two components. The first is AICON, a differentiable graph of recursive estimators and interconnections that is compositional, operates without task-specific data, and propagates cost gradients to actuators. The second is a compact, learned policy that modulates gradient paths. Gradients serve as the interface between the two factors: they carry world structure through the graph and task structure through costs, enabling low-dimensional learning while preserving structural generalization. We test the world/task factorization across three problems that encompass heterogeneous robots, environments, task logic, and sensorimotor modalities. Our framework outperforms end-to-end baselines and analytical heuristics in all settings, generalizes zero-shot to out-of-distribution configurations, and transfers to real hardware without retraining.
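The idea of gradients as the interface between world and task can be illustrated with a toy example. The sketch below is not AICON and is not taken from the paper: it assumes a 2D point robot whose action is a position increment, two hand-written analytical cost gradients standing in for the world side, and a four-parameter sigmoid gate standing in for the compact policy. All function names and the parameter vector `theta` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def goal_gradient(pos, goal):
    """World side: gradient of 0.5 * ||pos - goal||^2 with respect to pos."""
    return [pos[0] - goal[0], pos[1] - goal[1]]

def obstacle_gradient(pos, obstacle, radius=1.0):
    """World side: gradient of 0.5 * max(0, radius - d)^2 with respect to pos,
    where d is the distance from pos to the obstacle centre."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d >= radius or d == 0.0:
        return [0.0, 0.0]
    scale = -(radius - d) / d
    return [scale * dx, scale * dy]

def gate_policy(theta, distance_to_obstacle):
    """Task side (hypothetical): one sigmoid gate per gradient path.
    theta holds four numbers that would be learned in a real system."""
    w_goal = sigmoid(theta[0] + theta[1] * distance_to_obstacle)
    w_obs = sigmoid(theta[2] + theta[3] * distance_to_obstacle)
    return [w_goal, w_obs]

def step(pos, goal, obstacle, theta, step_size=0.1, radius=1.0):
    """One control step: descend the gate-weighted sum of cost gradients.
    Assumes the action moves the robot directly (identity actuator model)."""
    d = math.hypot(pos[0] - obstacle[0], pos[1] - obstacle[1])
    w_goal, w_obs = gate_policy(theta, d)
    g_goal = goal_gradient(pos, goal)
    g_obs = obstacle_gradient(pos, obstacle, radius)
    return [pos[i] - step_size * (w_goal * g_goal[i] + w_obs * g_obs[i])
            for i in range(2)]

def rollout(pos, goal, obstacle, theta, n_steps=50):
    """Apply `step` repeatedly and return the final position."""
    for _ in range(n_steps):
        pos = step(pos, goal, obstacle, theta)
    return pos
```

In this toy setup the gradient functions know nothing about the task, and the gate parameters know nothing about geometry. Changing `theta` alters behaviour (for example, driving the obstacle gate toward zero ignores the obstacle) without touching the world-side functions, which is the kind of separation the abstract describes, reduced here to a few lines under the stated assumptions.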
Source: arXiv:2606.02027