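📄 Paper Summary
Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution
1️⃣ One-Sentence Summary
This paper introduces ORS3D, a new task that combines language understanding, 3D grounding, and efficiency optimization. It also builds the large-scale dataset ORS3D-60K and develops the GRANT model, which helps embodied agents minimize total task completion time by executing subtasks in parallel (for example, cleaning the sink while the microwave runs).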
Task scheduling is critical for embodied AI, enabling agents to follow natural language instructions and execute actions efficiently in 3D physical worlds. However, existing datasets often simplify task planning by ignoring operations research (OR) knowledge and 3D spatial grounding. In this work, we propose Operations Research knowledge-based 3D Grounded Task Scheduling (ORS3D), a new task that requires the synergy of language understanding, 3D grounding, and efficiency optimization. Unlike prior settings, ORS3D demands that agents minimize total completion time by leveraging parallelizable subtasks, e.g., cleaning the sink while the microwave operates. To facilitate research on ORS3D, we construct ORS3D-60K, a large-scale dataset comprising 60K composite tasks across 4K real-world scenes. Furthermore, we propose GRANT, an embodied multi-modal large language model equipped with a simple yet effective scheduling token mechanism to generate efficient task schedules and grounded actions. Extensive experiments on ORS3D-60K validate the effectiveness of GRANT across language understanding, 3D grounding, and scheduling efficiency. The code is available at this https URL
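The efficiency gain from parallelizable subtasks can be illustrated with a small calculation. The sketch below is not GRANT or the ORS3D benchmark code; it is a toy model under simplifying assumptions that are ours, not the paper's: starting an unattended subtask takes negligible time, all unattended subtasks can run at once, and moving between objects costs nothing. The names `Subtask`, `sequential_time`, and `parallel_time`, as well as the durations, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    duration: float   # minutes (illustrative values, not from the dataset)
    unattended: bool  # True if it keeps running once started, e.g. a microwave


def sequential_time(subtasks):
    """Total time when the agent performs every subtask one after another."""
    return sum(t.duration for t in subtasks)


def parallel_time(subtasks):
    """Total time when all unattended subtasks are started first and the agent
    works through the attended ones while they run (toy assumptions above)."""
    attended = sum(t.duration for t in subtasks if not t.unattended)
    longest_unattended = max(
        (t.duration for t in subtasks if t.unattended), default=0.0
    )
    # The schedule ends when both the agent's own work and the longest
    # background subtask have finished.
    return max(attended, longest_unattended)


tasks = [
    Subtask("run the microwave", 5.0, unattended=True),
    Subtask("clean the sink", 3.0, unattended=False),
    Subtask("wipe the table", 4.0, unattended=False),
]

print(sequential_time(tasks))  # 12.0
print(parallel_time(tasks))    # 7.0
```

In this toy case the agent saves five minutes by doing the cleaning while the microwave runs. The task described in the abstract is harder than this, since the agent must also interpret the instruction and ground each action to a target object in the 3D scene.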