Training the Orchestrator: A Supervised Approach to End-to-End PDDL Planning with LLM Agents

📄 Abstract

Translating natural-language planning intent into verified plans is a longstanding challenge: people communicate goals in language, while classical planners require formal PDDL specifications. Recent agentic frameworks bridge this gap by orchestrating a pool of specialized repair agents inside a verifier-checked refinement loop, but the orchestrator at the centre is itself a prompted frontier LLM, which incurs a frontier-LLM API call at every refinement step. We present HALO (Hybrid Agent-Learned Orchestrator), which trains the orchestrator on refinement trajectories, drawn from 11 PDDL domains, that an external verifier has certified as ending in valid plans. HALO pairs a small QLoRA-tuned policy with three hardcoded rules for trivially decidable selections, and operates over an expanded 21-agent action space. Whereas prior approaches prompt a frontier LLM at every step or learn an orchestrator from sparse end-of-episode rewards, our key observation is that the verifier already provides strong guidance: every accepted trajectory is a sequence of demonstrably correct (state, agent) decisions, directly usable as supervision. Across PlanBench, Natural Plan, and classical planning benchmarks, HALO matches or exceeds the GPT-5-mini prompted baseline on success rate, sits within three percentage points of the stronger Gemini-3-Flash prompted baseline, reduces orchestration cost by more than an order of magnitude (\$0.18 to \$0.004 per task against GPT-5-mini, roughly 45$\times$ cheaper; roughly 15$\times$ cheaper than Gemini-3-Flash), and cuts total LLM calls per episode by 40 to 50 percent.
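The supervision idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Step` and `Trajectory` types, the string-valued state, and the function names `build_sft_examples` and `select_agent` are all hypothetical, introduced only to show how verifier-accepted trajectories might be flattened into (state, agent) training pairs, and how hardcoded rules might take precedence over a learned policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    state: str  # hypothetical text summary of the refinement state
    agent: str  # name of the repair agent chosen at this step


@dataclass(frozen=True)
class Trajectory:
    steps: Tuple[Step, ...]
    verified: bool  # True if the external verifier accepted the final plan


def build_sft_examples(trajectories: Sequence[Trajectory]) -> List[Tuple[str, str]]:
    """Keep only verifier-accepted trajectories; flatten to (state, agent) pairs."""
    return [(s.state, s.agent) for t in trajectories if t.verified for s in t.steps]


# A rule returns an agent name when it can decide, otherwise None (assumption).
Rule = Callable[[str], Optional[str]]


def select_agent(state: str, rules: Sequence[Rule], policy: Callable[[str], str]) -> str:
    """Try the hardcoded rules first; fall back to the learned policy."""
    for rule in rules:
        choice = rule(state)
        if choice is not None:
            return choice
    return policy(state)
```

Under these assumptions, every pair returned by `build_sft_examples` comes from a trajectory that ended in a verified plan, so it can serve as a supervised label without a separate reward model. The rule-first ordering in `select_agent` means the learned policy is only queried when no rule applies.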


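1️⃣ One-sentence summary

This paper proposes HALO, a method that uses verifier-certified trajectories of correct decisions as supervision to train a small language model as the orchestrator, replacing an expensive frontier LLM in coordinating multiple specialized repair agents; it maintains or even improves planning success rate while cutting orchestration cost by tens of times, offering a practical route to efficient and reliable end-to-end formal planning.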
