
arXiv submission date: 2026-05-27
📄 Abstract - Tool Forge: A Validation-Carrying Toolchain for Governed Agentic Execution

Large language model agents are increasingly expected to perform operational work: calling APIs, manipulating files, assembling workflows, and acting inside enterprise systems. Yet the tool layer on which this execution depends is still commonly treated as either a hand-written integration artifact or a static list of schemas exposed to a model. This paper introduces Tool Forge, a validation-carrying toolchain for converting natural-language capability intent into governed, sandbox-verified, cataloged tool artifacts and exposing those artifacts to agents through a token-efficient routing layer. Tool Forge treats a tool as a capsule containing intent, capability contract, implementation, dependency policy, tests, documentation, runtime validation evidence, lifecycle state, credential bindings, and routing metadata. It also introduces a Router that exposes intent-scoped tool sessions instead of loading full catalog schemas into the model context. We describe the system architecture, validation pipeline, MCP-facing routing model, governance controls, and initial reproducible benchmarks from the open-source implementation. Across 83 Router benchmark cases, Tool Forge Router achieves aggregate micro-F1 of 0.901 while reducing estimated task-flow tool context by 99.2% relative to naive full-catalog schema exposure. In a 25-case end-to-end generation probe over local-tool tasks, Tool Forge generates 25 of 25 tool bundles, reaches micro-F1 of 0.940 against deterministic acceptance checks, and passes 23 of 25 live sandbox validations. These results are presented as an initial systems benchmark, not as a state-of-the-art claim. The paper identifies remaining challenges in adversarial routing, broader API grounding, sandbox isolation, and cross-system evaluation.
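The capsule idea can be made concrete with a small data structure. The following is a minimal Python sketch, not Tool Forge's actual schema or API. The field names mirror the ten capsule components listed in the abstract. The `LifecycleState` values, the `ValidationEvidence` shape, and the `promote`/`routable` gating rule are hypothetical assumptions added for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class LifecycleState(Enum):
    # Hypothetical lifecycle stages; the abstract does not enumerate Tool Forge's states.
    DRAFT = "draft"
    VALIDATED = "validated"
    PUBLISHED = "published"
    DEPRECATED = "deprecated"


@dataclass
class ValidationEvidence:
    # Result of running one declared test in the sandbox (hypothetical shape).
    test_name: str
    passed: bool


@dataclass
class ToolCapsule:
    intent: str                    # natural-language capability intent
    capability_contract: dict      # e.g. input/output schema
    implementation: str            # tool source code
    dependency_policy: list[str]   # packages the tool may import
    tests: list[str]               # names of declared tests
    documentation: str
    validation_evidence: list[ValidationEvidence] = field(default_factory=list)
    lifecycle_state: LifecycleState = LifecycleState.DRAFT
    credential_bindings: dict[str, str] = field(default_factory=dict)  # secret name -> reference
    routing_metadata: dict[str, list[str]] = field(default_factory=dict)

    def sandbox_passed(self) -> bool:
        # True only if every declared test has at least one passing evidence record.
        passed = {e.test_name for e in self.validation_evidence if e.passed}
        return bool(self.tests) and all(t in passed for t in self.tests)

    def promote(self) -> None:
        # Move DRAFT -> VALIDATED only when the sandbox evidence covers all tests.
        if self.lifecycle_state is LifecycleState.DRAFT and self.sandbox_passed():
            self.lifecycle_state = LifecycleState.VALIDATED

    def routable(self) -> bool:
        # Hypothetical rule: only validated or published capsules are exposed to the Router.
        return self.lifecycle_state in (LifecycleState.VALIDATED, LifecycleState.PUBLISHED)


capsule = ToolCapsule(
    intent="Count the lines in a local text file",
    capability_contract={"input": {"path": "string"}, "output": {"lines": "integer"}},
    implementation="def count_lines(path): ...",
    dependency_policy=[],
    tests=["test_empty_file", "test_three_lines"],
    documentation="Returns the number of lines in a file.",
)
capsule.validation_evidence.append(ValidationEvidence("test_empty_file", True))
capsule.promote()
print(capsule.routable())  # False: test_three_lines has no passing evidence yet
capsule.validation_evidence.append(ValidationEvidence("test_three_lines", True))
capsule.promote()
print(capsule.routable())  # True
```

In this sketch the gating check lives on the capsule itself. That lets a catalog refuse to expose any artifact whose validation evidence is incomplete, which is one plausible reading of "validation-carrying".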

Top-level tags: llm agents systems
Detailed tags: tool orchestration governance validation pipeline routing benchmark

Tool Forge: A Validation-Carrying Toolchain for Governed Agentic Execution


1️⃣ One-sentence summary

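This paper proposes Tool Forge, a system that turns operational requirements written in natural language into validated, sandbox-tested tool bundles. Its router exposes to the AI agent only the tools relevant to a given intent, which cuts estimated tool-context size by 99.2% compared with loading the full catalog, while reaching a routing micro-F1 of 0.901 in initial benchmarks.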
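The idea of loading only the relevant tools, and the two metrics used to judge it, can be shown with a toy sketch. It rests on several assumptions. `open_session` uses a hypothetical keyword-overlap scorer as a stand-in for whatever selection method the Router actually uses. The catalog entries and token counts are made up. `micro_f1` is the standard pooled F1 over predicted versus expected tool sets. The abstract does not say exactly how Tool Forge computes its F1 or its context estimate.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    keywords: frozenset[str]  # hypothetical routing metadata
    schema_tokens: int        # assumed size of the tool's full schema, in model tokens


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def open_session(intent: str, catalog: list[CatalogEntry], k: int = 2) -> list[CatalogEntry]:
    # Rank tools by keyword overlap with the intent; keep at most k with a nonzero score.
    words = tokenize(intent)
    scored = [(len(words & e.keywords), e) for e in catalog]
    scored = [(s, e) for s, e in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))  # ties broken by name
    return [e for _, e in scored[:k]]


def micro_f1(cases: list[tuple[set[str], set[str]]]) -> float:
    # Each case is (predicted tool names, expected tool names); counts are pooled over all cases.
    tp = sum(len(pred & gold) for pred, gold in cases)
    fp = sum(len(pred - gold) for pred, gold in cases)
    fn = sum(len(gold - pred) for pred, gold in cases)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def context_reduction(session: list[CatalogEntry], catalog: list[CatalogEntry]) -> float:
    # Fraction of schema tokens saved versus exposing every tool in the catalog.
    full = sum(e.schema_tokens for e in catalog)
    return 1 - sum(e.schema_tokens for e in session) / full


catalog = [
    CatalogEntry("read_csv", frozenset({"csv", "read", "table"}), 400),
    CatalogEntry("send_email", frozenset({"email", "send", "mail"}), 350),
    CatalogEntry("resize_image", frozenset({"image", "resize", "png"}), 300),
    CatalogEntry("count_lines", frozenset({"count", "lines", "file"}), 250),
]
session = open_session("Read the CSV file and count its lines", catalog)
print([e.name for e in session])                     # ['count_lines', 'read_csv']
print(f"{context_reduction(session, catalog):.1%}")  # 50.0%

cases = [
    ({"count_lines", "read_csv"}, {"count_lines", "read_csv"}),
    ({"send_email"}, {"send_email", "read_csv"}),
]
print(round(micro_f1(cases), 3))                     # 0.857
```

Because `k` stays fixed while the catalog grows, the saved fraction rises with catalog size. The 50% here reflects only this four-entry toy catalog and should not be compared with the 99.2% reported for the paper's benchmark.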

Source: arXiv 2605.28000