SKILLS: Structured Knowledge Injection for LLM-Driven Telecommunications Operations
1️⃣ One-Sentence Summary
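This paper introduces SKILLS, a benchmark framework, and shows experimentally that injecting structured telecom-domain knowledge (such as workflow logic and API specifications) into general-purpose large language models significantly improves the accuracy and reliability with which they carry out automated tasks in realistic telecom operations scenarios.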
As telecommunications operators accelerate adoption of AI-enabled automation, a practical question remains unresolved: can general-purpose large language model (LLM) agents reliably execute telecom operations workflows through real API interfaces, or do they require structured domain guidance? We introduce SKILLS (Structured Knowledge Injection for LLM-driven Service Lifecycle operations), a benchmark framework comprising 37 telecom operations scenarios spanning 8 TM Forum Open API domains (TMF620, TMF621, TMF622, TMF628, TMF629, TMF637, TMF639, TMF724). Each scenario is grounded in live mock API servers with seeded production-representative data, MCP tool interfaces, and deterministic evaluation rubrics combining response content checks, tool-call verification, and database state assertions. We evaluate open-weight models under two conditions: baseline (generic agent with tool access but no domain guidance) and with-skill (agent augmented with a portable skill document encoding workflow logic, API patterns, and business rules). Results across 5 open-weight model conditions and 185 scenario-runs show consistent skill lift across all models. MiniMax M2.5 leads (81.1% with-skill, +13.5pp), followed by Nemotron 120B (78.4%, +18.9pp), GLM-5 Turbo (78.4%, +5.4pp), and Seed 2.0 Lite (75.7%, +18.9pp).
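The abstract describes a deterministic rubric that combines response content checks, tool-call verification, and database state assertions, and reports results as a with-skill pass rate plus a lift in percentage points. A minimal sketch of how such a rubric and lift calculation might look follows. Everything named here (`RunRecord`, `Rubric`, the `create_trouble_ticket` tool name, the `ticket_status` field) is hypothetical and not taken from the paper. The sketch also assumes each scenario is scored pass/fail, and the counts of 25 and 30 passes out of 37 are illustrative only; the source does not state raw counts.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical record of one scenario-run (not the paper's schema)."""
    response_text: str
    tool_calls: list[str]
    db_state: dict[str, str]


@dataclass
class Rubric:
    """Hypothetical rubric: a run passes only if all three check families pass."""
    required_phrases: list[str] = field(default_factory=list)
    required_tool_calls: list[str] = field(default_factory=list)
    expected_db_state: dict[str, str] = field(default_factory=dict)

    def passes(self, run: RunRecord) -> bool:
        # Response content check: every required phrase appears (case-insensitive).
        text = run.response_text.lower()
        content_ok = all(p.lower() in text for p in self.required_phrases)
        # Tool-call verification: every required tool was invoked at least once.
        tools_ok = all(t in run.tool_calls for t in self.required_tool_calls)
        # Database state assertion: every expected key holds the expected value.
        state_ok = all(
            run.db_state.get(k) == v for k, v in self.expected_db_state.items()
        )
        return content_ok and tools_ok and state_ok


def pass_rate(results: list[bool]) -> float:
    """Percentage of scenario-runs that passed."""
    return 100.0 * sum(results) / len(results)


def skill_lift_pp(baseline: list[bool], with_skill: list[bool]) -> float:
    """Lift in percentage points: with-skill pass rate minus baseline pass rate."""
    return pass_rate(with_skill) - pass_rate(baseline)


# Illustrative rubric for a trouble-ticket style scenario (names are assumptions).
rubric = Rubric(
    required_phrases=["ticket"],
    required_tool_calls=["create_trouble_ticket"],
    expected_db_state={"ticket_status": "acknowledged"},
)
good = RunRecord(
    response_text="Created ticket TT-1 for the reported outage.",
    tool_calls=["create_trouble_ticket"],
    db_state={"ticket_status": "acknowledged"},
)
bad = RunRecord(
    response_text="Created ticket TT-1.",
    tool_calls=[],
    db_state={},
)

# Illustrative pass/fail vectors over 37 scenarios (assumed counts, not reported data).
baseline = [True] * 25 + [False] * 12
with_skill = [True] * 30 + [False] * 7

print(rubric.passes(good), rubric.passes(bad))
print(round(pass_rate(with_skill), 1), round(skill_lift_pp(baseline, with_skill), 1))
```

Requiring all three check families to pass is one design choice among several. It penalizes a run that produces a plausible-sounding answer without actually calling the tool or changing backend state, which is the failure mode that a response-only check would miss.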
Source: arXiv: 2603.15372