arXiv submission date: 2026-02-10
📄 Abstract - Routing, Cascades, and User Choice for LLMs

To mitigate the trade-offs between performance and costs, LLM providers route user tasks to different models based on task difficulty and latency. We study the effect of LLM routing with respect to user behavior. We propose a game between an LLM provider with two models (standard and reasoning) and a user who can re-prompt or abandon tasks if the routed model cannot solve them. The user's goal is to maximize their utility minus the delay from using the model, while the provider minimizes the cost of servicing the user. We solve this Stackelberg game by fully characterizing the user best response and simplifying the provider problem. We observe that in nearly all cases, the optimal routing policy involves a static policy with no cascading that depends on the expected utility of the models to the user. Furthermore, we reveal a misalignment gap between the provider-optimal and user-preferred routes when the user's and provider's rankings of the models with respect to utility and cost differ. Finally, we demonstrate conditions for extreme misalignment where providers are incentivized to throttle the latency of the models to minimize their costs, consequently depressing user utility. The results yield simple threshold rules for single-provider, single-user interactions and clarify when routing, cascading, and throttling help or harm.
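The misalignment gap described in the abstract can be illustrated with a small toy calculation. This is only a sketch under simplifying assumptions that are not taken from the paper: the provider uses a static route to a single model, the user re-prompts until the task is solved, each attempt succeeds independently with a fixed probability, and the provider only considers models the user would not abandon. The names `Model`, `user_utility`, `provider_cost`, `user_preferred`, and `provider_optimal`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    """Hypothetical model description; all fields are illustrative assumptions."""
    name: str
    p_success: float  # probability that one attempt solves the task
    delay: float      # latency the user pays per attempt
    cost: float       # cost the provider pays per attempt


def user_utility(m: Model, value: float) -> float:
    # With re-prompting until success, the expected number of attempts is
    # 1 / p_success, so the expected total delay is delay / p_success.
    return value - m.delay / m.p_success


def provider_cost(m: Model) -> float:
    # Expected serving cost under the same re-prompt-until-success assumption.
    return m.cost / m.p_success


def user_preferred(models: Sequence[Model], value: float) -> Model:
    # The route the user would pick: highest expected utility net of delay.
    return max(models, key=lambda m: user_utility(m, value))


def provider_optimal(models: Sequence[Model], value: float) -> Optional[Model]:
    # The cheapest static route among models the user would not abandon
    # (non-negative expected utility). Returns None if the user abandons all.
    feasible = [m for m in models if user_utility(m, value) >= 0]
    if not feasible:
        return None
    return min(feasible, key=provider_cost)


# Illustrative numbers only: the reasoning model is more reliable but
# slower per attempt and more expensive to serve.
standard = Model("standard", p_success=0.5, delay=1.0, cost=1.0)
reasoning = Model("reasoning", p_success=0.9, delay=1.5, cost=4.0)
models = [standard, reasoning]

user_choice = user_preferred(models, value=5.0)
provider_choice = provider_optimal(models, value=5.0)
print("user prefers:", user_choice.name)
print("provider routes to:", provider_choice.name if provider_choice else None)
```

With these made-up numbers the user would prefer the reasoning model while the provider's cheapest acceptable route is the standard model, which is the kind of ranking disagreement the abstract calls a misalignment gap. The paper's actual game, including cascading and latency throttling, is richer than this toy.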

Top-level tags: llm, systems, theory
Detailed tags: routing, game theory, cost-performance tradeoff, user behavior, Stackelberg game

Routing, Cascades, and User Choice for LLMs


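1️⃣ One-sentence summary

This paper uses a game-theoretic model to analyze how LLM providers trade off cost against performance. It finds that the optimal routing policy is usually static, and it shows that when the provider and the user rank the models differently with respect to utility and cost, their interests become misaligned. That misalignment can even push the provider to deliberately slow down service to cut costs, which degrades the user experience.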

Source: arXiv:2602.09902