arXiv submission date: 2026-02-24
📄 Abstract - From Performance to Purpose: A Sociotechnical Taxonomy for Evaluating Large Language Model Utility

As large language models (LLMs) continue to improve at completing discrete tasks, they are being integrated into increasingly complex and diverse real-world systems. However, task-level success alone does not establish a model's fit for use in practice. In applied, high-stakes settings, LLM effectiveness is driven by a wider array of sociotechnical determinants that extend beyond conventional performance measures. Although a growing set of metrics capture many of these considerations, they are rarely organized in a way that supports consistent evaluation, leaving no unified taxonomy for assessing and comparing LLM utility across use cases. To address this gap, we introduce the Language Model Utility Taxonomy (LUX), a comprehensive framework that structures utility evaluation across four domains: performance, interaction, operations, and governance. Within each domain, LUX is organized hierarchically into thematically aligned dimensions and components, each grounded in metrics that enable quantitative comparison and alignment of model selection with intended use. In addition, an external dynamic web tool is provided to support exploration of the framework by connecting each component to a repository of relevant metrics (factors) for applied evaluation.
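The hierarchy described in the abstract (domains containing dimensions, dimensions containing components, components grounded in metrics) can be pictured as a small nested data structure. The following is only a minimal sketch under stated assumptions: the four domain names are taken from the abstract, while the class names, helper functions, and every dimension, component, and metric name are hypothetical placeholders and are not drawn from the paper or its web tool.

```python
from dataclasses import dataclass, field

# The four domain names come from the abstract. Everything below the
# domain level (dimension, component, and metric names) is a hypothetical
# placeholder used for illustration, not content from the paper.
DOMAINS = ("performance", "interaction", "operations", "governance")


@dataclass
class Component:
    name: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Dimension:
    name: str
    components: list[Component] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    dimensions: list[Dimension] = field(default_factory=list)


def build_empty_taxonomy() -> dict[str, Domain]:
    """Return one empty Domain per domain named in the abstract."""
    return {name: Domain(name) for name in DOMAINS}


def metrics_for(taxonomy: dict[str, Domain], domain: str) -> list[str]:
    """Flatten every metric under a domain: dimensions, then components."""
    return [
        metric
        for dimension in taxonomy[domain].dimensions
        for component in dimension.components
        for metric in component.metrics
    ]


taxonomy = build_empty_taxonomy()

# Hypothetical entries, added only to show how the levels nest.
taxonomy["operations"].dimensions.append(
    Dimension(
        "example_dimension",
        [Component("example_component", ["example_metric_a", "example_metric_b"])],
    )
)
```

Calling `metrics_for(taxonomy, "operations")` walks the three levels and returns the placeholder metric names in insertion order, which is roughly the kind of lookup a tool linking components to metrics might perform.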

Top-level tags: llm model evaluation systems
Detailed tags: evaluation framework sociotechnical systems utility taxonomy applied ai model selection

From Performance to Purpose: A Sociotechnical Taxonomy for Evaluating Large Language Model Utility


1️⃣ One-sentence summary

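This paper proposes LUX, a comprehensive evaluation framework that goes beyond conventional performance metrics to systematically assess the practical utility of large language models in complex real-world settings across four domains (performance, interaction, operations, and governance), helping users select a model suited to their specific application needs.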

Source: arXiv: 2602.20513