arXiv submission date: 2026-02-08
📄 Abstract - SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization

As large language models (LLMs) continue to scale up, their performance on various downstream tasks has significantly improved. However, evaluating their capabilities has become increasingly expensive, as performing inference on a large number of benchmark samples incurs high computational costs. In this paper, we revisit the model-item performance matrix and show that it exhibits sparsity, that representative items can be selected as anchors, and that the task of efficient benchmarking can be formulated as a sparse optimization problem. Based on these insights, we propose SparseEval, a method that, for the first time, adopts gradient descent to optimize anchor weights and employs an iterative refinement strategy for anchor selection. We utilize the representation capacity of an MLP to handle sparse optimization and propose the Anchor Importance Score and Candidate Importance Score to evaluate the value of each item for task-aware refinement. Extensive experiments demonstrate the low estimation error and high Kendall's $\tau$ of our method across a variety of benchmarks, showcasing its superior robustness and practicality in real-world scenarios. Code is available at {this https URL}.
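To make the sparse-optimization formulation more concrete, here is a minimal, self-contained sketch (an illustrative assumption, not the paper's implementation) that casts anchor selection as lasso-style regression on a toy model-item matrix: per-item weights are trained by gradient descent with an L1 sparsity penalty, and the items with the largest weights are kept as anchor candidates. All names and hyperparameters (`Y`, `lambda_l1`, `k`) are invented for the example; the paper's actual method additionally uses an MLP and importance-score-based iterative refinement, which are omitted here.

```python
import torch

torch.manual_seed(0)

# Toy model-item performance matrix: rows = models, columns = benchmark items.
# Entries stand in for each model's per-item correctness/accuracy.
n_models, n_items = 30, 200
Y = torch.rand(n_models, n_items)

# Target: each model's full-benchmark score (mean over all items).
full_scores = Y.mean(dim=1)                       # shape: (n_models,)

# Learnable per-item weights; the L1 penalty pushes most of them toward zero,
# so the few items with large weights play the role of anchors.
w = torch.zeros(n_items, requires_grad=True)
optimizer = torch.optim.Adam([w], lr=0.05)
lambda_l1 = 1e-3                                  # sparsity strength (illustrative value)

for step in range(2000):
    optimizer.zero_grad()
    est_scores = Y @ w                            # weighted-anchor estimate per model
    mse = torch.mean((est_scores - full_scores) ** 2)
    loss = mse + lambda_l1 * w.abs().sum()
    loss.backward()
    optimizer.step()

# Keep the k items with the largest absolute weights as anchor candidates.
k = 20
anchor_idx = torch.topk(w.detach().abs(), k).indices
print("selected anchor items:", sorted(anchor_idx.tolist()))
print("estimation MSE on toy data:", mse.item())
```

In a real setting, `Y` would hold per-item results of already-evaluated models, and a new model would then only be run on the selected anchor items to estimate its full-benchmark score.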

Top-level tags: llm model evaluation benchmark
Detailed tags: efficient evaluation sparse optimization anchor selection gradient descent performance estimation

SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization


1️⃣ One-sentence summary

This paper proposes a new method, SparseEval, that recasts LLM evaluation as a sparse optimization problem and uses gradient descent together with an iterative refinement strategy to select a small set of representative test items, sharply reducing the computational cost of evaluation while preserving its accuracy.
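As a minimal illustration of the two evaluation criteria mentioned in the abstract (estimation error and Kendall's $\tau$), the snippet below compares anchor-based score estimates against exhaustive-evaluation scores; the numbers are made-up toy values, not results from the paper.

```python
from scipy.stats import kendalltau

# Toy numbers only (not results from the paper): per-model scores from full
# evaluation vs. anchor-based estimates for five hypothetical models.
full_scores = [0.62, 0.55, 0.71, 0.48, 0.66]
estimated   = [0.60, 0.57, 0.70, 0.50, 0.64]

# Mean absolute estimation error: how far the cheap estimate is from the true score.
mae = sum(abs(e - f) for e, f in zip(estimated, full_scores)) / len(full_scores)

# Kendall's tau: do the two score lists rank the models in the same order?
tau, _ = kendalltau(estimated, full_scores)
print(f"mean absolute error: {mae:.3f}, Kendall's tau: {tau:.3f}")
```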

Source: arXiv:2602.07909