arXiv submission date: 2026-04-07
📄 Abstract - LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency

Large language model (LLM) evaluation platforms increasingly rely on pairwise human judgments. These data are noisy, sparse, and non-uniform, yet leaderboards are reported with limited uncertainty quantification. We study this as semiparametric inference for a low-rank latent score tensor observed through pairwise comparisons under Bradley-Terry-Luce-type models. This places LLM evaluation in a new tensor completion setting with structured observations, non-uniform sampling, and pairwise contrasts. Our target is a smooth functional $\psi(T^\star)$, including linear estimands such as ability gaps and nonlinear ones such as win probabilities. We derive the information operator on the low-rank tangent space, the efficient influence function, and the semiparametric efficiency bound, then construct a one-step debiased estimator with asymptotic normality. A central challenge is that the information operator is anisotropic and does not commute with the tangent-space projection, creating a bottleneck absent from isotropic models. We introduce a score-whitening method that equalizes local Fisher information and restores stable inference at the optimal sample-complexity scale. Our results provide a principled framework for uncertainty quantification in LLM evaluation and more broadly for inference on low-rank structures from pairwise data.
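To make the observation model concrete, here is a minimal illustrative sketch (not the paper's estimator) of the Bradley-Terry-Luce layer: latent scores $\theta$ determine pairwise win probabilities $\sigma(\theta_i - \theta_j)$, and a simple gradient-ascent fit recovers ability gaps from noisy pairwise outcomes. All function names and the synthetic setup are our own assumptions for illustration; the paper's contribution (low-rank tensor structure, one-step debiasing, score whitening) is not implemented here.

```python
import numpy as np

def btl_win_prob(theta_i, theta_j):
    """BTL win probability: P(i beats j) = sigma(theta_i - theta_j)."""
    return 1.0 / (1.0 + np.exp(-(theta_i - theta_j)))

def fit_btl(n_models, pairs, wins, lr=0.5, n_iters=500):
    """Fit latent scores by gradient ascent on the BTL log-likelihood.

    pairs: (N, 2) int array of (i, j) comparisons;
    wins:  (N,) float array, 1.0 if i beat j, else 0.0.
    """
    theta = np.zeros(n_models)
    for _ in range(n_iters):
        p = btl_win_prob(theta[pairs[:, 0]], theta[pairs[:, 1]])
        resid = wins - p                      # Bernoulli score (residual)
        grad = np.zeros(n_models)
        np.add.at(grad, pairs[:, 0], resid)   # d loglik / d theta_i
        np.add.at(grad, pairs[:, 1], -resid)  # d loglik / d theta_j
        theta += lr * grad / len(pairs)
        theta -= theta.mean()                 # fix translation invariance
    return theta

# Synthetic check: recover ability gaps from noisy pairwise comparisons.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, 0.0, -1.0])
pairs = rng.integers(0, 3, size=(5000, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]          # drop self-comparisons
p_true = btl_win_prob(true_theta[pairs[:, 0]], true_theta[pairs[:, 1]])
wins = (rng.random(len(pairs)) < p_true).astype(float)
theta_hat = fit_btl(3, pairs, wins)
```

Note the mean-centering step: BTL probabilities depend only on score differences, so the scores are identified only up to an additive constant, which is why the paper's estimands are contrasts such as ability gaps and win probabilities.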

Top-level tags: llm model evaluation theory
Detailed tags: tensor completion semiparametric inference low-rank models pairwise comparison uncertainty quantification

LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency


1️⃣ One-sentence summary

This paper proposes a new theoretical framework that models the sparse, noisy pairwise-comparison ranking data used in large language model evaluation as a low-rank tensor completion problem, and develops statistically efficient methods to quantify the uncertainty of the resulting evaluations.

Source: arXiv:2604.05460