STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction
1️⃣ One-sentence summary
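This paper proposes STAR, a new framework that combines data-driven statistical prediction with knowledge-driven agentic reasoning. Even when only a handful of test scores have been observed, it predicts the performance of large AI models more accurately and reliably, and it provides interpretable justification for each prediction.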
As comprehensive large model evaluation becomes prohibitively expensive, predicting model performance from limited observations has become essential. However, existing statistical methods struggle with pattern shifts, data sparsity, and lack of explanation, while pure LLM methods remain unreliable. We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning. STAR leverages specialized retrievers to gather external knowledge and embeds semantic features into Constrained Probabilistic Matrix Factorization (CPMF) to generate statistical expectations with uncertainty. A reasoning module guided by Expectation Violation Theory (EVT) then refines predictions through intra-family analysis, cross-model comparison, and credibility-aware aggregation, producing adjustments with traceable explanations. Extensive experiments show that STAR consistently outperforms all baselines on both score-based and rank-based metrics, delivering a 14.46% gain in total score over the strongest statistical method under extreme sparsity, with only 1-2 observed scores per test model.
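To make the idea of a "statistical expectation with uncertainty" concrete, the following is a minimal sketch, not the authors' CPMF. It assumes plain L2-regularized matrix factorization (the MAP estimate of unconstrained probabilistic matrix factorization) fitted on the observed entries of a model-by-benchmark score matrix, and it uses the spread across random restarts as a crude stand-in for uncertainty. The function names, hyperparameters, and toy data are all hypothetical; semantic features, the constraints in CPMF, and the EVT-guided reasoning step are not modeled.

```python
import numpy as np


def fit_mf(scores, mask, rank=2, lr=0.05, reg=0.1, steps=500, seed=0):
    """Fit scores ~= U @ V.T on observed entries only (hypothetical helper).

    scores: (n_models, n_benchmarks) array; values where mask is False are ignored.
    mask:   boolean array of the same shape, True where a score was observed.
    """
    rng = np.random.default_rng(seed)
    n_models, n_bench = scores.shape
    U = 0.1 * rng.standard_normal((n_models, rank))
    V = 0.1 * rng.standard_normal((n_bench, rank))
    for _ in range(steps):
        # Residual on observed cells; unobserved cells contribute nothing.
        err = np.where(mask, U @ V.T - scores, 0.0)
        grad_U = err @ V + reg * U
        grad_V = err.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T


def predict_with_uncertainty(scores, mask, n_runs=5, **kwargs):
    """Mean and standard deviation of predictions across random restarts."""
    preds = np.stack(
        [fit_mf(scores, mask, seed=s, **kwargs) for s in range(n_runs)]
    )
    return preds.mean(axis=0), preds.std(axis=0)


# Toy data (invented for illustration): 6 models x 5 benchmarks, rank-1 structure.
ability = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.85])
easiness = np.array([1.0, 0.9, 0.8, 0.7, 0.95])
scores = np.outer(ability, easiness)

# The last model plays the "test model" with only 2 observed scores.
mask = np.ones_like(scores, dtype=bool)
mask[-1, 2:] = False

mean, std = predict_with_uncertainty(scores, mask)
print("predicted scores for the sparse model:", np.round(mean[-1], 3))
print("restart spread for the sparse model:  ", np.round(std[-1], 3))
```

In a pipeline like the one the abstract describes, the expectation and its uncertainty would then be passed to the reasoning module for adjustment; that step depends on retrieved external knowledge and is left out of this sketch.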
Source: arXiv:2602.12143