Beyond ECE: Calibrated Size Ratio, Risk Assessment, and Confidence-Weighted Metrics
1️⃣ One-Sentence Summary
This paper argues that the standard calibration metric ECE fails to capture overconfidence risk, proposes a new Calibrated Size Ratio (CSR) to quantify that risk, and introduces confidence-weighted accuracy (cwA) and related metrics to additionally measure whether a model's confidences distinguish correct from incorrect predictions. Experiments show that the new methods better identify risky confidence outputs.
Confidence calibration has been dominated by the Expected Calibration Error (ECE), a linear metric that weights calibration error equally regardless of the confidence level at which it occurs. We show that ECE can remain small even under arbitrarily large overconfidence risk. We therefore propose the Calibrated Size Ratio (CSR), an interpretable metric that equals 1 under perfect calibration, and derive from it the risk probability $P_{\mathrm{risk}}$, which quantifies the statistical evidence for overconfidence. We further argue that overconfidence risk assessment must be complemented by a measure of discriminative value: whether the assigned confidences actively distinguish correct from incorrect predictions. We show that confidence-weighted accuracy ($\mathrm{cwA}$) is the natural such complement, and that confidence weighting extends to all standard classification metrics. In particular, we prove that the confidence-weighted AUC (cwAUC) captures calibration information that the classical AUC cannot. We validate the proposed indicators on several synthetic confidence distributions under multiple controlled calibration profiles and find that CSR separates risky from non-risky assignments. We also test the metrics on fifteen real datasets, with and without post-hoc calibration, and find that standard methods can yield risky confidence profiles.
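The abstract's central claim can be illustrated numerically: a model that assigns a flat 0.9 confidence to every prediction and happens to be 90% accurate achieves an ECE of essentially zero, yet its confidences carry no discriminative value. The sketch below computes the standard binned ECE and one plausible reading of confidence-weighted accuracy (correctness averaged with each prediction's confidence as its weight); the paper's exact cwA and CSR definitions are not given here, so `cw_accuracy` is an illustrative assumption, not the authors' formula.

```python
def ece(confs, correct, n_bins=10):
    # Standard Expected Calibration Error: partition predictions into
    # equal-width confidence bins and sum |bin accuracy - bin mean
    # confidence|, weighted by the fraction of samples in each bin.
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confs, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    total = len(confs)
    err = 0.0
    for b in bins:
        if b:
            acc = sum(ok for _, ok in b) / len(b)
            avg_conf = sum(c for c, _ in b) / len(b)
            err += len(b) / total * abs(acc - avg_conf)
    return err


def cw_accuracy(confs, correct):
    # One plausible reading of confidence-weighted accuracy (cwA):
    # correctness averaged with each prediction's confidence as its
    # weight. The paper's exact definition may differ.
    return sum(c * ok for c, ok in zip(confs, correct)) / sum(confs)


# Flat 0.9 confidence, 90% accurate: ECE is ~0, but cwA collapses to
# plain accuracy, revealing that the confidences discriminate nothing.
flat_confs, flat_correct = [0.9] * 10, [1] * 9 + [0]

# Higher confidence on the correct predictions: cwA exceeds plain
# accuracy, reflecting genuinely informative confidences.
disc_confs, disc_correct = [0.9, 0.9, 0.6, 0.6], [1, 1, 0, 0]
```

Under uniform confidences, cwA reduces exactly to plain accuracy, which is one way to see why a risk metric (CSR) and a discrimination metric (cwA) measure complementary properties.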
Source: arXiv: 2605.01796