Don't stop me now: Rethinking Validation Criteria for Model Parameter Selection
1️⃣ One-sentence summary
Through systematic experiments, this paper finds that when training neural networks, using validation-set accuracy (especially for early stopping) to select the best model parameters performs poorly; using the validation-set loss as the selection criterion instead yields more stable and better test performance.
Despite the extensive literature on training loss functions, the evaluation of generalization on the validation set remains underexplored. In this work, we conduct a systematic empirical and statistical study of how the validation criterion used for model selection affects test performance in neural classifiers, with attention to early stopping. Using fully connected networks on standard benchmarks under $k$-fold evaluation, we compare: (i) early stopping with patience and (ii) post-hoc selection over all epochs (i.e., no early stopping). Models are trained with cross-entropy, C-Loss, or PolyLoss; model parameters are selected on the validation set using either accuracy or one of the three loss functions, each considered independently. Three main findings emerge. (1) Early stopping based on validation accuracy performs worst, consistently selecting checkpoints with lower test accuracy than both loss-based early stopping and post-hoc selection. (2) Loss-based validation criteria yield comparable and more stable test accuracy. (3) Across datasets and folds, any single validation rule often underperforms the test-optimal checkpoint. Overall, the selected model typically achieves test-set performance statistically lower than the best performance across all epochs, regardless of the validation criterion. Our results suggest avoiding validation accuracy (in particular with early stopping) for parameter selection, favoring loss-based validation criteria.
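The two selection strategies the abstract compares can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes a recorded per-epoch validation metric and shows how early stopping with patience can pick an earlier checkpoint than post-hoc selection over all epochs.

```python
def early_stop_select(values, patience, mode="min"):
    """Epoch chosen by early stopping with patience.

    values: per-epoch validation metric (loss if mode='min', accuracy if 'max').
    Training is cut off once the metric fails to improve for `patience`
    consecutive epochs; the best epoch seen up to that point is selected.
    """
    sign = 1 if mode == "min" else -1
    best_epoch, best_val, wait = 0, sign * values[0], 0
    for epoch in range(1, len(values)):
        val = sign * values[epoch]
        if val < best_val:
            best_epoch, best_val, wait = epoch, val, 0
        else:
            wait += 1
            if wait >= patience:
                break  # stop training; later epochs are never seen
    return best_epoch


def posthoc_select(values, mode="min"):
    """Post-hoc selection: best epoch over the full training run."""
    sign = 1 if mode == "min" else -1
    return min(range(len(values)), key=lambda e: sign * values[e])


# Hypothetical noisy validation-loss curve that dips again late in training.
val_loss = [1.0, 0.8, 0.7, 0.72, 0.71, 0.6, 0.55, 0.56, 0.54, 0.58]
print(early_stop_select(val_loss, patience=2, mode="min"))  # → 2 (stops at the first plateau)
print(posthoc_select(val_loss, mode="min"))                 # → 8 (true minimum over all epochs)
```

The gap between the two outputs mirrors the paper's third finding: any fixed early-stopping rule can miss the checkpoint that is actually best over the whole run.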
Source: arXiv: 2602.22107