arXiv submission date: 2026-06-02
📄 Abstract - How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration

Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the estimate toward its right boundary. Early-stopping strategies avoid fixing such a range, but can be sensitive to score noise and prone to premature stopping. To address this, we propose an integrated triplet-based plateau-search algorithm that removes the number of trees from the direct TPE search space and still exploits information accumulated across HPO trials. The method adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag (OOB) score across a triplet of forest sizes and shifting this triplet accordingly. This yields an automated and user-interpretable procedure based on a tolerance parameter. We also provide a theoretical analysis: we relate the proposed relative OOB-score criterion to the gap between the current and limiting scores, and derive an asymptotic variance estimate for the corresponding OOB-based absolute relative difference. Experiments show that the selected number of trees can differ substantially from the common heuristic: for most classical benchmark datasets it is smaller, whereas for some high-dimensional bioinformatics datasets, such as Arcene and Dorothea, it is larger. The source code and reproducible experiments are available at this https URL.
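The triplet idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the shifting rule, the fixed `step`, the stopping condition, and the names `plateau_search`, `relative_change`, `score_fn`, and `toy_score` are all assumptions made for illustration. The sketch evaluates a score at three forest sizes, compares the relative changes between neighbours against a tolerance `tol`, and shifts the triplet to larger sizes until both changes fall below it.

```python
from typing import Callable, Dict, Tuple


def relative_change(prev: float, curr: float) -> float:
    """Absolute relative difference between two scores."""
    return abs(curr - prev) / max(abs(curr), 1e-12)


def plateau_search(
    score_fn: Callable[[int], float],
    start: int = 50,
    step: int = 50,
    tol: float = 1e-3,
    max_trees: int = 2000,
) -> Tuple[int, bool]:
    """Shift a triplet (n, n + step, n + 2*step) right until the score plateaus.

    Hypothetical rule: the score is treated as flat when both relative
    changes inside the triplet are at most `tol`. Returns the smallest size
    of the flat triplet and True, or the largest evaluated size and False
    if no plateau is found within `max_trees`.
    """
    cache: Dict[int, float] = {}

    def score(n: int) -> float:
        # Each forest size is scored once and reused by overlapping triplets.
        if n not in cache:
            cache[n] = score_fn(n)
        return cache[n]

    n = start
    while n + 2 * step <= max_trees:
        s1, s2, s3 = score(n), score(n + step), score(n + 2 * step)
        if relative_change(s1, s2) <= tol and relative_change(s2, s3) <= tol:
            return n, True
        n += step  # not flat yet: move the whole triplet to larger forests
    return max(cache, default=start), False


def toy_score(n_trees: int) -> float:
    """Synthetic, noise-free stand-in for an OOB score curve (an assumption)."""
    return 0.9 - 0.5 / n_trees


if __name__ == "__main__":
    print(plateau_search(toy_score, start=50, step=50, tol=1e-3))
```

In practice `score_fn` might refit a forest of the requested size and return its OOB score, for example by reading `oob_score_` from a scikit-learn `RandomForestClassifier` built with `oob_score=True`. Real OOB scores are noisy, which is the sensitivity the abstract attributes to simple early-stopping rules, so a tolerance chosen for this noise-free toy curve would not transfer directly.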

Top-level tags: machine learning, model training
Detailed tags: random forest, hyperparameter optimization, plateau search, Optuna, ensemble size

How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration


1️⃣ One-sentence summary

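This paper proposes a method that automatically determines a near-minimal sufficient number of trees for a random forest by monitoring relative changes in the out-of-bag (OOB) score and adjusting the forest size dynamically, which avoids the tendency of conventional tuning methods to push the estimate toward the upper bound of the search range because the score improves monotonically with ensemble size; experiments show that the selected number of trees is smaller than the common heuristic on most classical benchmark datasets and larger on some high-dimensional bioinformatics datasets.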

Source: arXiv: 2606.03549