Sampling-guided exploration of active feature selection policies
1️⃣ One-sentence summary
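This paper proposes a new method that combines heuristic sampling with policy simplification, allowing machine learning models to dynamically select the most useful features more efficiently and at lower acquisition cost, and achieving better predictive performance and simpler decision sequences than existing methods on several datasets.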
Determining the most appropriate features for machine learning predictive models is challenging regarding performance and feature acquisition costs. In particular, global feature choice is limited given that some features will only benefit a subset of instances. In previous work, we proposed a reinforcement learning approach to sequentially recommend which modality to acquire next to reach the best information/cost ratio, based on the instance-specific information already acquired. We formulated the problem as a Markov Decision Process where the state's dimensionality changes during the episode, avoiding data imputation, contrary to existing works. However, this only allowed processing a small number of features, as all possible combinations of features were considered. Here, we address these limitations with two contributions: 1) we expand our framework to larger datasets with a heuristic-based strategy that focuses on the most promising feature combinations, and 2) we introduce a post-fit regularisation strategy that reduces the number of different feature combinations, leading to compact sequences of decisions. We tested our method on four binary classification datasets (one involving high-dimensional variables), the largest of which had 56 features and 4500 samples. We obtained better performance than state-of-the-art methods, both in terms of accuracy and policy complexity.
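To illustrate why considering all possible feature combinations limits the earlier formulation to a small number of features, here is a minimal sketch of a state space in which each state is the set of features acquired so far. It is not the authors' implementation: the names `STOP`, `available_actions`, `step`, `enumerate_states`, and `count_states` are hypothetical, and rewards, costs, and the classifier are left out. It only assumes that a state is a subset of features and that each action either acquires one more feature or stops.

```python
from itertools import combinations

# Hypothetical label for the "stop acquiring and predict" action.
STOP = "STOP"


def available_actions(acquired, all_features):
    """Return the actions open from a state: any unacquired feature, or STOP."""
    remaining = [f for f in all_features if f not in acquired]
    return remaining + [STOP]


def step(acquired, action):
    """Apply an action. Acquiring a feature grows the state by one element,
    so the state's dimensionality changes and nothing has to be imputed."""
    if action == STOP:
        return acquired
    return acquired | frozenset([action])


def enumerate_states(all_features):
    """List every subset of features; each subset is one distinct state."""
    states = []
    for k in range(len(all_features) + 1):
        for combo in combinations(all_features, k):
            states.append(frozenset(combo))
    return states


def count_states(n_features):
    """Number of feature subsets, which doubles with each added feature."""
    return 2 ** n_features


if __name__ == "__main__":
    features = ["a", "b", "c"]  # toy feature names, for illustration only
    print(len(enumerate_states(features)))
    print(available_actions(frozenset(["a"]), features))
    print(count_states(56))
```

Under these assumptions, 56 features give 2^56 (about 7.2e16) subsets, so exhaustive enumeration is out of reach and some way of focusing on promising combinations is needed.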
Source: arXiv: 2603.15110