arXiv submission date: 2026-05-20
📄 Abstract - Objective-Induced Bias and Search Dynamics in Multiobjective Unsupervised Feature Selection

Unsupervised feature selection is commonly formulated as a multiobjective optimisation problem that jointly optimises subset quality and subset size. Yet the behaviour of this formulation depends critically on the choice of evaluation objective, the direction of subset-size regularisation, and the initialisation strategy. We study these factors in a controlled setting using a synthetic dataset with known informative, redundant, and irrelevant feature types. Six formulations are compared by combining three evaluation objectives (accuracy, silhouette score, and PCA reconstruction loss) with either subset-size minimisation or maximisation. The results show that formulation strongly affects both search dynamics and the quality of the resulting Pareto front. Silhouette-based formulations exhibit a strong bias toward trivial low-cardinality solutions and remain weak proxies for predictive performance. In contrast, the proposed PCA loss objective produces compact subsets with test accuracy comparable to subsets obtained by directly optimising supervised accuracy. Overall, the study shows that objective design is central to effective multiobjective unsupervised feature selection.
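To make the notion of a Pareto front over subset quality and subset size concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the `dominates` and `pareto_front` helpers are hypothetical names, the candidate values are made up for illustration, and both objectives (a generic loss and the subset size) are assumed to be minimised.

```python
from typing import List, Tuple

# Each candidate subset is summarised as (loss, size).
# Assumption: both objectives are minimised in this sketch.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates, sorted by size."""
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(set(front), key=lambda c: (c[1], c[0]))


# Made-up illustrative values, not results from the paper.
candidates = [(0.40, 2), (0.25, 4), (0.30, 5), (0.10, 8), (0.10, 10)]
front = pareto_front(candidates)
print(front)
```

Under the same assumptions, a size-maximisation variant could be handled by negating the size before comparison, so that larger subsets count as better.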

Top-level tags: machine learning, model evaluation
Detailed tags: feature selection, multiobjective optimization, unsupervised learning, Pareto front, search dynamics

Objective-Induced Bias and Search Dynamics in Multiobjective Unsupervised Feature Selection


1️⃣ One-sentence summary

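Using a synthetic dataset, this paper compares six combinations of objective functions and finds that objective design in unsupervised feature selection significantly affects both search behaviour and solution quality: the objective based on PCA reconstruction loss selects compact feature subsets with high predictive performance, much as a supervised method does, whereas the silhouette-based objective tends to favour very small feature subsets and yields poor predictive performance.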

Source: arXiv: 2605.21561