Feature Weighting Improves Pool-Based Sequential Active Learning for Regression
1️⃣ One-sentence summary
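This paper proposes a simple, effective method that weights each feature by the corresponding coefficient of a model trained on the already-labeled data. This improves the inter-sample distance computation that active learning for regression algorithms rely on when selecting samples, and ultimately yields a more accurate prediction model at a lower labeling cost.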
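The summary can be made concrete with a hedged formalization. This is not the paper's stated formula: it assumes a single-task ridge model fitted on the N labeled samples, and a Euclidean distance in which each feature difference is scaled by the magnitude of that feature's ridge coefficient. The symbols N, D, \lambda, and \beta_0 are introduced here for illustration only.

```latex
% Assumed ridge fit on the N labeled samples (D features, regularization \lambda > 0)
(\hat\beta_0, \hat{\boldsymbol\beta}) = \arg\min_{\beta_0, \boldsymbol\beta}
  \sum_{n=1}^{N} \left( y_n - \beta_0 - \boldsymbol\beta^\top \mathbf{x}_n \right)^2
  + \lambda \lVert \boldsymbol\beta \rVert_2^2

% Assumed feature-weighted Euclidean distance between samples i and j
d_{\hat{\boldsymbol\beta}}(\mathbf{x}_i, \mathbf{x}_j)
  = \sqrt{ \sum_{d=1}^{D} \hat\beta_d^{\,2} \, (x_{i,d} - x_{j,d})^2 }
```

Under this form, a feature whose coefficient is near zero contributes almost nothing to the distance. Representativeness and diversity criteria are then no longer driven by features that the current model treats as irrelevant.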
Pool-based sequential active learning for regression (ALR) optimally selects a small number of samples sequentially from a large pool of unlabeled samples to label, so that a more accurate regression model can be constructed under a given labeling budget. Representativeness and diversity, which involve computing the distances among different samples, are important considerations in ALR. However, previous ALR approaches do not incorporate the importance of different features in inter-sample distance computation, resulting in sub-optimal sample selection. This paper proposes three feature weighted single-task ALR approaches and two feature weighted multi-task ALR approaches, where the ridge regression coefficients trained from a small amount of previously labeled samples are used to weight the corresponding features in inter-sample distance computation. Experiments showed that this easy-to-implement enhancement almost always improves the performance of four existing ALR approaches, in both single-task and multi-task regression problems. The feature weighting strategy may also be easily extended to stream-based ALR and to classification algorithms.
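The core idea in the abstract can be sketched in Python with NumPy: fit ridge coefficients on the labeled samples, then use them to reweight features before computing distances. This is a minimal illustration under stated assumptions, not the authors' implementation. The weight of feature j is taken to be |β_j|, averaged over tasks in the multi-task case. Features are assumed to be standardized already. The base selector is a greedy max-min-distance rule in the input space, in the spirit of greedy sampling (GSx). The summary does not say whether this is one of the four approaches the paper enhances. The function names `ridge_coefficients`, `feature_weights`, and `greedy_select` and the default `lam=0.5` are hypothetical.

```python
import numpy as np


def ridge_coefficients(X, Y, lam=0.5):
    """Closed-form ridge regression; centering leaves the intercept unpenalized.

    X: (n, d) inputs of the labeled samples.
    Y: (n,) single-task or (n, k) multi-task targets.
    Returns a (d, k) coefficient matrix (intercept excluded).
    """
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    d = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ Yc)


def feature_weights(X_lab, Y_lab, lam=0.5):
    """Hypothetical rule: |coefficient| per feature, averaged over tasks if k > 1."""
    B = ridge_coefficients(X_lab, Y_lab, lam)
    return np.abs(B).mean(axis=1)


def greedy_select(X_pool, labeled_idx, Y_lab, lam=0.5, weighted=True):
    """Return the unlabeled pool index whose minimum distance to the labeled set is largest.

    Assumes X_pool is already standardized so coefficient magnitudes are comparable.
    With weighted=True, each feature is first scaled by its ridge-derived weight,
    so features the current model deems irrelevant barely affect the distances.
    """
    labeled_idx = np.asarray(labeled_idx)
    if weighted:
        w = feature_weights(X_pool[labeled_idx], Y_lab, lam)
    else:
        w = np.ones(X_pool.shape[1])
    Z = X_pool * w  # broadcasting scales column j by w[j]
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    # Pairwise differences: (num_unlabeled, num_labeled, d)
    diff = Z[unlabeled][:, None, :] - Z[labeled_idx][None, :, :]
    min_dist = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return int(unlabeled[np.argmax(min_dist)])


if __name__ == "__main__":
    # Toy pool: y depends only on feature 0; feature 1 is irrelevant but widely spread.
    X_pool = np.array([[0, 0], [0, 1], [1, 0], [1, 1],  # already labeled
                       [0.5, 10],                        # far away only in feature 1
                       [3, 0.5]], dtype=float)           # far away in feature 0
    labeled = [0, 1, 2, 3]
    y_lab = 2 * X_pool[labeled, 0]
    print(greedy_select(X_pool, labeled, y_lab, weighted=False))  # → 4
    print(greedy_select(X_pool, labeled, y_lab, weighted=True))   # → 5
```

In the toy pool, the unweighted rule picks sample 4, which is far from the labeled set only along the irrelevant feature. The weighted rule picks sample 5, which extends coverage along the feature that actually drives y. A full sequential loop would label the selected sample, refit the ridge weights, and repeat until the labeling budget is exhausted.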
Source: arXiv: 2604.02019