Labeled TrustSet Guided: Batch Active Learning with Reinforcement Learning
1️⃣ One-Sentence Summary
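This paper proposes BRAL-T, a new framework that combines a "TrustSet" curated from the labeled data with a reinforcement learning policy. Together they select the most valuable batch of samples to annotate from a large unlabeled pool, which lowers labeling cost and substantially improves model performance across a range of image classification tasks.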
Batch active learning (BAL) is a crucial technique for reducing labeling costs and improving data efficiency in training large-scale deep learning models. Traditional BAL methods often rely on metrics like Mahalanobis distance to balance uncertainty and diversity when selecting data for annotation. However, these methods predominantly focus on the distribution of unlabeled data and fail to leverage feedback from labeled data or the model's performance. To address these limitations, we introduce TrustSet, a novel approach that selects the most informative data from the labeled dataset, ensuring a balanced class distribution to mitigate the long-tail problem. Unlike CoreSet, which focuses on maintaining the overall data distribution, TrustSet optimizes the model's performance by pruning redundant data and using label information to refine the selection process. To extend the benefits of TrustSet to the unlabeled pool, we propose a reinforcement learning (RL)-based sampling policy that approximates the selection of high-quality TrustSet candidates from the unlabeled data. Combining TrustSet and RL, we introduce the Batch Reinforcement Active Learning with TrustSet (BRAL-T) framework. BRAL-T achieves state-of-the-art results across 10 image classification benchmarks and 2 active fine-tuning tasks, demonstrating its effectiveness and efficiency in various domains.
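The abstract does not spell out how TrustSet scores or prunes labeled samples. As a rough illustration of only the class-balancing idea, here is a minimal sketch that keeps a fixed budget of labeled samples by taking the highest-scoring ones round-robin across classes, so a dominant class cannot absorb the whole budget. The function name `class_balanced_select` and the per-sample `scores` (for example, a per-sample loss) are hypothetical assumptions. This is not the paper's TrustSet criterion, and the RL sampling policy is not modeled.

```python
from collections import defaultdict


def class_balanced_select(labels, scores, budget):
    """Hypothetical illustration, NOT the paper's TrustSet algorithm.

    Keep up to `budget` sample indices, spread as evenly as possible across
    classes, preferring higher `scores` within each class.
    labels: class id per labeled sample
    scores: assumed per-sample informativeness score (e.g. a loss value)
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Group indices by class, highest score first within each class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for idxs in by_class.values():
        idxs.sort(key=lambda i: scores[i], reverse=True)

    selected = []
    pointers = {c: 0 for c in by_class}
    classes = sorted(by_class)
    # Round-robin over classes; classes that run out of samples drop out.
    while len(selected) < budget:
        progressed = False
        for c in classes:
            if len(selected) >= budget:
                break
            p = pointers[c]
            if p < len(by_class[c]):
                selected.append(by_class[c][p])
                pointers[c] = p + 1
                progressed = True
        if not progressed:  # every class is exhausted
            break
    return selected


if __name__ == "__main__":
    # Long-tailed toy pool: class 0 holds 6 of the 9 samples.
    labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    naive = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:5]
    print(naive)  # → [0, 1, 2, 3, 4]  (class 0 only)
    print(class_balanced_select(labels, scores, 5))  # → [0, 6, 8, 1, 7]
```

Plain top-k by score returns only class-0 indices on this toy pool. The round-robin variant covers all three classes with the same budget, which is the kind of long-tail mitigation the abstract attributes to TrustSet.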
Source: arXiv: 2604.12303