arXiv submission date: 2026-02-12
📄 Abstract - CAAL: Confidence-Aware Active Learning for Heteroscedastic Atmospheric Regression

Quantifying the impacts of air pollution on health and climate relies on key atmospheric particle properties such as toxicity and hygroscopicity. However, these properties typically require complex observational techniques or expensive particle-resolved numerical simulations, limiting the availability of labeled data. We therefore estimate these hard-to-measure particle properties from routinely available observations (e.g., air pollutant concentrations and meteorological conditions). Because routine observations only indirectly reflect particle composition and structure, the mapping from routine observations to particle properties is noisy and input-dependent, yielding a heteroscedastic regression setting. With a limited and costly labeling budget, the central challenge is to select which samples to measure or simulate. While active learning is a natural approach, most acquisition strategies rely on predictive uncertainty. Under heteroscedastic noise, this signal conflates reducible epistemic uncertainty with irreducible aleatoric uncertainty, causing limited budgets to be wasted in noise-dominated regions. To address this challenge, we propose a confidence-aware active learning framework (CAAL) for efficient and robust sample selection in heteroscedastic settings. CAAL consists of two components: a decoupled uncertainty-aware training objective that separately optimises the predictive mean and noise level to stabilise uncertainty estimation, and a confidence-aware acquisition function that dynamically weights epistemic uncertainty using predicted aleatoric uncertainty as a reliability signal. Experiments on particle-resolved numerical simulations and real atmospheric observations show that CAAL consistently outperforms standard AL baselines. The proposed framework provides a practical and general solution for the efficient expansion of high-cost atmospheric particle property databases.
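As a rough illustration of the acquisition idea described in the abstract, the sketch below scores each candidate by its epistemic uncertainty, down-weighted where the predicted aleatoric noise is high. The abstract does not give the actual weighting formula, so the reliability weight `1 / (1 + aleatoric / temperature)`, the `temperature` parameter, and the function names `confidence_aware_scores` and `select_batch` are all assumptions made for this example, not the authors' method.

```python
import numpy as np


def confidence_aware_scores(epistemic_var, aleatoric_var, temperature=1.0):
    """Hypothetical acquisition score for a pool of unlabeled candidates.

    epistemic_var, aleatoric_var: 1-D arrays with one entry per candidate.
    temperature: assumed scale controlling how strongly noise is penalized.
    """
    epistemic_var = np.asarray(epistemic_var, dtype=float)
    aleatoric_var = np.asarray(aleatoric_var, dtype=float)
    # Reliability weight in (0, 1]: equals 1 when the predicted noise is zero
    # and decays as the predicted noise grows. This functional form is an
    # illustrative assumption, not the formula from the paper.
    reliability = 1.0 / (1.0 + aleatoric_var / temperature)
    return reliability * epistemic_var


def select_batch(epistemic_var, aleatoric_var, budget, temperature=1.0):
    """Return indices of the `budget` highest-scoring candidates."""
    scores = confidence_aware_scores(epistemic_var, aleatoric_var, temperature)
    # argsort is ascending, so reverse it and keep the first `budget` indices.
    return np.argsort(scores)[::-1][:budget]


if __name__ == "__main__":
    epistemic = [1.0, 1.0, 0.5]
    aleatoric = [0.0, 9.0, 0.0]
    print(select_batch(epistemic, aleatoric, budget=2))
```

In this toy pool, candidates 0 and 1 have the same epistemic variance, but candidate 1 sits in a noise-dominated region and is passed over in favor of candidate 2. A plain uncertainty-based ranking would not make that distinction.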

Top-level tags: machine learning, model training, data
Detailed tags: active learning, heteroscedastic regression, uncertainty estimation, atmospheric science, sample selection

CAAL: Confidence-Aware Active Learning for Heteroscedastic Atmospheric Regression


1️⃣ One-sentence summary

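This paper proposes CAAL, a new active learning method that distinguishes reducible model uncertainty from inherent data noise. For atmospheric particle property prediction, where labeling data is costly, it selects the most valuable samples to measure more intelligently, so the database can be expanded efficiently.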
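The distinction between reducible model uncertainty and inherent data noise can be made concrete with a standard ensemble-based decomposition. This is one common approach and not necessarily the estimator used in the paper. The sketch assumes a hypothetical ensemble in which every member predicts both a mean and a noise variance for each sample; the function name `decompose_uncertainty` and the array layout are assumptions for this example.

```python
import numpy as np


def decompose_uncertainty(member_means, member_vars):
    """Split predictive uncertainty into epistemic and aleatoric parts.

    member_means, member_vars: arrays of shape (n_members, n_samples) holding
    each ensemble member's predicted mean and predicted noise variance.
    """
    member_means = np.asarray(member_means, dtype=float)
    member_vars = np.asarray(member_vars, dtype=float)
    # Epistemic part: disagreement between members about the mean.
    epistemic = member_means.var(axis=0)
    # Aleatoric part: average noise variance the members predict.
    aleatoric = member_vars.mean(axis=0)
    return epistemic, aleatoric


if __name__ == "__main__":
    means = [[0.0, 1.0], [2.0, 1.0]]
    variances = [[0.1, 4.0], [0.3, 6.0]]
    epistemic, aleatoric = decompose_uncertainty(means, variances)
    print(epistemic, aleatoric)
```

In this toy input, the first sample has high member disagreement and low predicted noise, so labeling it could reduce model uncertainty. The second has no disagreement but high predicted noise, so an extra label there would be unlikely to help.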

Source: arXiv: 2602.11825