Learning What Not to Impute: An Uncertainty-Aware Diffusion Framework for Meaningful Missingness
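1️⃣ One-sentence summary
This paper proposes Diff-Joint, a diffusion-model framework that distinguishes two kinds of missing entries, those that are meaningfully missing and those that are missing because of the observation process, and decides which gaps should be imputed and which should be left as they are, achieving competitive imputation accuracy while preserving the semantic information behind the data.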
Missing value imputation is a fundamental task in machine learning, with most existing methods assuming that all missing entries correspond to unobserved regular values. In many real-world datasets, however, missingness may arise from two distinct sources: some entries are meaningfully missing (intrinsically absent and semantically valid), while others are missing due to the observation process and should be imputed. We formalize this distinction as a selective imputation problem, where the goal is to jointly infer which missing entries should be preserved and which should be recovered. To address this challenge, we propose Diff-Joint, a diffusion-based framework that jointly models tabular data together with a latent missingness mask. The method alternates between conditional sampling and uncertainty-aware aggregation to iteratively refine both imputed values and missingness labels. Empirical results on synthetic and real-world datasets demonstrate that Diff-Joint effectively identifies meaningfully missing entries while achieving competitive imputation accuracy and improved downstream task performance.
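The alternation described in the abstract (draw conditional samples, then aggregate them with attention to uncertainty) can be illustrated with a toy loop. The sketch below is not the Diff-Joint algorithm: the abstract does not specify the sampler or the aggregation rule, so `toy_conditional_sampler`, the column-missing-rate heuristic for the keep flag, the `margin` threshold, and every other name here are hypothetical stand-ins chosen only to make the control flow concrete.

```python
import numpy as np


def toy_conditional_sampler(x_filled, missing, rng, n_samples):
    """Hypothetical stand-in for a conditional diffusion sampler.

    Draws values around the current column means, and draws "keep" flags
    from a per-column Bernoulli whose rate is that column's missing rate
    (a toy heuristic, not something taken from the paper).
    """
    n, d = x_filled.shape
    col_mean = x_filled.mean(axis=0)
    col_std = x_filled.std(axis=0) + 1e-8
    values = rng.normal(col_mean, col_std, size=(n_samples, n, d))
    keep_rate = missing.mean(axis=0)
    keep_flags = rng.random((n_samples, n, d)) < keep_rate
    return values, keep_flags


def aggregate(values, keep_flags):
    """Average the samples: mean value and fraction of 'keep' votes per entry."""
    return values.mean(axis=0), keep_flags.mean(axis=0)


def selective_impute(x, n_rounds=3, n_samples=64, margin=0.2, seed=0):
    """Alternate sampling and aggregation; commit only confident labels."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(x)
    # Start from a column-mean fill (assumes no column is entirely missing).
    x_filled = np.where(missing, np.nanmean(x, axis=0), x)
    decided = ~missing                   # observed entries need no label
    keep = np.zeros_like(missing)        # True = preserve as missing
    for _ in range(n_rounds):
        values, flags = toy_conditional_sampler(x_filled, missing, rng, n_samples)
        mean_value, p_keep = aggregate(values, flags)
        # A vote fraction near 0.5 is treated as uncertain and left undecided.
        confident = np.abs(p_keep - 0.5) >= margin
        newly = missing & ~decided & confident
        keep[newly] = p_keep[newly] > 0.5
        decided |= newly
        # Refresh imputed values only for entries not marked as preserved.
        x_filled = np.where(missing & ~keep, mean_value, x_filled)
    # Entries still undecided after the last round default to being imputed.
    return np.where(keep, np.nan, x_filled), keep
```

Calling `selective_impute(x)` on a float array with `NaN` for missing entries returns the partially imputed array and a boolean mask of the entries left as missing. In a real system the sampler would be a trained generative model conditioned on the observed entries, and the aggregation rule would follow whatever the paper defines.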
Source: arXiv: 2606.05073