arXiv submission date: 2026-06-03
📄 Abstract - Learning What Not to Impute: An Uncertainty-Aware Diffusion Framework for Meaningful Missingness

Missing value imputation is a fundamental task in machine learning, with most existing methods assuming that all missing entries correspond to unobserved regular values. In many real-world datasets, however, missingness may arise from two distinct sources: some entries are meaningfully missing (intrinsically absent and semantically valid), while others are missing due to the observation process and should be imputed. We formalize this distinction as a selective imputation problem, where the goal is to jointly infer which missing entries should be preserved and which should be recovered. To address this challenge, we propose Diff-Joint, a diffusion-based framework that jointly models tabular data together with a latent missingness mask. The method alternates between conditional sampling and uncertainty-aware aggregation to iteratively refine both imputed values and missingness labels. Empirical results on synthetic and real-world datasets demonstrate that Diff-Joint effectively identifies meaningfully missing entries while achieving competitive imputation accuracy and improved downstream task performance.
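The abstract does not spell out the algorithm, so the following is only a toy sketch of the alternation it describes (draw several conditional samples, then aggregate them with an eye on their disagreement). Everything here is an assumption made for illustration: the names `selective_impute`, `aggregate`, and `toy_sampler`, the vote threshold, and the spread cutoff are hypothetical, and `toy_sampler` is a hand-written stand-in for a learned diffusion sampler, not Diff-Joint itself.

```python
import numpy as np

def aggregate(values, flags, keep_thresh=0.5):
    """Combine K samples. values, flags: arrays of shape (K, n, d).
    flags hold sampled 0/1 labels, 1 = 'meaningfully missing'."""
    p_meaningful = flags.mean(axis=0)   # vote share per entry
    imputed = values.mean(axis=0)       # average sampled value
    spread = values.std(axis=0)         # sample disagreement, used as uncertainty
    return imputed, spread, p_meaningful >= keep_thresh

def selective_impute(x, sampler, n_samples=20, n_rounds=3, max_spread=1.0, seed=0):
    """Alternate sampling and aggregation (hypothetical scheme, not the paper's).
    Returns (filled, keep): keep marks entries left missing on purpose."""
    rng = np.random.default_rng(seed)
    current = np.asarray(x, dtype=float).copy()   # NaN = not yet resolved
    keep = np.zeros(current.shape, dtype=bool)
    for r in range(n_rounds):
        unresolved = np.isnan(current) & ~keep
        if not unresolved.any():
            break
        draws = [sampler(current, rng) for _ in range(n_samples)]
        values = np.stack([v for v, _ in draws])
        flags = np.stack([f for _, f in draws])
        imputed, spread, keep_missing = aggregate(values, flags)
        keep |= unresolved & keep_missing
        # Accept low-uncertainty imputations now; accept the rest in the last round.
        last = r == n_rounds - 1
        accept = unresolved & ~keep_missing & ((spread <= max_spread) | last)
        current[accept] = imputed[accept]
    return current, keep

def toy_sampler(current, rng):
    """Stand-in for a learned conditional sampler (assumption).
    Column 0 = has_spouse, column 1 = spouse_income; a missing income is
    usually labelled meaningful when has_spouse == 0."""
    n, d = current.shape
    values = np.nanmean(current, axis=0) + 0.1 * rng.standard_normal((n, d))
    p = np.full((n, d), 0.1)
    p[current[:, 0] == 0, 1] = 0.9
    flags = (rng.random((n, d)) < p).astype(float)
    return values, flags

x_demo = np.array([[1, 50.0], [1, np.nan], [0, np.nan], [1, 70.0], [0, np.nan]])
filled, keep = selective_impute(x_demo, toy_sampler)
```

In this toy setup, `keep` marks the entries the procedure chose to preserve as missing, and `filled` still holds NaN at those positions. Entries accepted in an earlier round become conditioning values for later rounds, which is one simple way to read "iteratively refine".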

Top-level tags: machine learning, data
Detailed tags: missing value imputation, diffusion model, uncertainty-aware, tabular data, selective imputation

Learning What Not to Impute: An Uncertainty-Aware Diffusion Framework for Meaningful Missingness


1️⃣ One-sentence summary

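This paper proposes Diff-Joint, a diffusion-based framework that distinguishes two kinds of missing entries, those that are "meaningfully missing" and those that are missing due to the observation process, and infers which gaps should be imputed and which should be left as they are, preserving the semantic information behind the data while achieving competitive imputation accuracy.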

Source: arXiv: 2606.05073