Learning What Not to Impute: An Uncertainty-Aware Diffusion Framework for Meaningful Missingness

Abstract

Missing value imputation is a fundamental task in machine learning, with most existing methods assuming that all missing entries correspond to unobserved regular values. In many real-world datasets, however, missingness may arise from two distinct sources: some entries are meaningfully missing (intrinsically absent and semantically valid), while others are missing due to the observation process and should be imputed. We formalize this distinction as a selective imputation problem, where the goal is to jointly infer which missing entries should be preserved and which should be recovered. To address this challenge, we propose Diff-Joint, a diffusion-based framework that jointly models tabular data together with a latent missingness mask. The method alternates between conditional sampling and uncertainty-aware aggregation to iteratively refine both imputed values and missingness labels. Empirical results on synthetic and real-world datasets demonstrate that Diff-Joint effectively identifies meaningfully missing entries while achieving competitive imputation accuracy and improved downstream task performance.
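The abstract does not spell out how the uncertainty-aware aggregation step works, so the following is only a minimal sketch of one way such a step could look, not the authors' implementation. It assumes that a conditional sampler (not shown) has already produced K draws of imputed values and K draws of a binary "meaningfully missing" label for each entry. The function name `aggregate_samples`, the entropy threshold `max_entropy`, and the 0.5 decision rule are all hypothetical choices made for illustration.

```python
import numpy as np


def aggregate_samples(x, value_samples, label_samples, max_entropy=0.9):
    """Toy uncertainty-aware aggregation (illustrative assumption, not Diff-Joint itself).

    x             : (n, d) float array with np.nan at missing entries
    value_samples : (K, n, d) float array of sampled imputations
    label_samples : (K, n, d) array in {0, 1}; 1 = "meaningfully missing"
    Returns (preserve_mask, completed_table, entropy).
    """
    missing = np.isnan(x)

    # Fraction of draws that labelled each entry as meaningfully missing.
    p = label_samples.mean(axis=0)

    # Binary entropy (in bits) of that fraction, used as an uncertainty score.
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)
    entropy = -(p_safe * np.log2(p_safe) + (1 - p_safe) * np.log2(1 - p_safe))

    # Preserve a missing entry only when the draws agree confidently.
    preserve = missing & (p > 0.5) & (entropy <= max_entropy)
    impute = missing & ~preserve

    # Fill the remaining missing entries with the mean of the sampled values.
    completed = x.copy()
    completed[impute] = value_samples.mean(axis=0)[impute]
    return preserve, completed, entropy
```

In this sketch, entries flagged in `preserve` stay as `np.nan` in the completed table, while all other missing entries receive the sample mean. A full method would presumably feed the aggregated labels back into the next round of conditional sampling, which is omitted here.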

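1️⃣ One-Sentence Summary

This paper proposes Diff-Joint, a diffusion-based framework that distinguishes two kinds of missingness in data, "meaningful missingness" and "observation-induced missingness", and infers which gaps should be imputed and which should be left as they are, preserving the semantic information behind the data while achieving competitive imputation accuracy.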
