Black-box Membership Inference Attacks on the Pre-training Data of Image-generation Models
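1️⃣ One-sentence summary
This paper proposes SD-MIA, a black-box membership inference attack that analyzes differences in how a model denoises an image and its perturbed text instructions to detect whether a diffusion model used a specific image as pre-training data, so that potential copyright or privacy infringement risks can be identified without access to the model's internals.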
The rapid advancement of diffusion-based image generation models has raised serious concerns regarding potential copyright and privacy infringements involving human-created data. Membership inference attacks (MIAs) have emerged as a promising tool for identifying unauthorized data usage during model training. Existing methods typically assess the ability of a model to denoise perturbed suspect images as an indicator of membership status. However, the discriminative power of such features is highly dependent on the degree of model memorization and deteriorates significantly when applied to less exposed data (e.g., pre-training data). Although several methods attempt to enhance detection by leveraging internal model features, these features are generally inaccessible on mainstream closed-source image generation platforms, which limits their practicality. In this paper, we demonstrate that analyzing how a black-box diffusion model denoises a target image and its corresponding perturbed textual instructions can reveal more distinctive membership cues. Based on this insight, we propose a black-box membership inference attack framework (named SD-MIA) that leverages a cross-modal data perturbation mechanism to detect pre-training data in diffusion models. We conduct extensive experiments on both a public benchmark dataset and a newly constructed dataset, each comprising pre-training member and non-member samples with identical distributions. Experimental results demonstrate that SD-MIA achieves superior performance compared to existing baselines, including those with the unfair advantage of accessing internal model features.
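The reconstruction-based signal that the abstract attributes to existing methods (perturb a suspect image, ask the model to denoise it, and treat a low reconstruction error as evidence of membership) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the SD-MIA framework itself, whose cross-modal perturbation details are not given here. The names `make_toy_denoiser`, `membership_score`, and `is_member`, the noise scale, and the threshold are all hypothetical, and the "model" is a stand-in that simply returns its nearest memorized training image.

```python
import random
from typing import Callable, List

Image = List[float]  # a flattened toy "image"; real inputs would be pixel arrays


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def make_toy_denoiser(memorized: List[Image]) -> Callable[[Image], Image]:
    """Hypothetical stand-in for a black-box model (an assumption, not a real API):
    it returns the memorized training image closest to the noisy input."""
    def denoise(noisy: Image) -> Image:
        return min(memorized, key=lambda m: mse(m, noisy))
    return denoise


def membership_score(
    image: Image,
    denoise: Callable[[Image], Image],
    noise_scale: float = 0.05,  # assumed perturbation strength
    trials: int = 8,            # assumed number of noisy queries
    seed: int = 0,
) -> float:
    """Negative mean reconstruction error; higher means more member-like."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Perturb the suspect image with Gaussian noise, then query the model.
        noisy = [x + rng.gauss(0.0, noise_scale) for x in image]
        total += mse(image, denoise(noisy))
    return -total / trials


def is_member(score: float, threshold: float) -> bool:
    """Threshold decision; the threshold would be calibrated on held-out data."""
    return score > threshold


# Toy setup: the "model" has memorized two images; [0.5] * 8 was never seen.
train_set = [[0.0] * 8, [1.0] * 8]
denoise = make_toy_denoiser(train_set)
seen = membership_score([0.0] * 8, denoise)
unseen = membership_score([0.5] * 8, denoise)
print(is_member(seen, -0.1), is_member(unseen, -0.1))
```

Because the toy denoiser memorizes its training set perfectly, the gap between the two scores is far larger than it would be for a real model. As the abstract notes, this kind of signal weakens on less exposed data such as pre-training data, which is the motivation for looking at perturbed textual instructions as well.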
Source: arXiv: 2605.27020