Provable Fairness Repair for Deep Neural Networks
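1️⃣ One-Sentence Summary
This paper proposes ProF, a new framework that combines interval bound propagation with mixed-integer linear programming to provide mathematically guaranteed fairness when repairing deep neural networks, rather than relying only on data adjustments, thereby effectively preventing the model from discriminating against specific groups.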
Deep neural networks (DNNs) suffer from ethical issues such as individual discrimination. In response, many NN repair techniques have been developed to adjust models and mitigate such undesired behaviors. However, existing fairness repair methods are typically data-centric: they often lack provable guarantees and fail to generalize to unseen samples. To overcome these limitations, we propose ProF, a novel fairness repair framework with provable guarantees. The key intuition of ProF is to leverage interval bound propagation (a widely used NN verification technique) to soundly capture model outputs over the whole set $S(\mathbf{x})$ around a biased sample $\mathbf{x}$. The derived bounds are used to guide fairness repair, which encourages the model to produce consistent outputs on $S(\mathbf{x})$. Specifically, we integrate fairness constraints and model modifications into a unified constraint-solving formulation, which can be transformed into a Mixed-Integer Linear Programming (MILP) problem solvable by off-the-shelf solvers. The solution to the MILP problem induces a repaired model with guaranteed fairness over the whole set $S(\mathbf{x})$. We evaluate ProF on four widely used benchmark datasets and demonstrate that it achieves provable fairness repair, with generalization of up to 95.93\% on full datasets and 93.16\% on the entire input space. Notably, ProF can be easily configured to support multiple sensitive attributes and more practical fairness definitions, while providing provable repair guarantees and delivering around 90\% fairness improvement. Our code is available at this https URL.
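To make the role of interval bound propagation concrete, here is a minimal sketch of how output bounds over a set $S(\mathbf{x})$ can be computed for a small ReLU network. It is an illustration only, not the ProF implementation: the toy weights, the choice of $S(\mathbf{x})$ as a box in which only the sensitive feature varies, the single-logit consistency check, and all function names (`affine_bounds`, `relu_bounds`, `ibp`, `certified_consistent`) are assumptions made for this example. The MILP-based repair step is not shown.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, biases); hypothetical representation


def affine_bounds(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Propagate the box [lo, hi] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l, h = bias, bias
        for w, a, c in zip(row, lo, hi):
            if w >= 0:
                # A non-negative weight maps lower to lower, upper to upper.
                l += w * a
                h += w * c
            else:
                # A negative weight swaps the roles of the two endpoints.
                l += w * c
                h += w * a
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def relu_bounds(lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each endpoint directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def ibp(layers: List[Layer], lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Sound output bounds for a ReLU network (no ReLU after the last layer)."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return lo, hi


def certified_consistent(layers: List[Layer], lo: Vector, hi: Vector) -> bool:
    """Assumed check for a single-logit binary classifier: the decision is
    provably the same on the whole box if the logit bounds exclude zero."""
    out_lo, out_hi = ibp(layers, lo, hi)
    return out_lo[0] > 0 or out_hi[0] < 0


# Toy network: feature 0 is non-sensitive, feature 1 is the sensitive attribute.
layers: List[Layer] = [
    ([[1.0, 0.5], [-1.0, 0.5]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]

# S(x): feature 0 fixed at 2.0, sensitive feature ranging over [0, 1].
print(ibp(layers, [2.0, 0.0], [2.0, 1.0]))                    # ([2.0], [2.5])
print(certified_consistent(layers, [2.0, 0.0], [2.0, 1.0]))   # True
print(certified_consistent(layers, [0.1, 0.0], [0.1, 1.0]))   # False
```

The last call shows that the bounds are sound but not tight: the logit interval around the second sample straddles zero, so consistency cannot be certified from these bounds alone, even though this toy network's actual output stays positive on that set. In a repair setting, samples like this one are where model parameters would need to be adjusted until the bounds exclude zero.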
Source: arXiv: 2605.19549