arXiv submission date: 2026-05-27
📄 Abstract - When Interpretability Is Unequally Distributed: Fairness in Hybrid Interpretable Models

Hybrid interpretable models combine a transparent component with a black-box model by assigning some examples to the former and deferring the rest to the latter. While this design enables flexible tradeoffs between accuracy and interpretability, it also raises a distinct procedural fairness concern: some demographic groups may systematically receive interpretable decisions, while others are disproportionately routed to a black box. We formalize this issue as Interpretability Coverage Disparity (ICD), a demographic-parity-style measure applied to the routing decision of hybrid interpretable models. Using tools from predictive multiplicity, we study ICD across four hybrid interpretable learning methods, three standard fairness benchmark datasets, and multiple sensitive attributes. Our experiments reveal substantial ICD in intermediate transparency regimes, where both the interpretable and black-box components are actively used. We further show that simple coverage-disparity constraints can significantly reduce ICD in exact hybrid learning methods, with marginal impact on accuracy and sparsity. In several settings, ICD mitigation also improves standard algorithmic fairness metrics. These results show that hybrid interpretable models should be audited not only for predictive fairness, but also for how they allocate interpretability across individuals and groups.
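The abstract describes ICD only as a demographic-parity-style measure on the routing decision, so the following is a minimal sketch under stated assumptions rather than the paper's exact definition. It assumes ICD is the largest gap between groups in the fraction of examples routed to the interpretable component. The function names `coverage_by_group` and `interpretability_coverage_disparity`, and the toy data, are hypothetical.

```python
from collections import defaultdict


def coverage_by_group(routed_to_interpretable, groups):
    """Return, per group, the fraction of examples handled by the interpretable part.

    routed_to_interpretable: iterable of 0/1 (or bool) routing decisions,
        1 meaning the transparent component made the decision.
    groups: iterable of sensitive-attribute values, one per example.
    """
    routed = list(routed_to_interpretable)
    groups = list(groups)
    if len(routed) != len(groups):
        raise ValueError("routing decisions and group labels must align")

    totals = defaultdict(int)
    covered = defaultdict(int)
    for r, g in zip(routed, groups):
        totals[g] += 1
        covered[g] += int(bool(r))
    return {g: covered[g] / totals[g] for g in totals}


def interpretability_coverage_disparity(routed_to_interpretable, groups):
    """Assumed ICD: max minus min interpretable-coverage rate across groups."""
    rates = coverage_by_group(routed_to_interpretable, groups)
    return max(rates.values()) - min(rates.values())


# Toy data (illustrative only, not from the paper): group "a" gets an
# interpretable decision for 3 of 4 examples, group "b" for 1 of 4.
routed = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
icd = interpretability_coverage_disparity(routed, groups)
print(coverage_by_group(routed, groups), icd)
```

A value of 0 under this assumed definition would mean every group receives interpretable decisions at the same rate; larger values mean some group is routed to the black box more often than another.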

Top-level tags: machine learning model evaluation
Detailed tags: interpretability fairness hybrid models disparity predictive multiplicity

When Interpretability Is Unequally Distributed: Fairness in Hybrid Interpretable Models


1️⃣ One-sentence summary

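This paper finds that hybrid interpretable models, in which a transparent model handles some examples and a black box handles the rest, can systematically give some demographic groups interpretable decisions while routing others disproportionately to the black box; the authors define this unfairness as Interpretability Coverage Disparity (ICD), show experimentally that it is most pronounced in intermediate transparency regimes, and propose simple coverage-disparity constraints that reduce it with only marginal impact on accuracy.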

Source: arXiv: 2605.28626