arXiv submission date: 2026-07-02
📄 Abstract - Beyond Gradient-Based Attacks: Adversarial Robustness and Explainability Stability in Cybersecurity Classifiers

Adversarial attacks on cybersecurity classifiers pose a dual threat: degrading predictions and destabilising the SHAP-based explanations that security analysts rely on to understand and triage alerts. We extend our prior MLP conference study to Random Forest and XGBoost across four tabular security datasets (phishing URLs, UNSW-NB15, NF-ToN-IoT, HIKARI-2021), evaluating five attacks including three black-box methods applicable to non-differentiable tree models. We introduce the Explainability Stability Index (ESI), a scalar metric computed from TreeSHAP attribution drift under adversarial perturbation, reported on the same [0,1] scale as the Robustness Index (RI). A key finding is that gradient-based black-box attacks (ZOO) produce degenerate results against XGBoost (apparent RI ~0.98) due to piecewise-constant prediction surfaces, while score-based Square Attack reveals genuine vulnerability (RI ~0.36). These degenerate perturbations still drive substantial attribution drift: XGBoost ESI ~0.06-0.16 despite near-perfect ZOO robustness, versus 0.14-0.29 for RF, showing that prediction robustness and explanation stability are distinct axes requiring joint measurement. A two-axis framework (gradient dependence, query efficiency) explains the observed attack ranking and yields practical guidance for tree ensemble evaluation. A step-size ablation explains a counterintuitive PGD anomaly on z-score normalised tabular data.
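The sketch below illustrates why a finite-difference attack such as ZOO can report near-perfect robustness on a tree ensemble. It builds a hypothetical three-stump ensemble (`STUMPS`, `score`) whose output is piecewise constant, as the abstract describes for XGBoost. A symmetric finite-difference gradient (`fd_gradient`) is zero unless the point lies within `h` of a split threshold, so a simplified zeroth-order descent (`zoo_descent`, not the full published ZOO algorithm) never moves. A score-only search over vertices of the L-infinity ball (`random_search`, loosely in the spirit of Square Attack but not that algorithm) can still flip the sign of the score. All names, thresholds, leaf values and the "positive score means malicious" convention are illustrative assumptions, not the paper's setup.

```python
import random

# Hypothetical toy tree ensemble: each stump is
# (feature index, split threshold, left leaf value, right leaf value).
# The ensemble score is a sum of leaf values, so it is piecewise constant in x.
STUMPS = [
    (0, 0.30, -0.4, 0.5),
    (1, -0.20, 0.3, -0.2),
    (0, 1.10, 0.1, -0.6),
]


def score(x):
    """Ensemble margin; by assumption, a positive value means 'malicious'."""
    return sum(left if x[f] <= t else right for f, t, left, right in STUMPS)


def fd_gradient(x, h=1e-4):
    """Symmetric finite-difference gradient estimate, as in ZOO-style attacks."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((score(up) - score(down)) / (2 * h))
    return grad


def zoo_descent(x, steps=50, lr=0.1, h=1e-4):
    """Simplified zeroth-order descent on the score (not the full ZOO method)."""
    x = list(x)
    for _ in range(steps):
        g = fd_gradient(x, h)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def random_search(x, eps=0.5, queries=200, seed=0):
    """Score-only search over vertices of the L-infinity ball of radius eps.

    Keeps the lowest-scoring candidate seen; uses scores, never gradients.
    """
    rng = random.Random(seed)
    best, best_score = list(x), score(x)
    for _ in range(queries):
        cand = [xi + rng.choice((-eps, eps)) for xi in x]
        s = score(cand)
        if s < best_score:
            best, best_score = cand, s
    return best, best_score


if __name__ == "__main__":
    x0 = [0.5, 0.0]
    print("clean score:", score(x0))
    print("finite-difference gradient:", fd_gradient(x0))
    print("after zoo_descent:", zoo_descent(x0))
    print("random_search:", random_search(x0))
```

With `x0 = [0.5, 0.0]` the clean score is positive, `fd_gradient(x0)` returns `[0.0, 0.0]`, and `zoo_descent(x0)` returns `x0` unchanged, whereas `random_search(x0)` finds a point within `eps = 0.5` whose score is negative. Counting the unchanged point as a successful defence gives the kind of degenerate robustness estimate the abstract reports for ZOO against XGBoost.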

Top-level tags: machine learning, model evaluation
Detailed tags: adversarial robustness, explainability stability, cybersecurity classifiers, tree ensemble, SHAP-based explanations

Beyond Gradient-Based Attacks: Adversarial Robustness and Explainability Stability in Cybersecurity Classifiers


1️⃣ One-sentence summary

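This paper shows that, for cybersecurity classifiers, conventional adversarial attacks (such as gradient-based ones) can give misleading robustness estimates on tree models. It introduces the Explainability Stability Index (ESI) and shows that SHAP-based explanations can be badly distorted even when a model's predictions look robust, so prediction robustness and explanation stability must be measured together to assess a classifier's security fully.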
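The abstract does not give the ESI formula, so the following sketch only illustrates one plausible way to turn TreeSHAP attribution drift into a [0, 1] stability score. It assumes per-sample attribution vectors for clean and adversarial inputs are already available (for instance from `shap.TreeExplainer`). It defines drift as a normalised L1 distance and the score as one minus the mean drift. `attribution_drift` and `stability_index` are hypothetical names, and this definition is an assumption, not the authors' ESI.

```python
from typing import Sequence


def attribution_drift(phi_clean: Sequence[float], phi_adv: Sequence[float]) -> float:
    """Normalised L1 distance between two attribution vectors, in [0, 1].

    Hypothetical choice: 0 means identical attributions; 1 means no feature
    has same-signed nonzero attributions in both vectors (e.g. a pure sign flip).
    """
    if len(phi_clean) != len(phi_adv):
        raise ValueError("attribution vectors must have the same length")
    diff = sum(abs(a - b) for a, b in zip(phi_clean, phi_adv))
    scale = sum(abs(a) for a in phi_clean) + sum(abs(b) for b in phi_adv)
    # |a - b| <= |a| + |b| for every feature, so diff / scale never exceeds 1.
    return 0.0 if scale == 0 else diff / scale


def stability_index(clean_attrs, adv_attrs) -> float:
    """Hypothetical ESI-style score: 1 minus the mean per-sample drift.

    Lies in [0, 1]; higher means explanations moved less under attack.
    """
    if not clean_attrs or len(clean_attrs) != len(adv_attrs):
        raise ValueError("need matching, non-empty lists of attribution vectors")
    drifts = [attribution_drift(c, a) for c, a in zip(clean_attrs, adv_attrs)]
    return 1.0 - sum(drifts) / len(drifts)


if __name__ == "__main__":
    phi = [0.4, -0.1, 0.0]
    print(stability_index([phi], [phi]))               # identical explanations
    print(stability_index([phi], [[0.2, -0.1, 0.1]]))  # partially shifted explanation
```

Here identical explanations score 1.0 and the partially shifted one scores about 0.67. Normalising by the combined attribution magnitude keeps the score bounded without a separate clipping step; a rank-based measure such as top-k feature overlap would be an equally reasonable design.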

Source: arXiv 2607.01679