菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-06-08
📄 Abstract - Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

Formal neural network verification -- proving that a network satisfies safety properties for \emph{all} inputs in a specified domain -- is bounded in practice by GPU memory: standard implementations of bound-propagation algorithms (IBP, CROWN, $\alpha$-CROWN) require weight and relaxation-coefficient matrices to reside entirely on one accelerator. We adapt two parallelism techniques originally developed for large-scale model training to the \texttt{auto\_LiRPA}\,/\,$\alpha,\beta$-CROWN verification framework. \textbf{Tensor Parallelism (TP)} shards both weight and $A$-matrices across GPUs, achieving ${\approx}2\times$ peak-memory reduction at $P{=}2$; soundness is confirmed on VNN-COMP 2022 MNIST-FC benchmarks, though bound tightness degrades with the number of sharded zones due to forced IBP substitution for intermediate bounds inside sharded zones. \textbf{Fully Sharded Data Parallelism (FSDP)} shards only weight matrices with a per-layer \texttt{AllGather}, producing bounds that are \emph{bitwise identical} to the single-GPU baseline: baseline memory drops by 80--90\%, peak memory by 34--39\% on wide MLPs. FSDP integrates cleanly with complete verification ($\beta$-CROWN + Branch-and-Bound) and with convolutional layers (\texttt{BoundConv}); a complete \emph{unsat} result is obtained for CIFAR-100 ResNet-large (VNN-COMP 2024) under FSDP. Across all experiments the memory bottleneck in $\alpha$-CROWN+BaB mode proves to be per-neuron alpha tensors, not weight matrices, pointing to the key direction for future work.

顶级标签: model evaluation systems
详细标签: neural network verification tensor parallelism fully sharded data parallelism memory reduction bound propagation 或 搜索:

利用张量并行与全分片数据并行扩展神经网络验证 / Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism


1️⃣ 一句话总结

本文针对神经网络形式化验证中GPU内存不足的问题,提出了两种并行化方法:张量并行(TP)通过将权重矩阵分片到多个GPU来减少峰值内存,而全分片数据并行(FSDP)则通过按层聚合通信实现内存降低80-90%且精度无损,并成功应用于复杂网络(如ResNet)的完整验证,同时指出未来优化的关键瓶颈在于激活值相关的alpha张量。

源自 arXiv: 2606.09377