Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

📄 Abstract - Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

Formal neural network verification -- proving that a network satisfies safety properties for \emph{all} inputs in a specified domain -- is bounded in practice by GPU memory: standard implementations of bound-propagation algorithms (IBP, CROWN, $\alpha$-CROWN) require weight and relaxation-coefficient matrices to reside entirely on one accelerator. We adapt two parallelism techniques originally developed for large-scale model training to the \texttt{auto\_LiRPA}\,/\,$\alpha,\beta$-CROWN verification framework. \textbf{Tensor Parallelism (TP)} shards both weight and $A$-matrices across GPUs, achieving ${\approx}2\times$ peak-memory reduction at $P{=}2$; soundness is confirmed on VNN-COMP 2022 MNIST-FC benchmarks, though bound tightness degrades with the number of sharded zones due to forced IBP substitution for intermediate bounds inside sharded zones. \textbf{Fully Sharded Data Parallelism (FSDP)} shards only weight matrices with a per-layer \texttt{AllGather}, producing bounds that are \emph{bitwise identical} to the single-GPU baseline: baseline memory drops by 80--90\%, peak memory by 34--39\% on wide MLPs. FSDP integrates cleanly with complete verification ($\beta$-CROWN + Branch-and-Bound) and with convolutional layers (\texttt{BoundConv}); a complete \emph{unsat} result is obtained for CIFAR-100 ResNet-large (VNN-COMP 2024) under FSDP. Across all experiments the memory bottleneck in $\alpha$-CROWN+BaB mode proves to be per-neuron alpha tensors, not weight matrices, pointing to the key direction for future work.

利用张量并行与全分片数据并行扩展神经网络验证 / Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

1️⃣ 一句话总结

本文针对神经网络形式化验证中GPU内存不足的问题，提出了两种并行化方法：张量并行（TP）通过将权重矩阵分片到多个GPU来减少峰值内存，而全分片数据并行（FSDP）则通过按层聚合通信实现内存降低80-90%且精度无损，并成功应用于复杂网络（如ResNet）的完整验证，同时指出未来优化的关键瓶颈在于激活值相关的alpha张量。

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要