Abstract - ZC-Swish: Stabilizing Deep BN-Free Networks for Edge and Micro-Batch Applications
Batch Normalization (BN) is a cornerstone of deep learning, yet it fundamentally breaks down in micro-batch regimes (e.g., 3D medical imaging) and non-IID Federated Learning. Removing BN from deep architectures, however, often leads to catastrophic training failures such as vanishing gradients and dying channels. We identify that standard activation functions, like Swish and ReLU, exacerbate this instability in BN-free networks due to their non-zero-centered nature, which causes compounding activation mean-shifts as network depth increases. In this technical communication, we propose Zero-Centered Swish (ZC-Swish), a drop-in activation function parameterized to dynamically anchor activation means near zero. Through targeted stress-testing on BN-free convolutional networks at depths 8, 16, and 32, we demonstrate that while standard Swish collapses to near-random performance at depth 16 and beyond, ZC-Swish maintains stable layer-wise activation dynamics and achieves the highest test accuracy at depth 16 (51.5%) with seed 42. ZC-Swish thus provides a robust, parameter-efficient solution for stabilizing deep networks in memory-constrained and privacy-preserving applications where traditional normalization is unviable.
ZC-Swish: Stabilizing Deep BN-Free Networks for Edge and Micro-Batch Applications
1️⃣ One-Sentence Summary
This paper proposes a new activation function, ZC-Swish, that keeps its output mean close to zero. In deep networks where Batch Normalization cannot be used (e.g., micro-batch training for medical imaging, or Federated Learning), the non-zero-centered nature of activations such as Swish and ReLU causes vanishing gradients and training collapse; ZC-Swish resolves this, allowing networks up to 32 layers deep to train stably and retain comparatively high accuracy.
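To make the mean-shift argument concrete, here is a minimal NumPy sketch of one way to zero-center Swish. The abstract describes ZC-Swish as dynamically anchoring activation means but does not give its exact parameterization, so the fixed Monte-Carlo offset below (`SWISH_MEAN`, `zc_swish`) is an illustrative assumption, not the authors' method.

```python
import numpy as np

def swish(x):
    """Standard Swish / SiLU: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

# Swish outputs have a positive mean even for zero-mean inputs; this is
# the mean-shift that compounds with depth in BN-free stacks. A simple
# (static) zero-centering subtracts the expected Swish output under a
# standard-normal input, estimated here by Monte Carlo.
rng = np.random.default_rng(0)
SWISH_MEAN = swish(rng.standard_normal(1_000_000)).mean()  # small positive constant

def zc_swish(x):
    """Illustrative zero-centered Swish (hypothetical parameterization)."""
    return swish(x) - SWISH_MEAN

# Demo on a fresh batch of zero-mean, unit-variance pre-activations.
x = rng.standard_normal(10_000)
print(f"swish mean:    {swish(x).mean():+.3f}")     # clearly positive
print(f"zc_swish mean: {zc_swish(x).mean():+.3f}")  # near zero
```

A static offset like this only centers the mean for one assumed input distribution; the "dynamic" anchoring claimed for ZC-Swish would presumably adapt the offset (e.g., as a learned or input-dependent quantity) as activation statistics change with depth.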