📄 Abstract - How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs
The intermediate layers of deep networks can be characterised as a Gaussian process; in particular, the Edge-of-Chaos (EoC) initialisation strategy prescribes the limiting covariance matrix of the Gaussian process. Here we show that the under-utilised choice of the Gaussian process variance is important when training deep networks with sparsity-inducing activations, such as a shifted and clipped ReLU, $\text{CReLU}_{\tau,m}(x)=\min(\max(x-\tau,0),m)$. Specifically, initialisations leading to a larger fixed Gaussian process variance allow for improved expressivity with activation sparsity as high as 90% in DNNs and CNNs, and generally improve the stability of the training process. Enabling full, or near-full, accuracy at such high levels of sparsity in the hidden layers suggests a promising mechanism for reducing the energy consumption of machine learning models involving fully connected layers.
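The shifted and clipped ReLU from the abstract is straightforward to express in code. Below is a minimal sketch in PyTorch of $\text{CReLU}_{\tau,m}$; the values of the shift $\tau$ and clipping level $m$ used here are purely illustrative and are not taken from the paper.

```python
import torch

def crelu(x: torch.Tensor, tau: float, m: float) -> torch.Tensor:
    """Shifted and clipped ReLU: min(max(x - tau, 0), m).

    Inputs below `tau` are mapped to exactly zero (this is what induces
    activation sparsity), and outputs are capped at `m`.
    """
    return torch.clamp(x - tau, min=0.0, max=m)

# Illustrative values only: a positive shift tau drives a large fraction
# of pre-activations to exactly zero.
x = torch.randn(8)
print(crelu(x, tau=0.5, m=1.0))
```

Raising $\tau$ increases the fraction of exactly-zero activations, which is the hidden-layer sparsity the abstract refers to.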
How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs
1️⃣ One-sentence summary
This paper finds that choosing a larger Gaussian process variance when initialising a deep neural network can significantly improve the training stability of models using sparsity-inducing activation functions (such as CReLU), and that model performance can be maintained with hidden-layer activation sparsity as high as 90%, offering a promising route to reducing the energy consumption of machine learning models.
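To make the claim concrete, the sketch below shows one common way the pre-activation (Gaussian process) variance can be controlled at initialisation: drawing weights as $W \sim \mathcal{N}(0, \sigma_w^2/\text{fan\_in})$ and biases as $\mathcal{N}(0, \sigma_b^2)$, so that a larger $\sigma_w$ yields a larger pre-activation variance in wide layers. The helper name and the specific values of $\sigma_w$ and $\sigma_b$ are placeholders, not the Edge-of-Chaos values prescribed in the paper.

```python
import torch
import torch.nn as nn

def init_with_variance(layer: nn.Linear, sigma_w: float, sigma_b: float) -> None:
    """Initialise a linear layer so that wide-layer pre-activations have a
    variance controlled by (sigma_w, sigma_b): W ~ N(0, sigma_w^2 / fan_in),
    b ~ N(0, sigma_b^2)."""
    fan_in = layer.in_features
    nn.init.normal_(layer.weight, mean=0.0, std=sigma_w / fan_in ** 0.5)
    nn.init.normal_(layer.bias, mean=0.0, std=sigma_b)

# Placeholder values: a larger sigma_w gives a larger pre-activation variance,
# which the paper argues helps training with highly sparse activations.
layer = nn.Linear(1024, 1024)
init_with_variance(layer, sigma_w=2.0, sigma_b=0.1)
```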