StatQAT: Statistical Quantizer Optimization for Deep Networks
1️⃣ One-sentence summary
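The paper proposes a set of methods grounded in statistical error analysis that automatically select low-error quantization parameters for deep neural networks in both integer and floating-point formats, cutting computational cost while improving the accuracy and stability of quantization-aware training and making low-precision inference more efficient and reliable.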
Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes, selecting optimal quantization parameters remains a key challenge, particularly for diverse data distributions encountered during training and inference. This work presents a novel statistical error analysis framework for uniform and floating-point quantization, providing theoretical insight into error behavior across quantization configurations. Building on this analysis, we propose iterative quantizers designed for arbitrary data distributions and analytic quantizers tailored for Gaussian-like weight distributions. These methods enable efficient, low-error quantization suitable for both activations and weights. We incorporate our quantizers into quantization-aware training and evaluate them across integer and floating-point formats. Experiments demonstrate improved accuracy and stability, highlighting the effectiveness of our approach for training low-precision neural networks.
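The abstract does not give the authors' formulas, so the following is only a minimal sketch of the two ideas it names: a data-driven quantizer that works for any distribution, and a closed-form quantizer for Gaussian-like weights. It assumes a symmetric uniform INT-b quantizer with clipping threshold `c` and mean squared error as the criterion. The grid search and the Gaussian error model (approximate rounding noise Δ²/12 inside the range plus exact clipping error in the tails) are generic stand-ins, not StatQAT's iterative or analytic quantizers, and floating-point formats are not covered. All function names are hypothetical.

```python
import math
import random


def quantize_uniform(x: float, clip: float, bits: int) -> float:
    """Symmetric uniform quantizer (illustrative): clip x to [-clip, clip], then
    round to a grid with 2**(bits-1)-1 positive levels (7 for INT4), mirrored at 0."""
    levels = 2 ** (bits - 1) - 1
    step = clip / levels                       # quantization step size (Delta)
    clipped = max(-clip, min(clip, x))
    return round(clipped / step) * step


def empirical_mse(data, clip: float, bits: int) -> float:
    """Mean squared quantization error measured directly on the samples."""
    return sum((x - quantize_uniform(x, clip, bits)) ** 2 for x in data) / len(data)


def search_clip(data, bits: int, num_candidates: int = 200):
    """Distribution-agnostic stand-in: try evenly spaced thresholds in (0, max|x|]
    and keep the one with the lowest empirical MSE. Assumes data is not all zeros."""
    max_abs = max(abs(x) for x in data)
    best_clip, best_mse = max_abs, empirical_mse(data, max_abs, bits)
    for i in range(1, num_candidates):
        clip = max_abs * i / num_candidates
        mse = empirical_mse(data, clip, bits)
        if mse < best_mse:
            best_clip, best_mse = clip, mse
    return best_clip, best_mse


def gaussian_mse(clip: float, bits: int, sigma: float = 1.0) -> float:
    """Expected MSE model for x ~ N(0, sigma^2): approximate rounding noise
    step^2/12 inside [-clip, clip] plus the exact squared clipping error in both tails."""
    levels = 2 ** (bits - 1) - 1
    step = clip / levels
    t = clip / sigma
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)   # phi(t)
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))                # Q(t) = P(Z > t)
    clipping = 2.0 * sigma ** 2 * ((1.0 + t * t) * tail - t * pdf)
    granular = (1.0 - 2.0 * tail) * step ** 2 / 12.0          # P(|Z| <= t) * step^2 / 12
    return clipping + granular


def analytic_gaussian_clip(bits: int, sigma: float = 1.0,
                           num_candidates: int = 2000, max_sigmas: float = 6.0) -> float:
    """Minimize the closed-form Gaussian model over clip in (0, max_sigmas * sigma];
    it needs only sigma, not the samples themselves."""
    best_clip, best_mse = sigma * max_sigmas, float("inf")
    for i in range(1, num_candidates + 1):
        clip = sigma * max_sigmas * i / num_candidates
        mse = gaussian_mse(clip, bits, sigma)
        if mse < best_mse:
            best_clip, best_mse = clip, mse
    return best_clip


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 1.0) for _ in range(5000)]   # synthetic Gaussian-like weights
    c_emp, mse_emp = search_clip(weights, bits=4)
    c_ana = analytic_gaussian_clip(bits=4)
    print(f"searched clip = {c_emp:.3f} (empirical MSE {mse_emp:.5f})")
    print(f"analytic clip = {c_ana:.3f} (model MSE {gaussian_mse(c_ana, 4):.5f})")
```

On synthetic Gaussian data the two thresholds should land close to each other, well inside max|x|. The searched threshold costs a full pass over the data per candidate, while the analytic one needs only σ; that is the usual trade-off between data-driven and model-based clipping, and it mirrors the split the abstract draws between quantizers for arbitrary distributions and quantizers for Gaussian-like weights.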
From arXiv: 2605.17745