arXiv submission date: 2026-02-16
📄 Abstract - S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations

Activation outliers in large-scale transformer models pose a fundamental challenge to quantization: they inflate dynamic ranges and cause severe accuracy drops at low bit widths. We empirically observe that outlier severity intensifies with pre-training scale (e.g., progressing from CLIP to the more extensively trained SigLIP and SigLIP2). Through theoretical analysis and empirical correlation studies, we establish a direct link between these activation outliers and the dominant singular values of the weights. Building on this insight, we propose Selective Spectral Decay ($S^2D$), a geometrically principled conditioning method that surgically regularizes only the weight components corresponding to the largest singular values during fine-tuning. Extensive experiments demonstrate that $S^2D$ significantly reduces activation outliers and produces well-conditioned representations that are inherently quantization-friendly. Models trained with $S^2D$ achieve up to 7% higher PTQ accuracy on ImageNet under W4A4 quantization and 4% gains when combined with QAT. These improvements also generalize across downstream tasks and vision-language models, enabling increasingly large, rigorously trained models to scale without sacrificing deployment efficiency.
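To make the regularizer concrete, below is a minimal PyTorch sketch of the selective-spectral-decay idea described above: an auxiliary loss that penalizes only the top-k singular values of each matrix-shaped weight during fine-tuning. The function names (`s2d_penalty`, `total_loss`) and hyperparameters (`decay`, `top_k`) are illustrative assumptions, not the authors' implementation, which may parameterize the decay differently.

```python
import torch

def s2d_penalty(weight: torch.Tensor, top_k: int = 1) -> torch.Tensor:
    """Penalize only the largest singular values of a weight matrix.

    Assumption: "selective spectral decay" is approximated here as an
    L2 penalty on the top-k singular values; the paper may instead decay
    the corresponding rank-1 components (u_i * s_i * v_i^T) directly.
    """
    # svdvals returns singular values in descending order and is
    # differentiable, so the penalty shrinks only the dominant directions.
    singular_values = torch.linalg.svdvals(weight)
    return (singular_values[:top_k] ** 2).sum()

def total_loss(task_loss: torch.Tensor, model: torch.nn.Module,
               decay: float = 1e-4, top_k: int = 1) -> torch.Tensor:
    """Task loss plus the selective decay term over all 2-D weights."""
    penalty = sum(
        s2d_penalty(p, top_k)
        for p in model.parameters()
        if p.ndim == 2  # linear / attention projection matrices
    )
    return task_loss + decay * penalty
```

In a training loop this would be used as `loss = total_loss(criterion(logits, targets), model)` followed by `loss.backward()`. Note that ordinary weight decay corresponds to penalizing all singular values equally (the squared Frobenius norm); restricting the sum to the top-k is what makes the decay "selective". One caveat of this reading: gradients through the SVD can be ill-conditioned when singular values are nearly equal, so the paper's actual formulation may avoid backpropagating through a full decomposition; the sketch is meant only to convey the selectivity of the penalty.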

Top-level tags: model training, machine learning theory
Detailed tags: model quantization, activation outliers, spectral regularization, transformer models, fine-tuning

S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations


1️⃣ One-Sentence Summary

This paper proposes a method called S2D that selectively regularizes the most influential components of a neural network's weights, effectively mitigating the accuracy loss caused by abnormally large activation values when quantizing large models, so that models can be compressed and deployed efficiently without sacrificing performance.

Source: arXiv:2602.14432