Price of universality in vector quantization is at most 0.11 bit
1️⃣ One-sentence summary
This paper proves that there exists a universal low-precision data storage format which, without being tailored to any particular data distribution, loses at most 0.11 bit relative to the best distribution-specific method when compressing large language model parameters, providing a theoretical basis for designing efficient, universal model compression schemes.
Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is to use a low-precision approximation $\widehat W$ in place of the true $W$ ("weight-only quantization"). Information theory shows that an optimal algorithm for reducing the precision of $W$ depends on the (second-order) statistics of $X$ and requires careful alignment of the vector quantization codebook with the PCA directions of $X$ (a process known as "waterfilling allocation"). Dependence of the codebook on the statistics of $X$, however, is highly impractical. This paper proves that there exists a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook whose rate is reduced by 0.11 bit per dimension. Such a universal codebook would be an ideal candidate for a low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows the existence of a net in $\mathbb{R}^n$ that is a nearly optimal covering of a sphere simultaneously with respect to all Hilbert norms.
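The abstract compares a universal codebook against an $X$-adapted waterfilling baseline. As a rough illustration of what that baseline does, the sketch below (a toy example, not the paper's construction; the names `waterfilling_bits` and `quantize_weights` are hypothetical) allocates a per-direction bit budget by reverse water-filling over the eigenvalues of the second-moment matrix of $X$ and then uniformly quantizes $W$ in the PCA basis of $X$. The universal codebook whose existence the paper proves would match such an adapted scheme up to 0.11 bit per dimension without ever looking at $X$.

```python
import numpy as np

# Toy sketch of an X-adapted "waterfilling" weight quantizer (illustrative only;
# not the paper's construction, and all names here are hypothetical).

def waterfilling_bits(eigvals, total_bits, tol=1e-9):
    """Allocate total_bits across directions as b_i = max(0, 0.5*log2(lambda_i/theta)),
    with the water level theta chosen by bisection to meet the bit budget."""
    lo, hi = tol, float(eigvals.max())
    for _ in range(100):
        theta = 0.5 * (lo + hi)
        bits = np.maximum(0.0, 0.5 * np.log2(eigvals / theta))
        if bits.sum() > total_bits:
            lo = theta  # spending too many bits: raise the water level
        else:
            hi = theta
    return bits

def quantize_weights(W, X, bits_per_dim):
    """Uniformly quantize W in the PCA basis of X, spending more bits on
    directions where X has larger variance (a stand-in for an X-adapted codebook)."""
    n = W.shape[0]
    Sigma = X @ X.T / X.shape[1]                 # second-moment matrix of X
    eigvals, U = np.linalg.eigh(Sigma)           # PCA directions of X
    bits = waterfilling_bits(eigvals, bits_per_dim * n)
    W_pca = U.T @ W                              # rotate W into the PCA basis
    levels = np.maximum(1, np.round(2.0 ** bits)).astype(int)
    scale = np.abs(W_pca).max(axis=1, keepdims=True) + 1e-12
    steps = 2.0 * scale / levels[:, None]        # per-direction quantization step
    W_hat_pca = np.round(W_pca / steps) * steps
    return U @ W_hat_pca                         # rotate back to the original basis

# Usage: the X-adapted quantizer should keep the error of W^T X small at ~4 bits/dim.
rng = np.random.default_rng(0)
n, m, s = 64, 32, 4096
X = np.diag(np.linspace(0.1, 3.0, n)) @ rng.standard_normal((n, s))
W = rng.standard_normal((n, m))
W_hat = quantize_weights(W, X, bits_per_dim=4)
err = np.mean((W.T @ X - W_hat.T @ X) ** 2)
print(f"mean squared error of W^T X at 4 bits/dim: {err:.4e}")
```

The point of the paper is precisely that the dependence on $X$ in `quantize_weights` can be removed: a single (non-constructively shown) codebook covers all such second-order statistics at the cost of at most 0.11 bit per dimension.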
Source: arXiv: 2602.05790