Only relative ranks matter in weight-clustered large language models
1️⃣ One-sentence summary
This paper finds that large language model performance hinges on the relative ordering of weights, not their precise values. Simple weight clustering that compresses each matrix to only 16-64 distinct values therefore compresses the model effectively without retraining, and the experiments show that preserving weight rank order is critical to maintaining model capability.
Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights (whether one connection is stronger or weaker than another) rather than their precise magnitudes. To reduce the number of unique weight values, we apply weight clustering to pretrained models, replacing every weight matrix with K shared values obtained from K-means. For Llama 3.1-8B-Instruct and SmolLM2-135M, reducing each matrix to only 16-64 distinct values preserves strong accuracy without retraining, providing a simple, training-free method to compress LLMs on disk. Optionally fine-tuning only the cluster means (centroids) recovers 30-40 percent of the remaining accuracy gap at minimal cost. We then systematically randomize cluster means while keeping assignments fixed. Scrambling the relative ranks of the clusters degrades quality sharply (perplexity can increase by orders of magnitude), even when global statistics such as mean and variance are preserved. In contrast, rank-preserving randomizations cause almost no loss at mid and late layers. When many layers are perturbed simultaneously, progressive layer-by-layer replacement reveals that scale drift, not rank distortion, is the dominant collapse mechanism; however, an affine correction w' = aw + b with a > 0 (which preserves both rank order and overall weight distribution) can substantially delay this drift. This rank-based perspective offers a new lens on model compression and robustness.
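The core compression step described above can be sketched in a few lines: run 1-D K-means over the entries of a weight matrix and replace every weight with its cluster centroid, so the matrix contains at most K distinct values. This is a minimal NumPy sketch, not the paper's actual implementation; the function name `cluster_weights` and the quantile-based initialization are my own assumptions.

```python
import numpy as np

def cluster_weights(W, K=16, iters=50):
    """Quantize a weight matrix to at most K shared values via 1-D K-means.

    Hypothetical helper illustrating the technique; returns the quantized
    matrix, the K centroids, and the per-weight cluster assignments.
    """
    w = W.ravel()
    # Assumption: initialize centroids at evenly spaced quantiles of the weights.
    centroids = np.quantile(w, np.linspace(0.0, 1.0, K))
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        assign = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(K):
            mask = assign == k
            if mask.any():
                centroids[k] = w[mask].mean()
    W_hat = centroids[assign].reshape(W.shape)
    return W_hat, centroids, assign.reshape(W.shape)

# Toy example: a random 64x64 "weight matrix" collapsed to 16 shared values.
W = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
W_hat, centroids, assign = cluster_weights(W, K=16)
print(np.unique(W_hat).size)  # at most 16 distinct values
```

Because only the centroid table and the integer assignments need to be stored, K = 16 allows 4-bit assignment indices per weight, which is where the on-disk savings come from. Centroid-only fine-tuning then updates just the K values per matrix while the assignments stay fixed.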
Source: arXiv: 2603.17917