arXiv submission date: 2026-03-18
📄 Abstract - Only relative ranks matter in weight-clustered large language models

Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights (whether one connection is stronger or weaker than another) rather than precise magnitudes. To reduce the number of unique weight values, we apply weight clustering to pretrained models, replacing every weight matrix with K shared values from K-means. For Llama 3.1-8B-Instruct and SmolLM2-135M, reducing each matrix to only 16-64 distinct values preserves strong accuracy without retraining, providing a simple, training-free method to compress LLMs on disk. Optionally fine-tuning only the cluster means (centroids) recovers 30-40 percent of the remaining accuracy gap at minimal cost. We then systematically randomize cluster means while keeping assignments fixed. Scrambling the relative ranks of the clusters degrades quality sharply (perplexity can increase by orders of magnitude) even when global statistics such as mean and variance are preserved. In contrast, rank-preserving randomizations cause almost no loss at mid and late layers. On the other hand, when many layers are perturbed simultaneously, progressive layer-by-layer replacement reveals that scale drift, not rank distortion, is the dominant collapse mechanism; however, an affine correction w' = aw + b with a > 0 (which preserves both rank order and overall weight distribution) can substantially delay this drift. This rank-based perspective offers a new lens on model compression and robustness.
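The core compression step in the abstract, replacing a weight matrix with K shared values from K-means, can be sketched as follows. This is a minimal illustration, not the authors' code: it uses plain 1-D K-means with quantile initialization, and the function name `cluster_weights` is hypothetical.

```python
import numpy as np

def cluster_weights(W, K=16, iters=20):
    """1-D K-means over a weight matrix.

    Returns K centroids and per-entry cluster assignments; the
    reconstructed matrix C[A] contains at most K distinct values."""
    flat = W.ravel()
    # Quantile initialization spreads centroids across the weight distribution.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, K))
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Update each centroid to the mean of its assigned weights.
        for k in range(K):
            mask = assign == k
            if mask.any():
                centroids[k] = flat[mask].mean()
    assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, assign.reshape(W.shape)

# Toy usage: a random 64x64 matrix collapses to 16 shared values.
W = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
C, A = cluster_weights(W, K=16)
W_hat = C[A]
assert np.unique(W_hat).size <= 16
```

On disk, only the K centroids and the log2(K)-bit assignment indices need to be stored, which is where the compression comes from.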

Top tags: llm, model training, machine learning
Detailed tags: weight clustering, model compression, rank preservation, training-free, parameter efficiency

Only relative ranks matter in weight-clustered large language models


1️⃣ One-sentence summary

This paper finds that the key to large language model performance is the relative ordering of weights rather than their exact values. Simple weight clustering that reduces each matrix to only 16-64 distinct values can therefore compress a model effectively without retraining, and the experiments show that preserving the rank order of weights is essential for maintaining model capability.
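The rank-preservation claim above can be illustrated with the affine correction from the abstract: w' = aw + b with a > 0 keeps the ordering of cluster centroids intact, whereas any order-changing transform (here, a reversal) scrambles it. A small self-contained sketch, with illustrative values not taken from the paper:

```python
import numpy as np

# Sorted toy centroids standing in for one matrix's cluster means.
rng = np.random.default_rng(0)
centroids = np.sort(rng.normal(size=16))

affine = 1.3 * centroids + 0.05   # a > 0: rank-preserving
scrambled = centroids[::-1]        # reversal: rank-destroying

def ranks(v):
    """Rank of each element within the vector (0 = smallest)."""
    return np.argsort(np.argsort(v))

assert np.array_equal(ranks(affine), ranks(centroids))       # order intact
assert not np.array_equal(ranks(scrambled), ranks(centroids))  # order broken
```

Per the abstract, rank-preserving substitutions like `affine` cause almost no loss at mid and late layers, while rank-scrambling ones can increase perplexity by orders of magnitude even when mean and variance are preserved.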

Source: arXiv 2603.17917