1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization
1️⃣ One-Sentence Summary
This study finds that in the extremely low-bit regime, quantizing large language model weights with k-means outperforms conventional integer formats, and that under a fixed inference memory budget, 1-bit quantized weights achieve the best performance on downstream generative tasks.
Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width presents a challenge in practice. The design space of quantization has not been fully explored in the context of QAT, and the precise trade-off between quantization and downstream performance is poorly understood, as comparisons often rely solely on perplexity-based evaluations. In this work, we address these shortcomings with an empirical study of QAT in the low-bit regime. We show that k-means based weight quantization outperforms integer formats and can be implemented efficiently on standard hardware. Furthermore, we find that, under a fixed inference memory budget, the best performance on generative downstream tasks is achieved with $1$-bit quantized weights.
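To make the k-means weight quantization idea concrete, below is a minimal, illustrative sketch (not the authors' implementation): the weights of a tensor are clustered into $2^{\text{bits}}$ scalar centroids with Lloyd's algorithm, each weight is stored as the index of its nearest centroid, and dequantization is a table lookup into the small floating-point codebook. The function name, initialization, and iteration count are assumptions for illustration only.

```python
import numpy as np

def kmeans_quantize(weights: np.ndarray, bits: int = 1, iters: int = 50):
    """Quantize a weight tensor to 2**bits k-means centroids (1-D Lloyd's algorithm)."""
    w = weights.reshape(-1).astype(np.float32)
    k = 2 ** bits
    # Initialize centroids at evenly spaced quantiles of the weight distribution.
    centroids = np.quantile(w, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every weight.
        assign = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned weights.
        for c in range(k):
            mask = assign == c
            if mask.any():
                centroids[c] = w[mask].mean()
    assign = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    # Return per-weight indices (bits each) plus the floating-point codebook.
    return assign.reshape(weights.shape), centroids

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)
    idx, codebook = kmeans_quantize(w, bits=1)
    w_hat = codebook[idx]  # dequantization is a codebook lookup
    print("1-bit reconstruction MSE:", float(((w - w_hat) ** 2).mean()))
```

Because the codebook has only $2^{\text{bits}}$ entries per tensor (or per group), its storage overhead is negligible next to the packed index bits, which is why such non-uniform codebook formats can match the memory budget of integer formats while fitting the weight distribution more closely.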
Source: arXiv: 2602.15563