arXiv submission date: 2026-02-02
📄 Abstract - Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs

Quantization Error Reconstruction (QER) reduces accuracy loss in Post-Training Quantization (PTQ) by approximating weights as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$, using a rank-$r$ correction to reconstruct quantization error. Prior methods devote the full rank budget to error reconstruction, which is suboptimal when $\mathbf{W}$ has intrinsic low-rank structure and quantization corrupts dominant directions. We propose Structured Residual Reconstruction (SRR), a rank-allocation framework that preserves the top-$k$ singular subspace of the activation-scaled weight before quantization, quantizes only the residual, and uses the remaining rank $r-k$ for error reconstruction. We derive a theory-guided criterion for selecting $k$ by balancing quantization-exposed energy and unrecoverable error under rank constraints. We further show that the resulting $\mathbf{Q} + \mathbf{L}\mathbf{R}$ parameterization naturally supports Quantized Parameter-Efficient Fine-Tuning (QPEFT) and stabilizes fine-tuning via gradient scaling along preserved directions. Experiments demonstrate consistent perplexity reductions across diverse models and quantization settings in PTQ, along with a 5.9 percentage-point average gain on GLUE under 2-bit QPEFT.
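The abstract describes a three-step split: preserve the top-$k$ singular subspace, quantize only the residual, and spend the remaining rank $r-k$ on reconstructing the quantization error. The sketch below illustrates that split in NumPy under simplifying assumptions: the activation scaling is omitted (the SVD is taken on $\mathbf{W}$ directly rather than the activation-scaled weight), the theory-guided choice of $k$ is not implemented, and the function and quantizer names (`srr_decompose`, `rtn_quantize`) are hypothetical placeholders, not the paper's code.

```python
import numpy as np


def rtn_quantize(M, n_bits=4):
    """Per-tensor symmetric round-to-nearest quantizer (illustrative stand-in)."""
    scale = np.abs(M).max() / (2 ** (n_bits - 1) - 1)
    return np.round(M / scale) * scale


def srr_decompose(W, r, k, quantize=rtn_quantize):
    """Sketch of the preserve-then-quantize split W ≈ Q + L @ R.

    W : (d_out, d_in) weight matrix (activation scaling omitted for brevity)
    r : total rank budget of the low-rank correction
    k : rank preserved in full precision before quantization (0 <= k <= r)
    """
    # 1. Preserve the top-k singular subspace of W in full precision.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L_pre = U[:, :k] * s[:k]          # (d_out, k)
    R_pre = Vt[:k]                    # (k, d_in)
    residual = W - L_pre @ R_pre

    # 2. Quantize only the residual.
    Q = quantize(residual)

    # 3. Spend the remaining rank r - k on reconstructing the quantization error.
    E = residual - Q
    Ue, se, Vte = np.linalg.svd(E, full_matrices=False)
    m = r - k
    L_err = Ue[:, :m] * se[:m]        # (d_out, r - k)
    R_err = Vte[:m]                   # (r - k, d_in)

    # Preserved directions and error correction share one rank-r factor pair.
    L = np.concatenate([L_pre, L_err], axis=1)   # (d_out, r)
    R = np.concatenate([R_pre, R_err], axis=0)   # (r, d_in)
    return Q, L, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 512))
    Q, L, R = srr_decompose(W, r=32, k=8)
    rel_err = np.linalg.norm(W - (Q + L @ R)) / np.linalg.norm(W)
    print(f"relative reconstruction error: {rel_err:.4f}")
```

Setting $k = 0$ recovers plain QER (the whole budget goes to error reconstruction), while $k = r$ preserves only dominant directions with no error correction; the paper's criterion picks $k$ between these extremes.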

Top-level tags: llm, model training, theory
Detailed tags: quantization error reconstruction, low-rank approximation, parameter-efficient fine-tuning, post-training quantization

Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs


1️⃣ One-Sentence Summary

This paper proposes a method called Structured Residual Reconstruction, which first protects the most important directions of the weight matrix from being corrupted by quantization and then spends the remaining rank budget on repairing the quantization error, so that large language models retain more of their accuracy after compression and fine-tune more effectively afterwards.

Source: arXiv: 2602.02001