CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs
1️⃣ One-Sentence Summary
This paper proposes a new method called CoQuant, which accounts for quantization noise from both model weights and activations at once, intelligently selecting a small set of critical computation channels to keep in high precision. This substantially reduces inference cost while preserving the output quality of large language models.
Post-training quantization (PTQ) has become an important technique for reducing the inference cost of Large Language Models (LLMs). While recent mixed-precision methods improve ultra-low bit quantization by preserving critical subspaces in high precision, they typically construct these subspaces relying solely on activation statistics. This ignores the fundamental nature of linear operations, where the output perturbation is jointly driven by both activation and weight quantization noise. In this paper, we propose CoQuant, a joint weight-activation subspace projection method. By theoretically modeling the expected output error, CoQuant formulates a closed-form weighted PCA solution that balances activation and weight covariances to select the optimal high-precision subspace. Extensive experiments on Llama-3.2 and Qwen2.5 models show that CoQuant consistently outperforms strong PTQ baselines in both WikiText perplexity and zero-shot common-sense reasoning accuracy. These results demonstrate that joint weight-activation subspace modeling provides a principled and effective direction for low-bit LLM quantization. The source code is available at this https URL.
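The abstract's core idea, selecting the high-precision subspace via a weighted PCA over blended activation and weight second moments, can be sketched as follows. This is a minimal illustration under assumptions: the function name, the fixed blend coefficient `alpha`, and the use of plain second-moment matrices are all hypothetical stand-ins, since the paper derives a closed-form weighting rather than using a fixed scalar.

```python
import numpy as np

def select_high_precision_subspace(X, W, k, alpha=0.5):
    """Hypothetical weighted-PCA subspace selection (not the paper's exact formula).

    X     : (n_tokens, d) calibration activations for a linear layer
    W     : (d_out, d)    weight matrix of that layer
    k     : number of channels/directions kept in high precision
    alpha : assumed scalar trade-off between the two covariances
    """
    # Second moment of activations over calibration tokens
    sigma_act = (X.T @ X) / X.shape[0]
    # Second moment of weight rows over output channels
    sigma_w = (W.T @ W) / W.shape[0]
    # Blend the two matrices; CoQuant derives the balance in closed form,
    # while a fixed alpha here is purely illustrative
    C = alpha * sigma_act + (1.0 - alpha) * sigma_w
    # eigh returns eigenvalues in ascending order; the top-k eigenvectors
    # span the subspace whose quantization error matters most
    _, eigvecs = np.linalg.eigh(C)
    return eigvecs[:, -k:]  # (d, k) orthonormal basis

# Usage: project inputs onto the basis before the high-precision path
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
W = rng.standard_normal((128, 64))
U = select_high_precision_subspace(X, W, k=8)
```

The returned basis `U` is orthonormal, so projecting an activation vector as `U @ (U.T @ x)` isolates the component routed through high-precision computation, with the residual handled by the low-bit path.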
Source: arXiv: 2604.26378