📄 Abstract - SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

Extreme low-bit quantization is critical for efficiently deploying Large Language Models (LLMs), yet it often leads to severe performance degradation at 2 bits and even 4 bits (e.g., MXFP4). We present SignRoundV2, a post-training quantization framework that is highly effective even without mixed precision. SignRoundV2 introduces (1) a fast sensitivity metric that combines gradient information with quantization-induced deviations to guide layer-wise bit allocation, and (2) a lightweight pre-tuning search for quantization scales to improve extremely low-bit quantization. These components allow SignRoundV2 to close the gap with full-precision models. Extensive experiments indicate that our method sustains competitive accuracy for LLMs, achieving production-grade performance with about 1 percent variance at 4-5 bits and strong results even at 2 bits. The implementation is available at this https URL.
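
The abstract does not give the exact sensitivity formula, only that it combines gradient information with quantization-induced deviations. A minimal sketch of that idea, assuming a simple first-order interaction between per-layer gradients and round-to-nearest quantization error (the function names and the specific weighting below are illustrative assumptions, not the paper's definition), could look like this:

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Simplified symmetric per-tensor round-to-nearest quantization (for illustration)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

def layer_sensitivity(weight: torch.Tensor, grad: torch.Tensor, n_bits: int) -> float:
    """First-order estimate of how much quantizing this layer perturbs the loss:
    sum over |gradient * (quantized_weight - weight)|. This exact form is an
    assumption; the paper's metric is only described qualitatively here."""
    deviation = fake_quantize(weight, n_bits) - weight
    return (grad * deviation).abs().sum().item()

# Layer-wise bit allocation would then keep high-sensitivity layers at higher
# precision and push low-sensitivity layers down to 2-bit.
```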

Top-level tags: llm model training machine learning
Detailed tags: quantization post-training low-bit efficiency large language models

SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs


1️⃣ One-Sentence Summary

This paper proposes a new method called SignRoundV2, which uses a fast sensitivity metric and a lightweight pre-tuning search to compress large language models to extremely low bit widths (such as 2-bit or 4-bit) while keeping performance very close to the original full-precision model, addressing the severe accuracy degradation that this kind of compression usually causes.
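
The second component, the lightweight pre-tuning search for quantization scales, is likewise only described at a high level. One plausible reading, sketched here as an assumption rather than the paper's actual procedure, is a small grid search over candidate scale factors that keeps whichever scale minimizes the weight reconstruction error before the main tuning begins:

```python
import torch

def search_scale(w: torch.Tensor, n_bits: int,
                 factors=(0.8, 0.9, 1.0, 1.1)) -> torch.Tensor:
    """Hypothetical pre-tuning scale search: try a few multiplicative factors on
    the default per-tensor scale and return the one with the lowest MSE between
    the original and quantized weights. The candidate set and objective are
    illustrative assumptions, not taken from the paper."""
    qmax = 2 ** (n_bits - 1) - 1
    base_scale = w.abs().max() / qmax
    best_scale, best_err = base_scale, float("inf")
    for f in factors:
        scale = base_scale * f
        w_q = torch.round(w / scale).clamp(-qmax - 1, qmax) * scale
        err = (w_q - w).pow(2).mean().item()
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```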


📄 Open the original PDF