arXiv submission date: 2026-02-05
📄 Abstract - Regularized Calibration with Successive Rounding for Post-Training Quantization

Large language models (LLMs) deliver robust performance across diverse applications, yet their deployment often faces challenges due to the memory and latency costs of storing and accessing billions of parameters. Post-training quantization (PTQ) enables efficient inference by mapping pretrained weights to low-bit formats without retraining, but its effectiveness depends critically on both the quantization objective and the rounding procedure used to obtain low-bit weight representations. In this work, we show that interpolating between symmetric and asymmetric calibration acts as a form of regularization that preserves the standard quadratic structure used in PTQ while providing robustness to activation mismatch. Building on this perspective, we derive a simple successive rounding procedure that naturally incorporates asymmetric calibration, as well as a bounded-search extension that allows for an explicit trade-off between quantization quality and the compute cost. Experiments across multiple LLM families, quantization bit-widths, and benchmarks demonstrate that the proposed bounded search based on a regularized asymmetric calibration objective consistently improves perplexity and accuracy over PTQ baselines, while incurring only modest and controllable additional computational cost.
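The abstract describes interpolating between symmetric and asymmetric calibration as a regularizer that keeps the usual quadratic PTQ objective. As a rough illustration only (the paper's exact formulation is not reproduced in this digest), the sketch below assumes the symmetric term compares full-precision and quantized weights on the same calibration activations, the asymmetric term feeds the quantized weights the activations produced by the already-quantized preceding layers, and a hypothetical mixing weight `lam` interpolates between the two; all function and variable names are illustrative.

```python
import numpy as np

def interpolated_calibration_loss(W, W_hat, X_fp, X_q, lam=0.5):
    """Hypothetical interpolated calibration objective for one linear layer.

    W      : (d_out, d_in) full-precision weights
    W_hat  : (d_out, d_in) quantized weights
    X_fp   : (d_in, n) calibration activations from the full-precision model
    X_q    : (d_in, n) calibration activations from the partially quantized model
    lam    : mixing weight in [0, 1]; lam = 0 -> symmetric, lam = 1 -> asymmetric
    """
    # Symmetric term: both weight matrices see the same (full-precision) inputs.
    sym = np.linalg.norm((W - W_hat) @ X_fp, ord="fro") ** 2
    # Asymmetric term: the quantized weights see the quantized model's inputs,
    # exposing the objective to activation mismatch.
    asym = np.linalg.norm(W @ X_fp - W_hat @ X_q, ord="fro") ** 2
    return (1.0 - lam) * sym + lam * asym
```

Because both terms are quadratic in `W_hat`, any convex combination of them remains quadratic, which is the structural property the abstract emphasizes.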

Top-level tags: llm model training systems
Detailed tags: post-training quantization model compression calibration low-bit inference successive rounding

Regularized Calibration with Successive Rounding for Post-Training Quantization


1️⃣ One-sentence summary

This paper proposes a new post-training quantization method that combines a regularization technique interpolating between symmetric and asymmetric calibration with an efficient successive-rounding search strategy, significantly improving the performance of large language models after low-bit quantization while adding only a small computational cost.
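To make the "successive rounding with a bounded search" idea concrete, here is a minimal, hypothetical sketch for a single output channel: weights are rounded one coordinate at a time, the accumulated output error is carried forward so later coordinates can compensate it, and `n_candidates` controls how many grid points are tried per coordinate, giving the quality/compute trade-off the abstract describes. The paper's actual update rule and candidate set are not given in this digest, so all names and details below are assumptions.

```python
import numpy as np

def successive_round_row(w, X, scale, n_candidates=2):
    """Hypothetical greedy successive rounding for one output channel.

    Greedily minimizes ||(w - q) X||^2 one coordinate at a time, carrying the
    accumulated output error forward so later coordinates can offset it.

    w            : (d,) full-precision weights
    X            : (d, n) calibration activations for this layer
    scale        : uniform quantization step size
    n_candidates : grid points tried per coordinate; 2 = floor/ceil only,
                   larger values widen the bounded search at extra compute cost
    """
    d, _ = X.shape
    q = np.zeros_like(w, dtype=np.float64)
    err = np.zeros(X.shape[1])  # accumulated output error (w[:i] - q[:i]) @ X[:i]
    for i in range(d):
        base = np.floor(w[i] / scale)
        cands = (base + np.arange(n_candidates)) * scale
        # Pick the grid point that best cancels the error carried so far.
        scores = [np.sum((err + (w[i] - c) * X[i]) ** 2) for c in cands]
        q[i] = cands[int(np.argmin(scores))]
        err = err + (w[i] - q[i]) * X[i]
    return q

# Usage example on random data (illustrative grid with step 0.1):
rng = np.random.default_rng(0)
q = successive_round_row(rng.normal(size=16), rng.normal(size=(16, 64)),
                         scale=0.1, n_candidates=3)
```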

Source: arXiv: 2602.05902