JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

📄 Abstract

Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boundaries and leaves the gradient signal weakly aligned with the actual behavior of the low-precision model. We introduce JacQuant, a QAT framework that learns a lightweight surrogate of the model's local sensitivity to parameter changes and uses it to stabilize and accelerate training within standard variance-reduced optimizers. The surrogate is inexpensive (diagonal or block-diagonal), data-driven, and compatible with common weight and activation quantizers. For code-preserving training phases, we prove convergence for non-convex objectives and obtain linear rates under a Polyak-Łojasiewicz (PL) condition, and we relate the learned sensitivity to end-to-end output fidelity via a simple calibration argument. Across LLM benchmarks at $\leq 2$ bits, JacQuant consistently reaches higher accuracy than STE-based QAT, and runtime analyses on several models show that the added cost remains negligible under practical group sizes. The method is drop-in and requires no changes to the forward quantizers; our empirical claims are scoped to ultra-low-bit LLM QAT.
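To make the STE baseline that the abstract criticizes concrete, here is a minimal sketch of a uniform 2-bit weight quantizer together with its straight-through backward rule. It does not implement JacQuant or its learned surrogate. The function names, the step size of 0.5, and the code range [-2, 1] are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical 2-bit uniform quantizer: 4 integer codes in [-2, 1], step 0.5.
def quantize(w, step=0.5, min_code=-2, max_code=1):
    """Round w to the nearest grid point, then clamp to the representable codes."""
    code = math.floor(w / step + 0.5)          # round-half-up to an integer code
    code = max(min_code, min(max_code, code))  # clamp to the 2-bit code range
    return code * step

def ste_grad(w, upstream, step=0.5, min_code=-2, max_code=1):
    """Straight-through backward rule: pretend the quantizer is the identity
    inside the representable range and has zero slope outside it."""
    lo, hi = min_code * step, max_code * step
    return upstream if lo <= w <= hi else 0.0

def finite_diff(w, eps=1e-3):
    """Central finite-difference slope of the actual quantizer at w."""
    return (quantize(w + eps) - quantize(w - eps)) / (2 * eps)

if __name__ == "__main__":
    # Away from a bin boundary the quantizer is flat, yet STE reports slope 1.
    print(finite_diff(0.10), ste_grad(0.10, upstream=1.0))
    # Across the boundary at 0.25 a tiny weight change flips the code.
    print(quantize(0.24), quantize(0.26))
```

The sketch shows the mismatch the abstract refers to: the true quantizer has zero slope almost everywhere and a jump at each bin boundary, while STE reports a constant slope of one regardless of where the weight sits inside its bin.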

1️⃣ One-Sentence Summary

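This paper proposes JacQuant, a new quantization-aware training method that learns a surrogate of the model's local sensitivity to parameter changes (a lightweight diagonal or block-diagonal matrix) and uses it in place of the unstable Straight-Through Estimator (STE) used by conventional methods. This significantly improves accuracy in ultra-low-bit (≤2-bit) quantized training of large language models, with almost negligible computational overhead.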
