arXiv submission date: 2026-06-28
📄 Abstract - The Joint Effect of Quantization and Sampling Temperature on LLM Safety Alignment: A Factorial Analysis

Modern LLM deployments routinely compress models and raise sampling temperature to reduce cost, latency, or repetition, yet safety evaluations usually treat these choices as fixed implementation details. This leaves a practical uncertainty: does a model that is safe at FP16 and greedy decoding remain safe after it is quantized and sampled stochastically, or do the two deployment knobs amplify one another? We study this question with a factorial evaluation of 9 instruction-tuned models from six families, 3 precisions (FP16, GPTQ INT8, AWQ INT4), and 6 temperatures ($T{=}0$ to $1.0$), yielding 161 configurations and $\approx$322k responses judged by a six-model safety ensemble. Contrary to the concern that low-bit deployment broadly erodes alignment, standard non-adversarial quantization is usually safety-neutral: INT4 keeps or lowers attack success for 7 of 9 models, with clear degradation concentrated in the weakest baseline model, SmolLM3-3B ($18.5\%{\to}36.0\%$). The larger risk comes from sampling: higher temperature sharply increases decision instability for vulnerable models, with DFR reaching 53.0% at $T{=}1.0$, even when average ASR changes modestly. Finally, the interaction is not a "double penalty": our Compound Degradation Index remains largely sub-additive ($-0.195$ to $+0.045$), indicating that quantization and temperature do not systematically compound. These results suggest a deployment rule of thumb: standard INT4/INT8 quantization can be reasonable for strongly aligned models, but safety claims at elevated temperature should report multi-sample stability, not only average attack success.
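
To make the factorial design and the notion of sub-additivity concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's code: the model names are placeholders, the even temperature spacing is assumed (the abstract gives only the range and the count), and `additive_interaction` is a generic additive-scale interaction term, not the paper's Compound Degradation Index, whose definition the abstract does not give. A full crossing of 9 × 3 × 6 yields 162 cells, whereas the abstract reports 161 configurations; the sketch does not guess which cell is absent.

```python
from itertools import product

# Placeholder names: the abstract gives counts, not the full model list.
MODELS = [f"model_{i}" for i in range(9)]
PRECISIONS = ["FP16", "GPTQ-INT8", "AWQ-INT4"]
# Assumed even spacing; the abstract only says six temperatures from 0 to 1.0.
TEMPERATURES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def full_grid():
    """Enumerate every (model, precision, temperature) cell of the full crossing."""
    return list(product(MODELS, PRECISIONS, TEMPERATURES))


def additive_interaction(asr_base, asr_quant, asr_temp, asr_both):
    """Hypothetical interaction term on the additive scale.

    Returns the joint change in attack success rate minus the sum of the two
    single-factor changes. Negative values mean the combined effect is
    sub-additive; positive values mean the factors compound.
    """
    quant_effect = asr_quant - asr_base
    temp_effect = asr_temp - asr_base
    joint_effect = asr_both - asr_base
    return joint_effect - (quant_effect + temp_effect)


if __name__ == "__main__":
    print(len(full_grid()))
    # Illustrative numbers only, not taken from the paper.
    print(additive_interaction(0.10, 0.15, 0.20, 0.22))
```

With the illustrative inputs above, quantization alone adds 0.05 and temperature alone adds 0.10, but the joint configuration adds only 0.12, so the interaction term is negative: the combined effect is smaller than the sum of its parts.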

Top-level tags: llm model evaluation systems
Detailed tags: quantization sampling temperature safety alignment factorial analysis deployment

The Joint Effect of Quantization and Sampling Temperature on LLM Safety Alignment: A Factorial Analysis


1️⃣ One-sentence summary

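Through large-scale experiments, this paper finds that low-bit quantization of large language models (e.g., INT4) usually does not significantly harm their safety, whereas raising the sampling temperature substantially increases the instability with which models produce unsafe outputs, and the two factors do not compound into more severe harm.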

Source: arXiv: 2606.29581