arXiv submission date: 2026-04-22
📄 Abstract - On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks

Auto-regressive Large Language Models (LLMs) achieve strong performance on coding tasks, but incur high memory and inference costs. Diffusion-based language models (d-LLMs) offer bounded inference cost via iterative denoising, but their behavior under post-training quantization (PTQ) has been sparsely explored. We investigate the application and robustness of PTQ techniques, specifically GPTQ and a modified Hessian-Aware Quantization (HAWQ) algorithm, on a diffusion-based coding LLM (CoDA) and its auto-regressive counterpart, Qwen3-1.7B, under a standardized evaluation pipeline. In our setup, CoDA exhibits greater robustness at low bitwidths (2-4 bits), with smaller accuracy degradation across the HumanEval and MBPP benchmarks. Additionally, mixed-precision configurations derived from HAWQ provide smooth trade-offs across accuracy, latency, and memory. The results suggest that diffusion LLMs may offer advantages for efficient deployment due to their greater quantization resilience.
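To make the core idea of low-bitwidth PTQ concrete, here is a minimal sketch of symmetric round-to-nearest weight quantization at a given bitwidth. This is an illustrative baseline only, not the paper's GPTQ or HAWQ procedure; the function name and per-row scaling scheme are assumptions for the example.

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-row round-to-nearest post-training quantization.

    Maps each row of `w` onto the signed integer grid
    [-(2**(bits-1) - 1), 2**(bits-1) - 1], then dequantizes, so the
    return value is the low-bit approximation of `w`.
    """
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                          # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax, qmax)    # integer codes
    return q * scale                                 # dequantized weights

# Quantization error grows as the bitwidth shrinks toward 2-4 bits,
# which is the regime where the paper compares CoDA and Qwen3-1.7B.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64))
for bits in (8, 4, 2):
    err = np.abs(w - quantize_weights(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

GPTQ and HAWQ improve on this baseline by using second-order (Hessian) information to choose rounding and per-layer bitwidths, but the bitwidth/error trade-off shown here is the quantity both methods are managing.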

Top-level tags: llm model evaluation systems
Detailed tags: diffusion language models post-training quantization quantization robustness coding benchmarks low bitwidth

On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks


1️⃣ One-sentence summary

This paper studies how diffusion language models (such as CoDA) behave under low-bit quantization and finds that they resist precision loss better than traditional auto-regressive models (such as Qwen3-1.7B), enabling more efficient model deployment on coding tasks with a smaller performance drop.

From arXiv: 2604.20079