arXiv submission date: 2026-05-19
📄 Abstract - Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

MXFP4 arithmetic can dramatically accelerate reinforcement learning (RL) post-training of large language models (LLMs), yet its quantization error causes severe accuracy degradation. Existing work treats the quantization error as a monolithic noise term and so misses the distinct mechanisms by which it damages training. We prove an exact three-way decomposition of MXFP4 quantization error into additive components: "scale bias" from power-of-two rounding, "deadzone truncation" from zeroing small values, and "grid noise" from rounding to the nearest 4-bit grid point. Our theoretical and empirical analysis shows that each component dominates a distinct RL failure mode: scale bias accumulates multiplicatively through the backward pass, affecting gradient accuracy; deadzone truncation degrades rollout quality; and grid noise raises the policy's entropy. We combine corrections that target RL failure modes but are not component-exclusive: Macro-block scaling reduces scale bias; Outlier Fallback recovers deadzone entries and also partially reduces scale-bias-induced error; and Adaptive Quantization Noise (AQN) controls the policy entropy. On the dense Qwen2.5-3B model and the mixture-of-experts Qwen3-30B-A3B-Base model, the targeted corrections recover BF16 accuracy to within 0.7% and exceed BF16 by 1.0%, respectively.
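
To make the idea of an additive three-way split concrete, here is a minimal NumPy sketch for a single block. It is an illustration under stated assumptions, not the paper's definitions: the element grid is taken to be the FP4 E2M1 magnitudes, the shared scale is rounded up to a power of two so that nothing clips, and the split is defined by comparing against a hypothetical "ideal" real-valued scale. The names `round_to_grid`, `quantize`, and `decompose_block` are invented for this example.

```python
import numpy as np

# Non-negative magnitudes representable by an FP4 E2M1 element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def round_to_grid(v):
    """Round each entry to the nearest FP4 magnitude, keeping its sign.

    Ties go to the smaller magnitude (argmin picks the first match),
    which is a simplification of hardware rounding modes.
    """
    idx = np.abs(np.abs(v)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(v) * FP4_GRID[idx]


def quantize(x, scale):
    """Scale, round to the FP4 grid, and scale back."""
    return round_to_grid(x / scale) * scale


def decompose_block(x):
    """Split the block's quantization error into three additive parts.

    Assumption (not from the paper): the split is defined relative to an
    ideal real-valued scale amax / 6, and deadzone entries are those that
    the power-of-two-scaled quantizer maps to zero.
    """
    amax = np.max(np.abs(x))
    if amax == 0.0:
        zeros = np.zeros_like(x)
        return zeros, zeros, zeros, zeros

    ideal_scale = amax / FP4_GRID[-1]
    # Assumed rule: round the scale up to a power of two, so no clipping.
    pow2_scale = 2.0 ** np.ceil(np.log2(ideal_scale))

    q_ideal = quantize(x, ideal_scale)
    q_pow2 = quantize(x, pow2_scale)

    dead = (q_pow2 == 0) & (x != 0)
    deadzone = np.where(dead, -x, 0.0)                   # small values zeroed
    scale_bias = np.where(dead, 0.0, q_pow2 - q_ideal)   # cost of pow2 scale
    grid_noise = np.where(dead, 0.0, q_ideal - x)        # rounding to grid
    return q_pow2, scale_bias, deadzone, grid_noise


# Usage: one 32-element block of Gaussian values.
rng = np.random.default_rng(0)
x = rng.standard_normal(32)
q, scale_bias, deadzone, grid_noise = decompose_block(x)
for name, part in [("scale bias", scale_bias),
                   ("deadzone", deadzone),
                   ("grid noise", grid_noise)]:
    print(f"{name:>10}: L2 norm = {np.linalg.norm(part):.4f}")
```

By construction the three parts sum exactly to the total error `q - x` on every entry, which is the property an exact additive decomposition needs; how the paper actually attributes error to each component may differ.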

Top-level tags: llm, model training
Detailed tags: quantization error, reinforcement learning, mxfp4, error decomposition, training efficiency

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor


1️⃣ One-sentence summary

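This study is the first to exactly decompose the quantization error that MXFP4 low-precision arithmetic introduces during RL training of LLMs into three distinct components (scale bias, deadzone truncation, and grid noise), and it designs a dedicated correction strategy for each component, restoring model performance to near or above the full-precision (BF16) level while retaining the computational speedup.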

Source: arXiv: 2605.20402