arXiv submission date: 2026-03-05
📄 Abstract - Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propose Sparse-BitNet, a unified framework that jointly applies 1.58-bit quantization and dynamic N:M sparsification while ensuring stable training for the first time. Across multiple model scales and training regimes (sparse pretraining and dense-to-sparse schedules), 1.58-bit BitNet consistently exhibits smaller performance degradation than full-precision baselines at the same sparsity levels and can tolerate higher structured sparsity before accuracy collapse. Moreover, using our custom sparse tensor core, Sparse-BitNet achieves substantial speedups in both training and inference, reaching up to 1.30X. These results highlight that combining extremely low-bit quantization with semi-structured N:M sparsity is a promising direction for efficient LLMs. Code available at this https URL
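To make the two techniques concrete, here is a minimal NumPy sketch of the ingredients the abstract names: BitNet-b1.58-style ternary (absmean) quantization to {-1, 0, +1}, and dynamic N:M sparsification that keeps the N largest-magnitude weights in each group of M. The function names and the order of operations (sparsify, then quantize) are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    # BitNet b1.58-style absmean quantization: scale by the mean |w|,
    # then round each weight to the ternary set {-1, 0, +1}.
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

def nm_sparsify(w, n=2, m=4):
    # Dynamic N:M sparsity: within every group of m consecutive weights,
    # zero out all but the n largest-magnitude entries.
    flat = w.reshape(-1, m)
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (flat * mask).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
# Illustrative ordering only: apply 2:4 sparsity, then ternary quantization.
q, s = ternary_quantize(nm_sparsify(w, n=2, m=4))
```

The resulting `q` is both ternary and 2:4 sparse (at least two zeros per group of four), which is the weight structure a sparse tensor core can exploit.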

Top-level tags: llm model training systems
Detailed tags: quantization sparsity efficient inference model compression 1.58-bit

Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity


1️⃣ One-sentence summary

This paper finds that after large language models are compressed to an extremely low 1.58 bits, they can actually better tolerate another compression technique called semi-structured sparsity; combining the two yields significant speedups while incurring less performance loss.

From arXiv: 2603.05168