📄 Abstract - NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks, but their deployment is often constrained by substantial memory footprints and computational costs. While prior work has achieved significant progress in compressing and accelerating linear layers, nonlinear layers, such as SiLU, RMSNorm, and Softmax, still depend heavily on high-precision floating-point operations. In this paper, we propose a calibration-free, dynamic-programming-optimal, and hardware-friendly framework called Non-uniform Linear Interpolation (NLI). NLI efficiently approximates a variety of nonlinear functions, enabling seamless integration into LLMs and other deep neural networks with almost no loss in accuracy. NLI recasts cutpoint selection as a dynamic-programming problem, achieving the globally minimal interpolation error in O(M×N²) time via Bellman's optimality principle. Building on the NLI algorithm, we also design and implement a plug-and-play universal nonlinear computation unit. Hardware experiments demonstrate that the NLI Engine achieves more than a 4× improvement in computational efficiency over state-of-the-art designs.
NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
1️⃣ One-sentence summary
This paper proposes a novel method called NLI that recasts the approximation of nonlinear functions as a dynamic-programming problem, allowing the complex nonlinear computations in large language models to be replaced efficiently and accurately. With almost no loss of accuracy, it significantly speeds up model inference; hardware experiments show more than a 4× improvement in computational efficiency over the current state-of-the-art designs.
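To make the core idea concrete, below is a minimal Python sketch (not the authors' implementation) of casting cutpoint selection as a Bellman-style dynamic program: precompute the error of every candidate chord over a fine grid, then pick the N-segment partition with globally minimal total error. The function names (`optimal_cutpoints`, `silu`), the squared-error objective, and the candidate-grid formulation are illustrative assumptions; this readability-first version costs more than the paper's stated O(M×N²) bound, which presumably follows from a tighter formulation.

```python
import numpy as np

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def optimal_cutpoints(f, xs, n_segments):
    """Pick n_segments + 1 cutpoints from the candidate grid `xs` that
    minimize the summed squared error of a piecewise-linear
    approximation of `f`, via dynamic programming (assumed objective).

    dp[k][j] = best error covering xs[0..j] with k segments, the last
    segment ending at xs[j]; Bellman's optimality principle guarantees
    a globally optimal partition of the candidate grid.
    """
    m = len(xs)
    ys = f(xs)

    # seg_err[i][j]: squared error of a single chord from xs[i] to xs[j],
    # measured at the candidate grid points strictly between them.
    seg_err = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
            approx = ys[i] + slope * (xs[i + 1:j] - xs[i])
            seg_err[i, j] = np.sum((ys[i + 1:j] - approx) ** 2)

    INF = float("inf")
    dp = np.full((n_segments + 1, m), INF)
    parent = np.zeros((n_segments + 1, m), dtype=int)
    dp[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, m):
            cands = dp[k - 1, :j] + seg_err[:j, j]
            i = int(np.argmin(cands))
            dp[k, j], parent[k, j] = cands[i], i

    # Backtrack the optimal cutpoint indices from the final state.
    idx, j = [m - 1], m - 1
    for k in range(n_segments, 0, -1):
        j = parent[k, j]
        idx.append(j)
    return xs[np.array(idx[::-1])], dp[n_segments, m - 1]

# Example: approximate SiLU on [-8, 8] with 16 non-uniform segments.
grid = np.linspace(-8.0, 8.0, 257)
cuts, total_err = optimal_cutpoints(silu, grid, n_segments=16)
approx = np.interp(grid, cuts, silu(cuts))
print(f"total squared error: {total_err:.2e}")
print(f"max abs error on grid: {np.max(np.abs(approx - silu(grid))):.2e}")
```

In a hardware realization such as the NLI Engine described in the abstract, the selected cutpoints would plausibly be frozen into a small lookup table of slopes and intercepts, so each nonlinear evaluation reduces to a range lookup plus one multiply-add; the details of the actual unit are the paper's, not this sketch's.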