LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models
1️⃣ One-Sentence Summary
This paper proposes a look-ahead mixed-precision inference method called LAMP. By analyzing the errors arising during computation, it selects the small set of critical computation steps that most affect the final result to be performed in high precision, while the bulk of the remaining computations run at low precision. With almost no additional compute, this improves the inference accuracy of Transformer models such as GPT-2 by up to two orders of magnitude.
Mixed-precision computations are a hallmark of the current stage of AI, driving the progress in large language models towards efficient, locally deployable solutions. This article addresses the floating-point computation of compositionally-rich functions, concentrating on transformer inference. Based on the rounding error analysis of a composition $f(g(\mathrm{x}))$, we provide an adaptive strategy that selects a small subset of components of $g(\mathrm{x})$ to be computed more accurately while all other computations can be carried out with lower accuracy. We then explain how this strategy can be applied to different compositions within a transformer and illustrate its overall effect on transformer inference. We study the effectiveness of this algorithm numerically on GPT-2 models and demonstrate that already very low recomputation rates allow for improvements of up to two orders of magnitude in accuracy.
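The core idea can be illustrated with a small numerical experiment. The sketch below is a minimal, hypothetical illustration and not the paper's algorithm: it assumes $g(\mathrm{x})$ is a matrix-vector product evaluated in float16, $f$ is a softmax (as appears inside attention), the per-component sensitivity is approximated by the diagonal of the softmax Jacobian, and the rounding-error proxy is the unit roundoff scaled by the component magnitude. All function names (`g_low`, `g_high`, `lamp_step`) and the 2% recomputation rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
x = rng.standard_normal(512)

def g_low(x):
    # low-precision evaluation of g(x) = W x, simulated in float16
    return (W.astype(np.float16) @ x.astype(np.float16)).astype(np.float64)

def g_high(x, idx):
    # accurate recomputation of only the selected components of g(x)
    return W[idx] @ x

def f(z):
    # downstream part of the composition, here a softmax (assumption)
    e = np.exp(z - z.max())
    return e / e.sum()

def sensitivity(z):
    # rough per-component sensitivity of f at z (diagonal of the softmax Jacobian)
    p = f(z)
    return p * (1.0 - p)

def lamp_step(x, recompute_frac=0.02):
    z = g_low(x)
    # look-ahead: weight a per-component rounding-error estimate by f's sensitivity
    score = sensitivity(z) * np.abs(z) * np.float64(np.finfo(np.float16).eps)
    k = max(1, int(recompute_frac * z.size))
    idx = np.argsort(score)[-k:]          # components whose error matters most for f
    z[idx] = g_high(x, idx)               # recompute only that small subset accurately
    return f(z)

# compare against a full double-precision reference
print(np.abs(lamp_step(x) - f(W @ x)).max())
```

Under these assumptions, recomputing only a few percent of the components of $g(\mathrm{x})$ in high precision targets exactly the entries whose rounding error would propagate most strongly through $f$, which is the mechanism behind the accuracy gains reported for GPT-2.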
Source: arXiv: 2601.21623