LLM Zeroth-Order Fine-Tuning is an Inference Workload
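1️⃣ One-Sentence Summary
This paper finds that in zeroth-order fine-tuning of large language models (which needs no backpropagation and relies only on repeated forward evaluations), most of the actual computation is repeated, inference-like scoring. Running this process inside an inference serving framework (such as vLLM) is therefore 2.34x to 8.13x faster than a conventional training loop while maintaining comparable model accuracy, and it suggests a new way to schedule lightweight model adaptation as an inference workload.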
Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training loops, even though their dominant work is repeated scoring under nearby parameter states. This creates a workload-runtime mismatch: the algorithm asks for structured inference-style scoring, while the system exposes a sequence of fragmented training-loop steps. We show that LLM ZO fine-tuning is an inference-dominated workload and execute its repeated scoring phase through a serving runtime. On OPT-13B SST-2, the resulting vLLM execution path completes the 20k-step LoZO run in 0.51 estimated training hours versus 4.15 hours for the official LoZO baseline under the matched LoRA-only setting, an 8.13x speedup, while reaching 0.922 final evaluation accuracy and 0.931 final full-validation accuracy. In core-step scaling experiments across OPT-1.3B to OPT-13B, the same runtime reorganization gives speedups of 2.34x to 7.72x. A MeZO-style high-rank factorized experiment shows that the same runtime paradigm can track a MeZO-like loss trajectory while running up to 2.55x faster. More broadly, representing ZO updates as dynamic adapter states suggests a practical path toward inference-time training, where lightweight adaptation can be scheduled as an inference-like workload rather than as a separate training job.
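To make "repeated scoring under nearby parameter states" concrete, the following is a minimal sketch of a generic two-point zeroth-order step on a toy quadratic objective. It is not the paper's implementation and does not use vLLM or LoRA; the names `loss`, `perturbation`, `zo_step`, and `run`, the target vector, and the hyperparameters `eps` and `lr` are all illustrative assumptions. The point is only that each update consists of two forward evaluations of the objective and no backward pass.

```python
import random


def loss(params, target):
    """Toy forward-only objective (an assumption): squared distance to a target."""
    return sum((p - t) ** 2 for p, t in zip(params, target))


def perturbation(seed, dim):
    """Regenerate a Gaussian direction from a seed instead of storing it."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def zo_step(params, target, seed, eps=1e-3, lr=0.05):
    """One two-point zeroth-order step: two forward scores, no backpropagation."""
    z = perturbation(seed, len(params))
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    # Two scoring calls at nearby parameter states; this is the
    # inference-like part of the workload.
    projected_grad = (loss(plus, target) - loss(minus, target)) / (2 * eps)
    # Move along the sampled direction, scaled by the finite-difference estimate.
    return [p - lr * projected_grad * zi for p, zi in zip(params, z)]


def run(steps=500):
    """Return (initial loss, final loss) after `steps` zeroth-order updates."""
    target = [1.0, -2.0, 0.5, 3.0]
    params = [0.0] * len(target)
    initial = loss(params, target)
    for step in range(steps):
        params = zo_step(params, target, seed=step)
    return initial, loss(params, target)


if __name__ == "__main__":
    initial, final = run()
    print(f"initial loss: {initial:.4f}, final loss: {final:.3e}")
```

In an LLM setting, the two `loss` calls would be forward passes over a batch under two slightly different parameter (or adapter) states, which is the kind of work a serving runtime is built to batch and schedule.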
Source: arXiv: 2605.28760