Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency
1️⃣ One-sentence summary
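This paper challenges backpropagation's default status in neural network training from the standpoint of sample efficiency, proposes synthetic gradients on computational graphs as an alternative, and proves theoretically that under certain conditions synthetic gradients can attain a lower gradient-estimation error than backpropagation, significantly improving sample efficiency on contextual bandit and reinforcement learning tasks.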
Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of sample efficiency. We introduce a unified vectorized feedback framework for loss-based and reward-based learning on computational graphs, in which synthetic gradients emerge as a natural alternative to backpropagation. We characterize the conditions under which synthetic gradients can achieve a lower gradient-estimation mean squared error than backpropagation. We construct examples illustrating that this sample efficiency advantage can be arbitrarily large. Experiments on contextual bandits and reinforcement learning tasks demonstrate the potential of our theoretical findings.
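The abstract compares estimators by gradient-estimation mean squared error. As a rough intuition only, the sketch below shows how that metric can favor a biased, low-noise estimator over an unbiased, high-noise one, since MSE equals squared bias plus variance. This is not the paper's construction or its conditions: the scalar "true gradient", the noise levels, the bias value, and the names `mse` and `simulate` are all hypothetical choices made for illustration.

```python
import random


def mse(samples, true_value):
    """Empirical mean squared error of scalar estimates against a known target."""
    return sum((s - true_value) ** 2 for s in samples) / len(samples)


def simulate(n=20000, seed=0):
    """Compare two hypothetical gradient estimators on a scalar toy problem.

    All numbers here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    true_grad = 1.0  # assumed ground-truth gradient

    # Hypothetical unbiased estimator with high sampling noise (std = 2.0).
    noisy = [true_grad + rng.gauss(0.0, 2.0) for _ in range(n)]

    # Hypothetical biased estimator (bias = 0.3) with low noise (std = 0.1).
    smooth = [true_grad + 0.3 + rng.gauss(0.0, 0.1) for _ in range(n)]

    return mse(noisy, true_grad), mse(smooth, true_grad)


mse_noisy, mse_smooth = simulate()
print(f"unbiased, high-variance estimator MSE: {mse_noisy:.3f}")
print(f"biased, low-variance estimator MSE:    {mse_smooth:.3f}")
```

In this toy setting the theoretical values are 2.0² = 4.0 for the unbiased estimator and 0.3² + 0.1² = 0.10 for the biased one, so the biased estimator has the lower MSE despite its bias. Whether a given synthetic-gradient estimator falls in this regime is exactly what the paper's conditions are meant to characterize.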
Source: arXiv: 2605.27946