FOGO: Forgetting-aware Orthogonalization Optimizer
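1️⃣ One-sentence summary
This paper proposes FOGO, a new optimizer that automatically detects and resolves gradient conflicts during training, preventing dominant gradient directions from persistently suppressing useful but rare update directions. Across standard training, class-imbalanced classification, continual learning, and large-model fine-tuning, it consistently improves convergence speed and knowledge retention, outperforming Adam and Muon.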
We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into long-term forgetting, the classical failure mode of continual learning. We introduce FOGO, a scalable optimizer that continuously detects and resolves gradient interference across both regimes. FOGO spectrally orthogonalizes momentum updates to prevent dominant directions from monopolizing optimization, then stores representative past directions in a compact codebook memory built on random projection, where pairwise distances are provably preserved in low-dimensional space. At each step, conflicts between the current update and stored directions are resolved via lightweight orthogonal correction and lifted back through a proximal step, with minimal overhead and no data storage. Across class-imbalanced classification, continual visual learning under domain and class shifts, continual fine-tuning of LLaVA-7B, and GPT-2 pretraining, FOGO consistently improves convergence and knowledge retention, outperforming Adam and Muon.
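The three ingredients named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation, and every name and size in it is hypothetical. It assumes an exact SVD for the spectral orthogonalization (the abstract does not say how FOGO computes it), a plain Gaussian random projection, and a negative inner product as the conflict test. The codebook construction and the proximal lifting step are not shown, since the abstract gives no details on them.

```python
import numpy as np

rng = np.random.default_rng(0)


def orthogonalize(momentum):
    """Spectral orthogonalization: keep the singular vectors of the
    momentum matrix and set every singular value to 1 (U @ V^T).
    Assumption: an exact SVD stands in for the paper's procedure."""
    u, _, vt = np.linalg.svd(momentum, full_matrices=False)
    return u @ vt


def resolve_conflict(update, stored):
    """Orthogonal correction: if `update` conflicts with `stored`
    (negative inner product, an assumed criterion), remove the
    component of `update` that lies along `stored`."""
    dot = float(np.vdot(update, stored))
    if dot >= 0.0:
        return update  # no conflict, leave the update unchanged
    return update - (dot / float(np.vdot(stored, stored))) * stored


# 1) Orthogonalized momentum: all directions get equal weight.
momentum = rng.standard_normal((8, 4))  # hypothetical momentum matrix
update = orthogonalize(momentum)

# 2) A hypothetical stored direction that opposes the current update.
stored = -update + 0.1 * rng.standard_normal((8, 4))
corrected = resolve_conflict(update, stored)

# 3) Gaussian random projection from d to k dimensions. Distances
#    between vectors are approximately preserved (Johnson-Lindenstrauss).
d, k = 2048, 512  # hypothetical sizes
proj = rng.standard_normal((k, d)) / np.sqrt(k)
a = rng.standard_normal(d)
b = rng.standard_normal(d)
ratio = np.linalg.norm(proj @ (a - b)) / np.linalg.norm(a - b)

print("update is semi-orthogonal:", np.allclose(update.T @ update, np.eye(4)))
print("inner product with stored direction after correction:",
      float(np.vdot(corrected, stored)))
print("projected / original distance:", ratio)
```

After the correction, the update has zero inner product with the stored direction, so a step along it no longer moves against that direction to first order. The distance ratio close to 1 is what makes it plausible to compare directions in the low-dimensional space instead of the full parameter space.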
Source: arXiv: 2606.10406