Understanding Generalization and Forgetting in In-Context Continual Learning
1️⃣ One-Sentence Summary
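This paper establishes the first theoretical framework for in-context continual learning. By analyzing how attention processes a sequence of consecutive tasks, it shows that standard attention, which aggregates historical context either uniformly or causally, inevitably causes inter-task interference. This explains both the performance degradation observed in long prompts and the sensitivity to task order.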
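The aggregation effect described above can be illustrated with a toy sketch. It is not the paper's construction. It assumes linear regression tasks y = w_t^T x with x ~ N(0, I) and a softmax-free attention readout with identity key/query projections and the label as the value. Each prediction is then (1/N) * sum_i <x_q, x_i> * y_i. The "causal" variant is likewise an assumption: a query for task t is treated as if placed right after task t's block and normalized by that prefix length. All function names (`linear_attention_predict`, `make_prompt`, `per_task_errors`) are hypothetical.

```python
import numpy as np


def linear_attention_predict(Xq, X_ctx, y_ctx):
    """Softmax-free attention readout for a batch of queries.

    Toy assumption: identity key/query projections and the label as the
    value, so each prediction is (1/N) * sum_i <x_q, x_i> * y_i.
    """
    scores = Xq @ X_ctx.T                # (n_queries, N) dot-product scores
    return scores @ y_ctx / len(y_ctx)   # uniform 1/N aggregation over context


def make_prompt(task_weights, n_per_task, rng, noise=0.0):
    """Concatenate n_per_task examples from each linear task, in list order."""
    d = task_weights[0].shape[0]
    Xs, ys = [], []
    for w in task_weights:
        X = rng.standard_normal((n_per_task, d))
        Xs.append(X)
        ys.append(X @ w + noise * rng.standard_normal(n_per_task))
    return np.vstack(Xs), np.concatenate(ys)


def per_task_errors(task_weights, n_per_task, causal, n_queries=2000, seed=0):
    """MSE on fresh noise-free queries from each task in a multi-task prompt.

    causal=False: every query aggregates the whole prompt uniformly.
    causal=True: a query for task t sees only the prefix ending with
    task t's block (hypothetical placement, normalized by prefix length).
    """
    rng = np.random.default_rng(seed)
    X, y = make_prompt(task_weights, n_per_task, rng)
    errs = []
    for t, w in enumerate(task_weights):
        end = (t + 1) * n_per_task if causal else len(y)
        Xq = rng.standard_normal((n_queries, X.shape[1]))
        preds = linear_attention_predict(Xq, X[:end], y[:end])
        errs.append(float(np.mean((preds - Xq @ w) ** 2)))
    return errs


w1 = np.zeros(5)
w1[0] = 1.0
w2 = -w1  # an earlier/later task that directly conflicts with w1
print(per_task_errors([w1, w2], 2000, causal=False))
print(per_task_errors([w1, w2], 2000, causal=True))
print(per_task_errors([w2, w1], 2000, causal=True))
```

Consider two conflicting tasks (w2 = -w1). Uniform aggregation pulls every query toward the average of the two tasks, so both errors sit near ||w1||^2, and the early task is effectively "forgotten". Causal aggregation leaves whichever task comes first nearly clean and biases only the later one. Swapping the order therefore swaps which task is hurt, a simple form of order sensitivity in this toy setting.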
In-context learning (ICL) derives its power from enabling Large Language Models to adapt to new tasks via prompt-based reasoning alone, entirely bypassing the need for parameter updates. Existing theories primarily study ICL in single-task settings, while real-world prompts often contain sequences of heterogeneous tasks, leaving a gap in understanding whether Large Language Models implicitly perform continual learning during inference. To bridge this gap, we propose the first theoretical framework for in-context continual learning, modeling how a pretrained Transformer processes multiple sequential tasks within a single prompt through shared attention mechanisms. Focusing on linear and masked linear self-attention, we derive error expressions for model predictions under sequential task prompts and analyze their generalization and forgetting behavior. Our results reveal that standard attention mechanisms inevitably induce intertask interference by uniformly or causally aggregating historical contexts, leading to systematic bias. We further provide a bias-variance-interference decomposition of prediction error, characterizing when historical in-context information yields positive transfer or provable negative transfer. This analysis exposes fundamental limits of attention-based continual inference and offers theoretical explanations for order sensitivity and performance degradation in long prompts.
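The bias-variance-interference idea can be made concrete for the same toy estimator. The following is a minimal sketch under stated assumptions, not the paper's decomposition. It assumes the estimator w_hat = (1/N) sum_i x_i y_i, with x ~ N(0, I_d), label noise variance sigma^2, one earlier task, and the current task, all in context. This estimator is unbiased on a single task, so the error on the current task splits into an interference term ||(n_prev/N)(w_prev - w_cur)||^2 and a variance term. The variance follows from Isserlis' theorem: tr Cov(x y) = (d + 1)||w||^2 + d sigma^2 per example. Comparing the total with the single-task error shows when the earlier context helps (positive transfer) or hurts (negative transfer). The function names are hypothetical, and a Monte Carlo check is included.

```python
import numpy as np


def decompose_error(w_prev, w_cur, n_prev, n_cur, noise_var=0.0):
    """Closed-form E||w_hat - w_cur||^2 for w_hat = (1/N) sum_i x_i y_i,
    split into (interference, variance). Toy assumption: x ~ N(0, I_d)."""
    N = n_prev + n_cur
    d = w_cur.shape[0]
    bias = (n_prev / N) * (w_prev - w_cur)  # pull toward the earlier task
    interference = float(bias @ bias)

    def tr_cov(w, n):
        # n examples, each with tr Cov(x y) = (d + 1)||w||^2 + d * sigma^2
        return n * ((d + 1) * (w @ w) + d * noise_var)

    variance = float((tr_cov(w_prev, n_prev) + tr_cov(w_cur, n_cur)) / N**2)
    return interference, variance


def single_task_error(w_cur, n_cur, noise_var=0.0):
    """Same estimator using only the current task's n_cur examples."""
    d = w_cur.shape[0]
    return float(((d + 1) * (w_cur @ w_cur) + d * noise_var) / n_cur)


def monte_carlo_error(w_prev, w_cur, n_prev, n_cur, noise_var=0.0,
                      trials=4000, seed=0):
    """Empirical E||w_hat - w_cur||^2, i.e. the expected excess squared
    prediction error on a fresh query x_q ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    d = w_cur.shape[0]
    sd = np.sqrt(noise_var)
    errs = np.empty(trials)
    for k in range(trials):
        X1 = rng.standard_normal((n_prev, d))
        X2 = rng.standard_normal((n_cur, d))
        y1 = X1 @ w_prev + sd * rng.standard_normal(n_prev)
        y2 = X2 @ w_cur + sd * rng.standard_normal(n_cur)
        w_hat = (X1.T @ y1 + X2.T @ y2) / (n_prev + n_cur)
        errs[k] = np.sum((w_hat - w_cur) ** 2)
    return float(errs.mean())


w = np.ones(4) / 2.0  # ||w||^2 = 1
# Identical earlier task: zero interference, halved variance (positive transfer).
print(decompose_error(w, w, 20, 20), single_task_error(w, 20))
# Opposite earlier task: interference dominates (negative transfer).
print(decompose_error(-w, w, 20, 20), single_task_error(w, 20))
```

Here, an identical earlier task simply adds samples, so the context helps. An opposing task contributes a bias that no amount of data removes, because it scales with the fraction n_prev/N of the prompt rather than shrinking with N.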
Source: arXiv 2605.28705