Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
1️⃣ One-Sentence Summary
This paper proposes a new method called PaST, which modularly extracts the "knowledge-manipulation skills" a large language model acquires through reinforcement learning and injects them, like a patch, into a model that has only undergone lightweight fine-tuning, so that the model not only memorizes new knowledge but also uses it more effectively to answer questions and complete tasks.
Large Language Models (LLMs) face the "knowledge cutoff" challenge, where their frozen parametric memory prevents direct internalization of new information. While Supervised Fine-Tuning (SFT) is commonly used to update model knowledge, it often updates factual content without reliably improving the model's ability to use the newly incorporated information for question answering or decision-making. Reinforcement Learning (RL) is essential for acquiring reasoning skills; however, its high computational cost makes it impractical for efficient online adaptation. We empirically observe that the parameter updates induced by SFT and RL are nearly orthogonal. Based on this observation, we propose Parametric Skill Transfer (PaST), a framework that supports modular skill transfer for efficient and effective knowledge adaptation. By extracting a domain-agnostic Skill Vector from a source domain, we can linearly inject knowledge manipulation skills into a target model after it has undergone lightweight SFT on new data. Experiments on knowledge-incorporation QA (SQuAD, LooGLE) and agentic tool-use benchmarks (ToolBench) demonstrate the effectiveness of our method. On SQuAD, PaST outperforms the state-of-the-art self-editing SFT baseline by up to 9.9 points. PaST further scales to long-context QA on LooGLE with an 8.0-point absolute accuracy gain, and improves zero-shot ToolBench success rates by +10.3 points on average with consistent gains across tool categories, indicating strong scalability and cross-domain transferability of the Skill Vector.
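To make the transfer step concrete, below is a minimal sketch of how "extract a Skill Vector, then linearly inject it" might look in practice. It assumes the Skill Vector is the per-parameter delta between an RL-tuned and an SFT-only checkpoint on the source domain, and that injection is a simple scaled addition onto the target model's SFT weights; the checkpoint names, function names, and the scaling coefficient `alpha` are illustrative, not the paper's exact recipe.

```python
import torch

def extract_skill_vector(rl_state: dict, sft_state: dict) -> dict:
    """Skill Vector as the parameter delta between an RL-trained and an
    SFT-only checkpoint on the source domain (assumption based on the abstract)."""
    return {k: rl_state[k] - sft_state[k] for k in rl_state}

def inject_skill(target_sft_state: dict, skill_vector: dict, alpha: float = 1.0) -> dict:
    """Linearly add the (scaled) Skill Vector to a target model that has
    already undergone lightweight SFT on the new-domain data."""
    return {k: target_sft_state[k] + alpha * skill_vector[k]
            if k in skill_vector else target_sft_state[k]
            for k in target_sft_state}

# Usage sketch (checkpoint paths are placeholders):
# rl_state   = torch.load("source_rl.pt")      # RL-trained on the source domain
# sft_state  = torch.load("source_sft.pt")     # SFT-only on the source domain
# target_sft = torch.load("target_sft.pt")     # lightweight SFT on new knowledge
# patched = inject_skill(target_sft, extract_skill_vector(rl_state, sft_state))
# model.load_state_dict(patched)
```

The near-orthogonality of SFT and RL updates reported in the abstract is what makes this kind of additive patching plausible: the knowledge-bearing SFT delta and the skill-bearing RL delta interfere little when summed.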
Source: arXiv: 2601.11258