arXiv submission date: 2026-04-19
📄 Abstract - Representation-Guided Parameter-Efficient LLM Unlearning

Large Language Models (LLMs) often memorize sensitive or harmful information, necessitating effective machine unlearning techniques. While existing parameter-efficient unlearning methods have shown promise, they still struggle with the forget-retain trade-off. This can be attributed to their reliance on parameter importance metrics to identify parameters that are important exclusively for the forget set, which is fundamentally limited by the superposition phenomenon. Due to the polysemantic nature of LLM parameters, such an importance metric may struggle to disentangle parameters associated with the forget and retain sets. In this work, we propose Representation-Guided Low-rank Unlearning (REGLU), a novel approach that leverages the geometric properties of representation spaces to achieve robust and precise unlearning. First, we develop a representation-guided initialization for LoRA that identifies the optimal subspace for selective forgetting. Second, we introduce a regularization loss that constrains the outputs of the LoRA update to lie in the orthogonal complement of the retain set's representation subspace, thereby minimizing interference with the model's performance on the retain set. We evaluate REGLU on the TOFU and WMDP benchmarks across multiple models. Our results demonstrate that REGLU consistently outperforms state-of-the-art baselines, achieving superior unlearning quality while maintaining higher model utility.
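The abstract describes two mechanisms: a representation-guided LoRA initialization aligned with the forget set's subspace, and a regularizer that keeps the LoRA update's outputs in the orthogonal complement of the retain set's representation subspace. The paper's exact formulation is not given here, so the following is only a minimal PyTorch sketch of how those two ideas could look; all names (`forget_reps`, `retain_reps`, `rank`, `lora_A`, `lora_B`, `lambda_orth`) and the SVD-based subspace estimate are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of REGLU's two components as described in the abstract.
# Not the paper's code: subspace estimation via SVD and all shapes/names are assumptions.
import torch

def top_subspace(reps: torch.Tensor, rank: int) -> torch.Tensor:
    """Orthonormal basis (d, rank) of the top-`rank` directions of representations (n, d)."""
    reps = reps - reps.mean(dim=0, keepdim=True)       # center the hidden states
    _, _, Vh = torch.linalg.svd(reps, full_matrices=False)
    return Vh[:rank].T                                 # leading right singular vectors

d, rank = 768, 8
forget_reps = torch.randn(512, d)   # placeholder: hidden states on the forget set
retain_reps = torch.randn(2048, d)  # placeholder: hidden states on the retain set

# (1) Representation-guided initialization: point the LoRA output directions
# at the dominant subspace of the forget-set representations, so the low-rank
# update acts where forgetting is needed.
U_f = top_subspace(forget_reps, rank)                  # (d, rank)
lora_B = torch.nn.Parameter(U_f.clone())               # up-projection, (d, rank)
lora_A = torch.nn.Parameter(torch.zeros(rank, d))      # down-projection starts at zero

# (2) Orthogonality regularization: penalize any component of the LoRA update's
# output that falls inside the retain-set subspace, pushing the update into
# that subspace's orthogonal complement.
U_r = top_subspace(retain_reps, rank)                  # retain subspace basis, (d, rank)
P_r = U_r @ U_r.T                                      # projector onto the retain subspace

def orthogonality_loss(hidden: torch.Tensor) -> torch.Tensor:
    """hidden: (batch, d) inputs to the adapted layer."""
    delta = hidden @ lora_A.T @ lora_B.T               # LoRA update's output, (batch, d)
    return (delta @ P_r).pow(2).sum(dim=-1).mean()     # energy inside the retain subspace

# The full objective would combine an unlearning loss on the forget set with
# this penalty, e.g. loss = forget_loss + lambda_orth * orthogonality_loss(hidden).
```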

Top-level tags: llm model training
Detailed tags: machine unlearning, parameter-efficient, lora, representation space, forget-retain trade-off

Representation-Guided Parameter-Efficient LLM Unlearning


1️⃣ One-Sentence Summary

This paper proposes a new method, REGLU, that uses the geometric properties of the model's internal representation space to guide parameter-efficient fine-tuning, so that when deleting sensitive or harmful information from an LLM it can erase the target content effectively while preserving the model's other capabilities as much as possible.

Source: arXiv:2604.17396