CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search

📄 Abstract

Deploying Large Language Models (LLMs) in practice incurs substantial memory and computational costs. Post-training pruning (PTP) is an effective approach to reducing these costs by removing weights without additional training. Among existing methods, RIA introduces relative importance scores normalized by row and column sums, achieving state-of-the-art accuracy. However, RIA considers only 1D cross-shaped (row/column) directional information and assigns equal weight to row and column contributions. In this paper, we propose CRePE, which incorporates 2D local neighborhood context and adaptive coefficients into Relative Importance scoring. CRePE consistently outperforms existing PTP methods across diverse models and sparsity settings. However, identifying optimal adaptive coefficients via perplexity (PPL)-based hill climbing requires numerous PPL evaluations and approximately 11 hours of search time. To address this, we propose PHO (Proxy-based Hyperparameter Optimization), which eliminates the need for repeated PPL measurements and reduces the search time to approximately 20 minutes. Furthermore, the optimal hyperparameter configuration found by PHO on one model transfers well to other models, demonstrating strong generalization. Finally, we verify that CRePE can be orthogonally combined with existing techniques including Channel Permutation, non-uniform sparsity allocation, and re-pruning methods.
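As a rough illustration of the ideas in the abstract, the following NumPy sketch computes an RIA-style relative importance (each |W_ij| divided by its row sum plus its column sum, with equal weights, using weights only) and a hypothetical CRePE-like variant. The abstract gives no formula for CRePE, so everything beyond the RIA-style term is an assumption: the k×k box filter standing in for "2D local neighborhood context", the blend weight `gamma`, the separate row/column coefficients `alpha` and `beta`, per-row pruning, and the proxy objective (squared layer-output reconstruction error on calibration inputs `X`) driving a simple coordinate hill climb. This is not PHO; it only shows why scoring each search step with a cheap per-layer proxy instead of a full PPL evaluation makes the search inexpensive.

```python
import numpy as np


def ria_relative_importance(W):
    """RIA-style relative importance: |W_ij|/row_sum_i + |W_ij|/col_sum_j (weights only)."""
    A = np.abs(W)
    return A / A.sum(axis=1, keepdims=True) + A / A.sum(axis=0, keepdims=True)


def box_filter_2d(A, k=3):
    """Mean of each k x k neighborhood (edge-padded); a stand-in for 2D local context."""
    assert k % 2 == 1, "k must be odd"
    pad = k // 2
    P = np.pad(A, pad, mode="edge")
    out = np.zeros(A.shape, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += P[di:di + A.shape[0], dj:dj + A.shape[1]]
    return out / (k * k)


def crepe_like_score(W, alpha=1.0, beta=1.0, gamma=0.0, k=3):
    """HYPOTHETICAL CRePE-like score: neighborhood-blended magnitudes with
    separate row (alpha) and column (beta) coefficients. Not the paper's formula."""
    A = np.abs(W)
    ctx = A + gamma * box_filter_2d(A, k)  # gamma = 0 recovers plain |W|
    row = ctx.sum(axis=1, keepdims=True)
    col = ctx.sum(axis=0, keepdims=True)
    return alpha * ctx / row + beta * ctx / col


def prune_mask(score, sparsity):
    """Boolean keep-mask that drops the lowest-scoring fraction of each row."""
    n_prune = int(score.shape[1] * sparsity)
    mask = np.ones(score.shape, dtype=bool)
    if n_prune > 0:
        idx = np.argsort(score, axis=1)[:, :n_prune]
        np.put_along_axis(mask, idx, False, axis=1)
    return mask


def proxy_loss(W, X, mask):
    """ASSUMED proxy: squared error of the layer output on calibration inputs X."""
    diff = X @ W.T - X @ (W * mask).T
    return float(np.sum(diff ** 2))


def hill_climb(W, X, sparsity, start=(1.0, 1.0, 0.0), step=0.25, iters=20):
    """Coordinate hill climb over (alpha, beta, gamma), scored by the proxy
    instead of perplexity. Halves the step when no neighbor improves."""
    def evaluate(params):
        return proxy_loss(W, X, prune_mask(crepe_like_score(W, *params), sparsity))

    best, best_loss = tuple(start), evaluate(start)
    for _ in range(iters):
        improved = False
        for i in range(len(best)):
            for d in (-step, step):
                cand = list(best)
                cand[i] = max(0.0, cand[i] + d)  # keep coefficients non-negative
                loss = evaluate(cand)
                if loss < best_loss:
                    best, best_loss, improved = tuple(cand), loss, True
        if not improved:
            step /= 2
    return best, best_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 16))   # toy weight matrix (out_features x in_features)
    X = rng.standard_normal((32, 16))  # toy calibration activations
    params, loss = hill_climb(W, X, sparsity=0.5)
    print("alpha, beta, gamma =", params, "proxy loss =", loss)
```

With `alpha = beta = 1` and `gamma = 0`, the hypothetical score reduces exactly to the RIA-style one, so the search starts from RIA's equal row/column weighting and moves away only when the proxy improves.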


1️⃣ One-Sentence Summary

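This paper proposes CRePE, a new pruning method that scores weight importance more precisely by incorporating 2D local neighborhood information and adaptive coefficients, thereby preserving higher accuracy when compressing large language models; it further introduces PHO, a fast search algorithm that cuts hyperparameter tuning time from approximately 11 hours to approximately 20 minutes, and the configuration it finds transfers to other models.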
