arXiv submission date: 2025-12-16
📄 Abstract - VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

The rapid scaling of Large Language Models (LLMs) has achieved remarkable performance, but it also leads to prohibitive memory costs. Existing parameter-efficient approaches such as pruning and quantization mainly compress pretrained models without enhancing architectural capacity, thereby hitting the representational ceiling of the base model. In this work, we propose VersatileFFN, a novel feed-forward network (FFN) that enables flexible reuse of parameters in both width and depth dimensions within a fixed parameter budget. Inspired by the dual-process theory of cognition, VersatileFFN comprises two adaptive pathways: a width-versatile path that generates a mixture of sub-experts from a single shared FFN, mimicking sparse expert routing without increasing parameters, and a depth-versatile path that recursively applies the same FFN to emulate deeper processing for complex tokens. A difficulty-aware gating dynamically balances the two pathways, steering "easy" tokens through the efficient width-wise route and allocating deeper iterative refinement to "hard" tokens. Crucially, both pathways reuse the same parameters, so all additional capacity comes from computation rather than memory. Experiments across diverse benchmarks and model scales demonstrate the effectiveness of the method. The code will be available at this https URL.
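The abstract describes the two pathways and the gate only at a high level, and the authors' code has not yet been released. The following is a minimal PyTorch sketch of the idea as stated above, not the paper's implementation: the class name aside, every design detail (how sub-experts are carved from the shared hidden layer, the residual-style recursion, the sigmoid gate, and the hyperparameters `num_sub_experts` and `max_depth`) is an assumption for illustration.

```python
# Minimal sketch of the VersatileFFN idea as described in the abstract.
# NOT the authors' implementation (code not yet released); all design
# details below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VersatileFFN(nn.Module):
    def __init__(self, d_model, d_ff, num_sub_experts=4, max_depth=2):
        super().__init__()
        assert d_ff % num_sub_experts == 0
        # One shared FFN: both pathways reuse these same weights.
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.num_sub_experts = num_sub_experts
        self.max_depth = max_depth
        # Difficulty-aware gate: per-token score balancing the pathways.
        self.gate = nn.Linear(d_model, 1)
        # Router over sub-experts carved out of the shared hidden layer.
        self.router = nn.Linear(d_model, num_sub_experts)

    def ffn(self, x):
        return self.down(F.gelu(self.up(x)))

    def width_path(self, x):
        # Split the shared hidden layer into chunks ("sub-experts") and
        # mix them with routing weights: MoE-like behavior without
        # adding expert parameters.
        h = F.gelu(self.up(x))                        # (..., d_ff)
        chunks = h.chunk(self.num_sub_experts, dim=-1)
        weights = F.softmax(self.router(x), dim=-1)   # (..., E)
        h = torch.cat(
            [w.unsqueeze(-1) * c
             for w, c in zip(weights.unbind(-1), chunks)],
            dim=-1,
        )
        return self.down(h)

    def depth_path(self, x):
        # Recursively apply the SAME FFN to emulate a deeper network for
        # hard tokens; extra capacity comes from compute, not new weights.
        h = x
        for _ in range(self.max_depth):
            h = h + self.ffn(h)  # residual refinement at each iteration
        return h - x             # return only the update, like width_path

    def forward(self, x):
        # Both branches return a residual-style update; the surrounding
        # transformer block is expected to add it back to x.
        g = torch.sigmoid(self.gate(x))  # ~0: "easy" token, ~1: "hard"
        return (1 - g) * self.width_path(x) + g * self.depth_path(x)

if __name__ == "__main__":
    layer = VersatileFFN(d_model=512, d_ff=2048)
    x = torch.randn(2, 16, 512)   # (batch, seq_len, d_model)
    print(layer(x).shape)         # torch.Size([2, 16, 512])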

Top-level tags: llm, model training, systems
Detailed tags: parameter efficiency, feed-forward network, adaptive computation, model architecture, expert mixture

VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse


1️⃣ One-Sentence Summary

This paper proposes VersatileFFN, a novel feed-forward network that adaptively reuses a single fixed set of parameters to either widen the model or deepen its processing, improving an LLM's ability to handle tasks of varying difficulty without increasing memory cost.


Source: arXiv: 2512.14531