Provable Target Sample Complexity Improvements as Pre-Trained Models Scale
1️⃣ One-Sentence Summary
Through a new theoretical framework called "caulking", this paper gives the first theoretical proof that larger pre-trained models do reduce the amount of data needed to learn downstream tasks, providing a solid mathematical explanation for the empirically observed pattern that "the larger the model, the better the downstream performance".
2️⃣ Abstract

Pre-trained models have become indispensable for efficiently building models across a broad spectrum of downstream tasks. Their advantages have been highlighted by empirical studies of scaling laws, which demonstrate that larger pre-trained models can significantly reduce the sample complexity of downstream learning. However, existing theoretical investigations of pre-trained models cannot explain this phenomenon. In this paper, we provide a theoretical investigation by introducing a novel framework, caulking, inspired by parameter-efficient fine-tuning (PEFT) methods such as adapter-based fine-tuning, low-rank adaptation, and partial fine-tuning. Our analysis establishes that improved pre-trained models provably decrease the sample complexity of downstream tasks, thereby offering theoretical justification for the empirically observed scaling laws relating pre-trained model size to downstream performance, a relationship not covered by existing results.
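The caulking framework draws its inspiration from PEFT methods such as low-rank adaptation (LoRA), where the pre-trained weights stay frozen and only a small set of added parameters is trained on the downstream task. The sketch below is a minimal PyTorch-style illustration of that setup, written for this summary; the class name `LoRALinear` and the hyperparameter choices are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer with a trainable low-rank update.

    Only the low-rank factors A and B are trained on the downstream task,
    so the number of trainable parameters stays small even when the
    pre-trained layer itself is large.
    """

    def __init__(self, pretrained: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = pretrained
        self.base.weight.requires_grad_(False)   # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        in_f, out_f = pretrained.in_features, pretrained.out_features
        # Trainable low-rank factors: the update B @ A has rank at most `rank`.
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-trained output plus the scaled low-rank correction (B A) x.
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```

During downstream fine-tuning only `lora_A` and `lora_B` receive gradients, so the hypothesis class fitted on the target data is far smaller than the full network; this is loosely the intuition behind analyzing PEFT-style adaptation, though the paper's formal caulking framework may differ in its details.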
Source: arXiv: 2602.04233