ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

📄 Abstract - ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B and Llama 3-70B using a single RTX 4090-24GB GPU and 2× H800-80GB GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. Our repository is on this https URL.
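The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of a "working set": at each step, a gradient is computed and applied for one chunk of the parameters rather than for the full dense vector. It is a generic block-wise descent on a toy quadratic objective, not the authors' method. The function names (`chunk_gradient`, `working_set_descent`), the cyclic chunk schedule, and the objective are all assumptions made for this sketch.

```python
from typing import List


def chunk_gradient(w: List[float], target: List[float],
                   start: int, stop: int) -> List[float]:
    """Gradient of 0.5 * sum((w - target)^2), restricted to w[start:stop].

    Only `stop - start` gradient entries are ever materialized, which is
    the memory-saving point this toy example is meant to illustrate.
    """
    return [w[i] - target[i] for i in range(start, stop)]


def working_set_descent(w: List[float], target: List[float],
                        chunk_size: int, lr: float, sweeps: int) -> List[float]:
    """Cycle through fixed-size chunks, updating one chunk at a time.

    Assumption: a simple cyclic schedule. The paper's actual rule for
    choosing the active working set is not described in the abstract.
    """
    w = list(w)  # copy so the caller's list is left untouched
    n = len(w)
    for _ in range(sweeps):
        for start in range(0, n, chunk_size):
            stop = min(start + chunk_size, n)
            grad = chunk_gradient(w, target, start, stop)
            # Plain gradient step applied to the active chunk only.
            for offset, g in enumerate(grad):
                w[start + offset] -= lr * g
    return w


target = [1.0, -2.0, 3.0, 0.5, 4.0]
result = working_set_descent([0.0] * 5, target, chunk_size=2, lr=0.5, sweeps=30)
```

In this toy setting every parameter is still updated once per sweep, but the gradient buffer never holds more than `chunk_size` entries at a time. A real fine-tuning setup would also need to account for activations and optimizer state, which this sketch ignores.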

1️⃣ One-Sentence Summary

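ChunkFT proposes a fine-tuning framework that computes gradients chunk by chunk through a dynamically activated working set. Without modifying the network architecture, it substantially reduces memory usage, making full fine-tuning of a 7-billion-parameter model feasible on a single consumer-grade GPU while matching or even exceeding the performance of conventional full-parameter fine-tuning.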
