arXiv submission date: 2026-05-20
📄 Abstract - ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B and Llama 3-70B using a single RTX 4090-24GB GPU and 2× H800-80GB GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. Our repository is on this https URL.
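To make the idea of a working set concrete, here is a minimal sketch of updating only one chunk of parameters per step. It is plain cyclic block coordinate descent on a toy least-squares problem, not the authors' algorithm; the names `chunk_gradient`, `train`, `chunk_size`, and the data are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: cyclic block coordinate descent on least squares.
# Every name below is hypothetical; none of it is taken from ChunkFT itself.

def loss(X, y, w):
    """Objective 0.5 * mean((Xw - y)^2)."""
    n = len(X)
    return 0.5 * sum(
        (sum(x * wi for x, wi in zip(row, w)) - t) ** 2 for row, t in zip(X, y)
    ) / n


def chunk_gradient(X, y, w, chunk):
    """Gradient of the objective with respect to the indices in `chunk` only."""
    n = len(X)
    residuals = [
        sum(x * wi for x, wi in zip(row, w)) - t for row, t in zip(X, y)
    ]
    # One gradient entry per active index; no dense gradient is materialized.
    return [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in chunk]


def train(X, y, dim, chunk_size=2, lr=1.0, sweeps=200):
    """Sweep over fixed-size chunks, updating one working set at a time."""
    w = [0.0] * dim
    chunks = [
        list(range(start, min(start + chunk_size, dim)))
        for start in range(0, dim, chunk_size)
    ]
    for _ in range(sweeps):
        for chunk in chunks:  # the currently active working set
            grad = chunk_gradient(X, y, w, chunk)
            for j, g in zip(chunk, grad):
                w[j] -= lr * g
    return w


# Small synthetic problem with a known solution.
w_true = [1.0, -2.0, 3.0, 0.5]
X = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
y = [sum(x * wi for x, wi in zip(row, w_true)) for row in X]

w = train(X, y, dim=4)
print("recovered weights:", w)
print("final loss:", loss(X, y, w))
```

In this toy, the forward pass is still computed in full and only the gradient storage shrinks to the size of the active chunk. How ChunkFT selects its working set and computes sub-tensor gradients inside a real network is described in the paper, not here.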

Top-level tags: llm model training
Detailed tags: memory-efficient fine-tuning gradient computation

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning


1️⃣ One-sentence summary

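ChunkFT proposes a new fine-tuning framework that computes gradients chunk by chunk over a dynamically activated working set. Without modifying the network architecture, it substantially reduces memory usage, making full fine-tuning of a 7-billion-parameter model feasible on a single consumer-grade GPU while matching, and in some cases exceeding, the performance of conventional full-parameter fine-tuning.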

Source: arXiv: 2605.21177