
arXiv submission date: 2026-03-02
📄 Abstract - From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness

Dataset Distillation (DD) compresses large datasets into compact synthetic ones that maintain training performance. However, current methods mainly target sample reduction, with limited consideration of data precision and its impact on efficiency. We propose Quantization-aware Dataset Distillation (QuADD), a unified framework that jointly optimizes dataset compactness and precision under fixed bit budgets. QuADD integrates a differentiable quantization module within the distillation loop, enabling end-to-end co-optimization of synthetic samples and quantization parameters. Guided by the rate-distortion perspective, we empirically analyze how bit allocation between sample count and precision influences learning performance. Our framework supports both uniform and adaptive non-uniform quantization, where the latter learns quantization levels from data to represent information-dense regions better. Experiments on image classification and 3GPP beam management tasks show that QuADD surpasses existing DD and post-quantized baselines in accuracy per bit, establishing a new standard for information-efficient dataset distillation.
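To make the bit-budget framing concrete, here is a minimal, hypothetical sketch (not the paper's implementation): under a fixed total budget, every extra bit of per-value precision must be paid for with fewer synthetic samples. The `uniform_quantize` and `budget_options` helpers below are illustrative names; the uniform quantizer corresponds only loosely to the uniform variant the abstract mentions, and QuADD additionally learns the quantization parameters end-to-end, which this sketch omits.

```python
# Hypothetical sketch of the sample-count vs. precision trade-off under a
# fixed bit budget. Not the QuADD implementation; names are illustrative.

def uniform_quantize(x, bits, lo=0.0, hi=1.0):
    """Snap each value to the nearest of 2**bits evenly spaced levels in [lo, hi]."""
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    out = []
    for v in x:
        idx = int((min(max(v, lo), hi) - lo) / step + 0.5)  # round half up
        out.append(lo + idx * step)
    return out

def budget_options(total_bits, dims):
    """Enumerate (num_samples, bits_per_value) pairs that fit the bit budget."""
    opts = []
    for bits in (1, 2, 4, 8):
        n = total_bits // (dims * bits)
        if n > 0:
            opts.append((n, bits))
    return opts

# A 1024-bit budget for 8-dimensional synthetic samples: many samples at
# low precision, or few samples at high precision.
print(budget_options(1024, dims=8))   # [(128, 1), (64, 2), (32, 4), (16, 8)]
print(uniform_quantize([0.1, 0.5, 0.9], bits=2))
```

QuADD's contribution, per the abstract, is choosing along exactly this frontier (and learning non-uniform levels) jointly with the distillation objective, rather than quantizing after distillation.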

Top-level tags: model training, data, machine learning
Detailed tags: dataset distillation, quantization, efficient training, rate-distortion, synthetic data

From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness


1️⃣ One-Sentence Summary

This paper proposes QuADD, a method that jointly optimizes the number of synthetic samples and the storage precision (bit width) of each value. Under a fixed total bit budget, it compresses large datasets more efficiently than existing methods, yielding better performance on tasks such as image classification.

Source: arXiv 2603.02411