arXiv submission date: 2026-04-13
📄 Abstract - UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees

Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed towards majority classes and lead to low-quality prototypes for minority classes. We present UniPROT, a novel subset selection framework that minimizes the optimal transport (OT) distance between a uniformly weighted prototypical distribution and the target distribution. While intuitive, this formulation leads to a cardinality-constrained maximization of a *super-additive* objective, which is generally intractable to approximate efficiently. To address this, we propose a principled reformulation of the OT marginal constraints, yielding a partial optimal transport-based submodular objective. We prove that this reformulation enables a greedy algorithm with a $(1-1/e)$ approximation guarantee relative to the original super-additive maximization problem. Empirically, we showcase that enforcing uniform prototype weights in UniPROT consistently improves minority-class representation in imbalanced classification benchmarks without compromising majority-class accuracy. In both finetuning and pretraining regimes for large language models under domain imbalance, UniPROT enforces uniform source contributions, yielding robust performance gains. Our results establish UniPROT as a scalable, theoretically grounded solution for uniform-weighted prototype selection. Our code is publicly available at GitHub (Code: this https URL).
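The abstract's key algorithmic claim is that a submodular reformulation admits the standard greedy algorithm with a $(1-1/e)$ guarantee. As a minimal sketch of that greedy loop, the snippet below uses the classic facility-location objective $f(S) = \sum_j \max_{i \in S} \mathrm{sim}_{ij}$ as a stand-in for the paper's partial-OT objective (an assumption: the paper's actual objective, its similarity kernel, and the function name `greedy_prototypes` are not from the source).

```python
import numpy as np

def greedy_prototypes(sim, k):
    """Greedily pick k source prototypes (rows of `sim`) to cover target
    points (columns), maximizing a monotone submodular coverage objective.

    This is the generic (1 - 1/e) greedy scheme, illustrated with a
    facility-location surrogate, not UniPROT's partial-OT objective.
    """
    n_src, n_tgt = sim.shape
    selected = []
    best_cover = np.zeros(n_tgt)  # current max similarity per target point
    for _ in range(k):
        # Marginal gain of adding each candidate to the current set S.
        gains = np.maximum(sim, best_cover).sum(axis=1) - best_cover.sum()
        gains[selected] = -np.inf  # forbid re-selecting chosen prototypes
        i = int(np.argmax(gains))
        selected.append(i)
        best_cover = np.maximum(sim[i], best_cover)
    return selected

# Toy example: 5 candidate source points, 4 target points.
rng = np.random.default_rng(0)
sim = rng.random((5, 4))
protos = greedy_prototypes(sim, k=2)
```

Because each selected prototype contributes one equal-mass atom, the selected set induces exactly the uniform prototype weights the paper argues for; the greedy loop itself is the part the submodular reformulation makes provably near-optimal.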

Top tags: machine learning, model training, data
Detailed tags: prototype selection, optimal transport, submodular optimization, imbalanced data, subset selection

UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees


1️⃣ One-sentence summary

This paper proposes a new method, UniPROT, that uses mathematical optimization to guarantee that the representative samples (prototypes) selected from a dataset carry uniform weights, effectively improving the representation of minority classes under data imbalance while preserving overall performance, and backs this with both theoretical guarantees and empirical validation.

Source: arXiv:2604.10952