菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-02-09
📄 Abstract - GEMSS: A Variational Bayesian Method for Discovering Multiple Sparse Solutions in Classification and Regression Problems

Selecting interpretable feature sets in underdetermined ($n \ll p$) and highly correlated regimes constitutes a fundamental challenge in data science, particularly when analyzing physical measurements. In such settings, multiple distinct sparse subsets may explain the response equally well. Identifying these alternatives is crucial for generating domain-specific insights into the underlying mechanisms, yet conventional methods typically isolate a single solution, obscuring the full spectrum of plausible explanations. We present GEMSS (Gaussian Ensemble for Multiple Sparse Solutions), a variational Bayesian framework specifically designed to simultaneously discover multiple, diverse sparse feature combinations. The method employs a structured spike-and-slab prior for sparsity, a mixture of Gaussians to approximate the intractable multimodal posterior, and a Jaccard-based penalty to further control solution diversity. Unlike sequential greedy approaches, GEMSS optimizes the entire ensemble of solutions within a single objective function via stochastic gradient descent. The method is validated on a comprehensive benchmark comprising 128 synthetic experiments across classification and regression tasks. Results demonstrate that GEMSS scales effectively to high-dimensional settings ($p=5000$) with sample size as small as $n = 50$, generalizes seamlessly to continuous targets, handles missing data natively, and exhibits remarkable robustness to class imbalance and Gaussian noise. GEMSS is available as a Python package 'gemss' at PyPI. The full GitHub repository at this https URL also includes a free, easy-to-use application suitable for non-coders.

顶级标签: machine learning model training data
详细标签: sparse feature selection variational bayesian inference multimodal posterior high-dimensional data ensemble methods 或 搜索:

GEMSS:一种用于发现分类和回归问题中多个稀疏解的变分贝叶斯方法 / GEMSS: A Variational Bayesian Method for Discovering Multiple Sparse Solutions in Classification and Regression Problems


1️⃣ 一句话总结

这篇论文提出了一个名为GEMSS的新方法,它能够同时找到多个不同的、稀疏的特征组合来解释数据,特别适用于样本少、特征多且特征间高度相关的情况,为理解复杂数据背后的多种可能机制提供了有力工具。

源自 arXiv: 2602.08913