arXiv submission date: 2026-05-03
📄 Abstract - RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy

Machine Learning (ML) has transformed many scientific fields, yet key applications still lack standardized benchmarks. Raman spectroscopy, a widely used technique for non-invasive molecular analysis, is one such field where progress is limited by fragmented datasets, inconsistent evaluation, and models that fail to capture the structure of spectral data. We introduce RamanBench, the first large-scale, fully reproducible benchmark for ML on Raman spectroscopy, consisting of streamlined data access, evaluation protocols and code, as well as a live leaderboard. It unifies 74 datasets (including 16 first released with this benchmark) across four domains, comprising 325,668 spectra and spanning classification and regression tasks under diverse experimental conditions. We benchmark 28 models under a standardized protocol, including classical methods (e.g., PLS), Raman-specific models (e.g., RamanNet), Tabular Foundation Models (TFMs) (e.g., TabPFN), and time-series approaches (e.g., ROCKET). TFMs consistently outperform domain-specific and gradient boosting baselines, while time-series models remain competitive. However, no method generalizes across datasets, revealing a fundamental gap. Therefore, we invite the community to contribute new approaches to our living benchmark, with the potential to accelerate advances in critical applications such as medical diagnostics, biological research, and materials science.

Top-level tags: machine learning, benchmark, systems
Detailed tags: raman spectroscopy, spectral data, tabular foundation model, reproducibility, multi-domain

RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy


1️⃣ One-sentence summary

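The paper introduces RamanBench, a large public benchmark platform that brings together 74 datasets and more than 320,000 Raman spectra, systematically evaluates 28 machine learning models, finds that no existing method generalizes across all datasets, and calls on the community to contribute new algorithms to advance fields such as medical diagnostics, biological research, and materials science.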

Source: arXiv: 2605.02003