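De novo molecular structure elucidation from mass spectra via flow matching
1️⃣ One-sentence summary
This study presents MSFlow, a new AI model that acts like a "chemical translator": it infers the complete chemical structure of unknown small molecules directly from mass spectrometry data with high accuracy, improves elucidation accuracy by up to 14-fold, and offers a powerful tool for drug discovery and life science research.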
Mass spectrometry is a powerful and widely used tool for identifying molecular structures due to its sensitivity and ability to profile complex samples. However, translating spectra into full molecular structures is a difficult, under-defined inverse problem. Overcoming this problem is crucial for enabling biological insight, discovering new metabolites, and advancing chemical research across multiple fields. To this end, we develop MSFlow, a two-stage encoder-decoder flow-matching generative model that achieves state-of-the-art performance on the structure elucidation task for small molecules. In the first stage, we adopt a formula-restricted transformer model for encoding mass spectra into a continuous and chemically informative embedding space, while in the second stage, we train a flow-matching decoder model to reconstruct molecules from latent embeddings of mass spectra. We present ablation studies demonstrating the importance of using information-preserving molecular descriptors for encoding mass spectra and motivate the use of our discrete flow-based decoder. Our rigorous evaluation demonstrates that MSFlow can accurately translate up to 45 percent of molecular mass spectra into their corresponding molecular representations, an improvement of up to fourteen-fold over the current state of the art. A trained version of MSFlow is made publicly available on GitHub for non-commercial users.
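To make the idea of a discrete flow-based decoder more concrete, the sketch below shows one common way discrete flow matching corrupts a token sequence during training: a masked interpolation path in which each clean token survives with probability t. This is an illustrative assumption, not MSFlow's actual implementation. The abstract does not specify the molecular representation or the probability path, so the character-level SMILES tokens, the `MASK` token, and the helper names `corrupt` and `training_targets` are all hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def corrupt(tokens, t, rng):
    """Sample x_t on a masked interpolation path.

    Each position keeps its clean token with probability t and is
    replaced by MASK otherwise, so t=0 is fully masked and t=1 is clean.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [tok if rng.random() < t else MASK for tok in tokens]


def training_targets(clean, noisy):
    """Return (index, clean token) pairs for every masked position.

    In a conditional decoder these are the tokens the model would be
    asked to predict, given the noisy sequence and a spectrum embedding.
    """
    return [(i, c) for i, (c, n) in enumerate(zip(clean, noisy)) if n == MASK]


rng = random.Random(0)
# Acetic acid as SMILES, split per character purely for illustration.
smiles_tokens = list("CC(=O)O")
x_t = corrupt(smiles_tokens, 0.5, rng)
targets = training_targets(smiles_tokens, x_t)
```

In a setup like this, a decoder would be trained to predict the clean tokens in `targets` from `x_t`, the time `t`, and the encoder's spectrum embedding; at inference time, generation would start from the fully masked sequence and progressively unmask positions.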
Source: arXiv: 2602.19912