- 🚀 New paper alert! 🚀 Happy to introduce #ChemEmbed, a deep learning framework for metabolite identification that enhances MS/MS data and leverages multidimensional molecular embeddings. A 🧵 on how it works and why it matters! ⬇️ #metabolomics #MachineLearning #DeepLearning
- 1/ The problem: #Metabolomics relies on MS/MS spectral databases, but most spectra remain unidentified due to limited reference libraries. Computational methods help, but they struggle with high-dimensional and sparse spectral and structural data.
- 2/ Our solution: #ChemEmbed. We combine enhanced MS/MS spectra with continuous vector representations of molecular structures (300-dimensional embeddings aligned with Mol2vec representations). This gives our CNN-based model richer input, improving annotation accuracy. (Feb 11, 2025 12:21)
- 3/ We enhance MS/MS data by: ✅ Merging spectra from multiple collision energies ✅ Incorporating calculated neutral losses. We then train a CNN on a dataset of 38,472 unique compounds from the NIST20, MSDIAL, GNPS, and Agilent METLIN metabolomic libraries.
- 4/ The results: ✅ ChemEmbed ranks the correct metabolite #1 in 42% of cases in a test dataset ✅ Finds the correct compound in the top 5 in 76% of cases ✅ On the external benchmarks CASMI 2016 and CASMI 2022, and on the ARUS dataset (unidentified spectra from human plasma & urine), ChemEmbed outperforms #SIRIUS
- 5/ Want to learn more? 📄 Read our full paper on #bioRxiv: www.biorxiv.org/content/10.1... #Metabolomics #MachineLearning #DeepLearning #MSMS We’d love to hear your thoughts! This is another successful collaboration with @seeslab.bsky.social at @urv.cat
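
For readers who want a feel for the spectrum enhancement in step 3/, here is a minimal Python sketch of merging spectra from several collision energies and computing neutral losses. It is not the ChemEmbed implementation: the function names `merge_spectra` and `neutral_losses`, the 0.01 m/z binning, the keep-the-maximum-intensity rule, and the normalization to the base peak are all assumptions made for illustration. The only part taken from standard practice is the definition of a neutral loss as precursor m/z minus fragment m/z.

```python
from collections import defaultdict


def merge_spectra(spectra, decimals=2):
    """Merge peak lists acquired at different collision energies.

    Assumption: peaks are binned by m/z rounded to `decimals` places and,
    within a bin, the highest intensity is kept. Intensities are then
    scaled so the base peak equals 1.0.
    """
    merged = defaultdict(float)
    for peaks in spectra:
        for mz, intensity in peaks:
            key = round(mz, decimals)
            merged[key] = max(merged[key], intensity)
    top = max(merged.values(), default=0.0)
    if top == 0.0:
        return []
    return sorted((mz, intensity / top) for mz, intensity in merged.items())


def neutral_losses(peaks, precursor_mz, decimals=2):
    """Return (loss, intensity) pairs, where loss = precursor m/z - fragment m/z.

    Only fragments lighter than the precursor produce a loss.
    """
    return sorted(
        (round(precursor_mz - mz, decimals), intensity)
        for mz, intensity in peaks
        if mz < precursor_mz
    )


if __name__ == "__main__":
    # Hypothetical peak lists for one compound at two collision energies.
    low_energy = [(85.03, 40.0), (163.06, 100.0)]
    high_energy = [(85.03, 80.0), (57.03, 20.0)]

    merged = merge_spectra([low_energy, high_energy])
    losses = neutral_losses(merged, precursor_mz=181.07)
    print("merged peaks:", merged)
    print("neutral losses:", losses)
```

In this sketch the merged peaks and the neutral losses together would form the "enhanced" spectrum that is passed to a model. How ChemEmbed actually bins, weights, and encodes them is described in the paper, not in this thread.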