Open Access Article

Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network

by Jiali Lv 1, Jian Wei 1, Zhenyu Wang 1,* and Jin Cao 2
1 School of Software and Microelectronics, Peking University, 24 Jinyuan Road, Daxing District, Beijing 102600, China
2 National Institutes for Food and Drug Control, Beijing 100050, China
* Author to whom correspondence should be addressed.
Molecules 2019, 24(24), 4590; https://doi.org/10.3390/molecules24244590
Received: 30 October 2019 / Revised: 9 December 2019 / Accepted: 13 December 2019 / Published: 15 December 2019
(This article belongs to the Section Analytical Chemistry)
Mixture analysis can provide more information than the analysis of individual components, so it is important to detect the different compounds present in real, complex samples. However, mixture spectra are often disturbed by impurities and noise that reduce identification accuracy, and purification and denoising add considerable computation time. In this paper, we propose a model based on a convolutional neural network (CNN) that analyzes the chemical peak information in tandem mass spectrometry (MS/MS) data. Compared with traditional analysis methods, the CNN requires fewer data preprocessing steps. The model extracts features of different compounds and performs multi-label classification of mass spectral data. On MS data of mixtures based on the Human Metabolome Database (HMDB), its accuracy reaches 98%. Of the 600 MS test spectra, 451 were fully detected (true positive), 142 were partially found (false positive), and 7 were falsely predicted (true negative). In comparison, the numbers of true positive test spectra for a support vector machine (SVM) with principal component analysis (PCA), a deep neural network (DNN), long short-term memory (LSTM), and XGBoost are 282, 293, 270, and 402, respectively; the numbers of false positive test spectra are 318, 284, 198, and 168; and the numbers of true negative test spectra are 0, 23, 132, and 30. Compared with models proposed in other literature, the accuracy and overall performance of the CNN improved considerably because its three-channel input architecture separates the independent MS/MS data of the different compounds. In future work, inputting MS data from different instruments and adding more offset MS data should give the CNN model stronger universality.
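The abstract describes the approach only at a high level: binned MS/MS input, a three-channel CNN, multi-label output, and a three-way tally of test outcomes. The NumPy sketch below illustrates those ideas under explicit assumptions and is not the authors' implementation. The m/z range, bin width, layer sizes, the 0.5 decision threshold, the rule that each channel holds one binned MS/MS spectrum of the sample, and all helper names (`bin_spectrum`, `three_channel_input`, `conv1d_relu`, `init_params`, `predict_labels`, `outcome`) are hypothetical. The weights are random, so predictions are meaningless until a model is trained. The `outcome` rule is one possible reading of "fully detected", "partially found", and "falsely predicted". The final check uses the reported counts; for the comparison models, the third counts are taken as 0, 23, 132, and 30, the values that make each model's total equal the 600 test spectra.

```python
import numpy as np

def bin_spectrum(peaks, max_mz=1000.0, bin_width=1.0):
    """Turn a peak list [(m/z, intensity), ...] into a fixed-length vector.
    max_mz and bin_width are hypothetical choices, not values from the paper.
    Peaks outside [0, max_mz) are dropped; the base peak is scaled to 1."""
    n_bins = int(max_mz / bin_width)
    vec = np.zeros(n_bins, dtype=np.float32)
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        if 0 <= idx < n_bins:
            vec[idx] = max(vec[idx], intensity)  # keep the strongest peak per bin
    peak_max = vec.max()
    return vec / peak_max if peak_max > 0 else vec

def three_channel_input(spectra, **kw):
    """Stack three binned spectra into a (3, n_bins) array.
    Assumption: each channel holds one MS/MS spectrum of the sample."""
    if len(spectra) != 3:
        raise ValueError("expected three spectra, one per channel")
    return np.stack([bin_spectrum(s, **kw) for s in spectra])

def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution (cross-correlation, as in CNN layers) plus ReLU.
    x: (C_in, L); kernels: (C_out, C_in, K); bias: (C_out,) -> (C_out, L-K+1)."""
    k = kernels.shape[2]
    length = x.shape[1] - k + 1
    # windows[c, j, i] == x[c, j + i]
    windows = np.stack([x[:, i:i + length] for i in range(k)], axis=-1)
    out = np.einsum("clk,ock->ol", windows, kernels) + bias[:, None]
    return np.maximum(out, 0.0)

def init_params(n_compounds, c_out=8, k=5, seed=0):
    """Random, untrained weights; the sizes are illustrative only."""
    rng = np.random.default_rng(seed)
    return {
        "kernels": rng.normal(0.0, 0.1, size=(c_out, 3, k)),
        "conv_bias": np.zeros(c_out),
        "dense_w": rng.normal(0.0, 0.1, size=(n_compounds, c_out)),
        "dense_b": np.zeros(n_compounds),
    }

def predict_labels(x, params, threshold=0.5):
    """Conv -> ReLU -> global max pool -> dense -> one sigmoid per compound.
    Each compound is thresholded independently, so several can be present."""
    h = conv1d_relu(x, params["kernels"], params["conv_bias"])
    pooled = h.max(axis=1)                                   # (C_out,)
    logits = params["dense_w"] @ pooled + params["dense_b"]  # (n_compounds,)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return set(np.flatnonzero(probs >= threshold).tolist()), probs

def outcome(true_set, pred_set):
    """Hypothetical reading of the abstract's three outcome categories."""
    if pred_set == true_set:
        return "fully detected"
    if pred_set & true_set:
        return "partially found"
    return "falsely predicted"

# (fully detected, partially found, falsely predicted) per model, 600 test spectra.
REPORTED_COUNTS = {
    "CNN": (451, 142, 7),
    "SVM+PCA": (282, 318, 0),
    "DNN": (293, 284, 23),
    "LSTM": (270, 198, 132),
    "XGBoost": (402, 168, 30),
}
for model, counts in REPORTED_COUNTS.items():
    assert sum(counts) == 600, model

if __name__ == "__main__":
    # Made-up peak lists standing in for one sample's three channels.
    spectra = [
        [(77.04, 30.0), (105.03, 100.0), (182.08, 45.0)],
        [(77.04, 80.0), (105.03, 60.0)],
        [(51.02, 20.0), (77.04, 100.0)],
    ]
    x = three_channel_input(spectra)
    predicted, probs = predict_labels(x, init_params(n_compounds=10))
    print(x.shape, probs.shape)
    print(outcome({1, 4}, predicted))
```

Using one sigmoid per output, rather than a softmax over all compounds, is what makes the classifier multi-label in this sketch: each compound's presence is decided independently, so a single mixture spectrum can receive several labels at once.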
Keywords: tandem mass spectra; compounds recognition; multi-label classification; convolutional neural network

MDPI and ACS Style

Lv, J.; Wei, J.; Wang, Z.; Cao, J. Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network. Molecules 2019, 24, 4590.
