Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Acquisition
2.2. Metrics and Metric Families
2.3. Pearson Correlation, T-Statistic, and Overlap Score
2.4. Understanding Clusters of Similar Scores
3. Results
3.1. Correlated Metric Groups
3.2. Metric Performance
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Cluster | t-Statistic | Overlap Score | Score Median * |
---|---|---|---|
1 | 482.798 | 0.022 | 0.120 |
2 | −412.002 | 0.026 | 6.63 × 10^10 |
3 | −47.049 | 0.139 | 2.513 × 10^29 |
4 | −151.720 | 0.213 | 106.002 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Degnan, D.J.; Flores, J.E.; Brayfindley, E.R.; Paurus, V.L.; Webb-Robertson, B.-J.M.; Clendinen, C.S.; Bramer, L.M. Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification. Metabolites 2023, 13, 1101. https://doi.org/10.3390/metabo13101101