Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks
Abstract
1. Introduction
2. Results and Discussion
2.1. SARS-CoV-2 Mutation Description
2.2. Recurrent Mutations in SARS-CoV-2
2.3. Prediction of Whether a Given Position in the SARS-CoV-2 Genome Will Be Affected by a Recurrent Mutation
2.4. Global Feature Importance of the Prediction of Whether a Given Position in the SARS-CoV-2 Genome Will Be Affected by a Recurrent Mutation
2.5. Prediction of Whether a Given Mutation Will Be a Recurrent Mutation
2.6. Evaluation of the Models with the Variants of Concern
2.7. Prediction of AA Changes Caused by Recurrent Mutations in the M-Pro and Spike Proteins
3. Materials and Methods
3.1. NDRL Algorithm
3.2. Data Set Composition
3.3. Machine Learning
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
NDRL Threshold (Predicted, 04/2021) | NDRL Threshold (True, 01/2022) | ROC-AUC | Sensitivity | Specificity | Accuracy | FPs in 2021 | FPs in 2021 That Are TPs in 2022 | FP-to-TP Ratio |
---|---|---|---|---|---|---|---|---|
15 | 15 | 0.644 | 0.481 | 0.724 | 0.549 | 3147 | 2119 | 0.673 |
15 | 30 | 0.728 | 0.597 | 0.743 | 0.671 | 3147 | 1402 | 0.446 |
15 | 45 | 0.800 | 0.747 | 0.716 | 0.726 | 3147 | 557 | 0.177 |
15 | 60 | 0.848 | 0.853 | 0.681 | 0.715 | 3147 | 99 | 0.031 |
15 | 75 | 0.873 | 0.910 | 0.655 | 0.691 | 3147 | 14 | 0.004 |
15 | 90 | 0.879 | 0.936 | 0.636 | 0.668 | 3147 | 5 | 0.002 |
15 | 105 | 0.877 | 0.939 | 0.622 | 0.647 | 3147 | 2 | 0.001 |
15 | 120 | 0.880 | 0.949 | 0.612 | 0.634 | 3147 | 2 | 0.001 |
15 | 135 | 0.883 | 0.953 | 0.606 | 0.625 | 3147 | 0 | 0 |
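
The last column of the table above is the quotient of the two columns before it. A minimal Python sketch of that arithmetic, using the counts copied from the table, is given here. The names `counts` and `fp_to_tp_ratio` are illustrative and do not come from the source, and reading the ratio as "share of 2021 false positives that are true positives in 2022" is an interpretation of the column headers.

```python
# Counts copied from the table above, keyed by the true NDRL threshold (01/2022).
# Each value is (false positives in 2021, how many of those are true positives in 2022).
counts = {
    15: (3147, 2119),
    30: (3147, 1402),
    45: (3147, 557),
    60: (3147, 99),
    75: (3147, 14),
    90: (3147, 5),
    105: (3147, 2),
    120: (3147, 2),
    135: (3147, 0),
}


def fp_to_tp_ratio(fps_2021: int, tps_2022: int) -> float:
    """Share of the 2021 false positives that count as true positives in 2022."""
    return tps_2022 / fps_2021


# Round to three decimals, as in the table.
ratios = {thr: round(fp_to_tp_ratio(*pair), 3) for thr, pair in counts.items()}

for thr, ratio in ratios.items():
    print(f"true NDRL threshold {thr:>3}: ratio {ratio:.3f}")
```

Under this reading, the ratio falls as the true threshold rises because fewer of the positions flagged in 2021 reach the stricter NDRL cut-off in the 2022 data.
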

Position Prediction. NDRL 15/45 * (2022)

Variant of Concern | No. of Mutations | No. of Mutations NDRL45 | Accuracy | Sensitivity | Specificity |
---|---|---|---|---|---|
Alpha | 11 | 8 | 0.636 | 0.750 | 0.333 |
Beta | 10 | 8 | 0.600 | 0.625 | 0.500 |
Delta | 9 | 7 | 0.778 | 0.857 | 0.500 |
Gamma | 15 | 11 | 0.800 | 0.818 | 0.750 |
Omicron | 33 | 24 | 0.697 | 0.708 | 0.667 |
Combined | 61 | 49 | 0.697 | 0.776 | 0.667 |

Mutation Prediction. NDRL 15/45 * (2022)

Variant of Concern | No. of Mutations | No. of Mutations NDRL45 | Accuracy | Sensitivity | Specificity |
---|---|---|---|---|---|
Alpha | 11 | 8 | 0.545 | 0.500 | 0.667 |
Beta | 10 | 8 | 0.400 | 0.375 | 0.500 |
Delta | 9 | 7 | 0.333 | 0.286 | 0.500 |
Gamma | 15 | 11 | 0.733 | 0.727 | 0.750 |
Omicron | 33 | 17 | 0.636 | 0.471 | 0.812 |
Combined | 61 | 42 | 0.607 | 0.500 | 0.842 |

Position | VoC * | Gene | Mutation | AA | N i | Countries i | NL i,† | NDRL i (pos.) | NDRL i (mut.) | Prediction 15/45 ‡ (pos.) | Prediction 15/45 ‡ (mut.) |
---|---|---|---|---|---|---|---|---|---|---|---|
3267 | A | Plpro | C3267U | T183I | 903,866 | 164 | 246 | 241 | 238 | tp | tp |
21614 | G | S | C21614U | L18F | 167,687 | 145 | 428 | 399 | 397 | tp | tp |
21762 | O | S | C21762U | A67V | 13,723 | 103 | 244 | 248 | 244 | tp | tp |
23709 | A | S | C23709U | T716I | 904,197 | 167 | 247 | 234 | 234 | tp | tp |
14408 | A,B,D,G,O | RNA pol | C14408U | P323L | 4,577,014 | 193 | 1450 | 1 | 1 | fp | fp |
6515 | O | Plpro | U6515A | L1266I | 61 | 4 | 3 | 15 | 3 | tn | tn |
23403 | A,B,D,G,O | S | A23403G | D614G | 4,589,366 | 193 | 1460 | 1 | 1 | tn | tn |
24424 | O | S | A24424C | Q954H | 5 | 4 | 4 | 30 | 4 | tn | tn |
8393 | O | Plpro | G8393A | A1892T | 722 | 30 | 33 | 43 | 32 | tn | tn |
10449 | O | M-pro | C10449A | P132H | 1064 | 32 | 33 | 173 | 31 | tp | tn |
23599 | O | S | U23599G | N679K | 2425 | 38 | 36 | 138 | 34 | tp | tn |
23854 | O | S | C23854A | N764K | 849 | 27 | 26 | 200 | 24 | tp | tn |
24130 | O | S | C24130A | N856K | 658 | 32 | 32 | 314 | 31 | tp | tn |
21801 | B | S | A21801C | D80A | 25,012 | 108 | 88 | 133 | 84 | fn | fn |
22917 | D | S | U22917G | L452R | 2,844,958 | 171 | 321 | 154 | 137 | fn | fn |
23063 | A,B,G,O | S | A23063U | N501Y | 1,020,863 | 175 | 280 | 243 | 242 | fn | fn |

Gene | Year † | tp | fp | fn | tn | tnp | acc | spec | sens | roc-auc |
---|---|---|---|---|---|---|---|---|---|---|
spike | 2021 | 371 | 1880 | 471 | 24,032 | 113 | 0.912 | 0.927 | 0.441 | 0.684 |
spike | 2022 | 473 | 1778 | 596 | 23,907 | 103 | 0.911 | 0.931 | 0.442 | 0.687 |
M-pro | 2021 | 133 | 492 | 26 | 5775 | 22 | 0.919 | 0.921 | 0.836 | 0.879 |
M-pro | 2022 | 141 | 484 | 41 | 5760 | 22 | 0.918 | 0.922 | 0.775 | 0.849 |
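
The acc, spec, and sens columns of the table above follow from the tp, fp, fn, and tn counts by the standard confusion-matrix definitions. The following Python sketch recomputes them from the counts copied from the table. The function name `confusion_metrics` and the `rows` dictionary are illustrative, not from the source. The sketch also computes the mean of sensitivity and specificity, which is what the ROC-AUC reduces to when only hard 0/1 predictions are scored; the roc-auc column appears numerically consistent with that, but this is an observation about the numbers and not a statement of how the authors computed it.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)                    # sensitivity (recall, true positive rate)
    spec = tn / (tn + fp)                    # specificity (true negative rate)
    acc = (tp + tn) / (tp + fp + fn + tn)    # overall accuracy
    balanced = (sens + spec) / 2             # equals ROC-AUC for hard 0/1 predictions
    return {
        "acc": round(acc, 3),
        "spec": round(spec, 3),
        "sens": round(sens, 3),
        "balanced": round(balanced, 3),
    }


# Counts copied from the table above: (tp, fp, fn, tn) per gene and year.
rows = {
    ("spike", 2021): (371, 1880, 471, 24032),
    ("spike", 2022): (473, 1778, 596, 23907),
    ("M-pro", 2021): (133, 492, 26, 5775),
    ("M-pro", 2022): (141, 484, 41, 5760),
}

results = {key: confusion_metrics(*c) for key, c in rows.items()}

for (gene, year), m in results.items():
    print(gene, year, m)
```

Read this way, the low sensitivity for spike reflects the large number of false negatives relative to true positives, while the high accuracy is dominated by the much larger count of true negatives.
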

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Saldivar-Espinoza, B.; Macip, G.; Garcia-Segura, P.; Mestres-Truyol, J.; Puigbò, P.; Cereto-Massagué, A.; Pujadas, G.; Garcia-Vallve, S. Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks. Int. J. Mol. Sci. 2022, 23, 14683. https://doi.org/10.3390/ijms232314683