New Virus Variant Detection Based on the Optimal Natural Metric
Highlights
- We introduce a new algorithm designed for the automatic detection of emerging virus variants.
- The algorithm was tested on real datasets including SARS-CoV-2 and HIV-1, achieving nearly 100% precision in identification, with Type I and Type II error rates below 1% on every dataset tested.
- Our approach enables the efficient identification of new virus variants based solely on sequence data, eliminating the need for biologists to pinpoint key viral regions.
- Our method pushes the boundaries of alignment-free techniques, expanding their application from classifying within known categories to recognizing new categories.
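The highlights describe detecting new variants from sequence data alone with an alignment-free approach. As a rough illustration only, and not the authors' Algorithm 1 or the optimal natural metric, the sketch below assumes a simplified stand-in: normalized k-mer frequency vectors, Euclidean distance, and a fixed nearest-neighbor distance threshold. The function names, the choice of `k=3`, and the threshold value are all hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_vector(seq, k=3):
    """Normalized k-mer frequency vector over the alphabet ACGT (assumed encoding)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only k-mers made of A, C, G, T are counted; guard against empty input.
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Plain Euclidean distance, used here in place of the paper's metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_distance(query, references, k=3):
    """Distance from the query to its nearest neighbor among known sequences."""
    q = kmer_vector(query, k)
    return min(euclidean(q, kmer_vector(r, k)) for r in references)


def is_new_variant(query, references, threshold, k=3):
    """Flag the query as new when no known sequence lies within the threshold."""
    return nearest_distance(query, references, k) > threshold


if __name__ == "__main__":
    known = ["ACGTACGTACGT", "ACGTACGTACGA"]  # toy sequences, not real genomes
    print(is_new_variant("ACGTACGTACGT", known, threshold=0.5))
    print(is_new_variant("GGGGGGGGGGGG", known, threshold=0.5))
```

In practice the threshold would have to be estimated from the distribution of distances among known sequences rather than fixed by hand; the value above is arbitrary.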
Abstract
1. Introduction
2. Materials and Methods
2.1. Materials
2.2. The Optimal Natural Metric
2.3. New Virus Detection Method
Algorithm 1. New variant detection method
3. Results
3.1. SARS-CoV-2
3.2. HIV-1
3.3. Orthocoronavirinae
3.4. Time Complexity Analysis
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
COVID-19 | Coronavirus Disease 2019
SARS-CoV-2 | Severe Acute Respiratory Syndrome Coronavirus-2
HIV-1 | Human Immunodeficiency Virus-1
WHO | World Health Organization
VOC | Variants of Concern
VOI | Variants of Interest
NCBI | National Center for Biotechnology Information
NN | Nearest Neighbor
Appendix A
k-mer/Order | 0 | 1 | 2 |
---|---|---|---|
1 | |||
2 | |||
3 | |||
4 | |||
5 | |||
6 | |||
7 | |||
8 | |||
9 |
Dataset | Type I Error | Type II Error |
---|---|---|
SARS-CoV-2 | 0.94% | 0.96% |
HIV-1 | 0.94% | 0.87% |
Orthocoronavirinae | 0.03% | 0.98% |
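The table reports Type I and Type II error rates per dataset. The sketch below shows one way such rates could be computed, assuming the null hypothesis is "the sequence belongs to a known variant", so that a Type I error is a known sequence flagged as new and a Type II error is a new sequence that goes unflagged. This reading of the two error types is an assumption, the function name is hypothetical, and the labels in the example are made up for illustration; they are not the paper's data.

```python
def error_rates(is_truly_new, flagged_new):
    """Return (type_i, type_ii) error rates from paired boolean labels.

    is_truly_new: True where the sequence really belongs to a new variant.
    flagged_new:  True where the detector flagged the sequence as new.
    """
    # Detector outputs for sequences that are actually known variants.
    known = [f for t, f in zip(is_truly_new, flagged_new) if not t]
    # Detector outputs for sequences that are actually new variants.
    new = [f for t, f in zip(is_truly_new, flagged_new) if t]
    # Type I: known sequences wrongly flagged as new.
    type_i = sum(known) / len(known) if known else 0.0
    # Type II: new sequences the detector failed to flag.
    type_ii = sum(not f for f in new) / len(new) if new else 0.0
    return type_i, type_ii


if __name__ == "__main__":
    truth = [False, False, False, False, True, True]    # hypothetical labels
    flagged = [False, True, False, False, True, False]  # hypothetical detector output
    print(error_rates(truth, flagged))
```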
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, H.; Yau, S.S.-T. New Virus Variant Detection Based on the Optimal Natural Metric. Genes 2024, 15, 891. https://doi.org/10.3390/genes15070891