How Trustworthy Are the Genomic Sequences of SARS-CoV-2 in GenBank?
1. Introduction
1.1. Rationale for Identifying Inauthentic SARS-CoV-2 Genome Sequences in GenBank
1.2. Identifying SARS-CoV-2 Genomes in GenBank with Altered Collection Times
2. Materials and Methods
2.1. Identify Inauthentic Sequences
2.2. Identify Genomes with Altered Collection Times
3. Results
3.1. Inauthentic SARS-CoV-2 Genomes in GenBank
3.2. Changes in Viral Sample Collection Times
3.3. NCBI Is Slow to Correct Annotation Errors
4. Discussion
5. Conclusions
Data Availability Statement
Conflicts of Interest
ACCN (1) | Country | T (2) | (3) | (4) | (5) |
OM094978 | USA | 24 March 2021 | 454 | 25.0880 | 1.2718 × 10^−11 |
OM108445 | India | 1 July 2021 | 553 | 30.5588 | 5.3517 × 10^−14 |
OP022337 | USA | 20 October 2021 | 664 | 36.6926 | 1.1603 × 10^−16 |
OP268178 | Mexico | 19 August 2022 | 967 | 53.4364 | 6.2067 × 10^−24 |
PP434597 | India | 10 April 2023 | 1201 | 66.3673 | 1.5034 × 10^−29 |
PQ008636 | India | 10 December 2023 | 1445 | 79.8507 | 2.0955 × 10^−35 |
PQ008633 | India | 11 January 2024 | 1477 | 81.6190 | 3.5753 × 10^−36 |
PQ008634 | India | 11 January 2024 | 1477 | 81.6190 | 3.5753 × 10^−36 |
PQ008635 | India | 18 January 2024 | 1484 | 82.0058 | 2.4284 × 10^−36 |
ACCN | Country | T | | | |
OM095202 | USA | 8 October 2020 | 287 | 15.8596 | 1.2950 × 10^−07 |
MZ722043 | USA | 25 October 2020 | 304 | 16.7990 | 5.0614 × 10^−08 |
OM095001 | USA | 25 November 2020 | 335 | 18.5121 | 9.1264 × 10^−09 |
OM095004 | USA | 25 November 2020 | 335 | 18.5121 | 9.1264 × 10^−09 |
OM095010 | USA | 25 November 2020 | 335 | 18.5121 | 9.1264 × 10^−09 |
OM095127 | USA | 11 December 2020 | 351 | 19.3963 | 3.7697 × 10^−09 |
MW960278 | Pakistan | 11 December 2020 | 351 | 19.3963 | 3.7697 × 10^−09 |
MZ722192 | USA | 14 December 2020 | 354 | 19.5620 | 3.1938 × 10^−09 |
OP278726 | Pakistan | 17 December 2020 | 357 | 19.7278 | 2.7059 × 10^−09 |
OM095142 | USA | 21 December 2020 | 361 | 19.9489 | 2.1693 × 10^−09 |
MZ722000 | USA | 21 December 2020 | 361 | 19.9489 | 2.1693 × 10^−09 |
MZ722615 | USA | 21 December 2020 | 361 | 19.9489 | 2.1693 × 10^−09 |
MZ722630 | USA | 21 December 2020 | 361 | 19.9489 | 2.1693 × 10^−09 |
MZ722702 | USA | 21 December 2020 | 361 | 19.9489 | 2.1693 × 10^−09 |
OP022336 | USA | 30 December 2020 | 370 | 20.4462 | 1.3193 × 10^−09 |
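The three unnamed numeric columns in the two tables above appear to follow a simple pattern. The first matches the number of days from 26 December 2019 to the collection date T, the second is proportional to the first (about 0.05526 per day), and the third equals exp(−second), which is the Poisson probability of observing zero changes. The Python sketch below reproduces one row under those assumptions. The reference date, the per-day rate, and the function names are inferred from the tabulated numbers for illustration only; they are not stated anywhere in this section.

```python
from datetime import date
import math

# Assumption: day 0 is 26 December 2019, inferred from the tabulated day counts.
REFERENCE_DATE = date(2019, 12, 26)

# Assumption: expected number of changes per day, inferred from one table row
# as (second numeric column) / (first numeric column) = 25.0880 / 454.
RATE_PER_DAY = 25.0880 / 454


def days_since_reference(collection: date) -> int:
    """Days elapsed between the assumed reference date and a collection date."""
    return (collection - REFERENCE_DATE).days


def expected_changes(days: int, rate: float = RATE_PER_DAY) -> float:
    """Expected number of changes accumulated over `days` at a constant rate."""
    return rate * days


def prob_zero_changes(days: int, rate: float = RATE_PER_DAY) -> float:
    """Poisson probability of zero changes: P(0) = exp(-rate * days)."""
    return math.exp(-expected_changes(days, rate))


# Row OM094978 (USA, collected 24 March 2021)
d = days_since_reference(date(2021, 3, 24))
lam = expected_changes(d)
p = prob_zero_changes(d)
print(d, round(lam, 4), f"{p:.4e}")
```

Under this reading, a very small probability in the last column means that a genome with that collection date would be expected to have accumulated many changes by then, so one showing none is implausible.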
ACCN | Country | T1 (1) | T2 (2) | Tree1..Tree2 (3) | T1–T2 |
MW795884 | USA | 13 January 2020 | 13 January 2021 | | −366
OK244698 | USA | 14 January 2020 | 30 December 2021 | | −716
MW585340 | USA | 5 January 2020 | 5 January 2021 | | −366
MZ028629 | USA | 18 February 2020 | 18 February 2021 | 12 July 2021..7 May 2022 | −366 |
MZ436887 | Sierra Leone | 14 January 2020 | 14 January 2021 | 8 November 2021..7 May 2022 | −366 |
MZ436896 | Sierra Leone | 14 January 2020 | 14 January 2021 | 8 November 2021..7 May 2022 | −366 |
MZ469886 | USA | 12 January 2020 | 12 January 2021 | 8 November 2021..7 May 2022 | −366 |
MZ469887 | USA | 6 January 2020 | 6 January 2021 | 8 November 2021..7 May 2022 | −366 |
MZ473469 | USA | 17 February 2020 | 17 February 2021 | 8 November 2021..7 May 2022 | −366 |
MW786995 | USA | 10 March 2020 | 10 March 2021 | 3 April 2021..7 May 2022 | −365 |
MW921831 | USA | 15 March 2020 | 15 March 2021 | 25 April 2021..7 May 2022 | −365 |
MZ021503 | India | 1 March 2020 | 1 March 2021 | 8 November 2021..7 May 2022 | −365 |
MZ021504 | India | 6 March 2020 | 6 March 2021 | 8 November 2021..7 May 2022 | −365 |
MZ021505 | India | 6 March 2020 | 6 March 2021 | 8 November 2021..7 May 2022 | −365 |
MZ021506 | India | 6 March 2020 | 6 March 2021 | 8 November 2021..7 May 2022 | −365 |
MZ278198 | USA | 21 April 2020 | 21 April 2021 | 8 November 2021..7 May 2022 | −365 |
MZ397171 | Myanmar | 28 May 2020 | 28 May 2021 | 8 November 2021..7 May 2022 | −365 |
MZ397172 | Myanmar | 28 May 2020 | 28 May 2021 | 8 November 2021..7 May 2022 | −365 |
MZ397173 | Myanmar | 28 May 2020 | 28 May 2021 | 8 November 2021..7 May 2022 | −365 |
MZ397174 | Myanmar | 28 May 2020 | 28 May 2021 | 8 November 2021..7 May 2022 | −365 |
MZ397175 | Myanmar | 2 June 2020 | 2 June 2021 | 8 November 2021..7 May 2022 | −365 |
MZ397176 | Myanmar | 2 June 2020 | 2 June 2021 | 8 November 2021..7 May 2022 | −365 |
MZ397177 | Myanmar | 26 May 2020 | 26 May 2021 | 8 November 2021..7 May 2022 | −365 |
MW591579 | USA | 18 January 2020 | 17 December 2020 | 25 April 2021..7 May 2022 | −334 |
MW750862 | USA | 22 May 2020 | 2 March 2021 | 3 April 2021..7 May 2022 | −284 |
MW750906 | USA | 23 May 2020 | 14 January 2021 | 3 April 2021..7 May 2022 | −236 |
MW737421 | Iran | 25 October 2019 | 11 January 2020 | 3 April 2021..7 May 2022 | −109 |
MW898809 | Iran | 12 December 2019 | 29 February 2020 | 25 April 2021..7 May 2022 | −79 |
MZ077094 | USA | 14 April 2021 | 20 April 2021 | 12 July 2021..7 May 2022 | −6 |
MW093534 | USA | 6 June 2020 | 11 June 2020 | 3 April 2021..4 September 2021 | −5 |
MW883366 | USA | 29 March 2021 | 22 March 2021 | 25 April 2021..7 May 2022 | 7 |
MW883371 | USA | 27 March 2021 | 16 March 2021 | 25 April 2021..7 May 2022 | 11 |
MW883363 | USA | 29 March 2021 | 11 March 2021 | 25 April 2021..7 May 2022 | 18 |
MW883370 | USA | 27 March 2021 | 8 March 2021 | 25 April 2021..7 May 2022 | 19 |
MW883364 | USA | 29 March 2021 | 21 January 2021 | 25 April 2021..7 May 2022 | 67 |
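The T1–T2 column in the table above is consistent with a plain calendar-day difference between the two dates. A minimal Python sketch, assuming T1 and T2 are two collection dates recorded for the same accession (the function name is hypothetical), shows why a one-year shift appears as −366 for dates before 29 February 2020 and as −365 for later dates: 2020 was a leap year.

```python
from datetime import date


def date_shift_days(t1: date, t2: date) -> int:
    """Return T1 - T2 in days (negative when T2 is later than T1)."""
    return (t1 - t2).days


# Three rows taken from the table above: accession -> (T1, T2)
rows = {
    "MW795884": (date(2020, 1, 13), date(2021, 1, 13)),  # interval spans 29 Feb 2020
    "MW786995": (date(2020, 3, 10), date(2021, 3, 10)),  # interval starts after 29 Feb 2020
    "MW883364": (date(2021, 3, 29), date(2021, 1, 21)),  # T2 earlier than T1
}

for accn, (t1, t2) in rows.items():
    print(accn, date_shift_days(t1, t2))
```

Differences of exactly one calendar year would be consistent with a mistyped year in the collection-date field, whereas the smaller or irregular differences do not fit that pattern.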
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Xia, X. How Trustworthy Are the Genomic Sequences of SARS-CoV-2 in GenBank? Microorganisms 2024, 12, 2187.