1. Introduction
In April 2025, Massey and Quay reported that seven patients infected with ancestral strains of SARS-CoV-2 had been identified at a hospital in the United States [1]. The molecular clock predicts 14.7–36.1 single-nucleotide variants (SNVs) per case, whereas the number of SNVs actually found in each patient ranged from zero to six. The authors of that report suspected that these cases most likely arose from laboratory-acquired infections.
In five of the seven cases, the D614G mutation was absent from the sequence. D614 is known to be unstable in humans, whereas G614 is competent in human-to-human transmission [2], which makes the extinction of D614 inevitable. Indeed, D614G is known to be the first major mutation observed in the original Wuhan strain [2,3] and the only mutation shared by all the variants of concern (VOCs), so the re-emergence of D614 does not align with the expectations of the natural mutation process.
From a molecular perspective, the structure of the SARS-CoV-2 spike trimer carrying the D614G substitution has been examined by cryo-electron microscopy. These studies indicate that D614G modulates receptor-binding domain positioning and conformational dynamics. In particular, D614G has been associated with increased accessibility of the S1/S2 junction to proteolytic processing, consistent with enhanced furin cleavage efficiency [4]. Analyses further suggest that D614G stabilizes inter-protomer interactions within the trimer, thereby shifting the conformational ensemble toward an ACE2-binding-competent state [5]. Moreover, the G614 spike has been reported to adopt a more open yet stable architecture, consistent with reduced premature S1 dissociation and a higher abundance of functional spikes available for entry [6].
These molecular changes translate into measurable advantages at the level of viral replication and transmission. Zhang et al. showed that the G614 spike exhibits reduced S1 subunit shedding, resulting in an increased number of functional spike proteins per virion and improved efficiency of infectious particle production [7]. Consistently, using both in vitro systems and animal models, Plante et al. demonstrated that D614G enhances viral replication in the upper respiratory tract and promotes inter-host transmission, providing direct experimental evidence that G614 confers a fitness advantage to SARS-CoV-2 [8]. Together, these findings support the view that D614G is an adaptive substitution that optimizes spike processing, stability, and entry competence.
In this study, we survey cases in which the less-fit D614 was found on a wider scale, using publicly available genome data registered to date, to investigate whether any suspicious spread of SARS-CoV-2 samples took place. We use data from the NCBI (National Center for Biotechnology Information) GenBank database, as the majority of its sequences were submitted by the CDC (Centers for Disease Control and Prevention) [9]. Illumina sequencing, which is known for its low error rates, accounts for the majority of CDC submissions by sequencing platform, which may increase data reliability.
2. Materials and Methods
Spike protein (surface glycoprotein) sequences for 22 Variants of Concern (VOCs) were retrieved from the NCBI GenBank database in June 2023 by querying records annotated with the relevant Pango lineages and containing a translated spike protein sequence. No additional filtering was applied at the download stage; subsequent programmatic filtering is described below. The names of the 22 VOCs and the numbers of sequences used for the analyses are listed in Table 1.
The ratios of D614G reverse mutations in the spike protein were calculated for the 22 VOCs. The count of D614 was divided by the number of sequences and by the number of SNVs, respectively, to obtain the two ratios. For the VOCs with a high ratio of D614 reversions, mutations co-occurring with D614 were examined. Among the 22 VOCs, histograms of spike mutations were generated for B.1.1.7, P.1, B.1.617.2, BA.1, BA.2, and XBB.1.5 for comparison. To reduce computational burden, we excluded protein sequences containing deletions or insertions, retaining about 80–98% of sequences for each lineage. All filtering and analyses were implemented using custom Python 3.12 scripts.
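The two ratios described above can be illustrated with a minimal sketch. It is not the authors' script: the function name `reversion_ratios`, the use of a per-lineage reference spike sequence, and the equal-length check as a stand-in for indel exclusion are all assumptions made for illustration, and mutations are counted here as residues that differ from that reference.

```python
def reversion_ratios(sequences, reference, site=614, ancestral="D"):
    """Return (ratio_a, ratio_b) for one lineage.

    ratio_a: D614 count / number of retained sequences
    ratio_b: D614 count / total number of residue differences
    `reference` is an assumed lineage reference spike sequence.
    """
    idx = site - 1  # convert 1-based residue number to 0-based index

    # Assumption: a sequence whose length differs from the reference
    # contains an insertion or deletion and is excluded.
    kept = [s for s in sequences if len(s) == len(reference)]

    # Sequences carrying the ancestral residue at the site of interest.
    reversions = sum(1 for s in kept if s[idx] == ancestral)

    # All residues that differ from the reference; 'X' (unknown) is ignored.
    total_mutations = sum(
        1
        for s in kept
        for res, ref in zip(s, reference)
        if res != ref and res != "X"
    )

    ratio_a = reversions / len(kept) if kept else 0.0
    ratio_b = reversions / total_mutations if total_mutations else 0.0
    return ratio_a, ratio_b
```

A length check cannot detect an insertion and a deletion of equal size in the same sequence, so a production pipeline would more likely rely on an alignment.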
For the VOCs in which many D614 reversions were found, the timing and location of sample collection were analyzed to identify the epicenter of samples containing the reversion of D614G. For this analysis, the entire dataset was searched so that all D614 instances were counted regardless of differences in spike sequence length, by matching the motif “YQDVN” in the amino acid sequences.
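The motif search described above can be sketched as follows. The motif "YQDVN" comes from the text; the counterpart "YQGVN" for G614, the function names, and the "undetermined" label are assumptions added for illustration. Like the approach in the text, the sketch assumes the motif occurs only at residues 612–616 of the spike protein.

```python
from collections import Counter

D614_MOTIF = "YQDVN"  # residues 612-616 with the ancestral D at 614 (from the text)
G614_MOTIF = "YQGVN"  # assumed counterpart carrying the D614G substitution


def classify_614(spike_protein: str) -> str:
    """Classify a spike protein sequence by the residue at position 614.

    Substring matching is independent of sequence length, so sequences
    with insertions or deletions elsewhere can still be classified.
    """
    if D614_MOTIF in spike_protein:
        return "D614"
    if G614_MOTIF in spike_protein:
        return "G614"
    return "undetermined"  # e.g. masked residues or a nearby substitution


def count_614_states(spike_proteins) -> Counter:
    """Tally D614 / G614 / undetermined over an iterable of sequences."""
    return Counter(classify_614(seq) for seq in spike_proteins)
```

For example, `count_614_states(seqs)["D614"]` gives the number of sequences carrying the ancestral residue in a collection `seqs`.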
3. Results
The counts of D614 reversions divided by the total number of sequences (ratio A) and those divided by the number of SNVs (ratio B) are summarized for the 22 VOCs in Figure 1. With regard to ratio A, B.1.617.2 has the highest value. With regard to ratio B, BA.2 has the highest value.
We used Fisher’s exact test to assess whether the proportion of sequences carrying a D614 reversion differed between each target VOC and a pooled set of the remaining VOCs. For each comparison, 2 × 2 contingency tables were constructed using counts of (i) sequences with a D614 reversion and (ii) sequences without a D614 reversion in the target VOC versus the pooled comparator group. Specifically, we compared B.1.617.2 with the other 20 VOCs (excluding B.1.617.2 and BA.2) and BA.2 with the other 20 VOCs (excluding B.1.617.2 and BA.2). The resulting p-values were 1.89 × 10⁻⁵⁶ and 1.91 × 10⁻³⁹, respectively.
As a supplementary analysis, we also constructed contingency tables using counts of (i) D614 reversion events and (ii) all other mutation events for the target VOC and the pooled comparator group. We again compared B.1.617.2 with the other 20 VOCs and BA.2 with the other 20 VOCs. The resulting p-values were 2.34 × 10⁻¹⁷ and 1.58 × 10⁻³⁴, respectively.
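The 2 × 2 comparisons above can be reproduced in outline with a small standard-library implementation of the two-sided Fisher's exact test. This is a sketch rather than the code used in the study: the function names are hypothetical, and the two-sided p-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one, which is one common convention. Probabilities are computed in log space so that very small p-values remain representable.

```python
import math


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: target VOC, pooled comparator.
    Columns: with D614 reversion, without D614 reversion.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def log_prob(x: int) -> float:
        # Hypergeometric log-probability of x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    log_obs = log_prob(a)
    tol = 1e-7  # tolerance for floating-point ties

    # Sum over all tables at most as probable as the observed table.
    p = sum(
        math.exp(lp)
        for x in range(lo, hi + 1)
        if (lp := log_prob(x)) <= log_obs + tol
    )
    return min(p, 1.0)
```

The loop visits every admissible table, which is adequate for a sketch; for routine use, `scipy.stats.fisher_exact` provides the same test.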
The histograms of spike mutations for B.1.1.7, P.1, B.1.617.2, BA.1, BA.2, and XBB.1.5, based on all of the data obtained from GenBank, are shown in Figure 2. In all of these VOCs, residues around the 614th amino acid are stable, with small ratios of missing residues.
Counts of mutations co-occurring with D614 in the B.1.617.2 and BA.2 lineages are shown in Figure 3. Only the amino acid positions at which the co-occurrence frequency exceeds 10% (B.1.617.2) or 20% (BA.2) are shown. In B.1.617.2, almost all major mutations are reversions (the 95th amino acid is the exception) and co-occurring mutations are infrequent overall. In BA.2, co-occurring mutations are more frequent, while the major mutations are again reversions (the 408th amino acid is the exception).
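One way to tabulate such co-occurring mutations is sketched below. It assumes, purely for illustration, that the D614-carrying sequences, a lineage reference, and the ancestral (Wuhan) reference are already aligned to the same coordinates and have equal length; the function name and the returned tuple layout are hypothetical. A co-occurring mutation is counted as a reversion when the residue matches the ancestral reference.

```python
from collections import Counter


def cooccurring_mutations(d614_sequences, lineage_ref, ancestral_ref,
                          threshold=0.10, site=614):
    """List positions mutated in more than `threshold` of D614 sequences.

    Returns tuples (position, frequency, reversion_fraction), where
    reversion_fraction is the share of mutations at that position that
    restore the ancestral residue.
    """
    n = len(d614_sequences)
    mutated = Counter()   # position -> sequences differing from lineage_ref
    reverted = Counter()  # position -> those matching ancestral_ref

    for seq in d614_sequences:
        columns = zip(seq, lineage_ref, ancestral_ref)
        for pos, (res, lin, anc) in enumerate(columns, start=1):
            # Skip the 614 site itself, unchanged residues, and unknowns.
            if pos == site or res == lin or res == "X":
                continue
            mutated[pos] += 1
            if res == anc:
                reverted[pos] += 1

    return [
        (pos, mutated[pos] / n, reverted[pos] / mutated[pos])
        for pos in sorted(mutated)
        if mutated[pos] / n > threshold
    ]
```

With `threshold=0.10` or `threshold=0.20`, the output corresponds in spirit to the 10% and 20% cut-offs mentioned in the text.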
The monthly counts of D614-carrying sequences and of all sequences in the B.1.617.2 and BA.2 lineages are shown in Figure 4A,B. In both B.1.617.2 and BA.2, the surge in D614 counts follows the surge in overall sequence counts. The delay is notably long in B.1.617.2. The collection locations within the United States of all sequences and of the D614G reversions in B.1.617.2 and BA.2 are shown as heatmaps in Figure 4C–F. For B.1.617.2, D614-containing sequences were most frequently collected around Michigan and Illinois, whereas for BA.2 they were most frequently collected in New York and New Jersey. In both cases, these locations differ from the sites where overall samples of each lineage were most frequently collected.
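The monthly and geographic tallies behind such plots can be sketched as below. The record layout is an assumption made for illustration: each record is a dictionary with hypothetical keys `collection_date` (an ISO-style string such as `2021-08-03` or `2021-08`), `state`, and `spike`. Records whose date lacks a month are skipped.

```python
from collections import Counter


def tally_d614(records, motif="YQDVN"):
    """Count all sequences and D614-carrying sequences by month and state."""
    tallies = {
        "month_all": Counter(),
        "month_d614": Counter(),
        "state_all": Counter(),
        "state_d614": Counter(),
    }
    for rec in records:
        date = rec.get("collection_date", "")
        if len(date) < 7:  # year-only or missing date: no month available
            continue
        month = date[:7]   # 'YYYY-MM'
        state = rec.get("state", "unknown")

        tallies["month_all"][month] += 1
        tallies["state_all"][state] += 1

        # D614 is detected by the motif used in the Methods.
        if motif in rec["spike"]:
            tallies["month_d614"][month] += 1
            tallies["state_d614"][state] += 1
    return tallies
```

Comparing `month_d614` with `month_all`, and `state_d614` with `state_all`, gives the kind of timing and location contrasts described above.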
4. Discussion
Widespread sequencing error is an unlikely explanation for the unexpected emergence of D614, given that the CDC, an organization that employs high-accuracy sequencing technologies, accounts for most of the submitted sequences carrying D614. It is also noteworthy that sequence coverage around position 614 is stable for B.1.617.2 and BA.2, with low levels of missing residues, as shown in Figure 2. Although sequencing errors can be more frequent early in the emergence of a new variant due to immature primer design [10], D614 is observed more often at later stages of the variant surge, as shown in Figure 4A,B. Therefore, the emergence of D614 is difficult to attribute to primer-related issues.
Re-emergence of D614 could occur either through a point mutation or through homologous recombination. Substitution trends in SARS-CoV-2 have been studied extensively not only at the spike protein level [11,12,13,14], but also at the nucleotide level [15,16,17], including the emergence of saltational variants such as Omicron BA.1 [18] and BA.2.86 [19]. Re-emergence of D614 via point mutation would require a G-to-A substitution at the corresponding nucleotide position (i.e., a transition), which is relatively common in SARS-CoV-2.
In contrast, re-emergence via recombination would require prior presence of a viral genome carrying D614. Since D614 had almost disappeared before the emergence of Delta and Omicron BA.2, it is difficult to explain elevated re-emergence of D614 through homologous recombination during typical community transmission. In addition, homologous recombination occurs less frequently in SARS-CoV-2 than in many other RNA viruses according to a previous study [20].
As shown in Figure 3, many mutations that co-occur with D614 in Delta and Omicron BA.2 are reversions. This pattern suggests that the D614-carrying genomes do not show obvious accompanying substitutions that might plausibly facilitate the persistence of D614 in these backgrounds. Because D614 is considered unstable in vivo, including in animal models such as hamsters [21], the re-emergence of D614 without apparent accompanying changes is unexpected under strong negative selection.
As shown in Figure 4, D614 reverse mutants were geographically concentrated in Illinois and Michigan for B.1.617.2, and in New York and New Jersey for BA.2. Notably, these locations differ from the sites where overall samples of each lineage were most frequently collected. This geographic pattern, together with the late rise in the frequency of an ancestral, fitness-reducing mutation that was previously thought to be almost extinct, is difficult to reconcile with typical community transmission alone and warrants consideration of alternative explanations for localized reintroduction, such as laboratory-associated events.
Since the start of the COVID-19 pandemic, a large number of laboratories have kept SARS-CoV-2 for experimental purposes. At the end of 2021, a researcher in Taiwan was bitten by a mouse in a biosafety level 3 laboratory and was infected with the Delta variant of SARS-CoV-2, spreading the disease without realizing it [22]. In this case, the incident was confirmed as a lab leak because infections in Taiwan had been suppressed by a strict quarantine policy, which made it easier to identify the researcher as the source of infection. If a lab leak takes place in a city with many infected patients, it is quite likely to remain unnoticed.
Many lab-leak accidents have occurred historically, and their number has been increasing with the recent spread of genetic engineering [23,24]. Unfortunately, in the field of microbiology, such accidents have gone undisclosed or have been reported only after substantial delays [25]. A typical example is the Sverdlovsk anthrax leak in 1979 [26], which took 15 years to be officially accepted as a lab-leak event, while it took about 30 years to reach a consensus among virologists that the 1977 Russian influenza H1N1 originated from a frozen virus in a laboratory [27]. Even recently, a prolonged brucellosis laboratory leak occurred in China, with a delay of nearly six months before formal acknowledgment was received from local authorities. In that case, geospatial analysis of open-source intelligence data identified the outbreak early, along with the likely source and a nearby laboratory implicated in the outbreak [28].
SARS-CoV-2 variants that preceded Omicron are known to have been more virulent than Omicron [29]. If a patient infected with an earlier strain is treated on the assumption that they are infected with a prevailing, less virulent strain, the symptoms can be much more severe than expected. To avoid improper treatment of infected patients, medical staff should be cautious in the vicinity of a laboratory studying infectious viruses.
5. Conclusions
In this study, we systematically examined the occurrence of D614 reversions across 22 SARS-CoV-2 VOCs using publicly available spike protein sequence data. Our analysis revealed that D614 reversions are not randomly distributed among VOCs but are markedly enriched in the Delta and Omicron BA.2 lineages. These enrichments were supported by statistically significant deviations from the distributions observed across other VOCs.
Beyond frequency, D614-carrying genomes in Delta and BA.2 exhibited distinctive features, including reversion-heavy co-mutation patterns with limited mutational diversity and pronounced geographic clustering within specific regions of the United States. Together, these characteristics are difficult to reconcile with spontaneous reverse mutation arising and spreading through broadly mixing community transmission alone.
Instead, the observed patterns are more consistent with localized reintroduction of an older genetic background. While the present study does not establish causal mechanisms, it highlights non-random features in publicly available genomic surveillance data that warrant further investigation. In particular, evaluating whether such patterns could involve laboratory-associated events or other non-community transmission processes will require additional epidemiological, experimental, and institutional data.
More broadly, our findings underscore the importance of continuous and transparent genomic surveillance, as well as careful interpretation of anomalous mutation patterns in large public databases. Systematic identification and scrutiny of such anomalies may contribute to improved biosafety awareness and to the early detection of atypical transmission scenarios in future outbreaks.
Author Contributions
Conceptualization, H.K.; methodology, H.K.; software, H.K.; validation, H.K.; formal analysis, H.K.; investigation, H.K.; resources, H.K.; data curation, H.K.; writing—original draft preparation, H.K.; writing—review and editing, H.K. and Y.M.; visualization, H.K.; supervision, Y.M.; project administration, H.K. and Y.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
This research is based solely on publicly available data and does not involve any human subjects or laboratory animals; therefore, no ethical approval was required to carry out the study.
Data Availability Statement
Acknowledgments
During the preparation of this manuscript, the authors used ChatGPT 5.2, specifically for improving English grammar and clarity. The authors have reviewed and edited the output and take full responsibility for the content of this publication.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| SNV | Single-Nucleotide Variant |
| NCBI | National Center for Biotechnology Information |
| CDC | Centers for Disease Control and Prevention |
| VOC | Variant of Concern |
References
- Massey, S.; Quay, S.C. The Illusion of Biosafety During SARS-CoV-2 Research: Multiple Apparent Occult Lab-Acquired Infections Are Identified Under BSL-3 Conditions at a Premier US-based Coronavirus Laboratory. Zenodo 2025, 15172195.
- Korber, B.; Fischer, W.M.; Gnanakaran, S.; Yoon, H.; Theiler, J.; Abfalterer, W.; Hengartner, N.; Giorgi, E.E.; Bhattacharya, T.; Foley, B.; et al. Tracking changes in SARS-CoV-2 spike: Evidence that D614G increases infectivity of the COVID-19 virus. Cell 2020, 182, 812–827.
- Volz, E.; Hill, V.; McCrone, J.T.; Price, A.; Jorgensen, D.; O’Toole, Á.; Southgate, J.; Johnson, R.; Jackson, B.; Nascimento, F.F.; et al. Evaluating the effects of SARS-CoV-2 spike mutation D614G on transmissibility and pathogenicity. Cell 2021, 184, 64–75.
- Gobeil, S.M.-C.; Janowska, K.; McDowell, S.; Mansouri, K.; Parks, R.; Manne, K.; Stalls, V.; Kopp, M.F.; Henderson, R.; Edwards, R.J.; et al. D614G Mutation Alters SARS-CoV-2 Spike Conformation and Enhances Protease Cleavage at the S1/S2 Junction. Cell Rep. 2021, 34, 108630.
- Yurkovetskiy, L.; Wang, X.; Pascal, K.E.; Tomkins-Tinch, C.; Nyalile, T.; Wang, Y.; Baum, A.; Diehl, W.E.; Dauphin, A.; Carbone, C.; et al. Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant. Cell 2020, 183, 739–751.e8.
- Zhang, J.; Cai, Y.; Xiao, T.; Lu, J.; Peng, H.; Sterling, S.M.; Walsh, R.M., Jr.; Rits-Volloch, S.; Zhu, H.; Woosley, A.N.; et al. Structural Impact on SARS-CoV-2 Spike Protein by D614G Substitution. Science 2021, 372, 525–530.
- Zhang, L.; Jackson, C.B.; Mou, H.; Ojha, A.; Peng, H.; Quinlan, B.D.; Rangarajan, E.S.; Pan, A.; Vanderheiden, A.; Suthar, M.S.; et al. SARS-CoV-2 Spike-Protein D614G Mutation Increases Virion Spike Density and Infectivity. Nat. Commun. 2020, 11, 6013.
- Plante, J.A.; Liu, Y.; Liu, J.; Xia, H.; Johnson, B.A.; Lokugamage, K.G.; Zhang, X.; Muruato, A.E.; Zou, J.; Fontes-Garfias, C.R.; et al. Spike Mutation D614G Alters SARS-CoV-2 Fitness. Nature 2021, 592, 116–121.
- Kakeya, H. Anomalies in regional and chronological distributions of SARS-CoV-2 Omicron BA.1.1 lineage in the United States. medRxiv 2024.
- Martin, D.P.; Lytras, S.; Lucaci, A.G.; Maier, W.; Grüning, B.; Shank, S.D.; Weaver, S.; MacLean, O.A.; Orton, R.J.; Lemey, P.; et al. Selection analysis identifies clusters of unusual mutational changes in Omicron lineage BA.1 that likely impact spike function. Mol. Biol. Evol. 2022, 39, msac061.
- Gan, H.H.; Zinno, J.; Piano, F.; Gunsalus, K.C. Omicron Spike protein has a positive electrostatic surface that promotes ACE2 recognition and antibody escape. Front. Virol. 2022, 2, 894531.
- Cotten, M.; Phan, M.V.T. Evolution of increased positive charge on the SARS-CoV-2 spike protein may be adaptation to human transmission. iScience 2023, 26, 106230.
- Harari, S.; Tahor, M.; Rutsinsky, N.; Meijer, S.; Miller, D.; Henig, O.; Halutz, O.; Levytskyi, K.; Ben-Ami, R.; Adler, A.; et al. Drivers of adaptive evolution during chronic SARS-CoV-2 infections. Nat. Med. 2022, 28, 1501–1508.
- Raglow, Z.; Surie, D.; Chappell, J.D.; Zhu, Y.; Martin, E.T.; Kwon, J.H.; Frosch, A.E.; Mohamed, A.; Gilbert, J.; Bendall, E.E.; et al. SARS-CoV-2 shedding and evolution in patients who were immunocompromised during the omicron period: A multicentre, prospective analysis. Lancet Microbe 2024, 5, e235–e246.
- Simmonds, P. Rampant C→U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses: Causes and consequences for their short- and long-term evolutionary trajectories. mSphere 2020, 5, e00408–e00420.
- Di Giorgio, S.; Martignano, F.; Torcia, M.G.; Mattiuz, G.; Conticello, S.G. Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2. Sci. Adv. 2020, 6, eabb5813.
- Shan, K.J.; Wei, C.; Wang, Y.; Huan, Q.; Qian, W. Host-specific asymmetric accumulation of mutation types reveals that the origin of SARS-CoV-2 is consistent with a natural process. Innovation 2021, 2, 100159.
- Wei, C.; Shan, K.-J.; Wang, W.; Zhang, S.; Huan, Q.; Qian, W. Evidence for a mouse origin of the SARS-CoV-2 Omicron variant. J. Genet. Genom. 2021, 48, 1111–1121.
- Kakeya, H. Anomalous spike mutations and sporadic global detection of BA.2.86. JMA J. 2025, 8, 954–960.
- Akaishi, T.; Fujiwara, K.; Ishii, T. Genetic recombination sites away from indel hotspots in SARS-related coronaviruses. Tohoku J. Exp. Med. 2023, 259, 17–26.
- Hou, Y.J.; Chiba, S.; Halfmann, P.; Ehre, C.; Kuroda, M.; Dinnon, K.H.; Leist, S.R.; Schäfer, A.; Nakajima, N.; Takahashi, K.; et al. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo. Science 2020, 370, 1464–1468.
- Silver, A. Taiwan’s science academy fined for biosafety lapses after lab worker contracts COVID-19. Science 2022.
- Butler, D. Fears grow over lab-bred flu. Nature 2011, 480, 421–422.
- Biosafety in the balance. Nature 2014, 510, 443.
- Young, A. Pandora’s Gamble: Lab Leaks, Pandemics, and a World at Risk; Center Street: New York, NY, USA, 2023; ISBN 978-1546002932.
- Meselson, M.; Guillemin, J.; Hugh-Jones, M. The Sverdlovsk Anthrax Outbreak of 1979. Science 1994, 266, 1202–1208.
- Krasnitz, M.; Levine, A.J.; Rabadan, R. Anomalies in the Influenza Virus Genome Database: New Biology or Laboratory Errors? J. Virol. 2008, 82, 8947–8950.
- Muluneh, A.G.; Lim, S.; Moa, A.; MacIntyre, C.R. Geospatial analysis of open-source intelligence data to early detect laboratory-acquired infections, using the 2019 brucellosis laboratory leak in China as a case study. Infection 2025, 54, 331–338.
- Brüssow, H. COVID-19: Omicron - the latest, the least virulent, but probably not the last variant of concern of SARS-CoV-2. Microb. Biotechnol. 2022, 15, 1927–1939.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.