Molecular Epidemiology of SARS-CoV-2 in Bangladesh

Mohammad Mahmud, Abu Sayeed; Andersson, Patiyan; Bulach, Dieter; Duchene, Sebastian; da Silva, Anders Goncalves; Lin, Chantel; Seemann, Torsten; Howden, Benjamin P.; Stinear, Timothy P.; Taznin, Tarannum; Habib, Md. Ahashan; Akter, Shahina; Banu, Tanjina Akhtar; Sarkar, Md. Murshed Hasan; Goswami, Barna; Jahan, Iffat; Khan, Md. Salim

doi:10.3390/v17040517

by Abu Sayeed Mohammad Mahmud ¹,†, Patiyan Andersson ²,†, Dieter Bulach ², Sebastian Duchene ³, Anders Goncalves da Silva ², Chantel Lin ², Torsten Seemann ²,⁴, Benjamin P. Howden ²,⁴, Timothy P. Stinear ³,⁴, Tarannum Taznin ⁵, Md. Ahashan Habib ¹, Shahina Akter ¹, Tanjina Akhtar Banu ¹, Md. Murshed Hasan Sarkar ¹, Barna Goswami ¹, Iffat Jahan ¹ and Md. Salim Khan ¹,*

¹ Bangladesh Council of Scientific and Industrial Research, Dr. Qudrat-E-Khuda Road, Dhaka 1205, Bangladesh

² Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia

³ Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia

⁴ Centre for Pathogen Genomics, University of Melbourne, Melbourne, VIC 3000, Australia

⁵ Department of Microbiology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore 7408, Bangladesh

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Viruses 2025, 17(4), 517; https://doi.org/10.3390/v17040517

Submission received: 28 September 2024 / Revised: 11 December 2024 / Accepted: 16 December 2024 / Published: 1 April 2025

(This article belongs to the Section Coronaviruses)

Abstract

Mutation is one of the most important drivers of viral evolution and genome variability, allowing viruses to potentially evade host immune responses and develop drug resistance. In the context of COVID-19, local genomic surveillance of circulating virus populations is therefore critical. The goals of this study were to describe the distribution of different SARS-CoV-2 lineages, assess their genomic differences, and infer virus importation events in Bangladesh. We individually aligned 1965 SARS-CoV-2 genome sequences obtained between April 2020 and June 2021 to the Wuhan-1 sequence and used the resulting multiple sequence alignment as input to infer a maximum likelihood phylogenetic tree. Sequences were assigned to lineages as described by the hierarchical Pangolin nomenclature scheme. We built a phylogeographic model using the virus population genome sequence variation to infer the number of virus importation events. We observed thirty-four lineages and sub-lineages in Bangladesh, with B.1.1.25 and its sub-lineages D.* (979 sequences) dominating, as well as the Beta variant of concern (VOC) B.1.351 and its sub-lineages B.1.351.* (403 sequences). The earliest B.1.1.25/D.* lineages likely resulted from multiple introductions, some of which led to larger outbreak clusters. There were 570 missense mutations, 426 synonymous mutations, 18 frameshift mutations, 7 deletions, 2 insertions, 10 changes at start/stop codons, and 64 mutations in intergenic or untranslated regions. According to phylogeographic modeling, there were 31 importation events into Bangladesh (95% CI: 27–36). Like elsewhere, Bangladesh has experienced distinct waves of dominant lineages during the COVID-19 pandemic; this study focuses on the emergence and displacement of the lineage that dominated the first wave, which contains mutations seen in several VOCs and may have had a transmission advantage over the extant lineages.

Keywords: SARS-CoV-2; mutation; phylogeny; Bangladesh; spike protein; epidemiology

1. Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, was first reported in Wuhan, China, in December 2019 [1]. The outbreak of SARS-CoV-2 was declared a pandemic by the World Health Organization (WHO) on 11 March 2020. The first COVID-19 case in Bangladesh was reported on 8 March 2020; as of 30 June 2021, there were 888,406 confirmed cases (21.6 cases/100,000/week) and 14,172 deaths (case fatality rate of 1.6%) [2]. While all clades of SARS-CoV-2 were introduced to Bangladesh early in the pandemic, the B.1.1.25 lineage, also known as clade 20B, rapidly became the most prevalent [3,4].

Theoretically, SARS-CoV-2 containment depends on restricting the reproduction number, or R₀, to less than 1.0 [5,6]. In practice, this proved difficult. Early on, it was clear that the spread of the virus was not contained by isolating affected individuals [5], owing to the prevalence of asymptomatic or mildly symptomatic disease (both of which can go undiagnosed during surveillance) and the infectiousness of the virus. Another concern was possible pre-symptomatic transmission from infected individuals. Importantly, contact tracing studies have demonstrated that infectiousness peaks just before the onset of symptoms and that pre-symptomatic and moderately symptomatic individuals frequently transmit the virus [7], while completely asymptomatic individuals also transmit the virus [8].

The implications of these epidemiological characteristics for controlling the pandemic were catastrophic for a country like Bangladesh. Throughout the pandemic, the Bangladesh government and public health officials have adopted several strategies to slow local transmission of the virus. These have included lockdowns, limiting personal movement, contact tracing, extensive testing, mandatory use of masks, school shutdowns, limiting international arrivals, use of thermal scanners, and quarantine for international arrivals [9]. However, these strategies have not met with the same success in Bangladesh as they have in other locations [10]. Several factors have contributed to this outcome in Bangladesh, such as high population density (up to ~46,000/km² in the capital, Dhaka). This high population density, coupled with low income, means that access to basic sanitation measures is not guaranteed, and social distancing is often not possible [9].

Lockdowns were used effectively to slow transmission in other countries. This measure could not be used extensively in Bangladesh because of the catastrophic impact lockdowns would have on the finances of low-income families [11]. Finally, additional elements have contributed to the increased transmission and failure to control COVID-19 in Bangladesh, including poor access to medical care and testing facilities [2,9,12], poor communication and coordination among government bodies, and large gatherings during lockdown [9].

Thus, it is important to identify effective responses to COVID-19 in Bangladesh and measure their impact to ensure their efficacy. However, accurate measures of the impact of any public health policy are likely going to be hindered by poor estimates of the actual number of cases and, thus, the basic reproduction number R₀ and other epidemiological parameters of interest [12]. The reasons for the difficulty in obtaining reliable estimates include the previously noted inadequate access to testing centers, which are frequently situated in cities and not in rural regions, and low testing rates; as of 30 June 2021, the WHO reported a testing rate of 105/100,000/week [2].

Despite the challenges, excellent efforts were made to sequence SARS-CoV-2 from as many cases as possible, with 2485 sequences deposited in GISAID as of 30 June 2021. The genomic data enabled researchers to find novel, regularly mutating positions that correlate with clade-defining sites, which can help in monitoring the spread of viruses and their various clades. These data can also be used to obtain independent estimates of R₀, the influence of different public health approaches on transmission rates, and the extent of underreporting [13]. In this study, the whole genomes of 660 SARS-CoV-2 samples collected between April 2020 and June 2021 were sequenced, and 1305 sets of sequencing data from the public domain were analyzed to infer genomic variants, lineages, phylodynamics, and mutational patterns.

2. Materials and Methods

2.1. Setting and Data Sources

The Bangladesh Council of Scientific and Industrial Research (BCSIR) sequenced COVID-19-positive samples from public health screening clinics in the eight administrative divisions of Bangladesh. Over this period, the total number of cases rose from ~10,000 to 760,000 [WHO weekly Coronavirus disease (COVID-19) Bangladesh situation report]. A representative sample set for sequencing was randomly selected to reflect the relative population proportions in each division [2022 census] and is presented in Table 1. The Human Research Ethics Committee at the National Institute of Laboratory Medicine and Referral Center (NILMRC) approved the whole-genome sequencing of SARS-CoV-2.

2.2. Nucleotide Extraction and Sequencing

Viral nucleic acid was extracted from nasopharyngeal specimens using the PureLink™ Viral RNA/DNA Mini Kit (Thermo Fisher Scientific, Waltham, MA, USA). cDNA was generated using random hexamer-primed reverse transcription; specifically, the reaction contained 20 μL of RNA extract, 660 μM dNTPs, 5× RT ImProm-II reaction buffer (Promega, Madison, WI, USA), 50 ng hexanucleotides, 1.5 mM MgCl₂, 20 U RNasin® Plus RNase Inhibitor (Promega), and 1 U of ImProm-II™ Reverse Transcriptase (Promega). SARS-CoV-2-positive specimens were identified using the Novel Coronavirus (2019-nCoV) Nucleic Acid Diagnostic Kit (Sansure Biotech, Hunan, China), which detects the N gene and the ORF1ab gene. Each specimen’s virus load was determined using an RT-qPCR assay targeting a conserved region of the envelope gene. Sequencing-ready libraries were prepared using cDNA from the CoV sample (CoVOC43), the viral pool sample (ViralPool) with Nextera Flex for Enrichment (Illumina, San Diego, CA, USA, Catalog no. 20025524), and IDT for Illumina Nextera DNA UD Indexes (Illumina, Catalog no. 20027213). The total DNA input used for tagmentation was between 10 and 1000 ng, as recommended. After tagmentation and amplification, samples were enriched with the Respiratory Virus Oligos Panel (Illumina, Catalog no. 20042472). After enrichment, the prepared libraries were quantified, pooled, and loaded onto the MiniSeq™ System, producing data sets for each specimen comprising 76-base paired-end reads.

2.3. Bioinformatic Analysis for Generating Sequencing Data

FASTQ data sets were exported from the local run manager to BaseSpace Hub, Illumina. DRAGEN RNA Pathogen Detection V3.5.14 (BaseSpace) generated the consensus genome sequence for each sample, and consensus sequences were checked using the CZID platform [14]. The consensus sequence for each sample was uploaded to Genome Detective Virus Tools [15] to investigate the impact of mutations on both protein-coding regions and the encoded proteins. The consensus sequences were then compared to SARS-CoV-2 genomic data at the China National Center for Bioinformation (CNCB) [16] and Nextstrain.org. All genomic differences are reported with reference to the Wuhan-1 strain of SARS-CoV-2 (GenBank accession: MN908947.3). All sequences (n = 660) were uploaded to GISAID.

2.4. Nucleotide Substitution Analysis and Phylogeny

To complement the 660 sequences generated by BCSIR, we downloaded the remaining 1305 Bangladesh sequences (>28,000 bp) from GISAID for samples collected from 1 April 2020 to 30 June 2021, for a total of 1965 sequences. We aligned each sequence to the Wuhan-1 reference genome sequence using MAFFT (v7.480, 21 June 2021) [17] and then consolidated the results into a single multiple sequence alignment. The multiple sequence alignment was cleaned by replacing all non-ACGT bases with “-” using goalign replace (v0.3.4). We then used goalign clean seqs (v0.3.4) to remove all sequences in which the proportion of gap sites (“-”) exceeded 0.05. Gappy sites and sites with no phylogenetic information were removed using ClipKIT in the kpic-smart-gap mode (v1.1.3 with a patch to work with BioPython v1.79) [18]. The alignment was then compressed by removing duplicate sequences with goalign dedup (v0.3.4) and by removing duplicate columns with goalign compress (v0.3.4). The sequence alignment was then used as input for building a maximum likelihood phylogenetic tree using FastTree (v2.1.10 with double precision [18], using the -fastest and -nosupport options). Multifurcations in the inferred tree were resolved using gotree resolve (v0.4.1). We estimated the branch lengths of the resolved tree from the previous step using the column weights obtained from goalign dedup and the alignment used in FastTree as input into raxml-ng --evaluate (v1.0.2, released on 22 February 2021 [19], with the option --blmin 0.0000000001 and assuming a GTR+G4 substitution model [20]). The final tree was then processed with gotree brlen round to a precision of 10⁻⁶ and gotree collapse length to remove zero-length branches. Duplicate samples (previously removed) were re-inserted in the tree using gotree repopulate, with duplicate labels taken from the deduplication list produced with goalign dedup above. Finally, we used Newick utilities v1.6 [21] to reorder (nw_order) and root the tree on the Wuhan-1 branch (nw_root). Phylogenetic trees were visualized and edited using FigTree ver. 1.4 (http://tree.bio.ed.ac.uk/software/figtree/; accessed on 1 June 2021).
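
The cleaning and de-duplication steps described in this section can be illustrated with a minimal pure-Python sketch. This is not the goalign implementation: the function names (mask_non_acgt, drop_gappy, dedup) and the toy alignment are hypothetical, and only the 0.05 gap-proportion threshold is taken from the text.

```python
from collections import OrderedDict

GAP = "-"


def mask_non_acgt(seq):
    """Replace every character that is not A, C, G or T with a gap."""
    return "".join(c if c in "ACGT" else GAP for c in seq.upper())


def drop_gappy(alignment, max_gap_prop=0.05):
    """Keep sequences whose gap proportion does not exceed max_gap_prop."""
    return {
        name: seq
        for name, seq in alignment.items()
        if seq and seq.count(GAP) / len(seq) <= max_gap_prop
    }


def dedup(alignment):
    """Collapse identical sequences.

    Returns the unique sequences (keyed by the first name seen) and a
    dictionary mapping each representative to the names it replaced.
    """
    groups = OrderedDict()
    for name, seq in alignment.items():
        groups.setdefault(seq, []).append(name)
    unique = {names[0]: seq for seq, names in groups.items()}
    duplicates = {names[0]: names[1:] for names in groups.values()}
    return unique, duplicates


# Hypothetical toy alignment with 20 columns (real genomes are far longer).
toy = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGT",  # identical to s1
    "s3": "ACGTNNNNACGTACGTACGT",  # 4 of 20 sites become gaps after masking
    "s4": "acgtacgtacgtacgtacga",  # lower case, differs at the last site
}

masked = {name: mask_non_acgt(seq) for name, seq in toy.items()}
kept = drop_gappy(masked)
unique, duplicates = dedup(kept)
print(sorted(unique))
print(duplicates)
```

Keeping the duplicate list alongside the reduced alignment is what makes it possible to re-insert the removed samples into the tree afterwards.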

2.5. Lineage Assignment

Sequences were assigned to lineages as described by the hierarchical Pangolin nomenclature scheme [22]. The following versions were used for each subcomponent of the software: pangolin (3.1.11), pangoLEARN (2021-08-21), scorpio (0.3.12), constellations (0.0.15), and designations (1.2.76).
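
As an illustration of how individual lineage calls could be pooled into the groups reported in the Results, the following is a minimal sketch. The prefix rules are an assumption about how sub-lineages and Pango aliases (D.*, Q.*, AY.*) were combined, and the list of calls is hypothetical toy data rather than pangolin output from this study.

```python
from collections import Counter

# Reporting groups named in the Results; the prefix rules are assumed.
GROUPS = [
    ("B.1.1.25/D.*", ("B.1.1.25", "D.")),
    ("Alpha B.1.1.7/Q.*", ("B.1.1.7", "Q.")),
    ("Beta B.1.351.*", ("B.1.351",)),
    ("Gamma P.1.*", ("P.1",)),
    ("Delta B.1.617.2/AY.*", ("B.1.617.2", "AY.")),
    ("B.1.36.*", ("B.1.36",)),
]


def matches(lineage, prefix):
    """True if lineage equals prefix or is a dotted descendant of it."""
    if prefix.endswith("."):
        # Alias prefix such as "D." matches D.2, D.3 and so on.
        return lineage.startswith(prefix)
    return lineage == prefix or lineage.startswith(prefix + ".")


def group_of(lineage):
    """Return the reporting group for a lineage, or "other"."""
    for label, prefixes in GROUPS:
        if any(matches(lineage, p) for p in prefixes):
            return label
    return "other"


# Hypothetical lineage calls, one per genome.
calls = [
    "B.1.1.25", "D.2", "B.1.351", "B.1.351.3", "B.1.1.7",
    "AY.4", "B.1.36.16", "B.1", "B.1.1.315",
]
counts = Counter(group_of(c) for c in calls)
for label, n in counts.most_common():
    print(label, n)
```

Matching on whole dotted components, rather than on raw string prefixes, avoids pooling unrelated lineages such as B.1.36 and B.1.351.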

2.6. Number of Importation Events

To infer the number of importation events, geographic locations (Bangladesh or abroad) were treated as discrete states in a phylogeographic model [23]. This model describes geographic movement along a phylogenetic tree as the result of a Markovian process, where Markov jumps from abroad to Bangladesh correspond to importation events (Figure 1). The model was set up in BEAST 1.10 [24], but due to the size of the data set, a fixed phylogenetic time tree was used instead of a sequence alignment, as in previous studies of SARS-CoV-2 phylogeography [25].
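
The idea of reading importation events off a tree can be sketched in a few lines. This is a deliberately simplified illustration and not the BEAST analysis: the full model estimates the expected number of Markov jumps under a continuous-time Markov chain while integrating over uncertainty in ancestral locations, whereas the sketch below only counts parent-to-child state changes on one tree whose internal nodes already carry a location. The tree, node names, and states are hypothetical.

```python
# Each node maps to (parent, location). The root has parent None.
# Internal-node locations stand in for a single reconstructed history.
tree = {
    "root": (None, "abroad"),
    "n1": ("root", "abroad"),
    "n2": ("root", "Bangladesh"),
    "a": ("n1", "abroad"),
    "b": ("n1", "Bangladesh"),
    "c": ("n2", "Bangladesh"),
    "d": ("n2", "Bangladesh"),
    "e": ("n2", "abroad"),
}


def count_jumps(tree, source, target):
    """Count branches whose parent is in `source` and child is in `target`."""
    jumps = 0
    for node, (parent, state) in tree.items():
        if parent is None:
            continue  # the root has no incoming branch
        parent_state = tree[parent][1]
        if parent_state == source and state == target:
            jumps += 1
    return jumps


imports = count_jumps(tree, "abroad", "Bangladesh")
exports = count_jumps(tree, "Bangladesh", "abroad")
print("importations:", imports)
print("exportations:", exports)
```

Because at most one change is counted per branch and unsampled lineages are invisible, a count of this kind is best read as a lower bound on the true number of introductions.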

3. Results

3.1. Characterization of Samples

We analyzed 660 SARS-CoV-2 genomes from samples collected between April 2020 and June 2021 and sequenced by BCSIR (Table 1; sample metadata are presented in Table S2). The age (median 40.5 years, range 0–95 years) and gender profile (Male, 67%; Female, 33%) of the patients reflect the profile of COVID-19 cases in Bangladesh. Sequenced samples from the Dhaka division (59.1%) are overrepresented in the dataset when compared to the population proportion profile (26%) but are consistent with the proportion of COVID-19 cases (63.9% of cases).

3.2. Phylogenetic Analysis of SARS-CoV-2

We compared the 660 SARS-CoV-2 genomes generated by BCSIR together with 1305 sequences from Bangladesh available in GISAID sequenced by other institutions; an overview of the relationship between the sequences is shown in Figure 2 as a tree inferred from a multiple sequence alignment of the 1965 genome sequences. Thirty-four lineages and sub-lineages were observed, dominated by B.1.1.25 and its sub-lineages D.* (979 sequences) and the Beta variant of concern (VOC) B.1.351 and its sub-lineages B.1.351.* (403 sequences). The Alpha VOC B.1.1.7/Q.* (92 sequences), Gamma VOC P.1.* (1 sequence), Delta VOC B.1.617.2/AY.* (77 sequences), and B.1.36.* lineages (75 sequences) were also observed. The remaining lineages occurred at a frequency of 1 to 108 (median 2 sequences) and included sequences assigned to higher-level Pango lineages such as B, B.1, and B.1.1. The phylogenetic clade containing 1076 B.1.1.25 sequences includes 97 sequences that were either unassigned or assigned to higher-level lineages, most likely due to missing data at lineage-defining sites.

Lineages occurred in waves, with B.1.1.25/D.* lineages observed between April 2020 and March 2021, followed by Alpha lineages from December 2020 to June 2021, partly overlapping with Beta lineages from November 2020 to June 2021, and with Delta lineages appearing in April 2021 (Figure 3 and Figure 4). The earliest sequences that were assigned as B.1.1.25/D.* lineages in GISAID are from five samples collected in the United Kingdom and Germany on 31 March 2020 and 1 April 2020 [26]. However, 45/49 B.1.1.25 sequences from samples collected in April 2020 were submitted from Bangladesh. The clade structure in the phylogenetic analysis indicates that the 979 sequences assigned as B.1.1.25/D.* in the dataset likely result from multiple introductions to Bangladesh, some of which subsequently led to large outbreak clusters. Following the introduction of the B.1.1.25 lineage in late March/early April 2020, it rapidly became dominant (Figure 3). In the dataset analyzed, the proportion of B.1.1.25 rose from 42.5% in April 2020 to 64.5% and 74.1% in May and June 2020 and then remained between 72 and 86% through to January 2021. During this time, there was also a consistent presence of B.1.36.* lineages, constituting between 3 and 12% of the total from May to December 2020. The B.1.1.25 and B.1.36.* lineages were rapidly replaced following the introduction of the Alpha and Beta VOCs in December 2020 (Figure 4).
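
Monthly lineage proportions of the kind quoted in this section can be computed from collection dates and lineage groups as in the following minimal sketch. The function name monthly_proportions and the records are hypothetical; the toy values were chosen for easy checking and do not reproduce the study's figures.

```python
from collections import Counter


def monthly_proportions(records, group):
    """Return {YYYY-MM: proportion of genomes in `group`} in month order.

    records is an iterable of (collection_date, lineage_group) pairs,
    with dates written as YYYY-MM-DD.
    """
    totals = Counter()
    hits = Counter()
    for date, lineage_group in records:
        month = date[:7]  # keep only the YYYY-MM part
        totals[month] += 1
        if lineage_group == group:
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Hypothetical records: (collection date, lineage group).
records = [
    ("2020-04-03", "B.1.1.25"),
    ("2020-04-10", "B.1"),
    ("2020-04-20", "B.1.1.25"),
    ("2020-04-25", "B.1.36"),
    ("2020-05-02", "B.1.1.25"),
    ("2020-05-09", "B.1.1.25"),
    ("2020-05-15", "B.1.1.25"),
    ("2020-05-30", "B.1.36"),
]

props = monthly_proportions(records, "B.1.1.25")
for month, p in props.items():
    print(month, f"{100 * p:.1f}%")
```

Proportions computed this way describe the sequenced sample, so they track true prevalence only to the extent that sequencing was representative in each month.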

3.3. Genomic Variations in SARS-CoV-2

The B.1.1.25/D.* sequences observed in the Bangladeshi dataset all contained the mutations Orf1a: I300F, Orf1b: P314L, S: D614G, N: R203K, and N: G204R. Among the sequences in the B.1.1.25 clade, a further 570 non-synonymous (missense) mutations, 426 synonymous mutations, 18 frameshift mutations, 7 in-frame deletions, 2 in-frame insertions, 10 mutations affecting start/stop codons, and 64 mutations located in intergenic or untranslated regions were observed (Supplementary Table S1). Table S3 shows the most prevalent mutations present in the B.1.1.25 clade and spike mutations present in at least two samples. In addition to the canonical mutations for the lineage, other mutations that have been associated with changes in biological characteristics of the virus and are present in VOCs include S: P681R (n = 197), S: L452R (n = 10), S: E484K (n = 9), and S: L452Q (n = 2). Across the samples in the B.1.1.25 clade, we observed nine deletions (two deletions in non-coding regions and seven resulting in in-frame deletions); most deletions were detected at a low frequency, and some may represent technical errors introduced during the sequencing process. While we are hesitant to over-interpret the significance of these deletions, there was a notable presence in eleven samples of the del: HV69/70 deletion in the spike protein, characteristic of the SARS-CoV-2 lineage in the UK [27].
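
The amino acid substitution labels used here (gene, reference residue, position, alternate residue) lend themselves to a simple programmatic check of whether a genome carries the five lineage-defining mutations. The sketch below is illustrative only: the regular expression, function names, and sample mutation lists are hypothetical, and deletion labels such as del: HV69/70 are intentionally not handled.

```python
import re

# Matches labels such as "S: D614G" or "Orf1b:P314L".
MUTATION_RE = re.compile(r"^\s*(\w+)\s*:\s*([A-Z*])(\d+)([A-Z*])\s*$")

# Defining substitutions for B.1.1.25/D.* as listed in the text.
DEFINING = ["Orf1a: I300F", "Orf1b: P314L", "S: D614G", "N: R203K", "N: G204R"]


def parse_mutation(label):
    """Split a substitution label into (gene, ref, position, alt)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    gene, ref, pos, alt = match.groups()
    return gene, ref, int(pos), alt


def carries_defining(sample_mutations, defining=DEFINING):
    """True if every defining substitution is present in the sample."""
    observed = {parse_mutation(m) for m in sample_mutations}
    required = {parse_mutation(m) for m in defining}
    return required <= observed


# Hypothetical per-sample mutation lists.
sample_a = ["Orf1a: I300F", "Orf1b:P314L", "S: D614G",
            "N: R203K", "N: G204R", "S: P681R"]
sample_b = ["Orf1b: P314L", "S: D614G", "N: R203K", "N: G204R"]

print(parse_mutation("S: D614G"))
print(carries_defining(sample_a))
print(carries_defining(sample_b))
```

Parsing labels into tuples before comparing them makes the check insensitive to spacing differences between annotation tools.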

3.4. Importation Dynamics

The Markov jump analysis revealed a median of 31 importation events into Bangladesh from any other country (95% credible interval: 27–36). Importantly, this may be an underestimate because sampling within Bangladesh and abroad was neither random nor exhaustive at that time. The model also provided an estimate of the importation rate, which had a median of 0.37 importations per lineage per year (95% credible interval: 0.05–1.30), an estimate that carries the same caveats as the number of importation events. These parameters indicate an average of about 45 genomes per importation event (median: 46.90; 95% credible interval: 40.39–53.85).

4. Discussion

Across the world, the COVID-19 pandemic consisted of successive waves of dominating SARS-CoV-2 lineages, including in Bangladesh. During the first wave in Bangladesh, from March to November 2020, the B.1.1.25 lineage was particularly successful. Understanding the genomic characteristics of such successful lineages, together with the epidemiological data, can help with the retrospective evaluation of public health responses.

The B.1.1.25 lineage and its sub-lineages D.2-D.5 are defined by the mutations Orf1a: I300F, Orf1b: P314L, S: D614G, N: R203K, and N: G204R. The Orf1b: P314L and S: D614G mutations are present in all five lineages considered variants of concern (VOC): Alpha, Beta, Gamma, Delta, and Omicron. The two mutations in the nucleocapsid gene, R203K and G204R, are also found in the Alpha, Gamma, and Omicron lineages. The Orf1a: I300F mutation is not seen in any of the VOC or variant of interest (VOI) lineages and, outside the B.1.1.25 lineage, appears to be common only in the B.1.1.315/AD.* lineage. The described effects of these mutations range from increased transmissibility to no apparent effect beyond association with a lineage. The genes encoding the spike and nucleocapsid proteins have an important role in the adaptation and evolution of SARS-CoV-2. In other related viruses, these proteins have a key role in the initial interaction between the virus particle and a susceptible new host [28].

The Orf1a: I300F and Orf1b: P314L mutations have not been associated with a change in biological activity; the former is likely a lineage-specific mutation, and the latter is in strong linkage disequilibrium with the S: D614G mutation [29]. However, enhanced phenotypic characteristics have been described for the S: D614G, N: R203K, and N: G204R mutations. The S: D614G mutation, located within the spike protein close to the receptor binding domain, emerged simultaneously in several geographical areas of the world in March 2020, rapidly became dominant, and is now ubiquitous in current SARS-CoV-2 lineages [30]. It is thought to be associated with a moderate advantage in infectivity and transmissibility [31]. It has been shown that retroviruses pseudotyped with the G614 spike can infect ACE2-expressing cells more effectively than those pseudotyped with the D614 spike [32]. The N: R203K and N: G204R mutations have been associated with increased infectivity of the virus, which is proposed to be mediated through increased effectiveness of RNA packaging. Experiments using viral pseudoparticles showed that the N: R203K mutation was associated with a 10-fold increase in mRNA delivery and expression, and a reverse genetics model resulted in 50-fold increased viral titers [33]. Nucleocapsid protein can induce both cell-mediated and humoral immune responses and has possible utility in vaccine production [34]. The co-occurring mutations N: R203K and N: G204R have also shown increased infectivity in human lung cells and hamsters [35]. While the S: D614G mutation became fixed in the SARS-CoV-2 B-lineage early in the pandemic, the N: R203K and N: G204R mutations appear to have emerged independently multiple times in the history of SARS-CoV-2, including among several VOCs. The convergent evolution around these sites is indicative of a significant advantage of the alternate variant.

Among the B.1.1.25 sequences studied in this work, there were mutations associated with altered biological properties of the virus that occurred at minor frequencies, such as S: P681R, S: L452R, and S: E484K. The S: P681R mutation was found in 95% of samples in the largest sub-clade within the B.1.1.25 clade, suggestive of an associated transmission advantage. The S: P681R mutation has also been observed in the Alpha, Gamma, Delta, and Omicron VOCs. The mutation is located next to the furin-binding pocket and has been shown to enhance the cleavage of the full-length spike protein into its subunits, which is thought to improve viral cell entry [36]. The S: L452R mutation, which was also found in Delta and Omicron, has been associated with enhanced cleavage of the spike protein and increased ability to infect lung tissues of humanized ACE2 mice [37]. The S: E484K mutation has been a source of concern, with changes in the residue associated with immune escape and convergent evolution observed in several lineages, including near-universal presence in Beta and Gamma and, to a lesser extent, in the Alpha and Delta VOCs. Mutations of residue S: 484, located in the receptor binding domain, are thought to increase the human ACE2 binding capacity of the spike protein. In experimental systems, the S: E484K mutation reduces neutralization by convalescent sera, including post-vaccination sera [36]. Trials have also indicated reduced susceptibility to some monoclonal antibodies for lineages carrying S: E484K. While most of these mutations did not have a profound effect in the B.1.1.25 population of Bangladesh, the emergence and subsequent success of S: P681R in 20% of the B.1.1.25 samples shows the essential role that genomic surveillance can play in identifying changes in the epidemiology of the virus and changes in characteristics such as morbidity, mortality, and transmissibility [38].

The B.1.1.25 lineage showed prolonged dominance in Bangladesh in 2020. The initial wave of COVID-19 in Bangladesh was associated with a diversity of lineages, similar to other countries, but the B.1.1.25 lineage effectively outcompeted most other lineages following its introduction. This may indicate that the B.1.1.25 lineage had a transmission advantage compared with contemporary lineages, associated with mutations subsequently seen in several VOCs. However, epidemiological factors may also have contributed to the rapid expansion of this lineage. The closure of the international border on 21 March 2020 meant a reduction in the introduction of new lineages into the country and thus a competitive advantage for lineages already present in the country. Similarly, the rapid dissemination of COVID-19 to regional areas associated with the mass migration events following the declaration of a prolonged National General Holiday and closures of workplaces likely also contributed to the dispersion of B.1.1.25 across the country and to founder effects in regional areas [39]. Interestingly, B.1.1.25 was introduced into Australia via a traveler from Bangladesh, and a breach in quarantine procedures was the cause of the significant second wave, mainly in the state of Victoria [40]. It is possible that an efficient transmission dynamic associated with the B.1.1.25 lineage contributed to the emergence and rapid spread of infections in Australia despite highly stringent public health restrictions and follow-up procedures at the time.

The B.1.1.25 lineage likely originated in Europe and was introduced to Bangladesh via travel. More than 90% of the B.1.1.25 sequences in GISAID from samples collected in April 2020 were submitted from Bangladesh. Although the earliest sequences assigned to this lineage were collected and submitted from the United Kingdom and Germany, given that the earliest date of B.1.1.25 detection in Bangladesh was 31 March 2020, it is possible that this lineage originated in Bangladesh but went unnoticed due to inadequate surveillance. In the phylogenetic analysis of our dataset, there are 40 sequences at the base of the B.1.1.25 clade, making it difficult to distinguish whether the clade arose from single or multiple introductions. The available sequences represent only a small proportion of cases and would be affected by the sampling strategies used for sequencing at the time, making it difficult to conclusively describe the emergence and international dispersal of the B.1.1.25 lineage. However, the importation analysis shows evidence of at least 31 introductions of the B.1.1.25 lineage to the country. Previous analyses of data from Bangladesh have suggested at least two introductory events of the B.1.1.25 lineage, but the phylogeographic analyses presented in this work allow a more detailed understanding of the genomic diversity and its geographic context.

In this analysis, a total of 2585 nucleotide mutation events were observed relative to the reference genome (MN908947.3), and these mutations occurred in 633 different positions in 25 different proteins. This finding suggests that mutational diversity was strikingly high in Bangladesh. However, this is possibly inflated as a result of effective efforts to sequence a diverse set of cases. The virus naturally acquires additional mutations in various areas as time passes. However, no unique lineages have been found in the various divisions of Bangladesh that would support a background of higher mutation.

The emergence of highly transmissible variants of SARS-CoV-2 elsewhere in the world underscores the importance of genomic monitoring. The strategy of obtaining a representative sample of the prevalent virus genomic type circulating in Bangladesh is important for the local management of disease and for understanding the virus’s global evolution during the pandemic. Continued genomic monitoring allows the detection of new variant incursions and, perhaps, the local evolution of variants of concern.

5. Conclusions

We present an overview of the brief history of SARS-CoV-2 in Bangladesh by using a representative sample of patients from throughout the country. We propose that SARS-CoV-2 was introduced into Bangladesh on several occasions. Although many introductions may have been overlooked since most cases have not been sequenced, one introduction event led to the dominant lineage in Bangladesh; this introduction is likely to have been via Europe, given the presence of characteristic mutations in samples from the dominant lineage. It is inevitable that as time progresses, the virus accumulates more independent mutations in different locations. However, at this stage, there is no evidence of significant differences in the dominant lineage in Bangladesh that may be an indicator of a significant local shift in the epidemiology of SARS-CoV-2 in Bangladesh. The sampling of the cases in this study may be sparse, and it is unlikely to have captured all the divergence that has occurred. However, this is a comprehensive and important baseline study that enables us to understand the epidemiology of SARS-CoV-2 in Bangladesh.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v17040517/s1, Table S1: Frequency and characteristics of mutations in the dataset; Table S2: Accession numbers and associated metadata for BCSIR samples; Table S3: Overview of Differences in the 1076 genome sequences classified as B.1.1.25 by clade.

Author Contributions

A.S.M.M. conceived the study. M.S.K., M.A.H., S.A., B.G., M.M.H.S., I.J. and T.A.B. were responsible for data collection and sequencing. A.S.M.M., P.A., D.B., A.G.d.S., T.S., B.P.H., T.P.S., T.T. and C.L. curated and verified the data. A.S.M.M., P.A., S.D., D.B., A.G.d.S. and T.S. were responsible for the formal analysis, visualization, and writing the initial draft. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the internal funding of Genomic Research Lab, Bangladesh Council of Scientific and Industrial Research, Bangladesh, and the Microbiological Diagnostic Unit Public Health Laboratory, Doherty Institute, Australia (Project code: 224125200).

Institutional Review Board Statement

Ethical approval is not required as samples from suspected COVID-19 patients were collected and tested at the NILMRC but not at the BCSIR. However, the whole-genome sequencing of SARS-CoV-2 was approved by the human research ethics committee of the National Institute of Laboratory Medicine and Referral Center (NILMRC/2020/001, Approval date: 14 March 2020).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patient(s) to publish this paper.

Data Availability Statement

The 660 SARS-CoV-2 genomes sequenced by BCSIR are available at GISAID and are listed in Table S2. In addition, 1305 SARS-CoV-2 sequences generated by other institutions in Bangladesh were obtained from GISAID.

Acknowledgments

The authors acknowledge the Bangladesh Council of Scientific and Industrial Research and the University of Melbourne for supporting the study. They also thank the National Institute of Laboratory Medicine and Referral Center (NILMRC); the Illumina channel partner Invent Technologies Limited; the submitters of SARS-CoV-2 sequence data to the GISAID database; and the database managers, developers, and scientists associated with GISAID. Leo Featherstone (University of Melbourne) provided feedback on earlier versions of the manuscript and on the phylogeographic analyses.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Histograms of the posterior distributions of the number of importation events and of the importation rate, as estimated using Markov jumps.

Figure 2. A phylogenetic tree showing the relationships among the SARS-CoV-2 genome sequences from the 1965 Bangladesh samples. Tips are colored by Pango lineage, with sub-lineages collapsed into the parental lineage, e.g., all D.* sub-lineages are counted in B.1.1.25, all AY.* sub-lineages in B.1.617.2, all Q.* sub-lineages in B.1.1.7, etc. Lineages represented by fewer than 15 samples, and samples that were unassigned, were grouped into “Other”. The ring around the circumference of the tree shows the division from which each sample was collected.

Figure 3. Proportional stacked area graph showing lineages over time in Bangladesh. Samples are collated by month and year of collection, with sub-lineages collapsed into the parental lineage, e.g., all D.* sub-lineages are counted in B.1.1.25, all AY.* sub-lineages in B.1.617.2, all Q.* sub-lineages in B.1.1.7, etc. Lineages represented by fewer than 15 samples, and samples that were unassigned, were grouped into “Other”.

Figure 4. Dot plot showing lineages over time by division. The size of each dot is proportional to the number of samples observed for each lineage in each division on each day of collection. Dots are colored by Pango lineage, with sub-lineages collapsed into the parental lineage, e.g., all D.* sub-lineages are counted in B.1.1.25, all AY.* sub-lineages in B.1.617.2, all Q.* sub-lineages in B.1.1.7, etc. Lineages represented by fewer than 15 samples, and samples that were unassigned, were grouped into “Other”.
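
The lineage grouping rule described in the captions of Figures 2–4 can be illustrated with a minimal Python sketch. This is not the authors' code. The alias table below contains only the three examples named in the captions (the captions end with "etc.", so the full mapping is not known here), and the function names and the labels treated as "unassigned" are assumptions made for illustration.

```python
from collections import Counter

# Assumed alias table: only the three examples given in the figure captions.
PARENT_BY_ALIAS = {"D": "B.1.1.25", "AY": "B.1.617.2", "Q": "B.1.1.7"}
MIN_SAMPLES = 15  # threshold stated in the captions


def collapse_lineage(lineage):
    """Return the parental lineage for a Pango label, or None if unassigned."""
    # Treating None, "", "None" and "Unassigned" as unassigned is an assumption.
    if not lineage or lineage in ("None", "Unassigned"):
        return None
    alias, dot, _rest = lineage.partition(".")
    if dot and alias in PARENT_BY_ALIAS:
        return PARENT_BY_ALIAS[alias]
    return lineage


def group_lineages(lineages, min_samples=MIN_SAMPLES):
    """Collapse sub-lineages, then relabel rare or unassigned samples as 'Other'."""
    collapsed = [collapse_lineage(name) for name in lineages]
    counts = Counter(name for name in collapsed if name is not None)
    return [
        name if name is not None and counts[name] >= min_samples else "Other"
        for name in collapsed
    ]
```

Calling `group_lineages` on the per-sample lineage labels returns one plotting label per sample, in the same order. Counting is done after collapsing, so a parental lineage can pass the 15-sample threshold through its sub-lineages combined.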

Table 1. Geographic distribution (by division) of patients from whom virus genome sequences were obtained.

Division	* BCSIR Dataset	** Total Dataset	Population	COVID-19 Cases
Barishal	26 (3.9%)	70 (3.6%)	9,100,102 (5.5%)	2.4%
Chittagong	86 (13.0%)	295 (15.0%)	33,202,326 (20.1%)	13.4%
Dhaka	390 (59.1%)	1098 (55.9%)	44,215,107 (26.8%)	63.9%
Khulna	23 (3.5%)	91 (4.6%)	17,416,645 (10.5%)	6.1%
Mymensingh	14 (2.1%)	22 (1.1%)	12,225,498 (7.4%)	1.8%
Rajshahi	38 (5.8%)	78 (4.0%)	20,353,119 (12.3%)	5.6%
Rangpur	25 (3.8%)	44 (2.2%)	17,610,956 (10.7%)	3.4%
Sylhet	58 (8.8%)	116 (5.9%)	11,034,863 (6.7%)	3.4%
No data		151 (7.7%)
Total	660		165,158,616 ^†	327,349 ^

^† 2022 census; ^ 7 September 2020, WHO weekly Coronavirus disease (COVID-19) Bangladesh situation report. * BCSIR dataset: SARS-CoV-2 genomes sequenced by BCSIR. ** Total dataset: sequencing data gathered from GISAID, including the BCSIR dataset.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
