1. Introduction
Since the first identification of the SARS-CoV-2 virus in Wuhan [
1], China, SARS-CoV-2 has spread throughout the world and caused a pandemic. Nepal has reported cases of COVID-19 since its first case, a student returning from Wuhan [
2]. Nepal, along with India, has suffered two waves of coronavirus infections, one in 2020 and another in 2021. As of September 2021, the government of Nepal has reported over 700,000 cases and over 10,000 deaths attributed to COVID-19 [
3].
Like any virus found in nature, SARS-CoV-2 has undergone various rounds of mutations since infecting humans in 2019. The mutation rate of SARS-CoV-2 has been estimated to be about (1.19–1.31) × 10⁻³/site/year [
4]. The SARS-CoV-2 genome encodes at least 29 proteins. Sixteen are non-structural proteins, which are first translated as a single polyprotein, ORF1ab, and then proteolytically cleaved into the 16 individual proteins. Four structural proteins (S, N, E, and M) are transcribed and translated from other open reading frames. The genome also encodes accessory proteins: NS3a, NS3b, NS6, NS7a, NS7b, NS8, NS9a, NS9b, and NS10 [
5]. Although mutations can occur in the genomic region encoding any of these proteins, mutations in the gene for the S (spike) protein, the viral protein that binds to angiotensin-converting enzyme 2 (ACE2), are particularly important. Mutations in the S protein are known to contribute to higher transmissibility and infectivity of the virus [
6,
7].
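As a rough illustration of what the quoted mutation rate implies, the sketch below converts the per-site rate into an expected number of substitutions per genome per year. The genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference) is an assumption that is not stated in the text, and the function name is purely illustrative.

```python
# Per-site substitution rate range quoted in the text (per site per year).
RATE_LOW = 1.19e-3
RATE_HIGH = 1.31e-3

# Assumption: length of the Wuhan-Hu-1 reference genome in nucleotides.
GENOME_LENGTH = 29_903


def substitutions_per_genome_per_year(rate_per_site: float,
                                      genome_length: int = GENOME_LENGTH) -> float:
    """Expected substitutions accumulated across the whole genome in one year."""
    return rate_per_site * genome_length


low = substitutions_per_genome_per_year(RATE_LOW)
high = substitutions_per_genome_per_year(RATE_HIGH)
print(f"Expected substitutions per genome per year: {low:.1f} to {high:.1f}")
```

Under these assumptions, the quoted rate corresponds to a few dozen substitutions per genome per year.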
One of the first mutations seen in the virus was the D614G mutation in the S protein, which has been hypothesized to lead to higher transmissibility and infectivity of the virus [
8]. Globally this substitution mutation has become dominant among SARS-CoV-2 sequences. Since the pandemic began, next-generation sequencing techniques have been refined to better sequence SARS-CoV-2 strains [
9]. Sequence information is mostly deposited in the GISAID database and analyzed using various software products developed in the last two years [
4,
10]. By September 2021, over 3 million different SARS-CoV-2 sequences had been deposited in the GISAID (
https://www.gisaid.org/) (accessed on 2 September 2021) database. Among these sequences, over 180 belong to samples from Nepal.
Analysis of sequences has informed us about the evolution and spread of variants of SARS-CoV-2. Different variants of SARS-CoV-2 are named according to different nomenclature systems; PANGO, GISAID, and Nextstrain systems are the prominent nomenclature systems [
4,
11]. In this paper we primarily use the Nextstrain nomenclature, which names clades according to the year in which they emerged. The clade names thus start with the numbers 19, 20, or 21. The WHO has classified variants, based on how fast they transmit and how much illness they cause, as variants of concern (VOCs) or variants of interest (VOIs). So far there are four VOCs: alpha, beta, gamma, and delta [
12].
The alpha variant was first discovered in Kent, England, in September 2020. It is also called B.1.1.7 in the PANGO nomenclature and 20I in the Nextstrain nomenclature. Its prominent mutations in the spike protein are the 69–70 deletion, N501Y, and P681H. This variant is believed to be 29% more transmissible than the original Wuhan strain [
13]. It was the first variant to be designated a variant of concern. The beta and gamma variants were designated next. The beta variant was first sequenced in May 2020 in South Africa, and the gamma variant was first discovered in Brazil in November 2020. The beta and gamma variants are thought to be 25% and 38% more transmissible than the original Wuhan strain, respectively [
14]. The beta strain has K417N, E484K, and N501Y mutations in the spike protein, whereas the gamma strain has K417T, E484K, and N501Y mutations. The delta strain, which was first discovered in October 2020 in India, has so far demonstrated the highest transmissibility. While the alpha, beta, and gamma strains share common spike protein mutations, delta has its own spike protein mutation signature: L452R, T478K, and P681R substitutions in the S protein. It is believed to be 97% more transmissible than the Wuhan strain [
13]. As of September 2021, it is the predominant variant globally.
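The transmissibility figures above are all expressed relative to the original Wuhan strain. A minimal sketch of how they can be converted into pairwise comparisons between variants follows; it assumes that the quoted estimates are directly comparable and combine multiplicatively, which the cited studies may not guarantee. The function name is illustrative.

```python
# Transmissibility increase over the original Wuhan strain, as quoted in the text.
INCREASE_OVER_WUHAN = {"alpha": 0.29, "beta": 0.25, "gamma": 0.38, "delta": 0.97}


def relative_advantage(variant: str, baseline: str) -> float:
    """Ratio of two variants' transmissibility, each expressed relative to Wuhan."""
    return (1 + INCREASE_OVER_WUHAN[variant]) / (1 + INCREASE_OVER_WUHAN[baseline])


for other in ("alpha", "beta", "gamma"):
    ratio = relative_advantage("delta", other)
    print(f"delta vs {other}: {ratio:.2f}x")
```

Under this simplification, delta comes out roughly 1.4 to 1.6 times as transmissible as each of the other three variants of concern.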
Several other variants of interest have been identified throughout the world. The one that arose in India, and has also spread to Nepal to some extent, is called the kappa variant. The kappa and delta variants arose from a common ancestor, and both carry L452R and P681R mutations in the spike protein. However, kappa carries an E484Q mutation and not the T478K mutation found in the delta strain. By most measures, the delta strain has outcompeted the kappa strain.
Since 2020, COVID-19 has been one of the biggest stories in Nepal. Much of the country has been under lockdown for major parts of 2020 and 2021. Business, tourism, and education have been some of the hardest-hit sectors. Most of the decisions on lockdown have been based on the numbers of cases and deaths throughout the country. Since the beginning of the pandemic, there has been a recognized need for genomic surveillance of the SARS-CoV-2 strains circulating in Nepal. At the beginning, the country was ill-equipped to sequence the strains, and the sample from the first COVID-19 case was sequenced abroad. The virus from this original patient carried no amino acid substitution mutations [
2]. Later on, the Nepal Health Research Council (NHRC) felt a need to sequence the genome of the virus and sent samples abroad. The agency collected 15 COVID-19 samples and sequenced them after August 2020. The results were published in a report on the NHRC website (
http://nhrc.gov.np/) (accessed on 4 September 2021). Since then, the NHRC, Dhulikhel Hospital, and a number of other agencies have been sequencing genomes of the coronavirus from time to time. Although sequencing at strict time intervals has not been performed, pooling the sequencing results from the various groups gives a picture of the variants that gave rise to the first and second waves of coronavirus infections in Nepal.
In this paper, we describe the different variants that infected the population of Nepal during the pandemic in 2020 and 2021. Using phylogenetic trees, we also attempt to trace the spread of the variants within the country and their introduction from outside the country. We hope that this paper provides a history of the SARS-CoV-2 variants in Nepal in 2020 and 2021 and that it helps policymakers in Nepal plan a robust response to the pandemic.
3. Results
In this paper, we analyze the whole genome sequence data for the SARS-CoV-2 pandemic in Nepal. Before delving into details of the sequences, it would be pertinent to analyze the average new cases and deaths over the course of the two examined years. Nepal suffered from two waves of coronavirus infections, one peaking in October 2020 and another peaking in May 2021 (
Figure 1). In
Figure 1b, the monthly averages of daily new cases and daily deaths are plotted against time. Discernible COVID-19 cases started in May 2020 in Nepal. In June and July 2020, there were over 200 daily cases on average, but very few deaths. In August 2020, cases and deaths increased. In the first wave, the average number of daily cases rose to 3000, and deaths peaked at fewer than 20 a day. Cases peaked in October 2020, whereas deaths peaked in November 2020. Although cases fell to a baseline level in February and March 2021, deaths from earlier infections possibly led to a rise in the death rate during these months. The second wave began in April 2021 as cases picked up and peaked in May, when the average number of cases per day reached 8000. Deaths in May 2021 peaked at around 140 a day. After May 2021, deaths and new cases plummeted but remained significant until September 2021. The ratio of deaths to cases is plotted in
Figure 1a. The death rate (
Figure 1a) rose substantially after the introduction of the delta variant, suggesting a higher death rate for the delta variant.
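The ratio of deaths to cases can be illustrated with the approximate peak values quoted above. The sketch below is a crude calculation, not the procedure used to produce Figure 1a: it ignores the lag between infection and death, and it treats the first-wave figure of "fewer than 20" deaths a day as an upper bound of 20. The function and variable names are illustrative.

```python
def deaths_to_cases_ratio(avg_daily_deaths: float, avg_daily_cases: float) -> float:
    """Crude ratio of deaths to cases; returns NaN when there are no cases."""
    if avg_daily_cases <= 0:
        return float("nan")
    return avg_daily_deaths / avg_daily_cases


# Approximate peak values quoted in the text (first-wave deaths are an upper bound).
peaks = {
    "first wave (Oct-Nov 2020)": {"cases": 3000, "deaths": 20},
    "second wave (May 2021)": {"cases": 8000, "deaths": 140},
}

ratios = {
    wave: deaths_to_cases_ratio(values["deaths"], values["cases"])
    for wave, values in peaks.items()
}
for wave, ratio in ratios.items():
    print(f"{wave}: {ratio:.2%}")
```

With these inputs, the ratio at the second-wave peak is more than twice that at the first-wave peak, in line with the trend described for Figure 1a.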
The next important question regards what variants of SARS-CoV-2 drove the first and second waves of coronavirus infection. This question could be answered well if a fixed number of whole genome sequences were obtained throughout the country at regular intervals of about one month. Unfortunately, such data are not available for Nepal. Although genome sequencing was a top priority from the beginning of the pandemic, lack of infrastructure inside the country led to a paucity of sequencing data. Most of the sequencing information was obtained from facilities abroad at the initiative of the Nepal Health Research Council. From the information available in the GISAID database, we constructed a temporal map of the variants in Nepal. Only one sample was sequenced in January 2020. It belonged to a COVID-19 patient who traveled from Wuhan, China (
Figure 2a). The patient had the 19A strain with no amino acid substitution mutation. Samples collected from August 2020 to January 2021 had 20A and 20B variants. Thereafter, samples collected in 2021 were not from the 19A, 20A, or 20B clades (
Figure 2a). They had been replaced by alpha, kappa, and delta variants. Kappa variants appeared until May 2021. Alpha variants appeared until June 2021. Both these variants were outcompeted by the delta variant, which was the only one that appeared in July of 2021 (
Figure 2a). From this information, it can be hypothesized that all other variants have been outcompeted by the delta variant. Additionally, it can also be gleaned that the first wave was characterized by the 20A and 20B variants, whereas the second wave was characterized by the alpha, kappa, and delta variants.
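A temporal map of this kind amounts to tallying clade assignments by collection month. The sketch below shows one way to do this; the records are hypothetical placeholders rather than the GISAID data analyzed here, and the function names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical (collection_month, clade) records; placeholders, not the GISAID data.
records = [
    ("2020-01", "19A"),
    ("2020-09", "20A"),
    ("2020-11", "20B"),
    ("2021-04", "alpha"),
    ("2021-05", "kappa"),
    ("2021-05", "delta"),
    ("2021-05", "delta"),
    ("2021-07", "delta"),
    ("2021-07", "delta"),
]


def variants_by_month(rows):
    """Count how many sequences of each clade were collected in each month."""
    table = defaultdict(Counter)
    for month, clade in rows:
        table[month][clade] += 1
    return dict(sorted(table.items()))


def dominant_variant(counts):
    """Most frequently sampled clade in one month."""
    return counts.most_common(1)[0][0]


timeline = variants_by_month(records)
for month, counts in timeline.items():
    print(month, dict(counts), "dominant:", dominant_variant(counts))
```

With sparse and irregular sampling, the "dominant" label for a month rests on very few sequences and should be read with caution.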
In the next figure of the paper, we show the numbers of alpha, kappa, delta, 20A, 20B, and 19A variants sampled through sequencing (
Figure 2b). A majority (over 75 percent) of the samples contained the delta variant. The next most abundant variant was the alpha variant. We also present a figure (
Figure 2c) showing the numbers of samples sequenced over time. It is important to note that most of the samples were sequenced in 2021 when various variants of concern started arising globally.
Next, the S protein mutations occurring in the variants within Nepal were analyzed (
Figure 3). As expected, the D614G mutation was found in all the samples except the first sample. The 20A strain found in Nepal had A570S, D936H, K1073N, and S477N mutations in the S protein. These mutations are not known to increase the transmissibility or infectivity of the SARS-CoV-2 virus. The alpha strain found in Nepal had A570D, D1118H, N501Y, S982A, T716I, and P681H mutations. All these mutations are found in the generic alpha variant originating from Kent, England. Additional mutations of interest were not found in Nepali strains in numbers greater than three. The N501Y mutation lies in the receptor-binding motif of the S protein and is thought to increase infectivity and transmissibility.
All delta and kappa variants had L452R and P681R mutations. Besides these mutations, the delta variant had T478K and T19R mutations. These two mutations help increase the infectivity and transmissibility of the virus. The L452R and T478K mutations lie in the receptor-binding motif and can be considered very significant mutations [
16]. The highly infective AY4 strain of the delta variant has not been found in Nepal. Other mutations in the S protein found in only a fraction of the delta variants in Nepal are T95I, D950N, A222V, W258L, K417N, R158G, and V1104L. The K417N mutation also occurs in the beta variant; the delta variant carrying this additional mutation, first found in Nepal, has been dubbed the delta plus variant. So far there is no indication that this strain is more infectious or transmissible than the original delta strain. The kappa strain found in Nepal had G142D and E484Q mutations in the S protein. The significance of all these minor mutations in the S protein is unknown.
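The shared and variant-specific spike substitutions described above can be made explicit with simple set operations. The sketch below uses only the mutations named in the text for the alpha, delta, and kappa sequences from Nepal; it is an illustration, not the analysis pipeline behind Figure 3.

```python
# Spike substitutions reported in the text for variants sequenced in Nepal.
SPIKE_MUTATIONS = {
    "alpha": {"D614G", "A570D", "D1118H", "N501Y", "S982A", "T716I", "P681H"},
    "delta": {"D614G", "L452R", "P681R", "T478K", "T19R"},
    "kappa": {"D614G", "L452R", "P681R", "G142D", "E484Q"},
}

shared = SPIKE_MUTATIONS["delta"] & SPIKE_MUTATIONS["kappa"]
delta_only = SPIKE_MUTATIONS["delta"] - SPIKE_MUTATIONS["kappa"]
kappa_only = SPIKE_MUTATIONS["kappa"] - SPIKE_MUTATIONS["delta"]
common_to_all = set.intersection(*SPIKE_MUTATIONS.values())

print("delta and kappa share:", sorted(shared))
print("delta only:", sorted(delta_only))
print("kappa only:", sorted(kappa_only))
print("common to all three:", sorted(common_to_all))
```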
Besides looking at the amino acid substitution mutations in the S protein, we used the entire genome of the virus to construct the maximum likelihood phylogenetic tree. Sequences from Nepal were included along with global sequences to figure out where in the tree the sequences from Nepal lie. Two figures (
Figure 4 and
Figure 5) are dedicated to showing the phylogenetic tree radially and horizontally. In the radial tree (
Figure 4), sequences from Nepal are marked in red. Various NextStrain clades are shown in colors ranging from various shades of blue to gray. The mutation numbers range from 0 to over 50. Most of Nepal’s sequences lie in the delta clade. Even within the delta clade there are several subclades. Nepal’s sequences do not lie in just one subclade of delta, which would indicate a single introduction from abroad followed by several local transmissions. Instead, the sequences lie in various subclades, indicating several introductions from foreign countries, each followed by local transmission. The same is true of the alpha and other variants. This shows the global nature of COVID-19, in which strains that develop in one part of the world spread very quickly and robustly throughout the rest of the world.
Figure 5 shows our sequences in different colors. It can be seen that mutation numbers range from 0 to over 45. The first sequence from a student from Wuhan shows very few mutations (one mutation) and falls within the 19A clade. Other variants detected from Nepal are alpha, delta, kappa, 20A, and 20B, as initially described.
Next, sample subtrees showing local and global transmission events were constructed (
Supplementary Material File S1). A congregation of strains from one location in the phylogenetic tree demonstrates a local transmission event, whereas the appearance of our strains in different clades represents a global transmission event. In these three sample trees, both local and global transmission events can be detected. Samples were collected from different parts of Nepal. The sizes of the sample groups collected from different regions of Nepal are shown as blobs in
Figure 6. The region of Nepal where most of the samples were collected was the Central Development region, where Kathmandu, the capital city, is located.
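The reasoning about local and global transmission events can be sketched on a toy tree. In the example below, a tree is represented as nested tuples with tip names as strings, and each maximal clade consisting only of tips from Nepal is counted as one putative introduction; clusters with more than one tip suggest subsequent local transmission. This is a simplified illustration and not the method used to build the trees in this paper: the tree, the tip names, and the function are hypothetical, and real inferences depend on tree rooting and on how densely other countries are sampled.

```python
def count_introductions(tree, is_local):
    """Return (all_local, clusters) for a tree of nested tuples with string tips.

    clusters lists the sizes of maximal clades made up only of local tips.
    """
    if isinstance(tree, str):
        local = is_local(tree)
        return local, ([1] if local else [])
    results = [count_introductions(child, is_local) for child in tree]
    if all(local for local, _ in results):
        # Whole clade is local: merge its tips into a single cluster.
        return True, [sum(sum(sizes) for _, sizes in results)]
    # Mixed clade: keep the children's clusters separate.
    return False, [size for _, sizes in results for size in sizes]


def is_from_nepal(tip_name):
    """Hypothetical naming convention: tips are labelled 'Country/ID'."""
    return tip_name.startswith("Nepal/")


# Hypothetical toy tree, not derived from the sequences in this study.
toy_tree = (
    (("Nepal/A", "Nepal/B"), "India/X"),
    ("Nepal/C", ("England/Y", "England/Z")),
)

all_local, clusters = count_introductions(toy_tree, is_from_nepal)
print("putative introductions:", len(clusters))
print("cluster sizes:", clusters)
```

In this toy tree, the two Nepal tips that group together would be read as local transmission after one introduction, while the isolated Nepal tip would be read as a separate introduction.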
To compare the impact of COVID-19 in Nepal versus that in the rest of the world, deaths per million population in Nepal and in countries with the highest and lowest death rates due to COVID-19 in different continents were plotted (
Figure 7). It was found that in all continents, there were countries that fared better and worse than Nepal. This suggests that how countries handled the pandemic had a large bearing on the death rate. In general, death rates in Europe, North America, and South America were higher than in Nepal.
4. Discussion
The SARS-CoV-2 pandemic of 2020 and 2021 left Nepal guessing during its ups and downs. It came out of nowhere in 2020 and upended the normal way of life. During the first wave, deaths in Europe and America were at their peak and Nepal was left scrambling for options and measures to implement. Although rates of deaths and new cases were substantially lower than those in Western countries, Nepal imposed a very strict lockdown during the first wave. After the first wave, it felt as though Nepal had taken extra precautions that were not necessary. Lockdowns were lifted and life returned to normal. Only a few months into normality, cases and deaths began rising in India and, subsequently, in Nepal. The alpha variant was raging, and the delta variant had just been identified. It was not certain what to make of the rising numbers of cases and deaths. Since India was showing a wave bigger than the first wave, Nepal immediately took precautionary measures and declared a second round of lockdown as strict as the first one.
For a clear understanding of what was going on, sequencing at regular intervals would have been very helpful. Nepal could have tracked, over time, which variant was appearing in different parts of the country. Sequencing initiatives also help clarify the transmissibility of a variant, since a more transmissible variant displaces other variants. Very few samples were collected during the early phase of the second wave, and it remains unclear which strain was dominant in which month. To better understand the SARS-CoV-2 pandemic, larger, representative samples must be collected for sequencing. It would have been best to collect a large, fixed number of samples at regular intervals of two weeks or one month. Several other countries have done so and have obtained a good picture of strain prevalence. One such country is Bangladesh [
17]. From Dhaka’s data, it can be seen that the alpha variant was first identified on 6 January 2021. It outcompeted the Wuhan-like strain and preexisting variants. The beta variant was then introduced on 16 March 2021. The beta variant outcompeted the alpha variant and established itself as the major lineage. The delta variant then appeared at the beginning of May and became the dominant variant in May and June. This kind of granular time-lapsed data can help plan public policy.
Although such detailed sequencing data are not available in Nepal, we can surmise that the 19A variant that first appeared in early 2020 was replaced by the 20A and 20B variants. In the early part of 2021, alpha, kappa, and delta strains appeared. Although all three strains appeared, the delta strain outcompeted the other strains and established itself as the predominant strain by July 2021. From a policy point of view, it would be important to target the delta variant.
The death rate per infected person was evaluated for different variants. Several studies have shown that the delta variant is more deadly than other variants of SARS-CoV-2 [
18]. The data from Nepal show a similar result with a higher death rate for the delta variant.
In 2021, Nepal started vaccinating its population over the age of 16 years with vaccines approved by the WHO. Nepal has received donations of, and purchased, vaccines made by AstraZeneca, Sinopharm, and Johnson & Johnson. Other vaccines have been developed by Moderna and Pfizer. The technologies, efficacy, and safety of these vaccines have been discussed in previous studies. The AstraZeneca vaccine, which requires two intramuscular doses for complete protection, has 62–90% efficacy, while the efficacy rates of the Moderna and Pfizer vaccines are 94 and 95%, respectively [
19]. These vaccines were first tested against coronaviruses in 2020. Their efficacy is lower for the delta strain, which is a cause for concern. However, all these vaccines protect against the development of severe disease, hospitalization, and death [
20,
21]. New efficacy data regarding the vaccines are coming in every day. The government of Nepal should closely watch for efficacy data against the delta variant before further rolling out vaccination programs.
From the phylogenetic data, the country from which the virus variants originated can be vaguely discerned. From
supplementary Figure S1 we can make out that most of the transmissions took place from India or other Asian countries. This is a logical conclusion, since most travel to and from Nepal involves India. These data have to be taken with a pinch of salt: although most introductions probably come from India, India sequences relatively few samples, so we might be missing transmission hotspots. We can also observe from
supplementary Figure S1 that there is local transmission as well, where different strains from Nepal congregate. These data are also useful in planning lockdowns, as they hint at the countries for which travel restrictions are needed.
From a global analysis of deaths per million population, most countries in Europe, South America, and North America fared worse than Nepal. The reason for this might be demographic. Nepal has a very young population, which might have been less affected by COVID-19. Alternatively, there might be other underlying causes, such as higher parasitic infection rates, which might have given greater protection to the Nepalese population compared to populations of the West [
20].
The next big question is whether there will be a third wave or a new variant capable of outcompeting the delta variant. Computational modeling can be carried out to identify possible variants that would be more transmissible than the current delta variant. While such modeling may be able to predict the next variant, sequencing work will have to continue to detect the emergence of such variants. Genomic surveillance can give early warning of the next variant and will be useful for rolling out COVID-19-related policies.