Insight into Genetic Characteristics of Identified SARS-CoV-2 Variants in Egypt from March 2020 to May 2021

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) was first detected in Egypt in February 2020. Data about the prevalence rates of the SARS-CoV-2 lineages are relatively scarce. To understand the genetic characteristics of SARS-CoV-2 in Egypt during several waves of the pandemic, we analyzed sequences of 1256 Egyptian SARS-CoV-2 full genomes from March 2020 to May 2021. From one wave to the next, dominant strains have been observed to be replaced by other dominant strains. We detected an emerging lineage of SARS-CoV-2 in Egypt that shares mutations with the variant of concern (VOC). The neutralizing capacity of sera collected from cases infected with C.36.3 against dominant strains detected in Egypt showed a higher cross reactivity of sera with C.36.3 compared to other strains. Using in silico tools, mutations in the spike of SARS-CoV-2 induced a difference in binding affinity to the viral receptor. The C.36 lineage is the most dominant SARS-CoV-2 lineage in Egypt, and the heterotrophic antigenicity of SARS-CoV-2 variants is asymmetric. These results highlight the value of genetic and antigenic analyses of circulating strains in regions where published sequences are limited.


Introduction
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is an enveloped, single-stranded, positive-sense RNA virus from the Coronaviridae family. SARS-CoV-2 is prone to a multitude of recombinations and mutations in its genome that could lead to changes in viral protein structure, binding affinity, viral transmission, diagnosis, vaccine efficacy, and antiviral susceptibility [1][2][3]. The symptoms of infection with SARS-CoV-2 tend to resemble symptoms of other viral and bacterial infections that range in severity from mild respiratory and gastrointestinal illness to Acute Respiratory Distress Syndrome (ARDS).
Coronavirus disease (COVID- 19) was introduced into Africa via Nigeria, Egypt, and South Africa during the first quartile of 2020 [4]. On 14 February 2020, the Egyptian Ministry of Health and Population (MOHP) announced the detection of the first human COVID-19 infection in Egypt in a Chinese tourist [5]. A few days later, the MOHP discovered another source of transmission on board of a Nile cruise ship traveling between Luxor and Aswan where COVID-19 infections were confirmed amongst 33 passengers and 12 staff members [5]. Since this first influx of cases, Egypt has ranked as one of the top five countries reporting COVID-19 cases in Africa [5]. National efforts have been exerted to control the virus spreading through a comprehensive control plan strategy set by the Egyptian government. The control strategy included applying various types of lockdowns, increasing public awareness about COVID-19 and the importance of using community protective equipment and disinfectants, screening for SARS-CoV-2 before entering the country, preparing healthcare facilities and reinforcing infection control practices among healthcare workers, ensuring vaccine availability, and establishment of COVID-19 treatment protocols. Unfortunately, these attempts and efforts to control the spread of virus have not succeeded. As of 4 March 2022, 489,000 Egyptian cases and 24,162 deaths have been reported to WHO from MOHP.
Next-generation sequencing provides high-quality, full-scale genome sequences and constitutes a rapid approach to determine genetic variations and new emerging variants of SARS-CoV-2. Analysis of genome sequences gives insight into the pattern of global distribution, genetic diversity during the pandemic, and the dynamics of newly emerging variants. Monitoring variants is essential for the control strategy plan to ensure the effectiveness of vaccination and treatment protocols. Several public databases such as The Global Initiative on Sharing All Influenza Data (GISAID) (www.gisaid.org accessed on 3 January 2022), the NCBI SARS-CoV-2 database (https://www.ncbi.nlm. nih.gov/sars-cov-2 accessed on 3 January 2022), and the Virus Pathogen Database and Analysis Resource (ViPR) (https://www.viprbrc.org accessed on 3 January 2022) share genomic information of circulating viruses and provide further insight into molecular variations among SARS-CoV-2 genomes. In the same context, several online classification and analysis tools have been developed such as GISAID, Nextstrain SARS-CoV-2, Pangolin (https://cov-lineages.org/resources/pangolin.html accessed on 3 January 2022), Nextclade (https://clades.nextstrain.org/ accessed on 3 January 2022), and others.
Several lineages and sublineages of SARS-CoV-2 have been identified with variable prevalence rates between the North and the South, depending on the source of introduction [4]. The continuous direct or indirect spreading of SARS-CoV-2 from person to person, particularly among those who are fully vaccinated, drives the virus to evolve into new Variants of Interest (VOIs) or Variants of Concern (VOCs), by escaping different hosts and immunological pressures [6] [2,4].
Although Egypt is the largest connecting point between Europe, Asia, and Africa with a remarkable number of intercontinental airlines directly linked to Europe, data about the prevalence rates of the SARS-CoV-2 lineages and sublineages including the VOIs, VOCs, and VUMs are relatively scarce. In this article, we report the distribution of SARS-CoV-2 variants, highlighting the prevalence of certain mutations and their cross reactivity, in addition to hACE2 binding affinity to the most detected strains in Egypt between March 2020 and May 2021. We present data on the genetic characteristics of the evolution of SARS-CoV-2 variants in the first year of pandemic in Egypt, highlighting the characteristics of spike protein mutations. To achieve this, we measured neutralizing antibody titers in sera collected from confirmed cases of SARS-CoV-2 and performed protein-protein docking experiments to illustrate the characteristics of virus variants circulating in Egypt.

Epidemiology of SARS-CoV-2 in Egypt
Three SARS-CoV-2 waves were recorded in Egypt based on the available epidemiological data ( Figure 1A). The first wave began in April 2020 and receded by August 2020. It was associated with the religious and cultural celebrations of the holy month of Ramadan as well as the yearly wedding celebration season. The second wave coincided with the onset of winter weather. The third reported wave followed the mass gatherings associated with the start of the following year's religious and cultural celebrations in 2021.
Pathogens 2022, 11, x FOR PEER REVIEW 3 of 15 tibody titers in sera collected from confirmed cases of SARS-CoV-2 and performed protein-protein docking experiments to illustrate the characteristics of virus variants circulating in Egypt.

Epidemiology of SARS-CoV-2 in Egypt
Three SARS-CoV-2 waves were recorded in Egypt based on the available epidemiological data ( Figure 1A). The first wave began in April 2020 and receded by August 2020. It was associated with the religious and cultural celebrations of the holy month of Ramadan as well as the yearly wedding celebration season. The second wave coincided with the onset of winter weather. The third reported wave followed the mass gatherings associated with the start of the following year's religious and cultural celebrations in 2021.

Clades and Lineage Distribution of SARS-CoV-2
The analysis included 1256 whole-genome sequences of SARS-CoV-2 strains submitted to the GISAID until 31 May 2021 from Egypt. The number of submitted sequences by month during the duration of our study (n = 15 months) is shown in Figure 1B. We noted that the number of submitted sequences was closely correlated with the number of cases shown in the epidemiological curve ( Figure 1A). Full-genome sequences (n = 778) were submitted to the database in 2020 and 478 were submitted (n = 478) by May 2021. Seven

Clades and Lineage Distribution of SARS-CoV-2
The analysis included 1256 whole-genome sequences of SARS-CoV-2 strains submitted to the GISAID until 31 May 2021 from Egypt. The number of submitted sequences by month during the duration of our study (n = 15 months) is shown in Figure 1B. We noted that the number of submitted sequences was closely correlated with the number of cases shown in the epidemiological curve ( Figure 1A). Full-genome sequences (n = 778) were submitted to the database in 2020 and 478 were submitted (n = 478) by May 2021. Seven According to Nextclade analysis, 529 of the identified genomes belonged to Nextstrain clade 20D, 484 belonged to clade 20A, 94 belonged to clade 20B, 59 belonged to clade 19A, 28 belonged to clade 19B, and 3 belonged to clade 20C. The remaining sequences were classified as variants, 29 belonged to clade 20I (Alpha, V1) which was identified in March-May 2021, nine sequences belonged to 21J (Delta), and one strain belonged to clade 21H(Mu) ( Figure 1C).  Figure 1D).

Phylogenetic Reconstruction of SARS-CoV-2 in Egypt
Phylogenetic analysis of the full genome sequences showed that Egyptian strains diversified into several lineages ( Figure 2). Among VOCs, only B.1.1.7 was detected in limited number of sequences. The most dominant is the C.36 that is subsequently subdivided into C.36.1, C.36.2, and C.36.3 ( Figure 2).

Mutational Pattern Analysis
We performed a comprehensive mutational pattern analysis of the SARS-CoV-2 strains from this study in Egypt. In total, we identified mutations in S, N, M, ORF1a, ORF1b, ORF3a, ORf8, and ORF7b genes as compared with the reference strain Wuhan/WIV04/2019. The D614G mutation is an amino acid substitution in the spike protein that was discovered early in the pandemic and has since become the most common SARS-CoV-2 mutation worldwide. In our study, 1016 samples had D614G and 79 samples from GISAID clade L, O, and S did not have this mutation ( Figure 2B). The VOCs (Alpha, Beta, and Gamma) have a common mutation N501Y in spike gene, we show that N501Y mutation has been identified in VOCs Alpha (B.1.1.7) and two strains (B.1 and B.1.1.10 Lineages). The N501T was detected in A.28 (n = 20), B.1.1 (n = 4), and C.36 (n = 3). E484K spike mutation was detected in B.1.44.1 (n = 1) and C.38 (n = 11). Spike protein H69/V70 deletion was observed in 68 and 66 (5.4%, and 5.2%) sequences, respectively. Most deletions of H69/V70 were detected in C.36.3 Lineage (n = 49) (Table 2, Figure 2B).   The genetic diversity of SARS-CoV-2 has increased rapidly, we observed other mutations in other genes of SARS-CoV-2 (Supplementary Materials Table S1). Most mutations were detected in N, ORF1a, and ORF1b and few mutations in M and E genes, indicating that these proteins are highly conserved among these viruses.

Microneutralization Assay of Infected Cases with C.36.3 Lineage
We collected sera samples from seven convalescent patients who were infected with the SARS-CoV-2 C.36 strain and exhibited mild or moderate symptoms from May-June 2021. Using an in vitro neutralization assay with authentic viral isolates of B.  The protein-ligand interaction fingerprints (PLIF) were studied for each selected pose from the three applied docking processes. Regarding the hCoV-19/Egypt/NRC-03/2020 (B.1 lineage)-hACE2 docking process, it was noted that Glu231 and Val604 of the RBD of the hCoV-19/Egypt/NRC-03/2020 (B.1 lineage) S protein (12.4%) were responsible for the interactions. Moreover, Gln24, Asp136, Glu197, Glu224, and Asn601 of the RBD formed the major binding interactions (Supplementary Materials Figure S1A).

Visualization and Description of the Selected Docking Poses
The best pose in each docking process was selected depending on the score, rmsd_refine value, and the binding interactions for further studies. For the hCoV-19/Egypt/NRC-03/2020 (B.1 lineage)-hACE2 docking process, pose number 2 was selected with a binding score of −79.82 kcal/mol and an rmsd_refine value of 0.68. It was obvious that the hCoV-19/Egypt/NRC-03/2020 (B.1 lineage) S protein (represented in turquoise color) fitted with the RBD region of the hACE2 receptor (represented in black color), which explains largely the expected intrinsic binding affinity as represented in Figure 4A The protein-ligand interaction fingerprints (PLIF) were studied for each selected pose from the three applied docking processes. Regarding the hCoV-19/Egypt/NRC-03/2020 (B.1 lineage)-hACE2 docking process, it was noted that Glu231 and Val604 of the RBD of the hCoV-19/Egypt/NRC-03/2020 (B.1 lineage) S protein (12.4%) were responsible for the interactions. Moreover, Gln24, Asp136, Glu197, Glu224, and Asn601 of the RBD formed the major binding interactions (Supplementary Materials Figure S1A).

Visualization and Description of the Selected Docking Poses
The best pose in each docking process was selected depending on the score, rmsd_refine value, and the binding interactions for further studies. For the hCoV-19/Egypt/NRC-03/2020 (B.1 lineage)-hACE2 docking process, pose number 2 was selected with a binding score of −79.82 kcal/mol and an rmsd_refine value of 0.68. It was obvious that the hCoV-19/Egypt/NRC-03/2020 (B.1 lineage) S protein (represented in turquoise color) fitted with the RBD region of the hACE2 receptor (represented in black color), which explains largely the expected intrinsic binding affinity as represented in Figure 4A However, for the hCoV-19/Egypt/NRC-6071/2021 (C.36 lineage)-hACE2 docking process, pose number 3 was selected with a binding score of -96.00 kcal/mol and an rmsd_refine value of 0.31. Notably, the hCoV-19/Egypt/NRC-6071/2021 (C.36 lineage) S protein (represented in purple color) achieved the best binding score with the RBD region of the hACE2 receptor (represented in black color) which explains its great binding affinity, as represented in Figure 4E,F. an rmsd_refine value of 1.66. Moreover, the hCoV-19/Egypt/NRC-369/2021 (C.36.3 lineage) S protein (represented in orange color) bound the RBD region of the hACE2 receptor (represented in black color) which indicates the expected superior binding affinity, as depicted in Figure 4C,D. However, for the hCoV-19/Egypt/NRC-6071/2021 (C.36 lineage)-hACE2 docking process, pose number 3 was selected with a binding score of -96.00 kcal/mol and an rmsd_refine value of 0.31. Notably, the hCoV-19/Egypt/NRC-6071/2021 (C.36 lineage) S protein (represented in purple color) achieved the best binding score with the RBD region of the hACE2 receptor (represented in black color) which explains its great binding affinity, as represented in Figure 4E,F.

Discussion
Since its detection in Egypt in March 2020, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) strains have varied. The number of COVID-19 cases in Egypt has risen steadily over the last two years, with the first recorded case registered on 14 February 2020, according to the World Health Organization (WHO) and Egyptian officials. The

Discussion
Since its detection in Egypt in March 2020, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) strains have varied. The number of COVID-19 cases in Egypt has risen steadily over the last two years, with the first recorded case registered on 14 February 2020, according to the World Health Organization (WHO) and Egyptian officials. The initial wave of the pandemic in Egypt was dominated by lineage B viruses (mainly B.1), while the second and third waves were dominated by C-like lineages (especially C.36).
Genomic epidemiology allows for the detection of new variants as well as a better understanding of how individual virus lineages' mutations contribute to their relative adaptive advantages. This information can then be utilized to inform the development of vaccines and treatments, as well as establish effective pandemic control measures. In this study, we analyzed 1256 SARS-CoV-2 full genome sequences from Egypt. We have identified a novel SARS-CoV-2 cluster in Egypt, descended from the VUM in lineage C.36, L452R with an additional spike protein substitution at amino acid position 677, namely Q677H. The cluster of cases carries the core substitutions of the C.36 lineage, and was first found in the UK, USA, and United Arab Emirates in January 2021 and has risen in frequency to comprise the majority of sequenced cases in Egypt. They also contain mutation L452R from VOI substitutions which is present in B.1.526.1, B.1.427, and B.1.429 lineages, as well as B.1.617 lineage and sublineages. The novel variant strains also contain del69-70 with or without 142-143-144 del, which represents an important mutation in a VOC that has been previously observed in the Alpha variant, raising interest in potential convergent evolution. Independently, both C.36 lineages with L452R substitutions and 69-70 del substitutions have increased in prevalence. The impact of the detected VOC on viral transmission, vaccine effectiveness, and viral pathogenicity requires further investigation.
We examined 1256 SARS-CoV-2 sequences for mutation countryside tracking and observed that the most common subclade of SARS-CoV-2 in Egypt was D614G/Q57H/V5F/ G823S, consistent with the findings of Lamptey et al. [13]. There was only one Egyptian sequence where four mutations occurred at the same time [13]. SARS-CoV-2 sequences examined in the current study are all identical and closely related to Wuhan-Hu-1. A previous study found >99.8 percent identity with the reference isolate (NC 045512) [14]. Based on the spike amino acid sequence, the Pangolin lineage C.36 can be separated into three sublineages. The C.36.3 sublineage, which was discovered in January 2021 and has a similar global distribution with highest prevalence in Egypt, is the focus of this research. With the common spike mutation L452R and deletion 69/70HV, C.36 sublineages reached the pinnacle of their outbreak in May 2021. Sublineage 3 was initially identified in late January 2021, and its prevalence has been slowly increasing since then, while the global incidence has remained modest. It has a distinct geographic distribution from the other C.36 sublineages, being found primarily in Europe and the United States, as well as a greater number of spike variants that are preserved among samples belonging to it.
In this article, we have shed light on the viral dynamics underlying Egypt's COVID-19 outbreak by combining genomic and epidemiological data. We have shown, for instance, that non-VOC lineages can evolve rapidly within a particular country to acquire adaptive mutations required to maintain community transmission and change the prognosis of SARS-CoV-2 infections from mild to severe if they are not properly managed. The collected samples were processed and subjected to nucleic acid extraction using the Chemagic 360 machine (PerkinElmer Inc., TURKU, Finland). A VIASURE SARS-CoV-2 Real-Time PCR Detection Kit was used to identify SARS-CoV-2 RNA (ORF1ab and N gene) (Certest Biotec SL, Zaragoza, Spain). Positive samples were verified for SARS-CoV-2 using a Cobas 6800 system, whereby the RT-PCR runs were performed in duplicate and according to the manufacturer s instructions (Roche Holding AG). The cycle threshold (Ct) of SARS-CoV-2 quantitative real-time PCR (RT-qPCR) in the samples selected for sequencing was less than 30.

SARS-CoV-2 Whole Genome Sequencing
The extracted total RNA was subjected to ribosomal RNA removal using the Ribo-Zero rRNA Removal Kit, followed by 1st and 2nd strand cDNA synthesis using the truSeq whole RNA (Illumina, San Diego, CA, USA) according to the manufacturer's instructions and sequenced on the Miseq sequencer (Illumina). Genomic analysis and strain typing of SARS-CoV-2 variants is one of the sequence analysis techniques available with the truSeq assay. Adequate sequencing requires depth coverage of at least 100 straindistinguishing regions of the open reading frame 1a (ORF1a), S, N, and ORF8 genes. With the exception of two sequences, all samples produced sufficient strain-typeable sequences, and no samples had a mixed population of viruses. The reads were aligned with the reference genome (NC_045512.2) using CLC Genomics Workbench version 20 (CLC Bio, Qiagen, Aarhus, Denmark) through workflow, and whole genome sequences were submitted to the GISAID platform.

Phylogenetic Analysis
To infer the phylogeny of SARS-CoV-2 complete genomes, high-coverage sequences of Egyptian genomes were obtained from GISAID platform. Consensus sequences of whole genomes of SARS-CoV-2 strains and reference sequences were aligned using the MAFFT web server (https://mafft.cbrc.jp/alignment/server 3 January 2022). An inference maximum-likelihood tree was constructed using a sequence alignment file and IQ-TREE 1.6.1. For better visualization, the phylogenetic tree was modified using FigTree v1.4.2 software (http://tree.bio.ed.ac.uk/softw are/figtree 3 January 2022).

Microneutralization Assay
Vero E6 cells were seeded into 96-well plates 24 h before the experiment. Collected sera from vaccinated animals were used as positive controls. Human sera samples were decomplemented at 56 • C for 30 min. Serial 2-fold dilutions of sera starting with a dilution of 1:10 were mixed with equal volumes of 100 TCID50/mL of SARS-CoV-2 viruses. After 1 h of incubation at 37 • C, 35 µL of the virus-serum mixture was applied in duplicate to Vero-E6 cell monolayers in 96-well microtiter plates after washing cells with 1X PBS. After 1 h of adsorption, the inoculums were aspirated. The plates were then incubated for three more days at 37 • C in 5% CO 2 in a humidified incubator. The highest serum dilution that completely protected the cells from Cytopathic Effect (CPE) was recorded as the neutralizing antibody titer.  [20] was used to carry out the docking studies to examine the binding affinities and modes for each of the three SARS-CoV-2 Egyptian strains with the hACE2 receptor extracted from the spike receptor-binding domain (PDB ID: 7DDN). We confirmed inhibitory potentials as well.

Preparation of the Target hACE2
The X-ray structure of the S protein of SARS-CoV-2 (at its open state) was extracted from the Protein Data Bank (PDB code: 7DDN) [21]. The open state of the S protein of SARS-CoV-2 provides the receptor-binding domain (RBD) in its opened orientation which allows the simulation of the possible epitope binding and interactions. Then, the hACE2 receptor was extracted from the SARS-CoV-2 spike receptor-binding domain. It was prepared for docking using the Quickprep protocol by applying the automatic correction step for all the predicted errors in the amino acids' connections and types, followed by the addition of hydrogen atoms in the 3D geometry to their atoms. Additionally, the system atoms were selected to be free during the energy minimization as a final step for preparation.

Protein-Protein Docking Processes
The protein-protein docking protocol was selected to evaluate the possible interactions between each of the three SARS-CoV-2 Egyptian strains, hCoV-19/Egypt/NRC-03/2020 (B.1 lineage), hCoV-19/Egypt/NRC-369/2021 (C.36.3 lineage), and hCoV-19/Egypt/NRC-6071/2021 (C.36 lineage)) against the hACE2 receptor in three successive different docking processes, respectively. The applied methodology is described as follows: the previously prepared hACE2 was selected in the three docking processes to be the receptor, then each prepared SARS-CoV-2 Egyptian strain (hCoV-19/Egypt/NRC-03/2020 (B.1 lineage), hCoV-19/Egypt/NRC-369/2021 (C.36.3 lineage), and hCoV-19/Egypt/NRC-6071/2021 (C.36 lineage), respectively) was inserted in the ligand place in each docking process. Each docking process was initiated separately using a hydrophobic patch potential as described earlier [22,23]. Additionally, the scoring tools were selected as default, 10,000 for the number of pre-placement poses, 1000 for the number of placement poses, and 100 for the number of refinement poses. Finally, after the end of each docking process, the obtained 100 poses were studied to select the most appropriate one having the best score, binding interactions, and rmsd_refine value to be studied further and described in detail.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/pathogens11080834/s1, Figure S1: Maximum likelihood (ML) phylogeny of Egyptian SARS-CoV-2 whole-genome sequences. Figure S2: cumulative of C.36.3 prevalence worldwide. Figure S3: PLIF for the Egyptian strains-hACE2 docking (in population form). Table S1: sequences from GISAID used in this analysis. Table S2: shows the most common amino acid mutations among different lineages occurring in Egypt between March 2020 and May 2021.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
All data used in this study are available on request from the corresponding author.