SARS-CoV-2 Spike Mutations, L452R, T478K, E484Q and P681R, in the Second Wave of COVID-19 in Maharashtra, India

As the global severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic expands, genomic epidemiology and whole genome sequencing are being used to investigate its transmission and evolution. Against the backdrop of the global emergence of “variants of concern” (VOCs) during December 2020 and an upsurge in a state in the western part of India since January 2021, whole genome sequencing and analysis of spike protein mutations using sequence and structural approaches were undertaken to identify possible new variants and gauge the fitness of the current circulating strains. Phylogenetic analysis revealed that newly identified lineages B.1.617.1 and B.1.617.2 were predominantly circulating. The signature mutations possessed by these strains were L452R, T478K, E484Q, D614G and P681R in the spike protein, including within the receptor-binding domain (RBD). Of these, the mutations at residue positions 452, 484 and 681 have been reported in other globally circulating lineages. The structural analysis of RBD mutations L452R, T478K and E484Q revealed that these may possibly result in increased ACE2 binding while P681R in the furin cleavage site could increase the rate of S1-S2 cleavage, resulting in better transmissibility. The two RBD mutations, L452R and E484Q, indicated decreased binding to select monoclonal antibodies (mAbs) and may affect their neutralization potential. Further in vitro/in vivo studies would help confirm the phenotypic changes of the mutant strains. Overall, the study revealed that the newly emerged variants were responsible for the second wave of COVID-19 in Maharashtra. Lineage B.1.617.2 has been designated as a VOC delta and B.1.617.1 as a variant of interest kappa, and they are being widely reported in the rest of the country as well as globally. Continuous monitoring of these and emerging variants in India is essential.


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) surface spike (S) protein mediates entry into host cells by binding to the host receptor angiotensinconverting enzyme 2 (ACE2) via its receptor-binding domain (RBD). Crystal structures of SARS-CoV-2 S protein or its RBD complexed with ACE2 from different hosts reveal that the RBD contains a core and a receptor-binding motif (RBM) which forms contacts with ACE2 [1]. A number of naturally selected mutations in the RBM have been shown to affect infectivity, human-to-human transmission, pathogenesis and immune escape [2].
As the global SARS-CoV-2 pandemic expands, genomic epidemiology and whole genome sequencing are being used to investigate the transmission and evolution. The expansion of the genome-wide diversity resulted in delineation of the viral strains into clades, lineages and sub-lineages. Clades G/GH/GR/GV/GRY as per the Global Initiative on Sharing All Influenza Data (GISAID) database (https://www.gisaid.org/, Accessed on 14 June 2021) [3] possess a common mutation D614G in the spike protein. The mutation was shown to result in significantly higher human host infectivity and better transmission efficiency for the virus [4,5]. Three recently emerged "variants of concern" (VOCs) of SARS-CoV-2 are GRY (formerly GR/501Y.V1)/B.1.1.7 (alpha), GH/501Y.V2/B.1.351 (beta) and GR/501Y.V3/P.1 (gamma) [6]. These variants are known to possess multiple mutations across the genome, including several in the S protein and its RBD, such as N501Y, E484K and K417N/T [5][6][7][8]. Multiple SARS-CoV-2 variants are now seen to be circulating globally.
The potential introduction and consequence of these emerging variants in the country is vital to support the public health response. The National Influenza Centre at the ICMR-National Institute of Virology, Pune, as an apex laboratory of the Indian Council of Medical Research (ICMR), has been continuously involved in SARS-CoV-2 diagnostics and monitoring the genomic evolution of this virus. In addition, the ICMR-NIV, Pune is also one among the ten national laboratories of the Indian SARS-CoV-2 Consortium on Genomics (INSACOG), Ministry of Health and Family Welfare, Government of India, catering to western India. Against the backdrop of the global emergence of VOCs and an upsurge in Maharashtra, a state in the western part of India, enhanced sequencing was undertaken. This was to identify possible new variants and analyze spike protein mutants, in particular to gauge the fitness of current circulating strains, using bioinformatics sequence and molecular structural approaches.

Materials and Methods
The state of Maharashtra (latitude 19 • 39 N, longitude 75 • 18 E) is located in west central India, in the northwestern part of the Indian subcontinent. Since the end of January 2021, a concentrated spurt in COVID-19 cases was noted in several districts of Maharashtra ( Figure S1). As per the state government directives, samples from international travelers and 5% of surveillance samples from the positive cases with Ct value <25, including those from clusters, long haulers and mild/moderate/severe and deceased cases, were referred to the ICMR-NIV, Pune for whole genome sequencing. Nasopharyngeal swabs (n = 2987, 25 November 2020-30 May 2021) were processed for whole genome sequencing.
Briefly, nucleic acid extraction was performed using 280 µL of each sample in duplicate by a Qiagen viral RNA extraction protocol and quantified RNA was further processed for template preparation using the Ion Chef System. Purified template beads were submitted to meta transcriptome next-generation sequencing (NGS) in the Ion S5 platform (ThermoFisher Scientific, Waltham, MA, USA) using an Ion 540™ chip and the Ion Total RNA-Seq kit v2.0, as per the manufacturer's protocol (ThermoFisher Scientific, Waltham, MA, USA) [9]. Sequence data were processed using Torrent Suite Software (TSS) v5.10.1 (ThermoFisher Scientific, Waltham, MA, USA). Coverage analysis plugins were utilized to generate coverage analysis report for each of the samples. Reference-based reads gathering and assembly were performed for all the samples using Iterative Refinement Meta-Assembler (IRMA) [10]. A subset of the samples was also sequenced on an Illumina machine and analyzed using CLC Genomics Workbench version 20 (CLC, Qiagen, Hilden, Germany) as described elsewhere [11]. Of 2204 whole genomes obtained, 1791 were considered for different analyses based on coverage of ≥93% of the genome. Both eastern (n = 829) and western (n = 962) districts of the state were well represented. For each whole genome sequence, the GISAID clade assignment was done using CoVsurver: Mutation Analysis of hCoV-19 (https://www.gisaid.org/epiflu-applications/covsurver-mutations-app/, Accessed on 1 June 2021). For lineage assignment, a web application, i.e., Phylogenetic Assignment of Named Global Outbreak LINeages (PangoLIN) COVID-19 Lineage Assigner (https://pangolin.cog-uk.io/, Accessed on 1 June 2021), was implemented [8]. The 1791 genomes were aligned using MAFFT v.7.450 [12] and a neighbor-joining phylogenetic tree was constructed using MEGA V.6 [13], employing the composite likelihood as the substitution model and 1000 bootstrap replications. The gene-wise top mutations in the predominant lineages were depicted using heat maps based on the calculations of the mutational frequencies.
For further structural characterization of the S protein mutations, the crystal structure of SARS-CoV-2 S glycoprotein complexed with ACE2 was obtained from the protein data bank (PDB ID: 7A98 [14]). The top mutations in the S protein based on the heatmap were mapped using Biovia Discovery studio visualizer 2020. To assess the effect of noted RBD mutations on ACE2 binding, the crystal structure of the SARS-CoV-2 spike RBD domain complexed with ACE2 (PDB ID: 6LZG [15]) was used. For assessment of the noted mutations on binding to neutralizing antibodies, the SARS-CoV-2 spike RBD domain complexed with two selected mAbs REGN10933/P2B-2F6 were retrieved (PDB ID: 6XDG; resolution 3.90A and 7BWJ; resolution 2.65 Å, respectively) [16,17]. Point mutations were carried out using Biovia Discovery studio visualizer 2020 and the structures of the complexes were subjected to energy minimization using the macro model tool in Schrodinger 2020 using default parameters. The molecular interactions between the RBD-ACE2 interface, within the RBD and between the neutralizing mAbs-RBD were analyzed using a non-bonded interactions tool in Biovia Discovery studio visualizer 2020. The structure of the S protein of Wuhan-Hu-1 (Genbank ID: YP_009724390.1) inclusive of the pre-cleavage motif PRRAR was modeled based on homology modeling using SWISS-MODEL (https://swissmodel.expasy.org/, Accessed on 31 May 2021) with the template as a SARS-CoV-2 spike protein trimer complex, 7CWU.pdb.

Results
The genomic surveillance for the spurt in the COVID-19 cases in Maharashtra was carried out to identify the circulating lineages/VOCs and identify possible functionally significant mutations in the spike protein.    The key mutations in the S protein are mapped on a furin-cleaved structure of the S protein ( Figure 4). The structural implications of the RBD mutations, L452R and E484Q, as in lineages B.1.617.1 and B.1.617.3, were analyzed in terms of interaction with the ACE2 receptor and neutralizing antibodies that are known to have interactions with these residues ( Figure S3). Similarly, the implications of RBD mutations, L452R and T478K, as in lineage B.1.617.2, were studied. The effect of the E484Q mutation is noted in terms of disruption in an electrostatic bond of the spike RBD residue E484 with K31 in the ACE2 interaction interface ( Figure 5A, Supplementary Table S1). The intramolecular interactions in the wild strain indicate that the L452 residue is involved in a hydrophobic interaction with L492 which forms another hydrophobic contact with F490. These residue interactions form a hydrophobic patch on the surface of the RBD. The L452R mutation abolishes the hydrophobic interaction with L492 of the RBD. Estimation of the minimized energies of the wildtype and L452R occurring with the E484Q mutant structure of the RBD complexed with ACE2 showed energy values of −93732.305 kcal/mol and −94543.180 kcal/mol, respectively. In the case of the mutant RBD possessing L452R with T478K ( Figure 5B), though no additional intermolecular contact was affected, three additional hydrogen bonds (H-bonds) were formed by K478 with F486. The intramolecular interaction L452-L492 associated with the L452R mutation was disrupted before. The minimum energy value of the RBD-ACE2 complex was 94,751.367 kcal/mol.   Table S2), The heavy chain of monoclonal antibody REGN10933 interacts with the RBD by making two H-bonds between E484 of the RBD and Y53 and S56 of the antibody. Residue Y453 of the RBD makes an H-bond with D31 of the heavy chain of the antibody. The RBD mutation E484Q disrupts the two H-bonds with S56 and Y53 ( Figure 5C). Residues L452 and E484 possess direct contacts with the mAb P2B-2F6 ( Figure 5D). The intermolecular interactions of the wildtype complex indicate that L452 is involved in hydrophobic interactions with residues I103 and V105 of the heavy chain of the antibody. Residue E484 forms H-bonds with N33 and Y34 of the light chain as well as a salt bridge and H-bond interaction with the R112 side chain of the light chain. The L452R mutation breaks the hydrophobic interactions with both residues I103 and V105 and also disrupts the H-bond and electrostatic interaction with R112.
Modeling the S protein with the pre-cleavage motif PRRAR showed the exposure of R682 and R685 ( Figure S4). Induction of a P681R point mutation in the modeled structure further revealed that the side chain of R681 could facilitate additional interactions with furin.

Discussion
In addition to the three global VOCs, the global variants of interest (VOIs) such as GH/452R.V1/ B.1.427/B.1.429 (epsilon), GR/P.2 (zeta), G/484K.V3/B.1.525 (eta), GR/P.3 (theta) and GH/B.1.526 (iota) have been identified by the WHO [6]. The potential consequences of emerging variants are increased transmissibility, increased pathogenicity and the ability to escape natural or vaccine-induced immunity [18,19]. With this background, we sequenced and analyzed the whole genomes of SARS-CoV-2 from different districts of Maharashtra where a surge in COVID-19 activity was noted since the end of January 2021 after a gap of almost four months.
Phylogenetic and sequence analyses revealed that only a small proportion of sequences, mostly in the month of December 2020, were detected as the VOC "alpha". The  1, B.1.1,  B.1.36, B.1.351, B.1.618 and B.1.1.216, that were not common in the state of Maharashtra. Our earlier studies [9,20] that were based on genome sequences from India from January to August/September 2020 had demonstrated that the dominant lineages were B.1.113,  B.1.1.32, B.1.1.8, B.1.80, B.4 and B.6. This reflects the diversity of lineages in the country and also the switches in the dominant lineages.
Notably, D111D was found to be associated with the signature mutations of lineage B.1.1.617.1. The co-occurrence of synonymous mutations with the non-synonymous mutations observed is interesting and generally being reported [21,22].
The structural analysis of the effect of RBD mutations L452R and E484Q towards ACE2 binding revealed a decrease in intramolecular and intermolecular contacts with respect to the wildtype. However, the hydrophobic L452 residue mutation to the hydrophilic 452R might help in interactions with water molecules and overall stabilization of the complex, as was reflected in the lower minimum energy of the mutant complex. The effect of the mutations L452R and T478K on ACE2 binding was also observed as enhanced stabilization of the RBD-ACE2 complex.
Another significant mutation, P681R, in the furin cleavage site resulted in enhancement of the basicity of the poly-basic stretch, and the likely facilitation of additional contacts with furin for S1-S2 cleavage. This could help in an increased rate of membrane fusion, internalization and thus better transmissibility. A recent study [23] revealed increased infectivity of the B.1.617 spike protein that could be attributed to L452R which itself caused a 3.5-fold increase in infectivity and, in combination with E484Q, caused a 3-fold increase. The P681R mutation caused a small increase in proteolytic processing that might have an effect on infectivity [24]. Observations on enhanced transmissibility of B.1.617.2 are also reported [23]. Further in vitro and/or in vivo studies to understand the phenotypic effects of the mutant strains would be vital for validating these observations.
Structural analysis further showed that the two RBD mutations L452R and E484Q may decrease the binding ability of REGN10933 and P2B-2F6 antibodies to the variant strains, compared to that in the wildtype strain. A recent report [25] revealed that the L452R mutation reduced or abolished neutralizing activity of 14 out of 35 RBD-specific mAbs including three clinical stage mAbs. Another recent study [26] has demonstrated that the mutation L452R can escape from human leukocyte antigen (HLA) 24-restricted cellular immunity and can also increase viral infectivity, potentially promoting viral replication. The combination of the two RBD mutations, L452R and E484Q, noted in this study could affect the neutralization of the select mAbs. The neutralizing potential of a wider range of mAbs against the strains of the B.1.617 lineages needs to be assessed. Recent studies have investigated the level of neutralization efficacy of sera from vaccinees of different vaccine formulations and those who have experienced natural infection. The neutralization of B.1.617.1 with sera of BBV152 (Covaxin) vaccinees showed almost two-fold reduction in neutralization compared to a B.1 "G" clade strain [27]. Even more reduction in neutralizing titer with sera of ChAdOx1 2 (Covishield)-vaccinated individuals was observed against the B.1.617.1 variant [28]. Against the B.1.617.2 variant, a 4.6-fold and 2.7-fold reduction in the neutralization titer was observed in comparison to the B.1 variant with sera of COVID-19 recovered cases and Covaxin vaccinees, respectively [29]. Bernal et al. [30], have also reported the effectiveness of COVID-19 vaccines against the B.1.617.2 variant. A summary of the major spike protein mutations defining the B.1.617 lineages and their proposed effects [31][32][33][34][35][36][37] is provided in Supplementary Table S3.
Mutations at both the residue positions 452 and 484 individually have been reported previously. L452R has been noted earlier in lineages B.1.427 and B.1.429 while the E484K mutation is common to the two VOCs beta and gamma and two VUIs, zeta and eta. E484Q has also been reported in several sequences in the GISAID. P681H is one of the mutations in the VIC alpha, while P681R is one of the mutations in the lineage A.23.1, which is identified as a "variant under monitoring" (https://www.ecdc.europa.eu/en/COVID-19/variantsconcern, Accessed on 14 June 2021  [6].
To summarize, the study investigated the S protein mutations associated with the rise in COVID-19 cases in Maharashtra observed since the month of February 2021. The continuous increase in positivity could be attributed to signature mutations in the spike protein and functionally significant co-occurring mutations. The investigations related to patterns of circulation of the new variant lineages during the COVID-19 second wave in India need to be continued for their public health impact and association with vaccine breakthroughs, both in the case of partially and fully vaccinated individuals and reinfections.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/microorganisms9071542/s1, Figure S1: The distribution of predominant PangoLIN lineages in the labelled districts of Maharashtra. The size of the pie charts is proportional to the number of genome sequences from the particular district. Figure S2: Heat map of major mutations in the different proteins of the predominant lineages from November, 2020 to May, 2021. Figure Table S1: Intra and intermolecular interactions of the wild-type and mutant spike RBD of SARS-CoV-2 with ACE2. The bonds that are made bold in the wild-type indicate those that are disrupted in the mutant strain while the bonds that are bold in the mutant indicate those that are additional when compared to the wild-type strain. Mutant1 indicates the strain of lineage B.1.617.1 and mutant2 indicates a strain of lineage B.1.617.2. Table S2: Intermolecular interaction of neutralizing antibodies with wild-type and mutant spike RBD of SARS-CoV-2. Table S3  Institutional Review Board Statement: Ethical review and approval were waived for this study, considering that the study was conducted on referred samples as part of the COVID-19 pandemic investigations.
Informed Consent Statement: Patient consent was waived considering that the study was conducted on referred samples as part of the COVID-19 pandemic investigations. Data Availability Statement: Genomes sequenced as a part of this study have been deposited in GISAID, with INSACOG tags.

Acknowledgments:
The authors acknowledge Indian SARS-CoV-2 Consortium on Genomics (IN-SACOG), Ministry of Health and Family Welfare, Government of India for their support. The authors are grateful to State Integrated Disease Surveillance Program (IDSP) for coordination of samples. We also acknowledge the cohesive efforts of NIC staff for sample processing and diagnostics. Thanks are due to Bioinformatics Group members for inputs during analyses.

Conflicts of Interest:
The authors declare no conflict of interest.