Determining the International Spread of B.1.1.523 SARS-CoV-2 Lineage with a Set of Mutations Highly Associated with Reduced Immune Neutralization

Here, we report the emergence of the variant lineage B.1.1.523 that contains a set of mutations including 156_158del, E484K and S494P in the spike protein. E484K and S494P are known to significantly reduce SARS-CoV-2 neutralization by convalescent and vaccinated sera and are considered as mutations of concern. Lineage B.1.1.523 presumably originated in the Russian Federation and spread across European countries with the peak of transmission in April–May 2021. The B.1.1.523 lineage has now been reported from 31 countries. In this article, we analyze the possible origin of this mutation subset and its immune response using in silico methods.


Introduction
The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in late 2019 led to the ongoing Coronavirus Disease 2019 (COVID- 19), now a global pandemic with more than 535 million cases of infection and more than 6.3 million deaths worldwide [1] (data from 16 June 2022). On 5 January 2020, the first whole genome sequence of 2019-nCoV was completed by the Wuhan Institute of Virology, the China Centre for Disease Control and the Shanghai Public Health Clinical Centre of Fudan University [2]. From this point on, genome sequencing has played an important role in vaccine development, and understanding its viral evolution and epidemiological characteristics. In many countries, SARS-CoV-2 sequencing has been implemented at a national level as a tool for epidemiological management [3].
A large dataset of SARS-CoV-2 genomes has been collected in the GISAID database, which now contains more than 3.7 million sequenced genomes from around the world [4]. As of 31 May 2021, the World Health Organization (WHO) proposed designations for global SARS-CoV-2 variants of concern (VOC) and variants of interest (VOI) to be used alongside scientific nomenclature in communications about variants to the public [5]. This list includes the variants on the global list of WHO VOC and VOI and will be updated as the WHO's list changes. There is currently one SARS-CoV-2 VOC known as Omicron. The previous VOC's Alpha, Beta, Gamma and Delta have been spread widely and showed significant evolutionary benefit in comparison with other lineages [6].
Through the routine analysis of results from the national Lithuanian sequencing efforts coordinated by the National Public Health Surveillance Laboratory, we have identified a novel SARS-CoV-2 variant classified as B.1 by PANGO but containing multiple S protein mutations associated with effects on immunity (https://github.com/cov-lineages/pangodesignation/issues/69) (reported on 5 May 2021), such as E484K, 156_158del and S494P. Preliminary phylogenetic analysis indicated that this variant has a distinct viral lineage that may have originated in Russia. We reported this variant to the PANGO curators and gave it the new phylum name B.1.1.523. On 14 July 2021, the WHO added this variant to the list of variants under its monitoring section. By using bioinformatics tools, we performed a detailed analysis of this lineage and disclosed our findings about this variant and the mutational subgroups typical of this variant.
With this report, we aim to share a detailed analysis of the variant discovered, evaluate its origin as well as predict its potential epidemiological impact and risks.

Transmission Cluster Analysis
In order to elucidate the potential origin of the lineage and transmission cluster, a supa phylogeny analysis of the full genomes representing a small subset of GISAID was performed. The sequences chosen for analysis were composed of the union of two sets of sequences: (i) sequences that were assigned B.1.1.523 lineage by pangolin and (ii) sequences that were at least 99.3% identical to the Latvian B.1.1.523 sequence EPI_ISL_1590462. The number of matched residues amounts to equal or more than 95% of the reference sequence. These two limits 99.3% and 95% are of a different nature. The 95% refers to the level of query coverage in the alignment despite the identity level, the aligned fraction, and the 99.3% refers to the identity level in the aligned region. The reference sequence was chosen as it was closest to the one this lineage sequenced first in Lithuania but with smaller gap regions. The alignment against the GISAID sequences was conducted using minimap2 2.20-r1061 [https://github.com/lh3/minimap2/ (accessed on 5 May 2021)] [9]. The limits were chosen arbitrarily after several attempts to look for cut-offs resulting in a set of sequences that includes most sequences from the lineage and some more diverged ones classified by pangolin as belonging to other lineages. The sequences with poor overall quality control status or with more than 1000 bps missing (as indicated by Nextclade analysis) were discarded. The maximum likelihood tree was calculated using a modified version of Nextstrain workflow [https://github.com/nextstrain/zika (cloned at 1 May 2021)]. The tree was built using IQ-TREE v2.1.2 [10]. A general time reversible model with unequal rates and unequal base frequencies was used [11] allowing for a proportion of invariable sites together with a discrete Gamma model [12]. Ultrafast bootstrap with 1000 replicates was used. The maximum likelihood emergence time and origin of country for inner nodes were calculated by treetime 0.8.1 [https://libraries.io/pypi/phylo-treetime (accessed on 5 May 2021)] as described by the aforementioned workflow. The set of sequences collected as described above was clustered into transmission clusters using Phydelity v2.0 [https://github.com/alvinxhan/Phydelity (accessed on 5 May 2021)] [13].

S Protein Phylogeny Analysis
The S protein-based phylogeny was based on the S protein sequences extracted from GISAID by Nextclade and aligned to the reference COVID-19 sequence. Sequences shorter than 1175 residues or having more than one stop codon or having any number of undetermined residues were discarded. Sequences were further clustered into identical sequence clusters using CD-HIT v4.8.1 [14] (command line option "-c 1.0"). The sequences representing all high-quality S protein variants were used for maximum likelihood tree calculation by VeryFastTree 3.0.1 [15] using an LG substitution model. The sequence alignment that was used to construct the tree was composed of the alignment produced by Nextclade, leaving only the representative sequences of the clusters and CAT approximation with 20 rate categories: lg -gamma. Additionally, command line flags that should increase calculation accuracy were added: "-spr 4 -mlacc 2 -slownni -double-precision". The tree was re-rooted using sequence matching to the EPI_ISL_402124 (one of the earliest human SARS-CoV-2 sequences from Wuhan) as an out-group using GOtree 0.4.1 [https://anaconda.org/bioconda/gotree/files (accessed on 5 May 2021)] [16].
Ancestral states for inner nodes for 156, 157, 158, 484 and 494 positions of S protein were inferred using the command line version of GRASP-suite [17]. This tool for inference was chosen due to its high speed and, most importantly, ability to handle insertions and deletions. The same substitution model that was used for the phylogeny tree was also used in this case (LG). The resulting tree was analyzed and potential changes in the haplotypes were detected using a custom script written in Julia 1.6 exploiting capabilities of the NewickTree library [https://github.com/arzwa/NewickTree.jl (accessed on 5 May 2021)]. The tree was trimmed to keep only those inner nodes that lead to leaves containing the triple deletion at 156-158 positions and either E484K, S494P and visualized using ggtree 3.0.426 [18]. Additionally, a set of full genome sequences was composed matching the leaves of the aforementioned trimmed tree, and a corresponding maximum likelihood tree was calculated using the aforementioned Nextstrain workflow.

Analysis of Potential Recombination Events
The focused set of sequences was composed based on the sequences used for the S protein phylogeny. The set was further narrowed down to the sequences containing either the triple deletion at 156-158 positions or a combination of E484K and S494P. The detection of recombination events at the DNA level was performed using the PoSeiDon workflow [https://github.com/hoelzer/poseidon cloned at 28 September 2021] [19] that runs a GARD program to identify recombination events [20]. The genomic regions matching to the S protein were used. As the workflow is limited to 100 sequences, the initial set of 110 was clustered to 100 sequences using the cd-hit-est program from the CD-HIT package 4.8.1 using identity cut-off equal to 0.9998658. The detection of recombination effects at the protein level was performed using the detREC tool [https://github.com/qianfeng2 /detREC_program (cloned on 1 October 2021)] [21]. The recombination detection was conducted using either a genomic or protein sequence corresponding to the S protein. The full genome scale recombination analysis was performed for the same set of 110 sequences using the 3SEQ program [22].

Antibody Escape Effect Estimation
An escape effect against an antibody targeting the NTD domain was evaluated using structural analysis [23]. In order to evaluate the escape effect for S protein, four sequence variants were tested: (i) the wild type, (ii) 156_158del (B.1.1.523 lineage), (iii) 156_157del & R158G (Delta variant), and (iv) S protein sequence with multiple mutations matching NCBI Genbank entry UFO69279.1 (Omicron). These sequence variants were modelled using part of the 7LQV structure as a template: 14-283 residues from a chain A (spike protein), 1-113 residues from a chain H (antibody) and 1-106A residues from a chain L (antibody) employing Rosetta package (2021.16.61629_bundle) [24]. The S protein residues from 284 position to the "N" terminus were discarded. The PDB structures for fragment library generation and other required databases for Rosetta were downloaded at 2 June 2021. The S protein structure models were created using a comparative modelling approach following guidelines in the documentation. During the homology modelling Rosetta runs, the antibody structure including side chains was kept constrained using a Coordinate Constraint Generator. For the relaxation part, an "InterfaceRelax2019" script was used. The lowest energy model was chosen from 200 models using two applications from the Rosetta package: (i) clustering with an energy-based_clustering application, and (ii) choosing the best scoring cluster centroid using the score_jd2 application.
The antibody binding for the chosen model was further evaluated performing local docking in two ways (i) using HADDOCK 2.2 [25], and (ii) using the docking protocol application from the Rosetta package [24] in combination with scoring by PRODIGY [26,27]. HADDOCK 2.2 was run via the Refinement interface on the webpage (https:// milou.science.uu.nl/services/HADDOCK2.2/haddockserver-refinement.html, (accessed on 12 June 2022) using the default parameters.
The antibody docking was also carried out using the docking-protocol application from the Rosetta package. At first, local docking was run using "-partners LH_A -ex1 -ex2aro -dock_pert 3 8" non-default flags, then the refinement step was conducted using "-partners LH_A -use_input_sc -docking_local_refine -ex1 -ex2ar" non-default flags. The resulting complex structure was scored with the InterfaceAnalyzer application using "-interface LH_A -pack_separated" non-default flags [24]. For each model, docking was run 5000 times and the top 500 structures were chosen for analysis based on the interface score (I_sc). The complex formation energy of the 500 structures for each sequence variant was predicted using PRODIGY [26,27] command line program (https://github.com/haddocking/prodigy, master branch cloned on 20 June 2022). The binding was evaluated using the non-default flag "-selection H, L, A" (this requests to evaluate interaction between the antibody (H,L chains) and the S protein).
Additionally, the four homology models (starting structures for docking rins) were evaluated using the PRODIGY program via the web interface (https://wenmr.science. uu.nl/prodigy/ (accessed 12 June 2022)) [27]. The RMSD between the docked structures resulting from the calculations by the docking protocol application from the Rosetta package and the homology models (starting structures) were calculated using the PyMOL 2.5 [28] command line interface via the "align" command.
The antibody escape effect of E484K and S494P mutations was analyzed using the FoldX 4 program [26]. The antibody structures targeting the receptor binding domain of the S protein were downloaded from CoV3D as available on 1 May 2021. The structures were initially processed by the pdb2pqr v3.1.0 [https://doi.org/10.1002/pro.3280] program using these non-default command line flags: "pdb2pqr30 -drop-water -ff AMBER -keepchain -pdb-output". The mutations into the complexes were introduced and binding energy was calculated by consecutively applying the FoldX commands: RepairPDB, BuildModel, and AnalyseComplex. The residues at 484 and 494 positions were modelled either to the residue matching the mutation or to the wild type residue, and changes in free energy upon binding were evaluated. The AnalyseComplex command was run using a non-default flag "-analyseComplexChains = A".
The synergy of the escape effect was evaluated as follows. At first, for a given complex structure, relative ∆∆G was calculated using the FoldX output using this formula: where ∆G is the free energy of the complex. The higher the ∆∆G, the greater extent to which the binding to an antibody was disturbed due to a mutation. For example, if ∆G wildtype is −10 kcal/mol and ∆G E484 is −5 kcal/mol, then the ∆∆G E484 would be (−5-−10)/10 kcal/mol, which is 0.5 or 50%. The relative, and not the absolute value, was used to make the evaluation robust to variations in absolute ∆G values and make results comparable across different complexes. The synergy of the double E484K & S494P mutant was evaluated by introducing a relative ∆∆∆G that was calculated using this formula: For example, if ∆∆G E484 is 0.2, ∆∆G S494P is 0.4 and ∆∆G E484&S494P is 0.6, then the corresponding ∆∆∆G relative would be min((0.6-0.2), (0.6-0.4)), which is 0.2 or 20%. The greater the relative ∆∆∆G, the greater is the escape synergy, or in other words, the greater is the disruption of a complex by the double mutant compared with any of the single point counterparts.

Mutation Review of B.1.1.523
Several mutations in the S region have been observed in the B.1.1.523 variant, from which 156_158del, E484K and S494P are considered as an attribute of VOM ( Figure 1). According to previously reported data, the 156_157del and G158R mutations in the Delta variant match the same protein surface area as the 144 and 241-243 deletions in the Alpha and Beta (B.1.351) variants, respectively. These altered residues are found in the NTD 'supersite' targeted by most anti-NTD neutralizing antibodies, thus providing a mode to evade the immune system [29]. Moreover, the E484K mutation also contributes to SARS-CoV-2 immune system evasion. Several recent studies have observed that E484K may significantly reduce convalescent serum neutralization [30,31]. Additionally, it was observed that the S494P mutation is related to a 3-5-fold reduced SARS-CoV-2 neutralization in sera. With the combination of the 156_158del, E484K and S494P mutations, as shown in Figure 1, the B.1.1.523 lineage should remain on epidemiologists' watchlists as one of the concerning SARS-CoV-2 lineages. Moreover, the T1027I substitution in the spike is in a perfectly conserved amino acid strand [32] typical of the former VOC gamma variant (P.1). We cannot find any studies focused on investigating the effect of this mutation. E156V, F306L, E780A and D839V are not typical in other lineages, but are observed in Delta, Omicron, Gamma and other lineages. The total number of sequences published in GISAID with these mutations (13 June 2022) is 615, 2540, 962, and 832, respectively, out of a total of 11.4 million published genomes.

The Origin and Formation of Key S Protein Formation of the Lineage
One of the objectives of the analysis was to determine if the two clusters of mutations, which are responsible for immunity resistance, were acquired by sequential mutations or if this is a result of recombination events. Initially, a list of data entries comprised of the sequences classifiable as B.1.1.523 lineage together with the closest sequences based on identity was used to construct a maximum likelihood (ML) tree. The size and overlap between the two data sets is depicted in Figure 2. We have observed that some sequences which originated from progenitors with a full set of expected substitutions at 484, 494, 156, 157 and 158 S region positions underwent a reverse-type mutation to wild type variants. Such events are highly unlikely and are usually caused by erroneous sequence assembly or could be an indication of low-quality data.
Additionally, to discern the cases in which the mutations comprised of the two regions of potential enhanced resistance to immune response have been combined, phylogeny analysis of all the unique S protein variants was performed. As depicted in Figure 3, there are at least three distinct cases in which E484K and/or S494P have been combined with 156_158del. The visualized bootstrap values were calculated either by the default method used by VeryFastTree (Figure 3a) or IQ-TREE (Figure 3b), as indicated in Section 2. The phylogeny analysis of the S protein indicates that lineage assignments by Pango can change and sometimes are not stable across different Pangolin versions when pangoLEARN is used [33]. Therefore the assignments can be misleading and relying on the plain assignments could hide novel developments of SARS-CoV-2. For example, some sequences from Turkey that were assigned as lineage B.  Transitions E1 (introduction of the triple deletion) and E2 (introduction of E484, S494P) ( Figure 3a) indicate a pathway which the majority of B.1.1.523 lineage sequences took. As in the case of data submitted, we do not detect sequential acquisition of E484K or S494P, and immediately next to the triple deletion, we see the two additional mutations. This could indicate a potential recombination event but it is most probably due to lack of sufficient data.
An interesting case is observed with two highly diverged Turkish sequences classified by Pango as B.1.1.523 lineage. In this case, following the most likely scenario, the E484K was acquired first and only then were the S494P and E156_R158del acquired ( Figure 4). As in the case discussed above, the sequential steps acquired are missing from the data. Most probably, the sequencing data was not comprehensive enough to reveal the full picture of how the combination was formed. Transitions E1 (introduction of the triple deletion) and E2 (introduction of E484, S494P) (Figure 3a) indicate a pathway which the majority of B.1.1.523 lineage sequences took. As in the case of data submitted, we do not detect sequential acquisition of E484K or S494P, and immediately next to the triple deletion, we see the two additional mutations. This could indicate a potential recombination event but it is most probably due to lack of sufficient data.
An interesting case is observed with two highly diverged Turkish sequences classified by Pango as B.1.1.523 lineage. In this case, following the most likely scenario, the E484K was acquired first and only then were the S494P and E156_R158del acquired (Figure 4). As in the case discussed above, the sequential steps acquired are missing from the data. Most probably, the sequencing data was not comprehensive enough to reveal the full picture of how the combination was formed. used by GISAID. The asterisks ** indicate a sequence from Turkish variants that have their two most variable sequences manually swapped with corresponding regions from the reference sequence. The two lower graphs indicate the sequence alignments from the two extremely variable fragments. The order of sequences is the same as in the upper graph: reference sequence, three Turkish sequences with swapped fragments and three original Turkish sequences.
In addition to these two sequences assigned to the B.1.1.523 lineage, one Turkish sequence was placed in the ML tree immediately next to them ( Figure 5) but was assigned to the B.1 lineage. For each of the three sequences, we created a synthetic sequence with the variable fragments switched to the wildtype match (Figure 4, asterisks denote the synthetic variants). These two highly diverged Turkish B.1.1.523 sequences indicate that there are vast uncharted territories in COVID-19 evolution, as we have only sparse sequencing data obtained from central Asian, African, southeast Asian, and east Mediterranean regions where no more than 1.6% of confirmed cases are sequenced [34]. An evident third case emergent combination of immune response hindering mutations from distinct S pro- used by GISAID. The asterisks ** indicate a sequence from Turkish variants that have their two most variable sequences manually swapped with corresponding regions from the reference sequence. The two lower graphs indicate the sequence alignments from the two extremely variable fragments. The order of sequences is the same as in the upper graph: reference sequence, three Turkish sequences with swapped fragments and three original Turkish sequences.
In addition to these two sequences assigned to the B.1.1.523 lineage, one Turkish sequence was placed in the ML tree immediately next to them ( Figure 5) but was assigned to the B.1 lineage. For each of the three sequences, we created a synthetic sequence with the variable fragments switched to the wildtype match (Figure 4, asterisks denote the synthetic variants). These two highly diverged Turkish B.1.1.523 sequences indicate that there are vast uncharted territories in COVID-19 evolution, as we have only sparse sequencing data obtained from central Asian, African, southeast Asian, and east Mediterranean regions where no more than 1.6% of confirmed cases are sequenced [34]. An evident third case emergent combination of immune response hindering mutations from distinct S protein regions is highlighted by E5 and E6 in Figure 3a. In this case, within the B.1.351 lineage, the E484K mutation was introduced first and then E156_R158del followed. The data presented shows that, evidently, the immune response-hindering mutations from the RBD and NTD domains have been combined in one protein at least three times. In two of them, E484K occurred first; in one case, the first one was the triple deletion. Most probably, these are the results of independent mutational events. The Turkish sequences assigned to the B.1.1.523 lineage depicted in Figure 5 are highly diverged from the wild type based on the regional variation in the S protein. According to the GISAID Audacity Instant search (performed on 7 October 2021) [4], the first variable region (bottom left, Figure 5) from Turkey/HSGM-B11599/2021 was found in seven sequences, and the second variable region (bottom right, Figure 5) was found in four sequences. The sequences have been deposited into GISAID during several submissions. Other genomic regions of the Turkish sequences shown in Figure 4 do not contain such large SNP clusters as the ones shown in the S protein. This is indicated by the changes in the placement of the three Turkish sequences upon switching the two extremely deviated fragments with corresponding fragments with the reference sequence (the black lines in Figure 5 connect the original sequence variant with the manually changed sequence). The switch to the wildtype fragment for the three sequences changes the placement and they are no longer clustered close to each other. Only one of them after the switch is clearly placed together with other B.1.1.523 sequences. In most of the cases, the transmission origin country (country inferred for a cluster MRCA sequence) and the target country (country of a sequence indicated in the GISAID metadata) was Russia ( Figure 7). As we see in Figure 7b, most of the transmission events occurred within Russia. Evidently, the largest number of cases where transmission origin and target countries were different resembles transmission from Russia to Germany. The data indicates one reverse type (Germany to Russia) transmission. As of the date this article was written, Germany can be considered as a reservoir of the B.1.1.523 lineage. In May 2021, the B.1.1.523 and Delta variants represented 0.002% and 0.054%, respectively, whereas the data for August 2021 shows a steady increase in detected cases to 0.327% and 32.358%, respectively. The overall number of sequenced cases in Germany is overwhelmed by more contagious the Delta (Figure 8)    The "0" time point indicates the date of the earliest lineage sequence uploaded onto the GISAID database. Only sequences that have the typical set of S mutations were considered (E484K, S494P, 156_158del). Only the cases which correspond to the top 12 countries with the most abundant detection rate are included in the underlying data. The top 12 countries correspond to 93% of all cases.
In most of the cases, the transmission origin country (country inferred for a cluster MRCA sequence) and the target country (country of a sequence indicated in the GISAID metadata) was Russia (Figure 7). As we see in Figure 7b, most of the transmission events occurred within Russia. Evidently, the largest number of cases where transmission origin and target countries were different resembles transmission from Russia to Germany. The data indicates one reverse type (Germany to Russia) transmission. As of the date this article was written, Germany can be considered as a reservoir of the B.1.1.523 lineage. In May 2021, the B.1.1.523 and Delta variants represented 0.002% and 0.054%, respectively, whereas the data for August 2021 shows a steady increase in detected cases to 0.327% and 32.358%, respectively. The overall number of sequenced cases in Germany is overwhelmed by more contagious the Delta (

B.1.1.523 Antibody Escape
In Figure 9a, the predicted free energy of complex formation for the three sequence variants are presented considering the NTD-antibody complex. The ∆G of the complex formation was predicted using the PRODIGY [27] program based on the 500 top scoring structures resulting from the Rosetta docking runs for each sequence variant. The PRODIGY scores correlate well with experimentally determined binding ∆G [27]. The data could allow ranking of the sequence variants in terms of decreasing affinity as follows: WT < Delta < B.1.1.523 < Omicron. The matching data on Rosetta dG_separated scores and RMSD values for the 500 structures for each sequence variant are presented in the Figure A2. The analogous docking using HADDOCK [25] is presented in Figure 9b. The HADDOCK scores positively correlate with experimental binding free energies [35], and in terms of decreasing mean score of the best cluster, the sequence variants could be arranged in the following order: WT < B.1.1.523 < Delta < Omicron. Both methods indicate that the Omicron variant disrupts the antibody binding to the largest extent. In addition, both methods predict lower affinity to the antibody for B.1.1.523 and Omicron sequence variants compared with the wild type. These differences were shown to be statistically significant by pairwise comparisons using the Wilcoxon rank sum test analyzing the PRODIGY predictions (Figure 9a). All calculated FoldX binding energies are given in the Supplementary File FoldX_data.zip. The csv file provided contains data on binding energies of the three mutants (two single point and one double) and the corresponding wild type structures. As described in Section 2, the synergy of the two mutations was evaluated using a relative measure ∆∆∆G. The relative values were used in order to make the evaluation robust to variations in absolute ∆G values and make the results comparable across different complexes. The relative ∆∆G values for single point mutants and the corresponding ∆∆∆G values for the cases with the highest values of the latter are shown in Figure 10a [37].
As we see in Figure 10a the majority of analyzed complexes are not influenced by the combination of the two mutations. As we see from the ∆∆G S494P&E484K values, apart from the complexes that should be disrupted by the combination of the two mutations, there are complexes that should be stabilized, as indicated by negative ∆∆G values. The complexes that have the highest ∆∆G S494P&E484K (the highest complex disruption by the double mutation) are also prominent for the highest ∆∆∆G values that indicates synergy.
It is interesting to notice that the two most disrupted complexes are single chain synthetic antibodies from structures 6YZ5 and 6ZH9 [38]. However the isolated monoclonal antibodies from structures 7KGJ [39] and 7CDJ [40], along with an antibody from structure 7K43 [41], are also highly impacted. The ∆∆G values indicate a relative increase in binding energy compared with the wild type structure for a particular mutation or set of mutations. The ∆∆∆G indicates the minimum difference between the ∆∆G of the double mutation and any of the two single point mutations. The larger the value, the larger the synergy in evading interaction with the antibody. If a complex's ∆∆∆G was greater than 15%, its label is given containing the name followed by the corresponding value of ∆∆∆G. (b) The structure of an antibody and receptor binding domain of the S protein complex (PDB ID: 6YZ5 that was highly impacted by the double mutation [38]. The green color depicts the S protein, cyan color-antibody chain. Sticks depict the E484 and S494 residues together with residues that form polar contacts with E484 or S494. The polar contacts are depicted by yellow dashes.

Discussion
We have identified a new SARS-CoV-2 virus lineage with multiple mutations associated with immune escape and reported this to Pango on 5 May 2021 [https://github.com/ cov-lineages/pango-designation/issues/69], which mandates the new lineage designation B.1.1.523. This lineage was first determined in March 2021, and at the time of writing this article, the total number of cases had reached 623 across 31 countries [42]. It is likely that the rapid increase in the circulation of the Delta variant could have diminished the rise of the B.1.1.523 lineage. The spread of the B.1.1.523 SARS-CoV-2 lineage was pushed out by other lineages, and the last case was observed in a sample from France, collected in 31 August 2021 (EPI_ISL_4412356). Interestingly, the transmission of this lineage has diminished in Russia, where it was most expected to rise. This can be explained by different diagnostic strategy approaches in the Russian Federation, where the testing is performed on non-randomly selected sources in the country. Alternatively, this could be explained by the steep rise in the Delta variant in Russia, which started a month earlier than in Europe. For example, in mid-June, more than 80% of cases in Russia were Delta, whereas in Germany the same frequency of Delta was observed only in mid-July.
The B.1.1.523 lineage possesses three or more mutations that characterize SARS-CoV-2 VOCs, including the S:D156-158 deletion, S:E484K and S:S494P. The D156-158 deletion is at the β-hairpin antigenic supersite, located in the same region typical for the Delta variant (E156G and 157-158del) [43]. The E484K mutation has been detected in the previous VOC Beta variant (B.1.351) and VUM Zeta (B.1.1.28). The mutation is in the genomic region coding the SARS-CoV-2 spike protein and it appears to have a significant impact on the body's immune response and, possibly, vaccine efficacy. On 1 February, Public Health England (PHE) announced that the COVID-19 Genomics (COG-UK) consortium had identified this same E484K mutation in 11 samples carrying the Alpha variant B.1.1.7, after analyzing 214,159 sequences [44]. An in vitro study for the SARS-CoV-2 spike protein mutations responsible for antibody evasiveness has identified that the S494P mutation reduces SARS-CoV-2 neutralization by 3-5-fold in some convalescent sera [45]. Additionally, Lassaunière et al. found that lineages with E484K showed a significant reduction in virus neutralization titers relative to D614G (4.0-fold) and other Delta strains tested, including B.1.617.2 (2.3-fold), AY.4 (2.3-fold), and AY.4.2 (1.7-fold) [46]. Li, et.al observed that additional S494P showed increased Alpha and Beta variants infectivity [47]. Our analysis indicates that there is a possibility of antibody escape synergy for the S:E484K and S:S494P combination. The potential synergetic effect was most evident for the single chain antibodies; however, our modeling showed that it can also be substantial and also in the cases of monoclonal human origin antibodies. Sequences currently deposited into GISAID do not support the hypothesis that S:156_158del was combined with S:E484K and S:S494P during a recombination event. Instead, it appears that the triple deletion was introduced initially, and then the addition of S:E484K and S:S494P followed.
We have shown by molecular modelling that in at least one antibody case the triple deletion del156-158 could decrease interaction. The combination with other immune escape enhancing mutations at RBD could result in a variant highly resistant to immunity formed from initial virus variants. The Delta variant also posseses sequence changes at the S protein residues 156-158 that can induce immune escape and recombination with the B.1.1.523 variant or de novo introduction of the E484K and S494P mutations could make the variants more contagious. It is worth noting that a Delta variant containing the E484K mutation has already emerged [48].
Our modeling showed that, at least in the case of one NTD targeting antibody, an Omicron sequence variant is notable as having the lowest potential affinity, followed by those for Delta and B.1.1.523. Two docking approaches were consistent in indicating that the Omicron sequence variant disrupts the interaction with the antibody to the largest extent compared with the other three sequence variants. Additionally, both methods showed that the B.1.1.523 and Delta variants should also bind the antibody with lower affinity compared with the wild type. This is clearly consistent with experimental data showing that the Omicron variant is capable of immune escape [49] and could partially explain why B.1.1.523 did not spread: Omicron could have an advantage in escaping antibodies targeting the NTD domain of the S protein.
The case with the Turkish sequences is controversial. The sequences in the S region significantly deviated fragments with many "private" mutations. These sequences have been seen across several submissions. Hence, these might not be artificial. If these sequences are not artefacts, then the Turkish sequences classified by Pango as belonging to the B.1.1.523 lineage resulted after a recombination event between a highly diverged variant from Turkey and a typical B.1.1.523 lineage sequence. In Turkey, the virus has spread widely with limited control and reporting [50]. Therefore, widely divergent variants may have evolved. The results indicate that the Pango classification should not be taken for granted. Of the two Turkish sequences assigned to the B.1.1.523 lineage, only one was characteristic of the lineage SNPs at the S protein; the other one (EPI_ISL_2403346) has mutations characteristic of the Delta variant (a double deletion and G residue in the 156-158 region). This issue was resolved in Pango v.4.0.6 PLEARN-v1.8 version.

Conclusions
Although this variant no longer poses a risk and has been displaced by other VOCs, the study of a combination of mutations in the B.1.1.523 lineage has shown a synergistic effect in antibody escape, which may be useful for understanding new variants of SARS-CoV-2 and the evolution of mutations.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/microorganisms10071356/s1, Figure S1: Maximum likelihood tree based on the genomic sequences that are classifiable as B.1.1.523 lineage together with the closest sequences based on identity search; File S1: FoldX binding.

Data Availability Statement:
The datasets analyzed during the current study are available in the GISAID and COV3D repositories, the corresponding persistent links: https://www.gisaid.org/ (accessed on 5 May 2021) [4] and https://cov3d.ibbr.umd.edu/ (accessed on 5 May 2021) [51]. All data generated during this study are included in this published article and its Supplementary Information Files.