The Promise of Mutation Resistant Drugs for SARS-CoV-2 That Interdict in the Folding of the Spike Protein Receptor Binding Domain

: In recent work, we proposed that effective therapeutic drugs aimed at treating the SARS-CoV-2 infection could be developed based on interdicting in the early steps of the folding pathway of key viral proteins, including the receptor binding domain (RBD) of the spike protein. In order to provide for a drug target on the protein, the earliest contact-formation event along the dominant folding pathway of the RBD spike protein was predicted employing the Sequential Collapse Model (SCM). The segments involved in the predicted earliest contact were suggested to provide optimal folding interdiction target regions (FITRs) for potential therapeutic drugs, with a focus on folding interdicting peptides (FIPs). In this paper, we extend our analysis to include 13 known single mutations of the RBD spike protein as well as the triple mutation B1.351 and the recent double mutation B1.617.2. The results show that the location of the FITR does not change in any of the 15 studied mutations, providing for a mutation-resistant drug design strategy for the RBD-spike protein. Author Contributions: Conceptualization, F.B.-C.; Data curation, F.B.-C.; analysis, F.B.-C.; Funding acquisition, F.B.-C. and H.A.R.; Investigation, F.B.-C. and H.A.R.; Methodology, F.B.-C.; Project administration, F.B.-C. and H.A.R.; Supervision, H.A.R.; Validation, F.B.-C. and H.A.R.; Visualization, F.B.-C.; Writing—original


Introduction
Now in its second year, the pandemic generated by the coronavirus SARS-CoV-2 has produced more than four million deaths, as well as massive economic and social damage [1][2][3]. A matter of current concern is the long-term suitability of the newly developed vaccines especially under continuing mutations of the coronavirus [4][5][6]. Many such mutations are already known [7,8] and in recent times, new strains combining several mutations and deletions were identified in the UK, South Africa and India with enhanced infectivity and, possibly, lethality [9][10][11]. The high rate of mutation, common in RNA viruses [12], raises the concern that the coronavirus SARS-CoV-2 might become endemic within the human population [9], and that more lethal variants might evolve over time. Thus, developing treatments whose efficacy is minimally affected by viral mutation has become a pressing goal.
In recent work, we proposed that effective therapeutic drugs aimed to treat SARS-CoV-2, could be designed by aiming to interdict in the early folding stages of the folding pathway of critical viral proteins [13]. In particular, we suggested that the folding of the receptor binding domain (RBD) of the spike protein of SARS-CoV-2, (i.e., a protein essential for viral entry in the cell through interaction with angiotensin-converting enzyme 2 (ACE2)) [14][15][16][17][18], could be pre-empted by employing specific peptide drugs [13]. Such peptide drugs can be designed relying on the predictions of the Sequential Collapse Model (SCM) for early folding contact formation [19,20]. The SCM allows for a prediction of the earliest contact-formation events along the folding pathway of globular proteins [19]. The earliest contacts along the multi-state folding pathway of proteins longer than~100 amino acids, are called primary contacts in the SCM. These primary contacts are expected within the model to nucleate the subsequent folding steps, and thus should be key steps COVID 2021, 1 289 along "Nature's Shortcut" to protein folding [20]. The specific protein segments involved in these early primary contact-formation events can then be employed to identify folding interdiction target regions (FITRs), as well as the starting sequence to design possible folding interdicting peptide (FIP) drugs [13]. The FITR concept may also be receptive to other drugs besides peptides, although the latter choice is natural and remains the focus of this work.
The purpose of this paper is to show that the proposed FITR-based strategy for the design of therapeutic drugs is robust to a broad sample of the known mutations of the RBD-spike protein. In order to do so, the SCM will be applied to the sequences of 13 single position mutants of RBD-spike (V341I, A348T, N354D, D364Y, V367F, K378R, Q409E, A435S, W436R, G476S, V483A, Y508H and N501Y) as well as the triple mutation associated with the B1.351 lineage (K417N, E484K and N501Y), and the double mutation known as the delta variant B1.617.2 (E484Q and L432R). The goal of examining these mutants is to determine whether the location along the sequence of the FITR changes, and also if the predicted starting sequence of the FIPs should be altered as a result of any of the mutations. The FITR and FIP are generally found to be robust with only two mutations showing significant deviations in the stabilizing energy of the early contacts with respect to the predictions for the native sequence. These deviations, however, do not alter the predicted location of the FITR and have a limited effect on the possible FIP sequences.
The concepts presented in this paper are similar to earlier proposals to develop drugs that might be therapeutic by interfering with the folding process [21,22]. There has been a growing interest in the possibility of modulating the folding process through the employment of specific molecules, and theoretical and experimental results have been recently reported that support the view that targeting folding dynamics is a promising path for therapeutic development [23][24][25]. The main difference between the proposal presented here and other approaches is that the FITR strategy aims to interdict in the folding process while the protein is still completely unfolded [13]. Thus, the FITR strategy is particularly well suited to target proteins as they are translated by the ribosome as we will explain below. Due to this feature, the FITR strategy holds the promise of being a very effective viral suppressor. Moreover, if shown to properly function, the FITR strategy could be readily extended to other viruses.

The Receptor Binding Domain of the Spike Protein of SARS-CoV-2
The single-strand RNA genome of SARS-CoV-2 encodes four structural proteins. These are the small protein (E), the nucleocapsid (N), the matrix protein (M) and the spike protein (S) [26][27][28]. The spike protein is a fusion I type protein that forms trimers on the protein surface [26][27][28]. It has two subunits: S1 and S2, responsible for receptor-binding and membrane fusion, respectively [26][27][28].
The receptor recognition mechanisms of coronaviruses have been extensively studied [14,15,28]. In particular, for SARS-CoV-2 [18] and its earlier viral variant SARS-CoV [29], entry into the host's cells is mediated by the spike protein that includes a specific receptorbinding domain (RBD) [18,30]. The RBD recognizes the angiotensin-converting enzyme 2 (ACE2) as its receptor [15,16,18]. The structure of SARS-CoV RBD is well known [16], and the structure of SARS-CoV-2 RBD is similar [18], albeit with some specific mutations in the ACE2 binding ridge that enhance the ability of SARS-CoV-2 to bind human ACE2 [18]. Due to its role in receptor recognition and binding, the RBD of the spike protein is the main determinant of SARS-CoV-2 infectivity [16,31], and also the major antigen for immune response [32]. Thus, the RBD of the spike protein of SARS-CoV-2 is a natural therapeutic target given its critical biological role in facilitating the virus entry in a cell. The search for inhibitors, including peptides, that actively impede the RBD-ACE2 binding process of the SARS-CoV-1 coronavirus has been an active field of investigation for many years [17,33,34], and the current pandemic due to the closely related SARS-CoV-2 has led to greatly renewed interest [35][36][37][38][39][40][41]. A considerable number of mutations of the spike protein of SARS-CoV-2 have been characterized, as the virus mutates rapidly under selective pressure as is commonly observed in RNA viruses [12,42]. Several of these mutations are located in the RBD domain [7][8][9]. While most of the mutations of the RBD known to date tend to reduce the virus' infectivity [43], several of them (for example V367F, W436R and N354D) have been predicted to have higher binding affinity to human ACE2 on the basis of molecular dynamics simulations, thus probably being more infectious in vivo [44]. Some of the mutations reduce the sensitivity of the virus to specific monoclonal antibodies [45]. For example, A475V, Y508H and F490L reduce the sensitivity to several monoclonal antibodies, while V483A becomes resistant to some as well [43]. Conversely, some mutations in the RBD region, including V367F, Q409E, Q414E, I468F, I468T, Y508H, and A522V, have been observed to make the spike protein more susceptible to neutralization by monoclonal antibodies [44]. Importantly, mutation N501Y of the RBD appears in two recently identified high infectivity strains [29,46,47]. Due to the high rate of mutation of CoV-2, considerable effort has been devoted to identify conserved epitopes in the RBD that might provide for mutation-resistant antibody targets [48][49][50][51].
In recent work, we proposed that an alternative strategy for the development of therapeutic drugs for SARS-CoV-2 could target critical regions of the virus proteins involved in folding initiation [13]. Specifically, we proposed that peptide drugs of defined sequence could function as naturally designed inhibitors of the folding of RBD-spike thus leading to virus inactivation [13]. The proposed strategy is graphically described in Figure 1.
A considerable number of mutations of the spike protein of SARS-CoV-2 have been characterized, as the virus mutates rapidly under selective pressure as is commonly observed in RNA viruses [12,42]. Several of these mutations are located in the RBD domain [7][8][9]. While most of the mutations of the RBD known to date tend to reduce the virus' infectivity [43], several of them (for example V367F, W436R and N354D) have been predicted to have higher binding affinity to human ACE2 on the basis of molecular dynamics simulations, thus probably being more infectious in vivo [44]. Some of the mutations reduce the sensitivity of the virus to specific monoclonal antibodies [45]. For example, A475V, Y508H and F490L reduce the sensitivity to several monoclonal antibodies, while V483A becomes resistant to some as well [43]. Conversely, some mutations in the RBD region, including V367F, Q409E, Q414E, I468F, I468T, Y508H, and A522V, have been observed to make the spike protein more susceptible to neutralization by monoclonal antibodies [44]. Importantly, mutation N501Y of the RBD appears in two recently identified high infectivity strains [29,46,47]. Due to the high rate of mutation of CoV-2, considerable effort has been devoted to identify conserved epitopes in the RBD that might provide for mutation-resistant antibody targets [48][49][50][51].
In recent work, we proposed that an alternative strategy for the development of therapeutic drugs for SARS-CoV-2 could target critical regions of the virus proteins involved in folding initiation [13]. Specifically, we proposed that peptide drugs of defined sequence could function as naturally designed inhibitors of the folding of RBD-spike thus leading to virus inactivation [13]. The proposed strategy is graphically described in Figure 1.  For such a strategy to be efficient, it is important that the drugs developed from it are sufficiently effective against mutated viruses. In order to investigate whether that is the case, the following section explores the effects, if any, on the earliest folding events of 13 known single mutations of RBD-spike (V31I, A348T, N354D, D364Y, V367F, K378R, Q409E, A435S, G476S, W436R, V483A, Y508H and N501Y), and also the impact of the triple mutation associated with the B1.351 lineage (K417N, E484K and N501Y), and the double mutant B1.617.2 known as the delta variant (E484Q and L432R). Our predictions rely on a combination of SCM energy calculations from primary sequence information following the procedure described in previous work, and the crystal tertiary structure of RBD-spike [52] to identify the viable primary contacts and their stabilization energies.

Primary Contacts for RBD-Spike Mutants
In previous work, we calculated the possible primary contacts (i.e., the earliest contactformation events along the folding pathway) for native RBD-spike [13] and we show the predicted contacts within~6 kT (i.e., three orders of magnitude difference in population) of the most stable one in Table 1, labeled in order of decreasing energies as C1, C2 . . . Cn. The calculations were completed following the methodology employed in the SCM before [19,20], entailing a search for the most stable possible hydrophobic contacts between pairs of 5-amino acid segments i and j, located at a distance n ij along the sequence such that 65 n ij 100 amino acids (see Section 4.2 in Methods for a complete explanation). Our calculations predict that the best possible primary contact C1 is established between segments ( 432 CVIAW 436 ) centered at I434, and ( 511 VVLSF 515 ), centered at L513, 79 residues apart, with a stability of ∆G cont (C1) ≈ −11.5 kT*, representing~59% of the early primary contacts. The two components of the predicted contact C1 are in clear proximity in the crystal structure of the protein (PDB code 6M0J) [52]. The second-best possible contact C2 is defined by segments ( 335 LCPFG 340 ), centered at P337, and ( 432 CVIAW 436 ), centered at I434, 97 residues apart, with a stability of ∆G cont ≈ −10.8 kT. Contact C1 is shown on the crystal structure in Figure 2.
Contacts C2-C4 are not native-like in the crystal structure, although its constituent segments are generally in the same region of RBD-spike. Contacts C5-C7 are clearly nonnative. Thus, as explained in previous work [13], our result implies that the best primary contact C1 predicted here constitutes the entry point to "Nature's Shortcut" [18] to the native structure. As segments ( 432 CVIAW 436 ) and ( 511 VVLSF 515 ) define C1, they constitute the natural FITR/FIP pair for consideration of developing a therapeutic peptide. Moreover, because contacts C1-C4 all include segment ( 432 CVIAW 436 ), a FIP designed from sequence ( 511 VVLSF 515 ) should be able to inhibit not just folding through the dominant primary contact, but also any added folding that might occur through C2-C4, thus maximizing the folding interdiction effect. The color code reflects the location from the N-terminus (dark blue) to the C-terminus (yellow). Only the side chains corresponding to the segments that define the primary contacts are shown.
Contacts C2-C4 are not native-like in the crystal structure, although its constituent segments are generally in the same region of RBD-spike. Contacts C5-C7 are clearly nonnative. Thus, as explained in previous work [13], our result implies that the best primary contact C1 predicted here constitutes the entry point to "Nature's Shortcut" [18] to the native structure. As segments ( 432 CVIAW 436 ) and ( 511 VVLSF 515 ) define C1, they constitute the natural FITR/FIP pair for consideration of developing a therapeutic peptide. Moreover, because contacts C1-C4 all include segment ( 432 CVIAW 436 ), a FIP designed from sequence ( 511 VVLSF 515 ) should be able to inhibit not just folding through the dominant primary contact, but also any added folding that might occur through C2-C4, thus maximizing the folding interdiction effect.
Our calculations considered the mutants V341I, A348T, N354D, D364Y, V367F, K378R, Q409E, G476S, V483A, Y508H and N501Y as well as the triple mutation B1.351, K417N, E484K and N501Y, besides the double delta variant B1.617.2, E484Q and L432R. Importantly, none of these variants were found to bear a significantly altered native primary contact as identified in the non-mutated sequence. Thus, the predictions for possible interdicting peptides made for the native sequence apply to the considered mutants. There are some small effects of these mutations on some of the non-dominant, non-native primary contacts that do not affect the dominant native pathway. These effects are shown in Table 2. The other two mutations, A435S and W436R are located in segment ( 432 CVIAW 436 ) which is, as explained above, involved in all of the predicted primary contacts. Due to their location within segment ( 432 CVIAW 436 ), both mutations induce an energy shift Gcont, in the stability of the native primary contact. The effects of these shifts on the energy of the predicted native primary contact and the possible FITRs are analyzed in Sections 2.3 and 2.4. Our calculations considered the mutants V341I, A348T, N354D, D364Y, V367F, K378R, Q409E, G476S, V483A, Y508H and N501Y as well as the triple mutation B1.351, K417N, E484K and N501Y, besides the double delta variant B1.617.2, E484Q and L432R. Importantly, none of these variants were found to bear a significantly altered native primary contact as identified in the non-mutated sequence. Thus, the predictions for possible interdicting peptides made for the native sequence apply to the considered mutants. There are some small effects of these mutations on some of the non-dominant, non-native primary contacts that do not affect the dominant native pathway. These effects are shown in Table 2. The other two mutations, A435S and W436R are located in segment ( 432 CVIAW 436 ) which is, as explained above, involved in all of the predicted primary contacts. Due to their location within segment ( 432 CVIAW 436 ), both mutations induce an energy shift ∆∆G cont , in the stability of the native primary contact. The effects of these shifts on the energy of the predicted native primary contact and the possible FITRs are analyzed in Sections 2.3 and 2.4.

Effects of Mutation A435S
Mutation A435S induces an energy shift of magnitude ∆∆G cont ≈ 0.8 kT on contacts C1-C4 that now involve the mutated segment ( 432 CVISW 436 ), instead of ( 432 CVIAW 436 ), including the dominant native primary contact. The results obtained for the possible contacts of this mutant are shown in Table 3. Table 3. Predicted possible primary contacts of the A435S RBD-spike mutant and their correlation with the native 3D structure, only native-like contacts are expected to lead to native folding pathways. The relative populations are expressed in percentages.

Contact
Stability ( With these results, the predictions made for the native chain remain the same as mutation A435S induces the same energy shift on all of the other possible contacts that involve segment ( 432 CVIAW 436 ) as shown in Table 2. The changes in relative population are also very small, around~1%. Thus, the dominant primary contact remains the same as for the native sequence C1, with C2 relatively close in energy, and a FITR strategy focused on interdicting the formation of C1 should work equally well as with the native protein.
The mutation A435S could have an effect on the design of an optimal FIP. When searching for an optimal FIP, it is likely that detailed interactions play a significant role. Ideally a FIP is to be found that has greater affinity for the FITR than the naturally occurring segments of the primary contacts. Such an optimized FIP could more efficiently compete with the intramolecular formation of the primary contact, thus reducing the drug concentration required to interdict in the folding of RBD-spike. The search for such an optimal FIP is likely to be greatly facilitated by employing molecular dynamics techniques that allow for an atomic-level resolution assessment of the relative binding affinities of different FIPs [53,54].

Effects of Mutation W436R
Mutation W436R induces a 1-amino acid shift for contacts C1-C4 involving segment ( 432 CVIAW 436 ), that now involves segment ( 431 GCVIA 435 ) instead. This segment shift is the natural consequence of a hydrophobic tryptophan, being replaced at the end of the segment by a charged arginine. From a structural perspective, the segment shift does not significantly change the native classification of the predicted contacts as shown in Table 4. Table 4. Predicted possible primary contacts of the W436R RBD-spike mutant and their correlation with the native 3D structure, only native-like contacts are expected to lead to native folding pathways. The relative populations are expressed in percentages. The mutation W436R has made the non-native C5 more stable than the native-like C1. The location of C1 is shifted by one amino acid with respect to native RBD-spike (see Section 2.2).

Contact
Stability ( Thus, it is reasonable to treat all the contacts involving ( 431 GCVIA 435 ) in the W436R mutant as equivalent to those involving ( 432 CVIAW 436 ) for the native sequence. From the stability standpoint however, mutation W436R represents a different case from the mutations considered above because it introduces a large change in the stabilization energy of the dominant contact C1, and also contacts C2-C4, all including segment ( 432 CVIAW 436 ) in the native protein, such that ∆∆G C1-C4 (W436R) ≈ 5 kT. The new set of predicted contacts and their stabilities is shown in Table 3. The native primary contact location is shifted by one residue; the native segment ( 432 CVIAW 436 ) is shifted to ( 431 GCVIA 435 ).
This energy shift induces a large change in the relative populations of the possible primary contacts, which is large enough that the most stable possible primary contact is now C5, defined by segments ( 390 LCFTN 394 ) and ( 488 CYFPL 492 ), 98 amino acids apart, with ∆G cont ≈ 7.6 kT, 1.1 kT more stable than the predicted native-like best mutated contact C1(W436R). This contact is completely non-native on the tertiary structure of RBD-spike, so within the current model, it is not expected to nucleate a native-like folding pathway. The native-like contact C1 represents for this mutant, just~16.4% of the total population of primary contacts. As contacts that are not native-like are not expected to lead to successful folding, they would have to break up before those molecules bearing them could form a native-like alternative contact. Thus, formation of the non-native contacts is suggested to constitute a kinetic trap along the folding pathway, and it is expected within the model that mutant W436R folds less efficiently than the native RBD-spike, taking a longer time for most of the molecules to reach the native 3D structure. There are no detailed folding experiments on RBD-spike and its mutants to directly test this hypothesis. However, when expression of the mutant W436R was attempted experimentally, it was observed that the yield of functional protein is much less than that of native RBD-spike and also several mutants [39]. Finally, because virus assembly time scales within the cell are much longer than protein folding times, the less kinetically efficient folding of the RBD-spike mutant W436R need not significantly hamper the formation and spreading of the virus in vivo.
The results for mutant W436R suggest that a possible interdiction strategy could employ the equivalent native protein primary contact peptide ( 432 CVIAW 436 ) as the starting point to design an FIP. This suggestion is made because the interaction of ( 432 CVIAW 436 ) with ( 511 VVLSF 515 ) is much stronger than that of segment ( 431 GCVIA 435 ) included in the native primary contact in mutant W436R. Thus, peptide ( 432 CVIAW 436 ) is a natural choice as an initial design of a therapeutic peptide drug that interdicts in the formation of the native primary contact of mutant W436R. Finally, because the initial population of the native primary contact is lower than that in the native protein, the FITR strategy is likely to be more efficient for W436R at the same concentrations of folding interdicting peptides.
For the purpose of designing an experimental test of the FITR strategy, it is important to bear in mind that, in order to successfully interdict the folding process, the FIP must interact with the unfolded protein. This specific feature of the FITR strategy suggests that the optimal step along the protein synthesis cycle to attempt interdiction of the subsequent folding process is when the protein is still being translated in the ribosome. Thus, a possible proof-of-concept experiment could rely on employing ribosomes to synthetize spike-RBD in the presence of an appropriate FIP designed following the criteria laid out previously. If the FITR strategy operates as expected, the yield of properly folded spike-RBD should decrease, and that can be measured both through standard structure characterization techniques such as circular dichroism (CD) or nuclear magnetic resonance (NMR) [55], or potentially through measures of activity such as the ability to bind ACE-2 [56]. An additional advantage of this experimental approach is that it allows for the molecule being tested to interdict the folding of RBD-spike on much longer time scales than early folding time scales. The earliest folding time scales can be as short as microseconds [57], while the translation process in the ribosomes is much slower [58]. Although such an experiment is certainly within the purview of current techniques [59], it is important to keep in mind that the process of drug discovery is fraught with practical difficulties, and that success in vitro is only the first step on the way to applicability in vivo.

Discussion
For the 15 SARS-CoV-2 RBD-spike mutants studied here, the only native-like primary contact and thus "Nature's shortcut" to the native structure remains defined by segment ( 432 CVIAW 436 ) (or the equivalent ( 431 GCVIA 435 ) for the W436R mutant), and ( 511 VVLSF 515 ). The robustness of the location of the native-like primary contact to mutation is a natural consequence of its key role in nucleating the subsequent folding of the protein chain [19]. Thus, although these conclusions await experimental verification, these results also serve to bolster the foundations of the SCM and its prediction capability. It was noted additionally in previous work on adenylate kinase [19] that the segments involved in the predicted best primary contact were very conserved, and their mutation tended to impede the folding of the protein. It would be important to experimentally test further effects of conservative and non-conservative mutations on the predicted RBD-spike primary contacts. In particular, it would be very interesting to test the prediction made here of slower folding for the mutant W436R.
In previous work, we proposed that the identification of key protein segments involved in the earliest folding nucleation events provided for natural folding interdiction target regions (FITRs), against which specific folding interdicting peptides (FIPs) could be identified as leading therapeutic candidates [13]. Moreover, because two protein segments are involved in every contact, in each case one segment can be used as a template for the FITR and the other for the FIP. In the case of RBD-spike, the FITR would be either one of the dominant primary contact segments ( 432 CVIAW 436 ) or ( 511 VVLSF 515 ). Depending upon the FITR actually aimed for, the FIP could be chosen employing the other segment as a template, so if for example ( 432 CVIAW 436 ) was the FITR, the starting point for the design of the FIP would be the segment ( 511 VVLSF 515 ). While these choices are natural within the SCM model, careful mutation of these choices might perform even better. These conclusions are depicted graphically in Figure 3.
that the process of drug discovery is fraught with practical difficulties, and that success in vitro is only the first step on the way to applicability in vivo.

Discussion
For the 15 SARS-CoV-2 RBD-spike mutants studied here, the only native-like primary contact and thus "Nature's shortcut" to the native structure remains defined by segment ( 432 CVIAW 436 ) (or the equivalent ( 431 GCVIA 435 ) for the W436R mutant), and ( 511 VVLSF 515 ). The robustness of the location of the native-like primary contact to mutation is a natural consequence of its key role in nucleating the subsequent folding of the protein chain [19]. Thus, although these conclusions await experimental verification, these results also serve to bolster the foundations of the SCM and its prediction capability. It was noted additionally in previous work on adenylate kinase [19] that the segments involved in the predicted best primary contact were very conserved, and their mutation tended to impede the folding of the protein. It would be important to experimentally test further effects of conservative and non-conservative mutations on the predicted RBD-spike primary contacts. In particular, it would be very interesting to test the prediction made here of slower folding for the mutant W436R.
In previous work, we proposed that the identification of key protein segments involved in the earliest folding nucleation events provided for natural folding interdiction target regions (FITRs), against which specific folding interdicting peptides (FIPs) could be identified as leading therapeutic candidates [13]. Moreover, because two protein segments are involved in every contact, in each case one segment can be used as a template for the FITR and the other for the FIP. In the case of RBD-spike, the FITR would be either one of the dominant primary contact segments ( 432 CVIAW 436 ) or ( 511 VVLSF 515 ). Depending upon the FITR actually aimed for, the FIP could be chosen employing the other segment as a template, so if for example ( 432 CVIAW 436 ) was the FITR, the starting point for the design of the FIP would be the segment ( 511 VVLSF 515 ). While these choices are natural within the SCM model, careful mutation of these choices might perform even better. These conclusions are depicted graphically in Figure 3  In this paper, we showed that such a FITR drug development strategy is likely to be quite robust to mutations of the selected target virus proteins. The expected and observed In this paper, we showed that such a FITR drug development strategy is likely to be quite robust to mutations of the selected target virus proteins. The expected and observed robustness within the model is the natural consequence of the primary contact being an essential nucleating event for the folding of the protein [18].
Robustness to mutation of the FITR strategy is suggested to be relevant for the successful design of effective drugs against viruses with high rates of mutation, and whose mutations regularly enable them to overcome previously developed treatments, a common occurrence in RNA viruses [13]. It remains to be seen whether a pattern of vaccine effectiveness reduction through mutation establishes itself for SARS-CoV-2 but the possibility cannot be discounted [60]. Thus, a mutation-resistant strategy such as FITR might be very useful for therapeutics whose effectiveness is difficult for the virus to overcome.
An interesting related question is whether mutations that make the primary contact more stable relative the native protein could increase the infectivity and/or lethality of SARS-CoV-2 by increasing the initial population of molecules entering the dominant folding pathway. The efficacy of such a possible gain-of-function mechanism cannot be ascertained on the basis of the mutants of RBD-spike analyzed here.

Methods: The SCM Model
The physical basis of the SCM and its most up-to-date formulation have been recently explained in full detail [19,20]. Here, only a brief summary of the main concepts is presented that are relevant to the issues investigated in the present paper.

The SCM Entropic Cost of Loop Formation
The SCM considers early non-local contacts based on the entropy of formation of the resultant protein loops. The SCM has successfully predicted many of the observed features of protein folding pathways [20]. In the SCM two different loop regimes are considered when analyzing early non-local contacts: short loops for which the gyration radius R g (n), is smaller than the average side chain length š(n) (i.e., R g (n) < š(n)), and long loops for which R g (n) > š(n). The loop length at which the transition between the short and long loops takes place (i.e., the length for which R g (n) ≈ š(n)) is called the optimal loop length n op . The optimal loop length has been estimated to be n op ≈ 65 amino acids for typical protein sequences for š(n) ≈ 7.9 Å [20], although some sequence variability exists and n op is expected to be shorter for highly disordered proteins that contain few of the bulky hydrophobic amino acids that must be buried in any contact and thus should dominate the effective š(n) in most globular proteins [61]. This value for n op is consistent with experimental data showing the behavior of poly-alanine, a polypeptide with smaller side chains than the average globular protein; in this case š(n) ≈ 6.7 Å [20], which exhibits deviations from Gaussian statistics due to steric hindrance when n < 50 amino acids [62].
The long loop regime is physically equivalent to the classical Flory-Jacobson-Stockmeyer (FSJ) picture and the entropic cost of forming protein loops is well represented, assuming that the amino acids can be taken to be solid ball-like, by a simple logarithmic function of the form [63]: ∆S loop (n > n op ) ≈ −3/2 ln (n) This is clearly an approximation, as for example, the side chains would be better represented by solid spheres of different sizes according to the primary sequence. In the SCM short loop regime, however, the internal degrees of freedom of the side chains cannot be neglected, and the entropic cost of forming short loops must be higher than when the amino acids are taken to be solid spheres. Moreover, because most of the degrees of freedom are in the side chains, we expect the contribution of the side chains to the overall entropic cost to be dominant with respect to that of any constraints imposed by loop formation on the backbone.
Thus, in the SCM it is expected that for a short loop, the entropic cost of loop formation ∆S loop approximately becomes: ∆S loop (n < n op ) ≈ −3/2 ln (n) + ∆S side-chain-crowding (n, š), with ∆S side-chain-crowding << 0, opposing folding. When R g (n) ≥ š(n), we have ∆S side-chain-crowding ≈ 0 and the standard FJS regime is recovered. The side chain crowding term ∆S side-chain-crowding will appear as a correction to the JS results for shorter loops. It is extremely difficult to obtain an analytical expression for the side chain crowding term, and in the SCM it has been presented in generic Boltzmann-Gibbs form [20]: ∆S side-chain-crowding (n, š) = n ln [f 0 (n, š)/f loop (n, š)], where f 0 (n, š) is the average configurational freedom per amino acid in the unfolded chain, and f loop (n, š) is the average configurational freedom of an amino acid in a loop. Consideration of modifying the homogeneous Flory-like representation of the protein chain to take into fuller account the microscopic details of the protein-solvent system is not exclusive to the SCM and has been employed before to account, for example, for the effects of the solvent [64]. Based on the model developed above, in the SCM, the folding of proteins with more than~100 amino acids likely involve the formation of an early non-local contact (i.e., the primary contact within the SCM), that defines the earliest folding phase with n ≥ n op ≈ 65 amino acids. As only a few primary contacts can be established at most in proteins of length n ≥ n op , most of the tertiary structure contacts will still be defined by contacts at shorter range established in later folding phases [20,65]. Formation of the primary contact in the SCM defines the primary loop, which subsequently collapses through two-state kinetics [20,60]. As proteins longer than~100 amino acids do not generally undergo twostate collapse [20] but rather fold through multi-step pathways, consistent simple physical reasoning implies that there is a limit to the size of the primary loop that can successfully lead to the native SCM folding pathway of~100 amino acids.
The concept of folding nucleated by non-local contacts is not exclusive of the SCM, having arisen earlier in the context of the diffusion-collision model [66] and in the energy landscape picture [67]. It also has appeared in simulations of the transition state of twostate folding proteins [68]. Additionally, protein topology has been considered an essential element of folding mechanisms in a number of theoretical efforts [69][70][71][72][73]. The particular feature in the SCM is that the early non-local contacts are highly specific as in the loop hypothesis [74], and a methodology is developed to derive their location from primary sequence information [19,20].

Determining the Primary Contact
Based on the model presented in the previous sections, whether there is a non-local contact in an otherwise unfolded state is dependent upon the stability of the potential contact candidates at loop lengths of n ≥ n op amino acids. In the SCM, the stability of a contact formed by the number n cont of amino acids, ∆G contact (n cont , n loop ), can be written as: ∆G contact (n cont , n loop ) ≈ ∆G int,H (n cont ) + ∆G loop (n loop ) + ∆G cont,S (n cont ) Here ∆G loop represents the entropic free energy cost of the loop as discussed in Section 4.1. The term ∆G int,H denotes all the enthalpic interactions that help stabilize the contact, possibly including hydrophobic interactions, van der Waals interactions, hydrogen bonds, disulfide bonds, and salt bridges [75], and its value satisfies ∆Gint < 0. The term ∆G cont,S > 0 represents the entropic cost of constraining the side chains of the amino acids defining the contact and it opposes contact formation. A segment-specific determination of the value ∆G cont,S (n cont ) for a given contact would require detailed MD techniques. However, a heuristic estimate can be made from earlier work within the SCM which showed that the average entropic cost of folding per amino acid for a sample of 13 proteins was ∆G folding/residue,S ≈ 0.85 kT/residue [65], and the maximum was ∆G folding/residue,S ≈ 1.09 kT/residue. As these are estimates for the entropic cost for folding per residue of complete proteins that include highly buried as well as flexible exposed regions, it is then reasonable to expect that the entropic cost of a contact-forming region must be closer to the highest calculated values for ∆G folding/residue,S . Here, we will assume that ∆G contact,S (n contact ) for a contact including n cont amino acids is approximately ∆G folding/residue,S determined by the number of residues defining the contact, such that ∆G cont,S (n cont ) ≈ 1.09 n cont . This result is clearly an approximation, but it suffices for establishing a cut-off in the number of possible contacts that is consistent with the available structural data [65].
Hydrophobic interactions are well understood to constitute the main driving force of the folding process [70]. Other interactions such as hydrogen bonds are weaker [76], or like disulfide bonds and salt bridges form later along the folding pathway [75]. Thus, for an early contact forming from the unfolded state, we can take ∆G int (n op ) ≈ ∆G hyd (n op ), where ∆G hyd (n op ) represents the stabilizing effect of hydrophobicity in the early contacts, and Equation (4) can be written as ∆G contact (n cont , n loop ) ≈ ∆G hyd (n cont ) + ∆G loop (n loop ) + ∆G contact,S (n contact ) Since the hydrophobic stabilization energy of the contact ∆G hyd is determined by the hydrophobicity of the segments involved, the hydrophobicity values h k are obtained from the Fauchere-Pliska scale [77] and assigned to each residue in accordance with previous calculations within the SCM.
As the amino acid side chains are significantly larger than the typical peptide bond length, early contacts between two hydrophobic amino acids will inherently involve segments including several amino acids, adjacent to the initial contact. The stability of this early hydrophobic contact will determine where the folding process is initiated. This picture is not unlike the zapping model of Dill and collaborators [78]. Here, the typical early contact segment size will be taken to be~5 amino acids in line with previous calculations within the SCM [19,20]. The 5-amino acid window size is based on the geometric considerations underlying the SCM: with an average effective fluctuating width of the unfolded protein chain of w~2 š(n) ≈ 15.8 Å, and a peptide bond length of 3.5 Å, the minimum number n cont of amino acids that can define a contact in the open fluctuating chain should be n cont~i nt [2 š(n)/3.5] = 5 amino acids. In practice within the SCM, the hydrophobicity h k of each residue is added over a segment contact window of five amino acids centered at residue i, resulting in a segment hydrophobicity h i , 5 (a value of~0.45 is equivalent to a change in energy of kT).
In order to determine the best contact, the h i,5 values of a segment centered at residue i is added to the h j value of a segment centered at residue j, located at a distance n ij at least n op amino acids apart along the sequence, and no longer than the maximum primary loop length of~100 amino acids, to give a contact stability of: ∆G cont (n cont , n loop ) ≈ kT [−(h i,5 + h j,5 )/0.45 + 3/2 ln n ij + 10.9] 100 ≥ n ij ≥ 65 (6) Finally, in order to estimate the uncertainty in ∆G cont , we carried out a standard deviation analysis, relying on the experimental uncertainties reported for the h i values for each residue type [77], and the experimental data leading to the calculated value of ∆G folding/residue,S ≈ 1.09 kT/residue [65].

Conclusions
In this paper, we showed that a hypothesized FITR drug design strategy appears robust for 15 mutants of SARS-CoV-2. It was further explained that such robustness is an attractive asset for the development of treatments whose effectiveness does not diminish in the face of natural virus mutation.
The FITR strategy could be moreover extended to other viral and bacterial protein targets. For example, it would be very interesting to design mutation-resistant drugs against Influenza and many other infectious diseases [79]. As the FITR aims to induce therapeutic effects by interfering with the protein folding process in a specific fashion, it is possible that its application could be further extended to diseases which are related to protein folding, including misfolding-related neurodegenerative conditions [80], and maybe even cancer [81,82]. In these scenarios, FITR would serve to interdict in the misfolding process. Work is underway to explore these prospects.