Influence of Conformational Entropy on the Protein Folding Rate

One of the most important questions in molecular biology is what determines folding pathways: native structure or protein sequence. There are many proteins that have similar structures but very different sequences, and a relevant question is whether such proteins have similar or different folding mechanisms. To explain the differences in folding rates of various proteins, the search for the factors affecting the protein folding process goes on. Here, based on known experimental data, and using theoretical modeling of protein folding based on a capillarity model, we demonstrate that the relation between the average conformational entropy and the average energy of contacts per residue, that is the entropy capacity, will determine the possibility of the given chain to fold to a particular topology. The difference in the folding rate for proteins sharing more ball-like and less ball-like folds is the result of differences in the conformational entropy due to a larger surface of the boundary between folded and unfolded phases in the transition state for proteins with a more ball-like fold. The result is in agreement with the experimental folding rates for 67 proteins. Proteins with high or low side chain entropy would have extended unfolded regions and would require some additional agents for complete folding. Such proteins are common in nature, and their structural properties are of biological importance.


Introduction
Understanding the folding of biological macromolecules is one of the main problems of molecular biology and biophysics. Protein molecules have an enormous number of possible conformations, but this does not prevent them from finding their unique stable three-dimensional (3D) structure in much less time than an exhaustive search of all conformations would require. Understanding why the 3D structures of biological macromolecules fold so quickly is especially important for the de novo design of proteins: one needs to know which features of the primary structure define the stability of the 3D structure and which features of the sequence provide for fast folding.
Some general trends and correlations are beginning to emerge between the structural, thermodynamic and kinetic properties of proteins [1][2][3][4][5][6][7]. There is an enormous diversity in the folding behavior of small proteins that fold with simple two-state kinetics, as well as of large proteins that fold with multi-state kinetics.
The apparent lack of a relationship between the stabilities and folding rates of topologically diverse proteins indicates that the topology may be a critical determinant of folding kinetics [3]. But the topology itself cannot explain the differences in the refolding rates for some proteins sharing the same fold (SH3 domains, cold shock proteins, fibronectin domains, proteins belonging to the ferredoxin fold) [8][9][10][11][12].
On the other hand, a number of basic correlations between the protein size and folding rate have been suggested [1,13,14,15]. All of them point out that, as might be expected, the folding rate decreases with the protein size, but they suggest different scaling laws for this decrease. It should be noted that simulations of folding of simplified off-lattice protein models [16] suggest that the logarithm of the protein folding rate scales as $-L^{0.61 \pm 0.18}$, which is in accordance with both $-L^{2/3}$, following the theory of Finkelstein and Badretdinov [1], and $-L^{1/2}$, following the theory of Thirumalai [13]. The current statistical analysis of protein folding data shows that all the suggested scalings, from $\log(L)$ [14] to $-L^{1/2}$ and $-L^{2/3}$, correlate with the observed folding rates almost equally: the correlation between folding rates and protein sizes is about 70% (see Table 1) [17,18].
The relative contact order (CO) (equal to the average chain separation of contacting residues in the native fold, divided by the chain length) was introduced to compare differences in chain folds (rather than in size) between two-state folding proteins of different lengths [3]. This parameter is small for proteins stabilized mainly by local interactions and large when residues in a protein interact frequently with partners far away in the sequence. The latter should lead to slower folding [3,19]. Recently, however, it has been found that the relative contact order (CO) [3], computed from the shape of the protein chain structure rather than from the lengths of the chains, fails to predict the protein folding rates (see Table 1) [18]. At the same time, it was found that the "absolute contact order", which combines CO and the chain length L in the form Abs_CO = CO × L (and scales as ~$L^{0.7}$, on average), is able to predict rather accurately the folding rates for both two- and multi-state folding proteins, as well as for short peptides: the correlation reaches 77% for all 84 proteins and peptides (see Table 1) [18,20]. Moreover, it has been noted that in order to predict protein folding rates, any parameter should correlate well with the protein chain length [18]. A very illustrative example of this is the contrast between the relative contact order (the normalized parameter, which correlates poorly with the logarithm of protein folding rates, see Table 1) and the absolute contact order, which includes both the size of the protein and the average length of loops and correlates well with the logarithm of protein folding rates [18]. It has been shown that the folding rates of small single-domain proteins that fold through simple two-state kinetics correlate with the number of native contacts and can be estimated from contact predictions [21,22]. To predict the folding rate of a protein using Abs_CO, one should have its 3D structure. The question arises: is it possible to predict the protein folding rate from the sequence alone?
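The two contact-order measures discussed above can be sketched in a few lines of Python. This is only an illustration of the definitions as worded in the text (average chain separation of contacting residues, divided by the chain length for the relative value); the function name `contact_orders` and the toy contact list are hypothetical and are not taken from the cited works.

```python
from typing import Iterable, Tuple


def contact_orders(contacts: Iterable[Tuple[int, int]], chain_length: int) -> Tuple[float, float]:
    """Return (relative CO, absolute CO) for a list of contacting residue index pairs."""
    separations = [abs(i - j) for i, j in contacts]
    if not separations or chain_length <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    # Absolute contact order: average chain separation of contacting residues.
    abs_co = sum(separations) / len(separations)
    # Relative contact order: the same average divided by the chain length L,
    # so that Abs_CO = CO * L.
    rel_co = abs_co / chain_length
    return rel_co, abs_co


# Hypothetical toy chain of 10 residues with three native contacts.
rel_co, abs_co = contact_orders([(0, 4), (1, 9), (2, 5)], chain_length=10)
print(f"CO = {rel_co:.2f}, Abs_CO = {abs_co:.2f}")
```

Both values require the list of native contacts, that is, the 3D structure.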
One important question in molecular biology is what determines folding pathways: native structure or protein sequence. There are many proteins that have similar structures but very different sequences, and a relevant question is whether such proteins have similar or different folding mechanisms. To explain the differences in folding rates of various proteins, the theory of protein folding rate should be developed further. Therefore, the search for the factors affecting the protein folding process goes on.
The empirical dependence of the folding rate on some features of amino acid sequences (secondary structure content in the 3D structures, hydrophobic content) has been published for two-state proteins [23][24][25]. Having considered 69 proteins and peptides, Naganathan and Muñoz obtained a 74% correlation between experimental folding times and the square root of the number of residues [26]. As the authors underlined, an important practical implication of the finding is that one can predict folding times with a precision of ~1.1 time decades by just knowing the size of a protein [26].
Punta and Rost have calculated in a recent publication that if one considers 27 two-state proteins having a significant content of beta strands, the correlation between the effective length and the folding rate becomes insignificant (13% in a jack-knife experiment) [22]. Such a result demonstrates that the theory of protein folding rate requires further development. Therefore, the search for the factors affecting the protein folding process continues.
The capillarity model [1] gave rise to the hypothesis that protein folding rates are determined by the average "entropy capacity" (the entropy capacity for the given protein structure is defined as the average conformational entropy per residue divided by the average contact energy per residue; thus, this value is, in a sense, reciprocal to the expected melting temperature) [27]. Consideration of the compactness specifically addresses the issue of why some proteins fold more rapidly than others. Statistical analysis demonstrates that four main structural classes [28] of proteins (all-α, all-β, α/β, α+β) differ from one another in a statistically significant manner with respect to the number of rotatable angles φ, ψ and χ and the average number of contacts per residue [29]. On the whole, it has been shown that among proteins of the same size, α/β proteins have, on average, a greater number of contacts per residue due to their more compact (more "spherical," since a sphere is the most compact geometrical body having the minimal accessible surface area in comparison with other geometrical bodies of equal volumes) structure, rather than to tighter packing [30]. For 75 proteins for which both folding rates and tertiary structures are known, α-helical proteins have on average the fastest folding kinetics and the smallest number of contacts per residue (they are less compact than others), whereas α/β proteins have on average the slowest folding kinetics and the largest number of contacts (they are more compact than others) [30]. One explanation is that the expected surface of the boundary between folded and unfolded phases in the transition state [18,31] for a more spherical protein is larger than that for a non-spherical protein. Thus, the fact that α/β proteins are more spherical explains both the larger average number of contacts per residue [30] and the slower folding kinetics.
The more spherical the protein is, the slower it folds. Proteins with multi-state kinetics are, on average, more spherical than proteins with two-state kinetics [18]. The obtained result suggests that shape properties are important determinants of the folding kinetics. Rapidly and slowly folding proteins have additional structural diversities that contribute to their differing folding rates. From our previous analysis, we can conclude that the barrier height for folding of large proteins is defined by the size of the boundary between folded and unfolded phases in the transition state. This boundary is larger for a ball-like protein globule than for an elongated one [18,31].
In this work, based on the known experimental data and using theoretical modeling of the protein folding process, we demonstrate that an optimal relationship exists between the average conformational entropy and the average energy of contacts per residue, i.e., an entropy capacity, for fast protein folding. Consideration of the entropy capacity specifically addresses the issue of why some proteins fold more rapidly than others. Statistical analysis demonstrates that four main structural classes of proteins (all-α, all-β, α/β, α+β) differ from one another in a statistically significant manner with respect to the number of rotatable angles φ, ψ and χ and the average number of contacts per residue. These parameters offer an explanation for the vast range of experimentally observed folding rates. The obtained result suggests that structural and sequence properties are important determinants of the folding rate.

Entropy Capacity for Proteins with Given Topology
The formation of a sufficient number of interactions between amino acid residues is necessary to compensate for the loss of the conformational entropy during protein folding. Therefore, structural uniqueness of the native state is a result of the balance between the conformational entropy and the energy of interactions between residues. Taking these phenomena into account, we consider the entropy capacity for a given protein chain with a chosen topology [27]. This parameter is the ratio of the conformational entropy to the energy of residue-residue interactions. We will show that the relationship between these values determines the ability of the given chain to fold to a particular topology and the rate of the process.
To demonstrate the importance of the balance between the conformational entropy and energy of residue-residue interactions and its influence on the folding rate, we consider a theoretical model of protein folding proceeding from the analytical theory of this process [1,15]. In this model, the folding and unfolding pathways are treated as a sequential insertion of residues from the unfolded state to their native positions according to the 3D native structure or removal of residues from the native position to the coil, respectively, so that the pathways of folding and unfolding would coincide at the point of equilibrium between the native and unfolded states according to the principle of detailed balance (Figure 1). The removed (inserted) residues are assumed to lose (gain) all the non-bonded interactions and gain (lose) the coil entropy, except that spent to close the disordered loops protruding from the remaining globule. A general assumption of this model is that the residues remaining in the globule keep their native position and the unfolded regions do not fold to another, non-native globule. Thus, we neglect non-native interactions; this makes the considered model similar to that of Gō [32].

Optimal Value of Entropy Capacity for Fast Protein Folding
First-order phase transitions proceed by a nucleation mechanism in which a small droplet of the ordered phase is formed within the metastable disordered phase [33]. A similar mechanism could apply to the folding of a protein, with the ordered phase identified as the native state and the disordered phase identified as the unfolded state [34]. A number of recent experimental and theoretical studies suggest that a crucial step of folding is the formation of a native-like folding nucleus [1,18,35,36,37,38,39,40]. We can estimate the change in free energy during protein folding by considering the capillarity model [1,41]. From such a consideration, a general question arises: is there a minimal value of the free energy barrier for protein folding? To answer this question, let us consider several general equations from the capillarity model:

$$G_N = \varepsilon\,(N - \mu N^{2/3}) \tag{1}$$

$$G_0 = -TNs \tag{2}$$

$$G_n = \varepsilon\,(n - \mu n^{2/3}) - T(N - n)s \tag{3}$$

where $G_N$ is the free energy of the native state; $G_0$ is the free energy of the unfolded state; $G_n$ is the free energy of the intermediate state with n fixed residues in the native topology. Here, N is the total number of residues in the protein molecule; ε is the average contact energy per residue; $\mu n^{2/3}$ takes into account the surface residues having fewer interactions than the internal ones, where μ = 1.5 for a ball-like body [1]; s is the average conformational entropy per residue for an unfolded chain, and T is the temperature.

$$\Delta G_n = G_n - G_0 = \varepsilon\,(n - \mu n^{2/3}) + Tns \tag{4}$$

is the free energy difference between intermediate and unfolded states. We can simplify this expression using ξ = -ε/RT > 0 and σ = s/R > 0. Then we have:

$$g_n = \frac{\Delta G_n}{RT} = n(\sigma - \xi) + \mu \xi n^{2/3} \tag{5}$$

The transition state on this pathway is $g^{\#} = \max\{g_n\}$. To find the transition state (the maximum), we take the derivative of Equation 5 with respect to n:

$$\frac{dg_n}{dn} = (\sigma - \xi) + \frac{2}{3}\mu \xi n^{-1/3} = 0, \qquad n_0 = \left(\frac{2\mu\xi}{3(\xi - \sigma)}\right)^{3}$$

If we substitute the value of $n_0$ in Equation 5 and introduce the "entropy capacity" C = σ/ξ (0 < C < 1) we obtain the equation

$$g^{\#} = \frac{4}{27}\,\frac{\mu^{3}\xi^{3}}{(\xi - \sigma)^{2}} = \frac{4}{27}\,\frac{\mu^{3}\sigma}{C(1 - C)^{2}} \tag{6}$$

The question about the minimal value of the free energy barrier is reformulated from Equation 6 as follows: at what value of ξ, at given σ, would the value of $g^{\#}$ be minimal? If we try to find the optimal values of C (0 < C < 1) from this equation, the condition $dg^{\#}/dC = 0$ reduces to (1 - C)(1 - 3C) = 0, and we obtain C = 1 (this value is not applicable) and C = 1/3. Namely, with this relationship between the energy and entropy we will have the minimal value of the barrier on the pathway of folding in our model. If we take $C_{opt} = 1/3$ in Equation 6, we obtain $g_{opt} = \mu^{3}\sigma$.
This equation means that for proteins with a similar topology, the difference in the folding rates will be determined by the difference in the average conformational entropy between native and unfolded states. This result is in agreement with our previous result where we demonstrated that there exists an optimal value of side-chain entropy for fast folding of proteins sharing the same fold [27].
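The location of the minimum can be checked numerically. The sketch below assumes the reduced capillarity free energy $g_n = n(\sigma - \xi) + \mu\xi n^{2/3}$, which follows from the definitions of ξ and σ given above; the value of `SIGMA` and the grid resolution are arbitrary illustrative choices, and the function names are hypothetical.

```python
def g_n(n: float, xi: float, sigma: float, mu: float = 1.5) -> float:
    """Reduced free energy of a nucleus of n residues (assumed capillarity form)."""
    return n * (sigma - xi) + mu * xi * n ** (2.0 / 3.0)


def barrier(c: float, sigma: float, mu: float = 1.5) -> float:
    """Barrier height g# for entropy capacity c = sigma / xi, with 0 < c < 1."""
    if not 0.0 < c < 1.0:
        raise ValueError("entropy capacity must lie strictly between 0 and 1")
    xi = sigma / c
    # Stationary point of g_n: (sigma - xi) + (2/3) * mu * xi * n**(-1/3) = 0
    n0 = (2.0 * mu * xi / (3.0 * (xi - sigma))) ** 3
    return g_n(n0, xi, sigma, mu)


def optimal_capacity(sigma: float, mu: float = 1.5, steps: int = 9999) -> float:
    """Scan a uniform grid in (0, 1) and return the capacity with the lowest barrier."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return min(grid, key=lambda c: barrier(c, sigma, mu))


SIGMA = 2.0  # hypothetical entropy per residue in units of R
c_opt = optimal_capacity(SIGMA)
g_opt = barrier(1.0 / 3.0, SIGMA)
print(f"C_opt ~ {c_opt:.4f}, g_opt = {g_opt:.3f}, mu^3 * sigma = {1.5 ** 3 * SIGMA:.3f}")
```

The grid search is deliberately crude; it only illustrates that the barrier at fixed σ has a single interior minimum near C = 1/3.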
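At the point of mid-transition we have:

$$G_N - G_0 = \varepsilon\,(N - \mu N^{2/3}) + TNs = 0 \tag{7}$$

and $C_{eq} = \sigma/\xi = 1 - \mu N^{-1/3}$. If N = 125 and μ = 1.5, we obtain $C_{eq} = 0.7$ and, substituting this value into Equation 6,

$$g^{\#}_{eq} = \frac{4}{27}\,\frac{\mu^{3}\sigma}{C_{eq}(1 - C_{eq})^{2}} \approx 2.35\,\mu^{3}\sigma \tag{8}$$

It should be mentioned that such a relationship between entropy and energy (C = 1/3) is possible only at very low temperatures (if we consider that the ratio of entropy to energy does not change with temperature, we have C = (s/-ε)T and $C_{eq}/C_{opt} = T_{eq}/T_{opt}$ = 0.7 × 3 = 2.1).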
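The mid-transition numbers quoted above are easy to reproduce. The following sketch evaluates $C_{eq} = 1 - \mu N^{-1/3}$ for N = 125 and μ = 1.5 and the ratio $C_{eq}/C_{opt}$; the function name is a hypothetical label used only here.

```python
def equilibrium_capacity(n_residues: int, mu: float = 1.5) -> float:
    """Entropy capacity at mid-transition: C_eq = 1 - mu * N**(-1/3)."""
    return 1.0 - mu * n_residues ** (-1.0 / 3.0)


C_OPT = 1.0 / 3.0                       # optimal entropy capacity from the text
c_eq = equilibrium_capacity(125)        # N = 125, mu = 1.5 as in the text
temperature_ratio = c_eq / C_OPT        # equals T_eq / T_opt if s / (-eps) is constant
print(f"C_eq = {c_eq:.2f}, T_eq / T_opt = {temperature_ratio:.2f}")
```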
To answer the question of how the free energy barrier changes with the two major parameters, ξ and σ, let us write the change of the free energy barrier in partial derivatives with respect to ξ and σ:

$$dg^{\#} = \frac{\partial g^{\#}}{\partial \xi}\,d\xi + \frac{\partial g^{\#}}{\partial \sigma}\,d\sigma = \frac{4}{27}\,\frac{\mu^{3}\xi^{2}}{(\xi - \sigma)^{3}}\left[(\xi - 3\sigma)\,d\xi + 2\xi\,d\sigma\right] \tag{9}$$

At the point of equilibrium we have $G_N - G_0 = 0$ (see Equation 7) and $\sigma/\xi = 1 - \mu N^{-1/3}$. Putting this equation in Equation 9 and neglecting $\mu N^{-1/3}$ (which is not large in comparison with 1 at large N, where N is the size of the protein domain in the range of 100-300 amino acid residues) we obtain:

$$dg^{\#} \approx \frac{8}{27}\,N\,(d\sigma - d\xi) \tag{10}$$

Introducing the entropy capacity C = σ/ξ we obtain the dependence of the change of the free-energy barrier on the change of entropy capacity at an equal value of N:

$$dg^{\#} \approx \frac{8}{27}\,N\,\xi\,dC \tag{11}$$

Further, we try to demonstrate, using the statistical and experimental data on protein folding rates, that each class of proteins has its own class-specific average number of contacts and average conformational entropy per residue, and that these class-specific features determine the folding rates.
One of the main determinants of the protein folding rate is the number of degrees of freedom that are to be fixed during the rate-limiting step of folding. Therefore, the difference in the folding rate for proteins sharing the spherical and less spherical fold is the result of differences in the conformational entropy due to a larger surface of the boundary between folded and unfolded phases in the transition state for a protein with a spherical fold (see Equation 11, μ is larger for a non ball-like body, which causes diminishing of the free energy barrier). Moreover, the unfolded closed loops protruding from the folded part (the nucleus) cause an additional surface tension.

Statistical Analysis of Average Conformational Entropy and Average Number of Contacts per Residue for Different Classes of Proteins
To examine whether there is a significant difference between the mean values of a number of contacts and conformational entropy for four general structural classes, we analyzed 1133 all-α proteins from class a, 1644 all-β proteins from class b, 1617 α/β proteins from class c, and 1435 α+β proteins from class d (according to the SCOP classification [28]).
Calculations of the average number of contacts per residue in proteins were done separately for each class (see Figure 2). In our case, two residues have a contact if any pair of their heavy atoms is at a distance of less than 8.0 Å from each other. The average number of contacts per residue is calculated from the 3D structure of a protein as the sum of the number of contacts of all residues, divided by the number of residues in the protein. The value of the average conformational entropy, ν, was calculated as a summation of the individual number of rotatable angles φ, ψ and χ for each residue (see Table 2) over its complete sequence, normalized by the number of residues in the protein. The set of conformations for the main chain of glycine is larger than that of other residues; this is accounted for by assigning three rather than two rotatable angles for glycine. In Table 3 and Figure 2, one can see that each class of proteins has its own class-specific average number of contacts and average conformational entropy. The c class has the largest number of contacts. To prove that this result is not a consequence of possible differences in length distribution between proteins of class c and others, we analyzed the number of contacts in each structural class within a given size range where the average length of proteins is nearly the same in each structural class (see Figure 3c,d). It turns out that the proteins of class c indeed have a larger number of contacts than proteins of all other structural classes (all-α, all-β, α+β), and this trend persists in all length windows (see Figure 3c,d) where the number of proteins is sufficient for statistics (see Figure 3a), with the exception of only one window (200-250 residues in the protein chain) where the four classes have practically the same number of contacts. We checked that this result (window 200-250) is the consequence of using the database of proteins with homology less than 80%. If one considers a database of proteins with homology less than 25%, then in the window of 200-250 residues the c class of proteins has a larger number of contacts than the other classes (the result is not presented).
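A minimal sketch of the two per-residue averages described above is given below. It assumes each residue is supplied as a list of heavy-atom coordinates and that a table of rotatable angles per residue type is available; the `ANGLES` dictionary holds placeholder values only (Table 2 is not reproduced here, apart from glycine being assigned three angles as stated in the text), and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[float, float, float]


def residues_in_contact(a: Sequence[Atom], b: Sequence[Atom], cutoff: float = 8.0) -> bool:
    """Two residues are in contact if any pair of their heavy atoms is closer than cutoff (in Å)."""
    return any(math.dist(p, q) < cutoff for p in a for q in b)


def mean_contacts_per_residue(residues: List[Sequence[Atom]], cutoff: float = 8.0) -> float:
    """Sum of the contact counts of all residues divided by the number of residues."""
    n = len(residues)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if residues_in_contact(residues[i], residues[j], cutoff):
                counts[i] += 1
                counts[j] += 1
    return sum(counts) / n


def mean_rotatable_angles(sequence: str, angles: Dict[str, int]) -> float:
    """Average number of rotatable angles (phi, psi, chi) per residue, i.e. nu."""
    return sum(angles[aa] for aa in sequence) / len(sequence)


# Placeholder angle counts for illustration only; the real values are those of Table 2.
ANGLES = {"G": 3, "A": 2, "K": 6}

# Hypothetical three-residue toy structure with one heavy atom per residue.
toy = [[(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
contacts = mean_contacts_per_residue(toy)
nu = mean_rotatable_angles("GAK", ANGLES)
print(f"contacts per residue = {contacts:.3f}, nu = {nu:.3f}")
```

The pairwise loop is quadratic in the number of residues, which is adequate for single domains; a spatial grid would be a natural replacement for large structures.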
The reasons for the observed differences in the number of contacts per residue have been investigated [30]. Apart from the protein size, the number of contacts also depends on the tightness of packing and on the protein shape. It turns out that, between two structures of the same size and shape, the structure with a tighter packing will have a greater number of contacts, while between two structures of the same size and tightness of packing, the structure with a smaller surface area (i.e., a more "spherical" structure) will have a greater number of contacts. We have explored separately the influence of tightness of packing and shape. To this end, the solvent-accessible surface area S and the volume V enclosed by this surface have been calculated for each protein considered with the program YASARA [http://yasara.org], using 1.4 Å as the probe radius of a water molecule. The analysis of protein structures having from 51 to 350 residues revealed that the packing efficiency is indeed the same for proteins from different structural classes, as evidenced by the molecular volume per atom (18.520 ± 0.010 Å³ for all-α proteins, 18.577 ± 0.009 Å³ for all-β proteins, 18.618 ± 0.007 Å³ for α/β proteins, and 18.598 ± 0.009 Å³ for α+β proteins) [18]. Therefore, the average molecular volume per atom V, which excludes water, is practically the same for all classes of proteins. This means that the tightness of packing is practically the same for all four investigated protein classes.
Next, we explored the influence of the protein shape on the average number of contacts per residue. We used two ratios: V/S (for different geometric bodies of equal volumes, this ratio should be maximal for a ball, and this value is proportional to the radius of the cross-section [18]) and $S_0/S$, which we call compactness (i.e., the ratio of the surface area of a sphere of the same volume as that of the protein to the accessible surface area of the protein; for a sphere, this ratio will be 1). The average values of these parameters are statistically higher for class c than for other classes (Figure 4), which holds true for proteins of all sizes. This means that proteins from class c, on average, are more compact than proteins from other classes.
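The compactness ratio can be written out explicitly: a sphere of volume V has surface area $S_0 = (36\pi)^{1/3} V^{2/3}$, so the compactness is $S_0/S$. The sketch below applies this to two idealized bodies; the function names are hypothetical, and real values of V and S would come from a surface calculation such as the one described above.

```python
import math


def sphere_area_for_volume(volume: float) -> float:
    """Surface area S0 of the sphere that has the given volume."""
    return (36.0 * math.pi) ** (1.0 / 3.0) * volume ** (2.0 / 3.0)


def compactness(volume: float, surface: float) -> float:
    """S0 / S: equals 1 for a sphere and is smaller for any other body."""
    return sphere_area_for_volume(volume) / surface


def volume_to_surface(volume: float, surface: float) -> float:
    """V / S, which is proportional to the radius of the cross-section."""
    return volume / surface


# Idealized test bodies (not proteins): a sphere of radius 10 and a cube of side 10.
r, a = 10.0, 10.0
sphere = compactness(4.0 / 3.0 * math.pi * r ** 3, 4.0 * math.pi * r ** 2)
cube = compactness(a ** 3, 6.0 * a ** 2)
print(f"sphere: {sphere:.3f}, cube: {cube:.3f}")
```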
On the whole, the presented analysis shows that among proteins of the same size, α/β proteins have, on average, a greater number of contacts per residue due to their more compact (more ball-like, since a ball is the most compact geometrical body having the minimal accessible surface area in comparison with other geometrical bodies of equal volumes) structure, rather than to tighter packing.
The value of the average conformational entropy, ν, was calculated as a summation of the individual number of rotatable angles φ, ψ and χ for each residue (see Table 2) over its complete sequence, normalized by the number of residues in the protein. Figure 2 demonstrates that topologically diverse proteins fall in a definable region of the average conformational entropy, 3.4-4.2 [27].
It is noteworthy that proteins with the smallest value of the average number of rotatable angles per residue belong to class b (see Figure 2), and proteins with the highest value of the average number of rotatable angles belong to class a (as a rule some α-helical DNA-binding proteins, such as transcription factors, have a high conformational entropy). Moreover, the averaging of the obtained values over the proteins of each structural class revealed (see Table 3) that class a has the highest average number of rotatable angles while this value is minimal for class b.
We also analyzed the number of rotatable angles per residue in each structural class within a given size range where the average length of proteins is nearly the same in each structural class (see Figure 3b). This differential picture for conformational entropy confirms our results obtained from the consideration of all proteins: class a has the highest average number of rotatable angles while this value is minimal for class b. This result supports the hypothesis that the loss of side-chain entropy is a major determinant of the helix-forming propensities of naturally occurring residues [42]. One of the main determinants of the protein folding rate is the number of degrees of freedom that are to be fixed during the rate-limiting step of folding. Therefore, the difference in the folding rate for proteins sharing the same fold (SH3 domains, cold shock proteins, fibronectin domains, proteins belonging to the ferredoxin fold [8][9][10][11][12]) is the result of differences in the conformational entropy. In our case, the difference in the number of rotatable bonds ranges in terms of energy from 0.03 to 0.06 kcal/mol per residue, which could account for a factor of 100 to 10,000 in the spread of folding times for proteins of ~100 residues. To demonstrate this, we have analyzed separately the proteins from the database of 67 proteins which belong to the same fold (according to SCOP). Proteins that have a similar number of contacts per residue are listed in Table 4. One can see that the folding rates of the proteins differ significantly: by a factor of four in the case of cold shock proteins (fold b.40), while in the fold b.34 one protein folds 180 times faster than the other one. This difference in the folding rates can be explained by a different number of degrees of freedom per residue for these proteins (see Table 4).
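The order-of-magnitude estimate above can be reproduced with a Boltzmann factor, exp(ΔE × N / RT). The sketch assumes a temperature of 298 K, which is not stated in the text, and the function name is a hypothetical label.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def folding_time_factor(delta_per_residue: float, n_residues: int,
                        temperature: float = 298.0) -> float:
    """Spread in folding time implied by a per-residue free-energy difference.

    delta_per_residue is in kcal/mol; temperature (K) is an assumed value.
    """
    return math.exp(delta_per_residue * n_residues / (R_KCAL * temperature))


low = folding_time_factor(0.03, 100)   # lower bound of the range quoted in the text
high = folding_time_factor(0.06, 100)  # upper bound of the range quoted in the text
print(f"factor range: {low:.0f} to {high:.0f}")
```

At this assumed temperature the two factors come out of order 10² and 10⁴, in line with the range quoted in the text.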

Statistical Analysis of Average Conformational Entropy and Average Number of Contacts per Residue for Different Classes of Proteins
Now we are interested only in the dependence of the change of the free-energy barrier $dg^{\#}$ on the change of the value C, which depends on the properties of the amino acid residues in the protein (see Equation 11). Since the value of the conformational entropy is proportional to the number of degrees of freedom (i.e., the number of rotatable angles φ, ψ and χ) and the value of the energy per residue is proportional to the number of atom-atom (or residue-residue) contacts in the protein, the entropy capacity is proportional to the ratio of these values: C ≈ const × $C_{mod}$, where the model entropy capacity is $C_{mod}$ = ν/m, with m being the number of residue-residue contacts in the native structure (two residues were considered to be in contact with each other if any atom-atom pair was at a distance of less than 8 Å; the contacts between adjacent residues were not taken into account), and ν being the number of rotatable angles φ, ψ and χ (see Table 2). The model entropy capacity has been calculated for 67 proteins with known experimental data for the folding rate and for which Monte Carlo simulations have been done for comparison of two methods for prediction of the protein folding rate [51]. Figure 5 demonstrates that the logarithm of the entropy capacity correlates by about 78% with the logarithm of the protein folding rate in mid-transition. Using model values for the entropy capacity we can see that helical proteins are more frequently selected for fast folding than other classes of proteins.
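The comparison described above amounts to computing $C_{mod}$ = ν/m for each protein and correlating its logarithm with the logarithm of the folding rate. A minimal sketch with a hand-written Pearson coefficient follows; the numerical values are made-up toy data chosen only to exercise the code and are not taken from the set of 67 proteins.

```python
import math
from typing import Sequence


def model_entropy_capacity(nu: float, m: float) -> float:
    """C_mod = nu / m: rotatable angles per residue over contacts per residue."""
    return nu / m


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("samples must have equal length of at least 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_log_correlation(capacities: Sequence[float], rates: Sequence[float]) -> float:
    """Correlation between ln(C_mod) and ln(folding rate)."""
    return pearson([math.log(c) for c in capacities], [math.log(k) for k in rates])


# Made-up toy values: (nu, m) pairs and folding rates in arbitrary units.
toy_capacities = [model_entropy_capacity(nu, m) for nu, m in [(3.6, 36.0), (3.8, 19.0), (4.0, 10.0)]]
toy_rates = [1.0, 2.0, 4.0]
r_toy = log_log_correlation(toy_capacities, toy_rates)
print(f"toy correlation = {r_toy:.3f}")
```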
It should be noted that the entropy capacity correlates with the folding rate in mid-transition at practically the same level as the first passage times obtained from Monte Carlo simulations (82%, see Table 5) [51]. The capillarity model allows us to predict the folding rate at the same level of correlation as Monte Carlo simulations but without the time-consuming calculations required for large proteins. The correlation between the logarithm of the first passage times in Monte Carlo simulations and the logarithm of folding rates for separate classes of proteins is sufficiently high: -0.74 for class a, -0.83 for class b, -0.84 for class d (see Table 5 and http://phys.protres.ru/resources/compact.html; the correlation for class c is absent, since only two proteins of this class are included in the set of 67 proteins). We compare the correlation between folding rates and a set of structural parameters for 67 proteins: contact order, length and the different functional forms of length (see Table 5). It is worth underlining here that the correlation between folding rates and such structural parameters as contact order is absent when all proteins are considered. The correlation with the length reaches 70%, as has been mentioned in the Introduction. Although a logarithm of the folding rate proportional to L corresponds to Levinthal's estimate of the folding time, this dependence is not applicable for consideration. Such a good correlation means that there is a rather great scatter in the data, or the range of variation is too small. It is surprising that the correlation between the folding rate and the different functional forms of length ($\ln(L)$, $L^{2/3}$, $L^{1/2}$) is high, which means that the size of the proteins is one of the important and determining parameters of protein folding [4,18] (see Table 5). But the protein size cannot explain the difference in the folding rates for proteins of a similar size (or proteins sharing a similar fold, with practically the same number of amino acid residues). All other considered parameters are used to explain the differences in protein folding for proteins of practically the same size. We demonstrate that the relation between the folding rate and entropy capacity has class-specific features. It is surprising that average structural properties give a high correlation with the folding rate in each structural class. At the same time, the radius of the cross-section (V/S) has a worse correlation coefficient with the folding rate than the size of proteins for the considered set of proteins. We also analyzed the value of the model entropy capacity in each structural class within a given size range where the average length of proteins is nearly the same in each structural class (see Figure 6). The differential pattern of entropy capacity confirms our results obtained from the consideration of all proteins: class a comprises the fastest folding proteins, then follow proteins of classes b and d, and class c comprises the slowest folding proteins. The number of contacts depends on the choice of the contact radius; therefore we checked whether the agreement between folding rates for the four general classes is conserved at a varying contact radius. Figure 6 also demonstrates the value of the model entropy capacity for two different values of the contact radius: 6 Å and 8 Å. In all cases, we observed that proteins of class a are the fastest folding ones, then follow proteins of classes b and d. The slowest folding proteins are those from class c.
We used a t-test to compare the average values of the model entropy capacity in different structural classes. The results of this comparison are shown in Table 6. The averages differ significantly between the classes, although the difference between all-β and α+β proteins is less significant.
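A comparison of class averages of this kind can be sketched as follows. The text does not state which variant of the t-test was used, so the unequal-variance (Welch) form below is an assumption made for illustration, and the two samples are made-up numbers rather than values from Table 6.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up model entropy capacities for two hypothetical classes.
class_x = [1.0, 2.0, 3.0]
class_y = [2.0, 3.0, 4.0]
t_stat, dof = welch_t(class_x, class_y)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Converting the statistic into a p-value requires the Student t distribution, which is left out here to keep the sketch free of external dependencies.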
From this result we conclude that each class has a class-specific average number of contacts and conformational entropy. In addition to the relationship observed here between a larger number of contacts per residue and slower folding for class c proteins, other properties specific to class c proteins have been observed in previous works. It has been shown that α/β proteins are relatively older than proteins from other classes [52]. This is supported by the observation that eukaryotic innovation domains (which only exist in eukaryotes and, therefore, are believed to have evolved later) are statistically less compact than the older, prokaryotic-only domains [53]. Also, a fold's contact trace in proteins is proportional to protein designability (the ability of a protein to accept mutations that do not destroy the original fold, which is correlated with the logarithm of the average gene family size) [54,55]. This, too, is consistent with our results because class c, together with class b, includes the largest number of proteins (see Table 3). The higher trace conformations exhibit a marked tendency toward greater structural regularity and symmetry [54,55]. This is typical of proteins from class c.
Therefore, the work presented here demonstrates that a more compact structure and slower folding can be expected for α/β proteins, which are older and more designable than proteins from the other structural classes.

Behavior of Proteins with a High and Low Number of Contacts and Side-Chain Entropy
The competition between the side-chain entropy of a given protein and the favorable inter-residue interactions for a given topology determines whether the chain can fold to that topology. The formation of sufficient residue-residue interactions is required to compensate for side-chain conformational entropy. Therefore, the structural uniqueness of native proteins is the result of the balance between the conformational entropy of side chains and the energy of residue interactions. Proteins with higher or lower side-chain entropy have, as a rule, extended unfolded regions and/or require some additional agents for complete folding. For example: some DNA-binding proteins such as transcription factors (with high side-chain entropy) have regions which only become structured upon DNA binding [56,57]; α-lytic protease and subtilisin (with low side-chain entropy because of the relatively high Gly and Ala content, which results in a high conformational entropy of the backbone chain) use a pro region to promote folding [58,59]; the N-terminal region of prions [60], with low side-chain entropy, can adopt a more stable conformation which is induced by interaction with other prions with that conformation [61]. It seems that proteins with large conformational entropy (outside the optimal region) lack sufficient energetic interactions to compensate for such large entropy. Therefore, enhanced stabilization for them is achieved by additional interactions with other agents or by oligomerization.
It seems that disordered regions in a protein chain do not have a sufficient number of interactions to compensate for the loss of conformational entropy that results from the formation of a globular state. On the other hand, a large increase in the energy of interactions will lead to a loss of the unique structure: stronger contact energy will speed up folding, but it is also likely to lead to erroneous folds (for example, amyloid fibrils). It has been suggested that the lack of a rigid globular structure under physiological conditions might represent a considerable functional advantage for intrinsically disordered proteins, as their large plasticity allows them to interact efficiently with several different targets, in contrast to a folded protein with limited conformational flexibility [62-67]. It has been shown that disordered regions are involved in DNA binding and other types of molecular recognition [68].
The need to obtain a definite balance between conformational entropy and energy of interactions is one of the general conditions for achieving the functionally active form of the protein. For some proteins (with high conformational entropy or low energy of residue-residue interactions) such a balance can be achieved only by oligomerization with the same proteins and, for other proteins, only by interaction with additional agents. A shift of this balance caused by a change in external conditions (pH or temperature) can result in the formation of stable intermediates, such as a molten globule-like state [69], or of fibrils, which may play a pathological role in a cell [70-72].
The obtained results may be useful in protein design. When we attempt to create a de novo protein that has a unique structure, it is important to answer the question of whether the protein will be folded or natively unfolded. A parameter such as the mean packing density is able to detect both amyloidogenic and disordered regions in the protein sequence [66,67]. (Each residue in a protein sequence is first assigned an expected number of contacts (ENC, or expected packing density) found from the statistics of globular structures. The observed average number of contacts in a globular state for each of the 20 residue types was calculated using 5829 protein structures from the SCOP protein database. Then an average value of ENCs is computed and assigned to the central residue in each window of several sequential residues. If the average ENC value of a residue is larger than the threshold value of 20.4, the residue is considered to be rigid; otherwise it is flexible [67].) It has been shown that regions with strong ENCs are responsible for amyloid formation and regions with weak expected packing density are responsible for the appearance of disordered regions (see Figure 7) [73].
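The window-averaging procedure described above can be sketched as follows. Only the threshold of 20.4 comes from the text. The window length (the text says only "several sequential residues"), the truncation of windows at the chain ends, the function names, and in particular the per-residue values in `PLACEHOLDER_ENC` are assumptions made for illustration; the placeholder numbers are not the published contact statistics.

```python
# PLACEHOLDER per-residue expected numbers of contacts (hypothetical values,
# NOT the statistics derived from the SCOP structures).
PLACEHOLDER_ENC = {"G": 18.0, "W": 26.0}

def window_average(values, window):
    """Mean over a centered window; windows are truncated at the chain ends
    (an assumption of this sketch)."""
    half = window // 2
    averaged = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged

def classify_residues(sequence, enc_table, window=5, threshold=20.4):
    """Label each residue 'rigid' if its window-averaged ENC exceeds the
    threshold, otherwise 'flexible'."""
    per_residue = [enc_table[aa] for aa in sequence]
    smoothed = window_average(per_residue, window)
    return ["rigid" if v > threshold else "flexible" for v in smoothed]

# Toy sequence using only the two placeholder residue types.
demo_labels = classify_residues("GGGWWW", PLACEHOLDER_ENC, window=3)
print(demo_labels)
```

With a real table of 20 residue-type values, the same routine would yield a per-residue profile along the sequence.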

Conclusion
The existence of an optimal region of the entropy capacity, which relates two important factors of protein folding, the conformational entropy and the energy of residue-residue interactions, suggests that an optimum balance between them is needed for fast folding.
A definite balance between the conformational entropy and the energy of interactions is one of the general conditions for obtaining a functionally active form of the protein. For some proteins (with a high conformational entropy or low energy of residue-residue interactions), such a balance can be achieved only by oligomerization with the same proteins, and, for other proteins, only by interaction with additional agents. Strong contact energy will speed up folding, but it is likely to result in misfolds. It is by taking the existence of such a balance into account that we can explain why less stable proteins sometimes fold faster than more stable proteins with the same topology.
We consider intrinsic structural properties that provide information about the intrinsic preferences of a given class of protein structures: the number of rotatable angles and the number of contacts per residue. It should be noted that, in our case, the protein classes differ from one another in a statistically significant manner with respect to the model entropy capacity. The result suggests that structural and sequence properties are important determinants of the folding rate. Such topological features as the number of contacts and the number of rotatable angles (which is defined by the content and connectivity of secondary structures) can significantly affect protein folding rates.
Understanding the folding mechanisms and the reason for fast protein folding can help in the design of new proteins and in the understanding of protein misfolding. When we attempt to create a de novo protein that has a unique structure, it is important to consider the relationship between the number of rotatable angles and the average number of contacts for optimal folding of this protein.

Figure 1. Sequential folding (viewed from left to right) and unfolding (viewed from right to left) pathways. The folded part (dotted) is native-like. The bold line shows the backbone fixed in this part; the fixed side chains are not shown for the sake of simplicity (the volume that they occupy is dotted). The dashed line shows the unfolded chain.

Figure 2. (a, c, e, g) Distribution of proteins on the average number of rotatable angles φ, ψ and χ per residue and the average number of contacts per residue (the contact radius is 8 Å) for four general classes. (b, d, f, h) Distribution of the number of contacts per residue on the protein size for four general classes. Different colors correspond to various structural classes (red: class a, green: class b, blue: class c, violet: class d).

Figure 3. (a) Distribution of proteins composing our database by their size. Two vertical lines indicate the range of sizes (proteins 51-350 residues long) where the number of proteins is sufficient for our analysis. (b) Number of rotatable angles per residue in the proteins of given size. (c) Number of residue-residue contacts (per residue) at a contact distance of 6 Å in proteins of given size. (d) Number of residue-residue contacts (per residue) at a contact distance of 8 Å. The colors are the same as used in Figure 2.

Figure 4. (a) Average ratio of volume to accessible surface area (i.e., radius of cross-section) in each size range for the four general SCOP classes. (b) Average compactness, i.e., ratio of the accessible surface area of a sphere of the same volume as the protein to the surface area of the protein. The error of the average is given.

Figure 5. Correlation between the computed model entropy capacity and the experimentally measured folding rate at the mid-transition for 67 proteins. Both are represented in a logarithmic scale. The correlation coefficient is 0.78 ± 0.05.

Figure 6. Dependence of the model entropy capacity on protein size for four general classes (large database) when the contact radius is 8 Å (lower four curves) and when it is 6 Å (upper four curves). The average values of the model entropy capacity for all proteins of the four classes are presented in the inset.

Figure 7. Histogram representing the distribution of 5829 globular protein domains as a function of the expected packing density. Arrows indicate upper and lower thresholds which correspond to unusually strong and unusually weak expected packing density.

Table 1. Correlation coefficients between logarithms of folding rates in mid-transition and size of protein structure.

Table 2. Number of rotatable angles for amino acid residues of 20 types.

Table 3. Structural characteristics of proteins.
L is the number of residues in the protein; m(8 Å) is the number of contacts (at a contact distance of 8 Å) per residue; ν is the number of rotatable angles per residue; C_mod is the model entropy capacity, which is the relation between ν and m. Averaging is done for each class of proteins.

Table 4. Calculated parameters and experimentally measured folding rates of proteins.

Table 5. Correlation coefficients between logarithms of folding rates in mid-transition and parameters connected with protein size and shape for three protein classes (a, b, d).

Table 6. Two-sample t-test performed on structural classes.