1. Introduction
Proteins under biological conditions exhibit marginal structural stability [1], and they unfold and refold repeatedly in vivo [
2]. Consequently, many of the biological processes that are facilitated by protein macromolecules are modulated by the properties and energetic character of the denatured state. Indeed, numerous efforts have shown that denatured state effects, such as residual structure [
3], excluded volume [
4], and intrinsic conformational propensities [
5], have key roles in molecular recognition [
6], allosteric signaling [
7], folding [
8,
9], and stability [
10]. A molecular-level understanding of how proteins are utilized for biological work thus requires characterization of the native, as well as the myriad of non-native, conformational states that exist in solution for a protein, the latter of which is referred to as its denatured state ensemble (DSE).
Despite its importance in understanding protein function, the probability and structural character of the full spectrum of states sampled by proteins are not known. Numerous studies have used short peptides as experimental models from which to probe the characteristics of the DSE [
11,
12,
13]. The use of short peptides is advantageous because, being too short to fold, they offer access to unfolded states under otherwise folding conditions. Moreover, in the absence of folding, conformational preferences are simplified and locally driven by factors such as hydration [
14] and steric hindrance [
15]. These studies find that peptides have strong preferences for the polyproline II (PPII) backbone conformation, even at nonproline positions [
12,
16,
17], suggesting that PPII structures are dominant components of the DSE. The PPII conformation is characterized by an extended left-handed helical turn with the amide hydrogen and the carbonyl oxygen of each peptide backbone projecting into solution, presumably making favorable contact with the solvent [
18,
19,
20]. In addition, the PPII conformation appears to facilitate favorable intrachain n→π* interactions, which should be a stabilizing factor [
21]. Short peptides also exhibit conformational preferences for other backbone structures. At cold temperatures, alanine residues have intrinsic α-helix-forming tendencies (i.e., even in the absence of favorable side chain interactions) that are stabilized predominantly by peptide hydrogen bonds [
22]. Elevated temperatures have been observed to promote low levels of β-strand [
16] or β-turn [
23], though the amino acid preferences for forming strand [
24] or reverse turn structures [
25,
26] are thought to be highly context-dependent.
The protein coil library [
27] also has been used as a structural model for the DSE [
28,
29,
30]. These libraries are constructed from the segments of protein structure in the Protein Data Bank (PDB) that are found outside the α-helix and β-strand domains. Some libraries further omit additional conformationally restricted positions, such as those in reverse turns, or preceding prolines, or immediately flanking a region of secondary structure [
29]. The underlying assumption when using a coil library as a DSE model is that site-specific effects on the intrinsic conformational preferences of the amino acids are minimized by averaging over many environments, and also by removing the regular and repetitive interactions associated with folded structures. Overall, coil libraries exhibit structural trends that are in good agreement with the results from peptide structural studies [
29,
31]. For example, chemical shifts and three-bond J couplings ($^{3}J_{\mathrm{HN}\alpha}$) measured in peptides by NMR spectroscopy can be reproduced from structural models made from the protein coil library [
32,
33,
34]. Notably, and similar to the results obtained from peptides, strong preferences for PPII that vary by amino acid type are found in structural surveys of the protein coil library [
28,
29,
30].
Intrinsically disordered proteins (IDPs) offer another experimental system from which to assess structural preferences in unfolded states under nondenaturing conditions [
35]. While chemically denatured proteins are known to adopt macromolecular sizes that depend weakly on sequence details other than chain length [
36], IDPs in water exhibit strong sequence-dependent influences on structural size [
37]. Computer simulations show that steric effects on disordered structure cannot account for the hydrodynamic size dependence on sequence observed in IDPs [
38]. Additionally, temperature changes are found to induce large shifts in the hydrodynamic size for disordered proteins in water [
39,
40,
41] that can exceed the change in size associated with the heat denaturation of folded proteins of the same chain length [
42]. The implication of these findings, albeit expected, is that monomeric disordered protein structure is both under thermodynamic control and highly sensitive to the primary sequence.
In this review, we show that the sequence dependence of IDP hydrodynamic size can be described from the amino acid-specific biases for PPII in the denatured state. Because PPII-rich structures are extended [
43], the magnitude of a PPII preference in the denatured state can affect its mean hydrodynamic size [
44,
45]. Specifically, experiments that evaluate how IDP hydrodynamic size changes with compositional changes in the protein give an independent measure of PPII bias, and further reveal amino acid-specific preferences for PPII that are in good quantitative agreement with PPII bias determined experimentally in peptides [
37]. Good agreement is also found when the IDP results are compared to PPII bias in the protein coil library. Moreover, the analysis of heat effects on IDP hydrodynamic size indicates the PPII bias is driven by a significant and favorable enthalpy, and is partially offset by an unfavorable entropy [
37], which, again, agrees quantitatively with the peptide results [
46]. Across these three models (i.e., peptides, the coil library, and IDPs), the data indicate that the structural and energetic character of the DSE at normal temperatures follows the predictions of a PPII-dominant ensemble. At cold temperatures, both peptides and IDPs reveal the DSE can shift in population toward the α-helix backbone conformation. To demonstrate these conclusions, the following sections review results obtained from numerous spectroscopic and calorimetric studies on short peptides [
11,
12,
13,
16,
17,
46], surveys of structures in the protein coil library [
28,
29,
30], and the more recently acquired sequence- and temperature-based analysis of IDP hydrodynamic sizes [
37], showing that these three experimental systems used for characterizing unfolded proteins under folding conditions convey a surprisingly consistent structural and energetic view of the DSE.
2. Peptide Models of the DSE
The structural preferences associated with unfolded proteins are often described in terms of a predisposition for specific pairs of backbone dihedral angles, phi (Φ) and psi (Ψ). Visually, this is demonstrated with a Ramachandran plot, shown in Figure 1, where pairs of Φ, Ψ angles that are sterically accessible to a polypeptide chain are mapped [47]. For example, a representative plot computed for the central residue in a poly-alanine tripeptide shows that (Φ, Ψ) = (0°, 0°) falls in a disallowed region, because these angles for the central residue place the backbone carbonyl oxygen of the first residue and the backbone nitrogen of the third residue inside the normal contact limits, creating a steric conflict. In contrast, (Φ, Ψ) = (−90°, 90°) for the central alanine has no such contact violations for any of the tripeptide atoms, and thus this angle pair is physically allowed. When an unfolded protein shows preferences for some allowed Φ, Ψ pairs at the expense of others, specifically during the rapid interconversion between states of its conformational ensemble, the unfolded protein is said to exhibit a conformational bias.
The idea that unfolded proteins and polypeptides in water may exhibit intrinsic biases for some backbone conformations at the expense of others began to receive widespread consideration when the observation was made that, for a protein chain to achieve its unique structure in a biologically relevant time frame, a random search of all accessible conformations is not possible [
50]. The unfolded chain, accordingly, must search a smaller conformational space than would be predicted from steric considerations alone. This observation predicted that folding is guided by the structural characteristics of the DSE, and experiments to identify folding intermediates, both kinetic [
51,
52] and equilibrium [
53,
54], and measure the intrinsic conformational propensities of the amino acids [
5] have been extensively pursued over the many decades since.
Early experimental evidence indicating structural preferences in the DSE was provided by Tiffany and Krimm from studies on short poly-proline and poly-lysine peptides using circular dichroism (CD) and optical rotatory dispersion (ORD) spectroscopies [
55,
56,
57]. Though these short peptides were unfolded, owing to insufficient chain length for forming compact, globular structures, Tiffany and Krimm found strong preferences for PPII structures. This structural motif at the residue level corresponds to the trans isomer of the peptide bond and (Φ, Ψ) of approximately (−75°, +145°) [
43,
55]. Its presence in a polypeptide can be established from positive and negative bands in the spectroscopic readings at ~220 nm and ~200 nm, respectively [
55,
56]. The predisposition for adopting PPII was linked to a variety of factors, such as low temperatures, steric hindrance between side chains, a lack of internal hydrogen bonding, and protonation [
57]. Short peptides of poly-glutamic acid also were observed to transition from α-helix at low pH to PPII at neutral pH and higher, identified from CD and ORD spectroscopies [
56], indicating that structural transitions from one region of the Ramachandran plot to another could occur for some sequences owing to simple changes in the peptide charge state. These results, Tiffany and Krimm hypothesized, predict a DSE dominated by backbone interconversions between three main structural states: PPII, α-helix, and unordered, where unordered is represented by the random chain [
57]. They also speculated, to some resistance [
58,
59,
60], that solvation effects may contribute to the observed PPII preferences, since the PPII configuration places the backbone amide and backbone carbonyl oxygen polar groups in favorable positions for contact with water. Intrinsic PPII propensities thus could be helpful for keeping unfolded proteins solvated. Overall, their findings from these peptide-based studies supported the idea that unfolded proteins, though highly dynamic and exhibiting broad structural heterogeneity, nonetheless can show backbone conformational biases that are determined locally by sequence details.
Peptide studies have also made extensive use of poly-alanine, because of the natural abundance of alanine in proteins and its chemically simple side chain (i.e., a methyl group). Using a peptide called XAO, where A is an alanine heptamer and X and O are flanking diaminobutyric acid and ornithine, respectively, Kallenbach and coworkers found strong, temperature-dependent preferences for the PPII conformation [
11]. $^{3}J_{\mathrm{HN}\alpha}$ coupling constants measured by NMR techniques were used to estimate the Φ angle at each alanine position from the Karplus relationship [
61], and it was found that Φ was approximately −70° at low temperatures. Because both PPII and α-helix can have Φ angles near this value (
Figure 1), the presence of the α-helix was ruled out by the lack of measurable NOEs between successive amides in the peptide chain, since such NOEs are an indicator of α-helix formation. The CD spectrum of XAO also confirmed PPII content. Increasing temperatures caused a gradual reduction in the PPII population that coincided with an increase in the population of β-strand conformations to approximately 10% at 55 °C. The reduction in PPII content at high temperatures implied a favorable enthalpy of PPII formation that was also observed by Tiffany and Krimm [
57]. Further studies of XAO by Asher et al. using UV Raman spectroscopy established that XAO is structurally similar to a 21-residue alanine-peptide, AP, that forms α-helix under cold conditions [
62]. AP transitions to PPII at higher temperatures, demonstrating that AP, like XAO, shows temperature-dependent conformational preferences.
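The step from a measured coupling constant to a Φ estimate can be illustrated with a short sketch. It assumes the commonly used Karplus form J(Φ) = A·cos²(Φ − 60°) + B·cos(Φ − 60°) + C with the Vuister and Bax coefficients; the text does not say which parametrization was used for XAO, so the coefficients and the function names here are illustrative assumptions.

```python
import math

# Assumed Karplus coefficients in Hz (Vuister and Bax parametrization).
# The parametrization used in the XAO study may differ.
A, B, C = 6.51, -1.76, 1.60


def karplus_j(phi_deg):
    """Predicted 3J(HN,Halpha) in Hz for a backbone dihedral phi in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def phi_candidates(j_obs, step=1.0, tol=0.1):
    """Scan phi over [-180, 180] and return angles whose predicted J is within tol Hz."""
    n = int(360 / step)
    angles = (-180.0 + i * step for i in range(n + 1))
    return [phi for phi in angles if abs(karplus_j(phi) - j_obs) <= tol]


if __name__ == "__main__":
    j = karplus_j(-70.0)
    print(f"J at phi = -70 deg: {j:.2f} Hz")
    print("phi values consistent with that J:", phi_candidates(j))
```

Because the Karplus curve is multivalued, a single coupling constant is generally consistent with more than one Φ, which is one reason additional evidence such as NOEs and CD is needed to assign the conformation.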
Additional studies that examined a single alanine flanked on both sides by two glycines (i.e., Ac-(Gly)₂-Ala-(Gly)₂-NH₂) found intrinsic preferences for PPII and heat-induced shifts toward β-strand backbone conformations [
63]. Temperature-dependent transitions that exhibit similar structural characteristics have also been seen in alanine tripeptides, tetrapeptides, and octapeptides [
18,
64,
65].
To explore the determinants of the PPII bias in greater detail, quantitative studies designed to measure its dependence on amino acid type were initially conducted by Creamer and coworkers [
12]. Host–guest substitutions at an internal position in a proline-rich peptide (Ac-(Pro)₃-X-(Pro)₃-Gly-Tyr-NH₂, where X is the substitution site) were used to analyze substitution-induced effects on the CD spectrum and measure a scale of relative PPII propensities for 18 of the 20 common amino acids. Bias estimates for tryptophan and tyrosine were not measured, because the aromatic contribution to the CD spectrum from their side chains overlaps with the region where signal height was used to determine PPII content [
66,
67], impeding their analysis. These experiments found that amino acids with charged side chains, except for histidine, had relatively high preferences for the PPII conformation in this peptide. The observed biases, measured at 5 °C, were mostly insensitive to changes in solution pH from 2 to 12. Residues with small, non-polar side chains, such as alanine and glycine, reported somewhat higher propensities for PPII that, in general, exceeded the biases observed from residues with non-polar and bulky side chains, such as isoleucine and valine. The list of amino acid-specific intrinsic propensities for PPII determined in these studies is given in
Table 1.
Similarly, Kallenbach and coworkers extended their NMR- and CD-based structural studies of the short peptides mentioned above to include other amino acid types at the central residue position in Ac-(Gly)₂-X-(Gly)₂-NH₂, where X was the substitution site. Substitution-induced effects on peptide structure were then used to establish a scale of PPII bias in this glycine-rich host [
16]. Substantial intrinsic PPII propensities were found, giving additional support to the idea that unfolded states are predisposed to PPII (see
Table 1). The magnitude of the PPII bias at the peptide guest position, surrounded by glycine, however, was noticeably different (and typically larger) when compared to the amino acid-specific biases that were measured in the proline-based host by Creamer. This predicts position-specific PPII bias in an unfolded chain that is modulated by the amino acid identity at neighboring sites, which has been subsequently verified [
68]. Moreover, the glycine-rich peptides exhibited a heat-induced shift in structure from PPII to nonPPII with a slight bias at high temperatures for strand-like conformations. The intrinsic PPII propensities reported in
Table 1 from Kallenbach were measured at 20 °C.
A third experimental scale of PPII propensity in peptides was measured calorimetrically by Hilser and coworkers [
13,
17,
69]. Their experiments utilized a peptide host–guest system in which the Caenorhabditis elegans Sem-5 SH3 domain binds a peptide in the PPII conformation [70]. This peptide (Ac-Val-(Pro)₃-Val-(Pro)₂-(Arg)₃-Tyr-NH₂) is derived from the recognition sequence of an SH3 binding partner, Sos (Son of Sevenless). A non-interacting residue of this peptide corresponding to its fourth position [13] was substituted with each amino acid before binding was measured by isothermal titration calorimetry. The observed change in binding affinity reflects a change in the conformational equilibrium between binding-incompetent and binding-competent (i.e., PPII) states of the peptide ligand, which can be interpreted as a PPII propensity [
13,
69]. Once again, a substantial intrinsic bias for PPII was observed, albeit at magnitudes and rank orders that were different when compared to the scales determined by either Creamer or Kallenbach. Elam et al. conclude that there is a general consensus regarding amino acids that are high in PPII propensity (proline, lysine, glutamine, and glutamic acid) and low in PPII propensity (histidine, tryptophan, tyrosine, and phenylalanine), with the other amino acids falling in between [
17]. The intrinsic PPII propensities in
Table 1 from Hilser’s group were measured at 25 °C.
There are a number of other studies beyond the few described above, each of which uses its own system to examine the structural propensities of the different amino acids in peptides (reviewed in ref. [
71]). While the ranks of relative PPII propensities are often both quantitatively and qualitatively different when compared between studies, possibly owing to the use of different host models, all studies have indicated the same general conclusions that (1) unfolded peptides have structural preferences that are predominantly locally determined [
72]; (2) nevertheless, these preferences at individual positions can be modulated by the structural features of neighboring residues [
68], and (3) importantly, the unfolded chain does not evenly sample the sterically allowed regions of Ramachandran space [
71].
In addition to PPII propensities, alanine-based peptides have been utilized to measure intrinsic α-helix-forming tendencies in a host–guest model that was designed to avoid stabilizing side chain–side chain and side chain–macrodipole interactions [
22]. Though cold temperatures were required for this peptide to populate helix at appropriate levels for study, Baldwin and coworkers measured amino acid substitution effects on the CD signal at 222 nm and determined an experimental scale of α-helix intrinsic propensities for each of the 20 common amino acids. At 0 °C, most of the amino acids disfavored forming helix at guest positions in the alanine-based host, while leucine and arginine were indifferent to helix-formation. Alanine, however, had a preference for forming helix in this host. The intrinsic propensity for forming α-helix determined by Baldwin and coworkers for each of the common amino acids is provided in
Table 2.
3. Protein Coil Library Model of the DSE
The PDB [
73] provides an ever-increasing number of high-resolution protein structures, which include both regularly ordered secondary structures (helices, sheets, and turns) and irregularly ordered structures (coils and loops). While any individual coil or loop was sufficiently ordered for structural determination, the assumption is that in aggregate, a large set of irregularly ordered structures would provide information on the conformational tendencies and properties of the polypeptide chain in the denatured state. Collectively, these models of the denatured state are constructed by examining the regions of resolved protein structures that are outside the α-helix and β-strand domains. Indeed, analyses of “protein coil libraries” generally support the structural preferences that have been observed in peptide-based models. As these libraries of coil structures have evolved, the field has gained valuable insights into the roles of sequence context, intramolecular interactions, and protein hydration in determining the intrinsic structural tendencies of the amino acids.
In 1995, Swindells and Thornton generated one of the first iterations of a protein coil library based on high-resolution protein structures [
27]. Four basins were defined on the Ramachandran plot, corresponding to a (α-helix), b (β-sheet), p (PPII), and L (left-handed helix). Using 85 structures obtained from the PDB, they removed residues that were assigned helix or sheet conformation, retaining all coils, loops, and turns in the analyzed set. Within this set, residues Glu, Gln, Ser, Asp, and Thr demonstrated strong propensities for the “a” region, as their side chains have both the hydrogen bonding capacity and rotational flexibility to form hydrogen bonds to backbone groups. The “b” propensities appeared to be less sensitive to the chemistry and rotamer of the side chain, consistent with the location of the side chain relative to the backbone when in the β-sheet conformation. While the authors did not explicitly discuss the “p” region (PPII), their data show a significant redistribution of the population between the four basins when the “whole” and “coil library” sets are compared. When the entire polypeptide chain was considered, the a and b basins were the two most highly populated. In the coil library, with helices and sheets removed, the a and p basins exhibited the highest populations. This demonstrated that in the structures of intact proteins, PPII conformations are well represented in the non-alpha and non-beta regions.
This work was followed by an analysis of the PPII content in 274 high-resolution structures conducted by Stapley and Creamer [
74]. In their analysis, they found the PPII conformation was common, with more than half of the proteins containing at least one PPII helix longer than three residues, despite PPII residues comprising just 2% of all residues in the dataset. This study was the first to detail the PPII propensities of each side chain. Predictably, Gly was disfavored, while Pro had a strong PPII propensity. Additionally, they observed that Gln, Arg, Lys, and Thr had generally strong propensities for adopting PPII conformations. Moreover, a positional dependence of PPII propensity within the PPII helix was also found. The ability of polar side chains, such as Gln, Lys, and Arg, to form hydrogen bonds with the backbone between the i and i + 1 positions stabilizes the PPII helix. This is consistent with the overrepresentation of Gln, Arg, Lys, and Thr in the first PPII helix position. These data also supported the idea that PPII helices have extensive solvent exposure, as there was a significant negative correlation between nonpolar solvent accessibility and PPII propensity. Taken together, their work demonstrated that both solvent accessibility and the ability to form hydrogen bonds with the backbone were important elements of PPII propensity, consistent with prior work in peptides.
In 2005, Rose and coworkers developed a protein coil library (PCL) that is web-accessible [
28]. The PCL is updated as the PDB is updated. This repository of structural elements uses regular expressions to identify α-helix and β-sheet segments, and then extracts all non-helix and non-sheet residues from deposited structures that share <90% identity. Note that, as a result, the PCL contains both turns and homologous sequences. Additionally, for structure classification purposes, the PCL divides the Ramachandran plot into 30° × 30° bins, with each bin corresponding to one of 144 different “mesostates”.
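The count of 144 mesostates follows from the grid: 360°/30° = 12 bins along each axis, and 12 × 12 = 144. A minimal sketch of such a binning is shown below. The integer index it returns is a hypothetical labeling chosen for illustration, not the PCL's own mesostate naming.

```python
def mesostate_index(phi, psi, bin_width=30):
    """Map (phi, psi) in degrees to a bin index in 0..143 on a 12 x 12 grid.

    The row-major integer index is an illustrative convention only.
    """
    n = 360 // bin_width                      # bins per axis (12 for 30-degree bins)
    col = int(((phi + 180.0) % 360.0) // bin_width)
    row = int(((psi + 180.0) % 360.0) // bin_width)
    return row * n + col


if __name__ == "__main__":
    # A PPII-like angle pair and an alpha-like angle pair land in different bins.
    print(mesostate_index(-75.0, 145.0))
    print(mesostate_index(-60.0, -45.0))
```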
An analysis in 2008 by Perskie et al. identified seven naturally clustering basins in a Ramachandran plot of PCL structures [
30]. These basins represent the familiar α, β, PPII, αL, and τ (type II’ β-turn) structural motifs, and also a γ basin, for inverse γ turns, and a δ basin that captures residues preceding a proline in proline-terminated helices. This allowed amino acid preferences for the different basins (see
Table 2 in ref. [
30]) to be determined and studied. For example, solvent–backbone hydrogen bonding, which can favor PPII [
14], and side chain–side chain sterics, which for branched amino acids adjacent to proline can favor δ at the expense of β, were found to be crucial determinants of the basin preferences.
To better understand how the conformational preferences of a residue in the denatured state depend on the identity and state of its adjacent (nearest) neighbor, Freed and coworkers constructed an increasingly stringent set of coil libraries [
29]. Using 2020 nonhomologous polypeptide chains, the “full” set was defined as the entire polypeptide chain, excluding the terminal residues. The first cull of the full set ($C_{\alpha\beta}$) removed the α-helix and β-sheet identified residues, similar to the original coil libraries and the PCL described above. This had the effect of reducing the number of residues to 40% of the original. The next restriction additionally removed hydrogen-bonded turns from the set ($C_{\alpha\beta t}$), slimming the library to 28% of the original. Finally, to produce the most restricted coil library, the authors retained only those residue positions located within contiguous stretches four residues or longer, and which were “internal” to coils. This had the effect of reducing “end bias” from structured regions, which is known to favor PPII at the expense of α and β.
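The successive culls can be sketched as filters over a per-residue secondary structure string. The one-letter codes (H for helix, E for strand, T for hydrogen-bonded turn, C for coil), the function names, and the reading of “internal” as dropping one residue from each end of a stretch are assumptions made for illustration; the published libraries may define these details differently.

```python
def coil_positions(ss, excluded="HE"):
    """Indices of residues whose secondary structure code is not in `excluded`."""
    return [i for i, s in enumerate(ss) if s not in excluded]


def internal_coil_positions(ss, excluded="HET", min_len=4, trim=1):
    """Indices inside coil stretches of at least min_len residues,
    dropping `trim` residues from each end of every stretch (assumed definition)."""
    kept, start = [], None
    # Append one excluded code as a sentinel so a trailing stretch is closed.
    for i, s in enumerate(ss + excluded[0]):
        if s not in excluded:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                kept.extend(range(start + trim, i - trim))
            start = None
    return kept


if __name__ == "__main__":
    ss = "CCHHHHCCCCCCEEETTCCC"  # hypothetical assignment for a 20-residue chain
    print(coil_positions(ss))             # helix and strand removed
    print(coil_positions(ss, "HET"))      # turns also removed
    print(internal_coil_positions(ss))    # internal positions of long coil stretches
```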
The sequential removal of ordered residues had the overall effect of increasing PPII content and decreasing α populations in the coil library. Specifically, when all structured positions were included, α-helical conformations were the predominant state. Upon removing the α-helix and β-sheet residues, as Swindells and Thornton did a decade prior, the PPII conformation emerged as a major subpopulation. With turns also removed ($C_{\alpha\beta t}$), the most populated conformation was PPII, and there was a significant reduction in the α population. The dominance of the PPII conformation is not restricted to a particular subset of amino acids, as all 20 amino acids show a considerable propensity to adopt the PPII configuration (Table 3). The most restricted set (with only residues that are well within coil regions) showed little change in the population distribution, with the PPII population continuing to be dominant.
Using the most restricted set, the authors also found that the size of the PPII subpopulation is constant regardless of solvent accessibility [
29]. Moreover, PPII is the dominant conformation in all but the 10% most surface-exposed residues. The α-helix dominates in the surface residues, due to the propensity of the polypeptide backbone at the surface to preferentially turn back toward the folded core of the protein. The independence of PPII content and solvent accessibility initially appears to contrast with earlier work with both peptides and earlier versions of PCLs. However, these results can be reconciled by understanding the PPII conformation as a mechanism for maximizing backbone hydrogen bonding. In the PPII conformation, the backbone amides and carbonyls are in positions that both minimize steric hindrance and enable both functional groups to form hydrogen bonds, either with solvent molecules or within the protein [
29]. Therefore, the PPII propensity likely reflects the intrinsic hydrogen bond capacity of a polypeptide, not merely solvation.
These general results can be replicated using almost any set of nonhomologous protein structures. Figure 2 shows results from a curated set of 122 human protein structures, sharing less than 50% sequence identity and with structural resolution < 2.0 Å [75]. In the full set, containing 15,958 residue positions, the α conformation is the most populated (Figure 2A). When α-helices and β-strands are removed, PPII is the most favored conformation for the remaining 6418 residue positions (Figure 2B).
The consistency of PPII propensity in protein coil libraries, especially when viewed in light of hydrogen bonding capacity, therefore predicts that a bias toward PPII conformations is an inherent characteristic of the polypeptide backbone.
4. IDP Model of the DSE
The results of many studies (reviewed above) revealed a significant bias toward PPII in the unstructured states of proteins, even when no prolines are present in the sequence. This indicates that the PPII conformation is a dominant component of the DSE, and potentially an important structural descriptor for understanding the properties associated with IDPs and intrinsically disordered regions (IDRs). Although intrinsic PPII propensities have been determined for the common amino acids (see Table 1 and Table 3), the ability of these experimentally determined propensities to quantitatively reproduce ID structural behavior in biological proteins has been difficult to establish.
An experimental system was designed to address this issue and provide an independent measure of the amino acid-specific bias for PPII in IDPs. Based on the hypothesis that the magnitude of a PPII preference in the disordered conformational ensemble can affect its population-weighted hydrodynamic size [
41,
44,
45], it has been shown that intrinsic PPII propensities can be obtained by analyzing the sequence dependence of the mean hydrodynamic radius, $R_h$, of IDPs [37]. This method relies on two assumptions that we show to be reasonable: first, that PPII effects on mean $R_h$ follow a simple power law scaling relationship [
41,
44,
45], and second, that the protein net charge also can influence the hydrodynamic size [
38,
76].
To establish the relationship linking mean $R_h$ to chain bias for PPII in an ensemble, a computer algorithm based on the hard sphere collision (HSC) model was used to generate polypeptide structures through a random search of conformational space [
48,
49]. The HSC model has no intrinsic bias for PPII, which was demonstrated previously [
49], and thus a PPII sampling bias could be added to the algorithm as a user-defined parameter [
41].
Briefly, in this model, individual conformers are generated by using the standard bond angles and bond lengths [
77], and a random sampling of the backbone dihedral angles Φ, Ψ, and Ω. (Φ, Ψ) is restricted to the allowed Ramachandran regions [
78]; the peptide bond dihedral angle, Ω, is always assigned the trans form for nonproline amino acids, while prolines sample the cis form at a rate of 6–10%, depending on the identity of the preceding amino acid [
79]. The positions of side chain atoms are determined from sampling rotamer libraries [
80]. Van der Waals atomic radii [
47,
81] are used as the only scoring function to eliminate grossly improbable conformations. To calculate state distributions typical of protein ensembles, a structure-based energy function parameterized to solvent-accessible surface areas is used to determine the population weight of the generated structures [
82,
83,
84,
85,
86,
87,
88,
89,
90]. Random structures are generated until the population-weighted structural size, $\langle L \rangle$, becomes stable [41]. $L$ is the maximum $C_{\alpha}$–$C_{\alpha}$ distance in a state, and $\langle L \rangle$ is considered stable when its value changes by less than 1% upon a 10-fold increase in the number of ensemble states. $\langle L \rangle/2$ is used to approximate the mean $R_h$ of an ensemble.
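The population weighting and the 1% stability criterion described above can be sketched as follows. The energies are treated as hypothetical inputs from whatever energy function is used, and the function names and the reduced-unit `kT` parameter are illustrative assumptions rather than the published implementation.

```python
import math


def boltzmann_weights(energies, kT=1.0):
    """Normalized Boltzmann probabilities for a list of state energies (same units as kT)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    w = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(w)
    return [x / z for x in w]


def weighted_mean(values, weights):
    """Population-weighted average, e.g. <L> from per-state L values."""
    return sum(v * p for v, p in zip(values, weights))


def is_converged(mean_small, mean_large, tol=0.01):
    """True if <L> changes by less than tol (1%) between an ensemble and one 10x larger."""
    return abs(mean_large - mean_small) / abs(mean_small) < tol


def approx_rh(mean_l):
    """Approximate the mean hydrodynamic radius as <L>/2."""
    return mean_l / 2.0


if __name__ == "__main__":
    lengths = [28.0, 35.0, 41.0]   # hypothetical max C-alpha to C-alpha distances (Angstrom)
    energies = [0.0, 0.5, 1.2]     # hypothetical state energies in units of kT
    mean_l = weighted_mean(lengths, boltzmann_weights(energies))
    print(f"<L> = {mean_l:.2f} A, Rh ~ {approx_rh(mean_l):.2f} A")
```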
Figure 3A shows the effect on simulated mean $R_h$ (i.e., $\langle L \rangle/2$) from increasing the applied PPII sampling bias, $S_{PPII}$, which is obtained by weighting the random selection of Φ and Ψ. For example, a 30% sampling bias for PPII had 30% of the paired (Φ, Ψ) values for any residue randomly distributed in the region of (−75° ± 10°, +145° ± 10°). The remaining 70% of paired (Φ, Ψ) were distributed in the allowed Ramachandran regions outside of (−75° ± 10°, +145° ± 10°). In this figure, each data point represents a computer-generated poly-alanine conformational ensemble (typically $>10^{8}$ states). These results are mostly insensitive to steric effects originating from the side chain atoms when biological sequences are used instead of poly-alanine [38]. Unusual sequences, such as all proline or all glycine, cause deviations from the poly-alanine trend.
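A weighted selection of this kind can be sketched with simple rejection sampling. The `allowed` callback stands in for a test of the sterically allowed Ramachandran regions; its default here accepts every angle pair, which is a placeholder assumption and not the filter used in the HSC algorithm.

```python
import random

PPII_PHI, PPII_PSI, HALF_WIDTH = -75.0, 145.0, 10.0


def in_ppii_box(phi, psi):
    """True if (phi, psi) lies within (-75 +/- 10, +145 +/- 10) degrees."""
    return abs(phi - PPII_PHI) <= HALF_WIDTH and abs(psi - PPII_PSI) <= HALF_WIDTH


def sample_phi_psi(s_ppii, rng, allowed=lambda phi, psi: True):
    """Draw one (phi, psi) pair with probability s_ppii of landing in the PPII box."""
    if rng.random() < s_ppii:
        return (rng.uniform(PPII_PHI - HALF_WIDTH, PPII_PHI + HALF_WIDTH),
                rng.uniform(PPII_PSI - HALF_WIDTH, PPII_PSI + HALF_WIDTH))
    while True:  # rejection sampling of allowed angles outside the PPII box
        phi, psi = rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0)
        if not in_ppii_box(phi, psi) and allowed(phi, psi):
            return phi, psi


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_phi_psi(0.30, rng) for _ in range(20000)]
    frac = sum(in_ppii_box(p, s) for p, s in draws) / len(draws)
    print(f"fraction of draws in the PPII box: {frac:.3f}")
```

Note that this fraction describes the sampled angles before any steric filtering of whole conformers, so it corresponds to the applied bias rather than to the PPII content of the surviving ensemble.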
The simulations revealed that increasing chain propensity for PPII gives rise to increased mean
Rh, which is expected because PPII is an extended structure [
43]. The dependence of mean
Rh on chain length at each sampling bias was fit to the power law scaling relationship, $R_h = R_o \cdot N^{v}$, where N is chain length in number of residues, Ro is the pre-factor, and v is the polymer scaling exponent. Individual fits at a given
SPPII are shown by lines in
Figure 3A, obtained by nonlinear least squares methods.
Ro, on average, was 2.16 Å, except when the sampling bias was 100% PPII (
Figure 3B). When
Ro is held at 2.16 Å, the resulting
v shows a logarithmic dependence on
SPPII (
Figure 3C).
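As a rough illustration of how the exponent is extracted, the sketch below fits the power law in log space. This is a linear least squares fit on log-transformed data, swapped in for simplicity, and not the nonlinear least squares procedure used for Figure 3; the function names are hypothetical.

```python
import math


def fit_exponent(chain_lengths, rh_values, r_o=2.16):
    """Least-squares v in ln(Rh/Ro) = v * ln(N), with Ro held fixed (angstroms)."""
    x = [math.log(n) for n in chain_lengths]
    y = [math.log(rh / r_o) for rh in rh_values]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_power_law(chain_lengths, rh_values):
    """Fit both Ro and v by linear regression of ln(Rh) on ln(N)."""
    x = [math.log(n) for n in chain_lengths]
    y = [math.log(rh) for rh in rh_values]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    v = sxy / sxx
    r_o = math.exp(my - v * mx)
    return r_o, v
```

Holding Ro fixed, as in `fit_exponent`, mirrors the step in which Ro is set to 2.16 Å so that all of the PPII dependence is carried by v.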
Because most computer-generated random structures have steric conflicts, and thus are removed by the hard sphere filter, the applied PPII bias,
SPPII, does not necessarily equal the population-weighted fractional number of residues in the PPII conformation in an ensemble of allowed states. By using $f_{PPII} = \langle N_{PPII} \rangle / N$ to account for this difference, where NPPII is the number of residues in the PPII conformation in a state and $\langle N_{PPII} \rangle$ is the population-weighted value for the ensemble (i.e., $\langle N_{PPII} \rangle = \sum_i N_{PPII,i} \cdot P_i$, with $P_i$ the Boltzmann probability of state i), the simulation trends in
Figure 3 can be combined into a simple relationship,
Additional simulations found that Equation (1) is independent of the specific pattern of PPII propensities in the polypeptide chain [
45].
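To make the bookkeeping concrete, the following sketch computes fPPII from a set of states and then evaluates a scaling relation of the general kind described above. The functional form v = v0 − b·ln(1 − fPPII) and the default values of `v0` and `b` are assumptions chosen for illustration only; they are not taken from the text and should be replaced by the coefficients of Equation (1).

```python
import math


def f_ppii(n_ppii_per_state, probabilities, chain_length):
    """Population-weighted fraction of residues in PPII: <N_PPII> / N."""
    mean_n = sum(n * p for n, p in zip(n_ppii_per_state, probabilities))
    return mean_n / chain_length


def rh_from_f_ppii(chain_length, f, r_o=2.16, v0=0.503, b=0.110):
    """Rh = Ro * N**v with an assumed logarithmic dependence of v on f.

    v0 and b are illustrative placeholder coefficients (assumptions).
    """
    v = v0 - b * math.log(1.0 - f)
    return r_o * chain_length ** v
```

For two equally populated states of a 10-residue chain with 2 and 4 residues in PPII, `f_ppii([2, 4], [0.5, 0.5], 10)` gives 0.3. In this sketch, a larger fPPII always yields a larger Rh at fixed N, consistent with PPII being an extended structure.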
To test Equation (1) directly, mutational effects on experimental
Rh were measured for an IDP [
44]. Apparent changes in
fPPII were determined from amino acid substitutions, following the strategy shown in
Figure 4. These experiments used the N-terminal end of the p53 tumor suppressor protein, a prototypical IDP consisting of 93 residues, p53(1-93). The apparent net charge,
Qnet, calculated from sequence for p53(1-93), is −17. Thus, this test was conducted in the background of potentially strong intramolecular charge–charge interactions that were unaccounted for. Nonetheless, experiments with P→G and A→G substitutions applied to p53(1-93) gave reasonable results, indicating a per-position average PPII bias change of 0.76 at each proline site (i.e., relative to the intrinsic PPII bias of glycine) and 0.48 at each alanine site. These results are evidence of significant conformational bias for PPII in IDPs, even at nonproline positions.
Equation (1) was also used to predict
Rh from sequence for a database of IDPs, using the experimental PPII propensities in
Table 1 [
45]. For each IDP, fPPII was calculated as $\sum_i PPII_i / N$, where PPIIi is the PPII propensity of the amino acid type at sequence position i, and the summation is over the protein sequence of N amino acids.
Figure 5A shows
Rh predicted when using PPII propensities from Hilser and coworkers (column 4,
Table 1). Compared to the null model, in which PPII is not strongly preferred and the chain is an unbiased statistical coil, Equation (1) indeed captures the overall experimental trend. Repeating these predictions with the PPII scales measured by Creamer or Kallenbach (columns 2 and 3, Table 1) yields Rh values that are consistently larger than experiment [45], indicating that these two scales may overestimate PPII propensities, at least for describing structural preferences in prototypical IDPs. Moreover, the error from predicting
Rh by Equation (1) when using the Hilser-measured PPII scale was found to trend strongly with
Qnet when
Qnet was normalized to chain length (
Figure 5B), more so than >500 other physicochemical properties that can be calculated from the primary sequence [
38]. The linear trend of prediction error with Qnet (determined from sequence as the number of K and R residues minus the number of D and E residues) was used to modify Equation (1), yielding
Equation (2), which amends Equation (1) for
Qnet effects on the hydrodynamic size, is highly accurate in predicting
Rh from sequence for many IDPs (
Figure 5C). Further, in this set of IDPs, mean
Rh did not trend with κ [
38], which is a measure of the mixing of positive and negative charges in the primary sequence [
91]. This justified using
Qnet to modify Equation (1) and obtain Equation (2), because mean
Rh was independent of sequence organization of the charged side chains.
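The sequence-level quantities used in this correction are simple to compute. The sketch below is a minimal illustration under stated assumptions: the propensity scale passed in is supplied by the user (no Table 1 values are reproduced here), the prediction error is taken as experimental minus predicted Rh, and both error and |Qnet| are assumed to be normalized by N^0.5, so that the fitted slope multiplies |Qnet| and the fitted intercept multiplies N^0.5. The function names are hypothetical.

```python
def net_charge(sequence):
    """Qnet from sequence: number of K and R minus number of D and E."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative


def f_ppii_from_sequence(sequence, scale):
    """Average per-residue PPII propensity over the sequence."""
    return sum(scale[aa] for aa in sequence) / len(sequence)


def charge_corrected_rh(rh_uncorrected, q_net, chain_length, slope, intercept):
    """Add the net charge correction to an uncorrected Rh prediction.

    Assumes error / N**0.5 = slope * |Qnet| / N**0.5 + intercept, which
    rearranges to error = slope * |Qnet| + intercept * N**0.5.
    """
    return rh_uncorrected + slope * abs(q_net) + intercept * chain_length ** 0.5
```

For example, `net_charge("KKRD")` evaluates to 2, and a hypothetical two-letter scale `{"A": 0.4, "P": 1.0}` gives an fPPII of 0.7 for the sequence "AP".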
To further test Equation (2) and its ability to describe PPII effects on IDP
Rh, random PPII scales were generated and tested for accuracy at predicting experimental
Rh [
37], thus establishing the sensitivity of Equation (2) to scale variations. Briefly, each random scale, in which the 20 common amino acids were individually assigned random values between 0 and 1, was used to predict Rh by Equation (1), and the predictions were then compared to experimental Rh, an example of which is shown in Figure 5A for the peptide-based PPII scale measured by Hilser and coworkers. Next, the linear trend in prediction error with size-normalized
Qnet was determined, as in
Figure 5B. These two steps generate two correlations (R²), which were used to evaluate each random scale (Figure 6A). Because the slope and intercept from the error trend with size-normalized Qnet provide the coefficients preceding |Qnet| and $N^{0.5}$ in Equation (2), each scale yields a unique empirical modification to Equation (1) that corrects for net charge effects on mean Rh. The results from analyzing 10⁶ randomly generated scales in this manner are given in
Figure 6A. Each data point represents a PPII scale. The color, running from black through purple and red to yellow, indicates the average error in predicting Rh from sequence after correcting for net charge effects on hydrodynamic size (i.e., after using scale-specific Equation (2) to predict Rh). The abscissa is the correlation (R²) of Equation (1)-predicted Rh with experiment for a given scale. The ordinate is the correlation (R²) of size-normalized Equation (1) error with size-normalized Qnet.
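The two-step scoring of a random scale can be sketched as follows. This is an illustrative outline rather than the published analysis: `predict_rh` is a caller-supplied stand-in for Equation (1), normalization by N^0.5 is an assumption, and the error is taken as experimental minus predicted Rh.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_scale(rng):
    """Assign each of the 20 common amino acids a random value in [0, 1)."""
    return {aa: rng.random() for aa in AMINO_ACIDS}


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def score_scale(scale, proteins, predict_rh):
    """Return the two R^2 values used to place a scale on the plot.

    proteins: list of (sequence, experimental Rh, Qnet) tuples.
    predict_rh: callable (N, f_PPII) -> Rh, a stand-in for Equation (1).
    """
    predicted, observed, norm_error, norm_charge = [], [], [], []
    for sequence, rh_exp, q_net in proteins:
        n = len(sequence)
        f = sum(scale[aa] for aa in sequence) / n
        rh_pred = predict_rh(n, f)
        predicted.append(rh_pred)
        observed.append(rh_exp)
        norm_error.append((rh_exp - rh_pred) / n ** 0.5)
        norm_charge.append(abs(q_net) / n ** 0.5)
    return r_squared(predicted, observed), r_squared(norm_error, norm_charge)
```

Repeating `score_scale` over many calls to `random_scale` yields one (x, y) point per scale, which is how a scatter of the kind described for Figure 6A would be populated.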
Two key observations are immediately apparent in the data given in
Figure 6A. First, there is a set of random PPII propensity scales that are better than typical at predicting mean
Rh from sequence when using
fPPII,
Qnet, and
N. These scales, highlighted by the boxed area, predict IDP Rh values that correlate well with experimental Rh (R² > 0.7; x-axis) and have a prediction error that also trends with Qnet (R² > 0.4; y-axis). Second, the experimental PPII propensities determined calorimetrically from host–guest analysis of the binding energetics of the Sos peptide (i.e., the peptide-based scale measured by Hilser and coworkers) outperform almost all random scales in their ability to describe sequence effects on mean hydrodynamic size when using only conformational bias and net charge considerations. This is particularly evident when comparing error magnitudes (
Figure 6B).
To determine if Equation (2) is sufficiently sensitive to discern the differences in PPII bias of the amino acids, the average scale value for each amino acid type was computed from the “best” performing random scales. The “best” scales were defined as those in the boxed area of
Figure 6A with the smallest error (i.e., less than the distribution mode; see
Figure 6B). The computed averages, unfortunately, show little specificity beyond distinguishing proline from nonproline types (red bars,
Figure 6C), most likely owing to the low representation of some amino acid types in the IDP dataset, specifically the nonpolar amino acids [
92]. When substitution effects on mean
Rh were measured experimentally in p53(1-93) to determine rank order in PPII propensities among the nonpolar amino acid types [
37], and then used to restrict the “best” random scales to those that also maintain this rank order, the average scale value by amino acid type (blue bars,
Figure 6C) exhibited strong correlations with the other experimental PPII scales (
Figure 7). These amino acid-specific average scale values (blue bars,
Figure 6C), which were obtained solely from analyzing sequence effects on IDP
Rh, represent an independent measure of the intrinsic PPII bias in the ID states of biological proteins.
Because ID sequences differ fundamentally from nonID sequences, the use of IDPs as a DSE model for folded proteins is not fully supported. For example, unlike the heterogeneous composition of amino acids and the weak repetition found in the sequences of folded proteins [
93,
94], IDPs and IDRs have a lower sequence complexity [
95] with strong preferences for hydrophilic and charged amino acid side chains over aromatic and hydrophobic side chains [
92,
96]. These disparate properties of the primary sequence suggest potentially disparate structural behavior. To investigate this issue, protein sequence reversal was used to gain experimental access to the disordered ensemble of a protein with a composition of L-amino acids and pattern of side chains identical to those of a conventional folded protein [
42]. Staphylococcal nuclease was used for these studies; the unaltered wild type adopts a stable native structure consisting of three α-helices and a five-stranded, barrel-shaped β-sheet [97]. The protein variant with reversed sequence directionality, Retro-nuclease, was found to be an elongated monomer that exhibits the structural characteristics of intrinsic disorder [
42]. At 25 °C, the mean
Rh of Retro-nuclease was found to be 34.0 ± 0.5 Å by DLS techniques. Sedimentation analysis by analytical ultracentrifugation (AUC) and SEC methods gave similar results under similar conditions (33.0 Å at 20 °C by AUC, and 33.7 Å at ~23 °C by SEC). Equation (2), for comparison, predicts 33.1 Å using the Retro-nuclease primary sequence, which is close to the observed experimental values.
The hydrodynamic size of Retro-nuclease is highly sensitive to temperature changes (
Figure 8A), which is consistent with observations from other IDPs [
39,
40,
41]. The enthalpy and entropy of the PPII to nonPPII transition have been measured in short alanine peptides by monitoring heat effects on structure over a broad temperature range [
46]. The results from CD spectroscopy, which monitored the change in the CD signal at 215 nm, gave ΔHPPII and ΔSPPII of ~10 kcal mol⁻¹ and 32.7 cal mol⁻¹ K⁻¹, respectively, while NMR measurements, using heat effects on ³JHNα, gave ~13 kcal mol⁻¹ and 40.9 cal mol⁻¹ K⁻¹.
Because the PPII bias is noncooperative [
46] and locally determined [
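72], the effect of temperature changes can be modeled at the level of individual residue positions using the integrated van’t Hoff equation,
$$\ln\frac{K_{PPII}(T)}{K_{PPII}(T_0)} = -\frac{\Delta H_{PPII}}{R}\left(\frac{1}{T} - \frac{1}{T_0}\right) \qquad (3)$$
where
KPPII is the equilibrium constant between the PPII and nonPPII states, T0 is the reference temperature (25 °C here),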
T is temperature, and
R is the gas constant. ∆
HPPII is assumed to be constant. If PPII is the lone dominant conformation, then
KPPII for each amino acid type can be estimated from experimental PPII propensities at 25 °C as $K_{PPII,i} = (1 - PPII_i)/PPII_i$. The importance of Equation (3) is that it provides another check on whether the DSE can be described from the results of peptide studies. Moreover, these two values, ΔHPPII and PPIIi, give access to the entropy from the relationship $(\partial G/\partial T)_P = -S$. Using IDP-measured intrinsic PPII propensities (blue bars, Figure 6C), we found that ΔHPPII ~ 13 kcal mol⁻¹ captures the decrease in Retro-nuclease mean Rh from 25 to 65 °C (Figure 8A). For alanine, using its IDP-measured PPII propensity at 25 °C (0.32) and ΔHPPII = 13 kcal mol⁻¹ yields ΔSPPII = 45.1 cal mol⁻¹ K⁻¹.
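The alanine numbers quoted above can be checked with a short calculation. The sketch assumes the standard integrated van't Hoff form with a temperature-independent ΔHPPII, a reference temperature of 298.15 K, and the usual relations ΔG = −RT ln K and ΔS = (ΔH − ΔG)/T; the function names are hypothetical.

```python
import math

R = 1.987e-3    # gas constant in kcal mol^-1 K^-1
T_REF = 298.15  # 25 degrees C in kelvin


def k_ppii(propensity):
    """Equilibrium constant nonPPII/PPII from a PPII propensity."""
    return (1.0 - propensity) / propensity


def k_at_temperature(k_ref, delta_h, temperature, t_ref=T_REF):
    """Integrated van't Hoff equation with constant delta_h (kcal/mol)."""
    return k_ref * math.exp(-(delta_h / R) * (1.0 / temperature - 1.0 / t_ref))


def propensity_at_temperature(propensity_ref, delta_h, temperature):
    """PPII propensity at a new temperature, assuming a two-state model."""
    k = k_at_temperature(k_ppii(propensity_ref), delta_h, temperature)
    return 1.0 / (1.0 + k)


def delta_s(propensity_ref, delta_h, t_ref=T_REF):
    """Entropy in kcal mol^-1 K^-1 from dS = (dH - dG)/T, dG = -RT ln K."""
    delta_g = -R * t_ref * math.log(k_ppii(propensity_ref))
    return (delta_h - delta_g) / t_ref


# Alanine: propensity 0.32 at 25 C, delta_h = 13 kcal/mol
alanine_ds = 1000.0 * delta_s(0.32, 13.0)  # about 45.1 cal mol^-1 K^-1
```

With a positive ΔHPPII, `propensity_at_temperature` decreases as the temperature rises, which is the direction needed to explain the thermal compaction of Retro-nuclease.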
Although the predicted and experimental mean
Rh agree at 25 and 65 °C, experimental and Equation (2)-predicted values at 5, 15, 35, and 45 °C show obvious differences (
Figure 8A). At 35 and 45 °C, the experimental mean
Rh values were larger than predicted, whereas at 5 and 15 °C, they were smaller. The analysis of heat effects on
Rh using Equation (3) assumed PPII to be the lone dominant DSE conformation, which is not necessarily correct. Indeed, the Retro-nuclease CD spectrum reported a cold-induced local minimum at 222 nm for
T < 25 °C [
42], revealing temperature-dependent population of the α backbone conformation. By including the effects of an α bias in simulations of DSE hydrodynamic size, both the over- and underpredictions of mean
Rh at 5, 15, 35, and 45 °C can be explained.
Briefly, preferential sampling of main chain dihedral angles for Φ and Ψ associated with α-helix can cause changes in the structural dimensions of the DSE [
38]. Monitored from the population-weighted mean size,
Rh ~ <
L>/2, computer-generated ensembles that sample (Φ, Ψ) in the α region (−64° ± 10°, −41° ± 10°) show compaction under modest preferences, and elongated sizes at higher α sampling rates (
Figure 8B). Specifically, when (Φ, Ψ) sampling for α is weakly preferred, the probability of contiguous stretches of residues in the α state is low, and turn structures are more likely than helical segments that form when the α bias is higher. Because the effect of the α bias on the mean
Rh of the DSE can be accentuated by the PPII bias, whereby ensembles with high PPII propensities show increased sensitivity to changes in the α bias, the consequences of both the α and PPII biases for mean
Rh must be considered. For example, the average chain propensity for PPII in our IDP database is ~0.4 when estimated from sequence. Thus, the IDP trend of mean
Rh with α bias should follow the red line in
Figure 8B, and not the black line. Likewise, the effect of PPII bias on mean
Rh is codependent on the α bias (
Figure 8C). When PPII is the dominant conformation, the structural dimensions of the denatured state follow the relationship given by Equation (1) (black line in
Figure 8C). If, instead, PPII is not the dominant conformation, and moderate α preferences are present, then the
Rh dependence on PPII bias changes. More precisely, the result of increasing the chain preference for α is to suppress the effect of PPII on mean
Rh (blue line in
Figure 8C). When the α bias is stronger than the PPII bias (i.e., α is the dominant conformation), then the effect of the PPII bias is compaction (red line in
Figure 8C).
The comparison of experimental IDP
Rh to the curves in
Figure 8C (open circles in the figure) confirms that PPII is the dominant backbone conformation in IDP ensembles [
37]. Here, fractional ΔRh was calculated as (experimental Rh − simulated Rh)/simulated Rh, where simulated Rh refers to the size of an unbiased ensemble that has been corrected for net charge effects. In the figure, a majority of the IDPs are found to have experimental mean
Rh values slightly larger than expected based upon the sequence-calculated value of
fPPII. This suggests that the amino acid preferences for PPII may be underestimated by the IDP-based scale, and the values for
fPPII in this figure should be shifted to the right. The possibility of a larger intrinsic PPII bias cannot be eliminated because PPII effects on mean
Rh are suppressed by the presence of an α bias. The magnitude and sequence dependence of the α bias in the protein DSE are currently unknown, although the α bias has been estimated in short alanine-rich peptides [
22].
The idea that PPII propensities are underestimated possibly explains some of the Retro-nuclease data shown in
Figure 8A. An underestimated PPII bias gives an underestimated predicted mean
Rh at 35 and 45 °C. At 5 and 15 °C, the disagreement between theory and experiment is likely caused by the α bias detected in the Retro-nuclease CD spectrum [
37,
38]. To obtain the sequence dependence of both the α and PPII biases in the DSE and test these assumptions, the analysis of sequence effects on IDP mean
Rh reviewed above could simply be repeated at both colder and warmer temperatures. Higher temperatures reduce α effects on mean
Rh and isolate the effects of the PPII bias. Colder temperatures give access to the α bias. Just as the sequence dependence of mean
Rh at
T ≥ 25 °C yields the amino acid-specific biases for PPII from the comparison of experimental
Rh to simulated coil values that omit PPII effects, the sequence dependence of mean
Rh at T < 25 °C can yield the amino acid bias for the α conformation via comparison to the theoretical treatment that omits the α effects.