<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Genes</journal-id>
<journal-title>Genes</journal-title>
<issn pub-type="epub">2073-4425</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/genes2040748</article-id>
<article-id pub-id-type="publisher-id">genes-02-00748</article-id>
<article-categories>
<subj-group>
<subject>Review</subject></subj-group></article-categories>
<title-group>
<article-title>The Evolution of Protein Structures and Structural Ensembles Under Functional Constraint</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Siltberg-Liberles</surname><given-names>Jessica</given-names></name><xref ref-type="corresp" rid="c1-genes-02-00748"><sup>*</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Grahnen</surname><given-names>Johan A.</given-names></name></contrib>
<contrib contrib-type="author">
<name><surname>Liberles</surname><given-names>David A.</given-names></name><xref ref-type="corresp" rid="c1-genes-02-00748"><sup>*</sup></xref></contrib>
<aff id="af1-genes-02-00748">Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA; E-Mail: <email>jgrahnen@uwyo.edu</email></aff></contrib-group>
<author-notes>
<corresp id="c1-genes-02-00748">
<label>*</label> Author to whom correspondence should be addressed; E-Mails: <email>jliberle@uwyo.edu</email> (J.S.-L.); <email>liberles@uwyo.edu</email> (D.A.L.).</corresp></author-notes>
<pub-date pub-type="collection">
<year>2011</year></pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>10</month>
<year>2011</year></pub-date>
<volume>2</volume>
<issue>4</issue>
<fpage>748</fpage>
<lpage>762</lpage>
<history>
<date date-type="received">
<day>24</day>
<month>09</month>
<year>2011</year></date>
<date date-type="rev-recd">
<day>15</day>
<month>10</month>
<year>2011</year></date>
<date date-type="accepted">
<day>19</day>
<month>10</month>
<year>2011</year></date></history>
<permissions>
<copyright-statement>© 2011 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2011</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Protein sequence, structure, and function are inherently linked through evolution and population genetics. Our knowledge of protein structure comes from solved structures in the Protein Data Bank (PDB), our knowledge of sequence through sequences found in the NCBI sequence databases (<ext-link xlink:href="http://www.ncbi.nlm.nih.gov/" ext-link-type="uri">http://www.ncbi.nlm.nih.gov/</ext-link>), and our knowledge of function through a limited set of <italic>in-vitro</italic> biochemical studies. How these intersect through evolution is described in the first part of the review. In the second part, our understanding of a series of questions is addressed. This includes how sequences evolve within structures, how evolutionary processes enable structural transitions, how the folding process can change through evolution and what the fitness impacts of this might be. Moving beyond static structures, the evolution of protein kinetics (including normal modes) is discussed, as is the evolution of conformational ensembles and structurally disordered proteins. This ties back to a question of the role of neostructuralization and how it relates to selection on sequences for functions. The relationship between metastability, the fitness landscape, sequence divergence, and organismal effective population size is explored. Lastly, a brief discussion of modeling the evolution of sequences of ordered and disordered proteins is entertained.</p></abstract>
<kwd-group>
<kwd>conformational ensemble</kwd>
<kwd>multiscale modeling</kwd>
<kwd>structural disorder</kwd>
<kwd>sequence-structure-function-evolution relationships</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>The links between gene sequence, protein structure, and biological function are central to the development of a mechanistic understanding of molecular and cellular biological processes. Further, from an evolutionary perspective, changes in gene sequences, as filtered by protein structure and function, can drive phenotypic change through neutral and adaptive mechanisms. Selection can ultimately occur at the level of the fitness of the individual organism, filtered through the lens of cell biology down to the level of protein function, structure, and sequence. Not all proteins contribute equally to organismal fitness. The generation of high throughput genomic, proteomic, and structural datasets has enabled molecular evolutionary analysis of functional data. Ultimately, an understanding of the interplay of protein structure with both sequence evolution and functional/phenotypic evolution is necessary. This review will depict this understanding from several key perspectives.</p></sec>
<sec>
<label>2.</label>
<title>Protein Structure Space</title>
<p>The nature of protein structure space is an important starting point for characterizing the link between sequence, structure, and function. Knowing how well protein structure space has been characterized (the degree to which the sampling is complete) is a necessary prerequisite for understanding how it has evolved, the constraints on its evolution, and the constraints that it (and evolutionarily accessible alternatives) place on sequences and functions.</p>
<p><italic>That protein structure is more conserved than sequence</italic> is a common perception among molecular life scientists. This is based upon an observation of the experimentally determined protein structures in the Protein Data Bank (PDB). However, if we remove the 100% identical proteins from PDB, we are left with about 40,000 PDB structures. If we compare that to the number of protein sequences in the RefSeq database (currently &gt;10 million protein sequences), it is clear that our current knowledge of protein structure space is derived from a very small subset of proteins. This is especially true if structure varies among homologous proteins from different species more than is sometimes appreciated. It is known that the protein composition of PDB is biased [<xref ref-type="bibr" rid="b1-genes-02-00748">1</xref>]. Membrane proteins and structurally disordered proteins are underrepresented in PDB and many proteins are modified (truncated and/or mutated) in order to facilitate crystal formation. Some proteins in fact show the hallmarks of crystal packing forces in their structures, which cannot reasonably be expected to reflect the stable structure in solution [<xref ref-type="bibr" rid="b2-genes-02-00748">2</xref>]. There are also biases in the function, subcellular localization and protein coverage in PDB [<xref ref-type="bibr" rid="b1-genes-02-00748">1</xref>].</p>
<p>Despite these caveats, there are a lot of important data and trends to be found in the PDB. Protein structure classification, for instance CATH [<xref ref-type="bibr" rid="b3-genes-02-00748">3</xref>], further characterizes most multidomain structures in PDB at the domain level, as the domain is commonly regarded to be the smallest functional unit that can fold by itself. CATH currently has almost 1,300 different topologies or folds, some of which are used much more frequently than other folds. However, while this data is focused on the domain level, it misses structural organization at the multidomain level. Many multidomain proteins contain linker sequences between domains and the structural flexibility of these linkers has informational value for our understanding of the extended protein structure space. If we can estimate the extent of structural flexibility between domains, it would certainly add to the current understanding of how protein structures evolve on the tertiary and quaternary structural levels. Not only could intra-chain domain-domain packing be affected in the tertiary structure, but also inter-chain domain-domain packing can be affected in (for instance) the case of domain swaps. These studies are likely to increase our understanding of how domain-domain crosstalk and allostery evolve, which can improve current methods for homology modeling of multidomain complexes and correspondingly, our understanding of the evolution of protein function, interaction, and regulation.</p>
<p>Focusing on the PDB as our source for protein structure information may lead us to a skewed view of protein structure space. From a structural rather than a functional perspective, proteins that rarely make it into the PDB simply because they are too dynamic are systematically missed. Due to the nature of the energy landscape (the relative energies of different conformations and ultimately different sequences in different conformations, see <xref ref-type="fig" rid="f1-genes-02-00748">Figure 1</xref>), these are the proteins that exist in rapidly exchanging conformations and that may only progress down the folding funnel towards a stable conformation after being either post-translationally modified or when interacting with a binding partner. These proteins are commonly referred to as structurally disordered. Structurally disordered proteins can be fully or partially disordered, and what is intriguing about these proteins is their presence as a conformational ensemble that kinetically interconverts on cellular timescales. Here we cannot simply say that protein structure is more conserved than sequence because a mutation in the conformational ensemble is likely to shift the equilibrium of the conformational ensembles. Hence, the evolution of structurally disordered proteins may lead to non-conserved protein structures among homologs through this shift in the conformational ensemble. We call this phenomenon neostructuralization [<xref ref-type="bibr" rid="b4-genes-02-00748">4</xref>]. Starting with the structurally ordered proteins, we will attempt to systematically describe our understanding of how proteins evolve. As our understanding of protein sequence-structure links and the intertwining roles of physical chemistry and evolution improves, key aspects of our knowledge based on protein structural evolution may need revision.</p></sec>
<sec>
<label>3.</label>
<title>Evolution of Structurally Ordered Proteins</title>
<p>Structured domains are characterized by a large proportion of secondary structure, as well as a single hydrophobic core and mostly hydrophilic surface. Distinct regions of non-enzymatic proteins with different evolutionary properties include the hydrophilic surface, the hydrophobic core, and more hydrophobic surface binding interfaces involved in protein-protein interaction. These regions show different rates of amino acid substitution, with the hydrophobic core evolving more slowly than the hydrophilic surface [<xref ref-type="bibr" rid="b5-genes-02-00748">5</xref>]. Quantitatively, core residues evolve up to 10x slower than surface residues [<xref ref-type="bibr" rid="b6-genes-02-00748">6</xref>], and include residues that are the most informative for determining the topology of the native fold [<xref ref-type="bibr" rid="b7-genes-02-00748">7</xref>]. In fact, rates of evolution correlate strongly with fractional residue burial [<xref ref-type="bibr" rid="b8-genes-02-00748">8</xref>]. Within protein families, backbone change in the core increases very slowly [<xref ref-type="bibr" rid="b9-genes-02-00748">9</xref>], mostly preserving the characteristic topology of the fold over relatively long evolutionary distances. Single substitutions are generally accommodated by side chain packing [<xref ref-type="bibr" rid="b10-genes-02-00748">10</xref>]. The structure dictates the inter-residue interactions that occur and the thermodynamic intramolecular coupling of substitutions is detectable from evolutionary data [<xref ref-type="bibr" rid="b11-genes-02-00748">11</xref>], leading to the use of contact maps and viewing proteins in a network context [<xref ref-type="bibr" rid="b12-genes-02-00748">12</xref>]. 
For proteins with a binding function, the binding interface is under functional constraint and may evolve the slowest, with differences in rate between affinity-determining and specificity-determining residues [<xref ref-type="bibr" rid="b13-genes-02-00748">13</xref>]. Different secondary structural elements also show different rates of evolution, with beta-sheet regions evolving more slowly than helical regions, and with random coil regions evolving fastest [<xref ref-type="bibr" rid="b5-genes-02-00748">5</xref>,<xref ref-type="bibr" rid="b14-genes-02-00748">14</xref>]. Beyond secondary structure, this may be influenced by differences in relative burial between different elements. In addition to point substitutions, insertion and deletion events (indels) also occur at varying rates [<xref ref-type="bibr" rid="b15-genes-02-00748">15</xref>].</p>
<p>While it is sometimes supposed that Hidden Markov Model (HMM) emission probabilities from Pfam [<xref ref-type="bibr" rid="b16-genes-02-00748">16</xref>] reflect the allowed nature of sequence divergence within a structure and describe aspects of allowable sequences within structures, these have been generated without consideration of the phylogenetic scale on which sequences have been diverging. Kondrashov [<xref ref-type="bibr" rid="b17-genes-02-00748">17</xref>] has suggested that explored sequence space within folds of real proteins is still expanding. Consistent with this, evolutionary simulations imply that there are many sequences that have not been observed that can fold into a given known structure [<xref ref-type="bibr" rid="b18-genes-02-00748">18</xref>,<xref ref-type="bibr" rid="b19-genes-02-00748">19</xref>]. This is also consistent with observations from protein design [<xref ref-type="bibr" rid="b20-genes-02-00748">20</xref>–<xref ref-type="bibr" rid="b22-genes-02-00748">22</xref>]. These views may necessitate revision of our understanding of the uniqueness of superfolds and related concepts of designability, leading to alternative hypotheses for fold distributions rooted in evolutionary and population genetic processes [<xref ref-type="bibr" rid="b23-genes-02-00748">23</xref>–<xref ref-type="bibr" rid="b25-genes-02-00748">25</xref>].</p>
<p>For the subset of proteins that form a stable unique tertiary structure, the thermodynamic stability (ΔG) of the protein in the context of a folding funnel is important [<xref ref-type="bibr" rid="b26-genes-02-00748">26</xref>] (see <xref ref-type="fig" rid="f1-genes-02-00748">Figure 1</xref>). It is therefore maintained throughout evolution despite the average destabilizing effect of non-synonymous mutations [<xref ref-type="bibr" rid="b27-genes-02-00748">27</xref>–<xref ref-type="bibr" rid="b29-genes-02-00748">29</xref>]. Proteins are only marginally stable, with a free-energy change of a few kcal/mol upon folding [<xref ref-type="bibr" rid="b30-genes-02-00748">30</xref>]. This has been attributed to population-level neutral processes, where there is more power to select for a larger energy gap in larger population species (organisms) or when there is a strong selective advantage to do so (as in hyperthermophiles) [<xref ref-type="bibr" rid="b31-genes-02-00748">31</xref>,<xref ref-type="bibr" rid="b32-genes-02-00748">32</xref>], or alternatively to functional requirements for protein flexibility [<xref ref-type="bibr" rid="b33-genes-02-00748">33</xref>]. To overcome the Levinthal Paradox, distal parts of the energy landscape must be gently sloping towards the native structure(s). However, the metastability of the folded structure relative to alternative folded structures combined with dN/dS data suggesting strong negative selection on the average protein against the average mutation [<xref ref-type="bibr" rid="b5-genes-02-00748">5</xref>] suggests that the local funnel near the native state is more rugged from a mutational perspective through evolution than other parts of the landscape, with allowable mutations forming a neutral network. Ultimately, structure is important as a scaffold for properly orienting functional residues (for example, a binding interface, catalytic residues, or a pore). 
Consequently, there is little selective pressure for particular sequences within a given structure over longer evolutionary periods, generating a neutral network of sequences connected by those accessible through the mutational process. Folds with excess ΔG are thought to possess more potential for neofunctionalization (and gene family expansion) [<xref ref-type="bibr" rid="b30-genes-02-00748">30</xref>,<xref ref-type="bibr" rid="b34-genes-02-00748">34</xref>–<xref ref-type="bibr" rid="b36-genes-02-00748">36</xref>]. But as expected from nearly-neutral theory [<xref ref-type="bibr" rid="b37-genes-02-00748">37</xref>], the majority of mutations are either deleterious or neutral rather than adaptive, both in terms of ΔG and fitness [<xref ref-type="bibr" rid="b27-genes-02-00748">27</xref>,<xref ref-type="bibr" rid="b29-genes-02-00748">29</xref>,<xref ref-type="bibr" rid="b38-genes-02-00748">38</xref>–<xref ref-type="bibr" rid="b40-genes-02-00748">40</xref>]. Compensatory mutation can play a selective role within nearly neutral sequence networks, whereby a deleterious mutation makes a subsequent otherwise neutral change selectively advantageous [<xref ref-type="bibr" rid="b9-genes-02-00748">9</xref>].</p>
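<p>As a rough illustration of what marginal stability implies, the following Python sketch computes the equilibrium folded fraction under a simple two-state (folded/unfolded) approximation. The two-state model is an assumption made here for illustration and is not taken from the studies cited above; the function names, the default temperature, and the example ΔG and ΔΔG values are likewise hypothetical.</p>
<preformat>
```python
import math

# Gas constant in kcal/(mol*K); energies below are in kcal/mol.
R_KCAL = 0.0019872


def fraction_folded(delta_g, temperature_k=298.15):
    """Two-state folded fraction.

    delta_g is G(folded) - G(unfolded), so stable proteins have
    negative values. This is an illustrative assumption, not a
    model taken from the review.
    """
    return 1.0 / (1.0 + math.exp(delta_g / (R_KCAL * temperature_k)))


def after_mutation(delta_g, ddg):
    """Stability after a substitution with effect ddg (positive = destabilizing)."""
    return delta_g + ddg


# Hypothetical marginally stable protein and a destabilizing substitution.
wild_type = -5.0
mutant = after_mutation(wild_type, 3.0)
print(fraction_folded(wild_type), fraction_folded(mutant))
```
</preformat>
<p>Under these assumptions, a protein stabilized by only a few kcal/mol is still almost entirely folded at equilibrium, but destabilizing substitutions of comparable total magnitude erode that margin quickly, which is one way to picture why most non-synonymous changes are filtered by selection.</p>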
<p>The processes described above can lead to structural transitions through two different processes. Within a neutral network that is functional, there may be multiple structural states that can exist. It is unclear that there is always a selective pressure for an energy gap near the native structure(s), especially in the case that closely related structures are functionally equivalent. Changes in secondary structure content after residue substitutions can occur due to varying helix/sheet propensity, with sheets being more plastic [<xref ref-type="bibr" rid="b5-genes-02-00748">5</xref>,<xref ref-type="bibr" rid="b14-genes-02-00748">14</xref>]. Some of these changes in secondary structural composition are likely to be evolutionarily neutral. A second mode of structural transition involves positive selection. In this case, a new fold that is mutationally accessible may enable the development of a new function that was not possible within the previous fold.</p>
<p>This raises an interesting question: is protein structure space continuous or discrete in enabling evolutionary transitions between distinct folds? A variety of measures of structural similarity have been applied to construct maps of protein structure space [<xref ref-type="bibr" rid="b41-genes-02-00748">41</xref>–<xref ref-type="bibr" rid="b43-genes-02-00748">43</xref>]. These maps consistently show highly populated regions roughly corresponding to the Class level of SCOP [<xref ref-type="bibr" rid="b43-genes-02-00748">43</xref>–<xref ref-type="bibr" rid="b45-genes-02-00748">45</xref>], and smaller clusters corresponding to the presumably homologous Superfamily level [<xref ref-type="bibr" rid="b43-genes-02-00748">43</xref>]. Depending on the algorithms and graph-theoretical measures employed, different groups have argued that this space is fully connected [<xref ref-type="bibr" rid="b42-genes-02-00748">42</xref>] or highly fragmented [<xref ref-type="bibr" rid="b43-genes-02-00748">43</xref>]. However, mechanistically, protein evolution does not proceed via jumps in structural space/geometry, as it is sometimes modeled, but via small changes in sequence space, and the mapping between structural hierarchies and mutation-based hierarchies is unclear. While circular permutation and other larger scale mutational re-arrangements have been observed [<xref ref-type="bibr" rid="b46-genes-02-00748">46</xref>], the important consideration is that fold transitions occur through the mutational process at the sequence level rather than geometrically at the structural level. To rigorously evaluate the possibility of a fold transition, one would have to determine the viability of a series of mutations that connect the two folds. Both thermodynamics and kinetics of folding must be taken into account, as well as fitness effects due to function, all within a context of population genetics.</p></sec>
<sec>
<label>4.</label>
<title>Evolution of Protein Folding Pathways</title>
<p>In addition to a unique and stable native state, structured proteins also have pathways through which they rapidly fold. In some cases, the folding pathway has been shown to affect the final structure that the sequence folds into, meaning that the folding pathway can be important to the ultimate fold and therefore the ultimate biological function (for example, [<xref ref-type="bibr" rid="b47-genes-02-00748">47</xref>]). It is only to the extent that folding pathway affects structure and ultimately function that it is evolutionarily important. Folding pathways also have an important role in preventing aggregation, with proper folding driven at least partly by hydrophobic collapse. With these views in mind, the conservation of folding pathways is described.</p>
<p>The intermediates in the folding pathway are known to be conserved for some homologous proteins [<xref ref-type="bibr" rid="b48-genes-02-00748">48</xref>]. The correlation between native state contact order and folding kinetics [<xref ref-type="bibr" rid="b49-genes-02-00748">49</xref>] further suggests that the native state topology is the main evolutionary determinant of the folding pathway. A number of studies [<xref ref-type="bibr" rid="b50-genes-02-00748">50</xref>,<xref ref-type="bibr" rid="b51-genes-02-00748">51</xref>] subsequently showed that folding pathways are partially, but not fully, conserved in homologs of single-domain proteins. Folded subdomains (folding nuclei or foldons) can be strongly conserved, particularly if they define an intermediate or transition state late in the pathway [<xref ref-type="bibr" rid="b52-genes-02-00748">52</xref>,<xref ref-type="bibr" rid="b53-genes-02-00748">53</xref>]. However, even very small proteins appear to have multiple parallel pathways and intermediates [<xref ref-type="bibr" rid="b53-genes-02-00748">53</xref>,<xref ref-type="bibr" rid="b54-genes-02-00748">54</xref>], and the flux through each pathway can change appreciably after mutation [<xref ref-type="bibr" rid="b52-genes-02-00748">52</xref>,<xref ref-type="bibr" rid="b55-genes-02-00748">55</xref>,<xref ref-type="bibr" rid="b56-genes-02-00748">56</xref>]. Earlier stages of folding appear to be less conserved than later stages [<xref ref-type="bibr" rid="b57-genes-02-00748">57</xref>]. Variability in the ruggedness of the energy landscape containing a folding funnel [<xref ref-type="bibr" rid="b26-genes-02-00748">26</xref>] depending upon distance from the native state can explain these observations. In the early stages of folding, the funnel is very wide and multiple pathways may lead into it over a variety of transition states. 
As the bottom of the funnel is approached (for the classic funnel model with a single minimum), the width (<italic>i.e.</italic>, number of available conformations) shrinks and fewer pathway options exist for proteins with a single native state. Additionally, as the number of native contacts increases, the choice of pathways becomes increasingly dominated by the topology of the native state, including specific residue contacts [<xref ref-type="bibr" rid="b50-genes-02-00748">50</xref>,<xref ref-type="bibr" rid="b58-genes-02-00748">58</xref>]. The early and intermediate conformations are stabilized by various non-native contacts, which do not contribute to the stability of the fully folded state and are therefore under less selective pressure to be maintained within a fold if they are not necessary for proper folding. Ultimately, the shape of the folding funnel near the folded conformation and towards the edges of the native sequence landscape in the context of marginal stability is an open question, as is the existence of divergent structures dependent upon folding pathway for some protein families.</p></sec>
<sec>
<label>5.</label>
<title>Evolution of Conformational Ensembles and of Protein Dynamics</title>
<p>Given the potential continuity of fold space and of the underlying sequence space, it is clear that proteins can exist in conformational ensembles, both functionally and as evolutionary transitions. Beyond thermodynamic considerations of conformational ensembles is the role of kinetics in protein structure and function. This section will focus on the motion of individual proteins.</p>
<p>As a neutral baseline, Illergard <italic>et al.</italic> [<xref ref-type="bibr" rid="b6-genes-02-00748">6</xref>] established an approximately linear relationship between the rate of sequence evolution and the rate of structural divergence, the latter measured by root mean square deviation (RMSD) between static structures. There is a relationship between the lowest energy normal modes and the paths through which protein structure diverges through mutational opportunity [<xref ref-type="bibr" rid="b59-genes-02-00748">59</xref>]. Further, it has been established that the lowest normal modes also diverge approximately in proportion to structural divergence [<xref ref-type="bibr" rid="b60-genes-02-00748">60</xref>,<xref ref-type="bibr" rid="b61-genes-02-00748">61</xref>]. Deviations from this clock-like rate may be expected to reflect a functional signal, which may evolve particularly rapidly under processes like positive selective pressure. The hypothesis, then, is that rate accelerations in normal mode divergence may be useful in predicting functional divergence.</p>
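<p>For reference, the RMSD measure of structural divergence can be sketched in a few lines of Python. This minimal example assumes that the two coordinate sets contain the same atoms in the same order and have already been optimally superposed; the superposition step itself is omitted, and the function name and the toy coordinates are illustrative only.</p>
<preformat>
```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-superposed structures.

    coords_a and coords_b are equal-length sequences of (x, y, z)
    tuples for equivalent atoms (for example, aligned C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom displaced by one unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(x, y, z + 1.0) for x, y, z in reference]
print(rmsd(reference, shifted))
```
</preformat>
<p>In practice, RMSD values computed for pairs of homologs would be compared against their sequence divergence, and pairs that depart from the approximately linear trend would be the candidates of interest.</p>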
<p>A confounding factor is the role of post-translational modification in modifying thermodynamic and kinetic conformational ensemble stabilities, especially as patterns of post-translational modification can evolve rapidly on evolutionary timescales. As will be discussed further below, post-translational modification can alter the equilibrium in a conformational ensemble and may therefore play a more major role than is commonly attributed in protein structure determination. From an evolutionary perspective, selection on folding stability and pathway may interplay with selection on sites for post-translational modification.</p>
<p>Given that ensembles of structures can play functional roles and can be found as either evolutionary intermediates or as evolutionarily stable functional proteins, the question emerges, how do these proteins that are disordered or in rapidly shifting equilibria between ordered structures evolve?</p></sec>
<sec>
<label>6.</label>
<title>Evolution of Structurally Disordered Proteins</title>
<p>Study of the evolution of structurally disordered proteins is in its infancy. It has been predicted that the fraction of structurally disordered protein increases with organismal complexity [<xref ref-type="bibr" rid="b62-genes-02-00748">62</xref>], but why is unclear. This may be linked to the increase in the frequency of multidomain proteins with organismal complexity [<xref ref-type="bibr" rid="b63-genes-02-00748">63</xref>,<xref ref-type="bibr" rid="b64-genes-02-00748">64</xref>]. An increase in multidomain proteins also means more domain spacers or linkers, which often are structurally flexible. More fundamentally, more complex organisms (as defined by the number of distinct cell types) tend to have smaller population sizes and reduced strengths of selection. A null hypothesis for the rise of disorder in these lineages might simply be a reduction in the strength of selection along these lineages, including on proper protein folding [<xref ref-type="bibr" rid="b65-genes-02-00748">65</xref>]. To reject this hypothesis, we would need to detect selectable functions in disordered proteins that cannot be accomplished by ordered proteins. Fundamentally, we would also need to account for the ability to select for these features in evolutionary regimes where selection has less power, such as in small population size multicellular animals.</p>
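<p>The dependence of selective power on population size can be made concrete with Kimura's classical diffusion approximation for the fixation probability of a new mutation. The Python sketch below is illustrative only: it assumes a diploid population whose census and effective sizes are equal, additive selection with coefficient <monospace>s</monospace>, and moderate values of the product of population size and selection coefficient. The function names and example values are hypothetical and are not taken from the studies cited here.</p>
<preformat>
```python
import math


def fixation_probability(s, population_size):
    """Kimura's approximation for a new mutation at frequency 1/(2N).

    Assumes a diploid population with effective size equal to
    population_size and additive selection coefficient s.
    """
    n = population_size
    if s == 0:
        return 1.0 / (2 * n)  # neutral expectation
    # expm1 keeps precision for small s; the two negative signs cancel.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


def relative_to_neutral(s, population_size):
    """Fixation probability as a multiple of the neutral value 1/(2N)."""
    return fixation_probability(s, population_size) * 2 * population_size


# The same hypothetical advantage (s = 0.01) in a small and a large population.
for n in (10, 1000):
    print(n, relative_to_neutral(0.01, n))
```
</preformat>
<p>In this sketch the same selective advantage is nearly indistinguishable from neutrality in the small population, while it is strongly favored in the large one, which is the sense in which selection has less power in small populations.</p>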
<p>To understand how structural disorder evolves and if it is conserved or not, one needs to study the evolutionary dynamics of disordered regions in the phylogenetic context of homologous proteins. Studies of this kind are scarce, but it appears that structural order and disorder, as well as underlying secondary structural propensities, are conserved in some homologs, but not all [<xref ref-type="bibr" rid="b4-genes-02-00748">4</xref>]. For example, patterns of disorder, like other evolutionary features, appear more likely to shift among paralogs than among orthologs [<xref ref-type="bibr" rid="b4-genes-02-00748">4</xref>]. Further studies are needed to characterize these trends in greater detail. Like structured proteins, different structurally disordered proteins evolve at different rates, although there is a tendency for structurally disordered regions to evolve at higher amino acid substitution rates than structured proteins [<xref ref-type="bibr" rid="b66-genes-02-00748">66</xref>–<xref ref-type="bibr" rid="b68-genes-02-00748">68</xref>]. A recent effort to calculate a disordered protein specific substitution matrix also shows that specific matrices for these proteins can be generated [<xref ref-type="bibr" rid="b69-genes-02-00748">69</xref>] but unfortunately, the generality of such matrices is dependent upon the conservation of selective pressures within disordered regions and the conservation of disorder itself.</p>
<p>If we view proteins from the perspective of the folding energy landscape, the conformational dynamics vary from globular proteins with a well-defined global minimum to those that are present as highly dynamic ensembles of interconverting conformational states separated by low energy barriers, such as the structurally disordered proteins [<xref ref-type="bibr" rid="b70-genes-02-00748">70</xref>]. Structurally disordered proteins are prone to adopt different conformations (alter the conformational ensemble) in different environments and indeed structurally disordered regions show high conformational flexibility over different timescales and ranges of motion. Different conformational states are favored in interactions with different structural scaffolds and post-translational modifications are often involved in regulating conformational ensembles. As the structurally disordered proteins are characterized as conformational ensembles interconverting over a flattened energy landscape, mutations are likely to shift the conformational ensemble.</p>
<p>One of the mechanisms for generating novel or partitioned functions is through gene duplications/gene redundancy. It was recently shown that gene retention after gene duplication is higher for genes with many phosphorylation sites [<xref ref-type="bibr" rid="b71-genes-02-00748">71</xref>]. Structurally disordered proteins are enriched in phosphorylation sites and perhaps the thermodynamics of disorder in itself can provide an explanation. For globular structured proteins one main determinant for fixing a mutation is the effect of the mutation on the stability of the protein fold. Structurally disordered proteins are already less stable than the globular protein and exist as interconverting conformational ensembles. Therefore one might expect that these proteins will follow different rules. Here, a certain mutation may not abolish all conformations but simply a subset of the conformational ensemble. On shorter time scales, mutations that affect the equilibrium of the conformational ensemble can be regarded as influencing the function rather than the structure, while on longer time scales large changes in the conformational ensemble from a pair of gene duplicates may no longer overlap and can be regarded as changing the structure or fold. This would reflect a fold transition; a change from one fold or conformational ensemble into a distinctly different fold or conformational ensemble. Hence, structurally disordered proteins (proteins present as conformational ensembles) provide a mechanism for neostructuralization. An example of this concept is illustrated in <xref ref-type="fig" rid="f2-genes-02-00748">Figure 2</xref>.</p></sec>
<sec sec-type="methods">
<label>7.</label>
<title>Designability of Structurally Disordered Proteins</title>
<p>Structurally disordered proteins are present as conformations of very low stability, distributed over a locally flat energy landscape. A mutation is likely to rearrange the conformational equilibrium, and hence a single mutation can be stabilizing, neutral, and destabilizing for different parts of the conformational ensemble at the same time. A mutation can alter the conformational ensemble, making a subset of conformations essentially unpopulated, while functional conformations for which the mutation is stabilizing may gain population. This results in a new energy landscape. If the new energy landscape is slightly less flat and has a few deeper wells, it could result in mutation-driven conformational selection, which would explain how structurally disordered proteins or regions can speed up the evolution of the protein structural landscape. Hence, mutation-driven conformational selection contributes to neostructuralization, with different predominant conformations among homologs. Globular structured proteins that maintain their fold despite high sequence divergence have high designability (reviewed in [<xref ref-type="bibr" rid="b72-genes-02-00748">72</xref>]). Structurally disordered proteins evolve at elevated rates compared to many globular proteins [<xref ref-type="bibr" rid="b66-genes-02-00748">66</xref>–<xref ref-type="bibr" rid="b68-genes-02-00748">68</xref>], but does this mean that structural disorder has high designability with functional consequences, or does it mean that most substitutions do not change the conformational ensemble significantly and are in fact functionally neutral? Can evolution of structurally disordered proteins provide a mechanism for neutral mutations to drive biological divergence [<xref ref-type="bibr" rid="b73-genes-02-00748">73</xref>]?
Structurally disordered proteins are known to have a broad functional spectrum (reviewed in [<xref ref-type="bibr" rid="b74-genes-02-00748">74</xref>]), and this can lead to functional partitioning after gene duplication. In a more subtle case, a change in genotype that affects the conformational ensemble of a structurally disordered protein can generate a small change in phenotype. If several conformational ensembles are altered in a small but cooperative manner, it could provide an underlying mechanism for structural divergence driving functional and phenotypic differences.</p>
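<p>The redistribution of a conformational ensemble by a single mutation can be illustrated with a minimal numerical sketch. It assumes that each conformation is populated according to its Boltzmann weight at equilibrium. The conformation names, the free energies, the per-conformation energy changes caused by the mutation, and the value of kT (roughly 0.593 kcal/mol near 298 K) are hypothetical values chosen only for illustration; they are not taken from any protein or model discussed in this review.</p>
<preformat>
```python
import math

KT = 0.593  # assumed thermal energy in kcal/mol (about 298 K)


def boltzmann_populations(energies, kt=KT):
    """Return the equilibrium population of each conformation.

    energies maps a conformation name to its free energy (kcal/mol).
    """
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {name: math.exp(-(e - e_min) / kt) for name, e in energies.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def apply_mutation(energies, ddg):
    """Return a new landscape after adding per-conformation energy changes."""
    return {name: e + ddg.get(name, 0.0) for name, e in energies.items()}


# Hypothetical, nearly flat landscape with four conformations.
wild_type = {"A": 0.0, "B": 0.1, "C": 0.2, "D": 0.3}

# Hypothetical mutation: stabilizes B, destabilizes D,
# and leaves the energies of A and C unchanged.
ddg = {"B": -1.5, "D": 2.0}

mutant = apply_mutation(wild_type, ddg)
wt_pop = boltzmann_populations(wild_type)
mut_pop = boltzmann_populations(mutant)

for name in sorted(wild_type):
    print(f"{name}: {wt_pop[name]:.3f} to {mut_pop[name]:.3f}")
```
</preformat>
<p>Under these assumed values, the stabilized conformation B becomes the predominant state and D becomes essentially unpopulated. Conformations A and C also lose population even though their own energies are unchanged, because populations are normalized over the whole ensemble; the landscape has become less flat, with one deeper well.</p>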
<p>A biophysical understanding of the evolutionary behavior of ordered and disordered proteins motivates the goal of modeling protein evolution more realistically.</p></sec>
<sec>
<label>8.</label>
<title>Modeling Evolution of Structurally Ordered Proteins</title>
<p>An overview of methods for modeling the evolution of structurally ordered proteins has recently been presented [<xref ref-type="bibr" rid="b24-genes-02-00748">24</xref>] and will only be summarized here. Two research trajectories have emerged that model the evolution of sequences in structurally ordered regions. One trajectory is retrospective analysis, particularly the construction of phylogenetic trees [<xref ref-type="bibr" rid="b75-genes-02-00748">75</xref>–<xref ref-type="bibr" rid="b77-genes-02-00748">77</xref>], where structural and biophysical considerations are viewed as an integral component of the evolution of proteins over long evolutionary distances. Here, attempts have been made to replace purely statistical models, which account for structure through either a gamma distribution or a covarion process [<xref ref-type="bibr" rid="b78-genes-02-00748">78</xref>]. The second trajectory is the forward evolution of proteins, or sequence simulation constrained by a fold that does not vary [<xref ref-type="bibr" rid="b18-genes-02-00748">18</xref>,<xref ref-type="bibr" rid="b19-genes-02-00748">19</xref>].</p>
<p>For both of these trajectories, two classes of models are available: informational and physical. In informational models, average interaction propensities extracted from the PDB are summarized in matrices that reflect informational potentials [<xref ref-type="bibr" rid="b79-genes-02-00748">79</xref>,<xref ref-type="bibr" rid="b80-genes-02-00748">80</xref>]. These models can suffer from a lack of folding specificity [<xref ref-type="bibr" rid="b19-genes-02-00748">19</xref>,<xref ref-type="bibr" rid="b75-genes-02-00748">75</xref>–<xref ref-type="bibr" rid="b77-genes-02-00748">77</xref>]. An alternative is the use of models rooted in the physical principles of inter-atomic or inter-residue interaction. Because of the large number of calculations involved in both forward and retrospective evolutionary analysis, some degree of coarse-graining is necessary. The early physical coarse-grained models appear to be more specific than the informational potentials but still have barriers to overcome, including a representation of side chains that leads to a properly packed hydrophobic core [<xref ref-type="bibr" rid="b19-genes-02-00748">19</xref>]. Research in these trajectories is ongoing.</p></sec>
<sec>
<label>9.</label>
<title>Modeling Evolution of Structurally Disordered Proteins</title>
<p>One important trajectory will be to extend the models for structurally ordered regions to structurally disordered regions. Structurally disordered regions are functional in two key ways. Some structurally disordered regions become ordered upon binding and function as ordered regions [<xref ref-type="bibr" rid="b81-genes-02-00748">81</xref>]. In this case, the problem is simpler in that the proteins can be simulated as ordered while accounting in the model for the energy associated with the order-to-disorder transition. Initially, this will only approximate those differences in the energy of this transition for different binding partners that are not reflected in the energy differences already accounted for in the modeled ordered state. Nothing along these lines has yet been implemented.</p>
<p>A second class of disordered proteins comprises those that function as disordered regions [<xref ref-type="bibr" rid="b81-genes-02-00748">81</xref>]. To model such proteins, it will be important to uncover the sequence constraints that their disorder must satisfy to be functional, as these will be reflected in a departure from neutrality in evolutionary rate. To the extent that this is a sequence rather than a structural constraint, standard Markov models will likely be appropriate [<xref ref-type="bibr" rid="b68-genes-02-00748">68</xref>]. One pitfall of Markov models is that they generalize evolutionary properties that may be context dependent, and better models are not conceivable without a better understanding of the evolutionary and biophysical properties of disordered regions.</p>
<p>In both cases, an important added constraint may be that the sequence in its unbound state is disordered rather than ordered. This constraint can be added to the model to select against mutations that would lead to a folded state. A random contacts model [<xref ref-type="bibr" rid="b82-genes-02-00748">82</xref>] could be implemented; a feature of this nature is implemented in IUPred, based upon an evaluation of the existence of favorable contacts for folding within the region [<xref ref-type="bibr" rid="b83-genes-02-00748">83</xref>].</p></sec>
<sec sec-type="conclusions">
<label>10.</label>
<title>Conclusions</title>
<p>As computational molecular biology and computational molecular evolution mature as fields, considerations of both the biophysical and the evolutionary attributes of proteins are increasingly being integrated. This coincides with an appreciation of the complexity of the biophysical chemistry of proteins in a cell, including the role of conformational ensembles, of post-translational modifications, of folding pathways, of protein kinetics, of protein complexes, and eventually of other cellular attributes, such as the role of chaperones. This is ultimately underpinned by an understanding of the energy landscape for a single sequence and for homologous sequences linked through the mutational process. Simultaneously, in the field of structural bioinformatics, protein structural and biophysical models will increasingly need to consider evolutionary processes explicitly. With these considerations, models will become more powerful (and slower) as the field moves forward.</p></sec></body>
<back>
<sec sec-type="display-objects">
<title>Figures</title>
<fig id="f1-genes-02-00748" position="float">
<label>Figure 1.</label>
<caption>
<p>A possible conformational energy landscape for a typical structured protein. The protein has two alternative folding pathways (top), proceeding from the unfolded state (U) to the native state (N) through one (I2) or two (I1A, I1B) intermediate conformations. The funnel-shaped landscape guarantees rapid folding to the native state, passing various metastable states with different rates of interconversion on the way. The shaded area near the native state indicates the magnitude of change in folding energy that is selectively neutral (dependent upon population size Ne and selective pressure s).</p></caption>
<graphic xlink:href="genes-02-00748f1.gif"/></fig>
<fig id="f2-genes-02-00748" position="float">
<label>Figure 2.</label>
<caption>
<p>Evolution of an energy landscape and its conformational ensemble after gene duplication. At the root, the gene giving rise to the protein with the blue energy landscape, which results in conformations A to G, is duplicated. At the next speciation event, the two gene copies have evolved along different trajectories. The blue copy at the speciation node has evolved under negative selection and resembles the ancestral blue. The green copy at the speciation node has evolved under positive selection: of the original conformational ensemble, conformations F and G no longer form, but a new conformation, H, does. In addition, the equilibrium of the conformations is different in the blue <italic>vs.</italic> green energy landscapes. From the speciation node down to the extant sequences, blue is highly conserved, while green, although under negative selection, loses conformation D in one lineage. Analysis of the extant sequences would show that blue and green are structurally disordered homologs. However, although all these proteins are structurally disordered, the conformational ensembles differ between blue and green (while being the same within the blue copies and very similar within the green copies).</p></caption>
<graphic xlink:href="genes-02-00748f2.gif"/></fig></sec>
<ack>
<p>We thank Jan Kubelka, Richard Goldstein, Vladimir Uversky, and three anonymous reviewers for helpful discussions. JSL, JAG, and DAL receive funding from NIH-INBRE P20 RR016474. DAL also receives funding from NSF DBI-0743374.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-genes-02-00748"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname><given-names>L.</given-names></name><name><surname>Bourne</surname><given-names>P.E.</given-names></name></person-group><article-title>Functional coverage of the human genome by existing structures, structural genomics targets, and homology models</article-title><source>PLoS Comput. Biol.</source><year>2005</year><volume>1</volume><pub-id pub-id-type="doi">10.1371/journal.pcbi.0010031</pub-id></citation></ref>
<ref id="b2-genes-02-00748"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname><given-names>Z.T.</given-names></name><name><surname>Baldwin</surname><given-names>T.O.</given-names></name><name><surname>Miyashita</surname><given-names>O.</given-names></name></person-group><article-title>Analysis of the bacterial luciferase mobile loop by replica-exchange molecular dynamics</article-title><source>Biophys. J.</source><year>2010</year><volume>99</volume><fpage>4012</fpage><lpage>4019</lpage><pub-id pub-id-type="doi">10.1016/j.bpj.2010.11.001</pub-id><pub-id pub-id-type="pmid">21156144</pub-id></citation></ref>
<ref id="b3-genes-02-00748"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pearl</surname><given-names>F.M.</given-names></name><name><surname>Bennett</surname><given-names>C.F.</given-names></name><name><surname>Bray</surname><given-names>J.E.</given-names></name><name><surname>Harrison</surname><given-names>A.P.</given-names></name><name><surname>Martin</surname><given-names>N.</given-names></name><name><surname>Shepherd</surname><given-names>A.</given-names></name><name><surname>Sillitoe</surname><given-names>I.</given-names></name><name><surname>Thornton</surname><given-names>J.</given-names></name><name><surname>Orengo</surname><given-names>C.A.</given-names></name></person-group><article-title>The CATH database: An extended protein family resource for structural and functional genomics</article-title><source>Nucleic Acids Res.</source><year>2003</year><volume>31</volume><fpage>452</fpage><lpage>455</lpage><pub-id pub-id-type="doi">10.1093/nar/gkg062</pub-id><pub-id pub-id-type="pmid">12520050</pub-id></citation></ref>
<ref id="b4-genes-02-00748"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Siltberg-Liberles</surname><given-names>J.</given-names></name></person-group><article-title>Evolution of structurally disordered proteins promotes neostructuralization</article-title><source>Mol. Biol. Evol.</source><year>2011</year><volume>28</volume><fpage>59</fpage><lpage>62</lpage><pub-id pub-id-type="doi">10.1093/molbev/msq291</pub-id><pub-id pub-id-type="pmid">21037204</pub-id></citation></ref>
<ref id="b5-genes-02-00748"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roth</surname><given-names>C.</given-names></name><name><surname>Liberles</surname><given-names>D.A.</given-names></name></person-group><article-title>A systematic search for positive selection in higher plants (Embryophytes)</article-title><source>BMC Plant Biol.</source><year>2006</year><volume>6</volume><fpage>12</fpage><pub-id pub-id-type="doi">10.1186/1471-2229-6-12</pub-id></citation></ref>
<ref id="b6-genes-02-00748"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Illergård</surname><given-names>K.</given-names></name><name><surname>Ardell</surname><given-names>D.H.</given-names></name><name><surname>Elofsson</surname><given-names>A.</given-names></name></person-group><article-title>Structure is three to ten times more conserved than sequence—A study of structural response in protein cores</article-title><source>Proteins</source><year>2009</year><volume>77</volume><fpage>499</fpage><lpage>508</lpage><pub-id pub-id-type="doi">10.1002/prot.22458</pub-id><pub-id pub-id-type="pmid">19507241</pub-id></citation></ref>
<ref id="b7-genes-02-00748"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pereira de Araujo</surname><given-names>A.F.</given-names></name><name><surname>Onuchic</surname><given-names>J.N.</given-names></name></person-group><article-title>A sequence-compatible amount of native burial information is sufficient for determining the structure of small globular proteins</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2009</year><volume>106</volume><fpage>19001</fpage><lpage>19004</lpage><pub-id pub-id-type="doi">10.1073/pnas.0910851106</pub-id><pub-id pub-id-type="pmid">19858496</pub-id></citation></ref>
<ref id="b8-genes-02-00748"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ramsey</surname><given-names>D.C.</given-names></name><name><surname>Scherrer</surname><given-names>M.P.</given-names></name><name><surname>Zhou</surname><given-names>T.</given-names></name><name><surname>Wilke</surname><given-names>C.O.</given-names></name></person-group><article-title>The relationship between relative solvent accessibility and evolutionary rate in protein evolution</article-title><source>Genetics</source><year>2011</year><volume>188</volume><fpage>479</fpage><lpage>488</lpage><pub-id pub-id-type="doi">10.1534/genetics.111.128025</pub-id><pub-id pub-id-type="pmid">21467571</pub-id></citation></ref>
<ref id="b9-genes-02-00748"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Williams</surname><given-names>S.G.</given-names></name><name><surname>Lovell</surname><given-names>S.C.</given-names></name></person-group><article-title>The effect of sequence evolution on protein structural divergence</article-title><source>Mol. Biol. Evol.</source><year>2009</year><volume>26</volume><fpage>1055</fpage><lpage>1065</lpage><pub-id pub-id-type="doi">10.1093/molbev/msp020</pub-id><pub-id pub-id-type="pmid">19193735</pub-id></citation></ref>
<ref id="b10-genes-02-00748"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kellogg</surname><given-names>E.H.</given-names></name><name><surname>Leaver-Fay</surname><given-names>A.</given-names></name><name><surname>Baker</surname><given-names>D.</given-names></name></person-group><article-title>Role of conformational sampling in computing mutation-induced changes in protein structure and stability</article-title><source>Proteins</source><year>2011</year><volume>79</volume><fpage>830</fpage><lpage>838</lpage><pub-id pub-id-type="doi">10.1002/prot.22921</pub-id><pub-id pub-id-type="pmid">21287615</pub-id></citation></ref>
<ref id="b11-genes-02-00748"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lockless</surname><given-names>S.W.</given-names></name><name><surname>Ranganathan</surname><given-names>R.</given-names></name></person-group><article-title>Evolutionarily conserved pathways of energetic connectivity in protein families</article-title><source>Science</source><year>1999</year><volume>286</volume><fpage>295</fpage><lpage>299</lpage><pub-id pub-id-type="doi">10.1126/science.286.5438.295</pub-id><pub-id pub-id-type="pmid">10514373</pub-id></citation></ref>
<ref id="b12-genes-02-00748"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Böde</surname><given-names>C.</given-names></name><name><surname>Kovács</surname><given-names>I.A.</given-names></name><name><surname>Szalay</surname><given-names>M.S.</given-names></name><name><surname>Palotai</surname><given-names>R.</given-names></name><name><surname>Korcsmáros</surname><given-names>T.</given-names></name><name><surname>Csermely</surname><given-names>P.</given-names></name></person-group><article-title>Network analysis of protein dynamics</article-title><source>FEBS Lett.</source><year>2007</year><volume>581</volume><fpage>2776</fpage><lpage>2782</lpage><pub-id pub-id-type="doi">10.1016/j.febslet.2007.05.021</pub-id><pub-id pub-id-type="pmid">17531981</pub-id></citation></ref>
<ref id="b13-genes-02-00748"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pechmann</surname><given-names>S.</given-names></name><name><surname>Levy</surname><given-names>E.D.</given-names></name><name><surname>Tartaglia</surname><given-names>G.G.</given-names></name><name><surname>Vendruscolo</surname><given-names>M.</given-names></name></person-group><article-title>Physicochemical principles that regulate the competition between functional and dysfunctional association of proteins</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2009</year><volume>106</volume><fpage>10159</fpage><lpage>10164</lpage><pub-id pub-id-type="doi">10.1073/pnas.0812414106</pub-id><pub-id pub-id-type="pmid">19502422</pub-id></citation></ref>
<ref id="b14-genes-02-00748"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schaefer</surname><given-names>C.</given-names></name><name><surname>Schlessinger</surname><given-names>A.</given-names></name><name><surname>Rost</surname><given-names>B.</given-names></name></person-group><article-title>Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be</article-title><source>Bioinformatics</source><year>2010</year><volume>26</volume><fpage>625</fpage><lpage>631</lpage><pub-id pub-id-type="doi">10.1093/bioinformatics/btq012</pub-id><pub-id pub-id-type="pmid">20081223</pub-id></citation></ref>
<ref id="b15-genes-02-00748"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kamneva</surname><given-names>O.K.</given-names></name><name><surname>Liberles</surname><given-names>D.A.</given-names></name><name><surname>Ward</surname><given-names>N.L.</given-names></name></person-group><article-title>Genome-wide influence of indel substitutions on evolution of bacteria of the PVC superphylum, revealed using a novel computational method</article-title><source>Genome Biol. Evol.</source><year>2010</year><fpage>870</fpage><lpage>886</lpage></citation></ref>
<ref id="b16-genes-02-00748"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finn</surname><given-names>R.D.</given-names></name><name><surname>Mistry</surname><given-names>J.</given-names></name><name><surname>Tate</surname><given-names>J.</given-names></name><name><surname>Coggill</surname><given-names>P.</given-names></name><name><surname>Heger</surname><given-names>A.</given-names></name><name><surname>Pollington</surname><given-names>J.E.</given-names></name><name><surname>Gavin</surname><given-names>O.L.</given-names></name><name><surname>Gunasekaran</surname><given-names>P.</given-names></name><name><surname>Ceric</surname><given-names>G.</given-names></name><name><surname>Forslund</surname><given-names>K.</given-names></name><etal/></person-group><article-title>The Pfam protein families database</article-title><source>Nucleic Acids Res.</source><year>2010</year><volume>38</volume><fpage>D211</fpage><lpage>D222</lpage><pub-id pub-id-type="doi">10.1093/nar/gkp985</pub-id><pub-id pub-id-type="pmid">19920124</pub-id></citation></ref>
<ref id="b17-genes-02-00748"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Povolotskaya</surname><given-names>I.S.</given-names></name><name><surname>Kondrashov</surname><given-names>F.A.</given-names></name></person-group><article-title>Sequence space and the ongoing expansion of the protein universe</article-title><source>Nature</source><year>2010</year><volume>465</volume><fpage>922</fpage><lpage>926</lpage><pub-id pub-id-type="doi">10.1038/nature09105</pub-id><pub-id pub-id-type="pmid">20485343</pub-id></citation></ref>
<ref id="b18-genes-02-00748"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rastogi</surname><given-names>S.</given-names></name><name><surname>Reuter</surname><given-names>N.</given-names></name><name><surname>Liberles</surname><given-names>D.A.</given-names></name></person-group><article-title>Evaluation of models for the evolution of protein sequences and functions under structural constraint</article-title><source>Biophys. Chem.</source><year>2006</year><volume>124</volume><fpage>134</fpage><lpage>144</lpage><pub-id pub-id-type="doi">10.1016/j.bpc.2006.06.008</pub-id><pub-id pub-id-type="pmid">16837122</pub-id></citation></ref>
<ref id="b19-genes-02-00748"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grahnen</surname><given-names>J.A.</given-names></name><name><surname>Nandakumar</surname><given-names>P.</given-names></name><name><surname>Kubelka</surname><given-names>J.</given-names></name><name><surname>Liberles</surname><given-names>D.A.</given-names></name></person-group><article-title>Biophysical and Structural Considerations for Protein Evolution</article-title><source>BMC Evol. Biol.</source><year>2011</year><comment>submitted</comment></citation></ref>
<ref id="b20-genes-02-00748"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alvizo</surname><given-names>O.</given-names></name><name><surname>Mayo</surname><given-names>S.L.</given-names></name></person-group><article-title>Evaluating and optimizing computational protein design force fields using fixed composition-based negative design</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2008</year><volume>105</volume><fpage>12242</fpage><lpage>12247</lpage><pub-id pub-id-type="doi">10.1073/pnas.0805858105</pub-id><pub-id pub-id-type="pmid">18708527</pub-id></citation></ref>
<ref id="b21-genes-02-00748"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ding</surname><given-names>F.</given-names></name><name><surname>Dokholyan</surname><given-names>N.V.</given-names></name></person-group><article-title>Emergence of protein fold families through rational design</article-title><source>PLoS Comput. Biol.</source><year>2006</year><volume>2</volume><pub-id pub-id-type="doi">10.1371/journal.pcbi.0020085</pub-id></citation></ref>
<ref id="b22-genes-02-00748"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dalal</surname><given-names>S.</given-names></name><name><surname>Balasubramanian</surname><given-names>S.</given-names></name><name><surname>Regan</surname><given-names>L.</given-names></name></person-group><article-title>Transmuting alpha helices and beta sheets</article-title><source>Fold. Des.</source><year>1997</year><volume>2</volume><fpage>R71</fpage><lpage>79</lpage><pub-id pub-id-type="doi">10.1016/S1359-0278(97)00036-9</pub-id><pub-id pub-id-type="pmid">9377709</pub-id></citation></ref>
<ref id="b23-genes-02-00748"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Drummond</surname><given-names>D.A.</given-names></name><name><surname>Wilke</surname><given-names>C.O.</given-names></name></person-group><article-title>Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution</article-title><source>Cell</source><year>2008</year><volume>134</volume><fpage>341</fpage><lpage>352</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2008.05.042</pub-id><pub-id pub-id-type="pmid">18662548</pub-id></citation></ref>
<ref id="b24-genes-02-00748"><label>24.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Teufel</surname><given-names>A.I.</given-names></name><name><surname>Grahnen</surname><given-names>J.A.</given-names></name><name><surname>Liberles</surname><given-names>D.A.</given-names></name></person-group><article-title>Modeling Proteins at the Interface of Structure, Evolution, and Population Genetics</article-title><source>Computational Modeling of Biological Systems: From Molecules to Pathways</source><person-group person-group-type="editor"><name><surname>Dokholyan</surname><given-names>N.</given-names></name></person-group><publisher-name>Springer-Verlag</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2011</year><comment>in press</comment></citation></ref>
<ref id="b25-genes-02-00748"><label>25.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fernández</surname><given-names>A.</given-names></name><name><surname>Lynch</surname><given-names>M.</given-names></name></person-group><article-title>Non-adaptive origins of interactome complexity</article-title><source>Nature</source><year>2011</year><volume>474</volume><fpage>502</fpage><lpage>505</lpage><pub-id pub-id-type="doi">10.1038/nature09992</pub-id><pub-id pub-id-type="pmid">21593762</pub-id></citation></ref>
<ref id="b26-genes-02-00748"><label>26.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolynes</surname><given-names>P.G.</given-names></name></person-group><article-title>Recent successes of the energy landscape theory of protein folding and function</article-title><source>Q. Rev. Biophys.</source><year>2005</year><volume>38</volume><fpage>405</fpage><lpage>410</lpage><pub-id pub-id-type="doi">10.1017/S0033583505004075</pub-id><pub-id pub-id-type="pmid">16934172</pub-id></citation></ref>
<ref id="b27-genes-02-00748"><label>27.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taverna</surname><given-names>D.M.</given-names></name><name><surname>Goldstein</surname><given-names>R.A.</given-names></name></person-group><article-title>Why are proteins so robust to site mutations?</article-title><source>J. Mol. Biol.</source><year>2002</year><volume>315</volume><fpage>479</fpage><lpage>484</lpage><pub-id pub-id-type="doi">10.1006/jmbi.2001.5226</pub-id><pub-id pub-id-type="pmid">11786027</pub-id></citation></ref>
<ref id="b28-genes-02-00748"><label>28.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kumar</surname><given-names>M.D.</given-names></name><name><surname>Bava</surname><given-names>K.A.</given-names></name><name><surname>Gromiha</surname><given-names>M.M.</given-names></name><name><surname>Prabakaran</surname><given-names>P.</given-names></name><name><surname>Kitajima</surname><given-names>K.</given-names></name><name><surname>Uedaira</surname><given-names>H.</given-names></name><name><surname>Sarai</surname><given-names>A.</given-names></name></person-group><article-title>ProTherm and ProNIT: Thermodynamic databases for proteins and protein-nucleic acid interactions</article-title><source>Nucleic Acids Res.</source><year>2006</year><volume>34</volume><fpage>D204</fpage><lpage>206</lpage><pub-id pub-id-type="doi">10.1093/nar/gkj103</pub-id><pub-id pub-id-type="pmid">16381846</pub-id></citation></ref>
<ref id="b29-genes-02-00748"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soskine</surname><given-names>M.</given-names></name><name><surname>Tawfik</surname><given-names>D.S.</given-names></name></person-group><article-title>Mutational effects and the evolution of new protein functions</article-title><source>Nat. Rev. Genet.</source><year>2010</year><volume>11</volume><fpage>572</fpage><lpage>582</lpage><pub-id pub-id-type="pmid">20634811</pub-id></citation></ref>
<ref id="b30-genes-02-00748"><label>30.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taverna</surname><given-names>D.M.</given-names></name><name><surname>Goldstein</surname><given-names>R.A.</given-names></name></person-group><article-title>Why are proteins marginally stable?</article-title><source>Proteins</source><year>2002</year><volume>46</volume><fpage>105</fpage><lpage>109</lpage><pub-id pub-id-type="doi">10.1002/prot.10016</pub-id><pub-id pub-id-type="pmid">11746707</pub-id></citation></ref>
<ref id="b31-genes-02-00748"><label>31.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldstein</surname><given-names>R.A.</given-names></name></person-group><article-title>The evolution and evolutionary consequences of marginal thermostability in proteins</article-title><source>Proteins</source><year>2011</year><volume>79</volume><fpage>1396</fpage><lpage>1407</lpage><pub-id pub-id-type="doi">10.1002/prot.22964</pub-id><pub-id pub-id-type="pmid">21337623</pub-id></citation></ref>
<ref id="b32-genes-02-00748"><label>32.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berezovsky</surname><given-names>I.N.</given-names></name><name><surname>Shakhnovich</surname><given-names>E.I.</given-names></name></person-group><article-title>Physics and evolution of thermophilic adaptation</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2005</year><volume>102</volume><fpage>12742</fpage><lpage>12747</lpage><pub-id pub-id-type="doi">10.1073/pnas.0503890102</pub-id><pub-id pub-id-type="pmid">16120678</pub-id></citation></ref>
<ref id="b33-genes-02-00748"><label>33.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>DePristo</surname><given-names>M.A.</given-names></name><name><surname>Weinreich</surname><given-names>D.M.</given-names></name><name><surname>Hartl</surname><given-names>D.L.</given-names></name></person-group><article-title>Missense meanderings in sequence space: A biophysical view of protein evolution</article-title><source>Nat. Rev. Genet.</source><year>2005</year><volume>6</volume><fpage>678</fpage><lpage>687</lpage><pub-id pub-id-type="doi">10.1038/nrg1672</pub-id><pub-id pub-id-type="pmid">16074985</pub-id></citation></ref>
<ref id="b34-genes-02-00748"><label>34.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tokuriki</surname><given-names>N.</given-names></name><name><surname>Stricher</surname><given-names>F.</given-names></name><name><surname>Serrano</surname><given-names>L.</given-names></name><name><surname>Tawfik</surname><given-names>D.S.</given-names></name></person-group><article-title>How protein stability and new functions trade off</article-title><source>PLoS Comput. Biol.</source><year>2008</year><volume>4</volume><pub-id pub-id-type="doi">10.1371/journal.pcbi.1000002</pub-id></citation></ref>
<ref id="b35-genes-02-00748"><label>35.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shakhnovich</surname><given-names>B.E.</given-names></name><name><surname>Deeds</surname><given-names>E.</given-names></name><name><surname>Delisi</surname><given-names>C.</given-names></name><name><surname>Shakhnovich</surname><given-names>E.</given-names></name></person-group><article-title>Protein structure and evolutionary history determine sequence space topology</article-title><source>Genome Res.</source><year>2005</year><volume>15</volume><fpage>385</fpage><lpage>392</lpage><pub-id pub-id-type="doi">10.1101/gr.3133605</pub-id><pub-id pub-id-type="pmid">15741509</pub-id></citation></ref>
<ref id="b36-genes-02-00748"><label>36.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bolon</surname><given-names>D.N.</given-names></name><name><surname>Grant</surname><given-names>R.A.</given-names></name><name><surname>Baker</surname><given-names>T.A.</given-names></name><name><surname>Sauer</surname><given-names>R.T.</given-names></name></person-group><article-title>Specificity <italic>versus</italic> stability in computational protein design</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2005</year><volume>102</volume><fpage>12724</fpage><lpage>12729</lpage><pub-id pub-id-type="doi">10.1073/pnas.0506124102</pub-id><pub-id pub-id-type="pmid">16129838</pub-id></citation></ref>
<ref id="b37-genes-02-00748"><label>37.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ohta</surname><given-names>T.</given-names></name><name><surname>Gillespie</surname><given-names>J.H.</given-names></name></person-group><article-title>Development of neutral and nearly neutral theories</article-title><source>Theor. Popul. Biol.</source><year>1996</year><volume>49</volume><fpage>128</fpage><lpage>142</lpage><pub-id pub-id-type="doi">10.1006/tpbi.1996.0007</pub-id><pub-id pub-id-type="pmid">8813019</pub-id></citation></ref>
<ref id="b38-genes-02-00748"><label>38.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hietpas</surname><given-names>R.T.</given-names></name><name><surname>Jensen</surname><given-names>J.D.</given-names></name><name><surname>Bolon</surname><given-names>D.N.</given-names></name></person-group><article-title>Experimental illumination of a fitness landscape</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2011</year><volume>108</volume><fpage>7896</fpage><lpage>7901</lpage><pub-id pub-id-type="doi">10.1073/pnas.1016024108</pub-id><pub-id pub-id-type="pmid">21464309</pub-id></citation></ref>
<ref id="b39-genes-02-00748"><label>39.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lind</surname><given-names>P.A.</given-names></name><name><surname>Berg</surname><given-names>O.G.</given-names></name><name><surname>Andersson</surname><given-names>D.I.</given-names></name></person-group><article-title>Mutational robustness of ribosomal protein genes</article-title><source>Science</source><year>2010</year><volume>330</volume><fpage>825</fpage><lpage>827</lpage><pub-id pub-id-type="doi">10.1126/science.1194617</pub-id><pub-id pub-id-type="pmid">21051637</pub-id></citation></ref>
<ref id="b40-genes-02-00748"><label>40.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wylie</surname><given-names>C.S.</given-names></name><name><surname>Shakhnovich</surname><given-names>E.I.</given-names></name></person-group><article-title>A biophysical protein folding model accounts for most mutational fitness effects in viruses</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2011</year><volume>108</volume><fpage>9916</fpage><lpage>9921</lpage><pub-id pub-id-type="doi">10.1073/pnas.1017572108</pub-id><pub-id pub-id-type="pmid">21610162</pub-id></citation></ref>
<ref id="b41-genes-02-00748"><label>41.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hou</surname><given-names>J.</given-names></name><name><surname>Jun</surname><given-names>S.R.</given-names></name><name><surname>Zhang</surname><given-names>C.</given-names></name><name><surname>Kim</surname><given-names>S.H.</given-names></name></person-group><article-title>Global mapping of the protein structure space and application in structure-based inference of protein function</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2005</year><volume>102</volume><fpage>3651</fpage><lpage>3656</lpage><pub-id pub-id-type="doi">10.1073/pnas.0409772102</pub-id><pub-id pub-id-type="pmid">15705717</pub-id></citation></ref>
<ref id="b42-genes-02-00748"><label>42.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Skolnick</surname><given-names>J.</given-names></name><name><surname>Arakaki</surname><given-names>A.K.</given-names></name><name><surname>Lee</surname><given-names>S.Y.</given-names></name><name><surname>Brylinski</surname><given-names>M.</given-names></name></person-group><article-title>The continuity of protein structure space is an intrinsic property of proteins</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2009</year><volume>106</volume><fpage>15690</fpage><lpage>15695</lpage><pub-id pub-id-type="doi">10.1073/pnas.0907683106</pub-id><pub-id pub-id-type="pmid">19805219</pub-id></citation></ref>
<ref id="b43-genes-02-00748"><label>43.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pascual-García</surname><given-names>A.</given-names></name><name><surname>Abia</surname><given-names>D.</given-names></name><name><surname>Ortiz</surname><given-names>A.R.</given-names></name><name><surname>Bastolla</surname><given-names>U.</given-names></name></person-group><article-title>Cross-over between discrete and continuous protein structure space: Insights into automatic classification and networks of protein structures</article-title><source>PLoS Comput. Biol.</source><year>2009</year><volume>5</volume><pub-id pub-id-type="doi">10.1371/journal.pcbi.1000331</pub-id></citation></ref>
<ref id="b44-genes-02-00748"><label>44.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Osadchy</surname><given-names>M.</given-names></name><name><surname>Kolodny</surname><given-names>R.</given-names></name></person-group><article-title>Maps of protein structure space reveal a fundamental relationship between protein structure and function</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2011</year><volume>108</volume><fpage>12301</fpage><lpage>12306</lpage><pub-id pub-id-type="doi">10.1073/pnas.1102727108</pub-id><pub-id pub-id-type="pmid">21737750</pub-id></citation></ref>
<ref id="b45-genes-02-00748"><label>45.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andreeva</surname><given-names>A.</given-names></name><name><surname>Howorth</surname><given-names>D.</given-names></name><name><surname>Chandonia</surname><given-names>J.M.</given-names></name><name><surname>Brenner</surname><given-names>S.E.</given-names></name><name><surname>Hubbard</surname><given-names>T.J.</given-names></name><name><surname>Chothia</surname><given-names>C.</given-names></name><name><surname>Murzin</surname><given-names>A.G.</given-names></name></person-group><article-title>Data growth and its impact on the SCOP database: New developments</article-title><source>Nucleic Acids Res.</source><year>2008</year><volume>36</volume><fpage>D419</fpage><lpage>D425</lpage><pub-id pub-id-type="pmid">18000004</pub-id></citation></ref>
<ref id="b46-genes-02-00748"><label>46.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weiner</surname><given-names>J.</given-names><suffix>3rd</suffix></name><name><surname>Bornberg-Bauer</surname><given-names>E.</given-names></name></person-group><article-title>Evolution of circular permutations in multidomain proteins</article-title><source>Mol. Biol. Evol.</source><year>2006</year><volume>23</volume><fpage>734</fpage><lpage>743</lpage><pub-id pub-id-type="doi">10.1093/molbev/msj091</pub-id><pub-id pub-id-type="pmid">16431849</pub-id></citation></ref>
<ref id="b47-genes-02-00748"><label>47.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kimchi-Sarfaty</surname><given-names>C.</given-names></name><name><surname>Oh</surname><given-names>J.M.</given-names></name><name><surname>Kim</surname><given-names>I.W.</given-names></name><name><surname>Sauna</surname><given-names>Z.E.</given-names></name><name><surname>Calcagno</surname><given-names>A.M.</given-names></name><name><surname>Ambudkar</surname><given-names>S.V.</given-names></name><name><surname>Gottesman</surname><given-names>M.M.</given-names></name></person-group><article-title>A “silent” polymorphism in the MDR1 gene changes substrate specificity</article-title><source>Science</source><year>2007</year><volume>315</volume><fpage>525</fpage><lpage>528</lpage><pub-id pub-id-type="doi">10.1126/science.1135308</pub-id><pub-id pub-id-type="pmid">17185560</pub-id></citation></ref>
<ref id="b48-genes-02-00748"><label>48.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hollecker</surname><given-names>M.</given-names></name><name><surname>Creighton</surname><given-names>T.E.</given-names></name></person-group><article-title>Evolutionary conservation and variation of protein folding pathways. Two protease inhibitor homologues from black mamba venom</article-title><source>J. Mol. Biol.</source><year>1983</year><volume>168</volume><fpage>409</fpage><lpage>437</lpage><pub-id pub-id-type="doi">10.1016/S0022-2836(83)80026-6</pub-id><pub-id pub-id-type="pmid">6193277</pub-id></citation></ref>
<ref id="b49-genes-02-00748"><label>49.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Plaxco</surname><given-names>K.W.</given-names></name><name><surname>Simons</surname><given-names>K.T.</given-names></name><name><surname>Baker</surname><given-names>D.</given-names></name></person-group><article-title>Contact order, transition state placement and the refolding rates of single domain proteins</article-title><source>J. Mol. Biol.</source><year>1998</year><volume>277</volume><fpage>985</fpage><lpage>994</lpage><pub-id pub-id-type="doi">10.1006/jmbi.1998.1645</pub-id><pub-id pub-id-type="pmid">9545386</pub-id></citation></ref>
<ref id="b50-genes-02-00748"><label>50.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zarrine-Afsar</surname><given-names>A.</given-names></name><name><surname>Larson</surname><given-names>S.M.</given-names></name><name><surname>Davidson</surname><given-names>A.R.</given-names></name></person-group><article-title>The family feud: Do proteins with similar structures fold via the same pathway?</article-title><source>Curr. Opin. Struct. Biol.</source><year>2005</year><volume>15</volume><fpage>42</fpage><lpage>49</lpage><pub-id pub-id-type="doi">10.1016/j.sbi.2005.01.011</pub-id><pub-id pub-id-type="pmid">15718132</pub-id></citation></ref>
<ref id="b51-genes-02-00748"><label>51.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Han</surname><given-names>J.H.</given-names></name><name><surname>Batey</surname><given-names>S.</given-names></name><name><surname>Nickson</surname><given-names>A.A.</given-names></name><name><surname>Teichmann</surname><given-names>S.A.</given-names></name><name><surname>Clarke</surname><given-names>J.</given-names></name></person-group><article-title>The folding and evolution of multidomain proteins</article-title><source>Nat. Rev. Mol. Cell Biol.</source><year>2007</year><volume>8</volume><fpage>319</fpage><lpage>330</lpage><pub-id pub-id-type="doi">10.1038/nrm2144</pub-id><pub-id pub-id-type="pmid">17356578</pub-id></citation></ref>
<ref id="b52-genes-02-00748"><label>52.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shakhnovich</surname><given-names>E.</given-names></name></person-group><article-title>Protein folding thermodynamics and dynamics: Where physics, chemistry, and biology meet</article-title><source>Chem. Rev.</source><year>2006</year><volume>106</volume><fpage>1559</fpage><lpage>1588</lpage><pub-id pub-id-type="doi">10.1021/cr040425u</pub-id><pub-id pub-id-type="pmid">16683745</pub-id></citation></ref>
<ref id="b53-genes-02-00748"><label>53.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lindberg</surname><given-names>M.O.</given-names></name><name><surname>Oliveberg</surname><given-names>M.</given-names></name></person-group><article-title>Malleability of protein folding pathways: A simple reason for complex behaviour</article-title><source>Curr. Opin. Struct. Biol.</source><year>2007</year><volume>17</volume><fpage>21</fpage><lpage>29</lpage><pub-id pub-id-type="doi">10.1016/j.sbi.2007.01.008</pub-id><pub-id pub-id-type="pmid">17251003</pub-id></citation></ref>
<ref id="b54-genes-02-00748"><label>54.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amunson</surname><given-names>K.E.</given-names></name><name><surname>Ackels</surname><given-names>L.</given-names></name><name><surname>Kubelka</surname><given-names>J.</given-names></name></person-group><article-title>Site-specific unfolding thermodynamics of a helix-turn-helix protein</article-title><source>J. Am. Chem. Soc.</source><year>2008</year><volume>130</volume><fpage>8146</fpage><lpage>8147</lpage><pub-id pub-id-type="doi">10.1021/ja802185e</pub-id><pub-id pub-id-type="pmid">18529000</pub-id></citation></ref>
<ref id="b55-genes-02-00748"><label>55.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dill</surname><given-names>K.A.</given-names></name><name><surname>Ozkan</surname><given-names>S.B.</given-names></name><name><surname>Shell</surname><given-names>M.S.</given-names></name><name><surname>Weikl</surname><given-names>T.R.</given-names></name></person-group><article-title>The protein folding problem</article-title><source>Annu. Rev. Biophys.</source><year>2008</year><volume>37</volume><fpage>289</fpage><lpage>316</lpage><pub-id pub-id-type="doi">10.1146/annurev.biophys.37.092707.153558</pub-id><pub-id pub-id-type="pmid">18573083</pub-id></citation></ref>
<ref id="b56-genes-02-00748"><label>56.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nakamura</surname><given-names>T.</given-names></name><name><surname>Makabe</surname><given-names>K.</given-names></name><name><surname>Tomoyori</surname><given-names>K.</given-names></name><name><surname>Maki</surname><given-names>K.</given-names></name><name><surname>Mukaiyama</surname><given-names>A.</given-names></name><name><surname>Kuwajima</surname><given-names>K.</given-names></name></person-group><article-title>Different folding pathways taken by highly homologous proteins, goat α-lactalbumin and canine milk lysozyme</article-title><source>J. Mol. Biol.</source><year>2010</year><volume>396</volume><fpage>1361</fpage><lpage>1378</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2010.01.021</pub-id><pub-id pub-id-type="pmid">20080106</pub-id></citation></ref>
<ref id="b57-genes-02-00748"><label>57.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Forsyth</surname><given-names>W.R.</given-names></name><name><surname>Matthews</surname><given-names>C.R.</given-names></name></person-group><article-title>Folding mechanism of indole-3-glycerol phosphate synthase from <italic>Sulfolobus solfataricus</italic>: A test of the conservation of folding mechanisms hypothesis in (βα)<sub>8</sub> barrels</article-title><source>J. Mol. Biol.</source><year>2002</year><volume>320</volume><fpage>1119</fpage><lpage>1133</lpage><pub-id pub-id-type="doi">10.1016/S0022-2836(02)00557-0</pub-id><pub-id pub-id-type="pmid">12126630</pub-id></citation></ref>
<ref id="b58-genes-02-00748"><label>58.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nakamura</surname><given-names>T.</given-names></name><name><surname>Makabe</surname><given-names>K.</given-names></name><name><surname>Tomoyori</surname><given-names>K.</given-names></name><name><surname>Maki</surname><given-names>K.</given-names></name><name><surname>Mukaiyama</surname><given-names>A.</given-names></name><name><surname>Kuwajima</surname><given-names>K.</given-names></name></person-group><article-title>Different folding pathways taken by highly homologous proteins, goat α-lactalbumin and canine milk lysozyme</article-title><source>J. Mol. Biol.</source><year>2010</year><volume>396</volume><fpage>1361</fpage><lpage>1378</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2010.01.021</pub-id><pub-id pub-id-type="pmid">20080106</pub-id></citation></ref>
<ref id="b59-genes-02-00748"><label>59.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Echave</surname><given-names>J.</given-names></name><name><surname>Fernández</surname><given-names>F.M.</given-names></name></person-group><article-title>A perturbative view of protein structural variation</article-title><source>Proteins</source><year>2010</year><volume>78</volume><fpage>173</fpage><lpage>180</lpage><pub-id pub-id-type="doi">10.1002/prot.22553</pub-id><pub-id pub-id-type="pmid">19731380</pub-id></citation></ref>
<ref id="b60-genes-02-00748"><label>60.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maguid</surname><given-names>S.</given-names></name><name><surname>Fernandez-Alberti</surname><given-names>S.</given-names></name><name><surname>Echave</surname><given-names>J.</given-names></name></person-group><article-title>Evolutionary conservation of protein vibrational dynamics</article-title><source>Gene</source><year>2008</year><volume>422</volume><fpage>7</fpage><lpage>13</lpage><pub-id pub-id-type="doi">10.1016/j.gene.2008.06.002</pub-id><pub-id pub-id-type="pmid">18577430</pub-id></citation></ref>
<ref id="b61-genes-02-00748"><label>61.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hollup</surname><given-names>S.M.</given-names></name><name><surname>Fuglebakk</surname><given-names>E.</given-names></name><name><surname>Taylor</surname><given-names>W.R.</given-names></name><name><surname>Reuter</surname><given-names>N.</given-names></name></person-group><article-title>Exploring the factors determining the dynamics of different protein folds</article-title><source>Protein Sci.</source><year>2011</year><volume>20</volume><fpage>197</fpage><lpage>209</lpage><pub-id pub-id-type="doi">10.1002/pro.558</pub-id><pub-id pub-id-type="pmid">21086444</pub-id></citation></ref>
<ref id="b62-genes-02-00748"><label>62.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dunker</surname><given-names>A.K.</given-names></name><name><surname>Obradovic</surname><given-names>Z.</given-names></name><name><surname>Romero</surname><given-names>P.</given-names></name><name><surname>Garner</surname><given-names>E.C.</given-names></name><name><surname>Brown</surname><given-names>C.J.</given-names></name></person-group><article-title>Intrinsic protein disorder in complete genomes</article-title><source>Genome Inform. Ser. Workshop Genome Inform.</source><year>2000</year><volume>11</volume><fpage>161</fpage><lpage>171</lpage><pub-id pub-id-type="pmid">11700597</pub-id></citation></ref>
<ref id="b63-genes-02-00748"><label>63.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Apic</surname><given-names>G.</given-names></name><name><surname>Gough</surname><given-names>J.</given-names></name><name><surname>Teichmann</surname><given-names>S.A.</given-names></name></person-group><article-title>Domain combinations in archaeal, eubacterial and eukaryotic proteomes</article-title><source>J. Mol. Biol.</source><year>2001</year><volume>310</volume><fpage>311</fpage><lpage>325</lpage><pub-id pub-id-type="doi">10.1006/jmbi.2001.4776</pub-id><pub-id pub-id-type="pmid">11428892</pub-id></citation></ref>
<ref id="b64-genes-02-00748"><label>64.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Apic</surname><given-names>G.</given-names></name><name><surname>Gough</surname><given-names>J.</given-names></name><name><surname>Teichmann</surname><given-names>S.A.</given-names></name></person-group><article-title>An insight into domain combinations</article-title><source>Bioinformatics</source><year>2001</year><volume>17</volume><fpage>S83</fpage><lpage>S89</lpage><pub-id pub-id-type="doi">10.1093/bioinformatics/17.suppl_1.S83</pub-id><pub-id pub-id-type="pmid">11472996</pub-id></citation></ref>
<ref id="b65-genes-02-00748"><label>65.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lynch</surname><given-names>M.</given-names></name></person-group><article-title>The frailty of adaptive hypotheses for the origins of organismal complexity</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2007</year><volume>104</volume><fpage>8597</fpage><lpage>8604</lpage><pub-id pub-id-type="doi">10.1073/pnas.0702207104</pub-id><pub-id pub-id-type="pmid">17494740</pub-id></citation></ref>
<ref id="b66-genes-02-00748"><label>66.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname><given-names>C.J.</given-names></name><name><surname>Takayama</surname><given-names>S.</given-names></name><name><surname>Campen</surname><given-names>A.M.</given-names></name><name><surname>Vise</surname><given-names>P.</given-names></name><name><surname>Marshall</surname><given-names>T.W.</given-names></name><name><surname>Oldfield</surname><given-names>C.J.</given-names></name><name><surname>Williams</surname><given-names>C.J.</given-names></name><name><surname>Dunker</surname><given-names>A.K.</given-names></name></person-group><article-title>Evolutionary rate heterogeneity in proteins with long disordered regions</article-title><source>J. Mol. Evol.</source><year>2002</year><volume>55</volume><fpage>104</fpage><lpage>110</lpage><pub-id pub-id-type="doi">10.1007/s00239-001-2309-6</pub-id><pub-id pub-id-type="pmid">12165847</pub-id></citation></ref>
<ref id="b67-genes-02-00748"><label>67.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Szalkowski</surname><given-names>A.M.</given-names></name><name><surname>Anisimova</surname><given-names>M.</given-names></name></person-group><article-title>Markov models of amino acid substitution to study proteins with intrinsically disordered regions</article-title><source>PLoS One</source><year>2011</year><volume>6</volume><pub-id pub-id-type="doi">10.1371/journal.pone.0020488</pub-id></citation></ref>
<ref id="b68-genes-02-00748"><label>68.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ward</surname><given-names>J.J.</given-names></name><name><surname>Sodhi</surname><given-names>J.S.</given-names></name><name><surname>McGuffin</surname><given-names>L.J.</given-names></name><name><surname>Buxton</surname><given-names>B.F.</given-names></name><name><surname>Jones</surname><given-names>D.T.</given-names></name></person-group><article-title>Prediction and functional analysis of native disorder in proteins from the three kingdoms of life</article-title><source>J. Mol. Biol.</source><year>2004</year><volume>337</volume><fpage>635</fpage><lpage>645</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2004.02.002</pub-id><pub-id pub-id-type="pmid">15019783</pub-id></citation></ref>
<ref id="b69-genes-02-00748"><label>69.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname><given-names>C.J.</given-names></name><name><surname>Johnson</surname><given-names>A.K.</given-names></name><name><surname>Daughdrill</surname><given-names>G.W.</given-names></name></person-group><article-title>Comparing models of evolution for ordered and disordered proteins</article-title><source>Mol. Biol. Evol.</source><year>2010</year><volume>27</volume><fpage>609</fpage><lpage>621</lpage><pub-id pub-id-type="doi">10.1093/molbev/msp277</pub-id><pub-id pub-id-type="pmid">19923193</pub-id></citation></ref>
<ref id="b70-genes-02-00748"><label>70.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turoverov</surname><given-names>K.K.</given-names></name><name><surname>Kuznetsova</surname><given-names>I.M.</given-names></name><name><surname>Uversky</surname><given-names>V.N.</given-names></name></person-group><article-title>The protein kingdom extended: Ordered and intrinsically disordered proteins, their folding, supramolecular complex formation, and aggregation</article-title><source>Prog. Biophys. Mol. Biol.</source><year>2010</year><volume>102</volume><fpage>73</fpage><lpage>84</lpage><pub-id pub-id-type="doi">10.1016/j.pbiomolbio.2010.01.003</pub-id><pub-id pub-id-type="pmid">20097220</pub-id></citation></ref>
<ref id="b71-genes-02-00748"><label>71.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amoutzias</surname><given-names>G.D.</given-names></name><name><surname>He</surname><given-names>Y.</given-names></name><name><surname>Gordon</surname><given-names>J.</given-names></name><name><surname>Mossialos</surname><given-names>D.</given-names></name><name><surname>Oliver</surname><given-names>S.G.</given-names></name><name><surname>van de Peer</surname><given-names>Y.</given-names></name></person-group><article-title>Posttranslational regulation impacts the fate of duplicated genes</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2010</year><volume>107</volume><fpage>2967</fpage><lpage>2971</lpage><pub-id pub-id-type="doi">10.1073/pnas.0911603107</pub-id><pub-id pub-id-type="pmid">20080574</pub-id></citation></ref>
<ref id="b72-genes-02-00748"><label>72.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldstein</surname><given-names>R.A.</given-names></name></person-group><article-title>The structure of protein evolution and the evolution of protein structure</article-title><source>Curr. Opin. Struct. Biol.</source><year>2008</year><volume>18</volume><fpage>170</fpage><lpage>177</lpage><pub-id pub-id-type="doi">10.1016/j.sbi.2008.01.006</pub-id><pub-id pub-id-type="pmid">18328690</pub-id></citation></ref>
<ref id="b73-genes-02-00748"><label>73.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stoltzfus</surname><given-names>A.</given-names></name></person-group><article-title>On the possibility of constructive neutral evolution</article-title><source>J. Mol. Evol.</source><year>1999</year><volume>49</volume><fpage>169</fpage><lpage>181</lpage><pub-id pub-id-type="doi">10.1007/PL00006540</pub-id><pub-id pub-id-type="pmid">10441669</pub-id></citation></ref>
<ref id="b74-genes-02-00748"><label>74.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tompa</surname><given-names>P.</given-names></name><name><surname>Szász</surname><given-names>C.</given-names></name><name><surname>Buday</surname><given-names>L.</given-names></name></person-group><article-title>Structural disorder throws new light on moonlighting</article-title><source>Trends Biochem. Sci.</source><year>2005</year><volume>30</volume><fpage>484</fpage><lpage>489</lpage><pub-id pub-id-type="doi">10.1016/j.tibs.2005.07.008</pub-id><pub-id pub-id-type="pmid">16054818</pub-id></citation></ref>
<ref id="b75-genes-02-00748"><label>75.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kleinman</surname><given-names>C.L.</given-names></name><name><surname>Rodrigue</surname><given-names>N.</given-names></name><name><surname>Lartillot</surname><given-names>N.</given-names></name><name><surname>Philippe</surname><given-names>H.</given-names></name></person-group><article-title>Statistical potentials for improved structurally constrained evolutionary models</article-title><source>Mol. Biol. Evol.</source><year>2010</year><volume>27</volume><fpage>1546</fpage><lpage>1560</lpage><pub-id pub-id-type="doi">10.1093/molbev/msq047</pub-id><pub-id pub-id-type="pmid">20159780</pub-id></citation></ref>
<ref id="b76-genes-02-00748"><label>76.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lakner</surname><given-names>C.</given-names></name><name><surname>Holder</surname><given-names>M.T.</given-names></name><name><surname>Goldman</surname><given-names>N.</given-names></name><name><surname>Naylor</surname><given-names>G.J.</given-names></name></person-group><article-title>What's in a likelihood? Simple models of protein evolution and the contribution of structurally viable reconstructions to the likelihood</article-title><source>Syst. Biol.</source><year>2011</year><volume>60</volume><fpage>161</fpage><lpage>174</lpage><pub-id pub-id-type="doi">10.1093/sysbio/syq088</pub-id><pub-id pub-id-type="pmid">21233085</pub-id></citation></ref>
<ref id="b77-genes-02-00748"><label>77.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nasrallah</surname><given-names>C.A.</given-names></name><name><surname>Mathews</surname><given-names>D.H.</given-names></name><name><surname>Huelsenbeck</surname><given-names>J.P.</given-names></name></person-group><article-title>Quantifying the impact of dependent evolution among sites in phylogenetic inference</article-title><source>Syst. Biol.</source><year>2011</year><volume>60</volume><fpage>60</fpage><lpage>73</lpage><pub-id pub-id-type="doi">10.1093/sysbio/syq074</pub-id><pub-id pub-id-type="pmid">21081481</pub-id></citation></ref>
<ref id="b78-genes-02-00748"><label>78.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Philippe</surname><given-names>H.</given-names></name><name><surname>Casane</surname><given-names>D.</given-names></name><name><surname>Gribaldo</surname><given-names>S.</given-names></name><name><surname>Lopez</surname><given-names>P.</given-names></name><name><surname>Meunier</surname><given-names>J.</given-names></name></person-group><article-title>Heterotachy and functional shift in protein evolution</article-title><source>IUBMB Life</source><year>2003</year><volume>55</volume><fpage>257</fpage><lpage>265</lpage><pub-id pub-id-type="doi">10.1080/1521654031000123330</pub-id><pub-id pub-id-type="pmid">12880207</pub-id></citation></ref>
<ref id="b79-genes-02-00748"><label>79.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miyazawa</surname><given-names>S.</given-names></name><name><surname>Jernigan</surname><given-names>R.L.</given-names></name></person-group><article-title>Estimation of effective inter-residue contact energies from protein crystal structures: Quasi-chemical approximation</article-title><source>Macromolecules</source><year>1985</year><volume>18</volume><fpage>534</fpage><lpage>552</lpage><pub-id pub-id-type="doi">10.1021/ma00145a039</pub-id></citation></ref>
<ref id="b80-genes-02-00748"><label>80.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bastolla</surname><given-names>U.</given-names></name><name><surname>Farwer</surname><given-names>J.</given-names></name><name><surname>Knapp</surname><given-names>E.W.</given-names></name><name><surname>Vendruscolo</surname><given-names>M.</given-names></name></person-group><article-title>How to guarantee optimal stability for most representative structures in the Protein Data Bank</article-title><source>Proteins</source><year>2001</year><volume>44</volume><fpage>79</fpage><lpage>96</lpage><pub-id pub-id-type="doi">10.1002/prot.1075</pub-id><pub-id pub-id-type="pmid">11391771</pub-id></citation></ref>
<ref id="b81-genes-02-00748"><label>81.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dunker</surname><given-names>A.K.</given-names></name><name><surname>Oldfield</surname><given-names>C.J.</given-names></name><name><surname>Meng</surname><given-names>J.</given-names></name><name><surname>Romero</surname><given-names>P.</given-names></name><name><surname>Yang</surname><given-names>J.Y.</given-names></name><name><surname>Chen</surname><given-names>J.W.</given-names></name><name><surname>Vacic</surname><given-names>V.</given-names></name><name><surname>Obradovic</surname><given-names>Z.</given-names></name><name><surname>Uversky</surname><given-names>V.N.</given-names></name></person-group><article-title>The unfoldomics decade: An update on intrinsically disordered proteins</article-title><source>BMC Genomics</source><year>2008</year><volume>9</volume><pub-id pub-id-type="doi">10.1186/1471-2164-9-S2-S1</pub-id></citation></ref>
<ref id="b82-genes-02-00748"><label>82.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bryngelson</surname><given-names>J.D.</given-names></name><name><surname>Wolynes</surname><given-names>P.G.</given-names></name></person-group><article-title>Spin glasses and the statistical mechanics of protein folding</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>1987</year><volume>84</volume><fpage>7524</fpage><lpage>7528</lpage><pub-id pub-id-type="doi">10.1073/pnas.84.21.7524</pub-id><pub-id pub-id-type="pmid">3478708</pub-id></citation></ref>
<ref id="b83-genes-02-00748"><label>83.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dosztányi</surname><given-names>Z.</given-names></name><name><surname>Csizmók</surname><given-names>V.</given-names></name><name><surname>Tompa</surname><given-names>P.</given-names></name><name><surname>Simon</surname><given-names>I.</given-names></name></person-group><article-title>The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins</article-title><source>J. Mol. Biol.</source><year>2005</year><volume>347</volume><fpage>827</fpage><lpage>839</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2005.01.071</pub-id><pub-id pub-id-type="pmid">15769473</pub-id></citation></ref></ref-list></back></article>
