<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">ijms</journal-id>
<journal-title>International Journal of Molecular Sciences</journal-title>
<abbrev-journal-title>Int. J. Mol. Sci.</abbrev-journal-title>
<issn pub-type="epub">1422-0067</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/ijms11114673</article-id>
<article-id pub-id-type="publisher-id">ijms-11-04673</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Amino Acid Patterns around Disulfide Bonds</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Marques</surname><given-names>José R. F.</given-names></name><xref ref-type="aff" rid="af1-ijms-11-04673">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>da Fonseca</surname><given-names>Rute R.</given-names></name><xref ref-type="aff" rid="af2-ijms-11-04673">2</xref></contrib>
<contrib contrib-type="author">
<name><surname>Drury</surname><given-names>Brett</given-names></name><xref ref-type="aff" rid="af3-ijms-11-04673">3</xref></contrib>
<contrib contrib-type="author">
<name><surname>Melo</surname><given-names>André</given-names></name><xref ref-type="aff" rid="af1-ijms-11-04673">1</xref><xref ref-type="corresp" rid="c1-ijms-11-04673">*</xref></contrib></contrib-group>
<aff id="af1-ijms-11-04673">
<label>1</label> REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal; E-Mail: <email>zerui.marques@fc.up.pt</email></aff>
<aff id="af2-ijms-11-04673">
<label>2</label> CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal; E-Mail: <email>rute.r.da.fonseca@gmail.com</email></aff>
<aff id="af3-ijms-11-04673">
<label>3</label> LIAAD-INESC, Rua de Ceuta, 118, 6°, 4050-190 Porto, Portugal; E-Mail: <email>brett.drury@gmail.com</email></aff>
<author-notes>
<corresp id="c1-ijms-11-04673">
<label>*</label> Author to whom correspondence should be addressed; E-Mail: <email>asmelo@fc.up.pt</email>; Tel.: +351-220402503; Fax: +351-220402659.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2010</year></pub-date>
<pub-date pub-type="epub">
<day>18</day>
<month>11</month>
<year>2010</year></pub-date>
<volume>11</volume>
<issue>11</issue>
<fpage>4673</fpage>
<lpage>4686</lpage>
<history>
<date date-type="received">
<day>17</day>
<month>10</month>
<year>2010</year></date>
<date date-type="rev-recd">
<day>4</day>
<month>11</month>
<year>2010</year></date>
<date date-type="accepted">
<day>11</day>
<month>11</month>
<year>2010</year></date></history>
<permissions>
<copyright-statement>© 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland.</copyright-statement>
<copyright-year>2010</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0">
<p>This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Disulfide bonds provide an inexhaustible source of information on molecular evolution and biological specificity. In this work, we describe the amino acid composition around disulfide bonds in a set of disulfide-rich proteins using appropriate descriptors based on <italic>ANOVA</italic> (for all twenty natural amino acids or classes of amino acids clustered according to their chemical similarities) and Scheffé (for the disulfide-rich protein superfamilies) statistics. We found that weakly hydrophilic and aromatic amino acids are abundant in the regions around disulfide bonds, whereas aliphatic and hydrophobic amino acids are comparatively scarce. The density distributions (as a function of the distance to the center of the disulfide bonds) of all defined entities showed an overall unimodal behavior: the densities are null at short distances, reach maxima at intermediate distances and decrease at long distances. Finally, the amino acid environment around the disulfide bonds was found to differ between superfamilies, allowing the proteins to be clustered in a biologically relevant way. This suggests that this type of chemical information might be used as a tool to assess the relationship between very divergent sets of disulfide-rich proteins.</p></abstract>
<kwd-group>
<kwd>disulfide bond</kwd>
<kwd>neighborhood</kwd>
<kwd>classification</kwd>
<kwd>frequency</kwd>
<kwd>diversity</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Cysteine’s (<italic>CYS</italic>) ability to dimerize makes it unique among the twenty natural amino acids. A disulfide bond is formed by the oxidation of two <italic>CYS</italic> thiol groups. Disulfide bonds impose conformational restrictions on proteins and strongly influence their folding, stability and function [<xref ref-type="bibr" rid="b1-ijms-11-04673">1</xref>–<xref ref-type="bibr" rid="b5-ijms-11-04673">5</xref>].</p>
<p>Disulfide topology has been successfully used for protein clustering, and the disulfide structure was found to be well conserved even among apparently unrelated proteins [<xref ref-type="bibr" rid="b6-ijms-11-04673">6</xref>–<xref ref-type="bibr" rid="b11-ijms-11-04673">11</xref>]. Disulfide topology has subsequently been used to establish evolutionary relationships not detected by sequence-similarity-based methods. The three-dimensional structure and connectivity of disulfides are highly conserved patterns in nature and have become the basis of several protein classification analyses [<xref ref-type="bibr" rid="b12-ijms-11-04673">12</xref>–<xref ref-type="bibr" rid="b19-ijms-11-04673">19</xref>].</p>
<p>The stabilization of disulfide bonds has also been the focus of various studies, including: (i) the analysis of the protein environment in the neighborhood of both bonded and free cysteines [<xref ref-type="bibr" rid="b20-ijms-11-04673">20</xref>,<xref ref-type="bibr" rid="b21-ijms-11-04673">21</xref>]; (ii) the geometrical requirements of a disulfide bond [<xref ref-type="bibr" rid="b21-ijms-11-04673">21</xref>–<xref ref-type="bibr" rid="b23-ijms-11-04673">23</xref>]; (iii) the influence of pH [<xref ref-type="bibr" rid="b14-ijms-11-04673">14</xref>]; (iv) the role of redox mediators [<xref ref-type="bibr" rid="b23-ijms-11-04673">23</xref>–<xref ref-type="bibr" rid="b25-ijms-11-04673">25</xref>]; and (v) the role of allosteric factors [<xref ref-type="bibr" rid="b26-ijms-11-04673">26</xref>,<xref ref-type="bibr" rid="b27-ijms-11-04673">27</xref>].</p>
<p>We performed a systematic investigation of the amino acid composition around disulfide bonds in a set of disulfide-rich proteins selected according to their <italic>SCOP</italic> (Structural Classification of Proteins) classification [<xref ref-type="bibr" rid="b28-ijms-11-04673">28</xref>–30]. Our goal was to assess whether the observed patterns can be used to group the proteins according to their biological characteristics, and therefore serve as a classification criterion for very divergent proteins. In our previous work [<xref ref-type="bibr" rid="b6-ijms-11-04673">6</xref>], we demonstrated that the conformational patterns of disulfide bonds are sufficient to group proteins that share both functional and structural characteristics.</p>
<p>The protein set included twelve disulfide-rich protein superfamilies (according to the <italic>SCOP</italic> classification) that met the following criteria: (i) contain at least thirty disulfide bonds; (ii) have at least five <italic>PDB</italic> structures available; (iii) have X-ray structures with a resolution better than 2.5 Å; and (iv) have only uncomplexed structures. The set comprised the thioredoxin-like superfamily and eleven superfamilies of small disulfide-rich proteins (<italic>SDP</italic>). The thioredoxin-like superfamily differs markedly from the other proteins in the set, because it: (i) has a lower number of disulfide bonds per <italic>PDB</italic> structure; (ii) has an extensive hydrophobic core, which is completely absent in the small disulfide-rich proteins; (iii) consists of disulfide oxidoreductase enzymes; (iv) has a well-defined secondary structure, compared with the few secondary structure elements characteristic of the small disulfide-rich proteins; and (v) does not display disulfide cooperative effects (in small disulfide-rich proteins, the disulfides and the buried side chains influence the dynamics of the folded protein through stabilization effects that result from the spatial proximity of two or more disulfide bonds) [<xref ref-type="bibr" rid="b12-ijms-11-04673">12</xref>].</p>
<p>Other authors have analyzed the importance of the amino acid environment around disulfide bonds for the stabilization of protein 3D structures [<xref ref-type="bibr" rid="b20-ijms-11-04673">20</xref>,<xref ref-type="bibr" rid="b21-ijms-11-04673">21</xref>], but, to our knowledge, no study has yet attempted to use this type of chemical information to group a set of proteins into their respective superfamilies. This is the main purpose of the present work. Our approach uses stratified statistics, in which the members of a population (the various proteins) are grouped into relatively homogeneous and orthogonal subgroups (the superfamilies described above) before sampling.</p></sec>
<sec sec-type="materials|methods">
<label>2.</label>
<title>Materials and Methods</title>
<sec>
<label>2.1.</label>
<title>General</title>
<p>We used three different criteria to describe the amino acid composition in the proximity of disulfide bonds: (i) all twenty natural amino acids were considered as independent units; (ii) and (iii) the same amino acids were grouped into classes according to their chemical properties, and these classes were organized into two classification groups (<xref ref-type="table" rid="t1-ijms-11-04673">Table 1</xref>). Each entity (amino acid or class) was characterized by both a relative frequency and a diversity index. As a reference set, we used a set of proteins selected from the <italic>PDB</italic> database by Xia and Xie [<xref ref-type="bibr" rid="b30-ijms-11-04673">31</xref>]. The protein set under study is characterized in <xref ref-type="table" rid="t2-ijms-11-04673">Table 2</xref>. A list of all the <italic>PDB</italic> structures analyzed is available in <xref ref-type="supplementary-material" rid="SD1">Table 1</xref> of the Supplementary Material. The most frequent motif, combining <italic>SCOP</italic> clustering and structural elements, was also identified.</p>
<p>The analysis of the amino acid composition around disulfide bonds and the classification of the amino acids were carried out using our program <italic>Disulph</italic> (see <xref ref-type="supplementary-material" rid="SD1">Table 2</xref> in the Supplementary Material for details on <italic>Disulph</italic> functionalities). This program, written in FORTRAN, also calculates the relative frequency and the density of each entity in the neighboring region of a disulfide bond, in twenty predefined spherical shells of thickness 0.5 Å. The neighboring region of a disulfide bond was defined as a sphere of radius 10 Å centered at the midpoint of the bond, excluding the two cysteines involved in the bond (<xref ref-type="supplementary-material" rid="SD1">Table 3</xref> in the Supplementary Material). All residues with at least one atom in that region were considered in the statistical analysis. We assessed the conservation of the different entities across superfamilies using the relative frequency of each entity in the neighboring region of all disulfide bonds. We performed: (i) a one-way <italic>ANOVA</italic> hypothesis test at a 5% significance level for each entity (residue or class); and (ii) a Scheffé test, at the same significance level, for each entity and each pair of superfamilies.</p></sec>
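<p>The neighborhood definition above can be illustrated with a minimal Python sketch. It is not the <italic>Disulph</italic> implementation (which is written in FORTRAN); the input layout (a dictionary of residues, each given as a list of atomic coordinates in Å), the function names and the residue identifiers are hypothetical. The sphere is centered at the midpoint of the two sulfur atoms, and each selected residue is assigned to the 0.5 Å shell containing its closest atom; this last rule is an assumption, since the text does not state which atom determines the shell.</p>
<preformat preformat-type="code">
```python
import math

RADIUS = 10.0       # neighborhood radius in Å (from the text)
SHELL_WIDTH = 0.5   # shell thickness in Å (from the text)
N_SHELLS = int(round(RADIUS / SHELL_WIDTH))  # 20 shells


def midpoint(a, b):
    """Midpoint of two 3D points, e.g. the two sulfur atoms of a disulfide."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def neighborhood(s1, s2, residues, bonded_ids):
    """Map each residue near a disulfide bond to a shell index (0-19).

    s1, s2     -- coordinates of the two bonded sulfur atoms
    residues   -- dict: hypothetical residue id -> list of (x, y, z) atoms
    bonded_ids -- ids of the two cysteines forming the bond (excluded)
    A residue is kept if at least one of its atoms lies within RADIUS of
    the bond midpoint; its shell comes from its closest atom (an
    assumption of this sketch).
    """
    center = midpoint(s1, s2)
    shells = {}
    for res_id, atoms in residues.items():
        if res_id in bonded_ids:
            continue
        d_min = min(math.dist(center, atom) for atom in atoms)
        if RADIUS >= d_min:
            # clamp so that an atom exactly at 10 Å falls in the last shell
            shells[res_id] = min(int(d_min // SHELL_WIDTH), N_SHELLS - 1)
    return shells


# Hypothetical toy example (coordinates in Å)
residues = {
    "CYS3": [(0.0, 0.0, 0.0)],
    "CYS40": [(2.0, 0.0, 0.0)],
    "TYR10": [(1.0, 3.2, 0.0), (1.0, 4.0, 0.0)],
    "ALA20": [(1.0, 0.0, 12.0)],
}
print(neighborhood((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), residues, {"CYS3", "CYS40"}))
# {'TYR10': 6}
```
</preformat>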
<sec>
<label>2.2.</label>
<title>Calculation of the Relative Frequencies for Each Entity</title>
<p>The relative frequency of entity <italic>A</italic>, in the neighborhood of disulfide <italic>j</italic>, present in superfamily <italic>m</italic>, is given by:
<disp-formula id="FD1">
<label>(1)</label>
<mml:math display="block">
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:mi>reference</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:mtext mathvariant="italic">reference</mml:mtext></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <italic>freq<sub>reference</sub></italic>(<italic>A</italic>) is the frequency of the same entity in the reference set.</p>
<p>The relative frequency of entity <italic>A</italic>, for the superfamily <italic>m</italic>, that presents <italic>nSS<sub>m</sub></italic> disulfide bonds, is given by:
<disp-formula id="FD2">
<label>(2)</label>
<mml:math display="block">
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>Considering a set with <italic>nSF</italic> superfamilies, the relative frequency of the entity in the sample (<italic>rel freq</italic>(<italic>A</italic>)) can be calculated by:
<disp-formula id="FD3">
<label>(3)</label>
<mml:math display="block">
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>F</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>F</mml:mi></mml:mrow></mml:munderover>
<mml:mrow>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo>/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></disp-formula></p></sec>
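<p>Equations (1)–(3) translate directly into code. The following minimal Python sketch uses hypothetical function names, a hypothetical nested-list layout (one inner list per superfamily, one value per disulfide bond) and made-up numbers. Note that Equation (3) weights every superfamily equally, regardless of how many disulfide bonds it contributes.</p>
<preformat preformat-type="code">
```python
def rel_freq_bond(freq_mj, freq_ref):
    """Eq. (1): relative frequency of entity A around one disulfide bond."""
    return (freq_mj - freq_ref) / freq_ref


def rel_freq_superfamily(rel_m):
    """Eq. (2): mean of Eq. (1) over the nSS_m bonds of superfamily m."""
    return sum(rel_m) / len(rel_m)


def rel_freq_sample(rel):
    """Eq. (3): unweighted mean of the nSF superfamily values of Eq. (2)."""
    return sum(rel_freq_superfamily(rel_m) for rel_m in rel) / len(rel)


# Hypothetical raw frequencies of one entity around each disulfide bond,
# for two superfamilies, with a reference frequency of 0.05.
raw = [[0.10, 0.05, 0.075], [0.025, 0.05]]
freq_ref = 0.05
rel = [[rel_freq_bond(f, freq_ref) for f in sf] for sf in raw]
print([round(rel_freq_superfamily(r), 6) for r in rel])  # [0.5, -0.25]
print(round(rel_freq_sample(rel), 6))                    # 0.125
```
</preformat>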
<sec>
<label>2.3.</label>
<title>ANOVA Test</title>
<p>Considering <italic>nSS<sub>total</sub></italic> as the total number of disulfide bonds in the protein set under study, we can now calculate two auxiliary quantities, (i) the mean-square error between the superfamilies (<italic>MS<sub>betweenSF</sub></italic>(<italic>A</italic>)) and (ii) the mean-square error within the superfamilies (<italic>MS<sub>withinSF</sub></italic>(<italic>A</italic>)):
<disp-formula id="FD4">
<label>(4)</label>
<mml:math display="block">
<mml:msub>
<mml:mtext mathvariant="italic">MS</mml:mtext>
<mml:mtext mathvariant="italic">betweenSF</mml:mtext></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>F</mml:mi></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>F</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>and
<disp-formula id="FD5">
<label>(5)</label>
<mml:math display="block">
<mml:msub>
<mml:mtext mathvariant="italic">MS</mml:mtext>
<mml:mtext mathvariant="italic">withinSF</mml:mtext></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>F</mml:mi></mml:mrow></mml:munderover>
<mml:mrow>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo>−</mml:mo>
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mtext mathvariant="italic">total</mml:mtext></mml:mrow></mml:msub>
<mml:mo>−</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>F</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>The statistical parameter <italic>F</italic>, associated with the one-way <italic>ANOVA</italic> test carried out for entity <italic>A</italic>, is the ratio of the two mean-square values:
<disp-formula id="FD6">
<label>(6)</label>
<mml:math display="block">
<mml:mi>F</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mtext mathvariant="italic">MS</mml:mtext>
<mml:mtext mathvariant="italic">betweenSF</mml:mtext></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mtext mathvariant="italic">MS</mml:mtext>
<mml:mtext mathvariant="italic">withinSF</mml:mtext></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
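<p>Equations (4)–(6) can be sketched in Python as follows, assuming the same kind of hypothetical nested-list input (one list of per-bond relative frequencies per superfamily); the function name is illustrative. Because the grand mean is taken from Equation (3), it is the unweighted mean of the superfamily means; when superfamilies contribute different numbers of disulfide bonds, this differs from the size-weighted grand mean of the textbook one-way <italic>ANOVA</italic>, so library routines may return different values.</p>
<preformat preformat-type="code">
```python
def anova_f(rel):
    """One-way ANOVA statistic F for one entity, following Eqs. (2)-(6).

    rel -- list of nSF lists; rel[m][j] is rel freq(A)_{m,j} from Eq. (1)
    """
    n_sf = len(rel)                                   # nSF
    n_total = sum(len(sf) for sf in rel)              # nSS_total
    means = [sum(sf) / len(sf) for sf in rel]         # Eq. (2)
    grand = sum(means) / n_sf                         # Eq. (3)
    ms_between = sum(len(sf) * (mu - grand) ** 2
                     for sf, mu in zip(rel, means)) / (n_sf - 1)   # Eq. (4)
    ms_within = sum((x - mu) ** 2
                    for sf, mu in zip(rel, means)
                    for x in sf) / (n_total - n_sf)               # Eq. (5)
    return ms_between / ms_within                                  # Eq. (6)


print(anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 13.5
```
</preformat>
<p>The critical value at the 5% level can be obtained from a statistics library, for example SciPy's <monospace>scipy.stats.f.ppf(0.95, dfn, dfd)</monospace> with <italic>dfn</italic> = <italic>nSF</italic> − 1 and <italic>dfd</italic> = <italic>nSS<sub>total</sub></italic> − <italic>nSF</italic>; for 12 superfamilies and a large number of disulfide bonds it is close to 1.8.</p>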
<p>This parameter is interpreted as follows:
<list list-type="roman-lower">
<list-item>
<p>If <italic>F</italic> &lt; <italic>F<sub>critical</sub></italic>, the null hypothesis, that the relative frequency of the considered entity is the same for all the superfamilies, cannot be rejected.</p></list-item>
<list-item>
<p>If <italic>F</italic> &gt; <italic>F<sub>critical</sub></italic>, the null hypothesis is rejected: the relative frequency differs for at least two superfamilies (alternative hypothesis).</p></list-item></list></p>
<p>In the present case, <italic>F<sub>critical</sub></italic> = 1.8, and the null hypothesis was rejected for every entity.</p>
<p>The statistical parameter <italic>F</italic> can also be interpreted as a diversity index: the larger <italic>F</italic>, the more the relative frequency of the associated entity varies across the superfamilies in the sample. <italic>F</italic> is invariant under any linear transformation <italic>x</italic> → <italic>ax</italic> + <italic>b</italic> (with <italic>a</italic> ≠ 0) of the data. Since, for a given entity, Equation (1) is such a transformation of the raw frequencies, diversity measured with this index is a property intrinsic to the data sample and completely independent of the reference set considered.</p></sec>
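<p>The invariance property stated above can be checked in a few lines. Writing <italic>x<sub>m,j</sub></italic> for the raw frequency <italic>freq</italic>(<italic>A</italic>)<italic><sub>m,j</sub></italic>, with <italic>x̄<sub>m</sub></italic> its superfamily mean and <italic>x̄</italic> the unweighted mean of the <italic>x̄<sub>m</sub></italic>, Equation (1) is the affine map <italic>x</italic> → <italic>ax</italic> + <italic>b</italic> with <italic>a</italic> = 1/<italic>freq<sub>reference</sub></italic>(<italic>A</italic>) and <italic>b</italic> = −1. Because Equations (2) and (3) are averages, the means transform in the same way, the offset <italic>b</italic> cancels in every deviation, and both mean squares are multiplied by <italic>a</italic>²:</p>
<preformat preformat-type="tex">
```latex
\[
x'_{m,j} = a\,x_{m,j} + b
\quad\Longrightarrow\quad
\bar{x}'_{m} = a\,\bar{x}_{m} + b,
\qquad
\bar{x}' = a\,\bar{x} + b
\]
\[
MS'_{\mathrm{betweenSF}}
  = \frac{\sum_{m=1}^{nSF} nSS_m\,\bigl(a\,\bar{x}_m - a\,\bar{x}\bigr)^2}{nSF - 1}
  = a^2\,MS_{\mathrm{betweenSF}}
\]
\[
MS'_{\mathrm{withinSF}}
  = \frac{\sum_{m=1}^{nSF}\sum_{j=1}^{nSS_m}\bigl(a\,x_{m,j} - a\,\bar{x}_m\bigr)^2}{nSS_{\mathrm{total}} - nSF}
  = a^2\,MS_{\mathrm{withinSF}}
\]
\[
F' = \frac{a^2\,MS_{\mathrm{betweenSF}}}{a^2\,MS_{\mathrm{withinSF}}} = F
\qquad (a \neq 0)
\]
```
</preformat>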
<sec>
<label>2.4.</label>
<title>Scheffé Test</title>
<p>As a complement to the one-way <italic>ANOVA</italic> carried out for entity <italic>A</italic>, we performed the Scheffé test to compare the relative frequency of entity <italic>A</italic> between two superfamilies <italic>m</italic> and <italic>l</italic>. The corresponding statistical parameter 
<inline-formula>
<mml:math>
<mml:msubsup>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is defined as:
<disp-formula id="FD7">
<label>(7)</label>
<mml:math display="block">
<mml:msubsup>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="italic">rel</mml:mtext>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mtext mathvariant="italic">freq</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>l</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>/</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mtext mathvariant="italic">MS</mml:mtext>
<mml:mrow>
<mml:mtext mathvariant="italic">withinSF</mml:mtext></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>l</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>F</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>This parameter has the same invariance properties as the statistical parameter <italic>F</italic> defined for the one-way <italic>ANOVA</italic> test, and is interpreted in a similar way:
<list list-type="simple">
<list-item>
<p>(iii) If 
<inline-formula>
<mml:math>
<mml:msubsup>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> &lt; <italic>F<sub>critical</sub></italic>, the null hypothesis, that the relative frequency of the considered entity is the same for superfamilies <italic>m</italic> and <italic>l</italic>, cannot be rejected.</p></list-item>
<list-item>
<p>(iv) If 
<inline-formula>
<mml:math>
<mml:msubsup>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> &gt; <italic>F<sub>critical</sub></italic>, the null hypothesis is rejected: the relative frequency differs between these two superfamilies (alternative hypothesis).</p></list-item></list></p>
<p>In the present case, <italic>F<sub>critical</sub></italic> = 1.8, and the null hypothesis could not be rejected for many pairs of superfamilies. Presenting these pairwise results directly would be impractical, since 27 entities were analyzed and 31 tables would be required. Therefore, to summarize the differences in the chemical environment around disulfide bonds, we developed new descriptors, which we call Scheffé distances. A Scheffé distance 
<inline-formula>
<mml:math>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup></mml:math></inline-formula> compares the chemical environment around disulfide bonds between two superfamilies <italic>m</italic> and <italic>l</italic> for any classification group with <italic>nE</italic> entities:
<disp-formula id="FD8">
<label>(8)</label>
<mml:math display="block">
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>×</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>E</mml:mi></mml:mrow></mml:munderover>
<mml:mrow>
<mml:msubsup>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula></p></sec>
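<p>A minimal Python sketch of Equations (7) and (8) is given below; the nested-list layout, the 0-based superfamily indices and the function names are assumptions for illustration, not part of <italic>Disulph</italic>. It recomputes the superfamily means and <italic>MS<sub>withinSF</sub></italic> from Equations (2) and (5) so that the block is self-contained.</p>
<preformat preformat-type="code">
```python
def scheffe_f(rel, m, l):
    """Eq. (7): Scheffé statistic for one entity and superfamilies m and l.

    rel  -- list of nSF lists; rel[k][j] is rel freq(A)_{k,j} from Eq. (1)
    m, l -- 0-based superfamily indices
    """
    n_sf = len(rel)
    n_total = sum(len(sf) for sf in rel)
    means = [sum(sf) / len(sf) for sf in rel]                   # Eq. (2)
    ms_within = sum((x - mu) ** 2
                    for sf, mu in zip(rel, means)
                    for x in sf) / (n_total - n_sf)             # Eq. (5)
    n_m, n_l = len(rel[m]), len(rel[l])
    return (means[m] - means[l]) ** 2 / (
        ms_within * (1.0 / n_m + 1.0 / n_l) * (n_sf - 1))


def scheffe_distance(entities, m, l):
    """Eq. (8): mean of Eq. (7) over the nE entities of a classification group.

    entities -- list of nE data sets, each laid out like `rel` above
    """
    return sum(scheffe_f(rel, m, l) for rel in entities) / len(entities)


# Hypothetical group of two entities over three superfamilies; the second
# entity is an affine transform of the first, so both give the same Eq. (7).
e1 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
e2 = [[2.0 * x + 1.0 for x in sf] for sf in e1]
print(round(scheffe_distance([e1, e2], 0, 1), 6))  # 6.75
```
</preformat>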
<sec>
<label>2.5.</label>
<title>Representing the Distances between Superfamilies</title>
<p>In order to represent distances (
<inline-formula>
<mml:math>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup></mml:math></inline-formula>), inferred from the original 12-dimensional hyper-space, we adopted the intuitive form introduced by Xie <italic>et al.</italic> [<xref ref-type="bibr" rid="b31-ijms-11-04673">32</xref>]. The coordinates of the original objects (the 12 superfamiles) are projected in the 3D Cartesian space by minimizing the square deviation cost function <italic>SD</italic>:
<disp-formula id="FD9">
<label>(9)</label>
<mml:math display="block">
<mml:mi>S</mml:mi>
<mml:mi>D</mml:mi>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>F</mml:mi></mml:mrow></mml:munderover>
<mml:mrow>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:munderover>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:msub>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Scheffe</mml:mtext></mml:mrow></mml:msubsup></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math></disp-formula>where <italic>d<sub>l,m</sub></italic> is the distance between the projections of superfamilies <italic>m</italic> and <italic>l</italic> in 3D Cartesian space. We used Newton’s method to carry out the iterative minimization.</p></sec>
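<p>The projection step can be sketched in Python as below, as an illustration under stated assumptions only: it replaces the Newton iterations used in this work with plain gradient descent, and the step size, number of iterations and random starting coordinates are arbitrary choices; like any local optimizer, it may stop in a local minimum of <italic>SD</italic>. The Scheffé distances are assumed to be stored in a symmetric matrix with 0-based indices, and the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import math
import random


def sd_cost(coords, D):
    """Eq. (9): sum over m and l = 0..m-1 of (d_lm - D_ml)**2."""
    total = 0.0
    for m in range(len(coords)):
        for l in range(m):
            total += (math.dist(coords[l], coords[m]) - D[m][l]) ** 2
    return total


def project_3d(D, steps=5000, lr=0.01, seed=0):
    """Find 3D points whose pairwise distances approximate the matrix D.

    D is a symmetric nSF x nSF matrix of Scheffé distances. This sketch
    minimizes Eq. (9) by plain gradient descent from a random start,
    a stand-in for the Newton iterations used in the original work.
    """
    rng = random.Random(seed)
    n = len(D)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for m in range(n):
            for l in range(m):
                diff = [coords[m][k] - coords[l][k] for k in range(3)]
                d = math.sqrt(sum(c * c for c in diff)) or 1e-12
                # g * diff is the gradient of (d - D_ml)**2 w.r.t. coords[m]
                g = 2.0 * (d - D[m][l]) / d
                for k in range(3):
                    grad[m][k] += g * diff[k]
                    grad[l][k] -= g * diff[k]
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grad[i][k]
    return coords


# Hypothetical check: distances taken from four corners of a tetrahedron
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
D = [[math.dist(p, q) for q in pts] for p in pts]
xyz = project_3d(D)
print(sd_cost(xyz, D))  # expected to be close to zero for this input
```
</preformat>
<p>Because only distances enter Equation (9), the resulting coordinates are defined up to a rotation, translation and reflection, so only the relative positions of the projected superfamilies are meaningful.</p>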
<sec>
<label>2.6.</label>
<title>Density of an Entity</title>
<p>The density of entity <italic>A</italic> within a spherical shell <italic>i</italic> of volume <italic>V<sub>i</sub></italic>, in which <italic>A</italic> occurs <italic>n</italic>(<italic>A</italic>)<italic><sub>i</sub></italic> times over all the disulfide bonds included in the sample, is calculated as
<disp-formula id="FD10">
<label>(10)</label>
<mml:math display="block">
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>n</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:math></disp-formula></p></sec>
<sec>
<label>2.7.</label>
<title>Disulfide Bond Propensity</title>
<p>The disulfide bond propensity <italic>Pr<sub>m</sub></italic> of a superfamily <italic>m</italic> with <italic>nPDB<sub>m</sub></italic> PDB structures is calculated as
<disp-formula id="FD11">
<label>(11)</label>
<mml:math display="block">
<mml:msub>
<mml:mtext>Pr</mml:mtext>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>D</mml:mi>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>D</mml:mi>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mn>100</mml:mn>
<mml:mo>×</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mi>s</mml:mi>
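<mml:mi>k</mml:mi></mml:msub></mml:math></disp-formula>where <italic>nSS<sub>k</sub></italic> and <italic>nres<sub>k</sub></italic> are, respectively, the number of disulfide bonds and the number of natural amino acid residues in PDB structure <italic>k</italic>, so that <italic>Pr<sub>m</sub></italic> is the average number of disulfide bonds per 100 natural residues over the structures of superfamily <italic>m</italic>.</p></sec></sec>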
<sec sec-type="results|discussion">
<label>3.</label>
<title>Results and Discussion</title>
<sec>
<label>3.1.</label>
<title>Frequency and Density</title>
<p>The relative frequencies of the various entities and the corresponding values of the statistical parameter <italic>F</italic> are presented in <xref ref-type="fig" rid="f1-ijms-11-04673">Figure 1</xref>. Cysteine is by far the most abundant amino acid around disulfide bonds, which places the class <italic>SULFUR</italic> at the top of the most abundant classes, even though methionine has the lowest relative frequency of all amino acids. Almost all of these cysteines are themselves disulfide bonded, which prevents mis-pairing effects. This predominance results from the <italic>SDP</italic> patterns, associated with the above-mentioned cooperative disulfide effects. In the thioredoxin-like proteins, which have the lowest disulfide propensities, cysteine is less abundant than in the reference set. Weakly hydrophilic and aromatic amino acids, particularly tyrosine and tryptophan, are abundant close to disulfide bonds. Aliphatic and hydrophobic amino acids exhibit the lowest relative abundance, particularly alanine, valine, leucine and isoleucine. Positively charged amino acids (arginine and lysine) are very abundant in the neighborhood of disulfides but, since negatively charged groups disrupt these bonds, glutamate and aspartate have a very low relative frequency. Accordingly, disulfides involving cysteines located at the <italic>C</italic>-terminus of a protein (where the terminal carboxylate group is negatively charged) are rarely observed.</p>
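<p>The disulfide propensities mentioned above are defined by Equation (11) as the mean, over the <italic>nPDB<sub>m</sub></italic> structures of superfamily <italic>m</italic>, of the number of disulfide bonds per 100 natural residues. The Python sketch below only restates that formula; the input format (one pair of counts per PDB structure), the function name <monospace>propensity</monospace> and the example counts are hypothetical and are not code or data from this study.</p>
<preformat preformat-type="code">
```python
from statistics import mean


def propensity(structures):
    """Disulfide bond propensity Pr_m of one superfamily, Equation (11).

    structures: one (nSS_k, nres_k) pair per PDB structure k, i.e. the number
    of disulfide bonds and the number of natural amino acid residues.
    Returns the mean over the nPDB_m structures of 100 * nSS_k / nres_k.
    """
    if not structures:
        raise ValueError("a superfamily needs at least one PDB structure")
    return mean(100.0 * n_ss / n_res for n_ss, n_res in structures)


# Hypothetical counts for two superfamilies (illustrative, not from this study).
superfamilies = {
    "superfamily_A": [(1, 108), (2, 110)],
    "superfamily_B": [(4, 51), (3, 47), (4, 50)],
}
for name, structs in superfamilies.items():
    print(f"{name}: Pr = {propensity(structs):.2f}")
```
</preformat>
<p>Averaging per-structure percentages, rather than dividing the total number of bonds by the total number of residues, gives every PDB structure the same weight, as Equation (11) prescribes.</p>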
<p>The abundance, evaluated by a relative frequency, provides valuable information on the general trends observed in the sample. Although different protein sets and methodologies were used, our results are reasonably consistent with those obtained by Petersen <italic>et al.</italic> [<xref ref-type="bibr" rid="b20-ijms-11-04673">21</xref>]. In fact, both studies agree on four of the five most abundant residues (cysteine, tryptophan, tyrosine and arginine), and in both studies aliphatic and hydrophobic amino acids exhibit the lowest relative abundance.</p>
<p>The densities of the twenty natural amino acids and of the different entities in the various spherical shells (<xref ref-type="supplementary-material" rid="SD1">Table 3</xref> in Supplementary Material) are shown in <xref ref-type="fig" rid="f2-ijms-11-04673">Figure 2</xref>. The density distributions of the different entities as a function of the distance to the center of the disulfide bond display a common pattern: the densities are zero at short distances, reach maxima at intermediate distances and decrease at long distances.</p>
<p>Interestingly, residues with similar relative frequencies can display very different patterns. Among the residues at the top of the frequency ranking (<xref ref-type="supplementary-material" rid="SD1">Table 4</xref> in Supplementary Material), cysteine shows an almost uniform distribution, with a high concentration practically everywhere between 2 and 10 Å from the disulfide bond. Tyrosine and tryptophan, which have relative frequency values of around 50%, show radically different distributions: tyrosine is abundant in all shells, whereas tryptophan is only significantly present at 3.5–6 Å from the disulfide bond.</p></sec>
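<p>The shell densities plotted in <xref ref-type="fig" rid="f2-ijms-11-04673">Figure 2</xref> follow Equation (10), a count divided by a shell volume. The sketch below computes <italic>d</italic>(<italic>A</italic>)<italic><sub>i</sub></italic> for every entity and shell under stated assumptions: each occurrence of an entity is supplied together with its distance to the center of a disulfide bond, shell <italic>i</italic> is the half-open region between two consecutive radii, and the function names and input format are hypothetical rather than the authors' implementation.</p>
<preformat preformat-type="code">
```python
import bisect
import math
from collections import Counter


def shell_volumes(edges):
    """Volumes V_i of the spherical shells bounded by consecutive radii (Å^3)."""
    return [4.0 / 3.0 * math.pi * (r_out ** 3 - r_in ** 3)
            for r_in, r_out in zip(edges, edges[1:])]


def shell_densities(occurrences, edges):
    """Equation (10): d(A)_i = n(A)_i / V_i for every entity A and shell i.

    occurrences: (entity, distance) pairs pooled over all disulfide bonds in
    the sample, with the distance (Å) measured from the center of the bond.
    edges: increasing shell radii; shell i spans [edges[i], edges[i+1]).
    Returns {entity: [d(A)_0, d(A)_1, ...]}.
    """
    occurrences = list(occurrences)
    volumes = shell_volumes(edges)
    counts = Counter()  # n(A)_i, keyed by (entity, shell index)
    for entity, r in occurrences:
        i = bisect.bisect_right(edges, r) - 1
        if i in range(len(volumes)):  # skip distances outside every shell
            counts[entity, i] += 1
    entities = sorted({entity for entity, _ in occurrences})
    return {a: [counts[a, i] / v for i, v in enumerate(volumes)]
            for a in entities}
```
</preformat>
<p>For instance, hypothetical radii such as <monospace>[0, 2, 3.5, 6, 10]</monospace> Å would yield four shells; the radii actually used in this work are listed in the Supplementary Material and are not reproduced here.</p>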
<sec>
<label>3.2.</label>
<title>Diversity</title>
<p>The entities with the highest relative abundance (<italic>CYS</italic>, <italic>SULFUR</italic> and <italic>NHF</italic>) are also those with the largest diversity. However, the two quantities show no significant correlation.</p>
<p>The Scheffé distance matrices obtained with the three classification criteria used in this work were in reasonable agreement. We therefore show in <xref ref-type="fig" rid="f3-ijms-11-04673">Figure 3</xref> only the projected 3D Cartesian coordinates inferred from the 20-dimensional space of the natural amino acids.</p>
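<p>The diversity of an entity is quantified by the <italic>F</italic> statistic of a one-way <italic>ANOVA</italic> across superfamilies, whose values are given in parentheses in <xref ref-type="fig" rid="f1-ijms-11-04673">Figure 1</xref>. A minimal pure-Python sketch of that statistic follows. It assumes that each group holds the observations of one entity's relative frequency in one superfamily; the unit of observation is not restated in this section, so that choice, like the function name, is an assumption.</p>
<preformat preformat-type="code">
```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    groups: one list of observations per superfamily. F is the ratio of the
    between-group mean square SSB / (k - 1) to the within-group mean square
    SSW / (N - k), for k groups and N observations in total.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if not (k >= 2 and n_total > k):
        raise ValueError("need two or more groups and more observations than groups")
    means = [fmean(g) for g in groups]
    grand_mean = fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```
</preformat>
<p>For the same groups, <monospace>scipy.stats.f_oneway</monospace> should return the same <italic>F</italic> value, which offers a convenient cross-check.</p>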
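<p>The 3D projection minimizes the stress of Equation (9), the sum of squared differences between projected distances <italic>d<sub>l,m</sub></italic> and Scheffé distances <italic>D<sub>m,l</sub></italic>. The authors used a Newton method; the NumPy sketch below swaps in SMACOF (iterative majorization with the Guttman transform), a standard alternative for the same unweighted stress, and starts from classical (Torgerson) scaling. It assumes a symmetric distance matrix with a zero diagonal; all function names are illustrative.</p>
<preformat preformat-type="code">
```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances d_{l,m} between the rows of X (one row per superfamily)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress(X, D):
    """Equation (9): sum over m and l = 1..m-1 of (d_{l,m} - D_{m,l})**2."""
    d = pairwise_distances(X)
    iu = np.triu_indices(len(D), k=1)  # each unordered pair counted once
    return float(((d[iu] - D[iu]) ** 2).sum())


def classical_mds(D, dim=3):
    """Classical (Torgerson) scaling, used only as the starting configuration."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]         # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def smacof(D, dim=3, n_iter=500, tol=1e-12):
    """Minimize the stress of Equation (9) by SMACOF (Guttman transform).

    Returns the n x dim coordinates and the stress after each iteration,
    which SMACOF guarantees to be non-increasing.
    """
    D = np.asarray(D, dtype=float)
    n = len(D)
    X = classical_mds(D, dim)
    history = [stress(X, D)]
    for _ in range(n_iter):
        d = pairwise_distances(X)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(d > 0, D / d, 0.0)  # D_lm / d_lm, zero where d_lm = 0
        B = -ratio
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))     # rows of B(X) sum to zero
        X = B @ X / n                           # Guttman transform, unit weights
        history.append(stress(X, D))
        if tol >= history[-2] - history[-1]:    # negligible improvement: stop
            break
    return X, history
```
</preformat>
<p>With a hypothetical 12 × 12 matrix <monospace>D_scheffe</monospace> of Scheffé distances between the twelve superfamilies, <monospace>smacof(D_scheffe)</monospace> would return one 3D point per superfamily, comparable in spirit to <xref ref-type="fig" rid="f3-ijms-11-04673">Figure 3</xref>. SMACOF needs no Hessian and cannot increase the stress from one iteration to the next; a Newton-type solver, as used by the authors, typically needs fewer iterations at a higher cost per step.</p>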
<p>These descriptors allowed us to identify the superfamilies with similar or dissimilar chemical environments around their disulfide bonds, providing information on evolutionary processes and further insight into the classification of disulfide-rich proteins. The main divergences observed in <xref ref-type="fig" rid="f3-ijms-11-04673">Figure 3</xref> can be explained by significant deviations from the most frequent motif identified in <xref ref-type="table" rid="t2-ijms-11-04673">Table 2</xref>.</p>
<p>The known differences between the thioredoxin-like superfamily and the 11 superfamilies with a disulfide-rich fold domain from the small proteins class are confirmed by the values of the Scheffé descriptors. These differences include:
<list list-type="simple">
<list-item>
<p>(v) Unlike that of the thioredoxin-like superfamily, the folding of small disulfide-rich proteins depends on cooperative disulfide bond effects; this is evident from the significantly larger relative frequency of cysteine residues observed in the small disulfide-rich proteins (<xref ref-type="fig" rid="f1-ijms-11-04673">Figure 1A</xref> and <xref ref-type="fig" rid="f4-ijms-11-04673">Figure 4</xref>);</p></list-item>
<list-item>
<p>(vi) thioredoxin-like proteins have a large hydrophobic core, which is absent in the small disulfide-rich proteins; as a result, amino acids from classes <italic>ALI</italic> and <italic>HB</italic> have significantly lower frequencies in the small disulfide-rich proteins than in the thioredoxin-like proteins (<xref ref-type="fig" rid="f1-ijms-11-04673">Figure 1B</xref> and <xref ref-type="fig" rid="f1-ijms-11-04673">1C</xref>).</p></list-item></list></p>
<p>Our results suggest that the amino acid patterns around disulfide bonds might be used as a tool to cluster proteins in a biologically relevant way. This interesting feature of disulfide bonds has not been considered to date (previous studies [<xref ref-type="bibr" rid="b19a-ijms-11-04673">20</xref>,<xref ref-type="bibr" rid="b20-ijms-11-04673">21</xref>] only analyzed global statistical tendencies).</p></sec></sec>
<sec sec-type="conclusions">
<label>4.</label>
<title>Conclusions</title>
<p>We carried out a thorough analysis of the amino acid neighborhood of disulfide bonds using stratified statistics, which involves grouping the proteins into superfamilies before sampling. We examined both the abundance and the diversity of individual amino acids and of amino acid groups.</p>
<p>We found that the regions around disulfide bonds are particularly rich in weakly hydrophilic and aromatic amino acids. Aliphatic and hydrophobic amino acids exhibited the lowest relative abundance.</p>
<p>The diversity, associated with the distribution of the different entities over the sample, was determined using the <italic>F</italic> descriptor of a one-way <italic>ANOVA</italic>. The results show that the entities with large diversity are those that best discriminate between the thioredoxin-like and the <italic>SDP</italic> superfamilies (the cysteine residue and classes <italic>SULFUR</italic>, <italic>NHF</italic> and <italic>HB</italic>).</p>
<p>We also evaluated the diversity within each superfamily using the Scheffé distances introduced in this work. A most frequent motif was identified in the protein set, and the 3D Cartesian projections of the Scheffé distances essentially reflect the deviations of the various superfamilies from this motif. In particular, the high divergence between the thioredoxin-like and the <italic>SDP</italic> superfamilies is clearly evident in this representation. These results suggest that the composition of the chemical environment around disulfide bonds could be used as a tool for classifying highly divergent disulfide-rich proteins.</p></sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="SD1" content-type="local-data"><media mimetype="image" mime-subtype="pdf" xlink:href="ijms-11-04673-s001.pdf"/></supplementary-material></sec></body>
<back>
<ack>
<p>We thank the Fundação para a Ciência e a Tecnologia (FCT) for a doctoral scholarship granted to José Rui Ferreira Marques.</p>
<p>Rute R. da Fonseca was funded by FCT (SFRH/BPD/26769/2006).</p>
<p>We thank the Universidade do Porto for an electric wheelchair and a TrackerPro (a computer input device that takes the place of a mouse for people with no hand movement) granted to José Rui Ferreira Marques.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-ijms-11-04673"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bhattacharyya</surname><given-names>R</given-names></name><name><surname>Pal</surname><given-names>D</given-names></name><name><surname>Chakrabarti</surname><given-names>P</given-names></name></person-group><article-title>Disulfide bonds, their stereospecific environment and conservation in protein structures</article-title><source>Protein Eng. Des. Sel</source><year>2004</year><volume>17</volume><fpage>795</fpage><lpage>808</lpage><pub-id pub-id-type="doi">10.1093/protein/gzh093</pub-id></citation></ref>
<ref id="b2-ijms-11-04673"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hogg</surname><given-names>PJ</given-names></name></person-group><article-title>Disulfide bonds as switches for protein function</article-title><source>Trends Biochem. Sci</source><year>2003</year><volume>28</volume><fpage>210</fpage><lpage>214</lpage><pub-id pub-id-type="doi">10.1016/S0968-0004(03)00057-4</pub-id><pub-id pub-id-type="pmid">12713905</pub-id></citation></ref>
<ref id="b3-ijms-11-04673"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klink</surname><given-names>TA</given-names></name><name><surname>Woycechowsky</surname><given-names>KJ</given-names></name><name><surname>Taylor</surname><given-names>KM</given-names></name><name><surname>Raines</surname><given-names>RT</given-names></name></person-group><article-title>Contribution of disulfide bonds to the conformational stability and catalytic activity of ribonuclease A</article-title><source>Eur. J. Biochem</source><year>2000</year><volume>267</volume><fpage>566</fpage><lpage>572</lpage><pub-id pub-id-type="doi">10.1046/j.1432-1327.2000.01037.x</pub-id><pub-id pub-id-type="pmid">10632727</pub-id></citation></ref>
<ref id="b4-ijms-11-04673"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sardiu</surname><given-names>ME</given-names></name><name><surname>Cheung</surname><given-names>MS</given-names></name><name><surname>Yi-Kuo</surname><given-names>Y</given-names></name></person-group><article-title>Cysteine-cysteine contact preference leads to target-focusing in protein folding</article-title><source>Biophys. J</source><year>2007</year><volume>93</volume><fpage>938</fpage><lpage>951</lpage><pub-id pub-id-type="doi">10.1529/biophysj.106.097824</pub-id><pub-id pub-id-type="pmid">17617551</pub-id></citation></ref>
<ref id="b5-ijms-11-04673"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wedemeyer</surname><given-names>WJ</given-names></name><name><surname>Welker</surname><given-names>E</given-names></name><name><surname>Narayan</surname><given-names>M</given-names></name><name><surname>Scheraga</surname><given-names>HA</given-names></name></person-group><article-title>Disulfide bonds and protein folding</article-title><source>Biochemistry</source><year>2000</year><volume>39</volume><fpage>4208</fpage><lpage>4216</lpage></citation></ref>
<ref id="b6-ijms-11-04673"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marques</surname><given-names>JR</given-names></name><name><surname>da Fonseca</surname><given-names>RR</given-names></name><name><surname>Drury</surname><given-names>B</given-names></name><name><surname>Melo</surname><given-names>A</given-names></name></person-group><article-title>Conformational characterization of disulfide bonds: A tool for protein classification</article-title><source>J. Theor. Biol</source><year>2010</year><volume>267</volume><fpage>388</fpage><lpage>395</lpage><pub-id pub-id-type="doi">10.1016/j.jtbi.2010.09.012</pub-id><pub-id pub-id-type="pmid">20851707</pub-id></citation></ref>
<ref id="b7-ijms-11-04673"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benham</surname><given-names>CJ</given-names></name><name><surname>Jafri</surname><given-names>MS</given-names></name></person-group><article-title>Disulfide bonding patterns and protein topologies</article-title><source>Protein Sci</source><year>1993</year><volume>2</volume><fpage>41</fpage><lpage>54</lpage><pub-id pub-id-type="pmid">8443589</pub-id></citation></ref>
<ref id="b8-ijms-11-04673"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gupta</surname><given-names>A</given-names></name><name><surname>van Vlijmen</surname><given-names>HWT</given-names></name><name><surname>Singh</surname><given-names>JA</given-names></name></person-group><article-title>Classification of disulfide patterns and its relationship to protein structure and function</article-title><source>Protein Sci</source><year>2004</year><volume>13</volume><fpage>2045</fpage><lpage>2058</lpage><pub-id pub-id-type="doi">10.1110/ps.04613004</pub-id><pub-id pub-id-type="pmid">15273305</pub-id></citation></ref>
<ref id="b9-ijms-11-04673"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mas</surname><given-names>JM</given-names></name><name><surname>Aloy</surname><given-names>P</given-names></name><name><surname>Marti-Renom</surname><given-names>MA</given-names></name><name><surname>Oliva</surname><given-names>B</given-names></name><name><surname>Blanco-Aparicio</surname><given-names>C</given-names></name><name><surname>Molina</surname><given-names>MA</given-names></name><name><surname>Llorens</surname><given-names>R</given-names></name><name><surname>Querol</surname><given-names>E</given-names></name><name><surname>Aviles</surname><given-names>FX</given-names></name></person-group><article-title>Protein similarities beyond disulphide bridge topology</article-title><source>J. Mol. Biol</source><year>1998</year><volume>284</volume><fpage>541</fpage><lpage>548</lpage><pub-id pub-id-type="doi">10.1006/jmbi.1998.2194</pub-id><pub-id pub-id-type="pmid">9826496</pub-id></citation></ref>
<ref id="b10-ijms-11-04673"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mas</surname><given-names>JM</given-names></name><name><surname>Aloy</surname><given-names>P</given-names></name><name><surname>Marti-Renom</surname><given-names>MA</given-names></name><name><surname>Oliva</surname><given-names>B</given-names></name><name><surname>Llorens</surname><given-names>R</given-names></name><name><surname>Aviles</surname><given-names>FX</given-names></name><name><surname>Querol</surname><given-names>E</given-names></name></person-group><article-title>Classification of protein disulphide-bridge topologies</article-title><source>J. Comput.-Aided Mol. Des</source><year>2001</year><volume>15</volume><fpage>477</fpage><lpage>487</lpage><pub-id pub-id-type="doi">10.1023/A:1011164224144</pub-id><pub-id pub-id-type="pmid">11394740</pub-id></citation></ref>
<ref id="b11-ijms-11-04673"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Vlijmen</surname><given-names>HWT</given-names></name><name><surname>Gupta</surname><given-names>A</given-names></name><name><surname>Singh</surname><given-names>J</given-names></name></person-group><article-title>A novel database of disulfide patterns and its application to the discovery of distantly related homologs</article-title><source>J. Mol. Biol</source><year>2004</year><volume>335</volume><fpage>1083</fpage><lpage>1092</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2003.10.077</pub-id><pub-id pub-id-type="pmid">14698301</pub-id></citation></ref>
<ref id="b12-ijms-11-04673"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheek</surname><given-names>S</given-names></name><name><surname>Krishna</surname><given-names>SS</given-names></name><name><surname>Grishin</surname><given-names>NV</given-names></name></person-group><article-title>Structural classification of small, disulfide-rich protein domains</article-title><source>J. Mol. Biol</source><year>2006</year><volume>359</volume><fpage>215</fpage><lpage>237</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2006.03.017</pub-id><pub-id pub-id-type="pmid">16618491</pub-id></citation></ref>
<ref id="b13-ijms-11-04673"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chuang</surname><given-names>CC</given-names></name><name><surname>Chen</surname><given-names>CY</given-names></name><name><surname>Yang</surname><given-names>JM</given-names></name><name><surname>Lyu</surname><given-names>PC</given-names></name><name><surname>Hwang</surname><given-names>JK</given-names></name></person-group><article-title>Relationship between protein structures and disulfide bonding patterns</article-title><source>Proteins</source><year>2003</year><volume>53</volume><fpage>1</fpage><lpage>5</lpage><pub-id pub-id-type="doi">10.1002/prot.10492</pub-id><pub-id pub-id-type="pmid">12945044</pub-id></citation></ref>
<ref id="b14-ijms-11-04673"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Creighton</surname><given-names>TE</given-names></name></person-group><article-title>Disulphide bonds and protein stability</article-title><source>BioEssays</source><year>2005</year><volume>8</volume><fpage>57</fpage><lpage>63</lpage></citation></ref>
<ref id="b15-ijms-11-04673"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harrison</surname><given-names>PM</given-names></name><name><surname>Sternberg</surname><given-names>JE</given-names></name></person-group><article-title>The disulphide-cross: From cystine geometry and clustering to classification of small disulphide-rich protein folds</article-title><source>J. Mol. Biol</source><year>1996</year><volume>264</volume><fpage>603</fpage><lpage>623</lpage><pub-id pub-id-type="doi">10.1006/jmbi.1996.0664</pub-id><pub-id pub-id-type="pmid">8969308</pub-id></citation></ref>
<ref id="b16-ijms-11-04673"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kartik</surname><given-names>VJ</given-names></name><name><surname>Lavanya</surname><given-names>T</given-names></name><name><surname>Guruprasad</surname><given-names>K</given-names></name></person-group><article-title>Analysis of disulphide bond connectivity patterns in protein tertiary structure</article-title><source>Int. J. Biol. Macromol</source><year>2006</year><volume>38</volume><fpage>174</fpage><lpage>179</lpage><pub-id pub-id-type="doi">10.1016/j.ijbiomac.2006.02.004</pub-id><pub-id pub-id-type="pmid">16580722</pub-id></citation></ref>
<ref id="b17-ijms-11-04673"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lenffer</surname><given-names>J</given-names></name><name><surname>Lai</surname><given-names>P</given-names></name><name><surname>El Mejaber</surname><given-names>W</given-names></name><name><surname>Khan</surname><given-names>AM</given-names></name><name><surname>Koh</surname><given-names>JLY</given-names></name><name><surname>Tan</surname><given-names>PTJ</given-names></name><name><surname>Seah</surname><given-names>SH</given-names></name><name><surname>Brusic</surname><given-names>V</given-names></name></person-group><article-title>CysView: Protein classification based on cysteine pairing patterns</article-title><source>Nucleic Acids Res</source><year>2004</year><volume>32</volume><fpage>W350</fpage><lpage>W355</lpage><pub-id pub-id-type="doi">10.1093/nar/gkh475</pub-id><pub-id pub-id-type="pmid">15215409</pub-id></citation></ref>
<ref id="b18-ijms-11-04673"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thangudu</surname><given-names>RR</given-names></name><name><surname>Sharma</surname><given-names>P</given-names></name><name><surname>Srinivasan</surname><given-names>N</given-names></name><name><surname>Offmann</surname><given-names>B</given-names></name></person-group><article-title>Analycys: A database for conservation and conformation of disulphide bonds in homologous protein domains</article-title><source>Proteins</source><year>2007</year><volume>67</volume><fpage>255</fpage><lpage>261</lpage><pub-id pub-id-type="doi">10.1002/prot.21318</pub-id><pub-id pub-id-type="pmid">17285632</pub-id></citation></ref>
<ref id="b19-ijms-11-04673"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thornton</surname><given-names>JM</given-names></name></person-group><article-title>Disulphide bridges in globular proteins</article-title><source>J. Mol. Biol</source><year>1981</year><volume>151</volume><fpage>261</fpage><lpage>287</lpage><pub-id pub-id-type="doi">10.1016/0022-2836(81)90515-5</pub-id><pub-id pub-id-type="pmid">7338898</pub-id></citation></ref>
<ref id="b19a-ijms-11-04673"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jahandideh</surname><given-names>S</given-names></name><name><surname>Hoseini</surname><given-names>S</given-names></name><name><surname>Jahandideh</surname><given-names>M</given-names></name><name><surname>Hoseini</surname><given-names>A</given-names></name><name><surname>Yazdi</surname><given-names>AS</given-names></name></person-group><article-title>Analysis of factors that induce cysteine bonding state</article-title><source>Comput. Biol. Med</source><year>2009</year><volume>39</volume><fpage>332</fpage><lpage>339</lpage><pub-id pub-id-type="doi">10.1016/j.compbiomed.2009.01.006</pub-id><pub-id pub-id-type="pmid">19246035</pub-id></citation></ref>
<ref id="b20-ijms-11-04673"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Petersen</surname><given-names>MTN</given-names></name><name><surname>Jonson</surname><given-names>PH</given-names></name><name><surname>Petersen</surname><given-names>SB</given-names></name></person-group><article-title>Amino acid neighbours and detailed conformational analysis of cysteines in proteins</article-title><source>Protein Eng. Des. Sel</source><year>1999</year><volume>12</volume><fpage>535</fpage><lpage>548</lpage><pub-id pub-id-type="doi">10.1093/protein/12.7.535</pub-id></citation></ref>
<ref id="b21-ijms-11-04673"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dani</surname><given-names>VS</given-names></name><name><surname>Ramakrishnan</surname><given-names>C</given-names></name><name><surname>Varadarajan</surname><given-names>R</given-names></name></person-group><article-title>MODIP revisited: Re-evaluation and refinement of an automated procedure for modeling of disulfide bonds in proteins</article-title><source>Protein Eng. Des. Sel</source><year>2003</year><volume>16</volume><fpage>187</fpage><lpage>193</lpage><pub-id pub-id-type="doi">10.1093/proeng/gzg024</pub-id></citation></ref>
<ref id="b22-ijms-11-04673"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Inaba</surname><given-names>K</given-names></name><name><surname>Murakami</surname><given-names>S</given-names></name><name><surname>Suzuki</surname><given-names>M</given-names></name><name><surname>Nakagawa</surname><given-names>A</given-names></name><name><surname>Yamashita</surname><given-names>E</given-names></name><name><surname>Okada</surname><given-names>K</given-names></name><name><surname>Ito</surname><given-names>K</given-names></name></person-group><article-title>Crystal structure of the DsbB-DsbA complex reveals a mechanism of disulfide bond generation</article-title><source>Cell</source><year>2006</year><volume>127</volume><fpage>789</fpage><lpage>801</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2006.10.034</pub-id><pub-id pub-id-type="pmid">17110337</pub-id></citation></ref>
<ref id="b23-ijms-11-04673"><label>24.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ito</surname><given-names>K</given-names></name></person-group><article-title>Editing disulphide bonds: error correction using redox currencies</article-title><source>Mol. Microbiol</source><year>2010</year><volume>75</volume><fpage>1</fpage><lpage>5</lpage><pub-id pub-id-type="doi">10.1111/j.1365-2958.2009.06953.x</pub-id><pub-id pub-id-type="pmid">19906178</pub-id></citation></ref>
<ref id="b24-ijms-11-04673"><label>25.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sevier</surname><given-names>CS</given-names></name><name><surname>Kaiser</surname><given-names>CA</given-names></name></person-group><article-title>Ero1 and redox homeostasis in the endoplasmic reticulum</article-title><source>Biochim. Biophys. Acta</source><year>2008</year><volume>1783</volume><fpage>549</fpage><lpage>556</lpage><pub-id pub-id-type="doi">10.1016/j.bbamcr.2007.12.011</pub-id><pub-id pub-id-type="pmid">18191641</pub-id></citation></ref>
<ref id="b25-ijms-11-04673"><label>26.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmidt</surname><given-names>B</given-names></name><name><surname>Ho</surname><given-names>L</given-names></name><name><surname>Hogg</surname><given-names>PJ</given-names></name></person-group><article-title>Allosteric disulphide bonds</article-title><source>Biochemistry</source><year>2006</year><volume>45</volume><fpage>7429</fpage><lpage>7433</lpage><pub-id pub-id-type="doi">10.1021/bi0603064</pub-id><pub-id pub-id-type="pmid">16768438</pub-id></citation></ref>
<ref id="b26-ijms-11-04673"><label>27.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmidt</surname><given-names>B</given-names></name><name><surname>Hogg</surname><given-names>PJ</given-names></name></person-group><article-title>Search for allosteric disulfide bonds in NMR structures</article-title><source>BMC Struct. Biol</source><year>2007</year><volume>7</volume><fpage>49</fpage><pub-id pub-id-type="doi">10.1186/1472-6807-7-49</pub-id><pub-id pub-id-type="pmid">17640393</pub-id></citation></ref>
<ref id="b27-ijms-11-04673"><label>28.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murzin</surname><given-names>AG</given-names></name><name><surname>Brenner</surname><given-names>SE</given-names></name><name><surname>Hubbard</surname><given-names>T</given-names></name><name><surname>Chothia</surname><given-names>C</given-names></name></person-group><article-title>SCOP: A structural classification of proteins database for investigation of sequences and structures</article-title><source>J. Mol. Biol</source><year>1995</year><volume>247</volume><fpage>536</fpage><lpage>540</lpage><pub-id pub-id-type="pmid">7723011</pub-id></citation></ref>
<ref id="b28-ijms-11-04673"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andreeva</surname><given-names>A</given-names></name><name><surname>Howorth</surname><given-names>D</given-names></name><name><surname>Brenner</surname><given-names>SE</given-names></name><name><surname>Hubbard</surname><given-names>TJP</given-names></name><name><surname>Chothia</surname><given-names>C</given-names></name><name><surname>Murzin</surname><given-names>AG</given-names></name></person-group><article-title>SCOP database in 2004: Refinements integrate structure and sequence family data</article-title><source>Nucleic Acids Res</source><year>2004</year><volume>32</volume><fpage>D222</fpage><lpage>D229</lpage><pub-id pub-id-type="doi">10.1093/nar/gkh463</pub-id></citation></ref>
<ref id="b29-ijms-11-04673"><label>30.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andreeva</surname><given-names>A</given-names></name><name><surname>Howorth</surname><given-names>D</given-names></name><name><surname>Chandonia</surname><given-names>J-M</given-names></name><name><surname>Brenner</surname><given-names>SE</given-names></name><name><surname>Hubbard</surname><given-names>TJP</given-names></name><name><surname>Chothia</surname><given-names>C</given-names></name><name><surname>Murzin</surname><given-names>AG</given-names></name></person-group><article-title>Data growth and its impact on the SCOP database: new developments</article-title><source>Nucleic Acids Res</source><year>2008</year><volume>36</volume><fpage>D419</fpage><lpage>D425</lpage><pub-id pub-id-type="pmid">18000004</pub-id></citation></ref>
<ref id="b30-ijms-11-04673"><label>31.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xia</surname><given-names>X</given-names></name><name><surname>Xie</surname><given-names>Z</given-names></name></person-group><article-title>Protein structure, neighbor effect, and a new index of amino acid dissimilarities</article-title><source>Mol. Biol. Evol</source><year>2002</year><volume>19</volume><fpage>58</fpage><lpage>67</lpage><pub-id pub-id-type="doi">10.1093/oxfordjournals.molbev.a003982</pub-id><pub-id pub-id-type="pmid">11752190</pub-id></citation></ref>
<ref id="b31-ijms-11-04673"><label>32.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname><given-names>D</given-names></name><name><surname>Tropsha</surname><given-names>A</given-names></name><name><surname>Schlick</surname><given-names>T</given-names></name></person-group><article-title>An efficient projection protocol for chemical databases: Singular value decomposition combined with truncated-Newton minimization</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2000</year><volume>40</volume><fpage>167</fpage><lpage>177</lpage><pub-id pub-id-type="doi">10.1021/ci990333j</pub-id><pub-id pub-id-type="pmid">10661564</pub-id></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig id="f1-ijms-11-04673" position="float">
<label>Figure 1.</label>
<caption>
<p>Relative frequencies around disulfide bonds of (<bold>A</bold>) the natural amino acids, (<bold>B</bold>) the classes in classification group 1, and (<bold>C</bold>) the classes in classification group 2. The black columns represent the relative frequencies for the sample as a whole; the other columns represent the relative frequencies for each superfamily. The values of the <italic>F</italic> statistic from the one-way <italic>ANOVA</italic> test are given in parentheses.</p></caption><graphic xlink:href="ijms-11-04673f1.gif"/></fig>
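<p>The caption reports the one-way <italic>ANOVA</italic> <italic>F</italic> statistic for each amino acid or class. To make clear what that number measures, the sketch below computes <italic>F</italic> as the ratio of the between-group mean square to the within-group mean square. It assumes that the observations are grouped by superfamily, as in the figure. The function name <monospace>one_way_anova_f</monospace> is illustrative, and the sketch is not necessarily the authors' implementation.</p>
<preformat preformat-type="code">
```python
# Minimal one-way ANOVA F statistic (illustrative sketch, not the authors' code).
# Each group is a list of observations; hypothetically, the relative frequency
# of one amino acid around each disulfide bond of a given superfamily.
from statistics import mean


def one_way_anova_f(groups):
    """Return F = MS_between / MS_within for k groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)        # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)    # N - k degrees of freedom
    return ms_between / ms_within
```
</preformat>
<p>For example, the two groups [1, 2, 3] and [4, 5, 6] give <italic>F</italic> = 13.5 with (1, 4) degrees of freedom. A large <italic>F</italic> indicates that the superfamily means differ by more than the scatter within superfamilies would suggest.</p>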
<fig id="f2-ijms-11-04673" position="float">
<label>Figure 2.</label>
<caption>
<p>Densities of the twenty natural amino acids and of the various classes in the different spherical shells. The following color code is adopted: green denotes a density below 50% of a uniform density; yellow, a density between 50% and 150% of this density; and orange, a density above 150% of the same reference.</p></caption><graphic xlink:href="ijms-11-04673f2.gif"/></fig>
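<p>The color code can be made concrete with a short sketch. The caption does not define either density, so the sketch makes two assumptions. First, a shell's density is the number of residues (or atoms) counted in the shell divided by the shell volume. Second, the caller supplies the uniform reference density. The function names are hypothetical, and the handling of the exact 50% and 150% boundaries (both counted as yellow) is also an assumption.</p>
<preformat preformat-type="code">
```python
import math


def shell_volume(r_inner, r_outer):
    """Volume (Å^3) of the spherical shell between two radii (Å)."""
    return 4.0 / 3.0 * math.pi * (r_outer ** 3 - r_inner ** 3)


def shell_color(count, r_inner, r_outer, uniform_density):
    """Map a shell density, relative to a uniform reference density, to the
    color code of Figure 2 (thresholds at 50% and 150% of the reference).

    Assumption: density = count / shell volume; the reference is given."""
    ratio = count / shell_volume(r_inner, r_outer) / uniform_density
    if 0.5 > ratio:
        return "green"    # below 50% of the uniform density
    if ratio > 1.5:
        return "orange"   # above 150% of the uniform density
    return "yellow"       # between 50% and 150% (boundaries included)
```
</preformat>
<p>Normalizing by the shell volume matters because outer shells are larger. A raw count that grows with the radius can still correspond to an unremarkable density.</p>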
<fig id="f3-ijms-11-04673" position="float">
<label>Figure 3.</label>
<caption>
<p>Projected 3D Cartesian representation of the twelve superfamilies under study, inferred from the Scheffé distances calculated in the original 20-dimensional space of the natural amino acids.</p></caption><graphic xlink:href="ijms-11-04673f3.gif"/></fig>
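<p>The caption does not state how the 20-dimensional Scheffé distances were projected onto three Cartesian axes. Reference 32 describes one projection protocol, which combines singular value decomposition with truncated-Newton minimization. As a stand-in, the sketch below uses classical (Torgerson) multidimensional scaling. This is a different, closed-form technique that yields 3D coordinates whose pairwise distances approximate a given distance matrix. The sketch assumes NumPy is available and that the Scheffé distance matrix has already been computed. The name <monospace>classical_mds</monospace> is illustrative.</p>
<preformat preformat-type="code">
```python
import numpy as np


def classical_mds(dist, dim=3):
    """Embed points in `dim` Cartesian dimensions from a symmetric matrix of
    pairwise distances using classical (Torgerson) multidimensional scaling.

    This is a swapped-in technique, not necessarily the authors' protocol."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, J = I - 11^T/n
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # Keep the `dim` largest eigenvalues and their eigenvectors of B
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:dim]
    # Negative eigenvalues (non-Euclidean residue) are clipped to zero
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```
</preformat>
<p>When the input distances are exactly Euclidean in three dimensions, this embedding reproduces them up to rotation, reflection and translation. For distances derived from a 20-dimensional space, the discarded eigenvalues indicate how much structure the 3D view leaves out.</p>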
<fig id="f4-ijms-11-04673" position="float">
<label>Figure 4.</label>
<caption>
<p>Representative amino acid environments around disulfide bonds (top: all side chains; bottom: only the side chains of the cysteines involved in disulfide bonds). (<bold>A</bold>) Thioredoxin-like (PDB id 1bed); (<bold>B</bold>) <italic>SDP</italic> superfamilies (plant defensin, PDB id 1q9b). A cutoff of 10 Å around the disulfide bonds was used.</p></caption><graphic xlink:href="ijms-11-04673f4.gif"/></fig>
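<p>The 10 Å cutoff can be illustrated with a short selection routine. The caption does not say from which point the cutoff is measured, so the sketch assumes the midpoint of the S-S bond between the two cysteine SG atoms. The function name <monospace>residues_near_disulfide</monospace> and the (residue_id, coordinates) input format are hypothetical.</p>
<preformat preformat-type="code">
```python
import math

CUTOFF = 10.0  # Å, the value quoted in the caption of Figure 4


def residues_near_disulfide(atoms, sg1, sg2, cutoff=CUTOFF):
    """Return the set of residue ids having at least one atom within `cutoff`
    of the S-S bond midpoint (an assumed reference point).

    `atoms` is an iterable of (residue_id, (x, y, z)) tuples; `sg1` and `sg2`
    are the coordinates of the two cysteine SG atoms."""
    center = tuple((a + b) / 2.0 for a, b in zip(sg1, sg2))
    # math.dist (Python 3.8+) gives the Euclidean distance between two points
    return {res for res, xyz in atoms if cutoff >= math.dist(xyz, center)}
```
</preformat>
<p>Measuring from each SG atom separately would give a slightly larger neighborhood. The choice of reference point therefore affects which residues count as part of the disulfide environment.</p>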
<table-wrap id="t1-ijms-11-04673" position="float">
<label>Table 1.</label>
<caption>
<p>The amino acid classes assembled using various physicochemical criteria were clustered into two classification groups.</p></caption>
<table frame="box" rules="all">
<thead>
<tr><th align="left" valign="bottom"/>
<th align="center" valign="bottom"><bold>Classes</bold></th>
<th align="center" valign="bottom"><bold>Amino Acids</bold></th>
<th align="center" valign="bottom"><bold>Criteria</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="middle" rowspan="5"><bold>1</bold></td>
<td align="center" valign="middle"><bold>ALI</bold></td>
<td align="center" valign="middle">ALA, ILE, GLY, PRO, VAL, LEU</td>
<td align="center" valign="middle">aliphatic side chain</td></tr>
<tr>
<td align="center" valign="middle"><bold>AROM</bold></td>
<td align="center" valign="middle">TYR, PHE, TRP</td>
<td align="center" valign="middle">aromatic side chain (absorbs UV)</td></tr>
<tr>
<td align="center" valign="middle"><bold>SULFUR</bold></td>
<td align="center" valign="middle">CYS, MET</td>
<td align="center" valign="middle">side chain containing a sulfur atom</td></tr>
<tr>
<td align="center" valign="middle"><bold>POL</bold></td>
<td align="center" valign="middle">SER, THR, ASN, GLN</td>
<td align="center" valign="middle">polar side chain</td></tr>
<tr>
<td align="center" valign="middle"><bold>CAR</bold></td>
<td align="center" valign="middle">ASP, GLU, HIS, LYS, ARG</td>
<td align="center" valign="middle">charged side chain</td></tr>
<tr>
<td align="center" valign="middle" rowspan="4"><bold>2</bold></td>
<td align="center" valign="middle"><bold>HF</bold></td>
<td align="center" valign="middle">SER, THR, ASN, GLN, ASP, GLU, HIS, LYS, ARG</td>
<td align="center" valign="middle">hydrophilic</td></tr>
<tr>
<td align="center" valign="middle"><bold>HB</bold></td>
<td align="center" valign="middle">ALA, VAL, LEU, ILE, MET, PHE, TRP</td>
<td align="center" valign="middle">hydrophobic</td></tr>
<tr>
<td align="center" valign="middle"><bold>NHF</bold></td>
<td align="center" valign="middle">GLY, CYS, TYR</td>
<td align="center" valign="middle">weakly hydrophilic</td></tr>
<tr>
<td align="center" valign="middle"><bold>NHB</bold></td>
<td align="center" valign="middle">PRO</td>
<td align="center" valign="middle">weakly hydrophobic</td></tr></tbody></table></table-wrap>
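<p>Table 1 amounts to two lookup tables from residue to class. Transcribing it as Python dictionaries, as sketched below, makes it straightforward to check that each classification group places each of the 20 residues in exactly one class. The names <monospace>GROUP_1</monospace>, <monospace>GROUP_2</monospace>, <monospace>class_of</monospace> and <monospace>is_partition</monospace> are illustrative.</p>
<preformat preformat-type="code">
```python
# Classification groups transcribed from Table 1 (three-letter residue codes).
GROUP_1 = {
    "ALI": {"ALA", "ILE", "GLY", "PRO", "VAL", "LEU"},
    "AROM": {"TYR", "PHE", "TRP"},
    "SULFUR": {"CYS", "MET"},
    "POL": {"SER", "THR", "ASN", "GLN"},
    "CAR": {"ASP", "GLU", "HIS", "LYS", "ARG"},
}
GROUP_2 = {
    "HF": {"SER", "THR", "ASN", "GLN", "ASP", "GLU", "HIS", "LYS", "ARG"},
    "HB": {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP"},
    "NHF": {"GLY", "CYS", "TYR"},
    "NHB": {"PRO"},
}


def class_of(residue, group):
    """Return the class label of a three-letter residue code within a group."""
    for label, members in group.items():
        if residue in members:
            return label
    raise KeyError(f"{residue} is not assigned to any class in this group")


def is_partition(group):
    """True if the classes are disjoint and together hold 20 distinct residues."""
    all_members = [aa for members in group.values() for aa in members]
    return len(all_members) == len(set(all_members)) == 20
```
</preformat>
<p>The two groups are independent partitions of the same 20 residues. A residue such as CYS therefore falls in SULFUR under group 1 and in NHF under group 2.</p>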
<table-wrap id="t2-ijms-11-04673" position="float">
<label>Table 2.</label>
<caption>
<p>Set of superfamilies under study. The statistical analyses included all the disulfide bonds identified in this protein set. The values in the last three columns were calculated as sums over all the PDB structures of each superfamily (see PDB ids in Table 1 in Supplementary Material).</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th align="center" valign="middle"><bold>SCOP Superfamily</bold></th>
<th align="center" valign="middle"><bold>SCOP Class</bold></th>
<th align="center" valign="middle"><bold>SCOP Fold</bold></th>
<th align="center" valign="middle"><bold>Dominant Secondary Structure</bold></th>
<th align="center" valign="middle"><bold>Disulfide Bond Propensity<sup><xref ref-type="table-fn" rid="tfn1-ijms-11-04673">#</xref></sup></bold></th>
<th align="center" valign="middle"><bold>Total Number of PDB Structures</bold></th>
<th align="center" valign="middle"><bold>Total Number of Disulfide Bonds</bold></th>
<th align="center" valign="middle"><bold>Total Number of Residues</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="middle">Crisp</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Crisp domain-like</td>
<td align="center" valign="middle">α</td>
<td align="center" valign="middle">5.3%</td>
<td align="center" valign="middle">6</td>
<td align="center" valign="middle">54</td>
<td align="center" valign="middle">1367</td></tr>
<tr>
<td align="center" valign="middle">Cystine-Knot</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Cystine-Knot cytokines</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">3.7%</td>
<td align="center" valign="middle">13</td>
<td align="center" valign="middle">112</td>
<td align="center" valign="middle">3131</td></tr>
<tr>
<td align="center" valign="middle">Defensin-like</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Defensin-like</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">7.4%</td>
<td align="center" valign="middle">15</td>
<td align="center" valign="middle">47</td>
<td align="center" valign="middle">730</td></tr>
<tr>
<td align="center" valign="middle">EGF-Laminin</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Knottins</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">6.4%</td>
<td align="center" valign="middle">27</td>
<td align="center" valign="middle">121</td>
<td align="center" valign="middle">2253</td></tr>
<tr>
<td align="center" valign="middle">Omega toxins</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Knottins</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">8.9%</td>
<td align="center" valign="middle">28</td>
<td align="center" valign="middle">88</td>
<td align="center" valign="middle">992</td></tr>
<tr>
<td align="center" valign="middle">Plant lectins</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Knottins</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">9.9%</td>
<td align="center" valign="middle">8</td>
<td align="center" valign="middle">100</td>
<td align="center" valign="middle">1045</td></tr>
<tr>
<td align="center" valign="middle">Small snake toxins</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Snake toxins-like</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">6.5%</td>
<td align="center" valign="middle">40</td>
<td align="center" valign="middle">209</td>
<td align="center" valign="middle">3279</td></tr>
<tr>
<td align="center" valign="middle">Scorpion-like toxins</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Knottins</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">7.9%</td>
<td align="center" valign="middle">70</td>
<td align="center" valign="middle">247</td>
<td align="center" valign="middle">3303</td></tr>
<tr>
<td align="center" valign="middle">BBI</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Knottins</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">9.6%</td>
<td align="center" valign="middle">5</td>
<td align="center" valign="middle">33</td>
<td align="center" valign="middle">371</td></tr>
<tr>
<td align="center" valign="middle">BPTI-like</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">BPTI-like</td>
<td align="center" valign="middle">α + β</td>
<td align="center" valign="middle">5.1%</td>
<td align="center" valign="middle">12</td>
<td align="center" valign="middle">42</td>
<td align="center" valign="middle">814</td></tr>
<tr>
<td align="center" valign="middle">Kringle-like</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Kringle-like</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">3.7%</td>
<td align="center" valign="middle">12</td>
<td align="center" valign="middle">53</td>
<td align="center" valign="middle">1771</td></tr>
<tr>
<td align="center" valign="middle">Thioredoxin-like</td>
<td align="center" valign="middle">Alpha and beta proteins</td>
<td align="center" valign="middle">Thioredoxin</td>
<td align="center" valign="middle">α/β</td>
<td align="center" valign="middle">0.8%</td>
<td align="center" valign="middle">43</td>
<td align="center" valign="middle">66</td>
<td align="center" valign="middle">10616</td></tr>
<tr>
<td align="center" valign="middle">Most frequent motif</td>
<td align="center" valign="middle">Small proteins</td>
<td align="center" valign="middle">Knottins</td>
<td align="center" valign="middle">β</td>
<td align="center" valign="middle">[6.7%, 7.3%]<xref ref-type="table-fn" rid="tfn2-ijms-11-04673">*</xref></td>
<td align="center" valign="middle">-</td>
<td align="center" valign="middle">-</td>
<td align="center" valign="middle">-</td></tr></tbody></table>
<table-wrap-foot><fn id="tfn1-ijms-11-04673">
<label>#</label>
<p>Calculated by <xref ref-type="disp-formula" rid="FD11">equation 11</xref>;</p></fn><fn id="tfn2-ijms-11-04673">
<label>*</label>
<p>Confidence interval, at the 95% level, for the disulfide bond propensity of <italic>SDP</italic> structures; EGF: epidermal growth factor; BBI: Bowman-Birk inhibitors; BPTI: basic pancreatic trypsin inhibitor.</p></fn></table-wrap-foot></table-wrap></sec></back></article>
