<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">viruses</journal-id>
      <journal-title>Viruses</journal-title>
      <abbrev-journal-title abbrev-type="publisher">Viruses</abbrev-journal-title>
      <abbrev-journal-title abbrev-type="pubmed">Viruses</abbrev-journal-title>
      <issn pub-type="epub">1999-4915</issn>
      <publisher>
        <publisher-name>MDPI</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3390/v4030348</article-id>
      <article-id pub-id-type="publisher-id">viruses-04-00348</article-id>
      <article-categories>
        <subj-group>
          <subject>Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Mining the Protein Data Bank to Differentiate Error from Structural Variation in Clustered Static Structures: An Examination of HIV Protease</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Venkatakrishnan</surname>
            <given-names>Balasubramanian</given-names>
          </name>
          <xref rid="fn1-viruses-04-00348" ref-type="fn">†</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Palii</surname>
            <given-names>Miorel-Lucian</given-names>
          </name>
          <xref rid="fn1-viruses-04-00348" ref-type="fn">†</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Agbandje-McKenna</surname>
            <given-names>Mavis</given-names>
          </name>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>McKenna</surname>
            <given-names>Robert</given-names>
          </name>
          <xref rid="c1-viruses-04-00348" ref-type="corresp">*</xref>
        </contrib>
      </contrib-group>
      <aff id="af1-viruses-04-00348">Department of Biochemistry and Molecular Biology, University of Florida, Gainesville, FL 32610, USA; Email: <email>balavenkat@ufl.edu</email> (B.V.); <email>mlpalii@gmail.com</email> (M.-L.P.); <email>mckenna@ufl.edu</email> (M.A.-McK.)</aff>
      <author-notes>
        <fn id="fn1-viruses-04-00348">
          <label>†</label>
          <p> These authors contributed equally to this work.</p>
        </fn>
        <corresp id="c1-viruses-04-00348"><label>*</label> Author to whom correspondence should be addressed; Email: <email>rmckenna@ufl.edu</email>; Tel.: +1-352-392-5696; Fax: +1-352-392-3422.</corresp>
      </author-notes>
      <pub-date pub-type="epub">
        <day>05</day>
        <month>03</month>
        <year>2012</year>
      </pub-date>
      <pub-date pub-type="collection"><month>03</month>
        <year>2012</year>
      </pub-date>
      <volume>4</volume>
      <issue>3</issue>
      <fpage>348</fpage>
      <lpage>362</lpage>
      <history>
        <date date-type="received">
          <day>05</day>
          <month>02</month>
          <year>2012</year>
        </date>
        <date date-type="rev-recd">
          <day>29</day>
          <month>02</month>
          <year>2012</year>
        </date>
        <date date-type="accepted">
          <day>01</day>
          <month>03</month>
          <year>2012</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2012 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
        <copyright-year>2012</copyright-year>
        <license xmlns:xlink="http://www.w3.org/1999/xlink" license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/">
          <p>This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p>
        </license>
      </permissions>
      <abstract>
        <p>The Protein Data Bank (PDB) contains over 71,000 structures. Extensively studied proteins have hundreds of submissions available, including mutations, different complexes, and space groups, allowing for application of data-mining algorithms to analyze an array of static structures and gain insight about a protein’s structural variation and possibly its dynamics. This investigation is a case study of HIV protease (PR) using in-house algorithms for data mining and structure superposition through generalized formulæ that account for multiple conformations and fractional occupancies. Temperature factors (<italic>B</italic>-factors) are compared with spatial displacement from the mean structure over the entire study set and separately over bound and ligand-free structures, to assess the significance of structural deviation in a statistical context. Space group differences are also examined. </p>
      </abstract>
      <kwd-group>
        <kwd>B-factor and spatial variation</kwd>
        <kwd>data mining</kwd>
        <kwd>HIV protease</kwd>
        <kwd>structure superposition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="intro">
      <title>1. Introduction</title>
      <sec>
        <title>1.1. The Protein Data Bank</title>
        <p>Established in 1971, the Protein Data Bank (PDB) has proved invaluable not only to the research community but also to students and educators [<xref ref-type="bibr" rid="B1-viruses-04-00348">1</xref>]. The PDB has outgrown its initial purpose as a repository of the atomic coordinates of protein structures [<xref ref-type="bibr" rid="B2-viruses-04-00348">2</xref>] and now contributes to the understanding of biological function by structural genomics and similar initiatives.</p>
        <p>Over 71,000 structures were in the database at the time of this writing, and more are deposited weekly [<xref ref-type="bibr" rid="B3-viruses-04-00348">3</xref>]. Considerable molecular dynamics work has been done to assess conformational changes and mobility in individual macromolecules, but looking at an entire array of static structures is an untapped approach. Extensively characterized proteins have hundreds of structures available in the PDB, including mutations, various inhibitor complexes, and different resolutions or crystallographic space groups. This provides a unique opportunity to apply data-mining algorithms to the multitude of static coordinates deposited in the PDB, obtaining a measure of reliability when deciding on the significance of a structural change, as well as possibly revealing an alternative, dynamic view of a protein.</p>
      </sec>
      <sec>
        <title>1.2. HIV protease</title>
        <p>The protein chosen as a case study to demonstrate this approach is the protease of the human immunodeficiency virus (HIV), the causative agent of acquired immunodeficiency syndrome (AIDS). The role of HIV protease (PR) in the maturation of the virus to an infective state has made it an attractive drug target. Several PR inhibitors have been used in AIDS therapy [<xref ref-type="bibr" rid="B4-viruses-04-00348">4</xref>,<xref ref-type="bibr" rid="B5-viruses-04-00348">5</xref>]. Highly Active Anti-Retroviral Therapy (HAART) combines multiple drugs, significantly improving prognoses [<xref ref-type="bibr" rid="B6-viruses-04-00348">6</xref>]. However, the high mutation rate of the virus has given rise to a number of polymorphs, including drug-resistant mutants [<xref ref-type="bibr" rid="B5-viruses-04-00348">5</xref>,<xref ref-type="bibr" rid="B7-viruses-04-00348">7</xref>,<xref ref-type="bibr" rid="B8-viruses-04-00348">8</xref>]. This has prompted extensive study of the mechanisms of inhibitor action and resistance in the various polymorphs. Recent investigations have looked at the structure of PR in different polymorphic forms, in both the bound and ligand-free state [<xref ref-type="bibr" rid="B8-viruses-04-00348">8</xref>,<xref ref-type="bibr" rid="B9-viruses-04-00348">9</xref>].</p>
        <p>PR is a homodimer, as shown in <xref ref-type="fig" rid="viruses-04-00348-f001">figure 1</xref>. Each of the 99-residue monomers contributes a catalytic aspartate (D25) to the active site, located above the dimeric interface and enclosed by a pair of flaps [<xref ref-type="bibr" rid="B10-viruses-04-00348">10</xref>]. The dynamic nature of the flaps has not prevented crystallographic examinations of PR, and numerous studies have worked toward elucidating the structural basis of drug action and resistance [<xref ref-type="bibr" rid="B11-viruses-04-00348">11</xref>]. </p>
        <fig id="viruses-04-00348-f001" position="anchor">
          <label>Figure 1</label>
          <caption>
            <p>The PR dimer. Cartoon diagram of PDB ID 3hvp showing the monomers in orange and blue. Regions of PR structure are labeled, and relevant residue numbers are given in parentheses. Rendered using PyMOL.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-g001.tif"/>
        </fig>
        <fig id="viruses-04-00348-f002" position="anchor">
          <label>Figure 2</label>
          <caption>
            <p>Variability of PR primary structures in the PDB. Graph considers the 811 PR monomers obtained as described in the experimental section, except PDB ID 2rkg, which contains an insertion. Non-standard residues (e.g., norleucine) are also included.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-g002.tif"/>
        </fig>
        <p>Since the first complete crystal structure of HIV-1 protease was solved [<xref ref-type="bibr" rid="B10-viruses-04-00348">10</xref>], several hundred PR structures have been deposited in the PDB, covering significant variation in amino acid sequence, as shown in <xref ref-type="fig" rid="viruses-04-00348-f002">figure 2</xref>. The availability of such a substantial data set makes possible statistical probing into the properties of PR.</p>
        <p>Presented here is an investigation using a set of in-house tools to data-mine the PDB, superpose structures, and calculate various parameters by residue or by structure. Mean temperature factors (<italic>B</italic>-factors) and spatial displacements were examined and correlated to resolution, ligand presence, and space group, and the obtained results were compared to current biological views. </p>
      </sec>
    </sec>
    <sec sec-type="results">
      <title>2. Results and Discussion</title>
      <p>The occupancy-weighted average α-carbon (Cα) <italic>B</italic>-factor was calculated for each of the resulting chains and plotted against structure resolution, in the hope of observing an association, even if only a weak one. However, <xref ref-type="fig" rid="viruses-04-00348-f003">figure 3</xref> shows at best a resolution-dependent upper bound for the mean Cα <italic>B</italic>-factor. The lack of a stronger association can at least partially be attributed to the surprisingly low <italic>B</italic>-factors reported by some structures. Many files contained atomic coordinates with <italic>B</italic>-factors of 2 Å<sup>2</sup> and below, and several included negative <italic>B</italic>-factors. It was therefore necessary to remove from the study-set any structures containing <italic>B</italic>-factors lower than some reasonable value. This cut-off value was taken from a high-resolution structure with reliable, low <italic>B</italic>-factors and no negative values. The highest resolution structure of lysozyme in the PDB at the time of this writing, PDB ID 2vb1 [<xref ref-type="bibr" rid="B14-viruses-04-00348">14</xref>], lists no <italic>B</italic>-factors lower than 2.15 Å<sup>2</sup>, so this was selected as the cut-off. A total of 597 HIV protease chains passed, represented in <xref ref-type="fig" rid="viruses-04-00348-f003">figure 3</xref> as blue points. This filtering step noticeably improved the linearity of the relationship between resolution and Cα <italic>B</italic>-factors. </p>
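      <p>The filtering step described above can be illustrated with a minimal sketch. It assumes that each chain has already been parsed into (atom name, occupancy, <italic>B</italic>-factor) records; this record layout and the function names <monospace>mean_ca_bfactor</monospace> and <monospace>passes_cutoff</monospace> are hypothetical and do not reproduce the in-house tools.</p>
      <preformat><![CDATA[
```python
# Illustrative sketch only: the record layout and function names are assumptions,
# not the in-house tools used in this study.
B_CUTOFF = 2.15  # square angstroms; lowest B-factor listed in lysozyme structure 2vb1


def mean_ca_bfactor(atoms):
    """Occupancy-weighted mean B-factor over the alpha-carbons of one chain.

    atoms: iterable of (atom_name, occupancy, b_factor) tuples.
    """
    ca = [(occ, b) for name, occ, b in atoms if name == "CA"]
    total_occ = sum(occ for occ, _ in ca)
    if total_occ == 0:
        raise ValueError("chain has no alpha-carbons with non-zero occupancy")
    return sum(occ * b for occ, b in ca) / total_occ


def passes_cutoff(atoms, cutoff=B_CUTOFF):
    """True if no atom in the chain reports a B-factor below the cut-off."""
    return all(b >= cutoff for _, _, b in atoms)


# Toy chains (made-up values): the second has an atom below the cut-off.
chain_ok = [("N", 1.0, 12.0), ("CA", 0.6, 10.0), ("CA", 0.4, 20.0)]
chain_bad = [("N", 1.0, 1.5), ("CA", 1.0, 8.0)]
print(mean_ca_bfactor(chain_ok))
print(passes_cutoff(chain_ok), passes_cutoff(chain_bad))
```
]]></preformat>
      <p>In this sketch a chain is rejected as soon as any one of its atoms falls below the cut-off, which also removes chains with negative <italic>B</italic>-factors.</p>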
      <fig id="viruses-04-00348-f003" position="anchor">
        <label>Figure 3</label>
        <caption>
          <p>Quality of deposited PR structures. Monomers that passed the <italic>B</italic>-factor cut-off of 2.15 Å<sup>2</sup> are marked blue, whereas those that failed are red. NMR structures, to which the concept of resolution does not apply, were not used for this plot.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-g003.tif"/>
      </fig>
      <p>The PR monomers composing the study-set were superposed by the least squares method. Shown in <xref ref-type="fig" rid="viruses-04-00348-f004">figure 4</xref>A, the ribbon diagram representation of this superposition resembles an ensemble of NMR structures, even though no NMR structures were present in the data set. Though motion cannot be inferred directly from crystallographic data, it is worth noting that the greatest variation is observed in the flap and elbow, supporting the findings of NMR and molecular dynamics studies that have described these regions as the most dynamic [<xref ref-type="bibr" rid="B18-viruses-04-00348">18</xref>]. Interestingly, the flap region showed a greater relative thermal stability. <xref ref-type="fig" rid="viruses-04-00348-f004">Figure 4</xref>B, a putty cartoon based on mean Cα <italic>B</italic>-factors, shows much greater values in the elbow and 60’s loop than in the flap. However, when considering spatial displacement from the mean monomer (as in <xref ref-type="fig" rid="viruses-04-00348-f004">figure 4</xref>C), the tip of the flap joins the elbow and the 10’s and 60’s loops (defined as in <xref ref-type="fig" rid="viruses-04-00348-f001">figure 1</xref>) as one of the most variable regions, even though some of the range suggested by <xref ref-type="fig" rid="viruses-04-00348-f004">figure 4</xref>A has been averaged out. </p>
      <fig id="viruses-04-00348-f004" position="anchor">
        <label>Figure 4</label>
        <caption>
          <p>PR monomers. (<bold>A</bold>) Ribbon diagram of the final data set superposed by least squares. (<bold>B</bold>) Putty cartoon of <italic>B</italic>-factor variation on the mean structure, colored from low to high (yellow to red). (<bold>C</bold>) Putty cartoon of spatial variation on the mean structure, colored from low to high (blue to green). Refer to <xref ref-type="fig" rid="viruses-04-00348-f001">figure 1</xref> for definitions of PR regions. Rendered using PyMOL (DeLano, 2002).</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-g004.tif"/>
      </fig>
      <p>A possible explanation would be the existence of two distinct conformations of the enzyme: open and closed. In the latter, the presence of a ligand would enable interactions that hold the flap closed, ensuring its stability. In the former, steric clashes with symmetry-related molecules may limit flap opening and movement, or alternate conformers may be induced by amino acid variation. An analysis of crystal contacts across the various space groups mentioned in <xref ref-type="table" rid="viruses-04-00348-t001">Table 1</xref> affirms that there are several crystal contacts on the flap and elbow regions. Residues that formed crystal contacts in all the structures within each space group were used to calculate consensus contact regions within the space group. Though there are regions of contact that are specific to some space groups, the elbow and flap stretch and a number of other key contact points were common for all the space groups. </p>
      <table-wrap id="viruses-04-00348-t001" position="anchor">
        <object-id pub-id-type="pii">viruses-04-00348-t001_Table 1</object-id>
        <label>Table 1</label>
        <caption>
          <p>Distribution of crystal contacts by residue in representative structures from each of the space groups reported for PR.</p>
        </caption>
        <table>
          <tbody><tr><td><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-i001.tif"/></td></tr></tbody>
        </table>
      </table-wrap>
      <p><xref ref-type="fig" rid="viruses-04-00348-f005">Figure 5</xref>A, a plot of the range of temperature factors and spatial displacement by residue, confirms the elbow, the tip of the flap, and the other loops as maxima. Overall, the <italic>B</italic>-factor seems to adequately fulfill its role as a <italic>de facto</italic> measure of spatial variation because there is high agreement in the location of the extrema of the two data series. However, the <italic>B</italic>-factor is not as reliable in predicting the magnitude of these extrema. Crystal contacts deduced from structures in different space groups appear to coincide with regions of higher <italic>B</italic>-factors, which suggests that crystal-packing artifacts may influence both the apparent dynamics and the <italic>B</italic>-factor values. The unreliability is especially apparent when separately treating ligand-bound (<xref ref-type="fig" rid="viruses-04-00348-f005">figure 5</xref>B) and ligand-free (<xref ref-type="fig" rid="viruses-04-00348-f005">figure 5</xref>C) monomers. The majority of PR structures in the PDB are bound to a ligand, so <xref ref-type="fig" rid="viruses-04-00348-f005">figure 5</xref>A,B do not differ significantly. </p>
      <fig id="viruses-04-00348-f005" position="anchor">
        <label>Figure 5</label>
        <caption>
          <p>PR <italic>B</italic>-factor (orange) and spatial displacement (blue) variation with residue number. (<bold>A</bold>) Final data set, (<bold>B</bold>) bound monomers, (<bold>C</bold>) ligand-free monomers. Values were normalized for comparison purposes. Mean and standard deviation values are given in <xref ref-type="table" rid="viruses-04-00348-t002">Table 2</xref>. Secondary structure elements are identified.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-g005.tif"/>
      </fig>
      <p>In <xref ref-type="fig" rid="viruses-04-00348-f005">figure 5</xref>C, on the other hand, the tip of the flap exhibits by far the greatest displacement from the mean ligand-free structure, and this value is much larger than the corresponding <italic>B</italic>-factor might indicate. Furthermore, the distribution of ligand-free structures is as a whole more variable spatially than that of bound ones, as described in <xref ref-type="table" rid="viruses-04-00348-t002">Table 2</xref>. Spatial displacement over ligand-free structures has both a greater mean, 0.577 Å, and a greater standard deviation, 0.465 Å, than over bound structures (mean = 0.343 Å, standard deviation = 0.160 Å). The difference may be partially due to the discrepancy in sample sizes, but it nevertheless suggests the possibility of multiple PR conformations in the absence of a ligand. </p>
      <table-wrap id="viruses-04-00348-t002" position="anchor">
        <object-id pub-id-type="pii">viruses-04-00348-t002_Table 2</object-id>
        <label>Table 2</label>
        <caption>
          <p>PR <italic>B</italic>-factor and spatial displacement distribution. This table shows the mean spatial displacement observed in the ligand-bound vs. ligand-free PR structures.</p>
        </caption>
        <table>
          <tbody><tr><td><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-i002.tif"/></td></tr></tbody>
        </table>
      </table-wrap>
      <p>From this analysis of an entire array of structures, it is also possible to obtain an estimate of what Å value represents a significant conformational change. Referring to the statistics in <xref ref-type="table" rid="viruses-04-00348-t002">Table 2</xref>, a change of 0.5 Å or below is within error range. A spatial displacement of 1.0 Å, or approximately four standard deviations from the mean of the entire study-set, is more convincing. In the distance matrix of pairwise <italic>RMSD</italic>s, of which a small segment is given as <xref ref-type="table" rid="viruses-04-00348-t003">Table 3</xref>, several structures, namely PDB IDs 1xl2, 2fns, 2fnt, 2hs1, 2hs2, and 5upj, have <italic>RMSD</italic>s of 0.95 Å and above separating the monomers that compose them, supporting the finding that the two monomers adopt different conformational states when PR binds an asymmetric ligand [<xref ref-type="bibr" rid="B19-viruses-04-00348">19</xref>]. The initial work of Prabu-Jeyabalan <italic>et al.</italic> [<xref ref-type="bibr" rid="B19-viruses-04-00348">19</xref>] was on an inactivated HIV-1 PR-substrate complex, and the two monomers in the reported structure (PDB ID 1f7a) have an <italic>RMSD</italic> of only 0.34 Å. However, of the aforementioned structures, only PDB IDs 2fns and 2fnt have peptide ligands whereas the rest are bound to non-peptides, and PDB ID 5upj is an HIV-2 PR. The observation may therefore be conjectured to hold generally for HIV proteases and asymmetric ligands.</p>
      <table-wrap id="viruses-04-00348-t003" position="anchor">
        <object-id pub-id-type="pii">viruses-04-00348-t003_Table 3</object-id>
        <label>Table 3</label>
        <caption>
          <p>Representative table of the pairwise RMSD distance (Å) matrix of the 587 monomers in the study set. Rows and columns are labeled with the PDB ID and chain identifier.</p>
        </caption>
        <table>
          <tbody><tr><td><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-i003.tif"/></td></tr></tbody>
        </table>
      </table-wrap>
      <p>To further understand the effects of ligand binding on PR structure, monomers were superposed within the bound and ligand-free subsets. <xref ref-type="fig" rid="viruses-04-00348-f006">Figure 6</xref>A shows the result for the ligand-free monomers. Surprisingly, not all structures had flaps in the “semi-open” or open conformations. Several exhibited the closed flap conformation, though closer examination revealed these to belong to covalently bonded PR dimers (PDB IDs 1g6l and 1lv1) that were split into monomers by removing the bridge of connecting residues. This also explains why these structures differ noticeably at the C-terminus from the other ligand-free structures. The superposition of the mean bound and mean ligand-free monomers, rendered as a ribbon diagram in <xref ref-type="fig" rid="viruses-04-00348-f006">figure 6</xref>B, shows a prominent difference in the tip of the flap but little variation elsewhere. <xref ref-type="fig" rid="viruses-04-00348-f006">Figure 6</xref>C gives the same information as a plot of spatial difference by residue, and the spike corresponding to the tip of the flap is unmistakable. However, the maximal distance, 2.75 Å, is smaller than the actual deviation between the open and closed conformations, because the mean ligand-free structure is closer to the semi-open state due to averaging.</p>
      <fig id="viruses-04-00348-f006" position="anchor">
        <label>Figure 6</label>
        <caption>
          <p>Ligand effects on PR monomer structure. (A) Superposition of ligand-free monomers; (B) superposition of mean ligand-free (orange) and bound (blue) monomers; (C) plot of spatial difference vs. residue number for mean ligand-free and bound monomers. Ribbon diagrams rendered using PyMOL.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-g006.tif"/>
      </fig>
      <p>Monomers were also organized on the basis of crystallographic space group and superposed to obtain mean monomers. Most of the representative monomers were in the closed conformation, as shown in <xref ref-type="fig" rid="viruses-04-00348-f007">figure 7</xref>. Exceptions were the monomers corresponding to the C2, P4<sub>1</sub>2<sub>1</sub>2, and P4<sub>1</sub> space groups. Interestingly, all structures that crystallized in the C2 space group were bound to a ligand, as listed in <xref ref-type="table" rid="viruses-04-00348-t004">Table 4</xref>. This surprising observation may be due to the fact that all C2 structures except PDB ID 1ztz were of HIV-2 PR and solved during the same study. The deviation of the P4<sub>1</sub>2<sub>1</sub>2 space group is accounted for by noting that all its monomers are ligand-free, except one (PDB ID 3bc4 [<xref ref-type="bibr" rid="B20-viruses-04-00348">20</xref>]) whose flaps are prevented from closing by two non-peptide inhibitors that pack the active site, acting as a wedge. Finally, the P4<sub>1</sub> space group has an almost equal distribution of bound and ligand-free monomers but differs the most from the closed conformation, which would be expected of a predominantly ligand-free space group. This may be because most of the P4<sub>1</sub> structures have mutations at residues 82 and 84, which are essential to ligand binding and structural stability in the active site [<xref ref-type="bibr" rid="B21-viruses-04-00348">21</xref>]. </p>
      <fig id="viruses-04-00348-f007" position="anchor">
        <label>Figure 7</label>
        <caption>
          <p>Superposition of mean PR monomer structures for the space groups: P2<sub>1</sub> (orange), C2 (red), P2<sub>1</sub>2<sub>1</sub>2 (chartreuse), P2<sub>1</sub>2<sub>1</sub>2<sub>1</sub> (yellow), I222 (purple), P4<sub>1</sub> (cyan), P4<sub>3</sub> (lime green), P4<sub>1</sub>2<sub>1</sub>2 (blue), P4<sub>3</sub>2<sub>1</sub>2 (magenta), I4<sub>1</sub>22 (salmon), P6<sub>1</sub> (olive), P6<sub>5</sub> (brown), P6<sub>1</sub>22 (pink) and I2<sub>1</sub>3 (green). Ribbon diagram rendered using PyMOL. <xref ref-type="table" rid="viruses-04-00348-t004">Table 4</xref> describes the distribution of space groups in the final data set.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-g007.tif"/>
      </fig>
      <table-wrap id="viruses-04-00348-t004" position="anchor">
        <object-id pub-id-type="pii">viruses-04-00348-t004_Table 4</object-id>
        <label>Table 4</label>
        <caption>
          <p>Distribution of PR structures by space group. In the strictest sense, the distribution should be further subdivided because not all structures belonging to the same space group have isomorphous unit cells.</p>
        </caption>
        <table>
          <tbody><tr><td><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-i004.tif"/></td></tr></tbody>
        </table>
      </table-wrap>
    </sec>
    <sec>
      <title>3. Experimental Section</title>
      <sec>
        <title>3.1. Data-mining the Protein Data Bank</title>
        <p>A list of relevant PDB IDs was obtained by querying the PDB for structures matching the keywords “HIV protease.” The same 174 hits were returned regardless of whether the query was executed programmatically or through the PDB’s web interface. Expanding the search parameters to include “human immunodeficiency virus protease” increased the number of results to 405. However, a review of the literature revealed that the true number of relevant structures in the PDB is greater still. Using the aforementioned search parameters, a home-grown script that does not limit itself to phrases explicitly declared as keywords found an additional 34 PDB IDs. Several of the previously-missed files were in fact PR structures that had managed to evade normal search mechanisms by having only general keywords, such as hydrolase.</p>
      </sec>
      <sec>
        <title>3.2. Refining the search results</title>
        <p>Of the 439 PDB IDs obtained, many corresponded not to PR, but to other proteins, including but not limited to integrase, reverse transcriptase, and proteases of the simian immunodeficiency viruses and feline immunodeficiency viruses (SIV and FIV, respectively). Therefore, a screening tool was implemented to omit structures containing no chains with primary structures within a specified edit distance [<xref ref-type="bibr" rid="B12-viruses-04-00348">12</xref>] of several reference PR sequences from both HIV-1 and HIV-2. The two variants exhibit very similar overall structure despite having only about 50% sequence identity [<xref ref-type="bibr" rid="B13-viruses-04-00348">13</xref>]. The results of the screen still had to be checked manually; SIV protease structures passed this test due to high sequence homology with PR. Conversely, covalently bonded PR dimers failed due to high edit distances; one of the monomers as well as any connecting residues had to be deleted to match the reference sequences. Hence, SIV proteases were removed, and tethered HIV proteases were re-added. Despite the limitations, developing tools to facilitate the tedious task of selecting search results is an essential step towards the ultimate goal of having complete data mining packages to take advantage of the ever-increasing volume of information contained in the PDB.</p>
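        <p>A screen of this kind can be sketched as follows, assuming the Levenshtein definition of edit distance (unit cost for insertion, deletion, and substitution). The function names, the toy sequence fragments, and the threshold are hypothetical; the actual reference sequences and cut-off used in this study are not reproduced here.</p>
        <preformat><![CDATA[
```python
# Illustrative sketch only: sequences, threshold, and names are assumptions.
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def keep_structure(chains, references, max_distance):
    """Keep a structure if any chain lies within max_distance of any reference."""
    return any(edit_distance(c, r) <= max_distance
               for c in chains for r in references)


# Toy fragments, not real protease sequences.
toy_references = ["PQVTLW"]
print(keep_structure(["PQITLW", "AAAA"], toy_references, max_distance=1))
print(keep_structure(["AAAA"], toy_references, max_distance=1))
```
]]></preformat>
        <p>A tethered dimer would fail such a screen for the reason given above: the second monomer and the linker add roughly one chain length of insertions to the distance unless they are removed first.</p>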
      </sec>
      <sec>
        <title>3.3. Quality control</title>
        <p>The 368 structure files remaining after the previous step included several structures from NMR experiments and were split into 811 PR monomers. In the case of the covalently bonded PR dimers, this involved stripping any connecting residues. NMR structures were excluded from the study-set. The final data-set can be found in the supplementary data section.</p>
      </sec>
      <sec>
        <title>3.4. Structure superposition</title>
        <p>The problem of superposing two sets of three-dimensional coordinates (atomic or not) reduces to identifying the rotation and translation that minimize some error function, usually the root-mean-square deviation (<italic>RMSD</italic>) between the coordinate sets in question: </p>
        <p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-i005.tif"/></p>
        <p> Because <italic>N</italic> is just a constant, and the square root is a strictly increasing function, this is equivalent to the least squares criterion. In this case, the optimal translation can be calculated independent of rotation and is well known as the vector separating the centroids of the two coordinate sets [<xref ref-type="bibr" rid="B15-viruses-04-00348">15</xref>].</p>
        <p>Several methods are known for computing the optimal rotation. The most popular seems to be that of Kabsch [<xref ref-type="bibr" rid="B16-viruses-04-00348">16</xref>], but this approach represents rotations as 3 × 3 matrices and can give rise to rotoinversions. For this investigation, the method of Horn [<xref ref-type="bibr" rid="B17-viruses-04-00348">17</xref>] was chosen because it circumvents this problem by instead representing rotations as unit quaternions. Letting (<italic>x<sub>Ai</sub>, y<sub>Ai</sub>, z<sub>Ai</sub></italic>) denote the displacement of the <italic>i</italic>th point in set <italic>A</italic> from its centroid, the optimal rotation is the unit quaternion given by the eigenvector corresponding to the most positive eigenvalue of the symmetric 4 × 4 matrix: </p>
        <p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-i006.tif"/></p>
        <p>where <italic>S<sub>xx</sub></italic>= Σ<italic><sub>i</sub> x<sub>Ai</sub>x<sub>Bi</sub></italic>, <italic>S<sub>xy</sub></italic>= Σ<italic><sub>i</sub> x<sub>Ai</sub>y<sub>Bi</sub></italic>, and so on.</p>
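        <p>For the simple case of one conformation per atom at full occupancy, the quaternion method can be sketched in pure Python as below. The function names are hypothetical, and the cyclic Jacobi eigen-solver is a choice made here only to keep the example self-contained; the text does not state which eigen-solver the in-house tools use.</p>
        <preformat><![CDATA[
```python
# Illustrative sketch only (full occupancy, one conformation per atom).
# Names and the Jacobi eigen-solver are assumptions, not the in-house code.
import math


def centroid(pts):
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(3))


def centered(pts):
    c = centroid(pts)
    return [tuple(p[k] - c[k] for k in range(3)) for p in pts]


def largest_eigenvector(m, sweeps=100):
    """Eigenvector of the most positive eigenvalue of a small symmetric matrix."""
    n = len(m)
    a = [list(row) for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        if off <= 1e-24 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # column update: A <- A J and V <- V J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # row update: A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    best = max(range(n), key=lambda i: a[i][i])
    return [v[k][best] for k in range(n)]


def horn_rotation(a_pts, b_pts):
    """Rotation matrix that best maps centred set A onto centred set B."""
    a, b = centered(a_pts), centered(b_pts)
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    w, x, y, z = largest_eigenvector(n)  # unit quaternion (w, x, y, z)
    return [[w*w + x*x - y*y - z*z, 2*(x*y - w*z), 2*(x*z + w*y)],
            [2*(y*x + w*z), w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
            [2*(z*x - w*y), 2*(z*y + w*x), w*w - x*x - y*y + z*z]]


def superpose(a_pts, b_pts):
    """Return A rotated and translated onto B, plus the resulting RMSD."""
    r, cb = horn_rotation(a_pts, b_pts), centroid(b_pts)
    moved = [tuple(sum(r[i][k] * p[k] for k in range(3)) + cb[i] for i in range(3))
             for p in centered(a_pts)]
    rmsd = math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(moved, b_pts)) / len(b_pts))
    return moved, rmsd


a = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
# B is A rotated 90 degrees about z, (x, y, z) -> (-y, x, z), then shifted.
b = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in a]
moved, rmsd = superpose(a, b)
print(rmsd)
```
]]></preformat>
        <p>Because a unit quaternion always corresponds to a proper rotation, the matrix built from it cannot be a rotoinversion, which is the property that motivated the choice of this method.</p>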
        <p>Applying this algorithm to the atomic coordinates of protein structures involves accounting for fractional occupancies and multiple conformations. This amounts to using the occupancy-weighted centroids for the translation step and generalizing <italic>x<sub>Ai</sub></italic> to the occupancy-weighted average of its conformations <italic>j</italic>, </p>
        <p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-i007.tif"/></p>
        <p>and similarly for <italic>y</italic> and <italic>z</italic>, where <italic>occ<sub>Aij</sub></italic> denotes the occupancy of the <italic>j</italic>th conformation and <italic>occ<sub>Ai</sub></italic> = Σ<italic><sub>j</sub> occ<sub>Aij</sub></italic> is the total occupancy corresponding to (<italic>x<sub>Ai</sub>, y<sub>Ai</sub>, z<sub>Ai</sub></italic>). However, the <italic>S</italic> summations are also occupancy-weighted, which factors out the denominators in (3), and <italic>S<sub>xy</sub></italic>, for example, becomes </p>
        <p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="viruses-04-00348-i008.tif"/></p>
        <p>In the simple case of a single conformation with full occupancy, the formula reduces to that of Horn [<xref ref-type="bibr" rid="B17-viruses-04-00348">17</xref>].</p>
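        <p>A minimal Python sketch of the occupancy weighting is given below. It rests on one assumption that should be checked against the equations above: each atom pair is taken to contribute to the <italic>S</italic> sums with weight <italic>occ<sub>Ai</sub></italic> · <italic>occ<sub>Bi</sub></italic>, which is the choice that cancels the denominators of the occupancy-weighted averages. The data layout (lists of occupancy and position pairs) and the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def weighted_position(conformations):
    """Collapse the conformations of one atom.

    conformations: list of (occupancy, (x, y, z)).
    Returns (total occupancy, occupancy-weighted mean position).
    """
    total = sum(occ for occ, _ in conformations)
    mean = tuple(sum(occ * xyz[k] for occ, xyz in conformations) / total
                 for k in range(3))
    return total, mean


def weighted_centroid(atoms):
    """Occupancy-weighted centroid of a list of (total occupancy, position)."""
    weight = sum(occ for occ, _ in atoms)
    return tuple(sum(occ * pos[k] for occ, pos in atoms) / weight
                 for k in range(3))


def weighted_s(atoms_a, atoms_b, r, c):
    """Occupancy-weighted sum S_rc (r, c in 0..2 for x, y, z).

    Each pair is weighted by occ_A * occ_B (assumed, see the lead-in text).
    """
    ca, cb = weighted_centroid(atoms_a), weighted_centroid(atoms_b)
    return sum(oa * ob * (pa[r] - ca[r]) * (pb[c] - cb[c])
               for (oa, pa), (ob, pb) in zip(atoms_a, atoms_b))
```
]]></preformat>
        <p>When every atom has a single conformation with an occupancy of 1, all weights equal 1 and <monospace>weighted_s</monospace> returns the unweighted sum, consistent with the reduction to the formula of Horn noted above.</p>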
        <p>Unfortunately, no closed-form solution exists for the superposition of multiple structures. A first approach might be to superpose all structures onto the same reference structure, but the results may be erroneous if the reference is poorly chosen. A possible improvement would be to superpose each structure onto the mean of those already considered, but even this strategy is vulnerable to the effects of an arbitrary order of superposition because structures considered earlier are inherently given more importance. To alleviate this problem, the 587 monomers to be analyzed were sorted from highest to lowest resolution, with ties broken in favor of the structure with the lowest mean C<sup>α</sup> <italic>B</italic>-factor, so that the best-determined structures are the ones that carry the most influence. Equal treatment of all structures would be ideal, but preferring “better” monomers is an acceptable compromise.</p>
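        <p>The ordering and the progressive superposition onto a running mean can be sketched in Python as follows. This is an illustration of the strategy described above rather than the code used in this study: the dictionary keys <monospace>resolution</monospace> and <monospace>mean_b</monospace>, the function names, and the <monospace>superpose</monospace> callback (any pairwise superposition routine) are assumptions. Note that the highest resolution corresponds to the smallest value in Å.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def quality_order(monomers):
    """Sort best first: smallest resolution value, then lowest mean B-factor."""
    return sorted(monomers, key=lambda m: (m["resolution"], m["mean_b"]))


def superpose_progressively(structures, superpose):
    """Superpose each structure onto the mean of those already placed.

    structures: coordinate lists of equal length, already in quality order.
    superpose(mobile, target): returns mobile moved onto target.
    Returns (placed structures, final mean coordinates).
    """
    placed = [structures[0]]
    mean = [tuple(p) for p in structures[0]]
    for coords in structures[1:]:
        moved = superpose(coords, mean)
        placed.append(moved)
        n = len(placed)
        # Incremental update of the per-atom mean position.
        mean = [tuple(m[k] + (p[k] - m[k]) / n for k in range(3))
                for m, p in zip(mean, moved)]
    return placed, mean
```
]]></preformat>
        <p>Because the first structure defines the initial mean, the sort order determines which monomers have the greatest influence on the final frame of reference.</p>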
      </sec>
    </sec>
    <sec sec-type="conclusions">
      <title>4. Conclusions</title>
      <p>Analysis of a static array of PDB structures to gain further insight into a protein has great potential as a method for deducing a statistical threshold for structural variation, as demonstrated by this PR case study. While there are other algorithms for data-mining the PDB, it is clear from this study that quality control is required before a data-set is used for analysis. Algorithms for superposing structures also exist, but to our knowledge, this is the first method that also occupancy-weights the available conformations for the superposition. The algorithms described here were used to data-mine the PDB, filter search results, perform quality control, and superpose structures. This made possible a comparison of <italic>B</italic>-factors and spatial variation over the entire study-set of PR monomers, the bound and ligand-free subsets, and the different space groups represented. Examination of the resulting distributions is an alternative way of identifying a protein’s most variable regions and of classifying a spatial displacement as significant or within the range of error.</p>
      <p>However, such an approach to protein study is made more difficult by the varied practices of PDB depositors, even within the limits of a file format with a detailed specification. Choice of title, choice of keywords, numbering of residues, and organization into models and chains are often given little attention. Such inconsistencies are easily overlooked by a human user, but they make selection of the study-set the most complex and error-prone step of a data-mining endeavor. Additionally, many structures misuse <italic>B</italic>-factors, occupancies, and other parameters, or assign them special values whose meaning is not defined by the PDB file format. Therefore, quality controls must be implemented to exclude from such investigations any structures with statistics that might bias results. For data-mining investigations to be successful, a paradigm shift will be required of depositors to the PDB: to stop treating the painstaking process of preparing a structure for submission as an unnecessary complication and to see the PDB itself not just as a collection of coordinates, but as a tool that could shed light on many of the questions of structural biology.</p>
    </sec>
    <sec>
      <title>Acknowledgments</title>
      <p>This project could not have happened without the work of all those who have deposited structures in the PDB and those who continue to maintain a smoothly running database.</p>
    </sec>
    <sec>
      <title>Conflict of Interest</title>
      <p>The authors declare no conflict of interest.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References and Notes</title>
      <ref id="B1-viruses-04-00348">
        <label>1.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Berman</surname>
              <given-names>H.M.</given-names>
            </name>
            <name>
              <surname>Battistuz</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Bhat</surname>
              <given-names>T.N.</given-names>
            </name>
            <name>
              <surname>Bluhm</surname>
              <given-names>W.F.</given-names>
            </name>
            <name>
              <surname>Bourne</surname>
              <given-names>P.E.</given-names>
            </name>
            <name>
              <surname>Burkhardt</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Feng</surname>
              <given-names>Z.</given-names>
            </name>
            <name>
              <surname>Gilliland</surname>
              <given-names>G.L.</given-names>
            </name>
            <name>
              <surname>Iype</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Jain</surname>
              <given-names>S.</given-names>
            </name>
            <etal/>
          </person-group>
          <article-title>The Protein Data Bank</article-title>
          <source>Acta Crystallogr., Sect. D: Biol. Crystallogr.</source>
          <year>2002</year>
          <volume>58</volume>
          <fpage>899</fpage>
          <lpage>907</lpage>
          <pub-id pub-id-type="doi">10.1107/S0907444902003451</pub-id>
        </citation>
      </ref>
      <ref id="B2-viruses-04-00348">
        <label>2.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Bernstein</surname>
              <given-names>F.C.</given-names>
            </name>
            <name>
              <surname>Koetzle</surname>
              <given-names>T.F.</given-names>
            </name>
            <name>
              <surname>Williams</surname>
              <given-names>G.J.</given-names>
            </name>
            <name>
              <surname>Meyer</surname>
              <given-names>E.F.</given-names>
              <suffix>Jr.</suffix>
            </name>
            <name>
              <surname>Brice</surname>
              <given-names>M.D.</given-names>
            </name>
            <name>
              <surname>Rodgers</surname>
              <given-names>J.R.</given-names>
            </name>
            <name>
              <surname>Kennard</surname>
              <given-names>O.</given-names>
            </name>
            <name>
              <surname>Shimanouchi</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Tasumi</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>The Protein Data Bank: a computer-based archival file for macromolecular structures</article-title>
          <source>Arch. Biochem. Biophys.</source>
          <year>1978</year>
          <volume>185</volume>
          <fpage>584</fpage>
          <lpage>591</lpage>
          <pub-id pub-id-type="doi">10.1016/0003-9861(78)90204-7</pub-id>
        </citation>
      </ref>
      <ref id="B3-viruses-04-00348">
        <label>3.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Berman</surname>
              <given-names>H.M.</given-names>
            </name>
            <name>
              <surname>Westbrook</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Feng</surname>
              <given-names>Z.</given-names>
            </name>
            <name>
              <surname>Gilliland</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Bhat</surname>
              <given-names>T.N.</given-names>
            </name>
            <name>
              <surname>Weissig</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Shindyalov</surname>
              <given-names>I.N.</given-names>
            </name>
            <name>
              <surname>Bourne</surname>
              <given-names>P.E.</given-names>
            </name>
          </person-group>
          <article-title>The Protein Data Bank</article-title>
          <source>Nucleic Acids Res.</source>
          <year>2000</year>
          <volume>28</volume>
          <fpage>235</fpage>
          <lpage>242</lpage>
        <pub-id pub-id-type="doi">10.1093/nar/28.1.235</pub-id><pub-id pub-id-type="pmid">10592235</pub-id></citation>
      </ref>
      <ref id="B4-viruses-04-00348">
        <label>4.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Debouck</surname>
              <given-names>C.</given-names>
            </name>
          </person-group>
          <article-title>The HIV-1 protease as a therapeutic target for AIDS</article-title>
          <source>AIDS Res. Hum. Retroviruses</source>
          <year>1992</year>
          <volume>8</volume>
          <fpage>153</fpage>
          <lpage>164</lpage>
        <pub-id pub-id-type="doi">10.1089/aid.1992.8.153</pub-id><pub-id pub-id-type="pmid">1540403</pub-id></citation>
      </ref>
      <ref id="B5-viruses-04-00348">
        <label>5.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Flexner</surname>
              <given-names>C.</given-names>
            </name>
          </person-group>
          <article-title>HIV-protease inhibitors</article-title>
          <source>N. Engl. J. Med.</source>
          <year>1998</year>
          <volume>338</volume>
          <fpage>1281</fpage>
          <lpage>1292</lpage>
        <pub-id pub-id-type="doi">10.1056/NEJM199804303381808</pub-id><pub-id pub-id-type="pmid">9562584</pub-id></citation>
      </ref>
      <ref id="B6-viruses-04-00348">
        <label>6.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Carpenter</surname>
              <given-names>C.C.</given-names>
            </name>
            <name>
              <surname>Fischl</surname>
              <given-names>M.A.</given-names>
            </name>
            <name>
              <surname>Hammer</surname>
              <given-names>S.M.</given-names>
            </name>
            <name>
              <surname>Hirsch</surname>
              <given-names>M.S.</given-names>
            </name>
            <name>
              <surname>Jacobsen</surname>
              <given-names>D.M.</given-names>
            </name>
            <name>
              <surname>Katzenstein</surname>
              <given-names>D.A.</given-names>
            </name>
            <name>
              <surname>Montaner</surname>
              <given-names>J.S.</given-names>
            </name>
            <name>
              <surname>Richman</surname>
              <given-names>D.D.</given-names>
            </name>
            <name>
              <surname>Saag</surname>
              <given-names>M.S.</given-names>
            </name>
            <name>
              <surname>Schooley</surname>
              <given-names>R.T.</given-names>
            </name>
            <etal/>
          </person-group>
          <article-title>Antiretroviral therapy for HIV infection in 1997. Updated recommendations of the International AIDS Society-USA panel</article-title>
          <source>JAMA</source>
          <year>1997</year>
          <volume>277</volume>
          <fpage>1962</fpage>
          <lpage>1969</lpage>
        <pub-id pub-id-type="doi">10.1001/jama.1997.03540480062040</pub-id><pub-id pub-id-type="pmid">9200638</pub-id></citation>
      </ref>
      <ref id="B7-viruses-04-00348">
        <label>7.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Condra</surname>
              <given-names>J.H.</given-names>
            </name>
            <name>
              <surname>Schleif</surname>
              <given-names>W.A.</given-names>
            </name>
            <name>
              <surname>Blahy</surname>
              <given-names>O.M.</given-names>
            </name>
            <name>
              <surname>Gabryelski</surname>
              <given-names>L.J.</given-names>
            </name>
            <name>
              <surname>Graham</surname>
              <given-names>D.J.</given-names>
            </name>
            <name>
              <surname>Quintero</surname>
              <given-names>J.C.</given-names>
            </name>
            <name>
              <surname>Rhodes</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Robbins</surname>
              <given-names>H.L.</given-names>
            </name>
            <name>
              <surname>Roth</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Shivaprakash</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title><italic>In vivo</italic> emergence of HIV-1 variants resistant to multiple protease inhibitors</article-title>
          <source>Nature</source>
          <year>1995</year>
          <volume>374</volume>
          <fpage>569</fpage>
          <lpage>571</lpage>
          <pub-id pub-id-type="doi">10.1038/374569a0</pub-id>
        </citation>
      </ref>
      <ref id="B8-viruses-04-00348">
        <label>8.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Clemente</surname>
              <given-names>J.C.</given-names>
            </name>
            <name>
              <surname>Robbins</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Graña</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Paleo</surname>
              <given-names>M.R.</given-names>
            </name>
            <name>
              <surname>Correa</surname>
              <given-names>J.F.</given-names>
            </name>
            <name>
              <surname>Villaverde</surname>
              <given-names>M.C.</given-names>
            </name>
            <name>
              <surname>Sardina</surname>
              <given-names>F.J.</given-names>
            </name>
            <name>
              <surname>Govindasamy</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Agbandje-McKenna</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>McKenna</surname>
              <given-names>R.</given-names>
            </name>
            <etal/>
          </person-group>
          <article-title>Design, synthesis, evaluation, and crystallographic-based structural studies of HIV-1 protease inhibitors with reduced response to the V82A mutation</article-title>
          <source>J. Med. Chem.</source>
          <year>2008</year>
          <volume>51</volume>
          <fpage>852</fpage>
          <lpage>860</lpage>
        <pub-id pub-id-type="doi">10.1021/jm701170f</pub-id><pub-id pub-id-type="pmid">18215016</pub-id></citation>
      </ref>
      <ref id="B9-viruses-04-00348">
        <label>9.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Heaslet</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Rosenfeld</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Giffin</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Lin</surname>
              <given-names>Y.C.</given-names>
            </name>
            <name>
              <surname>Tam</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Torbett</surname>
              <given-names>B.E.</given-names>
            </name>
            <name>
              <surname>Elder</surname>
              <given-names>J.H.</given-names>
            </name>
            <name>
              <surname>McRee</surname>
              <given-names>D.E.</given-names>
            </name>
            <name>
              <surname>Stout</surname>
              <given-names>C.D.</given-names>
            </name>
          </person-group>
          <article-title>Conformational flexibility in the flap domains of ligand-free HIV protease</article-title>
          <source>Acta Crystallogr., Sect. D: Biol. Crystallogr.</source>
          <year>2007</year>
          <volume>63</volume>
          <fpage>866</fpage>
          <lpage>875</lpage>
          <pub-id pub-id-type="doi">10.1107/S0907444907029125</pub-id>
        </citation>
      </ref>
      <ref id="B10-viruses-04-00348">
        <label>10.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wlodawer</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Miller</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Jaskólski</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Sathyanarayana</surname>
              <given-names>B.K.</given-names>
            </name>
            <name>
              <surname>Baldwin</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Weber</surname>
              <given-names>I.T.</given-names>
            </name>
            <name>
              <surname>Selk</surname>
              <given-names>L.M.</given-names>
            </name>
            <name>
              <surname>Clawson</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Schneider</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Kent</surname>
              <given-names>S.B.</given-names>
            </name>
          </person-group>
          <article-title>Conserved folding in retroviral proteases: crystal structure of a synthetic HIV-1 protease</article-title>
          <source>Science</source>
          <year>1989</year>
          <volume>245</volume>
          <fpage>616</fpage>
          <lpage>621</lpage>
        <pub-id pub-id-type="doi">10.1126/science.2548279</pub-id><pub-id pub-id-type="pmid">2548279</pub-id></citation>
      </ref>
      <ref id="B11-viruses-04-00348">
        <label>11.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wlodawer</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Vondrasek</surname>
              <given-names>J.</given-names>
            </name>
          </person-group>
          <article-title>Inhibitors of HIV-1 protease: a major success of structure-assisted drug design</article-title>
          <source>Annu. Rev. Biophys. Biomol. Struct.</source>
          <year>1998</year>
          <volume>27</volume>
          <fpage>249</fpage>
          <lpage>284</lpage>
        <pub-id pub-id-type="doi">10.1146/annurev.biophys.27.1.249</pub-id></citation>
      </ref>
      <ref id="B12-viruses-04-00348">
        <label>12.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Levenshtein</surname>
              <given-names>V.I.</given-names>
            </name>
          </person-group>
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          <source>Sov. Phys. Dokl.</source>
          <year>1966</year>
          <volume>10</volume>
          <fpage>707</fpage>
          <lpage>710</lpage>
        </citation>
      </ref>
      <ref id="B13-viruses-04-00348">
        <label>13.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Priestle</surname>
              <given-names>J.P.</given-names>
            </name>
            <name>
              <surname>Fässler</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Rösel</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Tintelnot-Blomley</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Strop</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Grütter</surname>
              <given-names>M.G.</given-names>
            </name>
          </person-group>
          <article-title>Comparative analysis of the X-ray structures of HIV-1 and HIV-2 proteases in complex with CGP 53820, a novel pseudosymmetric inhibitor</article-title>
          <source>Structure</source>
          <year>1995</year>
          <volume>3</volume>
          <fpage>381</fpage>
          <lpage>389</lpage>
        <pub-id pub-id-type="doi">10.1016/S0969-2126(01)00169-1</pub-id></citation>
      </ref>
      <ref id="B14-viruses-04-00348">
        <label>14.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wang</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Dauter</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Alkire</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Joachimiak</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Dauter</surname>
              <given-names>Z.</given-names>
            </name>
          </person-group>
          <article-title>Triclinic lysozyme at 0.65 Å resolution</article-title>
          <source>Acta Crystallogr., Sect. D: Biol. Crystallogr.</source>
          <year>2007</year>
          <volume>63</volume>
          <fpage>1254</fpage>
          <lpage>1268</lpage>
          <pub-id pub-id-type="doi">10.1107/S0907444907054224</pub-id>
        </citation>
      </ref>
      <ref id="B15-viruses-04-00348">
        <label>15.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Flower</surname>
              <given-names>D.R.</given-names>
            </name>
          </person-group>
          <article-title>Rotational superposition: a review of methods</article-title>
          <source>J. Mol. Graphics Modell.</source>
          <year>1999</year>
          <volume>17</volume>
          <fpage>238</fpage>
          <lpage>244</lpage>
        </citation>
      </ref>
      <ref id="B16-viruses-04-00348">
        <label>16.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Kabsch</surname>
              <given-names>W.</given-names>
            </name>
          </person-group>
          <article-title>A discussion of the solution for the best rotation to relate two sets of vectors</article-title>
          <source>Acta Crystallogr., Sect. A: Found. Crystallogr.</source>
          <year>1978</year>
          <volume>34</volume>
          <fpage>827</fpage>
          <lpage>828</lpage>
          <pub-id pub-id-type="doi">10.1107/S0567739478001680</pub-id>
        </citation>
      </ref>
      <ref id="B17-viruses-04-00348">
        <label>17.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Horn</surname>
              <given-names>B.K.P.</given-names>
            </name>
          </person-group>
          <article-title>Closed-form solution of absolute orientation using unit quaternions</article-title>
          <source>J. Opt. Soc. Am. A</source>
          <year>1987</year>
          <volume>4</volume>
          <fpage>629</fpage>
          <lpage>642</lpage>
          <pub-id pub-id-type="doi">10.1364/JOSAA.4.000629</pub-id>
        </citation>
      </ref>
      <ref id="B18-viruses-04-00348">
        <label>18.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Collins</surname>
              <given-names>J.R.</given-names>
            </name>
            <name>
              <surname>Burt</surname>
              <given-names>S.K.</given-names>
            </name>
            <name>
              <surname>Erickson</surname>
              <given-names>J.W.</given-names>
            </name>
          </person-group>
          <article-title>Flap opening in HIV-1 protease simulated by “activated” molecular dynamics</article-title>
          <source>Nat. Struct. Biol.</source>
          <year>1995</year>
          <volume>2</volume>
          <fpage>334</fpage>
          <lpage>338</lpage>
        <pub-id pub-id-type="doi">10.1038/nsb0495-334</pub-id><pub-id pub-id-type="pmid">7796268</pub-id></citation>
      </ref>
      <ref id="B19-viruses-04-00348">
        <label>19.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Prabu-Jeyabalan</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Nalivaika</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Schiffer</surname>
              <given-names>C.A.</given-names>
            </name>
          </person-group>
          <article-title>How does a symmetric dimer recognize an asymmetric substrate? A substrate complex of HIV-1 protease</article-title>
          <source>J. Mol. Biol.</source>
          <year>2000</year>
          <volume>301</volume>
          <fpage>1207</fpage>
          <lpage>1220</lpage>
        <pub-id pub-id-type="doi">10.1006/jmbi.2000.4018</pub-id><pub-id pub-id-type="pmid">10966816</pub-id></citation>
      </ref>
      <ref id="B20-viruses-04-00348">
        <label>20.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Böttcher</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Blum</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Dörr</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Heine</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Diederich</surname>
              <given-names>W.E.</given-names>
            </name>
            <name>
              <surname>Klebe</surname>
              <given-names>G.</given-names>
            </name>
          </person-group>
          <article-title>Targeting the open-flap conformation of HIV-1 protease with pyrrolidine-based inhibitors</article-title>
          <source>ChemMedChem</source>
          <year>2008</year>
          <volume>3</volume>
          <fpage>1337</fpage>
          <lpage>1344</lpage>
        <pub-id pub-id-type="doi">10.1002/cmdc.200800113</pub-id><pub-id pub-id-type="pmid">18720485</pub-id></citation>
      </ref>
      <ref id="B21-viruses-04-00348">
        <label>21.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Logsdon</surname>
              <given-names>B.C.</given-names>
            </name>
            <name>
              <surname>Vickrey</surname>
              <given-names>J.F.</given-names>
            </name>
            <name>
              <surname>Martin</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Proteasa</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Koepke</surname>
              <given-names>J.I.</given-names>
            </name>
            <name>
              <surname>Terlecky</surname>
              <given-names>S.R.</given-names>
            </name>
            <name>
              <surname>Wawrzak</surname>
              <given-names>Z.</given-names>
            </name>
            <name>
              <surname>Winters</surname>
              <given-names>M.A.</given-names>
            </name>
            <name>
              <surname>Merigan</surname>
              <given-names>T.C.</given-names>
            </name>
            <name>
              <surname>Kovari</surname>
              <given-names>L.C.</given-names>
            </name>
          </person-group>
          <article-title>Crystal structures of a multidrug-resistant human immunodeficiency virus type 1 protease reveal an expanded active-site cavity</article-title>
          <source>J. Virol.</source>
          <year>2004</year>
          <volume>78</volume>
          <fpage>3123</fpage>
          <lpage>3132</lpage>
        <pub-id pub-id-type="doi">10.1128/JVI.78.6.3123-3132.2004</pub-id><pub-id pub-id-type="pmid">14990731</pub-id></citation>
      </ref>
    </ref-list>
  </back>
</article>
