<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">ijms</journal-id>
<journal-title>International Journal of Molecular Sciences</journal-title>
<abbrev-journal-title>Int. J. Mol. Sci.</abbrev-journal-title>
<issn pub-type="epub">1422-0067</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/ijms13078752</article-id>
<article-id pub-id-type="publisher-id">ijms-13-08752</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Comparison of Different Ranking Methods in Protein-Ligand Binding Site Prediction</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Gao</surname><given-names>Jun</given-names></name><xref ref-type="aff" rid="af1-ijms-13-08752">1</xref><xref ref-type="aff" rid="af2-ijms-13-08752">2</xref><xref ref-type="author-notes" rid="fn1-ijms-13-08752">†</xref></contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname><given-names>Qi</given-names></name><xref ref-type="aff" rid="af1-ijms-13-08752">1</xref><xref ref-type="author-notes" rid="fn1-ijms-13-08752">†</xref></contrib>
<contrib contrib-type="author">
<name><surname>Kang</surname><given-names>Hong</given-names></name><xref ref-type="aff" rid="af1-ijms-13-08752">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>Cao</surname><given-names>Zhiwei</given-names></name><xref ref-type="aff" rid="af1-ijms-13-08752">1</xref><xref ref-type="corresp" rid="c1-ijms-13-08752">*</xref></contrib>
<contrib contrib-type="author">
<name><surname>Zhu</surname><given-names>Ruixin</given-names></name><xref ref-type="aff" rid="af1-ijms-13-08752">1</xref><xref ref-type="aff" rid="af3-ijms-13-08752">3</xref><xref ref-type="aff" rid="af4-ijms-13-08752">4</xref><xref ref-type="corresp" rid="c1-ijms-13-08752">*</xref></contrib></contrib-group>
<aff id="af1-ijms-13-08752">
<label>1</label>College of Life Science and Biotechnology, Tongji University, Shanghai 200092, China; E-Mails: <email>jungao@shmtu.edu.cn</email> (J.G.); <email>qiliu@tongji.edu.cn</email> (Q.L.); <email>kangh67@hotmail.com</email> (H.K.)</aff>
<aff id="af2-ijms-13-08752">
<label>2</label>College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China</aff>
<aff id="af3-ijms-13-08752">
<label>3</label>Institute for Advanced Study of Translational Medicine, Tongji University, Shanghai 200092, China</aff>
<aff id="af4-ijms-13-08752">
<label>4</label>School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian 116600, China</aff>
<author-notes>
<corresp id="c1-ijms-13-08752">
<label>*</label>Authors to whom correspondence should be addressed; E-Mails: <email>zwcao@tongji.edu.cn</email> (Z.C.); <email>rxzhu@tongji.edu.cn</email> (R.Z.); Tel./Fax: +86-21-65981041 (Z.C.), (R.Z.).</corresp><fn id="fn1-ijms-13-08752">
<label>†</label>
<p>These authors contributed equally to this work.</p></fn></author-notes>
<pub-date pub-type="collection">
<year>2012</year></pub-date>
<pub-date pub-type="epub">
<day>16</day>
<month>07</month>
<year>2012</year></pub-date>
<volume>13</volume>
<issue>7</issue>
<fpage>8752</fpage>
<lpage>8761</lpage>
<history>
<date date-type="received">
<day>14</day>
<month>05</month>
<year>2012</year></date>
<date date-type="rev-recd">
<day>19</day>
<month>06</month>
<year>2012</year></date>
<date date-type="accepted">
<day>02</day>
<month>07</month>
<year>2012</year></date></history>
<permissions>
<copyright-statement>© 2012 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0">
<p>This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Although many ligand-binding site prediction methods have been developed in recent years, there is still a great demand to improve prediction accuracy and to compare the performance of different prediction algorithms. In this work, in order to improve the performance of the protein-ligand binding site prediction method presented in our former study, we compared different ways of ranking candidate binding sites. Four properties, <italic>i.e.</italic>, pocket size, distance from the protein centroid, sequence conservation and the number of hydrophobic residues, were each used as a ranking criterion. Our results show that sequence conservation information ranks the real pockets more accurately than the other properties. Pocket size and the distance of the binding site from the protein centroid are also found to be helpful. In addition, a multi-view ranking aggregation method, which combines the information from these four properties, was applied in our study. The results show that better performance can be achieved in the prediction of ligand-binding sites by aggregating complementary properties.</p></abstract>
<kwd-group>
<kwd>ranking aggregation</kwd>
<kwd>protein-ligand binding site</kwd>
<kwd>prediction</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<title>1. Introduction</title>
<p>In most cellular processes, proteins interact with many other molecules to perform their biological functions. The successful identification of ligand-binding sites on protein surfaces is generally the starting point for the annotation of protein function and drug discovery. In addition, as a result of various structural genomics projects, the amount of structural information on proteins with little or no functional annotation is increasing exponentially. However, in most cases, protein-ligand complex structures are not easily accessible experimentally, which creates a demand for <italic>in silico</italic> methods as an alternative [<xref ref-type="bibr" rid="b1-ijms-13-08752">1</xref>,<xref ref-type="bibr" rid="b2-ijms-13-08752">2</xref>]. Fortunately, computational prediction of binding sites has been shown to be efficient and powerful compared to <italic>in vivo</italic> approaches, and several computational methods have been presented in this area [<xref ref-type="bibr" rid="b3-ijms-13-08752">3</xref>,<xref ref-type="bibr" rid="b4-ijms-13-08752">4</xref>]. However, research in this area is still in its infancy, and many issues remain to be solved.</p>
<p>To predict potential binding sites, several computational methods have been developed. Briefly, these algorithms can be divided into three categories: (1) purely geometry-based methods, which follow the assumption that protein-ligand binding sites are generally located at crevices on the protein surface or cavities in the protein. Methods falling in this category include POCKET [<xref ref-type="bibr" rid="b5-ijms-13-08752">5</xref>], LIGSITE [<xref ref-type="bibr" rid="b6-ijms-13-08752">6</xref>], PASS [<xref ref-type="bibr" rid="b7-ijms-13-08752">7</xref>], SURFNET [<xref ref-type="bibr" rid="b8-ijms-13-08752">8</xref>], and PocketPicker [<xref ref-type="bibr" rid="b9-ijms-13-08752">9</xref>], <italic>etc</italic>.; (2) energy-based methods, which coat the protein surface with a layer of probes to calculate van der Waals interaction energies between the protein and the probes. Q-SiteFinder [<xref ref-type="bibr" rid="b10-ijms-13-08752">10</xref>] is a classical tool in this category; (3) knowledge-based methods, which include various statistical methods [<xref ref-type="bibr" rid="b11-ijms-13-08752">11</xref>], machine learning methods [<xref ref-type="bibr" rid="b12-ijms-13-08752">12</xref>] and similarity comparison methods. In addition, some of these methods predict protein-ligand binding sites by searching for clusters or patterns of conserved residues [<xref ref-type="bibr" rid="b13-ijms-13-08752">13</xref>,<xref ref-type="bibr" rid="b14-ijms-13-08752">14</xref>].</p>
<p>Generally speaking, a computational method for binding site prediction has to consider several challenging issues: (1) identification of candidate protein-ligand binding sites [<xref ref-type="bibr" rid="b5-ijms-13-08752">5</xref>–<xref ref-type="bibr" rid="b17-ijms-13-08752">17</xref>], which involves delimiting cavities or pockets at the protein surface that are likely to bind molecules; (2) ranking binding sites according to their likelihood of accepting a molecule, since several presumed binding sites are often predicted on a protein surface and an approach is needed to characterize and rank them in order to select the most relevant ones [<xref ref-type="bibr" rid="b18-ijms-13-08752">18</xref>]; (3) induced fit, which may enhance the fidelity of molecular recognition in the presence of competition and noise via a conformational proofreading mechanism [<xref ref-type="bibr" rid="b19-ijms-13-08752">19</xref>]. In this study, we focus primarily on the ranking of binding sites. The largest pocket has been reported to correspond frequently to the observed ligand-binding site [<xref ref-type="bibr" rid="b20-ijms-13-08752">20</xref>]. Based on this observation, most prediction methods rank the candidate sites according to pocket size. Nevertheless, different studies have also tried to solve this ranking problem from other perspectives [<xref ref-type="bibr" rid="b16-ijms-13-08752">16</xref>,<xref ref-type="bibr" rid="b21-ijms-13-08752">21</xref>,<xref ref-type="bibr" rid="b22-ijms-13-08752">22</xref>].</p>
<p>Our former work on binding site prediction is based on the integration of sequence conservation information with geometry-based cleft identification. In this study, in order to improve the performance of that method and to investigate the contribution of different ranking methods to the prediction of protein-ligand binding sites, five ranking methods (pocket size, distance from the protein centroid, sequence conservation, number of hydrophobic residues, multi-view method) involving four properties were tested. The results show that (1) if only one property is considered, sequence conservation information ranks the pockets best; and (2) the innovative multi-view method, which integrates complementary properties such as pocket size and distance from the protein centroid, can achieve better performance than when only one individual property is considered.</p></sec>
<sec sec-type="results|discussion">
<title>2. Results and Discussion</title>
<sec>
<title>2.1. Individual Property Comparison</title>
<p>For the bound and unbound/bound test sets, 17 pockets were predicted per protein on average with our geometry-based site-finding method. The TOP 1 and TOP 3 accuracies differ between ranking methods; the values for each individual property are listed in <xref ref-type="table" rid="t1-ijms-13-08752">Table 1</xref>. A geometry-based method, SURFNET [<xref ref-type="bibr" rid="b8-ijms-13-08752">8</xref>], with its own ranking algorithm was also tested for comparison. Ranking the presumed binding sites by conservation score achieves the best performance, with a 59% success rate in the TOP 1 prediction, which means that about 124 of the 210 proteins in the bound test set are correctly predicted. Ranking by “volume and distance from the protein centroid” (shown in the “Distance” column) also performs well, which may indicate that the size and the depth of the binding site are helpful in ligand-binding site prediction. However, ranking according to the hydrophobic attribute does not deliver the expected results, possibly because the description of hydrophobic properties in our study is too simple.</p></sec>
<sec>
<title>2.2. Ranking Aggregation from a Multi-View Perspective</title>
<p>In some cases the conservation profiles of proteins are not easily accessible, which may make it impossible to rank presumed binding sites by conservation score. In addition, an efficient approach is needed to integrate various complementary ranking lists from a comprehensive multi-view perspective. Thus, in our study, an innovative ranking aggregation method is applied to address these problems. We integrate the ranking lists of different properties, for example the combination of “binding site size” and “the distance from the protein centroid”. The corresponding results are listed in <xref ref-type="table" rid="t2-ijms-13-08752">Table 2</xref>. After ranking aggregation, most of the success rates improve markedly and some of them are comparable to those obtained with conservation. These results indicate that combining different complementary properties generally improves the prediction success rate. In addition, “Volume plus Distance” is found to be an alternative to “Conservation” when proteins with no conservation profiles are predicted. An example (PDB: 2SIM [<xref ref-type="bibr" rid="b23-ijms-13-08752">23</xref>]) of such a ranking aggregation is presented in <xref ref-type="table" rid="t3-ijms-13-08752">Table 3</xref>. It can be seen that the rank of the correctly predicted binding site (*Pocket 9) is promoted after ranking aggregation, which leads to the improvement of the TOP 1 success rate. In <xref ref-type="fig" rid="f1-ijms-13-08752">Figure 1</xref>, the surface position of Pocket 9 is visualized with Jmol [<xref ref-type="bibr" rid="b24-ijms-13-08752">24</xref>]. However, it is worth noting that when two or more properties that are not complementary are used, such as volume and conservation, the final success rate probably does not show any improvement.</p>
<p>In summary, our study has not only validated the significance of sequence conservation in ligand binding site prediction, but also indicated the usefulness of the size and depth of the binding site in the ranking of binding sites. Furthermore, rather than only considering one property, an innovative multi-view ranking method was applied, which could achieve a much better performance for binding site prediction.</p></sec></sec>
<sec sec-type="methods">
<title>3. Methods</title>
<p>Our study relies on a new protein-ligand binding site prediction method introduced in our previous work. It is based on the integration of geometry and sequence conservation information [<xref ref-type="bibr" rid="b4-ijms-13-08752">4</xref>]. An overview of the ranking study is presented in <xref ref-type="fig" rid="f2-ijms-13-08752">Figure 2</xref>.</p>
<sec>
<title>3.1. Four Properties Used for Ranking</title>
<p>The four properties for the ranking of binding sites are calculated as follows:</p>
<list list-type="order">
<list-item>
<p>Pocket size. This is one of the most popular ranking properties. In this study, the volume of every presumed binding site is calculated with the Qhull program [<xref ref-type="bibr" rid="b25-ijms-13-08752">25</xref>].</p></list-item>
<list-item>
<p>Distance of binding site from the protein centroid. This property is considered to reflect the depth of a presumed binding site. The distance is defined as the Euclidean distance between the protein centroid and the geometric center of the presumed binding site.</p>
<p>
<disp-formula id="FD1">
<label>(1)</label>
<mml:math id="mm1" display="block">
<mml:semantics id="sm1">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi></mml:mrow>
<mml:mi>b</mml:mi></mml:msub>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi></mml:mrow>
<mml:mi>p</mml:mi></mml:msub>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi></mml:mrow>
<mml:mi>b</mml:mi></mml:msub>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi></mml:mrow>
<mml:mi>p</mml:mi></mml:msub>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>z</mml:mi></mml:mrow>
<mml:mi>b</mml:mi></mml:msub>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>z</mml:mi></mml:mrow>
<mml:mi>p</mml:mi></mml:msub>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>where (<italic>x</italic><italic><sub>b</sub></italic>, <italic>y</italic><italic><sub>b</sub></italic>, <italic>z</italic><italic><sub>b</sub></italic>) are the coordinates of the predicted binding site center, and (<italic>x</italic><italic><sub>p</sub></italic>, <italic>y</italic><italic><sub>p</sub></italic>, <italic>z</italic><italic><sub>p</sub></italic>) are the coordinates of the protein centroid.</p></list-item>
<list-item>
<p>Sequence conservation value. The sequence conservation information is obtained from ConSurf-DB [<xref ref-type="bibr" rid="b26-ijms-13-08752">26</xref>], which provides pre-calculated evolutionary conservation profiles for proteins with known structures in the PDB. In ConSurf-DB, every residue of a protein is assigned a normalized conservation score, such that the average over all residues is zero and the standard deviation is one. Low (negative) scores indicate conserved positions, while high scores indicate variable ones. In our study, the candidate binding sites are ranked according to the conservation scores of all residues in the same binding site.</p></list-item>
<list-item>
<p>The number of hydrophobic residues. Due to the importance of hydrophobicity in protein-ligand binding sites [<xref ref-type="bibr" rid="b27-ijms-13-08752">27</xref>,<xref ref-type="bibr" rid="b28-ijms-13-08752">28</xref>], the number of hydrophobic residues in each presumed binding site is also calculated. The hydrophobic residues include ALA, VAL, LEU, ILE, PRO, PHE, TRP and MET. The following equation is used to count the hydrophobic residues:</p>
<p>
<disp-formula id="FD2">
<label>(2)</label>
<mml:math id="mm2" display="block">
<mml:semantics id="sm2">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi></mml:mrow>
<mml:mi>H</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>O</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>H</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>T</mml:mi>
<mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p></list-item></list></sec>
<sec>
<title>3.2. Multi-View Ranking Aggregation</title>
<p>The complementary properties listed above might be helpful in ranking presumed binding sites. To combine them, we use a ranking aggregation method that was also applied in our previous study [<xref ref-type="bibr" rid="b29-ijms-13-08752">29</xref>]. It is based on the equalitarian philosophical paradigm and seeks a consensus list among the individual ranking lists. Before defining the distance measure, some notation must be introduced. Let <italic>M</italic><italic><sub>i</sub></italic>(1), ···, <italic>M</italic><italic><sub>i</sub></italic>(<italic>k</italic>) be the scores associated with the ordered list <italic>L</italic><italic><sub>i</sub></italic>, where <italic>M</italic><italic><sub>i</sub></italic>(1) is the best score, <italic>M</italic><italic><sub>i</sub></italic>(2) is the second best one, and so on. Let <italic>r</italic><italic><sup>L<sub>i</sub></sup></italic>(<italic>A</italic>) be the rank of <italic>A</italic> in the list <italic>L</italic><italic><sub>i</sub></italic> if <italic>A</italic> is within the top <italic>k</italic>, and <italic>k</italic> + 1 otherwise. The distance between two ranking lists can be defined as:</p>
<disp-formula id="FD3">
<label>(3)</label>
<mml:math id="mm3" display="block">
<mml:semantics id="sm3">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>∪</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:munder>
<mml:mrow>
<mml:mo>∣</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>-</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∣</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>which is known as Spearman’s footrule distance [<xref ref-type="bibr" rid="b30-ijms-13-08752">30</xref>]. Here <italic>r</italic><italic><sup>L<sub>j</sub></sup></italic>(<italic>t</italic>) in <xref rid="FD3" ref-type="disp-formula">equation (3)</xref> denotes the position of element <italic>t</italic> in the ordered list <italic>L</italic><italic><sub>j</sub></italic>.</p>
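<p>As an illustration of <xref rid="FD3" ref-type="disp-formula">equation (3)</xref>, the following Python sketch computes the footrule distance between two top-<italic>k</italic> lists. It is a minimal example under stated assumptions, not the implementation used in this study: the function name <monospace>footrule_distance</monospace>, the default choice of <italic>k</italic> and the pocket labels are all hypothetical. Items missing from a list receive rank <italic>k</italic> + 1, following the definition above.</p>
<preformat>
```python
def footrule_distance(list_i, list_j, k=None):
    """Spearman's footrule distance between two ordered top-k lists.

    Items absent from a list receive rank k + 1, as defined in the text.
    """
    if k is None:
        # Assumption: by default, compare the lists over their full length.
        k = max(len(list_i), len(list_j))
    top_i, top_j = list_i[:k], list_j[:k]
    # Ranks start at 1 for the best-scoring item.
    rank_i = {item: pos for pos, item in enumerate(top_i, start=1)}
    rank_j = {item: pos for pos, item in enumerate(top_j, start=1)}
    union = set(top_i) | set(top_j)
    return sum(abs(rank_i.get(t, k + 1) - rank_j.get(t, k + 1)) for t in union)


# Hypothetical rankings of three pockets by two different properties.
by_volume = ["P1", "P9", "P3"]
by_distance = ["P9", "P1", "P3"]
print(footrule_distance(by_volume, by_distance))  # 2
```
</preformat>
<p>In this toy case the two lists differ only by a swap of the first two pockets, so each of those pockets contributes a rank difference of one and the third contributes zero.</p>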
<p>In order to discover a comprehensive ranking list that would also be as close as possible to all the given ranking lists, an optimization function is defined:</p>
<disp-formula id="FD4">
<label>(4)</label>
<mml:math id="mm4" display="block">
<mml:semantics id="sm4">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>δ</mml:mi></mml:mrow>
<mml:mo>*</mml:mo></mml:msup>
<mml:mo>=</mml:mo>
<mml:mtext>arg </mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>min</mml:mtext>
<mml:mo stretchy="false">{</mml:mo>
<mml:mi mathvariant="normal">Φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">}</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD5">
<label>(5)</label>
<mml:math id="mm5" display="block">
<mml:semantics id="sm5">
<mml:mrow>
<mml:mi mathvariant="normal">Φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>m</mml:mi></mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mi>d</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where <italic>W</italic><italic><sub>i</sub></italic> is the importance weight of ranking list <italic>L</italic><italic><sub>i</sub></italic>; it is set to one in our study because we treat the four properties equally. The function <italic>d</italic>, computed as the Spearman footrule distance of <xref rid="FD3" ref-type="disp-formula">equation (3)</xref>, gives the distance between a candidate list <italic>δ</italic> and <italic>L</italic><italic><sub>i</sub></italic>. The goal of the ranking aggregation is to find the “comprehensive list” <italic>δ</italic><sup>*</sup> that minimizes the total weighted distance to all ranking lists. To this end, the Cross-Entropy (CE) method [<xref ref-type="bibr" rid="b31-ijms-13-08752">31</xref>], a general Monte Carlo approach for multi-extremum optimization, is used. The CE algorithm requires users to set a number of parameters. It is recommended that the number of samples <italic>N</italic> for each stage be at least 10 <italic>k</italic><sup>2</sup>, and that the rarity parameter <italic>ρ</italic>, used in the sampling stage of CE [<xref ref-type="bibr" rid="b31-ijms-13-08752">31</xref>] to update the cell probabilities, be set to 0.01 when <italic>N</italic> is relatively large and to 0.1 when <italic>N</italic> is small (less than 100). All ranking lists are aggregated in the <italic>R</italic> statistical environment with the <italic>RankAggreg</italic> package.</p></sec>
<sec sec-type="methods">
<title>3.3. Test Dataset and Evaluation of the Pocket Prediction</title>
<p>In this study, the two datasets that were used to evaluate the LIGSITE<sup>csc</sup> [<xref ref-type="bibr" rid="b16-ijms-13-08752">16</xref>] algorithm, <italic>i.e.</italic>, 210 bound structures and 48 unbound/bound structures, are used as the bound and unbound/bound test sets, respectively. To assess the quality of binding-site predictions, a standard evaluation method presented previously [<xref ref-type="bibr" rid="b4-ijms-13-08752">4</xref>,<xref ref-type="bibr" rid="b6-ijms-13-08752">6</xref>,<xref ref-type="bibr" rid="b9-ijms-13-08752">9</xref>,<xref ref-type="bibr" rid="b16-ijms-13-08752">16</xref>] is applied, which defines a prediction to be a hit if the geometric center of the presumed pocket lies within 4 Å of any atom of the ligand. Predictions that do not meet this criterion are excluded from the calculation of prediction success rates.</p>
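<p>The 4 Å criterion can be sketched in a few lines of Python. This is an illustrative example only: the function names <monospace>is_hit</monospace> and <monospace>top_n_success</monospace> and the coordinates are hypothetical, a distance exactly equal to the cutoff is assumed to count as a hit, and the TOP <italic>n</italic> reading (a protein is a success if any of its <italic>n</italic> best-ranked pockets is a hit) is an assumption about how the TOP 1 and TOP 3 rates are tallied.</p>
<preformat>
```python
import math


def is_hit(pocket_center, ligand_atoms, cutoff=4.0):
    """True if the pocket center lies within `cutoff` angstroms of any ligand atom."""
    # Assumption: a distance exactly equal to the cutoff counts as a hit.
    return any(cutoff >= math.dist(pocket_center, atom) for atom in ligand_atoms)


def top_n_success(ranked_centers, ligand_atoms, n, cutoff=4.0):
    """True if any of the n best-ranked pocket centers is a hit."""
    return any(is_hit(center, ligand_atoms, cutoff) for center in ranked_centers[:n])


# Hypothetical coordinates in angstroms: one ligand atom, two ranked pocket centers.
ligand = [(3.0, 0.0, 0.0)]
pockets = [(10.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(top_n_success(pockets, ligand, n=1))  # False
print(top_n_success(pockets, ligand, n=3))  # True
```
</preformat>
<p>Here the best-ranked pocket is 7 Å from the ligand atom and therefore misses, while the second-ranked pocket is 3 Å away, so the protein would count as a TOP 3 success but not as a TOP 1 success.</p>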
<p>We also used another evaluation measurement, <italic>i.e.</italic>, the Matthews Correlation Coefficient [<xref ref-type="bibr" rid="b32-ijms-13-08752">32</xref>] (MCC) as a comparison. For each protein, residue predictions were classified as true positives (TP: correctly predicted binding site residues), true negatives (TN: correctly predicted nonbinding site residues), false negatives (FN: incorrectly predicted as nonbinding site residues), false positives (FP: incorrectly predicted as binding site residues). The MCC was computed using <xref rid="FD6" ref-type="disp-formula">Equation 6</xref>:</p>
<disp-formula id="FD6">
<label>(6)</label>
<mml:math id="mm6" display="block">
<mml:semantics id="sm6">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi></mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></disp-formula>
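<p>For concreteness, <xref rid="FD6" ref-type="disp-formula">Equation 6</xref> can be evaluated per protein as in the following Python sketch. The helper names <monospace>confusion_counts</monospace> and <monospace>mcc</monospace> and the residue numbering are hypothetical, and returning zero when the denominator vanishes is a common convention assumed here rather than something stated in this study.</p>
<preformat>
```python
import math


def confusion_counts(predicted, actual, all_residues):
    """Count TP, TN, FP, FN from sets of residue identifiers."""
    predicted, actual, all_residues = set(predicted), set(actual), set(all_residues)
    tp = len(predicted.intersection(actual))     # predicted and truly binding
    fp = len(predicted - actual)                 # predicted but not binding
    fn = len(actual - predicted)                 # binding but not predicted
    tn = len(all_residues - predicted - actual)  # neither predicted nor binding
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient as written in Equation 6."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: an undefined MCC is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical protein with ten residues, numbered 0 to 9.
counts = confusion_counts({0, 1, 2}, {1, 2, 3}, range(10))
print(counts)                  # (2, 6, 1, 1)
print(round(mcc(*counts), 3))  # 0.524
```
</preformat>
<p>In this toy protein two of the three predicted residues are true binding residues, which gives (2 × 6 − 1 × 1) / 21 = 11/21.</p>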
<p>For the bound and unbound/bound test sets, an MCC score can be calculated for each protein with a given prediction method. In our implementation, a different score is obtained for each ranking method. To determine whether the differences between ranking methods, and between their combinations, are significant, the one-sided Wilcoxon signed-rank test is applied to the per-protein MCC scores. The statistical evaluation is performed using <italic>R</italic> (version 2.15.0).</p>
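<p>The paired comparison can be sketched as follows. The statistical evaluation in this study was carried out in <italic>R</italic>; the Python function below is only a stand-in that illustrates the idea, and its name <monospace>wilcoxon_one_sided</monospace> and the example scores are hypothetical. It assumes the conventional choices of dropping zero differences, assigning midranks to tied absolute differences, and computing an exact <italic>p</italic>-value from all sign assignments, none of which are specified in this study.</p>
<preformat>
```python
def wilcoxon_one_sided(x, y):
    """Exact one-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p), where p is the probability under the null hypothesis
    of a positive-rank sum at least as large as observed (x tends to exceed y).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # Doubled midranks keep every rank an integer even when |differences| tie.
    ranks2 = [0] * n
    start = 0
    while start != n:
        end = start
        while end + 1 != n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for pos in range(start, end + 1):
            ranks2[order[pos]] = start + end + 2  # 2 * mean of ranks start+1 .. end+1
        start = end + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # counts[s] = number of sign assignments whose doubled positive-rank sum is s.
    total2 = sum(ranks2)
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    p_value = sum(counts[w_plus2:]) / 2 ** n
    return w_plus2 / 2, p_value


# Hypothetical per-protein scores for two ranking methods (not data from this study).
method_a = [5, 6, 7, 8, 9]
method_b = [4, 4, 4, 4, 4]
print(wilcoxon_one_sided(method_a, method_b))  # (15.0, 0.03125)
```
</preformat>
<p>With five pairs that all favour the first method, only one of the 2<sup>5</sup> = 32 sign assignments reaches the observed rank sum of 15, so the one-sided <italic>p</italic>-value is 1/32.</p>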
<p>For the 210 bound structures, the evaluation is straightforward and follows the routine procedure described above. For the unbound/bound dataset, the Biojava development package [<xref ref-type="bibr" rid="b33-ijms-13-08752">33</xref>] is first used to align all the structures, and the ligands in the bound structures are transferred to the corresponding unbound structures. Finally, predictions are performed for the unbound structures and checked against the bound structures.</p></sec></sec>
<sec sec-type="conclusions">
<title>4. Conclusions</title>
<p>The prediction of protein-ligand binding sites is of great significance for protein function annotation and computer-aided drug design. Besides binding site identification, ranking the binding sites according to their likelihood of accepting a molecule is also an important and challenging issue. In order to improve on our previous work, this paper represents an initial effort to study the contribution of different ranking methods to protein-ligand binding site prediction. Five ranking methods (pocket size, distance from the protein centroid, sequence conservation, number of hydrophobic residues, multi-view ranking aggregation) were tested in our study. The results show that when only one property is considered, sequence conservation information ranks the pockets best. In addition, pocket size and depth can also serve as important attributes. Moreover, ranking aggregation involving complementary properties is shown to perform better than individual properties. This finding not only supports the findings of our previous work, but also provides useful suggestions for other related binding site identification studies.</p></sec></body>
<back>
<ack>
<title>Acknowledgments</title>
<p>This work was supported in part by grants from the National Natural Science Foundation of China (Grant No. 30976611, Grant No. 31100956 and Grant No. 61173117), the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20100072120050, Grant No. 20110072120048), and TCM modernization of Shanghai (Grant No. 09dZ1972800).</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-ijms-13-08752"><label>1</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname><given-names>R.</given-names></name><name><surname>Hu</surname><given-names>L.</given-names></name><name><surname>Li</surname><given-names>H.</given-names></name><name><surname>Su</surname><given-names>J.</given-names></name><name><surname>Cao</surname><given-names>Z.</given-names></name><name><surname>Zhang</surname><given-names>W.</given-names></name></person-group><article-title>Novel natural inhibitors of CYP1A2 identified by <italic>in silico</italic> and <italic>in vitro</italic> screening</article-title><source>Int. J. Mol. Sci</source><year>2011</year><volume>12</volume><fpage>3250</fpage><lpage>3262</lpage><pub-id pub-id-type="doi">10.3390/ijms12053250</pub-id><pub-id pub-id-type="pmid">21686183</pub-id></citation></ref>
<ref id="b2-ijms-13-08752"><label>2</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname><given-names>R.</given-names></name><name><surname>Liu</surname><given-names>Q.</given-names></name><name><surname>Tang</surname><given-names>J.</given-names></name><name><surname>Li</surname><given-names>H.</given-names></name><name><surname>Cao</surname><given-names>Z.</given-names></name></person-group><article-title>Investigations on inhibitors of hedgehog signal pathway: A quantitative structure-activity relationship study</article-title><source>Int. J. Mol. Sci</source><year>2011</year><volume>12</volume><fpage>3018</fpage><lpage>3033</lpage><pub-id pub-id-type="doi">10.3390/ijms12053018</pub-id><pub-id pub-id-type="pmid">21686166</pub-id></citation></ref>
<ref id="b3-ijms-13-08752"><label>3</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Henrich</surname><given-names>S.</given-names></name><name><surname>Salo-Ahen</surname><given-names>O.M.</given-names></name><name><surname>Huang</surname><given-names>B.</given-names></name><name><surname>Rippmann</surname><given-names>F.F.</given-names></name><name><surname>Cruciani</surname><given-names>G.</given-names></name><name><surname>Wade</surname><given-names>R.C.</given-names></name></person-group><article-title>Computational approaches to identifying and characterizing protein binding sites for ligand design</article-title><source>J. Mol. Recognit</source><year>2010</year><volume>23</volume><fpage>209</fpage><lpage>219</lpage><pub-id pub-id-type="pmid">19746440</pub-id></citation></ref>
<ref id="b4-ijms-13-08752"><label>4</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname><given-names>T.</given-names></name><name><surname>Liu</surname><given-names>Q.</given-names></name><name><surname>Gao</surname><given-names>J.</given-names></name><name><surname>Cao</surname><given-names>Z.</given-names></name><name><surname>Zhu</surname><given-names>R.</given-names></name></person-group><article-title>A new protein-ligand binding sites prediction method based on the integration of protein sequence conservation information</article-title><source>BMC Bioinforma</source><year>2011</year><volume>12</volume><issue>Suppl 14</issue><fpage>S9</fpage></citation></ref>
<ref id="b5-ijms-13-08752"><label>5</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Levitt</surname><given-names>D.G.</given-names></name><name><surname>Banaszak</surname><given-names>L.J.</given-names></name></person-group><article-title>POCKET: A computer graphics method for identifying and displaying protein cavities and their surrounding amino acids</article-title><source>J. Mol. Graph</source><year>1992</year><volume>10</volume><fpage>229</fpage><lpage>234</lpage><pub-id pub-id-type="pmid">1476996</pub-id></citation></ref>
<ref id="b6-ijms-13-08752"><label>6</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hendlich</surname><given-names>M.</given-names></name><name><surname>Rippmann</surname><given-names>F.</given-names></name><name><surname>Barnickel</surname><given-names>G.</given-names></name></person-group><article-title>LIGSITE: Automatic and efficient detection of potential small molecule-binding sites in proteins</article-title><source>J. Mol. Graph. Model</source><year>1997</year><volume>15</volume><fpage>359</fpage><lpage>363</lpage><fpage>389</fpage></citation></ref>
<ref id="b7-ijms-13-08752"><label>7</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brady</surname><given-names>G.P.</given-names><suffix>Jr</suffix></name><name><surname>Stouten</surname><given-names>P.F.</given-names></name></person-group><article-title>Fast prediction and visualization of protein binding pockets with PASS</article-title><source>J. Comput. Aided Mol. Des</source><year>2000</year><volume>14</volume><fpage>383</fpage><lpage>401</lpage><pub-id pub-id-type="pmid">10815774</pub-id></citation></ref>
<ref id="b8-ijms-13-08752"><label>8</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laskowski</surname><given-names>R.A.</given-names></name></person-group><article-title>SURFNET: A program for visualizing molecular surfaces, cavities, and intermolecular interactions</article-title><source>J. Mol. Graph</source><year>1995</year><volume>13</volume><fpage>323</fpage><lpage>330</lpage><pub-id pub-id-type="pmid">8603061</pub-id></citation></ref>
<ref id="b9-ijms-13-08752"><label>9</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weisel</surname><given-names>M.</given-names></name><name><surname>Proschak</surname><given-names>E.</given-names></name><name><surname>Schneider</surname><given-names>G.</given-names></name></person-group><article-title>PocketPicker: Analysis of ligand binding-sites with shape descriptors</article-title><source>Chem. Cent. J</source><year>2007</year><volume>1</volume><fpage>7</fpage><pub-id pub-id-type="pmid">17880740</pub-id></citation></ref>
<ref id="b10-ijms-13-08752"><label>10</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laurie</surname><given-names>A.T.</given-names></name><name><surname>Jackson</surname><given-names>R.M.</given-names></name></person-group><article-title>Q-SiteFinder: An energy-based method for the prediction of protein-ligand binding sites</article-title><source>Bioinformatics</source><year>2005</year><volume>21</volume><fpage>1908</fpage><lpage>1916</lpage><pub-id pub-id-type="pmid">15701681</pub-id></citation></ref>
<ref id="b11-ijms-13-08752"><label>11</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname><given-names>Y.Q.</given-names></name><name><surname>Liang</surname><given-names>S.D.</given-names></name><name><surname>Zhang</surname><given-names>C.</given-names></name><name><surname>Liu</surname><given-names>S.</given-names></name></person-group><article-title>Protein binding site prediction using an empirical scoring function</article-title><source>Nucleic Acids Res</source><year>2006</year><volume>34</volume><fpage>3698</fpage><lpage>3707</lpage><pub-id pub-id-type="pmid">16893954</pub-id></citation></ref>
<ref id="b12-ijms-13-08752"><label>12</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sonavane</surname><given-names>S.</given-names></name><name><surname>Chakrabarti</surname><given-names>P.</given-names></name></person-group><article-title>Prediction of active site cleft using support vector machines</article-title><source>J. Chem. Inf. Model</source><year>2010</year><volume>50</volume><fpage>2266</fpage><lpage>2273</lpage><pub-id pub-id-type="pmid">21080689</pub-id></citation></ref>
<ref id="b13-ijms-13-08752"><label>13</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Manning</surname><given-names>J.R.</given-names></name><name><surname>Jefferson</surname><given-names>E.R.</given-names></name><name><surname>Barton</surname><given-names>G.J.</given-names></name></person-group><article-title>The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction</article-title><source>BMC Bioinforma</source><year>2008</year><volume>9</volume><fpage>51</fpage></citation></ref>
<ref id="b14-ijms-13-08752"><label>14</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caffrey</surname><given-names>D.R.</given-names></name><name><surname>Somaroo</surname><given-names>S.</given-names></name><name><surname>Hughes</surname><given-names>J.D.</given-names></name><name><surname>Mintseris</surname><given-names>J.</given-names></name><name><surname>Huang</surname><given-names>E.S.</given-names></name></person-group><article-title>Are protein-protein interfaces more conserved in sequence than the rest of the protein surface?</article-title><source>Protein Sci</source><year>2004</year><volume>13</volume><fpage>190</fpage><lpage>202</lpage><pub-id pub-id-type="pmid">14691234</pub-id></citation></ref>
<ref id="b15-ijms-13-08752"><label>15</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prymula</surname><given-names>K.</given-names></name><name><surname>Jadczyk</surname><given-names>T.</given-names></name><name><surname>Roterman</surname><given-names>I.</given-names></name></person-group><article-title>Catalytic residues in hydrolases: Analysis of methods designed for ligand-binding site prediction</article-title><source>J. Comput. Aided Mol. Des</source><year>2011</year><volume>25</volume><fpage>117</fpage><lpage>133</lpage><pub-id pub-id-type="pmid">21104192</pub-id></citation></ref>
<ref id="b16-ijms-13-08752"><label>16</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>B.</given-names></name><name><surname>Schroeder</surname><given-names>M.</given-names></name></person-group><article-title>LIGSITE<sup>csc</sup>: Predicting ligand binding sites using the Connolly surface and degree of conservation</article-title><source>BMC Struct. Biol</source><year>2006</year><volume>6</volume><fpage>19</fpage><pub-id pub-id-type="pmid">16995956</pub-id></citation></ref>
<ref id="b17-ijms-13-08752"><label>17</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>B.</given-names></name></person-group><article-title>MetaPocket: A meta approach to improve protein ligand binding site prediction</article-title><source>OMICS</source><year>2009</year><volume>13</volume><fpage>325</fpage><lpage>330</lpage><pub-id pub-id-type="pmid">19645590</pub-id></citation></ref>
<ref id="b18-ijms-13-08752"><label>18</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le Guilloux</surname><given-names>V.</given-names></name><name><surname>Schmidtke</surname><given-names>P.</given-names></name><name><surname>Tuffery</surname><given-names>P.</given-names></name></person-group><article-title>Fpocket: An open source platform for ligand pocket detection</article-title><source>BMC Bioinforma</source><year>2009</year><volume>10</volume><fpage>168</fpage></citation></ref>
<ref id="b19-ijms-13-08752"><label>19</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savir</surname><given-names>Y.</given-names></name><name><surname>Tlusty</surname><given-names>T.</given-names></name></person-group><article-title>Conformational proofreading: The impact of conformational changes on the specificity of molecular recognition</article-title><source>PLoS One</source><year>2007</year><volume>2</volume><fpage>e468</fpage><pub-id pub-id-type="pmid">17520027</pub-id></citation></ref>
<ref id="b20-ijms-13-08752"><label>20</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nayal</surname><given-names>M.</given-names></name><name><surname>Honig</surname><given-names>B.</given-names></name></person-group><article-title>On the nature of cavities on protein surfaces: Application to the identification of drug-binding sites</article-title><source>Proteins</source><year>2006</year><volume>63</volume><fpage>892</fpage><lpage>906</lpage><pub-id pub-id-type="pmid">16477622</pub-id></citation></ref>
<ref id="b21-ijms-13-08752"><label>21</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>An</surname><given-names>J.</given-names></name><name><surname>Totrov</surname><given-names>M.</given-names></name><name><surname>Abagyan</surname><given-names>R.</given-names></name></person-group><article-title>Comprehensive identification of “druggable” protein ligand binding sites</article-title><source>Genome Inform</source><year>2004</year><volume>15</volume><fpage>31</fpage><lpage>41</lpage><pub-id pub-id-type="pmid">15706489</pub-id></citation></ref>
<ref id="b22-ijms-13-08752"><label>22</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhong</surname><given-names>S.</given-names></name><name><surname>MacKerell</surname><given-names>A.D.</given-names><suffix>Jr</suffix></name></person-group><article-title>Binding response: A descriptor for selecting ligand binding site on protein surfaces</article-title><source>J. Chem. Inf. Model</source><year>2007</year><volume>47</volume><fpage>2303</fpage><lpage>2315</lpage><pub-id pub-id-type="pmid">17900106</pub-id></citation></ref>
<ref id="b23-ijms-13-08752"><label>23</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crennell</surname><given-names>S.J.</given-names></name><name><surname>Garman</surname><given-names>E.F.</given-names></name><name><surname>Philippon</surname><given-names>C.</given-names></name><name><surname>Vasella</surname><given-names>A.</given-names></name><name><surname>Laver</surname><given-names>W.G.</given-names></name><name><surname>Vimr</surname><given-names>E.R.</given-names></name><name><surname>Taylor</surname><given-names>G.L.</given-names></name></person-group><article-title>The structures of Salmonella typhimurium LT2 neuraminidase and its complexes with three inhibitors at high resolution</article-title><source>J. Mol. Biol</source><year>1996</year><volume>259</volume><fpage>264</fpage><lpage>280</lpage><pub-id pub-id-type="pmid">8656428</pub-id></citation></ref>
<ref id="b24-ijms-13-08752"><label>24</label><citation citation-type="web"><source>Jmol: An open-source Java viewer for chemical structures in 3D</source><comment>Available online: <ext-link xlink:href="http://www.jmol.org/" ext-link-type="uri">http://www.jmol.org/</ext-link></comment><access-date>accessed on 29 March 2012</access-date></citation></ref>
<ref id="b25-ijms-13-08752"><label>25</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barber</surname><given-names>C.B.</given-names></name><name><surname>Dobkin</surname><given-names>D.P.</given-names></name><name><surname>Huhdanpaa</surname><given-names>H.</given-names></name></person-group><article-title>The Quickhull algorithm for convex hulls</article-title><source>ACM Trans. Math. Softw</source><year>1996</year><volume>22</volume><fpage>469</fpage><lpage>483</lpage></citation></ref>
<ref id="b26-ijms-13-08752"><label>26</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldenberg</surname><given-names>O.</given-names></name><name><surname>Erez</surname><given-names>E.</given-names></name><name><surname>Nimrod</surname><given-names>G.</given-names></name><name><surname>Ben-Tal</surname><given-names>N.</given-names></name></person-group><article-title>The ConSurf-DB: Pre-calculated evolutionary conservation profiles of protein structures</article-title><source>Nucleic Acids Res</source><year>2009</year><volume>37</volume><fpage>D323</fpage><lpage>D327</lpage><pub-id pub-id-type="pmid">18971256</pub-id></citation></ref>
<ref id="b27-ijms-13-08752"><label>27</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>L.</given-names></name><name><surname>Berne</surname><given-names>B.J.</given-names></name><name><surname>Friesner</surname><given-names>R.A.</given-names></name></person-group><article-title>Ligand binding to protein-binding pockets with wet and dry regions</article-title><source>Proc. Natl. Acad. Sci. USA</source><year>2011</year><volume>108</volume><fpage>1326</fpage><lpage>1330</lpage><pub-id pub-id-type="pmid">21205906</pub-id></citation></ref>
<ref id="b28-ijms-13-08752"><label>28</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guharoy</surname><given-names>M.</given-names></name><name><surname>Chakrabarti</surname><given-names>P.</given-names></name></person-group><article-title>Conserved residue clusters at protein-protein interfaces and their use in binding site identification</article-title><source>BMC Bioinforma</source><year>2010</year><volume>11</volume><fpage>286</fpage></citation></ref>
<ref id="b29-ijms-13-08752"><label>29</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kang</surname><given-names>H.</given-names></name><name><surname>Sheng</surname><given-names>Z.</given-names></name><name><surname>Zhu</surname><given-names>R.</given-names></name><name><surname>Huang</surname><given-names>Q.</given-names></name><name><surname>Liu</surname><given-names>Q.</given-names></name><name><surname>Cao</surname><given-names>Z.</given-names></name></person-group><article-title>Virtual drug screen schema based on multiview similarity integration and ranking aggregation</article-title><source>J. Chem. Inf. Model</source><year>2012</year><volume>52</volume><fpage>834</fpage><lpage>843</lpage><pub-id pub-id-type="pmid">22332590</pub-id></citation></ref>
<ref id="b30-ijms-13-08752"><label>30</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fagin</surname><given-names>R.</given-names></name><name><surname>Kumar</surname><given-names>R.</given-names></name><name><surname>Sivakumar</surname><given-names>D.</given-names></name></person-group><article-title>Comparing top k lists</article-title><source>SIAM J. Discret. Math</source><year>2003</year><volume>17</volume><fpage>134</fpage><lpage>160</lpage></citation></ref>
<ref id="b31-ijms-13-08752"><label>31</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pihur</surname><given-names>V.</given-names></name><name><surname>Datta</surname><given-names>S.</given-names></name></person-group><article-title>Weighted rank aggregation of cluster validation measures: A Monte Carlo cross-entropy approach</article-title><source>Bioinformatics</source><year>2007</year><volume>23</volume><fpage>1607</fpage><lpage>1615</lpage><pub-id pub-id-type="pmid">17483500</pub-id></citation></ref>
<ref id="b32-ijms-13-08752"><label>32</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Matthews</surname><given-names>B.W.</given-names></name></person-group><article-title>Comparison of the predicted and observed secondary structure of T4 phage lysozyme</article-title><source>Biochim. Biophys. Acta</source><year>1975</year><volume>405</volume><fpage>442</fpage><lpage>451</lpage><pub-id pub-id-type="pmid">1180967</pub-id></citation></ref>
<ref id="b33-ijms-13-08752"><label>33</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holland</surname><given-names>R.C.</given-names></name><name><surname>Down</surname><given-names>T.A.</given-names></name><name><surname>Pocock</surname><given-names>M.</given-names></name><name><surname>Prlic</surname><given-names>A.</given-names></name><name><surname>Huen</surname><given-names>D.</given-names></name><name><surname>James</surname><given-names>K.</given-names></name><name><surname>Foisy</surname><given-names>S.</given-names></name><name><surname>Drager</surname><given-names>A.</given-names></name><name><surname>Yates</surname><given-names>A.</given-names></name><name><surname>Heuer</surname><given-names>M.</given-names></name><etal/></person-group><article-title>BioJava: An open-source framework for bioinformatics</article-title><source>Bioinformatics</source><year>2008</year><volume>24</volume><fpage>2096</fpage><lpage>2097</lpage><pub-id pub-id-type="pmid">18689808</pub-id></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig id="f1-ijms-13-08752" position="float">
<label>Figure 1</label>
<caption>
<p>The surface position of Pocket 9 in the protein structure (PDB ID: 2SIM). Red points: water molecules; light blue: the whole protein; golden: the ligand; purple: the amino acids constituting the predicted binding site.</p></caption>
<graphic xlink:href="ijms-13-08752f1.gif"/></fig>
<fig id="f2-ijms-13-08752" position="float">
<label>Figure 2</label>
<caption>
<p>The concept of multi-view ranking aggregation.</p></caption>
<graphic xlink:href="ijms-13-08752f2.gif"/></fig>
<table-wrap id="t1-ijms-13-08752" position="float">
<label>Table 1</label>
<caption>
<p>Prediction success rates of the different ranking methods.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle"/>
<th colspan="3" align="center" valign="middle">Bound</th>
<th colspan="3" align="center" valign="middle">Unbound/bound</th></tr>
<tr>
<th align="left" valign="middle"/>
<th colspan="3" align="left" valign="middle">
<hr/></th>
<th colspan="3" align="left" valign="middle">
<hr/></th></tr>
<tr>
<th align="center" valign="middle">Methods</th>
<th align="center" valign="middle">TOP1</th>
<th align="center" valign="middle">MCC for TOP1</th>
<th align="center" valign="middle">TOP3</th>
<th align="center" valign="middle">TOP1</th>
<th align="center" valign="middle">MCC for TOP1</th>
<th align="center" valign="middle">TOP3</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">Conservation score</td>
<td align="center" valign="top">59%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">73%</td>
<td align="center" valign="top">57%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">72%</td></tr>
<tr>
<td align="left" valign="top">Distance</td>
<td align="center" valign="top">48%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">66%</td>
<td align="center" valign="top">56%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">70%</td></tr>
<tr>
<td align="left" valign="top">Volume</td>
<td align="center" valign="top">47%</td>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">69%</td>
<td align="center" valign="top">44%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">59%</td></tr>
<tr>
<td align="left" valign="top">Hydrophobic</td>
<td align="center" valign="top">39%</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">62%</td>
<td align="center" valign="top">30%</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">48%</td></tr>
<tr>
<td align="left" valign="top">SURFNET (Control)</td>
<td align="center" valign="top">42%</td>
<td align="center" valign="top">~</td>
<td align="center" valign="top">57%</td>
<td align="center" valign="top">~</td>
<td align="center" valign="top">~</td>
<td align="center" valign="top">~</td></tr></tbody></table></table-wrap>
<table-wrap id="t2-ijms-13-08752" position="float">
<label>Table 2</label>
<caption>
<p>Prediction success rate of ranking aggregation.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle"/>
<th colspan="3" align="center" valign="middle">Bound</th>
<th colspan="3" align="center" valign="middle">Unbound/bound</th></tr>
<tr>
<th align="center" valign="middle"/>
<th colspan="3" align="left" valign="middle">
<hr/></th>
<th colspan="3" align="left" valign="middle">
<hr/></th></tr>
<tr>
<th align="center" valign="middle">Methods</th>
<th align="center" valign="middle">TOP1</th>
<th align="center" valign="middle">MCC <xref ref-type="table-fn" rid="tfn1-ijms-13-08752">*</xref> for TOP1</th>
<th align="center" valign="middle">TOP3</th>
<th align="center" valign="middle">TOP1</th>
<th align="center" valign="middle">MCC for TOP1</th>
<th align="center" valign="middle">TOP3</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">CON + DIS</td>
<td align="center" valign="top">57%</td>
<td align="center" valign="top">0.52</td>
<td align="center" valign="top">74%</td>
<td align="center" valign="top">61%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">74%</td></tr>
<tr>
<td align="left" valign="top">VOL + DIS</td>
<td align="center" valign="top">52%</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">73%</td>
<td align="center" valign="top">54%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">74%</td></tr>
<tr>
<td align="left" valign="top">CON + VOL</td>
<td align="center" valign="top">52%</td>
<td align="center" valign="top">0.52</td>
<td align="center" valign="top">72%</td>
<td align="center" valign="top">48%</td>
<td align="center" valign="top">0.54</td>
<td align="center" valign="top">65%</td></tr>
<tr>
<td align="left" valign="top">VOL + HYDRO</td>
<td align="center" valign="top">46%</td>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">67%</td>
<td align="center" valign="top">39%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">61%</td></tr>
<tr>
<td align="left" valign="top">DIS + HYDRO</td>
<td align="center" valign="top">47%</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">68%</td>
<td align="center" valign="top">44%</td>
<td align="center" valign="top">0.49</td>
<td align="center" valign="top">63%</td></tr>
<tr>
<td align="left" valign="top">CON + HYDRO</td>
<td align="center" valign="top">53%</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">70%</td>
<td align="center" valign="top">39%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">61%</td></tr>
<tr>
<td align="left" valign="top">DIS + CON + HYDRO</td>
<td align="center" valign="top">53%</td>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">72%</td>
<td align="center" valign="top">48%</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">67%</td></tr>
<tr>
<td align="left" valign="top">VOL + CON + HYDRO</td>
<td align="center" valign="top">51%</td>
<td align="center" valign="top">0.52</td>
<td align="center" valign="top">71%</td>
<td align="center" valign="top">41%</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">63%</td></tr>
<tr>
<td align="left" valign="top">VOL + DIS + HYDRO</td>
<td align="center" valign="top">50%</td>
<td align="center" valign="top">0.52</td>
<td align="center" valign="top">71%</td>
<td align="center" valign="top">46%</td>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">67%</td></tr>
<tr>
<td align="left" valign="top">VOL + DIS + CON</td>
<td align="center" valign="top">54%</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">73%</td>
<td align="center" valign="top">52%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">74%</td></tr>
<tr>
<td align="left" valign="top">VOL + DIS + CON + HYDRO</td>
<td align="center" valign="top">53%</td>
<td align="center" valign="top">0.52</td>
<td align="center" valign="top">72%</td>
<td align="center" valign="top">48%</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">67%</td></tr></tbody></table>
<table-wrap-foot><fn id="tfn1-ijms-13-08752">
<label>*</label>
<p>A one-sided Wilcoxon signed-rank test was applied to the Matthews Correlation Coefficient (MCC) scores of each protein. The <italic>p</italic> values for the comparison of different methods are listed in the Supporting Information (Table S1 for the bound test set, Table S2 for the unbound/bound test set).</p></fn></table-wrap-foot></table-wrap>
<table-wrap id="t3-ijms-13-08752" position="float">
<label>Table 3</label>
<caption>
<p>Partial results obtained with different ranking methods: volume (VOL), distance of the presumed binding sites from the protein centroid (DIS), rank aggregation (REG) of VOL and DIS, and conservation score (CONS).</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="bottom">Rank</th>
<th align="center" valign="bottom">VOL</th>
<th align="center" valign="bottom">DIS</th>
<th align="center" valign="bottom">REG</th>
<th align="center" valign="bottom">CONS</th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">1</td>
<td align="center" valign="top">Pocket 0</td>
<td align="center" valign="top">Pocket 12</td>
<td align="center" valign="top"><xref ref-type="table-fn" rid="tfn2-ijms-13-08752">*</xref><bold>Pocket 9</bold></td>
<td align="center" valign="top"><xref ref-type="table-fn" rid="tfn2-ijms-13-08752">*</xref><bold>Pocket 9</bold></td></tr>
<tr>
<td align="center" valign="top">2</td>
<td align="center" valign="top"><xref ref-type="table-fn" rid="tfn2-ijms-13-08752">*</xref><bold>Pocket 9</bold></td>
<td align="center" valign="top"><xref ref-type="table-fn" rid="tfn2-ijms-13-08752">*</xref><bold>Pocket 9</bold></td>
<td align="center" valign="top">Pocket 0</td>
<td align="center" valign="top">Pocket 5</td></tr>
<tr>
<td align="center" valign="top">3</td>
<td align="center" valign="top">Pocket 5</td>
<td align="center" valign="top">Pocket 0</td>
<td align="center" valign="top">Pocket 10</td>
<td align="center" valign="top">Pocket 0</td></tr>
<tr>
<td align="center" valign="top">4</td>
<td align="center" valign="top">Pocket 10</td>
<td align="center" valign="top">Pocket 7</td>
<td align="center" valign="top">Pocket 12</td>
<td align="center" valign="top">Pocket 2</td></tr></tbody></table>
<table-wrap-foot><fn id="tfn2-ijms-13-08752">
<label>*</label>
<p>Pocket 9 corresponds to the observed binding site.</p></fn></table-wrap-foot></table-wrap></sec></back></article>
