These authors contributed equally to this work.

This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Molecular docking is an important method for studying protein-protein interaction and recognition. A protein can be treated as a network whose nodes are its residues. With the contact energy between residues as the link weight, a weighted residue network is constructed in this paper. Two weighted parameters (strength and weighted average nearest neighbors’ degree) are introduced into this model. The strength characterizes the stability of a protein, and the weighted average nearest neighbors’ degree reflects the global topological properties of the protein-protein complex. Based on this weighted network model and these two parameters, a new docking scoring function is proposed. The bound and unbound docking results of 42 systems are scored and ranked with this new scoring function. Compared with the pair potentials scoring function, the new scoring function performs similarly on several indicators and achieves a better success rate. The new scoring function is easy to calculate, and its scoring and ranking results are acceptable. This work can help us better understand the mechanisms of protein-protein interaction and recognition.

Protein-protein docking is an important method for studying protein-protein interaction and molecular recognition [

From the viewpoint of complex networks [

In the residue network model, each residue is simplified to a single point, which serves as a node of the network. Links are set according to the distances between these points: if the distance between two nodes is less than a cut-off value, there is a link between them [

In our previous work, we used the residue network for protein-protein docking scoring [

When we consider the contact energy between residues, we can get some interesting results [

From the data set of protein-protein docking benchmark 2.0 [

In the docking benchmark 2.0, there are four types of test cases: enzyme-inhibitor, antibody-antigen, other and difficult test cases. For the antibody-antigen complexes, there are some complementarity-determining regions, and the binding modes in the antibody-antigen complexes are relatively fixed. So we did not include the antibody-antigen complexes in our test.

From the remaining test cases, only the single-chain monomer structures for ligand and receptor were selected for docking and residue network analysis. These 42 complexes can be classified into two groups: the ‘Enzyme-Inhibitor’ group contains 18 complex structures, and the ‘others’ group contains 24 structures. The size of these protein complexes ranges from 185 to 1100. With the RosettaDock 1.0 program [

In the residue network, the geometrical center of each amino acid’s side chain is chosen as the network node. The link between a pair of nodes is determined by the distance between them. If the distance $d_{ij}$ between residues $i$ and $j$ is less than the cut-off value $R_{c}$, there is a link between the two nodes.

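Based on the contact energies between residues, the weighted network can be constructed, and its adjacency matrix element can be expressed as:

$$w_{ij} = \begin{cases} |e_{ij}|, & d_{ij} < R_{c} \\ 0, & \text{otherwise} \end{cases}$$

where $e_{ij}$ is the contact energy between residues $i$ and $j$, $d_{ij}$ is the distance between them, and $R_{c}$ is the cut-off value.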

All the contact energies between residues used in this work are negative. It is therefore reasonable to use the absolute value of the contact energy as the link weight, and these absolute values can be added and subtracted directly.
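The construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `weighted_adjacency`, the integer residue-type indexing, the toy energy table and the cut-off of 6.5 are all hypothetical, and the sketch does not apply any special treatment to covalently bonded neighbors.

```python
import numpy as np

def weighted_adjacency(coords, residue_types, contact_energy, cutoff):
    """Return the weighted adjacency matrix of a residue network.

    coords         : (N, 3) array of side-chain geometric centers
    residue_types  : length-N sequence of integer residue-type indices
    contact_energy : (T, T) symmetric table of negative contact energies
    cutoff         : distance cut-off for a link (the value is an assumption)
    """
    coords = np.asarray(coords, dtype=float)
    types = np.asarray(residue_types, dtype=int)
    # Pairwise Euclidean distances between all nodes.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # A link exists when two distinct nodes are closer than the cut-off.
    linked = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    # The link weight is the absolute value of the contact energy.
    table = np.asarray(contact_energy, dtype=float)
    energy = table[types[:, None], types[None, :]]
    return np.where(linked, np.abs(energy), 0.0)

# Toy example: three residues of two hypothetical types.
toy_coords = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
toy_types = [0, 1, 0]
toy_energy = [[-1.0, -2.5], [-2.5, -0.5]]  # placeholder values, not a published table
W = weighted_adjacency(toy_coords, toy_types, toy_energy, cutoff=6.5)
```

In the toy example only the first two residues are within the cut-off, so the single link carries the weight 2.5 and the third residue stays isolated.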

For the covalent bond between residues

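Additionally, a new network parameter, the strength (denoted $S$), is introduced into the weighted residue network. The strength of node $i$ is defined as:

$$S_{i} = \sum_{j} a_{ij} w_{ij}$$

where $a_{ij}$ is the element of the unweighted adjacency matrix (1 if nodes $i$ and $j$ are linked, 0 otherwise), $w_{ij}$ is the weight of the link between nodes $i$ and $j$, and $S_{i}$ is the strength of node $i$.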

For the contact energy between residues suggested by Miyazawa

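In the unweighted network model, the average nearest neighbors degree $k_{nn,i}$ of node $i$ is defined as:

$$k_{nn,i} = \frac{1}{k_{i}} \sum_{j} a_{ij} k_{j}$$

where $a_{ij}$ is the adjacency matrix element and $k_{i}$ is the degree of node $i$. In the weighted network, the weighted average nearest neighbors degree $k_{nn,i}^{w}$ of node $i$ is defined as:

$$k_{nn,i}^{w} = \frac{1}{S_{i}} \sum_{j} a_{ij} w_{ij} k_{j}$$

where $S_{i}$ is the strength of node $i$, $w_{ij}$ is the weight of the link between nodes $i$ and $j$, and $a_{ij}$ and $k_{i}$ have the same meaning as above. The weighted average nearest neighbors degree of the whole network, $\langle k_{nn}^{w} \rangle$, is:

$$\langle k_{nn}^{w} \rangle = \frac{1}{N} \sum_{i} k_{nn,i}^{w}$$

where $k_{nn,i}^{w}$ is the weighted average nearest neighbors degree of node $i$ and $N$ is the number of nodes in the network.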

The weighted nearest neighbors degree of a given node measures the effective affinity of this node to connect with high-degree or low-degree neighbors, according to the magnitude of the actual interactions. The weighted average nearest neighbors degree of the whole network measures the weighted assortative or disassortative properties of the whole weighted network, again with the weights of the actual interactions among nodes as the reference. This parameter can be used to evaluate the connection mode between nodes with different degrees.
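The two weighted parameters can be computed from a weighted adjacency matrix as in the sketch below. It assumes the commonly used definitions, in which the strength of a node is the sum of its link weights and the weighted nearest neighbors degree is the weight-averaged degree of its neighbors; the function name `strength_and_knn` and the toy matrix are hypothetical, and isolated nodes are assigned a value of zero by convention here.

```python
import numpy as np

def strength_and_knn(W):
    """Return node strengths, weighted nearest neighbors degrees and their network mean.

    W : (N, N) symmetric matrix of non-negative link weights (0 means no link).
    """
    W = np.asarray(W, dtype=float)
    A = (W > 0).astype(float)      # unweighted adjacency a_ij
    k = A.sum(axis=1)              # degree k_i of each node
    s = W.sum(axis=1)              # strength S_i of each node
    # Weighted nearest neighbors degree: (1 / S_i) * sum_j w_ij * k_j.
    numerator = W @ k
    knn_w = np.divide(numerator, s, out=np.zeros_like(s), where=s > 0)
    # Network-level value: plain average over all nodes.
    return s, knn_w, float(knn_w.mean())

# Toy network: node 0 is linked to node 1 (weight 2) and node 2 (weight 1).
toy_W = [[0.0, 2.0, 1.0],
         [2.0, 0.0, 0.0],
         [1.0, 0.0, 0.0]]
s, knn_w, knn_w_mean = strength_and_knn(toy_W)
```

For the toy network the strengths are 3, 2 and 1, and the weighted nearest neighbors degrees are 1, 2 and 2, because nodes 1 and 2 each see only the degree-2 hub.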

When we get the docking results, we superimpose the receptors of the decoy onto the native structure, so the RMSD of the ligand (L_RMSD) over its backbone atoms (_{α}

For these decoy structures, we analyze two parameters of the weighted residue network: the strength ($S$) and the weighted average nearest neighbors’ degree ($\langle k_{nn}^{w} \rangle$).

From these two figures, we can find that _{nn}^{w}

For the parameter _{nn}^{w}

So, if we use a linear combination of these two parameters, the relative size of the two terms will change dramatically with the system size. For a larger system, the whole strength of the network will contribute more than $\langle k_{nn}^{w} \rangle$.

For all 42 systems, we performed both bound and unbound docking calculations. To assess the quality of this scoring function, we use four indicators of its ability to discriminate among docking decoys: the correlation coefficient between the scoring values and L_RMSD, the L_RMSD of the first rank, the rank of the first hit, and the number of hits in the top 10 scores. All four indicators are commonly used in the evaluation of other docking scoring functions. We select the pair potentials (RP) scoring function [

With the 1udi system as an illustration,

The correlation coefficient between the scoring values and L_RMSD reflects the scoring results as a whole, and a higher correlation coefficient is preferable. Through the comparison between

For the unbound decoys of all 42 systems, the comparison results on this indicator between Sn and RP are shown in

The number of hits in the top 10 scores measures the power of the scoring function to pick out hits among its 10 best-scored decoys. This is the most indicative of the four indicators, so it is commonly used to measure the ‘success rate’ in docking. The more hits that are picked out, the better the scoring function is considered to be.

There are 30 systems on which neither Sn nor RP picks out a hit in the top 10 scores. Of the remaining 12 systems, there are six on which Sn gets more hits in the top 10 scores than RP, and one on which Sn gets the same number of hits as RP. All the comparisons regarding the number of hits are shown in

The rank of the first hit reports the rank position of the best decoy in the scoring sequence. If the rank value is 1,

The RMSD of 1st rank reflects the quality of the first decoy in the rank sequence. The smaller the RMSD of the first ranked decoy, the better the scoring function performed. The comparison result on this indicator between Sn and RP on 42 systems is shown in

If the RMSD of a first-ranked structure is larger than 10 Angstroms, that structure should be regarded as wrong. Consequently, there is no sense in comparing such structures.
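The four per-system indicators discussed above can be sketched for a single set of decoys as follows. Several points are assumptions rather than statements from the text: the function name `docking_indicators` is hypothetical, the correlation is taken to be the Pearson coefficient, a lower score is taken to mean a better decoy, and the L_RMSD threshold that defines a hit is left as the free parameter `hit_cutoff`.

```python
import numpy as np

def docking_indicators(scores, l_rmsd, hit_cutoff, top_n=10, lower_is_better=True):
    """Compute the four indicators for one system's decoy set."""
    scores = np.asarray(scores, dtype=float)
    l_rmsd = np.asarray(l_rmsd, dtype=float)
    # Pearson correlation between scoring values and L_RMSD (assumed type).
    corr = float(np.corrcoef(scores, l_rmsd)[0, 1])
    # Rank decoys from best to worst score.
    key = scores if lower_is_better else -scores
    order = np.argsort(key, kind="stable")
    ranked_rmsd = l_rmsd[order]
    # A hit is a decoy whose L_RMSD does not exceed the chosen threshold.
    is_hit = ranked_rmsd <= hit_cutoff
    first_hit_rank = int(np.argmax(is_hit)) + 1 if is_hit.any() else None
    return {
        "correlation": corr,
        "rmsd_of_first_rank": float(ranked_rmsd[0]),
        "rank_of_first_hit": first_hit_rank,
        "hits_in_top_n": int(is_hit[:top_n].sum()),
    }

# Toy decoy set with made-up scores and L_RMSD values.
toy_scores = [3.0, 1.0, 2.0, 4.0]
toy_rmsd = [12.0, 2.0, 8.0, 20.0]
result = docking_indicators(toy_scores, toy_rmsd, hit_cutoff=5.0, top_n=2)
```

In the toy set the best-scored decoy is also the only hit, so the rank of the first hit is 1 and one hit falls within the top 2.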

There are 19 complexes with a low L_RMSD (less than 10 Angstroms) in

From a global perspective, we carried out a performance evaluation of the Sn scoring function. As in related work, the success rate of a scoring function is used to measure its average ability to rank a near-native structure within a given number of predictions (NP).

If at least one near-native structure is found among the top NP decoys of the ranked queue, the test case is defined as successful under NP. The success rate is the percentage of successful test cases in the data set. Varying NP gives the success rate curve.
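The success rate curve defined above can be sketched as follows. The input is assumed to be the rank of the first near-native decoy for each test case (or `None` when a case has none); the function name `success_rate_curve` and the toy ranks are hypothetical.

```python
def success_rate_curve(first_hit_ranks, max_np):
    """Return the success rate (in percent) for NP = 1 .. max_np.

    first_hit_ranks : one entry per test case, the 1-based rank of the first
                      near-native decoy, or None if the case has no such decoy.
    """
    n_cases = len(first_hit_ranks)
    curve = []
    for n_pred in range(1, max_np + 1):
        # A case succeeds under NP when its first hit lies within the top NP.
        successes = sum(
            1 for rank in first_hit_ranks if rank is not None and rank <= n_pred
        )
        curve.append(100.0 * successes / n_cases)
    return curve

# Toy data set of four test cases.
toy_ranks = [1, 3, None, 2]
curve = success_rate_curve(toy_ranks, max_np=3)
```

For the four toy cases the curve rises from 25% at NP = 1 to 75% at NP = 3; the case without any near-native decoy can never count as a success.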

We calculated the success rate for the Sn scoring function and the RP scoring function. For comparison, we also performed a random ranking 100 times and calculated its average success rate. For the unbound decoys, the success rates of Sn, RP and random scoring are shown in

From

From these comparison results as a whole, we can conclude that the Sn scoring function has a discriminative ability similar to that of the RP scoring function for unbound docking decoys.

For the bound docking results, we made the same four comparisons. For the correlation coefficient between the scoring values and L_RMSD, there are 22 systems on which Sn performs better than the RP scoring function. For the number of hits in the top 10 scores, there are 18 systems on which Sn picks out more hits, and 15 systems on which the two scoring functions get the same number of hits, so Sn shows stronger discrimination than RP on this indicator. For the rank of the first hit, Sn ranks the first hit ahead of RP on 19 systems, and the rank position of the first hit is the same for both on 13 systems. For the RMSD of the first rank, there are 29 complexes with a small RMSD (less than 10 Angstroms). Of these 29 systems, Sn gets a smaller RMSD than RP on 16 and a higher RMSD on 13.

For the bound docking decoys, we also compared the success rates of the Sn scoring function, the RP scoring function and a random rank. The result is shown in

From

Because a bound docking run produces more hits than an unbound docking run, a random rank of a set of bound decoys will perform better than a random rank of a set of unbound docking decoys.
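The effect of the hit count on a random rank can be made concrete with a short calculation. For a uniformly random ordering, the chance that at least one hit lands in the top NP positions follows the hypergeometric distribution; the sketch below evaluates it. The function name `random_success_probability` and the decoy and hit counts are hypothetical and are not taken from the docking runs.

```python
from math import comb

def random_success_probability(n_decoys, n_hits, n_pred):
    """Probability that a random ranking places at least one hit in the top n_pred."""
    if not 0 <= n_hits <= n_decoys or not 0 <= n_pred <= n_decoys:
        raise ValueError("counts must satisfy 0 <= n_hits, n_pred <= n_decoys")
    # Complement of drawing n_pred decoys that are all non-hits.
    return 1.0 - comb(n_decoys - n_hits, n_pred) / comb(n_decoys, n_pred)

# Hypothetical decoy sets of equal size with few hits and many hits.
p_few_hits = random_success_probability(n_decoys=1000, n_hits=5, n_pred=10)
p_many_hits = random_success_probability(n_decoys=1000, n_hits=50, n_pred=10)
```

With the same number of decoys and the same NP, the set containing more hits always gives the random rank a higher chance of success, which is why the random baseline differs between bound and unbound decoy sets.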

From the comparisons above, we can conclude that Sn also has a discriminative ability similar to that of the RP scoring function in scoring and ranking the bound docking decoys.

The same scoring function performs differently on the different groups. The results of Sn on the Enzyme-Inhibitor group are better than those on the ‘others’ group. The highest correlation coefficient between the scoring values and L_RMSD is 0.7, obtained for the 1udi system in the Enzyme-Inhibitor group. The main reason is that the ‘others’ complexes are difficult structures for docking. These complexes often play an important role in signal transduction or in synergistic effects in the organism, and they have the essential characteristics of drug targets. This type of complex has great theoretical research value and potential applications. However, if the conformational change upon binding is large, both the sampling and the scoring in the docking process become difficult.

Based on the weighted residue network, we proposed a new docking scoring function, Sn. With this scoring function, we scored and ranked the bound and unbound docking results of 42 systems. Compared with the results obtained from the RP scoring function, the Sn scoring function performs similarly to RP on the four indicators. On some systems, or on some indicators, the new scoring function performs better. In terms of success rate, Sn performs better than RP, so we conclude that Sn has a higher power than RP to pick out hits.

Compared with other types of scoring function, the advantage of this new scoring function is the simplicity and clarity of its calculation. It does not require heavy computation, yet its scoring and ranking results are acceptable.

Furthermore, with the weighted residue network model, the global topological characteristics of the protein-protein complex are considered in this scoring function. The details of the interactions between residues, including the interaction modes and the interaction strength, are taken into account in its calculation. This helps to explain the structural mechanisms of protein-protein interactions.

In this work, we only test this new scoring function with 42 single-chain monomer structures. Actually, it can be used to evaluate multi-chain protein complexes.

This scoring function can be used as one term of a combined scoring function, and the related work is under way. There are some new scoring methods, and some related works have been a great inspiration to our work [

This work was supported by National Natural Science Foundation of China (Grant No. 31070828 and 11047183), Natural Science Foundation of Shanxi (Grant No. 2009021018-2) and China Postdoctoral Science Foundation funded project (Grant No. 20100471587). An earlier version of this paper was presented at the International Conference on BMEI 2011.

The relationship between the whole strength of the network (S) and the L_RMSD (The 1udi was taken as an illustration).

The relationship between the weighted average nearest neighbors’ degree and the L_RMSD (The 1udi was taken as an illustration).

The correlation between the L_RMSD and the Sn scoring function (The 1udi was taken as an illustration).

The correlation between the L_RMSD and the RP scoring function (The 1udi was taken as an illustration).

For the unbound dock results, the comparison of the correlation between the score value and the L_RMSD for Sn and RP score.

For the unbound dock results, the comparison of the numbers of the hit structures of the Sn and RP score function.

For the unbound dock results, the comparison of the rank of the first hit structure of the Sn and RP score function.

For the unbound dock results, the comparison of the RMSD of the first rank structures obtained from the Sn and the RP score function.

For the unbound dock results, the comparison of the success rate between the Sn scoring function, RP scoring function and a random rank.

For the bound dock results, the comparison of the success rate between the Sn scoring function, RP scoring function and a random rank.