
*Pharmaceuticals* **2012**, *5*(10), 1064-1079; doi:10.3390/ph5101064


## Abstract

We have developed a method for estimating protein-ligand binding free energy (ΔG) based on the direct protein-ligand interaction obtained by a molecular dynamics simulation. Using this method, we estimated the ΔG value statistically by the average values of the van der Waals and electrostatic interactions between each amino acid of the target protein and the ligand molecule. In addition, we introduced fluctuations in the accessible surface area (ASA) and dihedral angles of the protein-ligand complex system as the entropy terms of the ΔG estimation. The present method included the fluctuation term of structural change of the protein and the effective dielectric constant. We applied this method to 34 protein-ligand complex structures. As a result, the correlation coefficient between the experimental and calculated ΔG values was 0.81, and the average error of ΔG was 1.2 kcal/mol with the use of the fixed parameters. These results were obtained from a 2 nsec molecular dynamics simulation.

## 1. Introduction

The protein-ligand binding free energy (ΔG) has been calculated by various computational methods. Many protein-ligand docking programs have been developed to estimate ΔG [1,2,3,4,5,6,7], but the existing docking software remains relatively inaccurate [1,2]: the success rate of reproducing a protein-ligand complex structure within a root mean square deviation (RMSD) of <2 Å is only about 50% [6,7], and the error of ΔG estimation remains at approximately 2–3 kcal/mol [6,7,8,9,10].

There have been several reports on protein-compound docking and free energy calculation by molecular dynamics (MD) simulation. Even if a protein-ligand complex structure is unknown, ab initio MD docking simulations show protein-ligand complex structures and free energy landscapes [11,12,13,14]. Generalized ensemble methods have been adopted for wide conformational searches [15,16,17,18].

In an explicit water model, if a protein-ligand complex structure is known, the binding free energy and the potential of mean force (PMF) along the dissociation path can be obtained by using the filling potential (FP) method [18], the meta-dynamics method [19,20], the smooth reaction path generation (SRPG) method [21], or Jarzynski’s method [22]. We previously proposed the FP and SRPG methods [18,21], each of which generates a reaction path (dissociation path) of the ligand and calculates the free energy surface along the path based on ab initio MD simulation. The other trend is the application of Jarzynski’s equation [22]. In this method, a harmonic potential that restrains the ligand at a particular position moves slowly and leads the ligand from the binding state to the dissociation state, and the free energy profile is calculated. Among these methods, MP-CAFFE has been applied to various species, and the ΔG estimation error was almost 1 kcal/mol [23].

The molecular-mechanics Poisson-Boltzmann surface-area (MMPBSA) method [24] and the linear interaction energy (LIE) method [25] have successfully been used to reproduce the trend of ΔGs for a single target protein. These methods are much faster than the ab initio MD methods described above. In the LIE method, ΔG is evaluated based on the average van der Waals (vdW) energy and the average electrostatic energy. The weight parameters of the vdW and electrostatic terms are optimized for each target. To apply the LIE method, multiple active compounds and their docking poses are necessary in order to optimize the parameters for each target protein.

The COMBINE method is based on the assumption that biological activities can be correlated with a linear combination of a subset of the van der Waals and electrostatic terms of the interaction energies between a ligand and its surrounding protein residues (such as the target receptor) [26,27]. The protein-ligand binding free energy ΔG is given by:

$$\Delta G = \sum_{i} \left( w_{i}^{vdw} E_{i}^{vdw} + w_{i}^{ele} E_{i}^{ele} \right) + c \qquad (1)$$

where E_{i}^{vdw} and E_{i}^{ele} are the van der Waals and electrostatic terms of the interaction energies, respectively, between the ligand and the i-th residue of a protein (the target protein) and c is a constant. w_{i}^{vdw} and w_{i}^{ele} are parameters to be determined to reproduce the experimental data.

The coefficients of Equation (1) (w_{i}^{vdw} and w_{i}^{ele}) could be determined by partial least squares (PLS) analysis. As is the case with the LIE, to apply the COMBINE method, multiple active compounds and their docking poses are necessary in order to optimize the parameters for each target protein. It has been shown that the COMBINE analysis predicts binding free energies with good accuracy and also identifies important amino acid residues for the improvement of affinity [26,28,29,30,31,32].
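As a rough illustration of the linear model described above, the following sketch evaluates a COMBINE-style ΔG from per-residue energies and fits the weights on synthetic data. The function names (`combine_dg`, `fit_weights`) and all numbers are hypothetical. Ordinary least squares via `numpy.linalg.lstsq` is used here as a simple stand-in for the PLS analysis mentioned in the text; it is not the procedure of the cited studies.

```python
import numpy as np

def combine_dg(e_vdw, e_ele, w_vdw, w_ele, c):
    """Evaluate dG = sum_i (w_vdw[i]*E_vdw[i] + w_ele[i]*E_ele[i]) + c."""
    return float(np.dot(w_vdw, e_vdw) + np.dot(w_ele, e_ele) + c)

def fit_weights(E_vdw, E_ele, dg_exp):
    """Least-squares fit of per-residue weights and the constant c.

    E_vdw, E_ele: arrays of shape (n_ligands, n_residues).
    dg_exp: array of shape (n_ligands,).
    Stand-in for PLS: lstsq returns the minimum-norm solution when there
    are more weights than ligands.
    """
    n_lig, n_res = E_vdw.shape
    X = np.hstack([E_vdw, E_ele, np.ones((n_lig, 1))])
    coef, *_ = np.linalg.lstsq(X, dg_exp, rcond=None)
    return coef[:n_res], coef[n_res:2 * n_res], coef[-1]

# Hypothetical, noise-free data: 6 ligands, 2 residues (energies in kcal/mol).
rng = np.random.default_rng(0)
E_vdw = rng.uniform(-5.0, 0.0, size=(6, 2))
E_ele = rng.uniform(-10.0, 2.0, size=(6, 2))
true_w_vdw, true_w_ele, true_c = np.array([0.5, 0.2]), np.array([0.1, 0.3]), -1.0
dg_exp = E_vdw @ true_w_vdw + E_ele @ true_w_ele + true_c

w_vdw, w_ele, c = fit_weights(E_vdw, E_ele, dg_exp)
dg_pred = combine_dg(E_vdw[0], E_ele[0], w_vdw, w_ele, c)
```

With a realistic protein, the number of residue weights usually exceeds the number of ligands, which is why a regularized or latent-variable method such as PLS is preferred over plain least squares.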

In the present study, we propose a ΔG estimation method based on the direct protein-ligand interaction obtained by molecular dynamics simulation. We introduced the entropy term and the local effective dielectric constant, and modified the van der Waals potential to improve the accuracy of the present method so that it does not require multiple active compounds to predict the ΔG value.

## 2. Results and Discussion

#### 2.1. Theoretical Background

ΔG is calculated by the Zwanzig equation as follows [33]:

where U_{b}, U_{u}, and < >_{b} represent the potential of the protein-ligand bound and unbound states and the average over the bound-state trajectory, respectively.
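To make the exponential average concrete, the sketch below compares a generic Zwanzig-type estimate with its second-order cumulant truncation on synthetic Gaussian energy differences. It is an illustration under stated assumptions: `delta_u` is a generic energy difference between an end state and the sampled reference state, the value of `KT` is an assumed k_B·T near 310 K, and the sign convention is the textbook one rather than necessarily the one used in the equations of this paper.

```python
import numpy as np

KT = 0.616  # kcal/mol, assumed k_B*T at about 310 K

def zwanzig(delta_u, kt=KT):
    """Exponential average: dF = -kT * ln < exp(-dU/kT) >."""
    x = -np.asarray(delta_u, dtype=float) / kt
    # log-mean-exp with the maximum factored out for numerical stability
    return -kt * (x.max() + np.log(np.mean(np.exp(x - x.max()))))

def cumulant2(delta_u, kt=KT):
    """Second-order cumulant truncation: dF ~ <dU> - var(dU) / (2 kT)."""
    du = np.asarray(delta_u, dtype=float)
    return du.mean() - du.var() / (2.0 * kt)

# Synthetic Gaussian energy differences (kcal/mol); hypothetical values.
rng = np.random.default_rng(1)
du = rng.normal(loc=-8.0, scale=0.5, size=200_000)
df_exact = zwanzig(du)
df_approx = cumulant2(du)
```

For Gaussian-distributed energy differences the truncation at second order is exact, so the two estimates agree up to sampling noise; the first term plays the role of the enthalpy-like average and the variance term that of the fluctuation (entropy-like) correction.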

Kubo’s cumulant expansion yields the following expression, which no longer contains the logarithmic and exponential functions [34]:

The first term (linear term) corresponds to the enthalpy, and the higher-order term corresponds to the entropy. The second term becomes:

When we assume:

Then, the second term of Equation (3) becomes:

And Equation (3) becomes:

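The linear term (the first and second terms; <U_{b}>_{b} − <U_{u}>_{u}) of Equation (7) corresponds to the LIE approximation, when the energy difference is due to the receptor-ligand and solvent-ligand interaction energies. The LIE approximation calculates ΔG by:

$$\Delta G = \alpha \left( \langle E^{vdW}_{R\text{-}L} \rangle_{RL} - \langle E^{vdW}_{S\text{-}L} \rangle_{SL} \right) + \beta \left( \langle E^{ele}_{R\text{-}L} \rangle_{RL} - \langle E^{ele}_{S\text{-}L} \rangle_{SL} \right) \qquad (8)$$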

where E^{vdW}_{X} and E^{ele}_{X} are the van der Waals interaction energy and the electrostatic interaction energy between the ligand and the surrounding molecules, X = R-L or S-L denotes the interaction in the protein-ligand complex system or the solvent-ligand system, and the brackets (< >_{X}) represent the average over the simulation of the protein-ligand complex system (RL) or the solvent-ligand system (SL). The LIE equation includes two parameters: α and β. These parameters are determined so as to reproduce the experimental data for each target protein.
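The LIE estimate described above can be sketched in a few lines. The function name `lie_dg`, the short energy series, and the values of `alpha` and `beta` are placeholders chosen for illustration only; in practice the two weights are fitted for each target protein.

```python
import numpy as np

def lie_dg(vdw_rl, ele_rl, vdw_sl, ele_sl, alpha, beta):
    """LIE estimate: alpha * d<E_vdW> + beta * d<E_ele>.

    Each difference is the average over the complex (RL) simulation minus
    the average over the ligand-in-solvent (SL) simulation.
    """
    d_vdw = np.mean(vdw_rl) - np.mean(vdw_sl)
    d_ele = np.mean(ele_rl) - np.mean(ele_sl)
    return alpha * d_vdw + beta * d_ele

# Hypothetical per-frame ligand-surroundings energies (kcal/mol).
vdw_rl, ele_rl = [-40.0, -42.0], [-30.0, -34.0]   # complex simulation
vdw_sl, ele_sl = [-20.0, -22.0], [-25.0, -27.0]   # ligand in solvent

dg = lie_dg(vdw_rl, ele_rl, vdw_sl, ele_sl, alpha=0.2, beta=0.5)
```

The need for the second (ligand-in-solvent) simulation is what distinguishes this estimate from the residue-based approach discussed next, which uses the complex simulation only.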

The COMBINE approximation calculates the ΔG value by:

where E^{vdW}(i) and E^{ele}(i) are the protein-ligand van der Waals interaction energy and the electrostatic interaction energy between the i-th residue of the protein and the ligand, respectively, and w is the weight parameter. A simulation of the ligand-solution system is not necessary.

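Both the LIE approximation without solvent and the COMBINE approximation with the residue-independent w parameter give the same equation:

$$\Delta G = \alpha \sum_{i} \langle E^{vdW}(i) \rangle + \beta \sum_{i} \langle E^{ele}(i) \rangle \qquad (10)$$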

#### 2.2. Entropy Term

In the present study, we introduced the entropy term in Equation (10) as follows. We call this method the direct interaction approximation without solvent (DIAV) method:

where E^{vdW}(i) and E^{ele}(i) are the vdW and electrostatic interactions between the i-th residue of the protein and the ligand, respectively. S_{vdw}(i) and S_{ele}(i) are the fluctuations of E^{vdW}(i) and E^{ele}(i) during the molecular dynamics simulation, respectively. The τ*S_{x} term represents the energy fluctuation of the system corresponding to the second-order term of Equation (7), <(U_{b} − <U_{b}>_{b})^{2}>_{b}.

In Equation (11), the τ*S_{x} term is the fluctuation of energy, but we found that the energy fluctuation itself is not suitable for evaluating ΔG. Instead of the energy, S_{x} is the fluctuation of a property x that is related to the energy. The properties x in the current study are the accessible surface area (x = ASA), the dihedral angles (x = DIH), the vdW potential (x = vdW), and the electrostatic potential (x = ELE) of the protein-ligand complex structure. In the present study, we determine which property is best for estimating ΔG. There are five parameters: α, α2, β, β2, and τ.
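The averages and fluctuations entering this kind of estimate can be collected from a trajectory as sketched below. The text does not state whether a fluctuation is a variance or a standard deviation; the sketch uses the standard deviation as an assumption, and the function name and all data are hypothetical.

```python
import numpy as np

def mean_and_fluctuation(series):
    """Return the time average and the standard deviation along the frame axis (axis 0)."""
    series = np.asarray(series, dtype=float)
    return series.mean(axis=0), series.std(axis=0)

# Hypothetical per-frame data: 4 frames x 3 residues of vdW interaction energy (kcal/mol).
e_vdw = np.array([[-1.0, -2.0, 0.0],
                  [-1.2, -2.4, 0.0],
                  [-0.8, -1.6, 0.0],
                  [-1.0, -2.0, 0.0]])
avg_vdw, s_vdw = mean_and_fluctuation(e_vdw)   # per-residue average and fluctuation

# Hypothetical per-frame ASA of the complex (square angstroms), i.e. the property x = ASA.
asa = np.array([5000.0, 5040.0, 4960.0, 5000.0])
avg_asa, s_asa = mean_and_fluctuation(asa)
```

The same helper applies to any scalar or per-residue property sampled once per frame, so switching the property x between ASA, dihedral angles, and energies only changes the input series.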

#### 2.3. Modification of van der Waals Potential Term

A Lennard-Jones (LJ) 12-6-type function is commonly used to represent the van der Waals (vdW) interaction. This function gives very large values when atomic clashes occur; to reduce such clashes, an LJ 9-6-type function has been used in a protein-ligand docking study [3]. In a docking score, the vdW interaction (lipophilic atom contact) term represents both the vdW interaction and the cavity formation energy in solvent; in water, the latter is 10 times greater than the vdW interaction. In general, the absolute value of the vdW interaction is much smaller than the ΔG value. The LJ 12-6 value represents the atomic contact and its hydrophobic interaction. Thus, in the present study, we apply LJ 12-6, LJ 9-6, LJ 8-4, and LJ 6-3-type functions as follows:

where R_{e} is the equilibrium distance. The R_{e} and well depth values are taken from AMBER parm99 [35] and the general AMBER force field (GAFF) [36].
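The effect of softening the repulsive wall can be illustrated with a generic n-m function. The normalization below, which places the minimum of depth ε at r = R_{e} for every (n, m) pair, is an assumption and may differ in detail from Equations (12)–(15); the values of `r_e` and `eps` are hypothetical.

```python
import numpy as np

def lj_nm(r, r_e, eps, n, m):
    """Generalized LJ n-m potential with minimum -eps at r = r_e (requires n > m)."""
    x = r_e / np.asarray(r, dtype=float)
    return eps * (m * x**n - n * x**m) / (n - m)

r_e, eps = 3.8, 0.1   # hypothetical equilibrium distance (angstrom) and well depth (kcal/mol)
for n, m in [(12, 6), (9, 6), (8, 4), (6, 3)]:
    at_minimum = lj_nm(r_e, r_e, eps, n, m)        # equals -eps by construction
    at_clash = lj_nm(0.7 * r_e, r_e, eps, n, m)    # energy at a compressed contact
    print(f"LJ {n}-{m}: E(R_e) = {at_minimum:+.3f}, E(0.7 R_e) = {at_clash:+.3f}")
```

For n = 12 and m = 6 the expression reduces to the familiar ε[(R_e/r)^12 − 2(R_e/r)^6]; lower exponents give a softer wall at short distances and a longer-ranged attractive tail.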

The data-sampling MD simulation is performed with the conventional AMBER force field (LJ 12-6 potential), and the analysis is performed using Equations (12)–(15).

#### 2.4. Effective Dielectric Constant

In the ligand-binding pocket, the effective dielectric constant (ε_{eff}) should be different at each point, since the ε_{eff} values of proteins are 2–4 and the ε_{eff} of water is 78.5. The E^{ele}(i) should be scaled by the ε_{eff}. We introduced the modification of the electrostatic interaction as follows (we call this method the direct interaction approximation with solvent (DIAS) method):

where E_{mod}^{ele}(i) is the E^{ele}(i) value scaled by the ε_{eff}. The ε_{eff} value could be calculated from the ratio between the electrostatic force calculated in the explicit water model and that in vacuum, as follows:

where E_{j}^{ele}(i) is the electrostatic interaction between the i-th residue and the j-th atom of the ligand in vacuum. The following scale factor might be a candidate:

or:

where F_{i}^{real} and F_{i}^{vac} are the electrostatic forces acting on the i-th atom of the protein with and without the contribution of the solvent, respectively. F^{real} and F^{vac} were calculated by the molecular dynamics simulation in the explicit water model and in vacuum, respectively.

The scale factor ε’_{eff}^{i} given by Equation (18) or (19) could become unrealistically large when the denominator of Equation (18) or (19) is nearly zero. Thus, we introduce a parameter x and the scale function as follows:

Equation (20) constrains ε_{eff}^{i} to be greater than 1, whereas the actual ε_{eff} value could be less than 1. However, because the parameter β of Equation (16) also scales the electrostatic term, the actual ε_{eff} parameter is ε_{eff}^{i}/β. In the following analysis, the factor ε_{eff}^{i} in Equation (20) is used as the actual scale factor.

#### 2.5. Examination of Entropy Term

We applied the DIAV method (Equation (11)) to the protein-ligand complex structures to examine the entropy property term S_{x}, using the leave-one-out cross-validation test. The results and the optimized parameters are summarized in Table 1.

Statistics | ΔG_{simple} (Equation (10)) | ELE ^{a} | vdW ^{a} | ASA ^{a} | DIH ^{a} |
---|---|---|---|---|---|
Average error (kcal/mol) | 2.22 | 2.30 | 1.85 | 2.06 | 1.94 |
R | 0.72 | 0.70 | 0.72 | 0.71 | 0.67 |
α | 0.22 | 0.22 | 0.18 | 0.17 | 0.15 |
β | 0.017001 | 0.016411 | 0.005958 | 0.012600 | 0.010430 |
τ*100 | - | −0.078460 | −28.506610 | −0.026605 | −0.000696 |

The vdW potential is the LJ 12-6 type. ^{a} Property x of Equation (11). Here α2 = β2 = 0. The energies are presented in kcal/mol, and R represents the correlation coefficient.

In the leave-one-out cross-validation test, one data point is selected as the test data to be predicted, and the remaining data are used as the training data to generate the prediction model equation. The test data point is selected in turn from the given data set until every data point has served as the test data. The vdW energy term was set to an LJ 12-6 function, and the dielectric constant was set to 1. The values of the parameters α2 and β2 were set to zero. The ASA parameters (atomic solvation parameter and radius of each atom) were obtained from a previous study [37]. In the present study, the parameters of Equation (11) were optimized by minimizing the least-squares error of the ΔG values. Compared to the results obtained by the simplified version of the COMBINE method (Equation (10)), the DIAV method (Equation (11)) slightly improved the accuracy of ΔG.
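The leave-one-out procedure described above can be sketched as follows. The sketch assumes a model that is linear in the fitted parameters (for example α, β, and τ multiplying a summed vdW energy, a summed electrostatic energy, and S_{x}); the descriptor values, the coefficients used to generate them, and the function name `loo_cv` are all hypothetical.

```python
import numpy as np

def loo_cv(X, y):
    """Leave-one-out predictions from a linear least-squares model y ~ X @ p."""
    n = len(y)
    pred = np.empty(n)
    for k in range(n):
        train = np.arange(n) != k                  # boolean mask that drops sample k
        p, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred[k] = X[k] @ p                         # predict the held-out sample
    return pred

# Hypothetical descriptors for 8 complexes: summed <E_vdW>, summed <E_ele>, and S_x.
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(-60, -20, 8),
                     rng.uniform(-200, -20, 8),
                     rng.uniform(10, 50, 8)])
coef_true = np.array([0.17, 0.013, -0.03])         # arbitrary generating coefficients
y = X @ coef_true + rng.normal(0.0, 0.5, 8)        # synthetic "experimental" dG with noise

pred = loo_cv(X, y)
```

Because each prediction comes from a model that never saw the held-out complex, the resulting errors are a fairer estimate of predictive accuracy than the residuals of a single fit to all data.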

#### 2.6. Examination of vdW Term

We examined the DIAV method with the vdW term using Equations (12)–(15). The results and the optimized parameters are summarized in Table 2. The ε value was set to 1, and the entropy property x was set to the ASA. We also examined the case of x = DIH, and the result was quite similar to that obtained with x = ASA. The LJ 8-4-type function gave the best result among the four functions (LJ 12-6, LJ 9-6, LJ 8-4, and LJ 6-3) in the leave-one-out cross-validation test, although the accuracy was similar among the functions. Thus, the other function types were not used in the following study; instead we focused on the LJ 8-4 function.

**Table 2.** Cross-validation results obtained by the DIAV method (Equation (11)) to examine the van der Waals potential type.

Statistics | LJ9-6 | LJ8-4 | LJ6-3 |
---|---|---|---|
Average error (kcal/mol) | 2.26 | 1.75 | 1.89 |
R | 0.69 | 0.76 | 0.71 |
α | 0.1727 | 0.0428 | 0.0066 |
β | 0.0139 | 0.0072 | 0.0078 |
τ*10000 | −2.9273 | −2.5677 | −2.8531 |

The energies are presented in kcal/mol, and R represents the correlation coefficient.

The vdW parameters represent both the protein-ligand vdW interaction and the hydrophobic interaction. In the present study, however, the amount of data was too limited to optimize the vdW parameters themselves, so we used the original vdW parameters unchanged.

#### 2.7. Examination of α2 and β2 Parameters

We examined the parameters α2 and β2 of the DIAV method (Equation (11)). The vdW potential was set to the LJ 8-4-type function, the dielectric constant was set to 1, and the entropy property x was set to the ASA. We also examined the case of x = DIH; the result was quite similar to that obtained with x = ASA. The leave-one-out cross-validation results and the optimized parameters are summarized in Table 3. The optimized α2 and β2 were about 0.01 and −0.0013, respectively, and the modulated vdW and electrostatic energy values were close to the original (intact) values. With these two additional parameters, the equation includes five parameters (α, β, τ, α2, and β2), and the average accuracy of the ΔG estimation improved slightly.

#### 2.8. Examination of Effective Dielectric Constant Term

We applied the idea of the effective dielectric constant through the DIAS method (Equation (16)), estimating ΔG with the ε_{eff} defined by Equations (18) and (19). The leave-one-out cross-validation results and the optimized parameters are summarized in Table 4.

**Table 3.** Cross-validation results obtained by the DIAV method (Equation (11)) to examine α2 and β2 parameters.

Statistics | ASA | DIH |
---|---|---|
Average error (kcal/mol) | 1.63 | 1.59 |
R | 0.80 | 0.76 |
α | 0.04146 | 0.03832 |
β | 0.00643 | 0.00491 |
τ*10000 | −2.74887 | −0.06949 |
α2 | 0.0093 | 0.0093 |
β2 | −0.0013 | −0.0015 |

The vdW potential is the LJ 8-4 type. The energies are presented in kcal/mol, and R represents the correlation coefficient.

**Table 4.** Cross-validation results obtained by Equation (10), the DIAV (Equation (11)), and the DIAS (Equation (16)) methods.

PDB ID | ΔG_{exptl} (kcal/mol) | ΔG_{simple} (Equation (10)) (kcal/mol) | ΔG_{DIAV} (Equation (11)) (kcal/mol) | ΔG_{DIAS} (Equation (16)) (kcal/mol) |
---|---|---|---|---|
1abe | −9.57 | −5.46 | −6.27 | −6.68 |
1abf | −7.39 | −6.30 | −6.67 | −6.90 |
1apu | −10.50 | −13.50 | −11.98 | −11.76 |
1dbb | −12.27 | −8.75 | −11.79 | −11.69 |
1dbj | −10.47 | −8.35 | −12.27 | −12.10 |
1dog | −5.48 | −5.40 | −6.09 | −6.12 |
1dwb | −3.98 | −3.69 | −4.83 | −5.05 |
1epo | −10.85 | −17.25 | −14.82 | −15.56 |
1etr | −10.09 | −9.91 | −10.35 | −10.08 |
1ets | −11.62 | −11.05 | −11.82 | −11.52 |
1ett | −8.44 | −9.46 | −9.99 | −9.75 |
1hpv | −12.57 | −14.02 | −12.88 | −12.78 |
1hsl | −9.96 | −6.53 | −6.74 | −7.18 |
1htf | −11.04 | −12.45 | −11.12 | −11.00 |
1hvr | −12.97 | −16.98 | −14.67 | −14.95 |
1nsd | −7.23 | −7.44 | −8.33 | −8.13 |
1pgp | −7.77 | −11.01 | −11.09 | −10.24 |
1phg | −11.81 | −6.88 | −8.03 | −8.22 |
1ppc | −8.80 | −9.83 | −8.66 | −8.85 |
1pph | −8.49 | −8.50 | −7.87 | −8.00 |
1rbp | −9.17 | −9.29 | −8.58 | −8.91 |
1tng | −4.00 | −4.15 | −4.64 | −4.90 |
1tnh | −4.59 | −3.54 | −4.24 | −4.61 |
1ulb | −7.23 | −3.82 | −5.71 | −5.74 |
2cgr | −9.92 | −7.07 | −10.94 | −10.88 |
2gbp | −10.36 | −8.95 | −9.27 | −9.77 |
2ifb | −7.41 | −9.57 | −8.53 | −8.38 |
2phh | −6.38 | −4.09 | −6.83 | −6.79 |
2r04 | −8.48 | −10.39 | −10.31 | −10.26 |
2tsc | −11.62 | −11.05 | −8.68 | −8.28 |
2ypi | −6.58 | −5.40 | −5.72 | −6.45 |
3ptb | −6.46 | −4.93 | −5.02 | −4.55 |
4dfr | −13.23 | −11.52 | −13.93 | −13.52 |
5abp | −9.05 | −6.64 | −7.19 | −7.59 |
Average error (kcal/mol) | - | 1.88 | 1.30 | 1.22 |
R | - | 0.73 | 0.81 | 0.81 |
α | - | 0.0503 | 0.0378 | 0.0307 |
β | - | 0.0125 | 0.0082 | 0.0118 |
τ*10000 | - | - | −2.4178 | −2.4312 |
α2 | - | - | 0.0093 | 0.01 |
β2 | - | - | −0.0011 | −0.00312 |
x | - | - | - | 0.6 |

The vdW potential is the LJ 8-4 type. The property x of S_{x} is the ASA. The energies are presented in kcal/mol, and R represents the correlation coefficient.

As with the results described above, the best property x among the four properties (ASA, DIH, vdW, and ELE) was the ASA. The DIAS results obtained by Equation (19) were better than those obtained by Equation (18), so the DIAS results in Table 4 were obtained by using Equation (19). The consideration of ε_{eff} slightly improved the ΔG estimation. As a result, the correlation coefficient between the experimental and the calculated ΔG values was 0.81, and the average error of ΔG was 1.2 kcal/mol, a substantial improvement over the results obtained by Equation (10) (0.73 and 1.88 kcal/mol, respectively). Figure 1 shows the correlation between experimental and calculated ΔG values obtained by the DIAS method (Equation (16)).
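The two summary statistics quoted here can be computed as sketched below. Interpreting the "average error" as a mean absolute error is an assumption, and only the first five rows of Table 4 (experimental and DIAS columns) are used to keep the example short, so the numbers it produces are not the values reported for the full set of 34 complexes.

```python
import numpy as np

def mean_abs_error(calc, expt):
    """Mean absolute deviation between calculated and experimental values."""
    return float(np.mean(np.abs(np.asarray(calc) - np.asarray(expt))))

def pearson_r(calc, expt):
    """Pearson correlation coefficient between two series."""
    return float(np.corrcoef(calc, expt)[0, 1])

# First five rows of Table 4 (1abe, 1abf, 1apu, 1dbb, 1dbj), in kcal/mol.
dg_exptl = [-9.57, -7.39, -10.50, -12.27, -10.47]
dg_dias = [-6.68, -6.90, -11.76, -11.69, -12.10]

mae = mean_abs_error(dg_dias, dg_exptl)
r = pearson_r(dg_dias, dg_exptl)
```

Applying the same two functions to all 34 rows of a column gives the corresponding summary rows, provided the averaging convention matches the one used by the authors.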

**Figure 1.** Cross-validation results obtained by the DIAV method. The experimental data (ΔG_{exptl}) and the calculated value (ΔG_{DIAV}).

We examined the time dependency of the ΔG obtained by the DIAS method. After a 1 nsec MD simulation for equilibration, sampling runs of 0.5 nsec, 1 nsec, 1.5 nsec, and 2 nsec were performed. The ΔG values depended only weakly on the sampling length: the average errors over the 34 target protein-ligand complexes were 1.43 kcal/mol, 1.22 kcal/mol, 1.23 kcal/mol, and 1.23 kcal/mol for the 0.5 nsec, 1 nsec, 1.5 nsec, and 2 nsec sampling times, respectively. The initial structures of these simulations were the experimentally obtained protein-compound complex structures, which likely explains why the protein-compound interaction depended so little on the sampling length.
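A convergence check of this kind amounts to averaging the sampled quantities over progressively longer windows. The sketch below shows the windowing only; the energy series is synthetic, the frame rate of 1000 frames per nsec is an assumption, and `window_averages` is a hypothetical helper name.

```python
import numpy as np

def window_averages(series, frames_per_ns, lengths_ns=(0.5, 1.0, 1.5, 2.0)):
    """Average a per-frame quantity over the first L nsec of the sampling run."""
    series = np.asarray(series, dtype=float)
    return {L: float(series[: int(L * frames_per_ns)].mean()) for L in lengths_ns}

# Synthetic 2 nsec series of an interaction energy (kcal/mol), one frame per psec.
rng = np.random.default_rng(3)
energy = -45.0 + rng.normal(0.0, 3.0, size=2000)

averages = window_averages(energy, frames_per_ns=1000)
```

If the window averages stop changing beyond a given length, longer sampling adds little to the averaged terms, which is the behaviour reported above for windows of 1 nsec and longer.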

The results showed that the current method worked well for various target proteins. This method could be extended as:

where α, α_{2}, β, β_{2}, and τ_{x} are the parameters. This extension is one of the generalized forms of Equation (16). We examined combinations of two properties out of five, and the average error increased when two entropy terms were combined. Thus, Equation (16) is both simpler and more accurate than Equation (21).

We applied the generalized Born surface area (GBSA) method [37,38,39] for the ΔG calculation to the same protein-compound set used in the current study. The average error was 51.7 kcal/mol, and the correlation coefficient between the experimental and calculated ΔG values was 0.03, indicating very weak correlation. The GBSA method is well suited to reproducing the trend of ΔG values of many ligands for one target protein. In the current study, however, each target protein has only one or a few ligands, and the error of the ΔG obtained by the GBSA method is large; thus, the GBSA method could not reproduce the trend of the ΔG values. In this examination, the DIAS/DIAV methods gave better results than the GBSA method.

#### 2.9. Application to Docking-Pose Prediction

To evaluate the present method, we applied the DIAS method to protein-ligand docking-pose prediction. Usually, only 20%–30% of the docking poses generated by a protein-ligand docking program are correct (RMSD < 2 Å) in the cross-docking test, whereas 50%–70% are correct in the self-docking test [7]. The cross-docking test is therefore necessary for a practical evaluation of protein-ligand docking, and in this section we mimicked it. We selected the docking poses by both the DIAS method (Equation (16)) and the docking program (Sievgene/myPresto [7]), then compared the results. The docking score of Sievgene was determined as:

where N_{rot}, E_{ASA}, E_{vdW}, E_{ele}, E_{hyd}, and E_{intra-vdW} represent the number of rotatable bonds of the ligand molecule, the hydrophobic energy due to the accessible surface area, the vdW energy, the protein-ligand Coulombic potential, the hydrogen bond energy, and the intramolecular vdW energy of the ligand, respectively, for Sievgene [7]. Also, c_{rot}, c_{AV}, c_{ele}, c_{hyd}, and c_{intra-vdW} are the optimized coefficients of the energy terms. For each atom type, the sum of E_{ASA} and E_{vdW} gives a grid potential, and both energy terms are always calculated simultaneously. Thus, these two terms share the same coefficient, c_{AV}. Sievgene utilizes the grid potential to calculate each energy term except for the intramolecular interactions. In this study, a mesh size of 60 × 60 × 60 was adopted.

In this test, we prepared three types of protein structures: (model 1) the intact protein structure prepared in Section 3, (model 2) the energy-minimized structure of the apo protein in water, and (model 3) the final structure of a 2-nsec MD simulation of the apo protein in water. The Sievgene docking program generated five docking poses for each target protein for each of the three prepared structures (models 1–3). Each protein-ligand complex structure was then evaluated by the DIAS method (Equation (16)) with the fixed parameters described in Table 4, in the same manner as in the previous section (the vdW function was the LJ 8-4-type function, and the property x of S_{x} was the ASA). The best-score poses were selected by Sievgene based on the docking score, and the best-ΔG poses were selected by DIAS. The results are summarized in Table 5.

Initial structure (intact PDB coordinates: model 1) | Top ΔG structure by the DIAS method | Top scoring structure by Sievgene | Best among the top 5 structures |
---|---|---|---|
RMSD < 1 Å | 29.4% | 35.3% | 47.1% |
RMSD < 2 Å | 41.2% | 76.5% | 94.1% |
RMSD < 3 Å | 47.1% | 94.1% | 94.1% |
Energy-minimized structure (model 2) | Top ΔG structure by the DIAS method | Top scoring structure by Sievgene | Best among the top 5 structures |
RMSD < 1 Å | 40.0% | 6.7% | 66.7% |
RMSD < 2 Å | 73.3% | 46.7% | 93.3% |
RMSD < 3 Å | 80.0% | 73.3% | 93.3% |
Structure after MD simulation (model 3) | Top ΔG structure by the DIAS method | Top scoring structure by Sievgene | Best among the top 5 structures |
RMSD < 1 Å | 20.0% | 0.0% | 0.0% |
RMSD < 2 Å | 33.3% | 33.3% | 33.3% |
RMSD < 3 Å | 53.3% | 46.7% | 66.7% |

The vdW potential is the LJ 8-4 type. The property x of S_{x} is the ASA.

When the energy-minimized structures (model 2) were used, the results obtained by the DIAS method were much better than the Sievgene results. The DIAS method selected the correct poses at a rate of 73% (RMSD < 2 Å). Because the DIAS method can only choose among the five poses generated by Sievgene, its success rate is bounded by the fraction of cases in which at least one of the five poses satisfies RMSD < 2 Å, which was 93%. Thus, the DIAS method selected 78% (73% out of 93%) of the available correct poses. This shows that the DIAS method is useful for practical pose prediction in drug design.
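The bookkeeping behind these percentages can be sketched as follows. The RMSD here is computed on matched atoms without any superposition, which is the usual convention for docking poses but is an assumption about how the values in Table 5 were obtained; the function names and the example coordinates are hypothetical, while the two rates are the model 2 entries of Table 5.

```python
import numpy as np

def pose_rmsd(pose, reference):
    """RMSD between matched atoms of a docked pose and the reference pose (no fitting)."""
    diff = np.asarray(pose, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def success_rate(rmsds, cutoff=2.0):
    """Fraction of targets whose selected pose has an RMSD below the cutoff (angstrom)."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)

# Hypothetical two-atom pose shifted by 1 angstrom along z.
example_rmsd = pose_rmsd([[0, 0, 1], [1, 0, 1]], [[0, 0, 0], [1, 0, 0]])

# Model 2 rates from Table 5 at RMSD < 2 angstrom.
dias_rate = 0.733           # top-ΔG pose chosen by DIAS is correct
best_of_five_rate = 0.933   # at least one of the five generated poses is correct
relative_rate = dias_rate / best_of_five_rate   # share of available correct poses found
```

The ratio separates the quality of the pose selection from the quality of the pose generation: a selector cannot succeed on targets for which none of the five candidate poses is correct.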

When the initial structures (model 1) were used, the Sievgene results were better than the results obtained by the DIAS method. This is a trivial self-docking test, and the MD simulations used for the energy calculation presumably shift the ligand coordinates slightly away from the crystal structures through thermal fluctuation. When the final structures of the MD simulation (model 3) were used, only 33.3% of the docking poses were correct (RMSD < 2 Å) by both Sievgene and the DIAS method. Still, at the RMSD < 1 Å and RMSD < 3 Å thresholds, the results obtained by the DIAS method were better than those obtained by Sievgene. The shapes of the ligand-binding pockets presumably changed from their ligand-compatible structures during the MD simulations of the apo proteins, so this model structure is not suitable for docking studies.

## 3. Data Preparation

To determine the coefficients for the ΔG score, we performed a protein-ligand docking simulation based on the known complex structures registered in the Protein Data Bank. Here, 34 complexes accompanied by the experimental binding free-energy values were selected from the database that was used to determine the ΔG scores of the PRO_LEADS [6]. The PDB identifiers and the names are summarized in Table 6. In the test dataset, the metalloproteins were removed from the present analysis. Metal atoms (Zn and Fe atoms) formed covalent bonds with O and S atoms of the ligands, and the classical force field that we applied could not represent the covalent bond. Thus, the present method cannot calculate ΔG values for metalloproteins with high precision.

PDB ID | Protein |
---|---|
1abe | L-ARABINOSE-BINDING PROTEIN |
1abf | L-ARABINOSE-BINDING PROTEIN |
1apu | ACID PROTEINASE (PENICILLOPEPSIN) |
1dbb | FAB' FRAGMENT |
1dbj | FAB' FRAGMENT |
1dog | GLUCOAMYLASE |
1dwb | THROMBIN |
1epo | ENDOTHIA ASPARTIC PROTEINASE |
1etr | THROMBIN |
1ets | THROMBIN |
1ett | THROMBIN |
1hpv | HIV-1 PROTEASE |
1hsl | HISTIDINE-BINDING PROTEIN |
1htf | HIV-1 PROTEASE |
1hvr | HIV-1 PROTEASE |
1nsd | NEURAMINIDASE |
1pgp | 6-PHOSPHOGLUCONATE DEHYDROGENASE |
1phg | CYTOCHROME P450 |
1ppc | TRYPSIN |
1pph | TRYPSIN |
1rbp | RETINOL-BINDING PROTEIN |
1tng | TRYPSIN |
1tnh | TRYPSIN |
1ulb | PURINE NUCLEOSIDE PHOSPHORYLASE |
2cgr | IGG2B (KAPPA) FAB FRAGMENT |
2gbp | D-GALACTOSE/D-GLUCOSE-BINDING PROTEIN |
2ifb | INTESTINAL FATTY ACID BINDING |
2phh | P-HYDROXYBENZOATE HYDROXYLASE |
2r04 | RHINOVIRUS 14 (HRV14) |
2tsc | THYMIDYLATE SYNTHASE |
2ypi | TRIOSE PHOSPHATE ISOMERASE |
3ptb | TRYPSIN |
4dfr | DIHYDROFOLATE REDUCTASE |
5abp | L-ARABINOSE-BINDING PROTEIN |

The structural ensembles were generated from the PDB structures by MD simulation in explicit water as follows. All target proteins were prepared with ligands (protein-ligand complex structure). The force fields and the charges of the protein atoms originated from AMBER parm99 [35]. The atomic charge of each ligand was determined by the restrained electrostatic potential (RESP) procedure using HF/6-31G*-level quantum chemical calculations [40]. We used Gaussian98 to perform the quantum chemical calculations [41]. The whole structure of each protein was embedded in a sphere of TIP3P [42] water (CAP water), including ion particles of 0.1% Na^{+} and Cl^{−}, in order to neutralize the total charge of the systems. The center of the sphere was set at the mass center of the protein. The shortest distance between the protein atoms and the CAP sphere wall was set to 10 Å. Before an MD calculation was performed for the entire system, an MD calculation for only the solvent parts (solvent water and counter ions) was performed with the protein, ligand, and metal ion coordinates fixed, so as to bring the solvent parts sufficiently close to an equilibrium state. The SHAKE method was used to constrain covalent bonds between heavy and hydrogen atoms in any molecule in the system [43]. MD simulations of the entire system were performed using 2.0 fsec time steps with the temperature set at 310 K; the fast multipole method [44] was used to calculate the Coulombic interaction. The cutoff distance of the van der Waals interaction was 12.0 Å. The MD simulations were performed by using cosgene/myPresto [18]. After equilibration steps of 1,000 psec, the protein coordinates were sampled every 1 psec. Finally, we obtained 1,000 structures for each target protein in the 1,000 psec production run. The software program myPresto version 4 [45] was used for the simulation.
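The simulation schedule above implies the following step and frame counts. The constants restate the values given in the text; the variable names are illustrative and do not correspond to actual myPresto input keywords.

```python
TIME_STEP_FS = 2.0         # integration time step (fsec)
EQUILIBRATION_PS = 1000    # equilibration length (psec)
PRODUCTION_PS = 1000       # production run length (psec)
SAMPLING_INTERVAL_PS = 1   # coordinates saved every 1 psec

steps_per_ps = int(1000 / TIME_STEP_FS)                # 1 psec = 1000 fsec
equilibration_steps = EQUILIBRATION_PS * steps_per_ps
production_steps = PRODUCTION_PS * steps_per_ps
save_every = SAMPLING_INTERVAL_PS * steps_per_ps       # MD steps between saved frames
n_frames = production_steps // save_every

print(equilibration_steps, production_steps, save_every, n_frames)
```

With these settings each phase takes 500,000 MD steps and the production run yields the 1,000 structures per target mentioned in the text.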

## 4. Conclusions

We have developed the direct interaction approximation (DIA) method and examined both the direct interaction approximation without solvent (DIAV) and with solvent (DIAS) methods. The results showed that the inclusion of the fluctuation of the ASA/dihedral angle terms drastically improved the accuracy of ΔG. The DIAS method (Equation (16)) was the final form for the simple and accurate estimation of ΔG. The effective dielectric constant should be calculated by Equations (19) and (20), and the vdW potential should be the LJ 8-4-type function. This equation included six parameters: α, β, α2, β2, τ, and x. The six optimized parameters could be applied to all of the target proteins.

In the explicit water model, the DIA (DIAV and DIAS) methods required only the MD simulation of the protein-ligand complex. The DIA method with the LJ 8-4-type function improved the accuracy of the calculated ΔG value drastically: the correlation coefficient between the experimental and the calculated ΔG values was improved to 0.8 as obtained by the DIAV method, from 0.7 as obtained by the simplified COMBINE method without the entropy term (Equation (10)), and the average error of ΔG was improved to 1.2 kcal/mol as obtained by the DIAS method, from 1.9 kcal/mol as obtained by Equation (10).

## Acknowledgements

This work was supported by grants from the New Energy and Industrial Technology Development Organization of Japan (NEDO) and by the Ministry of Economy, Trade, and Industry (METI) of Japan.


*Sample Availability:* Samples of the compounds and proteins are available from the authors.

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).