This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
We have developed a method for estimating the protein-ligand binding free energy (ΔG) based on the direct protein-ligand interaction obtained by a molecular dynamics simulation. Using this method, we estimated the ΔG value statistically from the average values of the van der Waals and electrostatic interactions between each amino acid of the target protein and the ligand molecule. In addition, we introduced fluctuations in the accessible surface area (ASA) and dihedral angles of the protein-ligand complex system as the entropy terms of the ΔG estimation. The present method included the fluctuation term of structural change of the protein and the effective dielectric constant. We applied this method to 34 protein-ligand complex structures. As a result, the correlation coefficient between the experimental and calculated ΔG values was 0.81, and the average error of ΔG was 1.2 kcal/mol with the use of the fixed parameters. These results were obtained from a 2 nsec molecular dynamics simulation.
The protein-ligand binding free energy (ΔG) has been calculated by various computational methods. Many protein-ligand docking programs have been developed to estimate ΔG [
There have been several reports on protein-compound docking and free energy calculation by molecular dynamics (MD) simulation. Even if a protein-ligand complex structure is unknown,
In an explicit water model, if a protein-ligand complex structure is known, the binding free energy and the potential of mean force (PMF) along the dissociation path can be obtained by using the filling potential (FP) method [
The molecular-mechanics Poisson-Boltzmann surface-area (MM-PBSA) method [
The COMBINE method is based on the assumption that biological activities can be correlated with a linear combination of a subset of the van der Waals and electrostatic terms of the interaction energies between a ligand and its surrounding protein residues (such as the target receptor) [
where E_{i}
The coefficients of Equation (1) (
In the present study, we propose a ΔG estimation method based on the direct protein-ligand interaction obtained by molecular dynamics simulation. We introduced the entropy term and the local effective dielectric constant, and modified the van der Waals potential to improve the accuracy of the present method so that it does not require multiple active compounds to predict the ΔG value.
ΔG is calculated by the Zwanzig equation as follows [
where U_{b}, U_{u}, and < >_{b} represent the potentials of the protein-ligand bound and unbound states and the average over the bound-state trajectory, respectively.
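As a purely illustrative sketch (not the authors' code), the exponential average that appears in a Zwanzig-type formula can be compared numerically with its second-order cumulant truncation. The sketch assumes a generic energy difference ΔU sampled over a trajectory, synthetic Gaussian samples, and kT ≈ 0.593 kcal/mol (about 298 K). The function names are hypothetical, and the sign convention of the paper's own equation is not reproduced here.

```python
import math
import random

KT = 0.593  # kcal/mol at roughly 298 K (assumed value for illustration)


def exponential_average(delta_u, kt=KT):
    """Return -kT * ln< exp(-dU/kT) > over the samples (log-sum-exp shifted for stability)."""
    shift = min(delta_u)
    acc = sum(math.exp(-(du - shift) / kt) for du in delta_u)
    return shift - kt * math.log(acc / len(delta_u))


def second_order_cumulant(delta_u, kt=KT):
    """Return <dU> - var(dU) / (2 kT), the cumulant expansion truncated at second order."""
    n = len(delta_u)
    mean = sum(delta_u) / n
    var = sum((du - mean) ** 2 for du in delta_u) / n
    return mean - var / (2.0 * kt)


# Synthetic, hypothetical energy differences (kcal/mol); not data from the paper.
rng = random.Random(0)
samples = [rng.gauss(-8.0, 0.3) for _ in range(20000)]

fep = exponential_average(samples)
cum = second_order_cumulant(samples)
```

For Gaussian-distributed ΔU the two estimates agree closely, and the second-order (fluctuation) term lowers the estimate relative to the plain average.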
Kubo’s cumulant expansion gives the following equation, which eliminates the log and exp functions [
The first term (linear term) corresponds to the enthalpy, and the higher-order term corresponds to the entropy. The second term becomes:
When we assume:
Then, the second term of Equation (3) becomes:
And Equation (3) becomes:
The linear term (the first and second terms; <U_{b}>_{b} − <U_{u}>_{u}) of Equation (7) corresponds to the linear interaction energy (LIE) approximation, when the energy difference is due to the receptor-ligand and solvent-ligand interaction energies. The LIE approximation calculates ΔG by:
where E^{vdW}_{X} and E^{ele}_{X} are the protein-ligand van der Waals interaction energy and the electrostatic interaction energy between the ligand and the surrounding molecules, RL/SL represents the interaction of the protein-ligand complex system/solute-ligand system, and the brackets (< >_{X}) represent the average over the simulation of the protein-ligand complex system (RL) or the solute-ligand system (SL). The LIE equation includes two parameters: α and β. These parameters are known to reproduce the experimental data for each target protein.
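The LIE estimate described above can be sketched as follows. This is a minimal illustration that assumes the commonly used form in which α scales the van der Waals difference and β scales the electrostatic difference; the per-frame energies and the values of α and β below are hypothetical placeholders, not values from this study.

```python
from statistics import fmean


def lie_delta_g(vdw_rl, ele_rl, vdw_sl, ele_sl, alpha, beta):
    """LIE-style estimate: alpha * (<E_vdW>_RL - <E_vdW>_SL) + beta * (<E_ele>_RL - <E_ele>_SL)."""
    d_vdw = fmean(vdw_rl) - fmean(vdw_sl)
    d_ele = fmean(ele_rl) - fmean(ele_sl)
    return alpha * d_vdw + beta * d_ele


# Hypothetical per-frame interaction energies (kcal/mol).
vdw_rl = [-40.0, -42.0, -41.0]  # ligand-surroundings vdW energy, complex simulation (RL)
ele_rl = [-30.0, -28.0, -32.0]  # ligand-surroundings electrostatic energy, complex simulation (RL)
vdw_sl = [-20.0, -21.0, -19.0]  # ligand-surroundings vdW energy, ligand-in-solution simulation (SL)
ele_sl = [-25.0, -24.0, -26.0]  # ligand-surroundings electrostatic energy, ligand-in-solution (SL)

# alpha and beta are placeholder values chosen only for this example.
dg = lie_delta_g(vdw_rl, ele_rl, vdw_sl, ele_sl, alpha=0.18, beta=0.5)
```

Note that this form needs two simulations (RL and SL), which is the cost the approaches discussed next try to avoid.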
The COMBINE approximation calculates the ΔG value by:
where E^{vdW}(i) and E^{ele}(i) are the protein-ligand van der Waals interaction energy and the electrostatic interaction energy between the i-th residue of the protein and the ligand, and w is the parameter. The simulation of the ligand-solution system is not necessary.
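A COMBINE-style weighted sum over residues can be sketched as below. The function names, the optional constant term, and the numbers are assumptions made for illustration; the sketch only shows that when every residue shares the same weight, the per-residue sum collapses to a two-parameter expression in the total van der Waals and electrostatic energies.

```python
def combine_delta_g(e_vdw, e_ele, w_vdw, w_ele, const=0.0):
    """Per-residue weighted sum of averaged vdW and electrostatic interaction energies."""
    vdw_part = sum(w * e for w, e in zip(w_vdw, e_vdw))
    ele_part = sum(w * e for w, e in zip(w_ele, e_ele))
    return const + vdw_part + ele_part


def uniform_weight_delta_g(e_vdw, e_ele, alpha, beta):
    """Residue-independent weights: alpha * sum(E_vdW) + beta * sum(E_ele)."""
    return alpha * sum(e_vdw) + beta * sum(e_ele)


# Hypothetical trajectory-averaged energies per residue (kcal/mol).
e_vdw = [-1.5, -0.5, -3.0]
e_ele = [-2.0, 0.5, -4.0]
alpha, beta = 0.05, 0.0125  # placeholder weights

per_residue = combine_delta_g(e_vdw, e_ele, [alpha] * 3, [beta] * 3)
collapsed = uniform_weight_delta_g(e_vdw, e_ele, alpha, beta)
```

With only two weights to fit, a data set of modest size (such as tens of complexes) is less prone to overfitting than a model with one weight pair per residue.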
Both the LIE approximation without solvent and the COMBINE approximation with the residue-independent w parameter give the same equation:
In the present study, we introduced the entropy term in Equation (10) as follows. We call this method the direct interaction approximation without solvent (DIAV) method:
where E^{vdW}(i) and E^{ele}(i) are the vdW and electrostatic interactions between the i-th residue of the protein and the ligand, respectively. S_{vdW}(i) and S_{ele}(i) are the fluctuations of E^{vdW}(i) and E^{ele}(i) during the molecular dynamics simulation, respectively. The τ*S_{x} term represents the energy fluctuation of the system corresponding to the second-order term of Equation (7) (<(U_{b} − <U_{b}>_{b})^{2}>_{b}).
In Equation (11), the τ*S_{x} term is formally the fluctuation of the energy, but we found that the energy fluctuation itself is not suitable for evaluating ΔG. Instead of the energy, S_{x} is taken as the fluctuation of a property x that is related to the energy. The properties x in the current study are the accessible surface area (x = ASA), the dihedral angles (x = DIH), the vdW potential (x = vdW), and the electrostatic potential (x = ELE) of the protein-ligand complex structure. In the present study, we determine which property is best for estimating ΔG. There are five parameters: α, α2, β, β2, and τ.
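The fluctuation S_{x} of a property x can be sketched as a statistic over trajectory frames. The text does not state whether the fluctuation is a variance or a standard deviation, so the sketch below assumes the population variance; the ASA values are hypothetical.

```python
from statistics import pvariance


def fluctuation(series):
    """Fluctuation of a property over trajectory frames, assumed here to be the population variance."""
    return pvariance(series)


# Hypothetical accessible surface area per frame (in square angstroms).
asa_per_frame = [510.0, 498.0, 505.0, 515.0, 502.0]
s_asa = fluctuation(asa_per_frame)

# A rigid system (constant property) has zero fluctuation.
s_rigid = fluctuation([500.0, 500.0, 500.0])
```

The same function applies unchanged to a dihedral angle series or to per-frame vdW and electrostatic energies, which is what makes the choice of x easy to compare.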
To represent the van der Waals (vdW) interaction, a Lennard-Jones (LJ) 12-6-type function is used. In the docking score, the vdW interaction (lipophilic atom contact) term represents both the vdW interaction and the cavity formation energy in solvent; in water, the latter is 10 times greater than the vdW interaction. This function gives very large values when atomic conflicts occur. To reduce these conflicts, an LJ 9-6-type function has been used in a protein-ligand docking study [
where R_{e} is the equilibrium distance. The R_{e} and the well depth values are set to the same values obtained from AMBER parm99 [
The data-sampling MD simulation is performed with the conventional AMBER force field (LJ 12-6 potential), and the analysis is performed using Equations (12)–(15).
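The effect of softening the repulsive wall can be sketched with a generalized n-m Lennard-Jones form. The exact expressions of Equations (12)–(15) are not reproduced in this text, so the parameterization below (minimum of −ε at r = R_{e}) is a common convention assumed for illustration; the R_{e} and well-depth values are hypothetical.

```python
def lj_nm(r, r_e, eps, n, m):
    """Generalized n-m Lennard-Jones energy with its minimum of -eps at r = r_e (requires n > m)."""
    x = r_e / r
    return eps / (n - m) * (m * x ** n - n * x ** m)


R_E = 3.5   # hypothetical equilibrium distance (angstroms)
EPS = 0.1   # hypothetical well depth (kcal/mol)
FORMS = [(12, 6), (9, 6), (8, 4), (6, 3)]

# Energy at the equilibrium distance and at a compressed contact (0.8 * R_E) for each form.
at_minimum = {nm: lj_nm(R_E, R_E, EPS, *nm) for nm in FORMS}
at_clash = {nm: lj_nm(0.8 * R_E, R_E, EPS, *nm) for nm in FORMS}
```

All four forms share the same well depth and equilibrium distance, but the lower-exponent forms penalize close contacts far less, which is why they tolerate small atomic overlaps in docked or sampled structures.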
In the ligand-binding pocket, the effective dielectric constant (ε_{eff}) should be different at each point, since the ε_{eff} values of proteins are 2–4 and the ε_{eff} of water is 78.5. The E^{ele}(i) should be scaled by the ε_{eff}. We introduced the modification of the electrostatic interaction as follows (we call this method the direct interaction approximation with solvent (DIAS) method):
where E_{mod}^{ele}(i) is the E^{ele}(i) value scaled by the ε_{eff}. The ε_{eff} value could be calculated from the ratio between the electrostatic force calculated in the explicit water model and that in vacuum, as follows:
where E_{j}^{ele}(i) is the electrostatic interaction between the i-th residue and the j-th atom of the ligand in vacuum. The following scale factor might be a candidate:
or:
where F_{i}^{real} and F_{i}^{vac} are the electrostatic forces acting on the i-th atom of the protein with and without the solvent contribution, respectively. The F^{real} and F^{vac} were calculated by the molecular dynamics simulation in the explicit water model and in vacuum, respectively.
The scale factor ε’_{eff}^{i} given by Equation (18) or (19) could be unrealistically large when the denominators of Equations (18) and (19) are nearly zero. Thus, we introduce a parameter
The value of ε_{eff}^{i} in Equation (20) satisfies 1 < ε_{eff}^{i}, while the actual ε_{eff} value could be less than 1. However, because we introduced the parameter β in Equation (16), the actual ε_{eff} parameter is ε_{eff}^{i}/β. In the following analysis, the factor ε_{eff}^{i} in Equation (20) is used as the actual scale factor.
We applied the DIAV method (Equation (11)) to the protein-ligand complex structures to examine the entropy property term S_{x} and performed the leave-one-out cross-validation test, as summarized in
Cross-validation results obtained by Equation (10) and the DIAV method (Equation (11)).
Statistics  ΔG_{simple} (Equation 10)  ELE ^{a}  vdW ^{a}  ASA ^{a}  DIH ^{a} 

Average error (kcal/mol)  2.22  2.30  1.85  2.06  1.94 
R  0.72  0.70  0.72  0.71  0.67 
α  0.22  0.22  0.18  0.17  0.15 
β  0.017001  0.016411  0.005958  0.012600  0.010430 
τ*100    −0.078460  −28.506610  −0.026605  −0.000696 
The vdW potential is the LJ 12-6 type. a: property (x) of Equation (11). Here α2 = β2 = 0. The energies are presented in kcal/mol, and R represents the correlation coefficient.
In the leave-one-out cross-validation test, one data point is selected as the test data to be predicted, and the other data points are used as the training data to generate the prediction model equation. The test data point is selected one after another from the given data set until every data point has served as the test data. The vdW energy term was set to an LJ 12-6 function, and the dielectric constant was set to 1. The values of the parameters α2 and β2 were set to zero. The ASA parameters (atomic solvation parameter and radius of each atom) were obtained from a previous study [
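The leave-one-out procedure described above can be sketched for a two-parameter linear model. This is an assumption-laden illustration: the model y ≈ a·x1 + b·x2 without an intercept, the least-squares fit, the function names, and the synthetic data are all hypothetical and are not taken from the study.

```python
def fit_two_params(x1, x2, y):
    """Least-squares fit of y ~ a*x1 + b*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(p * p for p in x1)
    s12 = sum(p * q for p, q in zip(x1, x2))
    s22 = sum(q * q for q in x2)
    t1 = sum(p * v for p, v in zip(x1, y))
    t2 = sum(q * v for q, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("descriptors are collinear; the fit is not unique")
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


def leave_one_out(x1, x2, y):
    """Predict each sample from a model trained on all the other samples."""
    predictions = []
    for k in range(len(y)):
        keep = [i for i in range(len(y)) if i != k]
        a, b = fit_two_params([x1[i] for i in keep],
                              [x2[i] for i in keep],
                              [y[i] for i in keep])
        predictions.append(a * x1[k] + b * x2[k])
    return predictions


# Synthetic descriptors (summed vdW and electrostatic energies, kcal/mol) and exact linear targets.
x1 = [-40.0, -55.0, -30.0, -62.0, -48.0]
x2 = [-100.0, -20.0, -150.0, -60.0, -90.0]
y = [0.05 * p + 0.0125 * q for p, q in zip(x1, x2)]

predicted = leave_one_out(x1, x2, y)
```

Because each prediction comes from a model that never saw the predicted sample, the resulting average error is a fairer estimate of predictive accuracy than the error of a fit to all data.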
We examined the DIAV method with the vdW term using Equations (12)–(15). The results and the optimized parameters are summarized in
Cross-validation results obtained by the DIAV method (Equation (11)) to examine the van der Waals potential type.
Statistics  LJ 9-6  LJ 8-4  LJ 6-3 

Average error (kcal/mol)  2.26  1.75  1.89 
R  0.69  0.76  0.71 
α  0.1727  0.0428  0.0066 
β  0.0139  0.0072  0.0078 
τ*10000  −2.9273  −2.5677  −2.8531 
The energies are presented in kcal/mol, and R represents the correlation coefficient.
The vdW parameters represent both the protein-ligand vdW interaction and the hydrophobic interaction. In the present study, however, the number of data points was too limited to optimize the vdW parameters, so we used the original vdW parameters.
We examined the parameters α2 and β2 of the DIAV method (Equation (11)). The vdW potential was set to the LJ 8-4-type function, and the dielectric constant was set to 1. The entropy property x was set to the ASA. We also examined the case of x = DIH; the result was quite similar to that obtained in the case of x = ASA. The leave-one-out cross-validation results and the optimized parameters are summarized in
We then applied the idea of the effective dielectric constant: the DIAS method (Equation (16)) was applied to the estimation of ΔG using the ε_{eff} defined by Equations (18) and (19). The leave-one-out cross-validation results and the optimized parameters are summarized in
Cross-validation results obtained by the DIAV method (Equation (11)) to examine the α2 and β2 parameters.
Statistics  ASA  DIH 

Average error (kcal/mol)  1.63  1.59 
R  0.80  0.76 
α  0.04146  0.03832 
β  0.00643  0.00491 
τ*10000  −2.74887  −0.06949 
α2  0.0093  0.0093 
β2  −0.0013  −0.0015 
The vdW potential is the LJ 8-4 type. The energies are presented in kcal/mol, and R represents the correlation coefficient.
Cross-validation results obtained by Equation (10), the DIAV (Equation (11)), and the DIAS (Equation (16)) methods.
PDB ID  ΔG_{exptl} (kcal/mol)  ΔG_{simple} (Equation (10)) (kcal/mol)  ΔG_{DIAV} (Equation (11)) (kcal/mol)  ΔG_{DIAS} (Equation (16)) (kcal/mol) 

1abe  −9.57  −5.46  −6.27  −6.68 
1abf  −7.39  −6.30  −6.67  −6.90 
1apu  −10.50  −13.50  −11.98  −11.76 
1dbb  −12.27  −8.75  −11.79  −11.69 
1dbj  −10.47  −8.35  −12.27  −12.10 
1dog  −5.48  −5.40  −6.09  −6.12 
1dwb  −3.98  −3.69  −4.83  −5.05 
1epo  −10.85  −17.25  −14.82  −15.56 
1etr  −10.09  −9.91  −10.35  −10.08 
1ets  −11.62  −11.05  −11.82  −11.52 
1ett  −8.44  −9.46  −9.99  −9.75 
1hpv  −12.57  −14.02  −12.88  −12.78 
1hsl  −9.96  −6.53  −6.74  −7.18 
1htf  −11.04  −12.45  −11.12  −11.00 
1hvr  −12.97  −16.98  −14.67  −14.95 
1nsd  −7.23  −7.44  −8.33  −8.13 
1pgp  −7.77  −11.01  −11.09  −10.24 
1phg  −11.81  −6.88  −8.03  −8.22 
1ppc  −8.80  −9.83  −8.66  −8.85 
1pph  −8.49  −8.50  −7.87  −8.00 
1rbp  −9.17  −9.29  −8.58  −8.91 
1tng  −4.00  −4.15  −4.64  −4.90 
1tnh  −4.59  −3.54  −4.24  −4.61 
1ulb  −7.23  −3.82  −5.71  −5.74 
2cgr  −9.92  −7.07  −10.94  −10.88 
2gbp  −10.36  −8.95  −9.27  −9.77 
2ifb  −7.41  −9.57  −8.53  −8.38 
2phh  −6.38  −4.09  −6.83  −6.79 
2r04  −8.48  −10.39  −10.31  −10.26 
2tsc  −11.62  −11.05  −8.68  −8.28 
2ypi  −6.58  −5.40  −5.72  −6.45 
3ptb  −6.46  −4.93  −5.02  −4.55 
4dfr  −13.23  −11.52  −13.93  −13.52 
5abp  −9.05  −6.64  −7.19  −7.59 
Average error (kcal/mol)    1.88  1.30  1.22 
R    0.73  0.81  0.81 
α    0.0503  0.0378  0.0307 
β    0.0125  0.0082  0.0118 
τ*10000      −2.4178  −2.4312 
α2      0.0093  0.01 
β2      −0.0011  −0.00312 
x        0.6 
The vdW potential is the LJ 8-4 type. The property x of S_{x} is the ASA. The energies are presented in kcal/mol, and R represents the correlation coefficient.
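The two summary statistics reported in these tables can be sketched as follows, assuming that "average error" means the mean absolute error and that R is the Pearson correlation coefficient. Only the first five complexes of the table above (experimental and DIAS columns) are used, so the numbers do not reproduce the full 34-complex statistics.

```python
import math


def mean_abs_error(experimental, calculated):
    """Mean absolute difference between experimental and calculated values."""
    return sum(abs(e - c) for e, c in zip(experimental, calculated)) / len(experimental)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# First five rows of the table (1abe, 1abf, 1apu, 1dbb, 1dbj), in kcal/mol.
dg_exptl = [-9.57, -7.39, -10.50, -12.27, -10.47]
dg_dias = [-6.68, -6.90, -11.76, -11.69, -12.10]

mae = mean_abs_error(dg_exptl, dg_dias)
r = pearson_r(dg_exptl, dg_dias)
```

Reporting both values is useful because a model can rank ligands well (high R) while still being offset in absolute terms (large average error), or the reverse.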
As with the results described above, the best property x among the four properties (ASA, DIH, vdW, and ELE) was the ASA. The DIAS results obtained by Equation (19) were better than those obtained by Equation (18). The DIAS results in
Cross-validation results obtained by the DIAV method: the experimental data (ΔG_{exptl}) versus the calculated values (ΔG_{DIAV}).
We examined the time dependency of the ΔG obtained by the DIAS method. After a 1 nsec MD simulation for equilibration, sampling runs of 0.5 nsec, 1 nsec, 1.5 nsec, and 2 nsec were performed. The ΔG values depended only weakly on the sampling-time length: the average errors over the 34 target protein-ligand complexes were 1.43 kcal/mol, 1.22 kcal/mol, 1.23 kcal/mol, and 1.23 kcal/mol for the 0.5 nsec, 1 nsec, 1.5 nsec, and 2 nsec sampling times, respectively. The initial structures of these simulations were the experimentally obtained protein-compound complex structures, so the protein-compound interaction also depended only weakly on the sampling-time length.
The results showed that the current method worked well for various target proteins. This method could be extended as:
where α, α_{2}, β, β_{2}, and τ_{x} are the parameters. This extension is one of the generalized forms of Equation (16). We examined the combination of two properties out of five. The average error increased when two entropy terms were combined. Thus, Equation (16) is simpler and more accurate than Equation (21).
We applied the generalized Born surface area (GBSA) method [
To evaluate the present method, we applied the DIAS method to protein-ligand docking pose prediction. Usually, only 20%–30% of the docking poses generated by a protein-ligand docking program are correct (RMSD < 2 Å) in the cross-docking test, whereas 50%–70% are correct (RMSD < 2 Å) in the self-docking test [
where
In this test, we prepared three types of protein structures: (model 1) the intact protein structure prepared in
Docking accuracy.
Statistics  Top ΔG structure by the DIAS method  Top scoring structure by Sievgene  Best among the top 5 structures 

Initial structure (intact PDB coordinates: model 1) 
RMSD < 1 Å  29.4%  35.3%  47.1% 
RMSD < 2 Å  41.2%  76.5%  94.1% 
RMSD < 3 Å  47.1%  94.1%  94.1% 

Energy-minimized structure (model 2) 
RMSD < 1 Å  40.0%  6.7%  66.7% 
RMSD < 2 Å  73.3%  46.7%  93.3% 
RMSD < 3 Å  80.0%  73.3%  93.3% 

Final structure of the MD simulation (model 3) 
RMSD < 1 Å  20.0%  0.0%  0.0% 
RMSD < 2 Å  33.3%  33.3%  33.3% 
RMSD < 3 Å  53.3%  46.7%  66.7% 
The vdW potential is the LJ 8-4 type. The property x of S_{x} is the ASA.
When the energy-minimized structures (model 2) were used, the results obtained by the DIAS method were much better than the Sievgene results. The DIAS method selected the correct poses at a rate of 73% (RMSD < 2 Å). Because the DIAS method selects one pose among the five poses generated by Sievgene, its success rate is bounded by the fraction of targets for which at least one of the five poses satisfies RMSD < 2 Å, which was 93%. Thus, the DIAS method selected 78% (73% out of 93%) of the attainable correct poses. This shows that the DIAS method is useful for practical pose prediction in drug design.
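The pose-accuracy bookkeeping used above can be sketched as follows. The sketch assumes that the RMSD is computed over matched ligand atoms in the receptor frame without re-superposition; the coordinates and the example RMSD list are hypothetical, and only the two percentages (73.3% and 93.3%) come from the docking accuracy table.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between matched atoms of a docked pose and a reference pose."""
    if len(pose) != len(reference):
        raise ValueError("pose and reference must contain the same atoms")
    squared = sum((p - q) ** 2
                  for atom_p, atom_q in zip(pose, reference)
                  for p, q in zip(atom_p, atom_q))
    return math.sqrt(squared / len(pose))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of targets whose selected pose lies within the RMSD cutoff (angstroms)."""
    return sum(value < cutoff for value in rmsds) / len(rmsds)


# Hypothetical three-atom ligand and a copy translated by 1 angstrom along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]
shift_rmsd = rmsd(shifted, reference)

# Share of attainable correct poses: selected-pose rate divided by best-of-five rate (model 2).
attainable_share = 73.3 / 93.3
```

Dividing by the best-of-five rate separates the quality of the rescoring step from the quality of the pose generator, since no rescoring method can pick a correct pose that was never generated.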
When the initial structures (model 1) were used, the Sievgene results were better than the results obtained by the DIAS method. This is a trivial self-docking test, and the MD simulations for energy calculation should slightly change the ligand coordinates from the crystal structures by thermal fluctuation. When the final structures of the MD simulation (model 3) were used, only 33.3% of the docking poses were correct (RMSD < 2 Å) by both Sievgene and the DIAS method. Still, the results obtained by the DIAS method were better than those obtained by Sievgene at the 1 Å and 3 Å thresholds. The shapes of the ligand-binding pockets were presumably changed during the MD simulations from structures suitable for docking. This model structure is not suitable for docking studies.
To determine the coefficients for the ΔG score, we performed a protein-ligand docking simulation based on the known complex structures registered in the Protein Data Bank. Here, 34 complexes accompanied by the experimental binding free-energy values were selected from the database that was used to determine the ΔG scores of the PRO_LEADS [
List of the proteins used.
PDB ID  Protein 

1abe  L-ARABINOSE-BINDING PROTEIN 
1abf  L-ARABINOSE-BINDING PROTEIN 
1apu  ACID PROTEINASE (PENICILLOPEPSIN) 
1dbb  FAB' FRAGMENT 
1dbj  FAB' FRAGMENT 
1dog  GLUCOAMYLASE 
1dwb  THROMBIN 
1epo  ENDOTHIA ASPARTIC PROTEINASE 
1etr  THROMBIN 
1ets  THROMBIN 
1ett  THROMBIN 
1hpv  HIV-1 PROTEASE 
1hsl  HISTIDINE-BINDING PROTEIN 
1htf  HIV-1 PROTEASE 
1hvr  HIV-1 PROTEASE 
1nsd  NEURAMINIDASE 
1pgp  6-PHOSPHOGLUCONATE DEHYDROGENASE 
1phg  CYTOCHROME P450 
1ppc  TRYPSIN 
1pph  TRYPSIN 
1rbp  RETINOL-BINDING PROTEIN 
1tng  TRYPSIN 
1tnh  TRYPSIN 
1ulb  PURINE NUCLEOSIDE PHOSPHORYLASE 
2cgr  IGG2B (KAPPA) FAB FRAGMENT 
2gbp  D-GALACTOSE/D-GLUCOSE-BINDING PROTEIN 
2ifb  INTESTINAL FATTY ACID BINDING 
2phh  P-HYDROXYBENZOATE HYDROXYLASE 
2r04  RHINOVIRUS 14 (HRV14) 
2tsc  THYMIDYLATE SYNTHASE 
2ypi  TRIOSE PHOSPHATE ISOMERASE 
3ptb  TRYPSIN 
4dfr  DIHYDROFOLATE REDUCTASE 
5abp  L-ARABINOSE-BINDING PROTEIN 
The structural ensembles generated from the PDB structure given by MD in explicit water were prepared as follows. All target proteins were prepared with ligands (protein-ligand complex structure). The force fields and the charges of the protein atoms originated from AMBER parm99 [
We have developed the direct interaction approximation (DIA) method and examined both the direct interaction approximation without solvent (DIAV) and with solvent (DIAS) methods. The results showed that the inclusion of the fluctuation of the ASA/dihedral angle terms drastically improved the accuracy of ΔG. The DIAS method (Equation (16)) was the final form for the simple and accurate estimation of ΔG. The effective dielectric constant should be calculated by Equations (19) and (20), and the vdW potential should be the LJ 8-4-type function. This equation included six parameters: α, β, α2, β2, τ, and x. The six optimized parameters could be applied to all of the target proteins.
In the explicit water model, the DIA (DIAV and DIAS) methods required only the MD simulation of the protein-ligand complex. The DIA method with the LJ 8-4-type function improved the accuracy of the calculated ΔG value drastically: the correlation coefficient between the experimental and the calculated ΔG values was improved to 0.8 as obtained by the DIAV method, from 0.7 as obtained by the simplified COMBINE method without the entropy term (Equation (10)), and the average error of ΔG was improved to 1.2 kcal/mol as obtained by the DIAS method, from 1.9 kcal/mol as obtained by Equation (10).
This work was supported by grants from the New Energy and Industrial Technology Development Organization of Japan (NEDO) and by the Ministry of Economy, Trade, and Industry (METI) of Japan.