Integration of Ligand-Based Drug Screening with Structure-Based Drug Screening by Combining Maximum Volume Overlapping Score with Ligand Doscking

Ligand-based and structure-based drug screening methods were integrated for in silico drug development by combining the maximum-volume overlap (MVO) method with a protein-compound docking program. The MVO method is used to select reliable docking poses by calculating volume overlaps between the docking pose in question and the known ligand docking pose, if at least a single protein-ligand complex structure is known. In the present study, the compounds in a database were docked onto a target protein that had a known protein-ligand complex structure. The new score is the summation of the docking score and the MVO score, which is the measure of the volume overlap between the docking poses of the compound in question and the known ligand. The compounds were sorted according to the new score. The in silico screening results were improved by comparing the MVO score to the original docking score only. The present method was also applied to some target proteins with known ligands, and the results demonstrated that it worked well.


Introduction
There have been two approaches to in silico drug screening: structure-based drug screening (SBDS) and ligand-based drug screening (LBDS). Their key technologies are, respectively, a proteincompound docking program based on the structure of the receptor protein and a molecular similarity calculation program for various chemical compounds.
There have been many reports on LBDS [1][2][3][4][5], which is essentially a similarity search based on known active compounds. The outline of similarity searches is summarized in a review article [5]. To perform a similarity search, one approach is to use a molecular descriptor (a set of many substructures) such as the MACCS key, developed by Molecular Design Limited (MDL; San Leandro, CA, USA) together with a Daylight descriptor (Daylight Chemical Information Systems, Aliso Viejo, CA, USA). Another approach is to overlap the known active compound with compounds selected from a compound library such as ROCS of OpenEye.
Previously, we developed the maximum-volume overlap (MVO) method to improve the docking pose of a compound by using a known protein-ligand complex structure [6]. The MVO method was applied to a molecular dynamics (MD)-MVO method for ligand-based in silico drug screening [7]. MD-MVO is a molecular dynamics simulation method for molecular overlapping (alignment). The molecular system consists of a query compound and one or more other compounds selected from a compound library. The intermolecular interaction between compounds is proportional to the molecular overlap instead of the van der Waals and Coulombic interactions between atoms of different molecules. This method was able to achieve both conformer generation of molecules and molecular overlapping (alignment) at the same time. After energy minimization and short-time MD simulation, the molecules in the system were overlapped with each other, and the similarity between compounds was measured according to the volume of the overlap.
Typically, LBDS and SBDS are used independently. LBDS uses only the known active compounds and not the target protein structure. Conversely, SBDS uses the target protein structure but not the known active compounds [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22], although there are exceptions. The machine-learning multiple target screening (MS-MTS) method uses both the target protein structure and the active compounds [23]. The MS-MTS method requires at least several known active compounds to obtain good screening results, since the MS-MTS method is a sort of machine learning approach. In the MS-MTS method, the docking score is modified by the linear combination of the docking scores with many proteins, as follows: where S a i new is the docking score of the i-th compound with the a-th protein and m a b is a set of parameters. The parameter m a b is optimized to the maximum area under the database enrichment curve (AUC) of known active compounds by a Monte Carlo simulation.
In the present study, we integrated the concepts of LBDS with those of SBDS. Namely, the docking simulation method and the MVO scoring method were combined to introduce the protein force field into the MD-MVO similarity search method. This method requires only a single protein-ligand complex structure instead of multiple ligands as required by the MS-MTS method. Figure 1 shows a schematic representation of the present method. To apply this method, one or more protein-ligand complex structures are necessary. The protein-ligand complex structures were obtained by X-ray crystallography, NMR experiments, or computer simulations. After the compounds in the database were docked onto the target protein, the protein-compound interaction energy and the volume overlap were assessed between the docked compound and the reference ligand that was given a priori. The docking score, S combined-MVO , is the summation of the conventional docking score (H dock ), which is the protein-compound interaction energy, and the MVO score (S MVO ), which is the volume overlap between the docked compound and the reference ligand:

Theoretical Background
where  is a coupling parameter (0 <  < 1). The MVO score (S MVO ) is a measure of the volume overlap among the docked compound and the known ligands of the protein-ligand complex structures. We proposed S MVO in previous papers [6,7]. Let molecules A and B be the template molecule (the known active compound) and the query molecule(s) (the compound in the database), respectively. Let H  and H  (=H  ) be the Hamiltonian of molecule  and the interaction energy between molecules  and , respectively. Here,  and  are A and B. The total Hamiltonian (H) of this system is: and: where , N A , N B , c, w, and q thr are the conversion factor, the number of atoms in molecule A, the number of atoms in molecule B, a coefficient, a switching function, and a threshold value, respectively. In the present study, parameters , c, and q thr were set as 1 kcal/mol, 1.0 Å -2 , and 0.2 in the atomic unit, respectively, as in our previous study [7]. The predicted docking poses are generally 2-3 Å different from the true docking poses, and the errors of the coordinates are larger than the N-H bond lengths. In the amine group, the N and H atoms have negative and positive charges, respectively. Thus the atomic charges of both N and H atoms of amine are set to +0.5 to reduce the error of H MVO . Also, the atomic charges of all O atoms are set to -0.5 to increase the chance they will overlap. The overlap of O atoms is important in order to maintain the pharmacophore, but the atomic charge of the O atom depends on its neighboring atoms. The intramolecular interactions of molecules A and B are calculated by the usual Hamiltonians. These Hamiltonians consist of bond, angle, torsion angle, improper torsion angle, 1-4 and 1-5 van der Waals interactions, and Coulombic interactions. The interaction between A and B is represented by the MVO potential by H AB . This implies that there are no van der Waals or Coulombic interactions between A and B.
The score S AB is the measure used to evaluate the overlap between molecules A and B: While many kinds of definitions are available, our screening tests showed that the S AB by Equation. 6 is the most useful. After the rough docking using the Sievgene docking program [13], the docking poses are optimized based on S combined-MVO score with Equations. (2) and (6). Test molecules are sorted according to the S combined-MVO score.
ROCS (OpenEye) calculates the volume overlap between two molecules, while the MVO calculates the volume overlap between two molecules with considering the atomic charges of these molecules. Thus, the MVO method in this study could be replaced by ROCS and the combinations of the other docking programs (Glide, GOLD, DOCK, and etc) and ROCS should be possible.

Examination of Used Parameters and Evaluation of the Combined MVO with Docking Method
The present method was applied to the drug screening of six target proteins: MIF, COX-2, Thermolysin, HIV protease-1, GST, and carboxypeptidase A. Eighteen screening results were obtained. The area under database enrichment curve (AUC) values of the 18 screening results and the hit ratio (% of the active compounds) at the first 1% compounds selected (enrichment factor) are summarized in Tables 1 and 2. The AUC values are expressed in percentages, and the AUC values of 50% and 100% correspond to the random screening and the ideal screening. The screening results depended on the decoy set used (compound library), so that two decoy sets were used in this examination. The deviations in the AUC values of the 18 screening results are summarized as  values in Tables 1 and 2.  The  [of equatoin (2)] dependence of the AUC value was examined with a change in  from 0 to 1. The present combination of MVO with the docking method using  = 0 is equivalent to the original Sievgene docking program, and that with  = 1 is equivalent to the score of the original MD-MVO method. The optimal  value was found to be 0.5 from Tables 1 and 2. This trend did not depend on the difference in decoy sets. Considering the AUC values in Tables 1 and 2, the optimal  value should exist between 0.5 and 0.65. The trends of the 1% hit ratio and the AUC values are convex upward between  =0 and  =1, we selected the  =0.5 that mean the simple average of the docking score and the MVO score for further consideration. The Sievgene docking scores were distributed from -2 to -4, and the MVO scores were distributed from 0 to -1. The hit ratio was almost proportional to the AUC value. Namely, the correlation coefficients between the hit ratio and the AUC were 0.53 and 0.75 for the LigandBOX decoy set and the Coelacanth decoy set, respectively. Thus, we used the AUC value in the following discussions.
The van der Waals radius dependence of the AUC value was also examined. The van der Waals radii of the protein atoms were reduced by the damping factor (k), which was changed from 1 to 0.8. The protein structure was fixed in the compound docking process, whereas in the docking process it should change as induced fitting. To mimic the induced fitting and reduce the atomic conflict between the compound and the protein, the van der Waals radius of protein was reduced by the damping factor.
Although the AUC value depends strongly on the  value, it does not strongly depend on the damping factor, the optimum value of which is 0.85 to 0.95.
The screening results obtained by the combined MVO with docking method were better than the results obtained by the original Sievgene scoring method and those obtained by the MD-MVO method. Thus, the combination of the Sievgene score and the MVO score worked well for both compound sets. Also, the deviation in the hit ratios (AUC values) obtained by the combined MVO with docking method was smaller than those obtained by the original Sievgene score and the MVO score. This suggests that the hit ratio by the combined MVO with docking method is stable for many target proteins. The in-silico docking screening results strongly depend on both the protein and compound data sets. Perfect shape and electrostatic complementarities between protein and compound show strong affinity. In this case, slight structure changes of the protein or the compound should cause the atom confliction and the repulsive interaction or very weak affinity should be observed. In many docking studies including the current study, the protein structure is rigid and the ligand flexibility is partially considered (only rotatable bonds are rotated and the bond length/angle are fixed). Recently, ensemble docking approach is applied to reduce this problem [25][26][27]. In the current method, the ligand-based score (MVO score) should contribute to reduce the structure-dependence of the screening results.
The large value of  in Table 2 comparing to that in Table 1 was due to the low AUC values for COX-2. The AUC values of COX-2 (1cox, 1pxx, 3pgh, 4cox and 6cox) were almost half of the average value of AUC. The decoy set of Table 2 was provided by only one company (Coelacanth Co. Ltd.) and the decoy set of Table 1 was provided by 45 companies. The compound provided by Coelacanth was a random library but it could be biased to bind COX-2. If the Coelacanth decoy set was less random than the LigandBOX decoy set, the  values in Table 2 should be bigger than those in Table1. We also applied the maximum common substructure search [28] with the ranking by the Tanimoto index as the similarity search method. The results were shown in Tables 1 and 2. The screening results obtained by the combined MVO with docking method were slightly better than the results obtained by the obtained by the MCS method, when the LigandBOX decoy set and the both results were similar to each other when the Coelacanth decoy set was used.
The present method was applied to the drug screening of five target proteins with using the DUD: COX-2, ER, HIV, THR, and TRY [29]. Ten screening results were obtained. The AUC values of the 10 screening results are summarized in Table 3. The screening results obtained by the combined MVO with docking method were slightly worse than the results obtained by the original Sievgene scoring method and those obtained by the MCS method. But the deviation of the AUC value obtained by the combined MVO with docking method were smaller than the results obtained by the original Sievgene scoring method and those obtained by the MCS method. The method with small AUC deviation is useful, since the hit ratio does not vary widely for each target. The screening results obtained by the combined MVO with docking method were different from the results obtained by the MCS method. Namely, the correlation coefficient between the ranking orders by the combined MVO with docking and the MSC methods was only 0.38. The users can use both the methods and try the consensus results of them.
The MS-MTS method does not require protein-ligand complex structures. On the contrary, the combined MVO with docking method requires at least a single protein-ligand complex structure, but the number of active compounds could be only one. Thus, the combined MVO with docking method should be suitable for fragment screening by X-ray crystallography in which one protein-ligand complex structure is provided. And the MS-MTS method should be suitable for a conventional HTS assay.
The Sievgene docking score consists of an accessible surface term (mainly hydrophobic interaction), an electrostatic term, a hydrogen bonding term, and a van der Waals term. The contributions of the accessible surface term, electrostatic term, hydrogen bonding term, and van der Waals term were 89.9%, 0.88%, 5.41%, and 3.78%, respectively. Usually, the hydrophobic interaction gives almost the same contribution as the hydrophilic interaction, and obviously the Sievgene docking score underestimates the hydrophilic interaction. The suitable charge distribution of the potentially active compound could not be predicted by the Sievgene docking score. On the other hand, the MVO method considers the charge distribution of the compound based on the known active compound, but the MVO score cannot evaluate the protein-compound interaction. Thus, the combination of the Sievgene docking score and the MVO score should compensate for the weak point of each scoring method.
We applied our method to the estrogen receptor alpha (ER-) that is a transcription factor and has its agonists, antagonists and modulators. A complex of ER- with estrogen (PDB ID: 1ere) was selected as the template and the ligands of the other complex structures were docked into ER- pocket ( Table 4). Estrogen is an agonist of ER-, and most of the ligands are antagonists or modulators. Table 4 shows the RMSD values between the experimental and docked coordinates of ligand heavy atoms were calculated. The main-chain RMSDs of proteins were also calculated. The conventional docking (=0), the MVO method (=1) and the combined-MVO with docking method (=0.5) were examined and the current method slightly improved the docking accuracy. As shown in Table 4, the ligand structures were not steroid scaffold that is a template ligand structure. Since the docking poses of these compounds were improved by the combined-MVO with docking method, the current method could be applied to various structurally different molecules from the template molecule to improve the docking poses. We examined the correlation between the experimental protein-ligand binding free energy and the calculated score. Thrombin (PDB IDs: 1dwb, 1etr, 1ets and 1ett) and trypsin (PDB IDs: 1ppc, 1pph, 1tng, 1tnh and 3ptb) were selected as targets. For thrombin, 1dwb was selected as a template and the other protein-ligand complex structures were superimposed onto the coordinates of 1dwb. For, trypsin, 1ppc was selected as a template and the other protein-ligand complex structures were superimposed onto the coordinates of 1ppc. The effect of the induced fitting was small. Namely, the main-chain RMSDs of thrombin and trypsin were less than 0.7 Å and 0.3 Å, respectively. The conventional docking (=0), the MVO method (=1) and the combined-MVO with docking method (=0.5) were examined and the current method did not improved the correlation between the binding free energy and the score. For thrombin, the correlation coefficients were 0.94, 0.56 and 0.29 for Sievgene docking score, the S combined-MVO score and the MVO score, respectively. For trypsin, the correlation coefficients were 0.92, 0.90 and 0.40 for Sievgene docking score, the S combined-MVO score and the MVO score, respectively. In practical use, we should try to find ligands with stronger affinity than the template ligand that is an initial weak hit compound. Thus, the similarity between ligands should not correlate with the binding affinity.
The combined MVO with docking method could be applied to drug design. When a desirable compound is designed, a set of compounds that are similar to the designed compound could be found by this method, considering the protein-compound interaction. When a designed compound cannot be synthesized, we can get similar compounds as an alternative from the commercially available compound library.

Methods: Combining MVO with the Docking Method
Protein-compound docking simulation was performed by the program, Sievgene, which is a proteinligand flexible docking program for in silico drug screening [13]. This program generates many conformers (100 conformers by default) for each compound and keeps the target protein structure rigid, but with soft interaction forces adapting its slight structural change to some extent. There have been more than 100 protein-compound docking programs reported and the algorithms were summarized in a review [30]. There are mainly three types of scoring functions and these functions are based on inter-atomic distances among the protein and the compound atoms. Such scoring functions are strongly depending on the structure change of the protein, thus ensemble docking or consensus scoring has been used to overcome the problem. Sievgene scoring function was designed to consider the structural change of the target protein. In the inner region of the target protein, the protein is approximated as an elastic body, while the atomic pair-wise scoring function is applied in the outer region of the target protein. This docking program was developed with a performance yielding about 50% of the reconstructed complexes at a distance of less than 2 Å RMSD for the 132 complexed receptors with the compounds in PDB. The results predicted by our program were almost the same as those predicted by other docking programs [13].
where N rot , E ASA , E vdW , E ele , E hyd , and E intra-vdW represent the number of rotatable bonds of the ligand molecule, the hydrophobic energy due to the accessible surface area, the vdW energy, the proteinligand Coulombic potential, the hydrogen bond energy, and the intramolecular vdW energy of the ligand for Sievgene [13]. Also, c rot , c AV , c ele , c hyd , and c intra-vdW are the optimized coefficient for each energy term. For each atom type, the sum of E ASA and E vdW gives a grid potential, and both energy terms are always simultaneously calculated. Thus, these two terms share the same coefficient, c AV . Sievgene utilizes the grid potential to calculate each energy term except for the intramolecular interactions. In this study, a mesh size of 60  60  60 was adopted. The docking procedure was as follows (see Figure 1): Step 1 The pocket is indicated by the known ligand coordinates, and the potential energy grids were generated around the ligand-binding pocket.
Step 2 Electrostatic potential field on the accessible surface of the receptor is calculated to find a total of 30 potential minima and maxima. Also, hydrophobic potential is calculated by using a methane probe to find those 30 potential minima. Triangles are generated to connect these points; the data regarding these triangles are recorded in a hash table.
Step 3 The program reads a compound of the database and then generates its conformers. The dihedral angles are randomly incremented every 120 degrees. Step 4 The global search program chooses any three atoms of the compound and superimposes the compound onto the receptor surface according to the geometric hash method. The S combined-MVO score is then evaluated.
Step 5 Starting from the initial coordinate generated in step 4, the compound coordinates reaches the optimal complex structure using the steepest descent method to minimize the S combined-MVO score with the grid potential of the receptor force field and the known-ligand coordinates. The AMBER-type molecular force field is used.
In the present study, the docking simulations were performed by a modified version of the Sievgene/myPresto program.

Preparation of Materials
To evaluate our method, we performed a protein-compound docking simulation based on the soluble protein structures registered in the Protein Data Bank (PDB). Our target proteins and the inhibitors were almost equal to those used in our previous work [31]. The target proteins were the macrophage migration inhibitory factor (PDB code: 1gcz), COX-2 (PDB code: 1cx2, 1pxx, 3pgh, 4cox and 6cox), HIV protease-1 (PDB code: 1aid, 1hpx and 1ivp), thermolysin (PDB code: 1tmn, 2tmn, 1tlp), and GST (PDB code: 18gs, 2gss, and 3pgt), and carboxypeptidase A (PDB code: 1cbx, 1cps, 3cpa). The protein-ligand complex structures were selected from the PDB, and the hydrogen atoms were added for the present docking study. The atomic charges of the proteins were the same as the atomic charges of AMBER parm99 [32].
The compound set consisted of 14 inhibitors of MIF, 28 inhibitors of thermolysin, 14 inhibitors of COX-2, 19 inhibitors of HIV protease-1, 12 inhibitors of GST, and 3 inhibitors of carboxypeptidase A [31]. Two decoy sets were prepared. One is a set of 11,050 potential-negative compounds of the Coelacanth chemical compound library (Coelacanth Corporation, East Windsor, NJ, USA) which is a random library, and the other is a set of 10,000 randomly selected compounds from the LiganBOX database [24]. The LigandBOX data were provided by 45 companies, while the Coelacanth decoy set was provided by only one company. The Coelacanth decoy was bigger than the LigandBOX decoy. The average mass weight of the Coelacanth decoy set was 423Da and the average number of heavy atoms was 30.9. The mass weight of each compound of the LiganBOX decoy set was > 150 Da and < 350 Da, the average mass weight was 289 Da and the average number of heavy atoms was 20.4. The molecular size of the DUD decoy set was intermediate and the size depended on the target protein. The average mass weight was 300-450 Da and the average number of heavy atoms was 20-35.
The 3D coordinates of the Coelacanth chemical compound library were generated by the Concord program (Tripos, St. Louis, MO, USA) from the 2D Sybyl SD files provided by the Coelacanth Chemical Corporation. The 3D coordinates of the inhibitors were generated by Chem3D (Cambridge Software, Cambridge, MA, USA). The atomic charges of each ligand were determined by the Gasteiger method [33,34]. The LigandBOX library generation procedure was described in our previous paper [24].
The other compound set was the decoy set of the directory of useful decoy (DUD) for each target protein [29]. The specific DUD decoy and ligand sets were prepared for each target. The cyclooxygenase-2 (COX2), estrogen receptor (ER), HIV protease-1 (HIV), thrombin (THR), and trypsin (TRY) were selected for the validation test of the current method. Two target protein structures were selected for each of the other five proteins. Namely 4cox and 6cox for COX2, 3erd and 3ert for ER, 1hpv and 1htf for HIV, 1etr and 1ets for THR, and 1tng and 1tnh for TRY were used, respectively. The compound set consisted of inhibitors of a target protein and compounds of a decoy set. The numbers of prepared inhibitors or antagonists were 426, 39, 62, 72, and 49 for COX2, ER, HIV, THR, and TRY, respectively. The numbers of compounds of DUD decoy sets for COX2, ER, HIV, THR and TRY were 13289, 1448, 2038, 2456, and 1664, respectively.

Conclusions
We have developed a combined MVO with docking method, which is an integration of SBDS and LBDS. This method can be applied when at least a single protein-ligand complex structure is available. Compounds from a database were docked around the known-ligand coordinates by a flexible docking method. The volume overlap between the known ligand and the docked compound was calculated, and the total score was the sum of the volume overlap and the protein-compound binding energy. The combined MVO with the docking method was applied to in silico drug screenings for several target proteins. The hit ratio obtained by this combined method was higher than that achieved in a native docking-screening study, and it was better than the results obtained by the original Sievgene docking score and the MVO score.