Article

HIGA: A Running History Information Guided Genetic Algorithm for Protein–Ligand Docking

Key Laboratory of Medical Image Computing of Northeastern University, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
* Author to whom correspondence should be addressed.
Molecules 2017, 22(12), 2233; https://doi.org/10.3390/molecules22122233
Submission received: 8 November 2017 / Revised: 3 December 2017 / Accepted: 12 December 2017 / Published: 15 December 2017

Abstract

Protein-ligand docking is an essential part of computer-aided drug design; it identifies the binding patterns of proteins and ligands by computer simulation. Although the Lamarckian genetic algorithm (LGA) has performed well on protein-ligand docking problems, it cannot memorize the history information it has accessed, which makes discovering promising solutions costly. This article presents a novel LGA-based optimization algorithm (HIGA) that aims to overcome this drawback in solving protein-ligand docking problems. The method applies a running history information guided model that includes CE crossover, ED mutation, and a BSP tree. The novel algorithm finds the lowest protein-ligand docking energy more efficiently. We evaluate HIGA against GA, LGA, EDGA, CEPGA, SODOCK, and ABC, and the results indicate that HIGA outperforms the other search algorithms.

1. Introduction

Molecular drug design, as a new approach to drug research, has produced many theoretical and practical findings [1,2,3,4]. Protein-ligand docking is a typical method for structure-based drug discovery and design; its aim is to find the conformation of a ligand bound to a protein receptor target that has the lowest energy [5,6,7,8,9]. Advances in X-ray diffraction of biological macromolecules have made many more important protein and ligand structures available. These structures can serve as targets for bioactive substances that control diseases in animals and plants, and they make it easier to understand the biological mechanisms of active substances [10,11,12]. The rapid development of computer technology has effectively advanced molecular drug design and significantly reduced the cost of drug research. In the docking process, the ligand is placed at the active site of the receptor, its position and orientation are adjusted according to binding and complementarity principles, and the optimal binding mode is finally obtained.
A docking method has two basic tools: a search algorithm, which generates candidate binding modes, and a scoring function, which evaluates them. The scoring function evaluates the binding conformations of ligands and receptors predicted by computer simulation [13,14,15,16,17]. Because docking relies on an accurate binding affinity as the basis for optimization, the scoring function not only provides a fast way of evaluating binding affinity but also helps the docking procedure explore the binding space of a ligand efficiently [18,19]. The scoring function can be used directly as the fitness function of an optimization algorithm.
The search algorithm aims to identify the optimal binding mode between ligands and receptors, including the location of the small ligand relative to the protein and conformational changes in the molecules [20,21]. The best search result is the docked conformation with the lowest energy. Searching the space of binding modes in molecular docking is an NP-hard problem, so it is infeasible to traverse the entire search space. Heuristic algorithms have been applied successfully to protein-ligand docking problems [22,23,24]; examples include simulated annealing (SA) [25], the genetic algorithm (GA) [26,27,28], the Lamarckian genetic algorithm (LGA) [29], SODOCK [30,31,32], and the artificial bee colony (ABC) [33]. However, existing algorithms do not make effective use of history information, which limits the quality of the solutions that they obtain. Therefore, an efficient optimization algorithm that can find lower docking energies and RMSDs is desirable.
LGA has proven efficient, but it does not take advantage of the history information accumulated during the optimization procedure, which makes it hard to discover promising solutions. In this article, we therefore report a novel LGA-based genetic algorithm that improves protein-ligand docking performance by utilizing running history information. The proposed algorithm is abbreviated HIGA, which stands for running history information guided genetic algorithm. Running history information refers to information that is retained while the algorithm runs and guides subsequent iterations. It includes elite individuals, the individuals holding the historical optimal and suboptimal solutions, and the locations of individuals.
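As a sketch of what "running history information" can look like in code, the hypothetical container below records the historical optimal and suboptimal solutions and the set of visited locations during a minimization run (lower energy is better). Elite individuals are handled per generation and are omitted here. The class name and layout are assumptions for illustration, not the authors' data structure.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence, Set, Tuple

Solution = Tuple[float, ...]


@dataclass
class RunningHistory:
    """Hypothetical store of running history information (minimization)."""
    best: Optional[Tuple[float, Solution]] = None    # historical optimal solution
    second: Optional[Tuple[float, Solution]] = None  # historical suboptimal solution
    visited: Set[Solution] = field(default_factory=set)  # locations already evaluated

    def record(self, energy: float, x: Sequence[float]) -> None:
        """Update the history with one evaluated solution."""
        point = tuple(x)
        self.visited.add(point)
        if self.best is None or energy < self.best[0]:
            # The previous optimum becomes the suboptimum unless it is the same point.
            if self.best is not None and self.best[1] != point:
                self.second = self.best
            self.best = (energy, point)
        elif point != self.best[1] and (self.second is None or energy < self.second[0]):
            self.second = (energy, point)
```

Keeping both the optimum and a distinct suboptimum is what a direction-guided operator such as ED mutation needs as input.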
HIGA has three key strengths. (1) CE crossover is proposed to optimize the crossover operation of the algorithm [34]. CE crossover uses history information to retain individuals with good genes, and these individuals improve the subsequent population; (2) ED mutation is proposed to guide the evolutionary direction according to running history information [35]; (3) a binary space partitioning (BSP) tree is employed to maintain the diversity of the individuals in a population [36]. The BSP tree memorizes all of the evaluated solutions so as to avoid re-evaluating them.
The environment and the scoring function of AutoDock 4.2.6 are adopted as the experimental platform in this article [37,38,39]. AutoDock is open-source academic software in which the improved algorithm can be embedded conveniently. AutoDock first forms a box from the amino acid residues around the active site of the receptor. It then uses probes of different atom types to scan the box and calculate grid energies, and searches for the ligand within the box. Finally, it scores the ligand according to its conformation, orientation, and position. To demonstrate the power of HIGA, we perform experiments on a set of protein-ligand structures from PDBbind 2016 [40] and compare the performance of GA, LGA, EDGA, CEPGA, SODOCK, ABC, and HIGA on these datasets. The experimental results show that our method performs better in terms of obtained energy and RMSD, convergence, data distribution, and hypothesis testing.
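Precomputing grid energies means that scoring a pose only requires reading values at the ligand atom positions. A minimal sketch of reading such a grid map with trilinear interpolation follows, assuming a regular grid stored as nested lists indexed `[x][y][z]` with a known origin and spacing. The function name is hypothetical and this illustrates the general technique, not AutoDock's source code.

```python
import math
from typing import Sequence


def trilinear(grid: Sequence[Sequence[Sequence[float]]], origin: Sequence[float],
              spacing: float, point: Sequence[float]) -> float:
    """Interpolate a precomputed 3D energy grid at an arbitrary point."""
    # Fractional grid coordinates of the point.
    fx, fy, fz = ((p - o) / spacing for p, o in zip(point, origin))
    i, j, k = math.floor(fx), math.floor(fy), math.floor(fz)
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    if not (0 <= i < nx - 1 and 0 <= j < ny - 1 and 0 <= k < nz - 1):
        raise ValueError("point lies outside the grid box")
    dx, dy, dz = fx - i, fy - j, fz - k
    value = 0.0
    # Weighted sum over the 8 corners of the enclosing grid cell.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                w = ((dx if a else 1 - dx) * (dy if b else 1 - dy)
                     * (dz if c else 1 - dz))
                value += w * grid[i + a][j + b][k + c]
    return value
```

In a full scoring function, one such lookup per ligand atom (in the map for that atom's type) would be summed; that bookkeeping is left out here.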

2. Results and Discussion

2.1. Data Preparation and Parameter Setting

We compare the docking power of seven algorithms: GA, LGA, EDGA, CEPGA, SODOCK, ABC, and HIGA. For all compared algorithms, the maximum number of iterations is 27,000, the maximum number of energy evaluations is 1.5 × 10⁶, the population size is 50, and all other parameters keep their default values. An algorithm terminates when it reaches either the iteration limit or the energy-evaluation limit. We randomly choose one hundred X-ray crystallographic protein-ligand complexes (PDB) from PDBbind 2016 to form Dataset 1, and from these we choose sixteen complexes whose ligands have different numbers of rotatable bonds to form Dataset 2. Before docking, we preprocess the downloaded proteins and ligands. Protein processing consists of removing water molecules, adding charges, assigning hydrogens, and solvation. Ligand processing consists of adding charges, assigning hydrogens, detecting the root, and choosing torsions. The molecular structures of the sixteen ligands in Dataset 2 are shown in Figure 1, and the sixteen complexes are briefly described below.
(1)
3ptb β-trypsin/ben (benzamidine)
β-Trypsin, isolated from the pancreas of pigs, sheep, and cattle, is a protease. Benzamidine, a trypsin inhibitor, is commonly used to suppress the proteolysis of proteins.
(2)
1aha α-momorcharin/ade (adenine)
α-Momorcharin originates from the seeds of Momordica charantia, while adenine is a nucleobase found in all organisms.
(3)
3hvt HIV-1 reverse transcriptase/nvp (nevirapine)
HIV-1 reverse transcriptase (RT), an RNA-dependent DNA polymerase, synthesizes complementary DNA (cDNA) from the viral RNA. Nevirapine (nvp) is a potent non-nucleoside RT inhibitor.
(4)
1phg cytochrome P450-cam/hem (protoporphyrin IX)
Cytochrome P450-cam belongs to the cytochrome P450 superfamily of heme-thiolate proteins, which participate in the metabolism of exogenous and endogenous substances. Protoporphyrin IX, a purple-brown crystalline powder, is soluble in methanol but insoluble in ether, chloroform, acetone, and water.
(5)
2mcp McPC-603/pc (phosphocholine)
McPC-603 is a mouse myeloma protein that binds phosphocholine, an intermediate in phosphatidylcholine synthesis in tissues.
(6)
1stp streptavidin/btn (biotin)
Streptavidin, a protein obtained from Streptomyces, binds biotin with very high affinity, similar to avidin. Biotin, a B vitamin, plays a critical role in the normal metabolism of proteins and fats.
(7)
6rnt ribonuclease T1/ca (calcium ion)
Ribonuclease T1, an endonuclease, can remove the non-hybridized RNA regions of DNA-RNA hybrids. Calcium ions play a vital role in human physiological functions.
(8)
4dfr dihydrofolate reductase/mtx (methotrexate)
Dihydrofolate reductase has been widely used as a therapeutic target in antitumor and other therapies. Methotrexate, a drug with a potent immunosuppressive effect, inhibits the proliferation and division of immune cells.
(9)
1ett thrombin/4qq
Thrombin is supplied as an amorphous, white to gray, freeze-dried powder, and 4qq is a non-polymer inhibitor.
(10)
1hri human rhinovirus/s57
Human rhinovirus causes the majority of common colds in humans. s57 is an imidazole compound.
(11)
1hvr protease/xk2
Proteases, enzymes widely found in animals and plants, catalyze protein catabolism. xk2, a small-molecule inhibitor, decreases or even blocks the rate of the catalyzed reaction.
(12)
4hmg hemagglutinin/sia (sialic acid)
Hemagglutinin causes the agglutination of erythrocytes. Sialic acids are acidic monosaccharides found at the terminal positions of sugar chains.
(13)
1cdg cyclodextrin glycosyltransferase/mal (maltose)
Cyclodextrin glycosyltransferase, a bacterial enzyme, produces cyclodextrins. Maltose, made from starch and malt, is widely used as a nutrient and in culture media.
(14)
1htf HIV-1 protease/g26
HIV-1 protease cleaves newly synthesized polyproteins into individual proteins. g26, a non-polymer inhibitor, is an amide bearing easily oxidizable and highly reactive functional groups.
(15)
1glq glutathione S-transferase/gtb (S-(p-nitrobenzyl)glutathione)
Glutathione S-transferases are a family of enzymes involved in hepatic detoxification. S-(p-nitrobenzyl)glutathione is a glutathione derivative.
(16)
1tmn thermolysin/nas (2-naphthalenesulfonic acid)
Thermolysin, a thermostable metalloprotease, preferentially hydrolyzes peptide bonds involving hydrophobic amino acids. 2-Naphthalenesulfonic acid, a white powder or crystal that dissolves in water but not in alcohol, is widely used in organic synthesis.
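The dual termination rule described at the start of this section (stop at 27,000 iterations or 1.5 × 10⁶ energy evaluations, whichever comes first) can be sketched as a small budget object. The class and its methods are hypothetical. In this simplified version the budget is checked once per generation, so the evaluation count can slightly overshoot its limit.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Budget:
    """Stops a run at whichever limit is reached first."""
    max_generations: int = 27_000
    max_evaluations: int = 1_500_000
    generations: int = 0
    evaluations: int = 0

    def evaluate(self, f: Callable[[Sequence[float]], float], x: Sequence[float]) -> float:
        """Score one candidate and count the energy evaluation."""
        self.evaluations += 1
        return f(x)

    def exhausted(self) -> bool:
        return (self.generations >= self.max_generations
                or self.evaluations >= self.max_evaluations)


# Usage: a toy loop with 4 evaluations per generation and a tiny evaluation budget.
budget = Budget(max_generations=3, max_evaluations=6)
while not budget.exhausted():
    for x in ([0.0], [1.0], [2.0], [3.0]):
        budget.evaluate(lambda v: v[0] ** 2, x)
    budget.generations += 1
```

Here the evaluation limit is hit first, after two generations.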

2.2. Comparison of Energy and RMSD

The primary objective of our experiment is to find the lowest energy. The lowest energy, calculated with the semi-empirical free energy force field [15], is the most important criterion for evaluating the compared algorithms. The root-mean-square positional deviation (RMSD), which compares the optimal docked structure with the experimentally determined structure, is another commonly used criterion for evaluating docking results. A docking is considered successful if its RMSD is below the threshold of 2.0 Å. Each algorithm is run once on each complex of Dataset 1; the number of successful cases is recorded, and the average RMSD over all cases and over the successful cases (RMSD < 2 Å) is calculated (Table 1). In both the number of successful cases and the average RMSD, HIGA is clearly superior to the other algorithms. Each algorithm is run twenty times on Dataset 2, and the lowest energy and RMSD are recorded (Table 2). Of the sixteen complexes, HIGA finds the lowest energy for twelve, EDGA for two, and CEPGA and SODOCK for one each, while GA, LGA, and ABC find none of the lowest values. Although HIGA does not find the lowest energy for 1aha, 1ett, 4hmg, and 1htf, the energies it finds are still promising. For example, in 1aha the energy −16.15 kcal mol⁻¹ found by HIGA is close to the lowest energy, −16.23 kcal mol⁻¹, found by EDGA; in 1htf the energy −21.17 kcal mol⁻¹ found by HIGA is close to the lowest energy, −21.79 kcal mol⁻¹, found by SODOCK. The numbers of lowest RMSDs found by HIGA, GA, LGA, EDGA, CEPGA, ABC, and SODOCK are 7, 2, 1, 2, 1, 1, and 2, respectively. HIGA has no absolute advantage in finding the lowest RMSD, but it finds the lowest RMSD more often than any other algorithm. In conclusion, HIGA is the best search method in terms of average performance.
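A minimal sketch of the RMSD success criterion and the two averages reported in Table 1, assuming the docked and reference poses list the same atoms in the same order and share the receptor's coordinate frame (no superposition or symmetry correction, which real evaluation tools may apply). The function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(docked: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD (in Å) between two poses whose atoms are listed in the same order."""
    if len(docked) != len(reference) or not docked:
        raise ValueError("poses must be non-empty and have the same atom count")
    sq = sum((p - q) ** 2 for a, b in zip(docked, reference) for p, q in zip(a, b))
    return math.sqrt(sq / len(docked))


def summarize(rmsds: Sequence[float], threshold: float = 2.0):
    """Success count, average RMSD over all cases, and over successful cases."""
    ok = [r for r in rmsds if r < threshold]
    avg_all = sum(rmsds) / len(rmsds)
    avg_ok = sum(ok) / len(ok) if ok else float("nan")
    return len(ok), avg_all, avg_ok
```

For example, `summarize([1.0, 3.0, 1.5])` counts two successes, averages all three values, and averages only 1.0 and 1.5 for the successful cases.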

2.3. Cluster Analysis of Docked Conformations

Docking each complex in Dataset 2 twenty times yields twenty docked conformations. These conformations are exhaustively compared with one another to determine their similarity, and conformations that are similar enough are clustered together. The number of clusters formed can therefore range from 1 to 20. Clusters are ranked in order of increasing energy, using the lowest energy in each cluster. Rank 1 is the lowest-energy cluster, and its size shows how often the structure with the lowest energy is found. The concentration of the clusters and the number of docked structures in rank 1 reflect the stability of the algorithm. The results of the cluster analysis on Dataset 2 are shown in Table 3. The mean number of clusters is lowest for HIGA (3.72), followed by CEPGA (4.04), EDGA (4.24), LGA (4.58), ABC (6.52), SODOCK (10.30), and finally GA (13.04). The mean number of docked structures in rank 1 is 17.04 for HIGA, 16.20 for CEPGA, 15.92 for EDGA, 15.72 for LGA, 11.82 for SODOCK, 8.70 for ABC, and 8.00 for GA. Hence, HIGA is the most reliable search method.
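A simplified sketch of energy-ordered RMSD clustering: conformations are visited from lowest to highest energy, and each one joins the first existing cluster whose lowest-energy member lies within an RMSD tolerance, otherwise it starts a new cluster. The 2.0 Å tolerance is an assumption (it is commonly used as AutoDock's default clustering tolerance, but the text does not state the value used), and the functions are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float, float]]


def rmsd(a: Pose, b: Pose) -> float:
    """RMSD between two poses with matching atom order (no superposition)."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def cluster(poses: Sequence[Pose], energies: Sequence[float],
            tol: float = 2.0) -> List[List[int]]:
    """Greedy energy-ordered clustering; returns lists of pose indices, rank 1 first."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters: List[List[int]] = []
    for i in order:
        for c in clusters:
            # Compare with the cluster seed, i.e. its lowest-energy member.
            if rmsd(poses[i], poses[c[0]]) < tol:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Because seeds are created in energy order, `clusters[0]` is the rank-1 cluster, `len(clusters[0])` is the number of docked structures in rank 1, and `len(clusters)` is the number of clusters.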

2.4. Convergence Analysis

An algorithm is said to converge when its objective value stabilizes after a number of iterations. Figure 2 shows the convergence curves of the seven algorithms on the test cases of Dataset 2. The horizontal axis gives the number of iterations (3000, 6000, 9000, 12,000, 15,000, 18,000, 21,000, 24,000, and 27,000), and the vertical axis gives the energy obtained by each algorithm at that iteration count. In the early stage, the energy of every algorithm decreases as the number of iterations increases. In the later stage, however, the energies of some algorithms stop changing. This phenomenon, caused by the loss of population diversity and of evolutionary capacity, is called premature convergence. For example, LGA, EDGA, and SODOCK converge prematurely after 15,000 iterations in 2mcp; GA and ABC after 18,000 iterations in 2mcp; and LGA, EDGA, SODOCK, and ABC after 21,000 iterations in 6rnt. These curves indicate that HIGA is superior to the other algorithms in avoiding premature convergence and in solution quality.
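Premature convergence can be located in the checkpoint data behind such curves with a simple heuristic: find the earliest checkpoint after which the best energy never improves by more than a small tolerance. The function and the tolerance `eps` are assumptions for illustration, not the criterion used to read Figure 2.

```python
from typing import Sequence


def plateau_start(checkpoints: Sequence[int], energies: Sequence[float],
                  eps: float = 1e-3) -> int:
    """Earliest checkpoint after which the energy never improves by more than eps.

    Returns the last checkpoint if the curve is still improving at the end.
    """
    start = checkpoints[-1]
    # Walk backwards while consecutive checkpoints show no real improvement.
    for idx in range(len(energies) - 1, 0, -1):
        if energies[idx - 1] - energies[idx] > eps:
            break
        start = checkpoints[idx - 1]
    return start
```

With checkpoints at 3000, 6000, ..., 27,000 and a curve that stops improving at 15,000 iterations, the function returns 15,000.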

2.5. Data Distribution Analysis

The data distribution reflects how concentrated the results are and hence how stable the algorithm is. For each PDB complex, we calculate the minimum, first quartile, median, third quartile, and maximum of the energy values and use these five statistics to draw box plots (Figure 3). The median, which is not affected by extreme values, is a suitable measure of central tendency, and the median energy of HIGA is evidently lower than that of the other algorithms. The box spans the first to the third quartile, so its size shows whether the data are concentrated or dispersed; the data distribution of HIGA is the most concentrated. Points beyond the whiskers are outliers, which make an algorithm's distribution less stable. Examples are the outliers of GA in 1glq, LGA in 3hvt, EDGA in 1stp, SODOCK in 1cdg, and ABC in 4hmg, whereas HIGA and CEPGA have no outliers. We therefore conclude that HIGA is a stable method for protein-ligand docking.
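Outlier rules for box plots vary; a common (Tukey) convention marks points more than 1.5 × IQR beyond the quartiles. Assuming that convention, which Figure 3 may or may not use, the five-number summary and the outliers can be computed with the standard library. The function name is hypothetical, and at least two values are required.

```python
from statistics import quantiles
from typing import Dict, Sequence


def box_summary(values: Sequence[float], k: float = 1.5) -> Dict[str, object]:
    """Quartiles, whisker ends, and Tukey outliers for one algorithm on one PDB."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "whisker_low": inside[0],   # smallest value within the fences
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],  # largest value within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```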

2.6. Execution Time Analysis

Table 4 shows the execution times of the seven compared algorithms on the complexes of Dataset 2, reported in seconds per run. Execution time is directly related to problem complexity: with a few exceptions, the execution time of the algorithms increases with the number of rotatable bonds. As the table shows, GA is the fastest, followed by LGA, EDGA, CEPGA, HIGA, ABC, and SODOCK. HIGA, the best-performing algorithm in the previous experiments, has no advantage in execution time. However, its execution time is not much greater than that of the fastest algorithm, GA. This suggests that HIGA does not improve performance at the cost of a large increase in execution time.

2.7. Comparison Based on the Hypothesis Test

We use a hypothesis test to assess the differences between the algorithms (Table 5). For every PDB complex, the five best energies obtained by each algorithm are used as samples, and the significance level is set to 0.05. When comparing algorithm A with algorithm B, we conclude that A differs significantly from B if the p-value is less than 0.05; a significant difference, together with A obtaining the lower energies, indicates that A is superior to B. As seen from the table, HIGA, EDGA, and CEPGA are better than GA, LGA, SODOCK, and ABC on most of the PDB complexes. Furthermore, the p-values show that HIGA is better than EDGA and CEPGA. We conclude from the statistical analysis that HIGA is significantly better than the other algorithms.
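The text does not name the test used for Table 5. As one hedged example of a test suited to five samples per algorithm, an exact one-sided permutation test on mean energy can be written with the standard library. It asks how often a random relabeling of the ten pooled energies makes A look at least as much better than B as observed. This is a substitute technique for illustration, not necessarily the authors' test.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact one-sided permutation test; H1: energies in a are lower than in b."""
    pooled = list(a) + list(b)
    n = len(pooled)
    observed = mean(b) - mean(a)  # positive when a has lower energies
    hits = total = 0
    # Enumerate every way of relabeling len(a) pooled values as group A.
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        if mean(group_b) - mean(group_a) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total
```

With five energies per algorithm there are 252 relabelings, so the smallest attainable p-value is 1/252 (about 0.004), which is below the 0.05 significance level.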

3. Materials and Methods

3.1. Running History Information Guided Genetic Algorithm

HIGA is an LGA-based hybrid search algorithm designed for protein-ligand docking. GA simulates the process of natural evolution and generates solutions to optimization problems through mutation, selection, and crossover. In each iteration, individuals are selected according to their fitness in the feasible region of the problem, so that new and better approximate solutions are generated. GA is simple, easy to implement, and highly robust, and it has been successfully applied to the protein-ligand docking problem. LGA combines GA with a local search method: GA rapidly explores the potential energy surface, and local search refines solutions on it. LGA has proven to be an effective method for protein-ligand docking, but the solutions obtained by LGA are unstable because of its randomness. HIGA is therefore proposed to overcome this drawback. Three core mechanisms enhance the performance of HIGA on docking problems: CE crossover, ED mutation, and a binary space partitioning (BSP) tree. CE crossover uses history information to make the crossover operation less random. ED mutation guides the solutions to evolve in a better direction. The BSP tree memorizes the solutions that have been evaluated, so that the diversity of solutions is maintained and less computation is wasted. By combining the three techniques, the novel algorithm is guided toward promising solutions by the running history information. Figure 4 is the block diagram of HIGA; the grey parts are the three new mechanisms added to LGA. HIGA first initializes the population randomly and then obtains each next population by CE crossover, ED mutation, the BSP tree, local search, selection, and fitness evaluation. The process is iterated until a preset termination condition is reached.
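To make the LGA structure concrete, here is a toy Lamarckian GA on a stand-in objective (a sphere function instead of a docking energy). The defining Lamarckian step is that the vector improved by local search is written back into the population. AutoDock's LGA uses Solis-Wets local search; the simple hill climber, the operators, and all parameter values below are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def sphere(x: Vector) -> float:
    """Toy objective standing in for a docking energy (minimum 0 at the origin)."""
    return sum(v * v for v in x)


def local_search(x: Vector, f: Callable[[Vector], float], rng: random.Random,
                 steps: int = 20, sigma: float = 0.1) -> Vector:
    """Random-perturbation hill climber, a simple stand-in for Solis-Wets search."""
    best, f_best = list(x), f(x)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, sigma) for v in best]
        f_cand = f(cand)
        if f_cand < f_best:
            best, f_best = cand, f_cand
    return best


def lga(f: Callable[[Vector], float], dim: int = 3, pop_size: int = 20,
        generations: int = 50, seed: int = 0) -> Tuple[Vector, float]:
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Crossover (HIGA would use CE crossover here).
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)       # random mating pair
            cut = rng.randrange(1, dim)     # one-point crossover
            children.append(a[:cut] + b[cut:])
        # Mutation (HIGA would use ED mutation and a BSP revisit check here).
        for c in children:
            if rng.random() < 0.2:
                c[rng.randrange(dim)] += rng.gauss(0.0, 1.0)
        # Lamarckian step: the locally improved vector replaces the genotype.
        children = [local_search(c, f, rng) for c in children]
        # Elitist selection over parents and children.
        pop = sorted(pop + children, key=f)[:pop_size]
    return pop[0], f(pop[0])
```

Calling `lga(sphere)` returns the best vector found and its objective value.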

3.2. CE Crossover

Because the parents of elite individuals are not preserved in the crossover process of LGA, subsequent crossover operations have a lower chance of producing better solutions. CE crossover is proposed in this section to solve this problem. It preserves the parents of the elite individuals with the best solutions so that their good genes can propagate and improve the quality of the next-generation population. Table 6 shows the pseudocode of CE crossover. The strategy introduces M_father and M_mother, which represent the parents of the elite individual e. At each iteration, the individual with the current best fitness is selected as the elite individual, and M_father and M_mother are saved. The preserved individuals and the next-generation individuals form a new population. Because the genes of these preserved individuals are excellent, they are more likely to continue producing individuals with good genes. In each iteration, the number of elite individuals is a fixed percentage of the population size, ten percent by default.
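A minimal sketch of the CE crossover idea. It assumes real-valued individuals of length at least 2, one-point crossover, minimization (lower fitness is better), and that the enlarged population is truncated back to its original size by fitness. The truncation rule and all names are assumptions, since Table 6 is not reproduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Individual = List[float]


def ce_crossover(population: Sequence[Individual],
                 fitness: Callable[[Individual], float],
                 rng: random.Random,
                 elite_ratio: float = 0.1) -> List[Individual]:
    """Hypothetical CE-style crossover that preserves the parents of elite offspring."""
    pop = [list(ind) for ind in population]
    rng.shuffle(pop)
    children: List[Individual] = []
    parents: List[Tuple[Individual, Individual]] = []
    # One-point crossover; remember which two parents produced each child.
    for i in range(0, len(pop) - 1, 2):
        father, mother = pop[i], pop[i + 1]
        cut = rng.randrange(1, len(father))
        for child in (father[:cut] + mother[cut:], mother[:cut] + father[cut:]):
            children.append(child)
            parents.append((father, mother))
    # The best children are the elite (10% of the population by default).
    n_elite = max(1, int(elite_ratio * len(population)))
    elite = sorted(range(len(children)), key=lambda j: fitness(children[j]))[:n_elite]
    # Save M_father and M_mother of every elite child, without duplicates.
    preserved: List[Individual] = []
    for j in elite:
        for p in parents[j]:
            if p not in preserved:
                preserved.append(p)
    # Preserved parents join the offspring; the best len(population) survive.
    return sorted(children + preserved, key=fitness)[:len(population)]
```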

3.3. ED Mutation

In LGA, the mutation operator changes some genes of an individual at random, so the search direction of the algorithm is also random and aimless. In the early stage of the algorithm, this randomness greatly benefits global search, because without search experience the direction of the optimal solution is unknown. As iterations continue, the population accumulates search experience, the promising direction becomes gradually clearer, and the search space starts to converge. Random mutation at this stage can move individuals away from the current best solution and may even abandon the global best solution. ED mutation is therefore proposed to optimize the search direction. The pseudocode of ED mutation is shown in Table 7, where m_i is the current solution; θ and δ are random numbers; β is an adjustable parameter; M_optimum is the historical optimal solution; M_sub is the historical suboptimal solution; and M_max and M_min are the maximum and minimum solutions. In the formula, the historical optimal and suboptimal solutions are used together to point toward the optimal solution. This mechanism resembles vector addition, and the addition helps the algorithm evolve in a better direction.
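Table 7's exact update rule is not reproduced in this text, so the sketch below uses one plausible direction-guided form built from the listed symbols: each gene moves by β·(θ·(M_optimum − m_i) + δ·(M_sub − m_i)) and is clipped to [M_min, M_max], with θ and δ drawn uniformly from [0, 1). This is a hypothetical illustration of the vector-addition idea, not the authors' formula.

```python
import random
from typing import List, Optional, Sequence


def ed_mutation(m: Sequence[float], m_opt: Sequence[float], m_sub: Sequence[float],
                m_min: Sequence[float], m_max: Sequence[float], beta: float = 0.5,
                rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical direction-guided mutation (not the exact rule of Table 7)."""
    rng = rng if rng is not None else random.Random()
    out = []
    for mi, oi, si, lo, hi in zip(m, m_opt, m_sub, m_min, m_max):
        theta, delta = rng.random(), rng.random()  # random weights in [0, 1)
        # Vector sum of the pulls toward the historical optimum and suboptimum.
        step = beta * (theta * (oi - mi) + delta * (si - mi))
        out.append(min(hi, max(lo, mi + step)))   # keep the gene inside its bounds
    return out
```

With β = 0.5 and both historical solutions on the same side of m_i, each gene moves part of the way toward them and never overshoots, which is the intended "better direction" behaviour.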

3.4. BSP Tree

The BSP tree, previously applied in the non-revisiting genetic algorithm (NrGA) [36], stores the evaluated solutions and divides the search space according to the cumulative distribution of those solutions. At the beginning, the BSP tree has only a root node, which represents the entire search space. Each node subsequently inserted into the tree represents a subspace. If a parent node has two child nodes (l and r), the subspaces that they represent are disjoint, and their union is the subspace of the parent node. The BSP tree differs from the octree employed in QuickVina-W [41]: the BSP tree stores all of the solutions that the algorithm has searched, whereas the octree stores high-quality history points, namely the outputs of the last iteration of local optimization from all searching threads during the run. Table 8 shows the pseudocode for the working principle of the BSP tree, where RF is the revisit flag and d is the distance between two nodes. The position of each individual previously generated by the algorithm is recorded in a node of the tree. When a newly generated individual m reaches node l or r, their positions are compared. If m = l or m = r, m is a revisit and RF is set to 1. A revisited solution is mutated into the nearest unvisited neighboring subspace. The BSP tree thus guarantees that all solutions occupy different locations, which maintains the diversity of the individuals and enlarges the sample space of GA.
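The following sketch implements a BSP archive in the spirit of NrGA, under stated simplifications: each leaf stores one solution; when a new solution lands in an occupied leaf, the leaf is split at the midpoint between the two solutions along the coordinate where they differ most (an assumed splitting rule); and two solutions that differ by no more than a tolerance in every coordinate count as a revisit. The class names and tolerance are hypothetical, and the "mutate into the nearest unvisited neighboring subspace" step is left to the caller, which can use the bounds returned by `subspace()`.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    lower: List[float]                   # lower corner of this node's subspace
    upper: List[float]                   # upper corner of this node's subspace
    point: Optional[List[float]] = None  # solution stored in a leaf
    dim: int = -1                        # split dimension (internal nodes only)
    cut: float = 0.0                     # split position (internal nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class BSPArchive:
    """Hypothetical BSP archive of evaluated solutions for revisit detection."""

    def __init__(self, lower: Sequence[float], upper: Sequence[float], tol: float = 1e-9):
        self.root = Node(list(lower), list(upper))  # root covers the whole search space
        self.tol = tol
        self.size = 0

    def _leaf(self, x: Sequence[float]) -> Node:
        node = self.root
        while node.left is not None:
            node = node.left if x[node.dim] < node.cut else node.right
        return node

    def insert(self, x: Sequence[float]) -> bool:
        """Store x and return the revisit flag RF (True if x was seen before)."""
        leaf = self._leaf(x)
        if leaf.point is None:  # empty leaf (only the initial root)
            leaf.point = list(x)
            self.size += 1
            return False
        y = leaf.point
        diffs = [abs(a - b) for a, b in zip(x, y)]
        k = max(range(len(diffs)), key=diffs.__getitem__)
        if diffs[k] <= self.tol:  # same location: a revisit
            return True
        # Split the leaf's subspace midway between x and y along dimension k.
        cut = (x[k] + y[k]) / 2
        left_upper = list(leaf.upper)
        left_upper[k] = cut
        right_lower = list(leaf.lower)
        right_lower[k] = cut
        leaf.left = Node(list(leaf.lower), left_upper)
        leaf.right = Node(right_lower, list(leaf.upper))
        leaf.dim, leaf.cut = k, cut
        for p in (list(x), y):
            (leaf.left if p[k] < cut else leaf.right).point = p
        leaf.point = None
        self.size += 1
        return False

    def subspace(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        """Bounds of the leaf subspace containing x (useful for re-sampling a revisit)."""
        leaf = self._leaf(x)
        return leaf.lower, leaf.upper
```

Each insertion walks one root-to-leaf path, so checking a candidate costs time proportional to the tree depth rather than to the number of stored solutions.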

4. Conclusions

This article presents HIGA, which combines binary space partitioning (BSP), ED mutation, and CE crossover to extend the power of the LGA-based algorithm. CE crossover increases the probability that individuals with good genes are reused, making HIGA more adaptable to changes during evolution. ED mutation safeguards the evolutionary direction of HIGA, and its concept originates from the property of vector addition. With the BSP tree, the search algorithm not only avoids revisits but is also guided toward the next unvisited position. We have compared the performance of HIGA, GA, LGA, EDGA, CEPGA, SODOCK, and ABC, and the results indicate that HIGA outperforms the other algorithms, suggesting that HIGA can enhance the protein-ligand docking power of AutoDock.

Acknowledgments

This work was supported by the National Natural Science Foundation Program of China (61772124), the State Key Program of National Natural Science of China (61332014), the Fundamental Research Funds for the Central Universities under Grant 150402002 and Grant 150404008, and the Peak Discipline Construction of Computer Science and Technology under Grant 02190021821001.

Author Contributions

B.G. and Y.Z. conceived and designed the experiments; B.G. and C.Z. performed the experiments; B.G. and C.Z. analyzed the data; B.G. wrote the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bohlooli, F.; Sepehri, S.; Razzaghi-AsI, N. Response surface methodology in drug design: A case study on docking analysis of a potent antifungal fluconazole. Comput. Biol. Chem. 2017, 67, 158–173. [Google Scholar] [CrossRef] [PubMed]
  2. Zhao, Y.H.; Wang, G.R.; Yin, Y. Improving ELM-based microarray data classification by diversified sequence features selection. Neural Comput. Appl. 2016, 27, 155–166. [Google Scholar] [CrossRef]
  3. Li, Y.; Zhao, Y.H.; Wang, G.R.; Wang, Z.H.; Gao, M. ELM-Based Large-Scale Genetic Association Study via Statistically Significant Pattern. IEEE Trans. Syst. Man Cybern. Syst. 2017, 1–14. [Google Scholar] [CrossRef]
  4. Allen, S.E.; Dokholyan, N.V.; Bowers, A.A. Dynamic docking of conformationally constrained macrocycles: Methods and applications. ACS Chem. Biol. 2016, 11, 10–24. [Google Scholar] [CrossRef] [PubMed]
  5. Zou, Q.; Li, J.J.; Song, L.; Zeng, X.X.; Wang, G.H. Similarity computation strategies in the microRNA-disease network: A Survey. Brief. Funct. Genom. 2016, 15, 55–64. [Google Scholar] [CrossRef] [PubMed]
  6. Bjerrum, E.J. Machine learning optimization of cross docking accuracy. Comput. Biol. Chem. 2016, 62, 133–144. [Google Scholar] [CrossRef] [PubMed]
  7. Guedes, I.A.; de Magalhães, C.S.; Dardenne, L.E. Receptor-ligand molecular docking. Biophys. Rev. 2014, 6, 75–87. [Google Scholar] [CrossRef] [PubMed]
  8. Huang, S.Y.; Zou, X.Q. Advances and challenges in protein-ligand docking. Int. J. Mol. Sci. 2010, 11, 3016–3034. [Google Scholar] [CrossRef] [PubMed]
  9. Jug, G.; Anderluh, M.; Tomašič, T. Comparative evaluation of several docking tools for docking small molecule ligands to DC-SIGN. J. Mol. Model. 2015, 21, 164–178. [Google Scholar] [CrossRef] [PubMed]
  10. Zhao, Y.H.; Wang, G.R.; Zhang, X.; Yu, J.X.; Wang, Z.H. Learning Phenotype Structure Using Sequence Model. IEEE Trans. Knowl. Data Eng. 2014, 26, 667–681. [Google Scholar] [CrossRef]
  11. Zhao, Y.H.; Yu, J.X.; Wang, G.R.; Chen, L.; Wang, B.; Yu, G. Maximal Subspace Coregulated Gene Clustering. IEEE Trans. Knowl. Data Eng. 2008, 20, 83–98. [Google Scholar] [CrossRef]
  12. Zeng, X.X.; Liao, Y.L.; Liu, Y.S.; Zou, Q. Prediction and validation of disease genes using HeteSim Scores. IEEE ACM Trans. Comput. Biol. Bioinform. 2017, 14, 687–695. [Google Scholar] [CrossRef] [PubMed]
  13. Moitessier, N.; Englebienne, P.; Lee, D.; Lawandi, J.; Gorbeil, C.R. Towards the development of universal, fast and highly accurate docking/scoring methods: A long way to go. Br. J. Pharmacol. 2008, 153, 7–26. [Google Scholar] [CrossRef] [PubMed]
  14. Hu, X.; Balaz, S.; Shelver, W.H. A practical approach to docking of zinc metalloproteinase inhibitors. J. Mol. Graph. Model. 2004, 22, 293–307. [Google Scholar] [CrossRef] [PubMed]
  15. Huey, R.; Morris, G.M.; Olson, A.J.; Goodsell, D.S. Software news and update a semiempirical free energy force field with charge-based desolvation. J. Comput. Chem. 2006, 10, 1145–1152. [Google Scholar]
  16. Jain, A.N. Scoring functions for protein-ligand docking. Curr. Protein Pept. Sci. 2006, 7, 407–420. [Google Scholar] [PubMed]
  17. Muryshev, A.E.; Tarasov, D.N.; Butygin, A.V.; Butygina, O.V.; Aleksandrov, A.B.; Nikitin, S.M. A novel scoring function for molecular docking. J. Comput. Aided Mol. Des. 2003, 17, 597–605. [Google Scholar] [CrossRef] [PubMed]
  18. Bharatham, N.; Bharatham, K.; Shelat, A.A.; Bashford, D. Ligand binding more prediction by docking: Mdm2/mdmx inhibitors as a case study. J. Chem. Inf. Model. 2014, 54, 648–659. [Google Scholar] [CrossRef] [PubMed]
  19. Li, Z.F.; Gu, J.F.; Zhuan, H.Y.; Kang, L.; Zhao, X.Y.; Guo, Q. Adaptive molecular docking method baesd on information entropy genetic algorithm. Appl. Soft Comput. 2015, 26, 299–302. [Google Scholar] [CrossRef]
  20. Feinstein, W.P.; Brylinski, M. Calculating an optimal box size for ligand docking and virtual screening against experimental and predicted binding pockets. J. Cheminform. 2015, 7. [Google Scholar] [CrossRef] [PubMed]
  21. Ain, Q.U.; Aleksandrova, A.; Roessler, F.D.; Ballester, P.J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2015, 5, 405–424. [Google Scholar] [CrossRef] [PubMed]
  22. Guo, L.Y.; Yan, Z.Q.; Zheng, X.L.; Hu, L.; Yang, Y.L.; Wang, J. A comparison of various optimization algorithms of protein-ligand docking programs by fitness accuracy. J. Mol. Model. 2014, 20. [Google Scholar] [CrossRef] [PubMed]
  23. Blum, C.; Puchinger, J.; Raidl, G.R.; Roli, A. Hybrid mataheuristics in combinatorial optimization: A survey. Appl. Soft Comput. 2011, 11, 4135–4151. [Google Scholar] [CrossRef]
  24. Lόpez-Camacho, E.; Godoy, M.J.; Garcỉa-Nieto, J.; Nebro, A.J.; Aldana-Montes, J.F. Solving molecular flexible docking problems with mataheuristics: A comparative study. Appl. Soft Comput. 2015, 28, 379–393. [Google Scholar] [CrossRef]
  25. Goodsell, D.S.; Olson, A.J. Automated docking of substrates to proteins by simulated annealing. Proteins Struct. Funct. Genet. 1990, 8, 195–202.
  26. Cao, T.C.; Li, T.H. A combination of numeric genetic algorithm and tabu search can be applied to molecular docking. Comput. Biol. Chem. 2004, 28, 303–312.
  27. Jones, G.; Willett, P.; Glen, R.C.; Leach, A.R.; Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267, 727–748.
  28. Thomsen, R. Flexible ligand docking using evolutionary algorithms: Investigating the effects of variation operators and local search hybrids. Biosystems 2003, 72, 57–73.
  29. Fuhrmann, J.; Rurainski, A.; Lenhof, H.P.; Neumann, D. A new Lamarckian genetic algorithm for flexible ligand-receptor docking. J. Comput. Chem. 2010, 31, 1911–1918.
  30. Chen, H.M.; Liu, B.F.; Huang, H.L.; Hwang, S.F.; Ho, S.Y. SODOCK: Swarm optimization for highly flexible protein-ligand docking. J. Comput. Chem. 2007, 28, 612–623.
  31. Janson, S.; Merkle, D.; Middendorf, M. Molecular docking with multi-objective particle swarm optimization. Appl. Soft Comput. 2008, 8, 666–675.
  32. Ng, M.C.; Fong, S.; Siu, S.W. PSOVina: The hybrid particle swarm optimization algorithm for protein-ligand docking. J. Bioinform. Comput. Biol. 2015, 13.
  33. Uehara, S.; Fujimoto, K.J.; Tanaka, S. Protein-ligand docking using fitness learning-based artificial bee colony with proximity stimuli. Phys. Chem. Chem. Phys. 2015, 17, 16412–16417.
  34. Guan, B.X.; Zhang, C.S.; Ning, J.X. Genetic Algorithm with a Crossover Elitist Preservation Mechanism for Protein-Ligand Docking. AMB Express 2017, 7.
  35. Guan, B.X.; Zhang, C.S.; Ning, J.X. EDGA: A Population Evolution Direction Guided Genetic Algorithm for Protein-Ligand Docking. J. Comput. Biol. 2016, 23, 585–596.
  36. Yuen, S.Y.; Chow, C.K. A genetic algorithm that adaptively mutates and never revisits. IEEE Trans. Evol. Comput. 2009, 13, 454–472.
  37. Kitchen, D.B.; Decornez, H.; Furr, J.R.; Bajorath, J. Docking and scoring in virtual screening for drug discovery: Methods and applications. Nat. Rev. Drug Discov. 2004, 3, 935–949.
  38. Morris, G.M.; Huey, R.; Lindstrom, W.; Sanner, M.F.; Belew, R.K.; Goodsell, D.S.; Olson, A.J. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 2009, 30, 2785–2791.
  39. Castro-Alvarez, A.; Costa, A.M.; Vilarrasa, J. The Performance of Several Docking Programs at Reproducing Protein-Macrolide-Like Crystal Structures. Molecules 2017, 22, 136.
  40. Wang, R.X.; Fang, X.L.; Lu, Y.P.; Yang, C.Y.; Wang, S.M. The PDBbind database: Methodologies and updates. J. Med. Chem. 2005, 48, 4111–4119.
  41. Hassan, N.M.; Alhossary, A.A.; Mu, Y.G.; Kwoh, C.K. Protein-Ligand Blind Docking Using QuickVina-W with Inter-Process Spatio-Temporal Integration. Sci. Rep. 2017, 7.
Sample Availability: Not available.
Figure 1. The molecular structures of ligands.
Figure 2. Convergence diagrams of seven algorithms in different X-ray crystallographic complexes (PDB).
Figure 3. Box plots of the seven algorithms for different PDB complexes.
Figure 4. Block diagram of HIGA.
Table 1. Results of success case and average root-mean-square positional deviation (RMSD).
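| Algorithm | Success Case | Average RMSD in Å (All Cases) | Average RMSD in Å (RMSD < 2 Å) |
|---|---|---|---|
| HIGA | 90 | 1.81 | 1.32 |
| GA | 58 | 3.12 | 1.80 |
| LGA | 68 | 2.55 | 1.69 |
| EDGA | 77 | 2.21 | 1.62 |
| CEPGA | 81 | 2.13 | 1.59 |
| SODOCK | 73 | 2.88 | 1.79 |
| ABC | 62 | 3.25 | 1.83 |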
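Table 1 reports RMSD against the crystallographic ligand pose, and a case is presumably counted as a success when the RMSD falls below the 2 Å threshold used in its last column. A minimal sketch of that computation follows, assuming the docked and reference poses list the same atoms in the same order and already share the receptor's coordinate frame, so no superposition or symmetry correction is applied. The names `rmsd` and `is_success` are illustrative, not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation (in Å) between two matched atom lists.

    Assumes atom i of `pose` corresponds to atom i of `reference` and that
    both are in the same receptor frame, so no fitting is performed.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def is_success(pose: Sequence[Coord], reference: Sequence[Coord],
               cutoff: float = 2.0) -> bool:
    """Treat a docking result as a success if its RMSD is below `cutoff` Å."""
    return rmsd(pose, reference) < cutoff


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    # Shifting every atom by 1 Å along z gives an RMSD of exactly 1 Å.
    docked = [(x, y, z + 1.0) for x, y, z in ref]
    print(rmsd(docked, ref), is_success(docked, ref))  # prints: 1.0 True
```

Averaging `rmsd` over all runs gives the "All Cases" column; averaging only over runs where `is_success` holds gives the last column.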
Table 2. The lowest energy and RMSD results.
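| PDB | Ligand | Torsions | HIGA Energy | HIGA RMSD | GA Energy | GA RMSD | LGA Energy | LGA RMSD | EDGA Energy | EDGA RMSD | CEPGA Energy | CEPGA RMSD | SODOCK Energy | SODOCK RMSD | ABC Energy | ABC RMSD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3ptb | ben | 0 | −12.22 | 1.95 | −10.31 | 1.66 | −11.53 | 1.92 | −12.18 | 1.95 | −11.72 | 1.90 | −11.57 | 2.00 | −10.95 | 1.97 |
| 1aha | ade | 1 | −16.15 | 0.90 | −15.26 | 1.20 | −16.10 | 0.45 | −16.23 | 0.38 | −15.32 | 0.89 | −14.95 | 1.44 | −13.95 | 1.85 |
| 3hvt | nvp | 2 | −18.19 | 0.45 | −15.78 | 0.40 | −17.22 | 0.33 | −17.52 | 0.53 | −17.90 | 0.30 | −16.78 | 0.58 | −15.95 | 0.68 |
| 1phg | hem | 3 | −9.58 | 0.60 | −7.48 | 1.25 | −8.56 | 0.80 | −9.51 | 0.70 | −9.32 | 0.64 | −9.15 | 1.34 | −7.95 | 1.67 |
| 2mcp | pc | 4 | −9.35 | 1.15 | −7.76 | 1.46 | −8.22 | 1.33 | −9.25 | 1.23 | −9.10 | 1.20 | −7.72 | 1.42 | −7.85 | 1.64 |
| 1stp | btn | 5 | −13.90 | 0.85 | −11.30 | 1.84 | −13.37 | 1.65 | −13.67 | 1.25 | −13.57 | 0.90 | −13.52 | 1.00 | −13.20 | 1.58 |
| 6rnt | ca | 6 | −9.62 | 0.50 | −8.78 | 0.65 | −9.13 | 0.70 | −9.23 | 0.85 | −9.32 | 0.58 | −9.12 | 1.95 | −8.95 | 1.45 |
| 4dfr | mtx | 7 | −12.82 | 1.56 | −10.13 | 0.90 | −11.44 | 1.23 | −12.74 | 1.59 | −12.12 | 1.90 | −11.74 | 1.67 | −10.21 | 1.97 |
| 1ett | 4qq | 8 | −13.94 | 1.40 | −11.42 | 1.62 | −13.89 | 1.38 | −12.79 | 1.20 | −14.21 | 1.29 | −12.08 | 1.54 | −12.75 | 1.65 |
| 1hri | s57 | 9 | −11.02 | 1.18 | −9.67 | 1.80 | −10.21 | 1.87 | −10.33 | 1.37 | −10.89 | 1.38 | −10.31 | 1.68 | −10.13 | 1.67 |
| 1hvr | xk2 | 10 | −31.50 | 0.80 | −22.35 | 1.08 | −30.85 | 0.62 | −28.10 | 0.75 | −31.06 | 0.64 | −29.29 | 0.68 | −28.65 | 0.85 |
| 4hmg | sia | 11 | −10.21 | 1.65 | −8.50 | 1.68 | −10.09 | 1.70 | −10.43 | 1.58 | −10.32 | 1.89 | −10.08 | 1.36 | −9.95 | 1.60 |
| 1cdg | mal | 12 | −8.90 | 1.65 | −7.32 | 1.69 | −8.22 | 1.94 | −8.11 | 1.54 | −8.73 | 1.48 | −8.45 | 1.80 | −7.13 | 1.12 |
| 1htf | g26 | 13 | −21.17 | 1.20 | −19.26 | 1.58 | −20.69 | 1.33 | −20.89 | 1.30 | −21.48 | 1.27 | −21.79 | 1.42 | −19.80 | 1.80 |
| 1glq | gtb | 14 | −9.65 | 1.25 | −8.73 | 1.27 | −9.27 | 1.87 | −9.37 | 1.60 | −9.46 | 1.38 | −8.83 | 1.90 | −9.23 | 1.58 |
| 1tmn | nas | 15 | −10.71 | 0.95 | −9.68 | 1.11 | −10.11 | 1.20 | −10.07 | 1.45 | −10.29 | 0.85 | −10.62 | 1.95 | −9.58 | 0.65 |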
Table 3. Results of the rate of clusters and the rate in rank 1.
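| Algorithm | HIGA | GA | LGA | EDGA | CEPGA | SODOCK | ABC |
|---|---|---|---|---|---|---|---|
| Number of clusters | 3.72 | 14.00 | 4.58 | 4.24 | 4.04 | 10.30 | 13.04 |
| Number in rank 1 | 17.04 | 8.00 | 15.72 | 15.92 | 16.20 | 11.82 | 8.70 |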
Table 4. Results of execution time.
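| PDB | Torsions | HIGA | GA | LGA | EDGA | CEPGA | SODOCK | ABC |
|---|---|---|---|---|---|---|---|---|
| 3ptb | 0 | 1.79 | 1.56 | 1.74 | 1.77 | 1.75 | 1.89 | 1.90 |
| 1aha | 1 | 1.91 | 1.62 | 1.80 | 1.83 | 1.90 | 1.95 | 2.08 |
| 3hvt | 2 | 1.97 | 1.75 | 1.93 | 1.95 | 1.96 | 1.94 | 2.13 |
| 1phg | 3 | 2.75 | 2.21 | 2.55 | 2.61 | 2.70 | 2.23 | 2.65 |
| 2mcp | 4 | 2.72 | 2.35 | 2.62 | 2.70 | 2.68 | 2.51 | 2.69 |
| 1stp | 5 | 3.79 | 3.01 | 3.53 | 3.77 | 3.59 | 3.84 | 3.81 |
| 6rnt | 6 | 3.84 | 3.12 | 3.14 | 3.70 | 3.54 | 4.27 | 3.64 |
| 4dfr | 7 | 5.47 | 4.98 | 5.18 | 5.33 | 5.45 | 4.99 | 6.06 |
| 1ett | 8 | 8.44 | 7.83 | 8.17 | 8.28 | 8.37 | 8.29 | 7.97 |
| 1hri | 9 | 10.49 | 10.15 | 10.37 | 10.39 | 10.41 | 12.34 | 11.06 |
| 1hvr | 10 | 12.53 | 12.07 | 12.13 | 12.28 | 12.51 | 14.86 | 11.92 |
| 4hmg | 11 | 13.90 | 13.21 | 13.84 | 13.95 | 13.89 | 16.09 | 13.60 |
| 1cdg | 12 | 12.71 | 12.38 | 12.59 | 12.61 | 12.66 | 15.92 | 13.06 |
| 1htf | 13 | 13.01 | 12.49 | 12.78 | 12.99 | 12.84 | 16.26 | 13.18 |
| 1glq | 14 | 15.87 | 14.95 | 15.50 | 15.77 | 15.85 | 20.12 | 17.22 |
| Average | | 7.41 | 6.91 | 7.19 | 7.33 | 7.34 | 8.50 | 7.53 |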
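The Average row can be reproduced from the per-complex rows. The short check below, limited to three columns for brevity, recomputes the mean over the 15 listed complexes (1tmn, which appears in Table 2, has no row in Table 4) and matches the reported 7.41, 6.91 and 8.50. The table does not state the time unit, and the sketch does not assume one.

```python
from statistics import mean

# Per-complex execution times copied from Table 4 (3ptb ... 1glq, 15 complexes).
execution_times = {
    "HIGA": [1.79, 1.91, 1.97, 2.75, 2.72, 3.79, 3.84, 5.47,
             8.44, 10.49, 12.53, 13.90, 12.71, 13.01, 15.87],
    "GA": [1.56, 1.62, 1.75, 2.21, 2.35, 3.01, 3.12, 4.98,
           7.83, 10.15, 12.07, 13.21, 12.38, 12.49, 14.95],
    "SODOCK": [1.89, 1.95, 1.94, 2.23, 2.51, 3.84, 4.27, 4.99,
               8.29, 12.34, 14.86, 16.09, 15.92, 16.26, 20.12],
}

# Arithmetic mean per algorithm, rounded to the two decimals used in the table.
averages = {name: round(mean(times), 2) for name, times in execution_times.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")  # HIGA: 7.41, GA: 6.91, SODOCK: 8.50
```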
Table 5. Results of hypothesis test.
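| PDB | Algorithm | HIGA | GA | LGA | EDGA | CEPGA | SODOCK | ABC |
|---|---|---|---|---|---|---|---|---|
| 3ptb | HIGA | - | 0.004 | 0.010 | 0.044 | 0.035 | 0.012 | 0.008 |
| | GA | 0.996 | - | 0.992 | 0.995 | 0.994 | 0.993 | 0.688 |
| | LGA | 0.990 | 0.008 | - | 0.986 | 0.982 | 0.563 | 0.425 |
| | EDGA | 0.956 | 0.005 | 0.014 | - | 0.340 | 0.017 | 0.010 |
| | CEPGA | 0.965 | 0.006 | 0.018 | 0.660 | - | 0.028 | 0.011 |
| | SODOCK | 0.988 | 0.007 | 0.437 | 0.983 | 0.972 | - | 0.306 |
| | ABC | 0.992 | 0.312 | 0.575 | 0.990 | 0.989 | 0.694 | - |
| 1aha | HIGA | - | 0.029 | 0.062 | 0.524 | 0.044 | 0.024 | 0.017 |
| | GA | 0.971 | - | 0.964 | 0.988 | 0.905 | 0.342 | 0.260 |
| | LGA | 0.938 | 0.036 | - | 0.981 | 0.460 | 0.030 | 0.024 |
| | EDGA | 0.476 | 0.012 | 0.019 | - | 0.017 | 0.010 | 0.005 |
| | CEPGA | 0.956 | 0.095 | 0.540 | 0.983 | - | 0.038 | 0.027 |
| | SODOCK | 0.976 | 0.658 | 0.970 | 0.990 | 0.962 | - | 0.470 |
| | ABC | 0.983 | 0.740 | 0.976 | 0.995 | 0.973 | 0.530 | - |
| 3hvt | HIGA | - | 0.005 | 0.017 | 0.020 | 0.023 | 0.012 | 0.008 |
| | GA | 0.995 | - | 0.795 | 0.974 | 0.993 | 0.788 | 0.537 |
| | LGA | 0.983 | 0.205 | - | 0.878 | 0.965 | 0.324 | 0.208 |
| | EDGA | 0.980 | 0.026 | 0.122 | - | 0.515 | 0.103 | 0.033 |
| | CEPGA | 0.977 | 0.007 | 0.035 | 0.485 | - | 0.029 | 0.011 |
| | SODOCK | 0.988 | 0.212 | 0.676 | 0.897 | 0.971 | - | 0.215 |
| | ABC | 0.992 | 0.463 | 0.792 | 0.967 | 0.989 | 0.785 | - |
| 1phg | HIGA | - | 0.006 | 0.015 | 0.242 | 0.046 | 0.023 | 0.010 |
| | GA | 0.994 | - | 0.682 | 0.991 | 0.983 | 0.968 | 0.545 |
| | LGA | 0.985 | 0.318 | - | 0.976 | 0.962 | 0.620 | 0.422 |
| | EDGA | 0.758 | 0.009 | 0.024 | - | 0.440 | 0.038 | 0.015 |
| | CEPGA | 0.954 | 0.017 | 0.038 | 0.560 | - | 0.045 | 0.028 |
| | SODOCK | 0.977 | 0.032 | 0.380 | 0.962 | 0.955 | - | 0.240 |
| | ABC | 0.990 | 0.455 | 0.578 | 0.985 | 0.972 | 0.760 | - |
| 2mcp | HIGA | - | 0.002 | 0.008 | 0.036 | 0.015 | 0.001 | 0.004 |
| | GA | 0.998 | - | 0.792 | 0.994 | 0.992 | 0.492 | 0.610 |
| | LGA | 0.992 | 0.208 | - | 0.988 | 0.987 | 0.092 | 0.224 |
| | EDGA | 0.964 | 0.006 | 0.012 | - | 0.442 | 0.005 | 0.011 |
| | CEPGA | 0.985 | 0.008 | 0.013 | 0.558 | - | 0.007 | 0.012 |
| | SODOCK | 0.999 | 0.508 | 0.908 | 0.995 | 0.993 | - | 0.640 |
| | ABC | 0.996 | 0.390 | 0.776 | 0.989 | 0.988 | 0.360 | - |
| 1stp | HIGA | - | 0.001 | 0.006 | 0.048 | 0.033 | 0.014 | 0.003 |
| | GA | 0.999 | - | 0.791 | 0.964 | 0.952 | 0.892 | 0.587 |
| | LGA | 0.994 | 0.209 | - | 0.942 | 0.887 | 0.624 | 0.450 |
| | EDGA | 0.952 | 0.036 | 0.058 | - | 0.414 | 0.082 | 0.043 |
| | CEPGA | 0.967 | 0.048 | 0.113 | 0.586 | - | 0.208 | 0.062 |
| | SODOCK | 0.986 | 0.108 | 0.376 | 0.918 | 0.792 | - | 0.215 |
| | ABC | 0.997 | 0.413 | 0.550 | 0.957 | 0.938 | 0.785 | - |
| 6rnt | HIGA | - | 0.005 | 0.020 | 0.022 | 0.038 | 0.015 | 0.007 |
| | GA | 0.995 | - | 0.942 | 0.965 | 0.985 | 0.695 | 0.588 |
| | LGA | 0.980 | 0.058 | - | 0.792 | 0.888 | 0.368 | 0.127 |
| | EDGA | 0.978 | 0.035 | 0.208 | - | 0.504 | 0.059 | 0.040 |
| | CEPGA | 0.962 | 0.015 | 0.112 | 0.496 | - | 0.047 | 0.025 |
| | SODOCK | 0.985 | 0.305 | 0.632 | 0.941 | 0.953 | - | 0.404 |
| | ABC | 0.993 | 0.412 | 0.873 | 0.960 | 0.975 | 0.596 | - |
| 4dfr | HIGA | - | 0.002 | 0.006 | 0.035 | 0.032 | 0.021 | 0.003 |
| | GA | 0.998 | - | 0.961 | 0.993 | 0.985 | 0.972 | 0.587 |
| | LGA | 0.994 | 0.039 | - | 0.966 | 0.950 | 0.624 | 0.450 |
| | EDGA | 0.965 | 0.007 | 0.034 | - | 0.490 | 0.042 | 0.013 |
| | CEPGA | 0.968 | 0.015 | 0.050 | 0.510 | - | 0.060 | 0.016 |
| | SODOCK | 0.979 | 0.028 | 0.376 | 0.958 | 0.940 | - | 0.215 |
| | ABC | 0.997 | 0.413 | 0.550 | 0.987 | 0.984 | 0.785 | - |
| 1ett | HIGA | - | 0.008 | 0.082 | 0.060 | 0.515 | 0.044 | 0.057 |
| | GA | 0.992 | - | 0.986 | 0.925 | 0.998 | 0.562 | 0.637 |
| | LGA | 0.918 | 0.014 | - | 0.482 | 0.932 | 0.073 | 0.151 |
| | EDGA | 0.940 | 0.075 | 0.518 | - | 0.950 | 0.077 | 0.205 |
| | CEPGA | 0.485 | 0.002 | 0.068 | 0.050 | - | 0.040 | 0.045 |
| | SODOCK | 0.956 | 0.432 | 0.927 | 0.923 | 0.960 | - | 0.520 |
| | ABC | 0.943 | 0.363 | 0.849 | 0.795 | 0.955 | 0.480 | - |
| 1hri | HIGA | - | 0.001 | 0.018 | 0.025 | 0.041 | 0.021 | 0.004 |
| | GA | 0.999 | - | 0.976 | 0.997 | 0.998 | 0.982 | 0.635 |
| | LGA | 0.982 | 0.024 | - | 0.862 | 0.890 | 0.723 | 0.117 |
| | EDGA | 0.975 | 0.003 | 0.138 | - | 0.504 | 0.140 | 0.015 |
| | CEPGA | 0.959 | 0.002 | 0.110 | 0.406 | - | 0.130 | 0.010 |
| | SODOCK | 0.979 | 0.018 | 0.277 | 0.860 | 0.870 | - | 0.020 |
| | ABC | 0.996 | 0.365 | 0.883 | 0.985 | 0.990 | 0.980 | - |
| 1hvr | HIGA | - | 0.002 | 0.030 | 0.012 | 0.045 | 0.021 | 0.014 |
| | GA | 0.998 | - | 0.995 | 0.562 | 0.996 | 0.942 | 0.665 |
| | LGA | 0.970 | 0.005 | - | 0.026 | 0.957 | 0.177 | 0.044 |
| | EDGA | 0.988 | 0.438 | 0.974 | - | 0.982 | 0.711 | 0.609 |
| | CEPGA | 0.955 | 0.004 | 0.043 | 0.018 | - | 0.035 | 0.024 |
| | SODOCK | 0.979 | 0.058 | 0.823 | 0.289 | 0.965 | - | 0.368 |
| | ABC | 0.986 | 0.335 | 0.956 | 0.391 | 0.976 | 0.632 | - |
| 4hmg | HIGA | - | 0.022 | 0.072 | 0.522 | 0.514 | 0.054 | 0.032 |
| | GA | 0.978 | - | 0.972 | 0.995 | 0.985 | 0.928 | 0.900 |
| | LGA | 0.928 | 0.028 | - | 0.980 | 0.945 | 0.417 | 0.214 |
| | EDGA | 0.478 | 0.005 | 0.020 | - | 0.487 | 0.017 | 0.010 |
| | CEPGA | 0.486 | 0.015 | 0.045 | 0.513 | - | 0.042 | 0.026 |
| | SODOCK | 0.946 | 0.072 | 0.583 | 0.983 | 0.958 | - | 0.240 |
| | ABC | 0.968 | 0.100 | 0.786 | 0.990 | 0.974 | 0.760 | - |
| 1cdg | HIGA | - | 0.003 | 0.021 | 0.014 | 0.037 | 0.032 | 0.001 |
| | GA | 0.997 | - | 0.883 | 0.784 | 0.995 | 0.985 | 0.408 |
| | LGA | 0.979 | 0.117 | - | 0.483 | 0.952 | 0.763 | 0.105 |
| | EDGA | 0.986 | 0.216 | 0.517 | - | 0.983 | 0.844 | 0.125 |
| | CEPGA | 0.963 | 0.005 | 0.048 | 0.017 | - | 0.059 | 0.003 |
| | SODOCK | 0.968 | 0.015 | 0.237 | 0.156 | 0.941 | - | 0.012 |
| | ABC | 0.999 | 0.592 | 0.895 | 0.875 | 0.997 | 0.988 | - |
| 1htf | HIGA | - | 0.015 | 0.243 | 0.480 | 0.544 | 0.624 | 0.030 |
| | GA | 0.985 | - | 0.973 | 0.974 | 0.985 | 0.987 | 0.637 |
| | LGA | 0.753 | 0.027 | - | 0.652 | 0.759 | 0.883 | 0.251 |
| | EDGA | 0.520 | 0.016 | 0.348 | - | 0.618 | 0.640 | 0.235 |
| | CEPGA | 0.456 | 0.015 | 0.241 | 0.382 | - | 0.538 | 0.028 |
| | SODOCK | 0.376 | 0.013 | 0.127 | 0.360 | 0.462 | - | 0.017 |
| | ABC | 0.970 | 0.363 | 0.749 | 0.765 | 0.972 | 0.983 | - |
| 1glq | HIGA | - | 0.001 | 0.012 | 0.018 | 0.022 | 0.002 | 0.005 |
| | GA | 0.999 | - | 0.982 | 0.991 | 0.995 | 0.695 | 0.788 |
| | LGA | 0.988 | 0.018 | - | 0.955 | 0.965 | 0.163 | 0.227 |
| | EDGA | 0.982 | 0.009 | 0.045 | - | 0.510 | 0.039 | 0.042 |
| | CEPGA | 0.978 | 0.005 | 0.035 | 0.490 | - | 0.027 | 0.031 |
| | SODOCK | 0.998 | 0.305 | 0.837 | 0.961 | 0.973 | - | 0.704 |
| | ABC | 0.995 | 0.212 | 0.773 | 0.958 | 0.969 | 0.296 | - |
| 1tmn | HIGA | - | 0.004 | 0.023 | 0.012 | 0.028 | 0.043 | 0.002 |
| | GA | 0.996 | - | 0.783 | 0.692 | 0.965 | 0.983 | 0.408 |
| | LGA | 0.977 | 0.217 | - | 0.283 | 0.716 | 0.746 | 0.105 |
| | EDGA | 0.988 | 0.308 | 0.717 | - | 0.810 | 0.975 | 0.125 |
| | CEPGA | 0.972 | 0.035 | 0.284 | 0.190 | - | 0.644 | 0.028 |
| | SODOCK | 0.953 | 0.017 | 0.256 | 0.025 | 0.356 | - | 0.012 |
| | ABC | 0.998 | 0.592 | 0.895 | 0.875 | 0.972 | 0.988 | - |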
Table 6. Pseudocode of CE crossover.
Algorithm: CE Crossover
Input: (1) a population with n individuals, (2) an elitist e.
Output: a population after CE crossover
01. For i := 1 to n do
02.   Find the historical optimal individual m_0
03.   If the fitness of the current individual m_i < the fitness of m_0 then
04.     m_0 = m_i
05.     e = m_0
06.     Preserve M_father and M_mother
07.   End if
08.   Add M_father, M_mother and e to the next population
09. End for
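One possible reading of Table 6 is sketched below in Python. It assumes lower fitness is better (as with a docking energy), that each individual is paired with a randomly chosen mate, that recombination is arithmetic (the table does not name an operator), and that the enlarged candidate pool is truncated back to n individuals by fitness. `ce_crossover` and `arithmetic_crossover` are hypothetical names, not the authors' code.

```python
import random
from typing import Callable, List, Optional, Tuple

Individual = List[float]
Fitness = Callable[[Individual], float]  # lower is better, e.g. a docking energy


def arithmetic_crossover(father: Individual, mother: Individual,
                         rng: random.Random) -> Individual:
    """Blend two parents gene by gene (assumed operator; Table 6 names none)."""
    w = rng.random()
    return [w * f + (1.0 - w) * m for f, m in zip(father, mother)]


def ce_crossover(population: List[Individual], fitness: Fitness,
                 elite: Optional[Individual],
                 rng: random.Random) -> Tuple[List[Individual], Individual]:
    """One generation of crossover with elitist preservation (sketch of Table 6)."""
    n = len(population)
    # m0: historical optimal individual, carried between generations via `elite`.
    m0 = elite if elite is not None else min(population, key=fitness)
    pool: List[Individual] = []
    for father in population:
        mother = population[rng.randrange(n)]   # assumed random mate selection
        child = arithmetic_crossover(father, mother, rng)
        if fitness(child) < fitness(m0):
            m0 = child                          # steps 04-05: new elitist e
            pool.extend([father, mother])       # step 06: preserve its parents
        pool.append(child)
    pool.append(m0)                             # step 08: e enters the next population
    # Assumption: truncate the pool back to n individuals, fittest first.
    return sorted(pool, key=fitness)[:n], m0


if __name__ == "__main__":
    def sphere(x: Individual) -> float:
        # Toy stand-in for a docking energy: minimum 0 at the origin.
        return sum(v * v for v in x)

    rng = random.Random(42)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(10)]
    new_pop, elite = ce_crossover(pop, sphere, None, rng)
    print(len(new_pop), sphere(elite) <= min(sphere(p) for p in pop))  # prints: 10 True
```

Passing the returned elitist back in as `elite` on the next call is what lets m_0 act as a running-history record rather than a per-generation best.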
Table 7. Pseudocode of ED mutation.
Algorithm: ED Mutation
Input: (1) a population with n individuals, (2) balance factor β.
Output: a population after ED mutation
01. For i := 1 to n do
02.   Find the optimal solution M_optimum and the historical suboptimal solution M_sub
03.   If θ < β then
04.     m_i = M_min + θ (M_max − M_min)
05.   Else
06.     m_i = m_i + θ (M_optimum − M_sub) + δ (M_optimum − m_i)
07.   End if
08. End for
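A hedged Python reading of Table 7 follows. The table does not define θ and δ; the sketch assumes both are uniform random numbers in [0, 1) drawn per individual, treats M_min and M_max as per-gene search bounds, and clamps directed moves back into those bounds. For the restart branch the table reuses θ, which taken literally would confine restarts to the lower β-fraction of the box, so the sketch draws a fresh uniform number per gene instead; that substitution is an assumption. `ed_mutation` is an illustrative name.

```python
import random
from typing import List, Sequence

Individual = List[float]


def ed_mutation(population: Sequence[Individual],
                m_optimum: Sequence[float], m_sub: Sequence[float],
                lower: Sequence[float], upper: Sequence[float],
                beta: float, rng: random.Random) -> List[Individual]:
    """Evolution-direction mutation (sketch of Table 7).

    m_optimum, m_sub: best and historical suboptimal solutions.
    lower, upper:     M_min and M_max, the per-gene search bounds.
    beta:             balance factor between random restart and directed moves.
    """
    mutated: List[Individual] = []
    for m in population:
        theta = rng.random()  # assumed: theta ~ U(0, 1)
        if theta < beta:
            # Exploration: M_min + u * (M_max - M_min), fresh u per gene (assumption).
            new = [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        else:
            delta = rng.random()  # assumed: delta ~ U(0, 1)
            # Exploitation: follow the direction M_optimum - M_sub and pull toward M_optimum.
            new = [x + theta * (o - s) + delta * (o - x)
                   for x, o, s in zip(m, m_optimum, m_sub)]
            # Assumption: clamp back into the search box.
            new = [min(max(v, lo), hi) for v, lo, hi in zip(new, lower, upper)]
        mutated.append(new)
    return mutated
```

With β = 0 every individual takes the directed step; with β = 1 every individual is re-sampled inside the bounds, so β trades exploitation against exploration.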
Table 8. Pseudocode of BSP tree.
Algorithm: BSP Tree
Input: (1) an individual m, (2) BSP tree T, (3) revisit flag RF
Output: an individual m that never revisits
01. Curr_node := root node of T
02. RF = 0
03. If (Curr_node has two child nodes l and r) then
04.   Compare m with child nodes l and r
05.   If (m = l) or (m = r) then
06.     RF = 1
07.   End if
08.   If d(l, m) < d(r, m) then
09.     Curr_node := child node l
10.   Else
11.     Curr_node := child node r
12.   End if
13.   Repeat steps 03-12
14. Else
15.   If (RF = 0) then
16.     Insert a child node into Curr_node that records m
17.   Else
18.     Create a new child node by mutating m
19.   End if
20. End if
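The descent in Table 8 can be sketched in Python as follows, under stated assumptions: solutions are tuples of floats compared exactly (a tolerance could replace `==`); the mutation in step 18 is a Gaussian perturbation with a hypothetical step size `sigma`; the mutated point is re-inserted from the root so that it is checked again; and, slightly stricter than the printed steps, the root and a lone child are also compared with m. `RevisitTree` and its methods are illustrative names. This is a simplified archive that mirrors the table's steps, not a full space-partitioning implementation.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                                    # a previously visited solution
    children: List["Node"] = field(default_factory=list)


class RevisitTree:
    """Archive of visited solutions following the descent in Table 8 (sketch)."""

    def __init__(self, root: Sequence[float], sigma: float = 0.1,
                 max_tries: int = 100, seed: int = 0) -> None:
        self.root = Node(tuple(root))
        self.sigma = sigma              # assumed Gaussian mutation step
        self.max_tries = max_tries
        self.rng = random.Random(seed)

    def _descend(self, m: Point) -> Tuple[Node, bool]:
        """Steps 01-13: walk to a node with fewer than two children, setting RF."""
        curr = self.root
        rf = (m == curr.point)          # extra check: the root is never a child
        while len(curr.children) == 2:
            left, right = curr.children
            if m == left.point or m == right.point:
                rf = True
            curr = left if math.dist(left.point, m) < math.dist(right.point, m) else right
        # Extra check: a single existing child is not covered by step 05.
        rf = rf or any(m == child.point for child in curr.children)
        return curr, rf

    def insert(self, m: Sequence[float]) -> Tuple[Point, bool]:
        """Steps 14-20: record m, or mutate it until it is new. Returns (point, revisited)."""
        candidate = tuple(m)
        revisited = False
        for _ in range(self.max_tries):
            node, rf = self._descend(candidate)
            if not rf:
                node.children.append(Node(candidate))
                return candidate, revisited
            revisited = True
            candidate = tuple(x + self.rng.gauss(0.0, self.sigma) for x in candidate)
        raise RuntimeError("could not find an unvisited point")
```

For example, inserting `(1.0, 1.0)` twice into `RevisitTree((0.0, 0.0))` records the point the first time and returns a perturbed neighbour, flagged as a revisit, the second time.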

MDPI and ACS Style

Guan, B.; Zhang, C.; Zhao, Y. HIGA: A Running History Information Guided Genetic Algorithm for Protein–Ligand Docking. Molecules 2017, 22, 2233. https://doi.org/10.3390/molecules22122233

Chicago/Turabian Style

Guan, Boxin, Changsheng Zhang, and Yuhai Zhao. 2017. "HIGA: A Running History Information Guided Genetic Algorithm for Protein–Ligand Docking" Molecules 22, no. 12: 2233. https://doi.org/10.3390/molecules22122233
