In this section, a series of analyses is reported covering algorithmic performance comparisons, mono- versus multi-objective approaches, and ligand binding sites and molecular interactions. Furthermore, a real case study based on the aeroplysinin-1 compound is described for a drug design application.
4.1. Performance Comparisons
Table 1 shows the median and interquartile range of the computed solutions for the IHV and Iɛ+ quality indicators for the set of 11 docking instances and the six algorithms being compared: NSGA-II, ssNSGA-II, SMPSO, GDE3, MOEA/D and SMS-EMOA. In the case of IHV, the higher the median value, the better the result, whereas for Iɛ+, the lower the value, the higher the quality of the solutions.
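As a minimal sketch of how the median and interquartile range (IQR) summarizing one indicator over independent runs could be computed, the snippet below uses Python's standard `statistics` module. The per-run values are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption; other tools may interpolate quartiles differently.

```python
import statistics

def median_iqr(values):
    """Return (median, IQR) of one quality indicator over independent runs."""
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1

if __name__ == "__main__":
    # Hypothetical IHV values from five runs of one algorithm on one instance
    hv_runs = [0.61, 0.64, 0.58, 0.66, 0.63]
    print(median_iqr(hv_runs))
```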
The hyper-volume is a quality indicator that takes into account both convergence and diversity. According to the reported results, SMPSO achieves the best IHV values in seven out of the eleven considered problems, and GDE3 is the second best performing technique. The Iɛ+ indicator provides a measure of convergence, and the values in Table 1 confirm SMPSO as the metaheuristic providing the best overall performance.
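To make the two indicators concrete, the following is a minimal sketch for two minimized objectives, assumed here to be (Einter, Eintra). The hypervolume is the area dominated by the front and bounded by a reference point; the additive epsilon indicator is the smallest shift that makes the approximation weakly dominate every point of the reference front. The function names and the reference point are illustrative assumptions, and the objective normalization usually applied before computing these indicators is omitted.

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-objective front (both minimized) and bounded by ref."""
    # Only points strictly better than the reference point contribute.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:          # sweep by increasing first objective
        if f2 < prev_f2:        # dominated points add no new area
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

def additive_epsilon(approx, reference):
    """Smallest eps such that each reference point is weakly dominated by
    some approximation point shifted by -eps (minimization)."""
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in approx)
        for r in reference
    )

if __name__ == "__main__":
    front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    print(hypervolume_2d(front, (4.0, 4.0)))
    print(additive_epsilon([(2.0, 3.0)], [(1.0, 3.0), (2.0, 2.0)]))
```

Higher hypervolume and lower epsilon are better, matching the reading of Table 1 described above.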
To present these results with statistical confidence (with a significance level of α = 0.05 in this study), a series of non-parametric statistical tests has been applied, since in several cases the distributions of the results did not satisfy the conditions of normality and homoscedasticity [26]. Therefore, the analyses and comparisons focus on the entire distribution of each of the two metrics studied. Specifically, Friedman's ranking and Holm's post hoc multi-comparison tests [26] have been applied to determine which algorithms are statistically worse than the control one (the algorithm with the best ranking). This combination of Friedman's ranking and Holm's post hoc test has been shown to offer a well-balanced statistical framework, particularly for results that do not follow a normal distribution [26]. Friedman's test establishes a first ranking of the algorithms in order to choose a control sample (the best-ranked algorithm), which is then used in Holm's post hoc test for multiple comparisons. This statistical framework has been recommended in the specialized literature [26], as it is well suited to comparing stochastic methods such as metaheuristics. In this regard, as shown in Table 2, GDE3 reaches the best Friedman ranking value (1.81) for the HV indicator, followed by SMPSO, MOEA/D, SMS-EMOA, NSGA-II and ssNSGA-II. Therefore, GDE3 is established as the control algorithm for HV in Holm's post hoc test and is compared with each of the remaining algorithms. The adjusted p-values resulting from these comparisons (reported in Table 2) are lower than the significance level for the last three algorithms (SMS-EMOA, NSGA-II and ssNSGA-II), meaning that GDE3 is statistically better than these algorithms. In the case of Iɛ+, SMPSO is better ranked than MOEA/D and GDE3, although without statistically significant differences in these cases, and it is statistically better than NSGA-II, SMS-EMOA and ssNSGA-II.
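The ranking-plus-post-hoc procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it computes Friedman average ranks (ties receive the mean rank) and then compares the best-ranked control algorithm against the others with the usual statistic z = (R_j − R_0) / sqrt(k(k + 1) / (6N)), for k algorithms and N instances (here k = 6 and N = 11), followed by Holm's step-down adjustment. The Friedman chi-square omnibus statistic itself is omitted, and the input table and function names are hypothetical.

```python
import math

def average_ranks(scores, higher_is_better=True):
    """Friedman average ranks; scores[i][j] = indicator of algorithm j on instance i."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j], reverse=higher_is_better)
        pos = 0
        while pos < k:
            end = pos
            # Extend the block while the next algorithm has a tied value.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            tied_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos..end
            for j in order[pos:end + 1]:
                totals[j] += tied_rank
            pos = end + 1
    return [t / len(scores) for t in totals]

def holm_vs_control(avg_ranks, n_instances):
    """Compare the best-ranked (control) algorithm with the rest, Holm step-down."""
    k = len(avg_ranks)
    control = min(range(k), key=lambda j: avg_ranks[j])
    se = math.sqrt(k * (k + 1) / (6.0 * n_instances))
    raw = {}
    for j in range(k):
        if j != control:
            z = (avg_ranks[j] - avg_ranks[control]) / se
            raw[j] = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    m = len(raw)
    adjusted, running_max = {}, 0.0
    for step, (j, p) in enumerate(sorted(raw.items(), key=lambda kv: kv[1])):
        # Multiply the smallest p by m, the next by m - 1, ...; keep monotone.
        running_max = max(running_max, min(1.0, (m - step) * p))
        adjusted[j] = running_max
    return control, adjusted

if __name__ == "__main__":
    # Hypothetical indicator values: 3 instances x 3 algorithms (higher is better).
    table = [[0.9, 0.5, 0.1], [0.8, 0.6, 0.2], [0.7, 0.4, 0.3]]
    ranks = average_ranks(table)
    control, adjusted = holm_vs_control(ranks, n_instances=len(table))
    print(ranks, control, adjusted)
```

An algorithm j is declared statistically worse than the control when `adjusted[j] < 0.05`. For Iɛ+, where lower is better, `average_ranks` would be called with `higher_is_better=False`.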
Overall, summing up all ranking positions (as shown in the right-hand column of Table 2), we can observe that SMPSO shows the best overall balance between the two quality indicators. In addition, this algorithm obtained statistically better results than NSGA-II, SMS-EMOA and ssNSGA-II. GDE3 obtained the second best performance, followed by MOEA/D.
These results are graphically supported by the two examples included in Figure 3, where the fronts with the highest hyper-volume values obtained by SMPSO, GDE3 and MOEA/D for instances 1BV9 and 1D4K are plotted against the reference front (RF, black curve). In this figure, it can easily be observed that SMPSO obtains solutions in regions of the reference front where GDE3 and MOEA/D do not converge. These last two algorithms show a good spread of solutions in their Pareto front approximations, but their convergence is limited to a single region of the reference front. An interesting observation in this sense is that, for all of the studied molecular instances, SMPSO converges to the region biased towards the Einter objective (left-hand side of the reference front), whereas GDE3 and MOEA/D generate their fronts of non-dominated solutions in a different region, focusing instead on the optimization of the intramolecular energy (right-hand side of the reference front).
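A reference front is a set of mutually non-dominated solutions. As a hedged sketch, assuming both objectives (Einter, Eintra) are minimized and that the front is built by merging the fronts of several runs and discarding dominated points (the construction described for Figure 5 in the next subsection), it could be computed as follows. The function names are illustrative.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def reference_front(fronts):
    """Merge several fronts and keep only the non-dominated points."""
    # dict.fromkeys drops exact duplicates while preserving order.
    pool = list(dict.fromkeys(tuple(p) for front in fronts for p in front))
    return sorted(p for p in pool if not any(dominates(q, p) for q in pool))

if __name__ == "__main__":
    run_a = [(1.0, 3.0), (2.0, 2.0)]
    run_b = [(1.5, 2.5), (3.0, 1.0), (2.0, 3.0)]  # (2.0, 3.0) is dominated
    print(reference_front([run_a, run_b]))
```

The quadratic pairwise check is adequate for fronts of a few hundred points; larger pools would call for a sort-based filter.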
These results suggest that the specific learning procedures of SMPSO and GDE3 lead these algorithms to search in different regions of the problem landscape, thereby generating solutions in complementary parts of the reference front.
4.2. Comparison with a Mono-Objective Approach
To analyze the benefit of using the multi-objective formulation of molecular docking from the point of view of the decision maker (i.e., a biologist), we have also solved the problem instances with the mono-objective LGA technique provided by AutoDock 4.2. To allow comparisons, we have used Equation (1) to recalculate the mono-objective fitness values of the Pareto front approximations yielded by SMPSO.
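Equation (1) is not reproduced in this section. Assuming, for illustration only, that the mono-objective fitness reduces to the sum Einter + Eintra (the best LGA solution is described below as a sum of these energy terms), the best recomputed value over all SMPSO runs could be obtained as sketched here. Note that AutoDock's estimated free energy of binding includes further terms (e.g., a torsional free energy penalty) that this sketch ignores; the function name and all values are hypothetical.

```python
def best_scalar_solution(runs):
    """Lowest Einter + Eintra over all fronts; each point is (e_inter, e_intra)."""
    # Assumption: this sum stands in for the paper's Equation (1).
    best = min((p for front in runs for p in front), key=lambda p: p[0] + p[1])
    return best, best[0] + best[1]

if __name__ == "__main__":
    runs = [[(-10.0, -1.0), (-8.0, -3.5)], [(-9.0, -2.0)]]  # hypothetical fronts (kcal/mol)
    lga_best = -10.2  # hypothetical best LGA value (kcal/mol)
    point, value = best_scalar_solution(runs)
    print(point, value, "SMPSO better" if value < lga_best else "LGA better")
```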
shows a mono-objective comparison of the best solutions obtained (out of 30 independent runs) by SMPSO and LGA for all of the instances. In general, SMPSO outperforms LGA on all problem instances except 1HTF and 1HTG, although the binding energies obtained by both algorithms are close in these two cases. For 1HTF, this result may be explained by the small size of the ligand: as reported in [19], experiments that involve small ligands and flexible ARG-8 side chains in the macromolecule increase the size of the conformational space, which can explain the final binding energy obtained by SMPSO compared with LGA. The 1HTG complex, in contrast, includes a larger HIV-protease inhibitor. Although SMPSO and LGA obtain different results for this instance, the difference in final binding energy is smaller than for the rest of the instances (except 1HTF), and it may be due to the stochastic nature of the algorithms. Furthermore, it is worth mentioning that the 1HTG ligand conformation computed by SMPSO is bound to the active site. Therefore, even when evaluated with the mono-objective formulation (as done in AutoDock), the general-purpose multi-objective approach based on SMPSO provides experts with better optimized solutions than LGA in most instances, even though the latter technique is specifically designed for the molecular docking problem.
shows the front with the highest hyper-volume value obtained by SMPSO for instance 1AJV. This instance involves a cyclic urea inhibitor and an HIV-protease macromolecule, a complex problem given the features of the ligand. The best LGA solution is presented as a point with a dashed vertical line, its value being the sum of the Einter and Eintra energies resulting from the LGA algorithm. The front solutions to the left of the dashed vertical line dominate the best LGA solution, while those to the right have better energy values in the Eintra objective. In this regard, the choice of favoring one energy or the other is left to the biology expert. For example, the biologist may be interested either in a solution with a smaller Eintra, and therefore a more stable inhibitor conformation, or in a solution with a smaller Einter, and therefore a more stable ligand-macromolecule complex. Moreover, considering the mono-objective fitness function, the solutions obtained by SMPSO generally show a more stable docking conformation than those of LGA.
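To make the dominance relation between front solutions and the best LGA solution concrete, the following sketch classifies each front point relative to a reference point under standard Pareto dominance, assuming both energies are minimized. The numbers are hypothetical and do not correspond to the 1AJV results.

```python
def dominates(a, b):
    """True if a Pareto-dominates b when every objective is minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def compare_with_reference(front, ref):
    """Split front points into those dominating ref, dominated by ref, or incomparable."""
    groups = {"dominates": [], "dominated": [], "incomparable": []}
    for p in front:
        if dominates(p, ref):
            groups["dominates"].append(p)
        elif dominates(ref, p):
            groups["dominated"].append(p)
        else:  # trade-off between the two energies (or identical to ref)
            groups["incomparable"].append(p)
    return groups

if __name__ == "__main__":
    lga_point = (-7.0, -1.0)  # hypothetical (Einter, Eintra) of the best LGA solution
    front = [(-9.0, -1.5), (-6.0, -2.0), (-5.0, -0.5)]
    for label, pts in compare_with_reference(front, lga_point).items():
        print(label, pts)
```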
In Figure 4, the inhibitor docked to the HIV-protease macromolecule of instance 1AJV can also be observed. The ligand-macromolecule complexes in this figure show the inhibitor conformation resulting from the SMPSO solution and the inhibitor conformation from LGA. Given the energy results obtained, the first complex is energetically more stable than the second one: the binding energy of the first one is −11.57 kcal/mol and that of the second one is −7.26 kcal/mol. Both inhibitors are docked to the active site of the HIV-protease macromolecule, but the SMPSO inhibitor conformation has a better docking position.
4.3. Analysis on Ligand Binding Site and Molecular Interactions
In addition to the analysis done in the previous subsection, we have carried out a further comparison of the SMPSO and LGA algorithms in terms of the resulting RMSD (Å) values. For this purpose, we have focused on instances 1D4K and 1BV9, in order to extend our previous analyses of the reference fronts in Figure 3. Figure 5 shows the reference fronts generated from all of the non-dominated solutions obtained in 30 runs of the SMPSO algorithm for instances 1D4K and 1BV9. The colored vertical bars depict the RMSD value obtained for each solution. These RMSDs were calculated from the distances between the atomic coordinates of the computed and reference ligands (as the root-mean-square of those distances). As shown in this figure, the bars representing better RMSD results (a smaller distance between computed and reference ligands) are darker than those depicting worse RMSD results. According to the solutions shown in Figure 5 for instances 1BV9 and 1D4K, the solutions with worse RMSD values have higher Einter and lower Eintra. In contrast, the solutions with better RMSD show lower Einter and higher Eintra.
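RMSD is the root-mean-square of the distances between corresponding atoms of two conformations. The following is a minimal sketch, assuming that the atoms of the computed and reference ligands are already matched one-to-one in the same order and that, as is common when scoring docking poses, no superposition is applied before the calculation. The coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

if __name__ == "__main__":
    docked = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]     # hypothetical docked ligand atoms
    reference = [(0.5, 0.0, 0.0), (1.5, 0.5, 0.0)]  # hypothetical crystal ligand atoms
    print(round(rmsd(docked, reference), 3))
```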
To analyze the ligand binding site and the molecular interactions of the 1D4K and 1BV9 instances, we have selected one solution for each instance from the non-dominated solution fronts of Figure 5. The criterion followed in choosing these two solutions was to achieve a balance among the RMSD, Einter and Eintra values. For the LGA algorithm, we have selected the best solution in terms of binding energy for each of 1D4K and 1BV9. The ligand conformations from the SMPSO and LGA algorithms and the reference ligand are compared in Figures 6 and 7.
In Figure 6, image (A) shows the best energy solution (ligand in green) obtained by the LGA algorithm for instance 1D4K together with the reference ligand (in orange). Image (B) shows the solution selected from the non-dominated solution fronts together with the reference ligand. As shown, the inhibitor in (B) has a better conformation than in (A), given that the ligand conformation obtained by SMPSO lies closer to the tunnel-shaped active site of the HIV-protease receptor, where its pose is very similar to that of the reference ligand. The RMSD scores of the ligand conformations obtained by LGA and SMPSO are 6.03 Å and 0.79 Å, respectively. Image (C) shows the H-bonding interactions between the ligand conformation returned by SMPSO and the receptor. ASP29 interacts with the ligand through a hydrogen bond in the same way as has been reported for the reference ligand, which is in line with the conclusions of other authors [30] regarding the conformations resulting from SMPSO solutions.
In Figure 7, (A) and (B) show, respectively, the best energy solution obtained by LGA and the solution selected from the non-dominated solution fronts for instance 1BV9. As in the 1D4K example, the best ligand conformation corresponds to the solution returned by SMPSO. The ligand conformation computed by SMPSO is better positioned in the active site of the HIV-protease and, therefore, closer to the reference ligand. The RMSD scores obtained by LGA and SMPSO are 8.79 Å and 0.59 Å, respectively. Image (C) shows the H-bonds between the inhibitor and the receptor: ASP30, GLY48 and ILE50 interact with the ligand through hydrogen bonds. In fact, these amino acids have been shown to be involved in the interaction between the reference ligand and the receptor [31].
4.4. Application of Multi-Objective Docking in Drug Discovery: A Use Case Based on the Aeroplysinin-1 Compound and EGFR
In this section, we go a step further by presenting a case study that applies our multi-objective docking approach to drug discovery. We have selected three non-dominated solutions (encoding docking conformations) from several executions of SMPSO, as it has been the best performing algorithm in our previous comparisons. Having tested SMPSO on an academic benchmark of chemical compounds related to the HIV-protease (which was used in AutoDock 4 studies to test the new force field), we now apply this technique to a real case study with aeroplysinin-1 and EGFR (epidermal growth factor receptor) for drug discovery.
It is worth mentioning that aeroplysinin-1 is a bromo-compound produced by Verongia sponges as a chemical defense to protect them from bacterial pathogens, such as Staphylococcus albus, Bacillus cereus and Bacillus subtilis
]. Aeroplysinin-1 has been extracted in vitro
], and several analogues have been synthesized from this compound given its inhibitory activity against tyrosine kinases [35
]. As growth factors such as EGF (epidermal growth factor) and VEGF (vascular endothelial growth factor) are involved in the regulation of cell growth and proliferation, and the targets of these factors are mostly receptors with tyrosine kinase activity (TKA), aeroplysinin-1 is a candidate TKA receptor inhibitor for testing in silico and in vitro. Indeed, anti-tumoral action has been reported in several in vitro studies, in which aeroplysinin-1 has a cytotoxic effect against tumoral cells from different tissues [36] and also an anti-angiogenic effect in the early phases of the angiogenesis process [37]. Other studies have reported that aeroplysinin-1 inhibits the kinase activity of the EGFR and induces the accumulation of this receptor in human breast tumoral cells [38].
Accordingly, we have selected a use case based on an in silico study of the aeroplysinin-1 compound and the EGFR ecto- and intracellular domains, using the multi-objective approach proposed in this paper. With this use case, we attempt to understand how aeroplysinin-1 interferes with the kinase activity of the EGFR, given the in vitro studies that have been performed with this compound. In this regard, the multi-objective approach presented in this paper provides the expert with a tool for selecting specific docking solutions according to the weight given to Einter and Eintra. The selection of a specific docking solution depends on the drug discovery problem addressed by the expert.
To carry out the docking studies with the multi-objective technique presented, we obtained the crystallographic structures of the EGFR from the Protein Data Bank (PDB). For the intracellular domain of Homo sapiens EGFR, structure 1M17 has been used. This crystallographic structure contains the tyrosine kinase domain (residues 671 to 998) in complex with the known anti-tumoral drug erlotinib (OSI-774, CP-358,774, Tarceva™), an EGFR kinase-specific inhibitor. The 1YY9 crystal structure of Homo sapiens EGFR, which includes residues 25 to 642, has been selected for the EGFR ecto-domains. The (+) isomer of aeroplysinin-1 was drawn using the ACD/ChemSketch software [39], because no crystallographic structure of aeroplysinin-1 could be found.
Aeroplysinin-1 docking instance preparation: first, we used ADT to detect the four rotatable bonds, add the partial atomic charges and assign the AutoDock atom types. To prepare the EGFR ecto- and intracellular domains, UCSF Chimera was used to separate the intracellular kinase domain and the ecto-domains from their respective co-crystallized ligands and to remove all crystallographic molecules not involved in the ligand-receptor interaction, such as N-acetyl-glucosamine, alpha-mannose, etc. ADT was also used to add polar hydrogens and partial charges to the two EGFR macromolecules. To calculate the grid maps, we defined a grid that included all of the domains of both EGFR macromolecules, with a grid spacing of 0.375 Å. The resulting files were used as inputs to run AutoGrid and AutoDock with jMetal. The SMPSO algorithm has been configured with a population of 150 individuals, as in our previous benchmark experiments. Since we are dealing with real case studies, we set a larger number of energy evaluations per run (25,000,000, over 30 independent runs), thus searching for optimized solutions through more exhaustive experimentation.
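As a hedged sketch of how a grid enclosing a set of receptor atoms could be dimensioned at the 0.375 Å spacing used here, the function below computes a box centre and a number of grid points per axis, rounded up to an even value, since AutoGrid 4 expects even `npts` values in its grid parameter file (it adds the central point). The function name and the `padding` argument are illustrative assumptions, and reading coordinates from PDBQT files is omitted.

```python
import math

def autogrid_box(coords, spacing=0.375, padding=0.0):
    """Centre and even npts per axis for a box enclosing coords (+ padding Å per side)."""
    center, npts = [], []
    for axis in range(3):
        lo = min(c[axis] for c in coords) - padding
        hi = max(c[axis] for c in coords) + padding
        n = math.ceil((hi - lo) / spacing)  # intervals needed to span the extent
        npts.append(n + (n % 2))            # round odd counts up to an even value
        center.append((lo + hi) / 2)
    return tuple(center), tuple(npts)

if __name__ == "__main__":
    # Hypothetical atom coordinates (Å) spanning a 10 x 5 x 2 Å region
    atoms = [(0.0, 0.0, 0.0), (10.0, 5.0, 2.0), (4.0, 1.0, 1.5)]
    print(autogrid_box(atoms))
```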
Figure 8 shows the set of non-dominated solutions from two independent SMPSO runs for the EGFR kinase domain and the aeroplysinin-1 compound. In this instance, we have focused on the solutions with the most negative Einter values (see the black points in Figure 8). The Einter values of the solutions from Runs 7 and 23 are −5.4 and −6.13 kcal/mol, respectively. Einter represents the binding affinity between the aeroplysinin-1 compound and the EGFR kinase domain. In this case, aeroplysinin-1 is expected to adopt an energetically stable conformation in the cleft between the amino-terminal and carboxyl-terminal lobes of the EGFR kinase domain, as has been reported in the literature for other compounds, such as erlotinib (a 4-anilinoquinazoline inhibitor), ATP, ATP analogues and ATP-competitive inhibitors [40]. In Figure 9, (A) and (C) show how the aeroplysinin-1 compound binds to the EGFR kinase domain. These results are in accordance with those reported for erlotinib, which is bound to the same binding site in the 1M17 crystallographic structure. Images (B) and (D) show the molecular interactions between aeroplysinin-1 and the amino acids of the EGFR kinase domain cleft. Image (B) shows that H17 of the second -OH group forms an H-bond with the MET-769 amide oxygen. In the case of erlotinib complexed with the EGFR kinase domain, N1 accepts an H-bond from the MET-769 amide nitrogen [40]. Image (D) shows the H-bond formed between O14 and a hydrogen from the side chain of LYS-721. In the studies with the kinase inhibitor erlotinib, THR-766, LYS-721 and LEU-764 were found to be less than 4 Å from the acetylene moiety on the anilino ring [40]. The docking results obtained by SMPSO may also help explain other enzymatic studies with semi-synthetic derivatives of aeroplysinin-1 that show inhibitory activity against Na+/K+ ATPase [41]. These results may clarify how this compound inhibits the enzymatic activity of the EGFR and suggest its possible applicability to other targets, such as other tyrosine kinase receptors involved in cell proliferation and growth. Furthermore, as mentioned, the multi-objective approach makes it easier to select solutions with different weights of Einter and Eintra.
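The distance criterion mentioned above (residues less than 4 Å from a ligand moiety) can be sketched as a simple neighbor search. This is an illustration under stated assumptions: atom coordinates and residue labels are hypothetical, a real analysis would read them from the docked complex, and H-bond detection would additionally require donor/acceptor and angle criteria that this sketch does not check.

```python
import math

def residues_within(ligand_atoms, receptor_atoms, cutoff=4.0):
    """Residue labels with at least one atom closer than `cutoff` Å to any ligand atom."""
    close = {
        res
        for res, xyz in receptor_atoms
        if any(math.dist(xyz, lig) < cutoff for lig in ligand_atoms)
    }
    return sorted(close)

if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]  # hypothetical ligand atoms
    receptor = [                                   # hypothetical (residue, xyz) pairs
        ("RES-1", (3.0, 0.0, 0.0)),
        ("RES-2", (0.0, 3.9, 0.0)),
        ("RES-3", (8.0, 0.0, 0.0)),
    ]
    print(residues_within(ligand, receptor))
```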
Figure 10 shows the set of non-dominated solutions of SMPSO Run 5 for the aeroplysinin-1 and EGFR ecto-domain use case. In this instance, the Einter and Eintra values of the selected solution (see the black point in Figure 10) are −3.46 and −0.78 kcal/mol, respectively. In this case, we have selected a solution that does not correspond to the best Einter or Eintra, but rather one with a good balance between these two energies. As stated, Einter represents the binding affinity between ligand and receptor, whereas Eintra describes the energy associated with the deformation of the ligand in the docking process. This solution is of interest given the type of interaction between aeroplysinin-1 and the ecto-domains of the EGFR receptor, which is represented and described below.
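The text does not specify how the balanced solution was chosen. One common heuristic, shown here only as an illustration and not as the authors' criterion, is to min-max normalize both energies over the front and pick the point closest to the ideal point (the per-objective minima). The front values and the function name are hypothetical.

```python
import math

def balanced_solution(front):
    """Front point closest to the ideal point after min-max normalization (minimization)."""
    lows = [min(p[i] for p in front) for i in range(2)]
    highs = [max(p[i] for p in front) for i in range(2)]

    def normalized(p):
        # Map each objective to [0, 1]; a constant objective contributes 0.
        return [
            (p[i] - lows[i]) / (highs[i] - lows[i]) if highs[i] > lows[i] else 0.0
            for i in range(2)
        ]

    return min(front, key=lambda p: math.hypot(*normalized(p)))

if __name__ == "__main__":
    # Hypothetical (Einter, Eintra) front in kcal/mol
    front = [(-6.0, 0.5), (-3.5, -0.8), (-1.0, -1.5)]
    print(balanced_solution(front))
```

Because both energies are rescaled to the same range, neither term dominates the choice merely because of its magnitude.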
Image (A) in Figure 11 shows that the aeroplysinin-1 compound of the selected solution is bound to domain II of the EGFR ectodomain. This domain plays an important role in EGFR activation through monomer-monomer interactions, as has been reported in previous studies [42]. This mechanism is based on the binding of EGF to EGFR domains I and III, which relieves the autoinhibition of the receptor. This conformational change of the receptor leads to the exposure of domains II and IV, with domain II being more involved in the monomer-monomer interaction than domain IV. In terms of final binding energy, the solutions of all the runs performed by SMPSO show that aeroplysinin-1 tends to bind to domain II. In a more detailed view of the aeroplysinin-1/EGFR ecto-domain interaction, image (B) in Figure 11 shows that an H-bond is formed between the hydrogen of ARG-285 and H17 of aeroplysinin-1. A solution selected with a low Eintra, such as this one, can be useful in cases in which greater energetic stability of the ligand is needed, as in the use case presented, in which the aeroplysinin-1 compound interferes with EGFR dimerization through domain II and requires a more stable docked ligand conformation.