Article

Hierarchical Clustering and Target-Independent QSAR for Antileishmanial Oxazole and Oxadiazole Derivatives

by Henrique R. Teles ¹, Leonardo L. G. Ferreira ¹,*, Marilia Valli ¹, Fernando Coelho ² and Adriano D. Andricopulo ¹,*

¹ Laboratory of Medicinal and Computational Chemistry (LQMC), Center for Research and Innovation in Biodiversity and Drug Discovery (CIBFar), Institute of Physics of São Carlos, University of São Paulo (USP), Av. João Dagnone, n° 1100, São Carlos 13563-120, SP, Brazil
² Laboratory of Synthesis of Natural Products and Drugs, Institute of Chemistry, University of Campinas, Campinas 13083-970, SP, Brazil
* Authors to whom correspondence should be addressed.
Int. J. Mol. Sci. 2022, 23(16), 8898; https://doi.org/10.3390/ijms23168898
Submission received: 22 June 2022 / Revised: 22 July 2022 / Accepted: 25 July 2022 / Published: 10 August 2022
(This article belongs to the Special Issue QSAR and Chemoinformatics in Molecular Modeling and Drug Design 4.0)

Abstract

Leishmaniasis is a neglected tropical disease that kills more than 20,000 people each year. The chemotherapy available for the disease is limited, and new drug discovery approaches are urgently needed. Herein, 2D- and 4D-quantitative structure–activity relationship (QSAR) models were developed for a series of oxazole and oxadiazole derivatives that are active against Leishmania infantum, the causative agent of visceral leishmaniasis. A clustering strategy based on structural similarity, computed from molecular fingerprints, was applied to divide the complete set of compounds into two groups. Hierarchical clustering was followed by the development of 2D- (R2 = 0.90, R2pred = 0.82) and 4D-QSAR models (R2 = 0.80, R2pred = 0.64). These models showed better statistical robustness and predictive ability than models built on the complete dataset.

1. Introduction

Visceral leishmaniasis (VL) is a neglected tropical disease (NTD) caused by the protozoan parasites Leishmania infantum and L. donovani, and it is the most severe form of leishmaniasis. Transmission occurs through the bite of infected female phlebotomine sandflies. Left untreated, VL is fatal, and it is the second leading cause of death among parasitic diseases after malaria. The disease has become a severe global health problem, with more than 200 million people currently at risk of infection worldwide [1]. Current treatments include amphotericin B, sodium stibogluconate, miltefosine, and paromomycin. These drugs have several drawbacks, including distribution and availability issues, long and complex treatment regimens, teratogenicity, toxicity, and drug resistance [2,3]. These shortcomings, together with the high burden of the disease, highlight an urgent need for new treatment options for VL.
Despite the widespread use of the available drugs, little is known about their mechanisms of action. This reflects the main strategy used in NTD drug discovery, phenotypic screening [4]. Although it provides no information on the molecular targets, the phenotypic strategy captures activity against whole cells. It also accounts for cell uptake, cytocidal or cytostatic mechanisms, and time-to-kill, among other relevant issues.
Because very few validated molecular targets have been explored in the field, the phenotypic approach has been widely used in drug discovery for NTDs [5,6].
In line with the phenotypic screening strategy, different bioactive chemical scaffolds can be combined to enrich the chemical diversity explored in NTD drug discovery. Recently, our group reported a hybridization approach that combines two heterocyclic cores with antiparasitic activity. This approach guided the design of novel hybrid compounds with promising antileishmanial and anti-Trypanosoma cruzi activities [4]. One core is the oxadiazole ring and the other is the 3-substituted 2-oxindole; both are useful scaffolds, especially for antileishmanial agents [7,8,9,10,11].
The main goal of this work was to develop QSAR models for our in-house set of oxazoles and oxadiazoles that displayed in vitro antileishmanial activity against intracellular amastigotes. To this end, 2D- and 4D-QSAR strategies were applied. Hierarchical clustering based on ligand structure was used to split the dataset into two groups of structurally similar compounds. Integrating the experimental results with these ligand-based drug design (LBDD) studies yielded statistically significant QSAR models. These models can predict the activity of new antileishmanial agents within a defined applicability domain.

2. Results and Discussion

2.1. AutoQSAR

The initial model was built for the complete set of molecules using the AutoQSAR method. Seven binary fingerprints (dendritic, linear, atom pair, atom triplet, topological, MOLPRINT 2D, and radial) were generated to characterize the structures, and all seven served as the 2D molecular descriptors. The models were built with different regression techniques, such as multiple linear regression (MLR), partial least squares regression (PLS), principal components regression (PCR), and kernel-based PLS (KPLS). Molecules were randomly assigned to the training and test sets by the AutoQSAR machine learning routine designed for this purpose. The fingerprints and regression approaches were systematically combined to generate the best models, which are described in Table 1.
For the complete dataset, the model with the best statistical parameters and score was obtained with the MOLPRINT 2D fingerprint. The best regression method selected by the AutoQSAR routine was KPLS, with an 80:20 training/test set ratio, that is, 52 molecules in the training set and 13 in the test set. This model yielded an R2 of 0.6304 and a Q2 of 0.6107.
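The AutoQSAR splitting routine is proprietary and is not reproduced here. The following is a minimal Python sketch of a random 80:20 partition of compound identifiers. It assumes a plain uniform random shuffle. `random_split` is a hypothetical helper, and the seed only makes the illustration reproducible.

```python
import random


def random_split(ids, train_fraction=0.8, seed=None):
    """Randomly partition compound identifiers into training and test sets."""
    rng = random.Random(seed)  # private RNG so each seed gives a reproducible split
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# Example: a 27-compound group split 80:20
train, test = random_split(range(1, 28), 0.8, seed=42)
print(len(train), len(test))  # 22 5
```

With this rounding, 27 and 35 compounds give 22/5 and 28/7 splits. AutoQSAR explores many random splits and keeps the best-scoring models, which this sketch does not attempt.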
To improve the predictive ability of the model, the structures in the dataset were analyzed, which revealed a diverse scaffold pattern. Therefore, hierarchical clustering was applied to the entire dataset.
To start the hierarchical clustering analysis, the binary fingerprints (dendritic, linear, atom pair, atom triplet, topological, MOLPRINT 2D, and radial) were calculated for the entire set of molecules and used as molecular descriptors. First, the Kelley index [12] was used to select the optimal number of clusters. At this level, a considerable number of singletons (clusters containing a single molecule) were obtained, which reflects the structural diversity of the dataset. Next, the dataset was separated into two groups according to the similarity between the molecules. Starting from the Kelley level, the number of clusters was progressively reduced until only two clusters remained. Each cluster was kept as populous as possible so that both groups contained enough molecules to build a QSAR model. This strategy resulted in the exclusion of two molecules that were identified as structural outliers [13,14]. The cluster analysis thus split the initial dataset into two groups: G1, with 27 compounds, and G2, with 35 compounds. The scaffolds of each group are presented in Figure 1. The structural diversity of the dataset can be observed in the multidimensional scaling (MDS) scatter plot shown in Figure 2. An independent QSAR model was then built for each group.
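The merging step can be illustrated with a small, dependency-free sketch of agglomerative clustering. It uses average linkage on Tanimoto distances (1 − similarity) and stops when two clusters remain. It assumes fingerprints stored as Python sets of on-bit indices and uses toy fingerprints. It does not reproduce Canvas, the Kelley-index selection, or the outlier removal, and the helper names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # two empty fingerprints: treat as identical


def average_linkage(fps, n_clusters=2):
    """Agglomerative clustering with average linkage on Tanimoto distance.

    Every molecule starts as a singleton. The pair of clusters with the
    smallest mean pairwise distance (1 - similarity) is merged repeatedly
    until n_clusters clusters remain. Returns lists of molecule indices.
    """
    dist = {
        (i, j): 1.0 - tanimoto(fps[i], fps[j])
        for i, j in combinations(range(len(fps)), 2)
    }

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(len(fps))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pair_dists = [d(i, j) for i in clusters[a] for j in clusters[b]]
            score = sum(pair_dists) / len(pair_dists)
            if best is None or score < best[0]:
                best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    return clusters


# Toy fingerprints: molecules 0-2 share bits 1-4, molecules 3-4 share bits 10-13
fps = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {10, 11, 12}, {10, 11, 13}]
print(average_linkage(fps, n_clusters=2))
```

This naive version recomputes linkages at every step. It is fine for a few dozen compounds but not for large libraries.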
Notably, group G1 is less structurally diverse than group G2 (Figures 1 and 2). In the MDS plot (Figure 2), the G1 molecules are more concentrated, whereas the G2 molecules are more dispersed. This difference in structural diversity may have influenced the QSAR statistical parameters of each group, as the less diverse group (G1) yielded better statistical indicators.
For the G1 group, the model with the best statistical parameters and score was built with radial fingerprints, with 22 molecules in the training set and 5 in the test set (80:20). Its statistics were R2 = 0.9069, SD = 0.1039, Q2 = 0.8201, and RMSE = 0.0945, with 2 KPLS factors. The best models for each training/test set split are shown in Table 2.
For the G2 group, the model with the best statistical parameters was built with dendritic fingerprints, with 28 molecules in the training set and 7 in the test set (80:20). Its statistics were R2 = 0.8206, SD = 0.1377, Q2 = 0.8001, and RMSE = 0.1081, with 3 KPLS factors. The best models for each split are shown in Table 3.
The predicted pIC50 values obtained for both groups, G1 and G2, are represented graphically in Figure 3 along with the experimental pIC50 values. Both plots show good agreement between the experimental and predicted activity for the AutoQSAR models. In addition, Table 4 shows the predicted and experimental pIC50 values for the entire dataset (complete set model) and for the two groups of molecules obtained after the hierarchical cluster analysis (cluster model).
The molecules of the two clusters share some structural similarities. However, the scaffolds show that merging the two groups, which increases structural diversity, does not yield better models. Consistent with this finding, G2 is the more structurally diverse group, with a smaller common scaffold and larger R substituents. G2 showed a slightly smaller improvement in the statistical parameters than G1 (differences of 0.0863 in R2 and 0.0200 in Q2).
In addition to its predictive capacity, KPLS 2D-QSAR can generate contribution maps (Figure 4). These maps show the molecular regions responsible for increasing or decreasing the biological response. Green and red represent positive and negative contributions to the response, respectively. For the G1 group, halogen substituents made positive contributions at R1, R3, R4, and R5. Molecules 7, 22, and 23 differ only in the substituent at R3 (bromine, fluorine, and chlorine, respectively), and all three substitutions contributed positively. However, the chlorine in molecule 23 produced the greatest increase in activity, which is consistent with its pIC50 in Table 4. In general, nitro substitution was unfavorable for molecules in G1. For the G2 group, a hydrogen atom at R1 made an unfavorable contribution, whereas phenyl and especially methoxy substitution at R1 contributed favorably. An exception, a positive contribution of the nitro group, was observed with the oxazole core. At R3, halogen (except bromine) and nitro substituents made positive contributions. Asymmetrical electron density, however, was slightly unfavorable, which favors two halogens at both meta positions or a single halogen at the para position. At R2, the hydroxyl group made a positive contribution, while the benzenesulfonamide made a negative contribution.
The applicability domain of each model is the chemical space represented by its training set. It was defined by applying the geometric convex-hull method to the MDS scatter plot of the dataset. The applicability domains of the 2D-QSAR models for the G1 and G2 groups are shown in Figure 5.

2.2. 4D-QSAR

Three-dimensional quantitative structure–activity relationship (3D-QSAR) modeling is a widely used method in computer-assisted molecular design. The method assumes that changes in the binding affinities of ligands are related to changes in molecular properties represented by molecular fields. A common and popular 3D-QSAR method is comparative molecular field analysis (CoMFA). Some issues are inherent to 3D-QSAR [15,16], particularly to receptor-independent 3D-QSAR (RI-3D-QSAR). For example, CoMFA models are strongly dependent on, and sensitive to, the conformations and alignment of the molecules. Another limitation is that the bioactive conformation of each molecule should be used. This conformation may not coincide with the lowest-energy conformations that are commonly used when the molecular target is unknown. In this work, several structural alignment methods were tested to obtain a suitable alignment of the compounds for subsequent CoMFA analyses. However, due to the limitations described above, no suitable CoMFA models were obtained (see the Supplementary File for the CoMFA models). A 4D-QSAR approach was therefore used to address the limitations of 3D-QSAR models. The LQTA-QSAR approach combines the main advantages of CoMFA and 4D-QSAR modeling [16,17]. Instead of a single conformation, this method generates a conformational ensemble profile (CEP) for each compound. 3D descriptors are then calculated from the Coulomb and Lennard–Jones potentials. The 4D-QSAR models followed the same strategy as the 2D-QSAR analyses: a model was first built with the entire dataset, and groups were then generated by hierarchical clustering. Two hundred random training/test set divisions were generated and used for QSAR model construction.
For the 2D-QSAR studies, the best results were obtained by using an 80:20 ratio between the training and test sets; thus, the same ratio was applied to the 4D-QSAR analyses.
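The sketch below illustrates descriptor generation at a single grid point. It computes the Coulomb and Lennard–Jones interaction energies between a probe and every ligand atom, averaged over the conformations of the CEP. This is an illustration under stated assumptions, not LQTAgridPy. The atom tuple layout, the Lorentz–Berthelot combination rules, the Coulomb constant in kcal·Å/(mol·e²), and all parameter values are assumptions. The force field used in this work may combine parameters differently.

```python
import math

COULOMB_K = 332.0636  # Coulomb constant in kcal*Å/(mol*e^2) (assumed unit system)


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def grid_point_descriptors(point, frames, probe):
    """Ensemble-averaged Coulomb and LJ energies of a probe at one grid point.

    point:  (x, y, z) of the grid point.
    frames: the conformational ensemble; each frame is a list of atoms given as
            (x, y, z, charge, epsilon, sigma), a hypothetical layout.
    probe:  (charge, epsilon, sigma) of the probe, e.g. an NH3+ group.
    """
    q_p, eps_p, sig_p = probe
    coulomb_sum = lj_sum = 0.0
    for atoms in frames:
        for x, y, z, q, eps, sig in atoms:
            r = math.dist(point, (x, y, z))  # r = 0 (probe on an atom) is not handled here
            coulomb_sum += COULOMB_K * q * q_p / r
            # Lorentz-Berthelot combination rules (an assumption for this sketch)
            lj_sum += lennard_jones(r, math.sqrt(eps * eps_p), 0.5 * (sig + sig_p))
    n_frames = len(frames)
    return coulomb_sum / n_frames, lj_sum / n_frames


atom = (0.0, 0.0, 0.0, 1.0, 0.2, 3.0)  # hypothetical atom at the origin
probe = (1.0, 0.2, 3.0)                # hypothetical probe parameters
print(grid_point_descriptors((3.0, 0.0, 0.0), [[atom]], probe))
```

Repeating this calculation for every grid point and every compound yields the descriptor matrix. Names such as [15_13_6_NH3+_LJ] appear to encode the grid position, the probe, and the field type (C or LJ).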
The model generated using the complete dataset was used for comparison purposes. LQTAgridPy was used to generate a matrix with 21,252 descriptors. After the variance and Pearson cutoffs were applied, 554 descriptors were passed to PyQSAR. This software uses a clustering method to reduce the search space and, in this step, also eliminates descriptors with low variance. A genetic algorithm (GA) was then used to retain the best descriptors from the different clusters. The GA-based selections were repeated until the optimal variable selection was achieved. The descriptor set selected by PyQSAR gave R2 = 0.4599 and R2pred = 0.4353. It comprised [15_19_20_NH3+_C], [15_20_15_NH3+_LJ], [16_21_20_NH3+_C], [16_23_10_NH3+_LJ], and [18_19_11_NH3+_C].
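The variance and Pearson filters can be sketched as follows, using the thresholds given in Section 3.4 (variance below 0.01, |r| below 0.2). The text does not state whether LQTAgridPy or PyQSAR uses the sample or the population variance. The sample variance used here is therefore an assumption, and the descriptor names in the example are hypothetical.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(columns, activity, var_cutoff=0.01, r_cutoff=0.2):
    """Drop near-constant descriptors and those weakly correlated with activity.

    columns:  mapping from descriptor name to its values across compounds.
    activity: pIC50 values in the same compound order.
    """
    kept = {}
    for name, values in columns.items():
        if variance(values) < var_cutoff:  # sample variance (assumption)
            continue
        if abs(pearson_r(values, activity)) < r_cutoff:
            continue
        kept[name] = values
    return kept


pic50 = [4.5, 5.0, 5.5, 6.0]
columns = {
    "flat": [1.0, 1.0, 1.0, 1.0],     # zero variance -> removed
    "trend": [0.1, 0.5, 0.9, 1.4],    # tracks pIC50 -> kept
    "noise": [1.0, -1.0, -1.0, 1.0],  # r = 0 with pIC50 -> removed
}
print(list(filter_descriptors(columns, pic50)))  # ['trend']
```

The variance check runs first, so a constant column never reaches the correlation step, where it would cause a division by zero.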
Group G1: The 4D-QSAR for G1 used the same dataset as the 2D-QSAR: 27 compounds, with 22 in the training set and 5 in the test set. LQTAgridPy generated a matrix with 19,404 descriptors. Truncation of the Lennard–Jones potential, the variance cutoff, and the Pearson cutoff reduced this matrix to 903 descriptors. Each of these descriptors represents a grid point with the fields acting upon it. This reduced matrix was the input for variable selection and model generation in PyQSAR. The selected model contains five descriptors (Equation (1)). Its statistics were R2 = 0.8033, RMSE = 0.1313, Q2(5-fold) = 0.6600, RMSEcv = 0.1716, and R2pred = 0.6480.
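$$
\begin{aligned}
\mathrm{pIC}_{50} = {} & 5.1535 + 0.8409\,\text{[15\_13\_6\_NH3+\_LJ]} - 0.7075\,\text{[16\_12\_5\_NH3+\_LJ]} \\
& + 0.1484\,\text{[16\_20\_10\_NH3+\_LJ]} - 0.1210\,\text{[21\_17\_13\_NH3+\_LJ]} \\
& + 0.1913\,\text{[22\_12\_12\_NH3+\_LJ]} \qquad (1)
\end{aligned}
$$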
Group G2: The 4D-QSAR for G2 used the same 35 compounds as the 2D-QSAR, with 28 in the training set and 7 in the test set. LQTAgridPy generated a matrix with 21,252 descriptors. Truncation of the Lennard–Jones potential, the variance cutoff, and the Pearson cutoff reduced this matrix to 3353 descriptors. Each of these descriptors represents a grid point with the fields acting on it. This reduced matrix was the input for variable selection and model generation in PyQSAR. The selected model contains five descriptors (Equation (2)). Its statistics were R2 = 0.7005, RMSE = 0.156, Q2(5-fold) = 0.6095, RMSEcv = 0.1701, and R2pred = 0.6581.
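$$
\begin{aligned}
\mathrm{pIC}_{50} = {} & 4.9338 + 0.2170\,\text{[16\_20\_11\_NH3+\_LJ]} + 0.1303\,\text{[17\_19\_15\_NH3+\_LJ]} \\
& - 0.7328\,\text{[17\_26\_15\_NH3+\_C]} + 0.2770\,\text{[18\_23\_14\_NH3+\_LJ]} \\
& + 0.7227\,\text{[19\_26\_20\_NH3+\_C]} \qquad (2)
\end{aligned}
$$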
The 4D-QSAR statistical parameters are summarized in Table 5.
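For a new compound, Equations (1) and (2) are plain linear functions of the selected grid descriptors. The sketch below encodes the published coefficients. `predict_pic50` and the dictionary layout are hypothetical. The sketch assumes the descriptors are supplied exactly as computed during model building, with the same grid, alignment, probe, and Lennard–Jones cutoff. The text does not state whether descriptors were scaled before regression, so no scaling is applied here.

```python
# Coefficients from Equations (1) and (2); keys are the descriptor names used in the text.
G1_MODEL = {
    "intercept": 5.1535,
    "coefficients": {
        "15_13_6_NH3+_LJ": 0.8409,
        "16_12_5_NH3+_LJ": -0.7075,
        "16_20_10_NH3+_LJ": 0.1484,
        "21_17_13_NH3+_LJ": -0.1210,
        "22_12_12_NH3+_LJ": 0.1913,
    },
}

G2_MODEL = {
    "intercept": 4.9338,
    "coefficients": {
        "16_20_11_NH3+_LJ": 0.2170,
        "17_19_15_NH3+_LJ": 0.1303,
        "17_26_15_NH3+_C": -0.7328,
        "18_23_14_NH3+_LJ": 0.2770,
        "19_26_20_NH3+_C": 0.7227,
    },
}


def predict_pic50(model, descriptors):
    """Evaluate a linear 4D-QSAR equation for one compound.

    descriptors: mapping from descriptor name to its value for the compound.
    """
    missing = set(model["coefficients"]) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return model["intercept"] + sum(
        coef * descriptors[name] for name, coef in model["coefficients"].items()
    )
```

The compound's group (G1 or G2) and its position inside that group's applicability domain should be checked first, so that the appropriate equation is used.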
Contribution maps were generated to visualize the positive and negative contributions of groups in the 4D-QSAR models (Figure 6). Green and red spheres represent steric (Lennard–Jones) descriptors with positive and negative regression coefficients, respectively. Yellow and blue spheres represent electrostatic (Coulomb) descriptors with positive and negative regression coefficients, respectively. Positive coefficients increase the predicted pIC50, and negative coefficients decrease it. For both G1 and G2, the structure–activity correlation arises mainly from the substituents attached to the oxadiazole or oxazole ring rather than from the core itself.
Group G1: For G1, the positive steric contributions [16_20_10_NH3+_LJ] and [22_12_12_NH3+_LJ] are mainly related to the halogen substituents at R4 and R5. After energy minimization, these substituents face towards one of these grid points in some of the molecules. The negative steric contribution [21_17_13_NH3+_LJ] is related to the hydroxyl group and mainly to bulky substituents at R2. The 4D-QSAR results for G1 also link flexibility to pIC50: molecules with more conformational degrees of freedom showed poor biological activity. This flexibility may hinder the formation of stable intermolecular interactions with the molecular target [18].
Group G2: For G2, greater conformational freedom is also associated with low pIC50 values, as a comparison of the least and most active compounds in Figure 6 shows. The descriptor [18_23_14_NH3+_LJ] indicates that bulky substituents at R3 make a positive steric contribution. The main example is compound 31, with an ethenylbenzene (styrene) substituent. The positive contribution of [19_26_20_NH3+_C] is associated with halogen substituents at the para position of the phenyl ring in R4. In contrast, the [17_26_15_NH3+_C] contribution indicates that these same atoms can decrease the biological response, depending on the conformations they adopt.

3. Materials and Methods

3.1. Dataset Characterization

The dataset used for both QSAR modeling methods includes 64 molecules, 62 with an oxadiazole ring and 2 with an oxazole core, as shown in Table 6. The in vitro assays against L. infantum were performed in our research group under the same experimental conditions, as previously reported [4]. The potency of the compounds was expressed as the concentration required to kill 50% of the parasites in vitro (IC50). Antileishmanial activity was determined from the number of intracellular amastigotes in THP-1 macrophages, the parasite form relevant for drug discovery. The IC50 values ranged from 2.38 to 52.59 µM. To place the data on a logarithmic scale, they were converted into pIC50 values (the negative base-10 logarithm of the molar IC50), which ranged from 4.28 to 5.62. The distribution of pIC50 values across the dataset is shown in the histogram in Figure 7.
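The conversion is pIC50 = −log10(IC50), with IC50 in mol/L. For values in micromolar, this reduces to 6 − log10(IC50/µM). A short Python check reproduces the reported endpoints of the range. `pic50_from_um` is a hypothetical helper name.

```python
import math


def pic50_from_um(ic50_um):
    """pIC50 = -log10(IC50 in mol/L) = 6 - log10(IC50 in µM)."""
    return 6.0 - math.log10(ic50_um)


print(round(pic50_from_um(2.38), 2))   # 5.62 (most potent compound in the set)
print(round(pic50_from_um(52.59), 2))  # 4.28 (least potent compound in the set)
```

Because of the sign inversion, a higher pIC50 means a more potent compound.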
In addition to the characterization of the activity profile of the dataset, a scaffold analysis was performed for the R-groups using Canvas (Maestro, Schrödinger) [19]. The general scaffolds for the series were generated through an automated search for the maximum common substructure (MCS).

3.2. 2D-QSAR

The 2D-QSAR was performed with the machine learning tools of AutoQSAR [19] embedded in Maestro [20] (release 2016-3, Schrödinger LLC, New York, NY, USA), as previously reported [21,22]. In all AutoQSAR calculations, the training/test set ratio was set to 70:30 (70% of the compounds for training the models and 30% for the test set), 75:25, or 80:20. The best model was selected using internal and external validation parameters. The internal parameters, computed for the training set, were the coefficient of determination (R2) and the standard deviation (SD). The external parameters, computed for the test set, were the predictive coefficient of determination (Q2) and the root-mean-square error (RMSE). The best models were recreated within Canvas using the same test set, training set, and binary fingerprint generated in the AutoQSAR modeling.
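The regression statistics can be written out explicitly. The sketch below assumes the usual definitions: R2 = 1 − SS_res/SS_tot, and RMSE as the square root of the mean squared residual. Q2 is taken as R2 computed on the test-set predictions; this is our reading of AutoQSAR rather than a documented formula. The SD reported by AutoQSAR is not reproduced. The function names and example values are hypothetical.

```python
import math
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - y_bar) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error of the predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical test-set values; Q2 is taken here as R2 on the test set (assumption)
test_obs = [4.5, 5.0, 5.5]
test_pred = [4.6, 5.0, 5.3]
print(r_squared(test_obs, test_pred), rmse(test_obs, test_pred))
```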

3.3. Hierarchical Clustering

Hierarchical clustering based on 2D similarity analyses of the dataset was performed using Canvas 1.1 software. Linear fingerprints were calculated, and the similarity matrix was computed using the Tanimoto coefficient as the similarity metric [23]. This metric is widely used to compare chemical structures represented by fingerprints, and structures are usually considered similar if the index is higher than 0.85. The more features two structures share, the closer the index is to 1. Conversely, the more unique features they have, the closer it is to zero. Average linkage was chosen as the agglomerative method. The Kelley index was used to select the optimal number of clusters [12]. The number of clusters was then decreased until two groups remained, such that any further reduction would merge groups G1 and G2. To visualize the diversity of the identified groups in a plane, multidimensional scaling (MDS) was performed in a KNIME node [24], with the similarity matrix as input.
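For binary fingerprints, the Tanimoto coefficient is T = c/(a + b − c). Here a and b are the numbers of bits set in each fingerprint, and c is the number set in both. Below is a minimal sketch that stores fingerprints as Python integers used as bit vectors, an assumption that does not match the Canvas fingerprint format.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient c / (a + b - c) for fingerprints stored as int bit vectors."""
    both = bin(fp_a & fp_b).count("1")    # c: bits set in both
    either = bin(fp_a | fp_b).count("1")  # a + b - c: bits set in at least one
    return both / either if either else 1.0  # convention: two empty fingerprints are identical


a = 0b11110000
b = 0b11101000
print(tanimoto(a, b))         # 0.6
print(tanimoto(a, b) > 0.85)  # False: not "similar" by the usual threshold
```

Here the two toy fingerprints share three of their five set bits, which gives T = 0.6, below the 0.85 similarity threshold.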
The applicability domain (AD) defines the region of chemical space in which the model can make reliable predictions [25]. The AD in this work was built using the geometric convex-hull method [26]. After the MDS, the coordinates of the training set were submitted to SciPy to compute the convex hull.
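The authors used SciPy for the convex hull. The dependency-free sketch below substitutes Andrew's monotone-chain algorithm to build the hull of 2D MDS coordinates. It then tests whether a query point lies inside or on the hull. The coordinates are hypothetical. The sketch assumes the query compound's MDS coordinates are already available in the same embedding as the training set. Classical MDS does not directly provide such coordinates for new compounds.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # last point of each chain repeats the other's first


def in_domain(hull, point, tol=1e-12):
    """True if point lies inside or on a counter-clockwise convex hull."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hull: no area to define a domain
    return all(cross(hull[i], hull[(i + 1) % n], point) >= -tol for i in range(n))


# Hypothetical MDS coordinates of a training set
train_xy = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
hull = convex_hull(train_xy)
print(hull)                         # [(0, 0), (1, 0), (1, 1), (0, 1)]
print(in_domain(hull, (0.5, 0.2)))  # True
print(in_domain(hull, (1.5, 0.5)))  # False
```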

3.4. 4D-QSAR

The 4D-QSAR was performed with the LQTA-QSAR method [27]. Molecular dynamics simulations were performed using GROMACS version 4.6.5 [28,29]. A dodecahedral box was filled with explicit transferable intermolecular potential 3-point (TIP3P) water molecules, and the ffG43a1 [30,31] force field was used for the all-atom molecular simulations. The minimum distance between the molecule and the box walls was set to 10 Å. Energy minimization was performed with the steepest descent and conjugate gradient methods for a maximum of 4000 steps. Pressure was controlled by Parrinello–Rahman [32] coupling, and temperature was kept constant by the Berendsen thermostat [33]. The system was equilibrated by heating in steps of 50 K, 100 K, 200 K, and 350 K for 10 ps each and was then cooled to 300 K for a 500 ps simulation. All the conformations of each ligand obtained from the molecular dynamics simulations were saved in a .gro file. The conformational ensemble profile (CEP) for the 4D-QSAR models was assembled from the ligand conformations obtained between 50 and 500 ps. The alignment was generated by matching the atom positions of the oxazole and oxadiazole rings. The aligned ensemble was submitted to LQTAgridPy, a Python version of LQTAgrid. The NH3+ probe was selected to represent an N-terminal unit. The probe swept all grid points of the box to compute the Coulomb and Lennard–Jones descriptors. The data were then preprocessed with the CoMFA energy cutoff for the Lennard–Jones descriptors. If the Lennard–Jones energy at an x, y, z position was equal to or lower than 30 kcal/mol, no cutoff was applied. If it exceeded 30 kcal/mol, the logarithm of the residual (the energy minus 30 kcal/mol) was added to 30 kcal/mol, as follows:
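$$
\mathrm{LJ}_{x,y,z} \leftarrow
\begin{cases}
\mathrm{LJ}_{x,y,z}, & \text{if } \mathrm{LJ}_{x,y,z} \le 30\ \text{kcal/mol} \\
30 + \log\left(\mathrm{LJ}_{x,y,z} - 30\right), & \text{if } \mathrm{LJ}_{x,y,z} > 30\ \text{kcal/mol}
\end{cases}
$$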
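The cutoff rule translates directly into code. The text does not specify the base of the logarithm, so base 10 is an assumption here. It can be changed through the `log` argument of this hypothetical helper.

```python
import math


def truncate_lj(energy, cutoff=30.0, log=math.log10):
    """CoMFA-style soft cap for a Lennard-Jones descriptor value (kcal/mol).

    Energies <= cutoff are kept; larger ones become cutoff + log(energy - cutoff).
    The log base (log10 here) is an assumption; pass math.log for natural log.
    """
    if energy <= cutoff:
        return energy
    return cutoff + log(energy - cutoff)


print([truncate_lj(e) for e in (-0.4, 30.0, 31.0, 130.0)])  # [-0.4, 30.0, 30.0, 32.0]
```

As written, the rule maps energies between 30 and 31 kcal/mol to values below 30, because the logarithm of a residual smaller than 1 is negative.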
Descriptors were filtered before selection. Variables whose absolute Pearson correlation coefficient (|r|) with pIC50 was less than 0.2 were excluded [18,27]. Low-variance descriptors that changed only slightly between compounds (variance below 0.01) were also excluded. The remaining descriptors were passed to PyQSAR [34], an open-source QSAR model generator whose variable selection combines hierarchical clustering and a genetic algorithm (GA). Finally, multiple linear regression (MLR) was performed, with the selected descriptors as the independent variables and the pIC50 values as the dependent variable. Internal validation used the conventional non-cross-validated coefficient of determination (R2). Robustness was examined with the 5-fold cross-validated coefficient (Q2(5-fold)). For external validation, the test set was evaluated with the coefficient of determination for external validation (R2pred). The images of the contribution maps were created with PyMOL version 1.8.4.0 [35].
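PyQSAR's clustering and GA-based variable selection are not reproduced here. For an already selected descriptor set, the final MLR fit and the 5-fold cross-validated Q2 (Q2 = 1 − PRESS/SS_tot) can be sketched in plain Python. The random fold assignment, the use of the overall mean in SS_tot, and all helper names are assumptions.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(X, y):
    """Ordinary least squares via the normal equations; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x):
    """Apply fitted MLR coefficients to one descriptor vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def q2_kfold(X, y, k=5, seed=0):
    """k-fold cross-validated Q2 = 1 - PRESS / SS_tot with random fold assignment."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    press = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        coef = fit_mlr([X[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - predict(coef, X[i])) ** 2 for i in fold)
    y_bar = sum(y) / len(y)
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1.0 - press / ss_tot


X = [(float(i), float((i * i) % 7)) for i in range(10)]  # hypothetical descriptors
y = [1.0 + 2.0 * a - b for a, b in X]                    # noise-free synthetic pIC50
print(fit_mlr(X, y), q2_kfold(X, y))
```

Normal equations are adequate for a handful of descriptors, as in Equations (1) and (2). For larger or ill-conditioned descriptor sets, a QR- or SVD-based solver would be preferable.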

4. Conclusions

Receptor-independent QSAR methods were employed to develop 2D- and 4D-QSAR models for a series of oxadiazole and oxazole antileishmanial derivatives. Clustering the dataset proved advantageous for optimizing the statistical parameters of both the 2D- and 4D-QSAR models presented in this work. The final models exhibited good internal consistency and external predictive power. Within the applicability domain, both the 2D and 4D models predicted pIC50 values that agreed well with the experimental values. When new compounds are designed, hierarchical clustering, the MDS plot, and the applicability domain can be used to determine which group each compound belongs to. The corresponding model can then be applied. The 2D-QSAR models performed better than the 4D models, which suggests that, for this dataset, 2D descriptors correlate better with the variation in biological activity. The reasons for the poorer results of QSAR methods that require 3D conformations are unknown. They may, however, be linked to the still-undiscovered mode of action of this series. The molecular target of these compounds is so far unknown. From their structures, however, we speculate that functionalities able to form hydrogen bonds and π-stacking interactions play a significant role in the biological activity of this series. Examples include the hydroxy-oxindole, phenyl, and oxadiazole rings. The exact role of each functionality in ligand–target complexes can only be elucidated once the molecular target is discovered and its structure determined. Beyond activity prediction, the 2D and 4D contribution maps reveal structural and conformational features that can guide the future design of antileishmanial agents.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23168898/s1.

Author Contributions

Conceptualization, H.R.T., L.L.G.F., M.V., F.C. and A.D.A.; Formal analysis, H.R.T., L.L.G.F. and M.V.; Funding acquisition, A.D.A.; Investigation, H.R.T., L.L.G.F., M.V., F.C. and A.D.A.; Supervision, A.D.A.; Writing—original draft, H.R.T., L.L.G.F. and M.V.; Writing—review & editing, H.R.T., L.L.G.F., M.V., F.C. and A.D.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge the National Council for Scientific and Technological Development (CNPq), the Coordination for the Improvement of Higher Education Personnel (CAPES), and the São Paulo Research Foundation (FAPESP) (CIBFar grant 2013/07600-3; M.V. grant 2019/05967-3), Brazil, for financial support.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare they have no conflict of interest.

References

  1. Ruiz-Postigo, J.A.; Grout, L.; Saurabh, J. Global leishmaniasis surveillance, 2017–2018, and first report on 5 additional indicators/Surveillance mondiale de la leishmaniose, 2017–2018, et premier rapport sur 5 indicateurs supplémentaires. Wkly. Epidemiol. Rec. 2020, 95, 265–280.
  2. Polonio, T.; Efferth, T. Leishmaniasis: Drug resistance and natural products. Int. J. Mol. Med. 2008, 22, 277–286. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Gourbal, B.; Sonuc, N.; Bhattacharjee, H.; Legare, D.; Sundar, S.; Ouellette, M.; Rosen, B.P.; Mukhopadhyay, R. Drug uptake and modulation of drug resistance in Leishmania by an aquaglyceroporin. J. Biol. Chem. 2004, 279, 31010–31017. [Google Scholar] [CrossRef] [Green Version]
  4. Fernandes, F.S.; Santos, H.; Lima, S.R.; Conti, C.; Rodrigues, M.T., Jr.; Zeoly, L.A.; Ferreira, L.L.; Krogh, R.; Andricopulo, A.D.; Coelho, F. Discovery of highly potent and selective antiparasitic new oxadiazole and hydroxy-oxindole small molecule hybrids. Eur. J. Med. Chem. 2020, 201, 112418. [Google Scholar] [CrossRef] [PubMed]
  5. Gilbert, I.H. Drug discovery for neglected diseases: Molecular target-based and phenotypic approaches: Miniperspectives series on phenotypic screening for antiinfective targets. J. Med. Chem. 2013, 56, 7719–7726. [Google Scholar] [CrossRef]
  6. Ferreira, L.L.; de Moraes, J.; Andricopulo, A.D. Approaches to advance drug discovery for neglected tropical diseases. Drug Discov. Today 2022, 27, 2278–2287. [Google Scholar] [CrossRef]
  7. Taha, M.; Ismail, N.H.; Ali, M.; Rashid, U.; Imran, S.; Uddin, N.; Khan, K.M. Molecular hybridization conceded exceptionally potent quinolinyl-oxadiazole hybrids through phenyl linked thiosemicarbazide antileishmanial scaffolds: In silico validation and SAR studies. Bioorganic Chem. 2017, 71, 192–200. [Google Scholar] [CrossRef]
  8. Taha, M.; Ismail, N.H.; Imran, S.; Selvaraj, M.; Jamil, W.; Ali, M.; Kashif, S.M.; Rahim, F.; Khan, K.M.; Adenan, M.I.; et al. Synthesis and molecular modelling studies of phenyl linked oxadiazole-phenylhydrazone hybrids as potent antileishmanial agents. Eur. J. Med. Chem. 2017, 126, 1021–1033. [Google Scholar] [CrossRef] [PubMed]
  9. Pitasse-Santos, P.; Sueth-Santiago, V.; Lima, M.E. 1,2,4-and 1,3,4-Oxadiazoles as Scaffolds in the Development of Antiparasitic Agents. J. Braz. Chem. Soc. 2018, 29, 435–456. [Google Scholar] [CrossRef]
  10. Scala, A.; Cordaro, M.; Grassi, G.; Piperno, A.; Barberi, G.; Cascio, A.; Risitano, F. Direct synthesis of C3-mono-functionalized oxindoles from N-unprotected 2-oxindole and their antileishmanial activity. Bioorganic Med. Chem. 2014, 22, 1063–1069.
  11. Saha, S.; Acharya, C.; Pal, U.; Chowdhury, S.R.; Sarkar, K.; Maiti, N.C.; Jaisankar, P.; Majumder, H.K. A novel spirooxindole derivative inhibits the growth of Leishmania donovani parasites both in vitro and in vivo by targeting type IB topoisomerase. Antimicrob. Agents Chemother. 2016, 60, 6281–6293.
  12. Kelley, L.A.; Gardner, S.P.; Sutcliffe, M.J. An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally related subfamilies. Protein Eng. Des. Sel. 1996, 9, 1063–1065.
  13. Santos-Filho, O.A.; Cherkasov, A. Using molecular docking, 3D-QSAR, and cluster analysis for screening structurally diverse data sets of pharmacological interest. J. Chem. Inf. Model. 2008, 48, 2054–2065.
  14. Tropsha, A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol. Inform. 2010, 29, 476–488.
  15. Hopfinger, A.; Wang, S.; Tokarski, J.S.; Jin, B.; Albuquerque, M.; Madhav, P.J.; Duraiswami, C. Construction of 3D-QSAR models using the 4D-QSAR analysis formalism. J. Am. Chem. Soc. 1997, 119, 10509–10524.
  16. Ghasemi, J.B.; Safavi-Sohi, R.; Barbosa, E.G. 4D-LQTA-QSAR and docking study on potent gram-negative specific LpxC inhibitors: A comparison to CoMFA modeling. Mol. Divers. 2012, 16, 203–213.
  17. Cramer, R.D.; Patterson, D.E.; Bunce, J.D. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc. 1988, 110, 5959–5967.
  18. Melo, E.; Ferreira, M. A 4D structure-activity relationship model to predict HIV-1 integrase strand transfer inhibition using the LQTA-QSAR methodology. J. Chem. Inf. Model. 2012, 52, 1722–1732.
  19. Dixon, S.L.; Duan, J.; Smith, E.; von Bargen, C.D.; Sherman, W.; Repasky, M.P. AutoQSAR: An automated machine learning tool for best-practice quantitative structure–activity relationship modeling. Future Med. Chem. 2016, 8, 1825–1839.
  20. Release, S. Maestro; Version 3; Schrödinger LLC: New York, NY, USA, 2017.
  21. Medeiros, A.R.; Ferreira, L.L.; de Souza, M.L.; de Oliveira Rezende, C., Jr.; Espinoza-Chávez, R.M.; Dias, L.C.; Andricopulo, A.D. Chemoinformatics Studies on a Series of Imidazoles as Cruzain Inhibitors. Biomolecules 2021, 11, 579.
  22. De Souza, A.S.; Ferreira, L.L.; de Oliveira, A.S.; Andricopulo, A.D. Quantitative Structure-Activity Relationships for Structurally Diverse Chemotypes Having Anti-Trypanosoma cruzi Activity. Int. J. Mol. Sci. 2019, 20, 2801.
  23. Maggiora, G.; Vogt, M.; Stumpfe, D.; Bajorath, J. Molecular similarity in medicinal chemistry: Miniperspective. J. Med. Chem. 2014, 57, 3186–3204.
  24. Berthold, M.R.; Cebron, N.; Dill, F.; Gabriel, T.R.; Kötter, T.; Meinl, T.; Ohl, P.; Thiel, K.; Wiswedel, B. KNIME—The Konstanz Information Miner: Version 2.0 and Beyond. SIGKDD Explor. Newsl. 2009, 11, 26–31.
  25. Weaver, S.; Gleeson, M.P. The importance of the domain of applicability in QSAR modeling. J. Mol. Graph. Model. 2008, 26, 1315–1326.
  26. Sahigara, F.; Mansouri, K.; Ballabio, D.; Mauri, A.; Consonni, V.; Todeschini, R. Comparison of different approaches to define the applicability domain of QSAR models. Molecules 2012, 17, 4791–4810.
  27. Martins, J.P.A.; Barbosa, E.G.; Pasqualoto, K.F.; Ferreira, M.M. LQTA-QSAR: A new 4D-QSAR methodology. J. Chem. Inf. Model. 2009, 49, 1428–1436.
  28. Bekker, H.; Berendsen, H.; Dijkstra, E.; Achterop, S.; Vondrumen, R.; Vanderspoel, D.; Sijbers, A.; Keegstra, H.; Renardus, M. Gromacs-a parallel computer for molecular-dynamics simulations. In Physics Computing ’92; World Scientific Publishing: Singapore, 1993; pp. 252–256.
  29. Berendsen, H.J.; van der Spoel, D.; van Drunen, R. GROMACS: A message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 1995, 91, 43–56.
  30. Schuler, L.D.; Daura, X.; van Gunsteren, W.F. An improved GROMOS96 force field for aliphatic hydrocarbons in the condensed phase. J. Comput. Chem. 2001, 22, 1205–1218.
  31. Chandrasekhar, I.; Kastenholz, M.; Lins, R.D.; Oostenbrink, C.; Schuler, L.D.; Tieleman, D.P.; van Gunsteren, W.F. A consistent potential energy parameter set for lipids: Dipalmitoylphosphatidylcholine as a benchmark of the GROMOS96 45A3 force field. Eur. Biophys. J. 2003, 32, 67–77.
  32. Parrinello, M.; Rahman, A. Crystal Structure and Pair Potentials: A Molecular-Dynamics Study. Phys. Rev. Lett. 1980, 45, 1196–1199.
  33. Berendsen, H.J.; Postma, J.V.; van Gunsteren, W.F.; DiNola, A.; Haak, J.R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 1984, 81, 3684–3690.
  34. Kim, S.; Cho, K.H. PyQSAR: A fast QSAR modeling platform using machine learning and Jupyter notebook. Bull. Korean Chem. Soc. 2019, 40, 39–44.
  35. Schrödinger, LLC. The PyMOL Molecular Graphics System; Version 1.8; Schrödinger Inc.: New York, NY, USA, 2016.
Figure 1. General scaffold of group G1 (left) and group G2 (right).
Figure 2. Scatter plot obtained by multidimensional scaling (MDS). The molecules of the G1 group are shown in blue, the molecules of the G2 group are shown in orange, and the structural outliers are shown in green.
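The G1/G2 split and the structural outliers in Figures 1 and 2 result from clustering the compounds by fingerprint similarity. Because the fingerprint type, similarity metric and linkage rule are not restated in this part of the article, the following is only a minimal sketch under stated assumptions: Tanimoto similarity on fingerprints stored as sets of on-bit indices, combined with average-linkage agglomeration. `tanimoto` and `average_linkage` are hypothetical helpers and do not represent the authors' workflow.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def average_linkage(fps: list[set[int]], n_clusters: int) -> list[list[int]]:
    """Agglomerative clustering on Tanimoto distance (1 - similarity).

    Assumption: average linkage. Every molecule starts as its own cluster, and
    the two clusters with the smallest mean pairwise distance are merged until
    n_clusters remain. Returns lists of molecule indices.
    """
    dist = {
        (i, j): 1.0 - tanimoto(fps[i], fps[j])
        for i, j in combinations(range(len(fps)), 2)
    }

    def d(i: int, j: int) -> float:
        return dist[(min(i, j), max(i, j))]

    clusters = [[i] for i in range(len(fps))]
    while len(clusters) > n_clusters:
        best = None  # (mean distance, index a, index b)
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [d(i, j) for i in clusters[a] for j in clusters[b]]
            mean = sum(pairs) / len(pairs)
            if best is None or mean < best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


if __name__ == "__main__":
    toy = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {10, 11, 12}, {10, 11, 13}]
    print(average_linkage(toy, 2))  # → [[0, 1, 2], [3, 4]]
```

With the toy bit sets, the three fingerprints that share most of their bits end up in one cluster and the other two form the second. Each merge step rescans every cluster pair, so the loop costs roughly cubic time. That is acceptable for a 64-compound set but not for large libraries.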
Figure 3. Experimental versus predicted and estimated pIC50 values for the training and test sets of the G1 (left) and G2 (right) groups.
Figure 4. Contribution map generated by KPLS of groups G1 (A–D) and G2 (E–H). Green colors represent positive contributions, and red colors indicate negative contributions to biological activity.
Figure 5. Applicability domain for group G1 (left) and group G2 (right). The molecules of the training set are illustrated in blue, and the compounds of the test set are depicted in orange.
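Figure 5 does not state which applicability-domain measure is plotted. One widely used option, among the approaches compared in ref. [26], is the leverage of each compound relative to the training descriptor matrix, h_i = x_iᵀ(XᵀX)⁻¹x_i, with a warning threshold h* = 3(p + 1)/n for p descriptors and n training compounds. The sketch below assumes a small numeric descriptor matrix with an added intercept column. `invert`, `leverages` and `warning_leverage` are hypothetical names, and the plotted domain may have been computed differently.

```python
def invert(m: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(train: list[list[float]], query: list[list[float]]) -> list[float]:
    """Leverage h = x^T (X^T X)^-1 x of each query row w.r.t. the training matrix.

    Assumption: an intercept column of ones is prepended to every descriptor row.
    """
    X = [[1.0] + row for row in train]
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    inv = invert(xtx)
    out = []
    for row in query:
        x = [1.0] + row
        out.append(sum(x[a] * inv[a][b] * x[b] for a in range(k) for b in range(k)))
    return out


def warning_leverage(n_train: int, n_descriptors: int) -> float:
    """Warning threshold h* = 3(p + 1)/n."""
    return 3.0 * (n_descriptors + 1) / n_train


if __name__ == "__main__":
    train = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # toy one-descriptor training set
    print(leverages(train, [[2.0], [10.0]]), warning_leverage(5, 1))
```

A compound at the training mean has the minimum leverage 1/n, and a compound far outside the descriptor range exceeds h*, which flags its prediction as an extrapolation.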
Figure 6. Contribution maps of the most and least potent compounds of group G1 (A,B) and group G2 (C,D).
Figure 7. Histogram for the distribution of the experimental pIC50 values over the entire compound set used in the QSAR studies.
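The potencies in Figure 7 and in Tables 4 and 6 are on the pIC50 scale, where pIC50 = −log10(IC50 / M). For example, pIC50 = 5.138 corresponds to an IC50 of about 7.3 µM. The short sketch below shows this conversion together with the fixed-width binning that a histogram requires. The 0.25 bin width and the 4.0 starting edge are arbitrary choices and are not taken from Figure 7.

```python
import math
from collections import Counter


def pic50_from_um(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L), with IC50 given in micromolar."""
    return -math.log10(ic50_um * 1e-6)


def ic50_um_from_pic50(pic50: float) -> float:
    """Inverse conversion, returning IC50 in micromolar."""
    return 10 ** (-pic50) * 1e6


def histogram(values, width=0.25, start=4.0):
    """Count values per bin [start + k*width, start + (k+1)*width); keys are left edges."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    return {round(start + k * width, 3): counts[k] for k in sorted(counts)}


if __name__ == "__main__":
    first_six = [5.138, 4.913, 5.133, 5.478, 4.922, 5.29]  # compounds 1-6, Table 6
    print(histogram(first_six))  # → {4.75: 2, 5.0: 2, 5.25: 2}
    print(ic50_um_from_pic50(5.138))
```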
Table 1. The most statistically significant models generated by AutoQSAR using the complete dataset.
Training Set (%) | R2 | SD | Q2 (R2pred) | RMSE | N | Fingerprint
70 | 0.5378 | 0.2180 | 0.4937 | 0.2123 | 1 | Radial
75 | 0.5997 | 0.2065 | 0.5284 | 0.1994 | 2 | Dendritic
80 | 0.6304 | 0.2022 | 0.6107 | 0.1817 | 5 | MOLPRINT 2D
R2: coefficient of determination for the training set; SD: standard deviation; Q2: predictive correlation coefficient for the test set (R2pred); RMSE: root-mean-square error for the test set predictions; N: optimum number of components.
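The statistics in Tables 1 to 3 follow the definitions in the footnote above. The sketch below computes R², RMSE and an external R²pred, assuming the usual form 1 − SS_res/SS_tot. Software packages differ on whether SS_tot for a test set is centered on the training-set mean (a common convention for R²pred) or on the test-set mean, so the reference mean is left as a parameter. SD is omitted because the degrees-of-freedom convention AutoQSAR uses for it is not stated here.

```python
import math


def r2(y, yhat, ref_mean=None):
    """Coefficient of determination 1 - SS_res / SS_tot.

    ref_mean defaults to the mean of y; passing the training-set mean gives
    the external R2pred variant for a test set.
    """
    m = sum(y) / len(y) if ref_mean is None else ref_mean
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - m) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, yhat):
    """Root-mean-square error of predictions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


if __name__ == "__main__":
    y_test = [1.0, 2.0, 3.0, 4.0]  # toy values, not taken from the tables
    yhat_test = [1.5, 2.0, 3.0, 3.5]
    print(r2(y_test, yhat_test))  # centered on the test-set mean
    print(r2(y_test, yhat_test, ref_mean=2.0))  # hypothetical training mean of 2.0
    print(rmse(y_test, yhat_test))
```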
Table 2. Statistically significant models generated by AutoQSAR for the G1 group.
Training Set (%) | R2 | SD | Q2 (R2pred) | RMSE | N | Fingerprint
70 | 0.8982 | 0.1178 | 0.7132 | 0.1018 | 2 | Radial
75 | 0.8012 | 0.1413 | 0.7022 | 0.1668 | 1 | Radial
80 | 0.9069 | 0.1039 | 0.8201 | 0.0945 | 2 | Radial
Table 3. Statistically significant models generated by AutoQSAR for the G2 group.
Training Set (%) | R2 | SD | Q2 (R2pred) | RMSE | N | Fingerprint
70 | 0.6109 | 0.205 | 0.4206 | 0.1829 | 2 | MOLPRINT 2D
75 | 0.5693 | 0.2040 | 0.5351 | 0.1041 | 2 | MOLPRINT 2D
80 | 0.8206 | 0.1377 | 0.8001 | 0.1081 | 3 | Dendritic
Table 4. Experimental, predicted, and residual values of pIC50 for the G1 and G2 groups.
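No. | pIC50 exp | 2D-QSAR complete-set model: pIC50 pred | Residue | Group | 2D-QSAR cluster model: pIC50 pred | Residue | 4D-QSAR complete-set model: pIC50 pred | Residue | 4D-QSAR cluster model: pIC50 pred | Residue
1 | 5.138 | 5.184 | 0.046 | G1 | 5.187 | 0.049 | 5.066 | −0.071 | 5.21 | 0.072
2 | 4.913 | 4.911 | −0.002 | -¹ | -¹ | -¹ | 4.9239 | 0.011 | -¹ | -¹
3 | 5.133 | 4.962 | −0.171 | G1 | 5.061 | −0.072 | 5.178 | 0.045 | 5.171 | 0.039
4 | 5.478 | 5.312 | −0.165 | G1 | 5.433 | −0.044 | 5.099 | −0.379 | 5.229 | −0.249
5 | 4.922 | 4.955 | 0.033 | G1 | 4.995 | 0.073 | 4.848 | −0.073 | 4.893 | −0.028
6 | 5.29 | 5.017 | −0.273 | G1 | 5.173 | −0.116 | 5.257 | −0.033 | 5.155 | −0.135
7 | 5.428 | 5.231 | −0.197 | G1 | 5.385 | −0.043 | 5.118 | −0.309 | 5.462 | 0.035
8 | 4.984 | 5.103 | 0.119 | G1 | 4.993 | 0.009 | 5.194 | 0.211 | 5.064 | 0.081
9 | 5.387 | 5.47 | 0.082 | G1 | 5.388 | −0.001 | 5.111 | −0.276 | 5.293 | −0.094
10 | 4.955 | 4.904 | −0.051 | G1 | 5.098 | 0.143 | 4.990 | 0.035 | 4.882 | −0.073
11 | 5.397 | 5.467 | 0.07 | G1 | 5.554 | 0.157 | 5.214 | −0.183 | 5.349 | −0.047
12 | 5.188 | 5.232 | 0.043 | G1 | 5.282 | 0.093 | 5.056 | −0.132 | 5.084 | −0.104
13 | 4.289 | 4.848 | 0.559 | G1 | 4.369 | 0.080 | 4.747 | 0.459 | 4.359 | 0.071
14 | 5.293 | 5.158 | −0.135 | G1 | 5.345 | 0.052 | 4.959 | −0.334 | 5.205 | −0.088
15 | 5.088 | 4.848 | −0.24 | G1 | 4.807 | −0.281 | 5.271 | 0.184 | 5.013 | −0.075
16 | 5.248 | 4.848 | −0.4 | G1 | 5.104 | 0.144 | 5.081 | −0.167 | 5.442 | 0.195
17 | 4.97 | 5.007 | 0.037 | G1 | 4.97 | 0.000 | 5.095 | 0.125 | 4.919 | −0.051
18 | 4.931 | 5.141 | 0.209 | G1 | 4.994 | 0.062 | 5.196 | 0.266 | 5.238 | 0.308
19 | 5.313 | 5.333 | 0.019 | G1 | 5.235 | −0.079 | 5.055 | −0.258 | 5.515 | 0.202
20 | 5.193 | 5.062 | −0.132 | G1 | 5.126 | −0.067 | 5.245 | 0.052 | 5.168 | −0.024
21 | 5.221 | 5.333 | 0.112 | G1 | 5.245 | 0.024 | 5.138 | −0.083 | 5.309 | 0.089
22 | 5.455 | 5.543 | 0.088 | G1 | 5.493 | 0.038 | 4.993 | −0.462 | 5.306 | −0.148
23 | 5.602 | 5.53 | −0.072 | G1 | 5.493 | −0.109 | 5.358 | −0.244 | 5.382 | −0.219
24 | 5.314 | 5.185 | −0.13 | G1 | 5.264 | −0.050 | 4.918 | −0.396 | 5.294 | −0.019
25 | 5.137 | 4.902 | −0.235 | G1 | 5.182 | 0.044 | 5.123 | −0.013 | 5.133 | −0.004
26 | 4.658 | 4.86 | 0.202 | G1 | 4.716 | 0.058 | 4.921 | 0.264 | 4.924 | 0.266
27 | 5.545 | 5.152 | −0.393 | G1 | 5.47 | −0.075 | 5.565 | 0.02 | 5.551 | 0.007
28 | 4.587 | 4.882 | 0.295 | G1 | 4.651 | 0.064 | 4.468 | −0.119 | 4.583 | −0.004
29 | 5.033 | 4.812 | −0.221 | G2 | 4.856 | −0.177 | 4.631 | −0.402 | 4.92 | −0.113
30 | 5.115 | 4.982 | −0.133 | G2 | 5.146 | 0.031 | 4.831 | −0.284 | 4.786 | −0.329
31 | 5.084 | 5.094 | 0.01 | G2 | 5.12 | 0.036 | 5.135 | 0.051 | 5.040 | −0.044
32 | 4.592 | 4.785 | 0.193 | G2 | 4.841 | 0.249 | 4.699 | 0.108 | 4.695 | 0.104
33 | 5.081 | 4.982 | −0.099 | G2 | 5.112 | 0.031 | 5.193 | 0.113 | 5.020 | −0.06
34 | 4.976 | 4.785 | −0.192 | G2 | 4.907 | −0.069 | 5.18 | 0.204 | 4.965 | −0.011
35 | 5.096 | 5.074 | −0.022 | G2 | 5.269 | 0.173 | 5.076 | −0.02 | 5.042 | −0.053
36 | 4.932 | 4.883 | −0.049 | G2 | 4.891 | −0.041 | 5.188 | 0.257 | 5.105 | 0.173
37 | 5.135 | 4.857 | −0.278 | G2 | 4.87 | −0.265 | 5.116 | −0.019 | 5.090 | −0.045
38 | 4.981 | 4.992 | 0.011 | G2 | 4.926 | −0.055 | 5.329 | 0.348 | 5.133 | 0.153
39 | 4.598 | 4.758 | 0.16 | -¹ | -¹ | -¹ | 4.5562 | −0.042 | -¹ | -¹
40 | 4.279 | 4.373 | 0.094 | G2 | 4.283 | 0.004 | 4.802 | 0.523 | 4.289 | 0.011
41 | 4.426 | 4.35 | −0.076 | G2 | 4.34 | −0.086 | 4.523 | 0.097 | 4.912 | 0.487
42 | 5.11 | 5.137 | 0.027 | G2 | 5.163 | 0.053 | 4.873 | −0.237 | 5.013 | −0.097
43 | 4.755 | 4.909 | 0.155 | G2 | 4.644 | −0.110 | 5.055 | 0.3 | 4.781 | 0.027
44 | 4.723 | 4.83 | 0.107 | G2 | 4.595 | −0.128 | 4.831 | 0.109 | 5.009 | 0.287
45 | 4.358 | 4.736 | 0.378 | G2 | 4.602 | 0.244 | 5.010 | 0.653 | 4.589 | 0.232
46 | 4.985 | 5.014 | 0.03 | G2 | 4.974 | −0.011 | 4.949 | −0.035 | 4.886 | −0.099
47 | 4.988 | 5.373 | 0.385 | G2 | 5.186 | 0.198 | 4.937 | −0.05 | 4.944 | −0.043
48 | 4.663 | 4.874 | 0.212 | G2 | 4.868 | 0.205 | 4.912 | 0.25 | 4.882 | 0.22
49 | 4.744 | 4.925 | 0.181 | G2 | 4.728 | −0.016 | 4.826 | 0.082 | 4.795 | 0.051
50 | 4.92 | 5.028 | 0.108 | G2 | 4.893 | −0.027 | 5.075 | 0.155 | 4.838 | −0.082
51 | 5.049 | 5.05 | 0.001 | G2 | 5.092 | 0.043 | 5.108 | 0.059 | 5.063 | 0.015
52 | 4.687 | 4.753 | 0.067 | G2 | 4.828 | 0.141 | 4.895 | 0.209 | 4.854 | 0.168
53 | 4.445 | 4.608 | 0.164 | G2 | 4.456 | 0.011 | 4.948 | 0.504 | 4.764 | 0.319
54 | 5.41 | 4.933 | −0.477 | G2 | 5.297 | −0.113 | 5.412 | 0.002 | 5.518 | 0.109
55 | 5.068 | 4.916 | −0.152 | G2 | 5.07 | 0.002 | 4.870 | −0.197 | 5.149 | 0.082
56 | 4.94 | 4.925 | −0.015 | G2 | 4.978 | 0.038 | 4.860 | −0.079 | 4.846 | −0.093
57 | 5.623 | 5.444 | −0.18 | G2 | 5.455 | −0.168 | 5.477 | −0.145 | 5.596 | −0.027
58 | 5.008 | 4.874 | −0.134 | G2 | 4.908 | −0.100 | 4.992 | −0.016 | 4.96 | −0.044
59 | 5.072 | 5.094 | 0.022 | G2 | 4.871 | −0.201 | 4.875 | −0.196 | 4.856 | −0.215
60 | 5.137 | 5.193 | 0.055 | G2 | 5.219 | 0.081 | 4.900 | −0.237 | 4.911 | −0.226
61 | 5.291 | 5.269 | −0.022 | G2 | 5.299 | 0.008 | 5.069 | −0.222 | 4.945 | −0.345
62 | 5.07 | 4.875 | −0.196 | G2 | 5.115 | 0.045 | 5.027 | −0.043 | 4.978 | −0.092
63 | 4.747 | 4.886 | 0.139 | G2 | 4.741 | −0.007 | 4.917 | 0.17 | 4.953 | 0.207
64 | 5.16 | 5.12 | −0.04 | G2 | 5.077 | −0.083 | 5.048 | −0.112 | 5.043 | −0.116
-¹ Structural outlier.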
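In Table 4, the residue is the predicted value minus the experimental value, so a positive residue means the model overestimates potency. The sketch below recomputes a few rows from the rounded table entries to make this sign convention explicit. The three rows were chosen arbitrarily and say nothing about overall model quality. Other rows can differ from the printed residues in the last digit, presumably because the printed residues were derived from unrounded predictions. `ROWS`, `residue` and `mean_abs_residue` are illustrative names.

```python
# (No., pIC50 exp, 2D complete-set pred, 2D cluster-model pred), copied from Table 4
ROWS = [
    (1, 5.138, 5.184, 5.187),
    (3, 5.133, 4.962, 5.061),
    (13, 4.289, 4.848, 4.369),
]


def residue(exp: float, pred: float) -> float:
    """Residue as tabulated: predicted minus experimental pIC50, 3 decimals."""
    return round(pred - exp, 3)


def mean_abs_residue(rows, column: int) -> float:
    """Mean |residue| over rows, using the prediction stored at tuple index `column`."""
    return sum(abs(residue(r[1], r[column])) for r in rows) / len(rows)


if __name__ == "__main__":
    for no, exp, pred_full, pred_cluster in ROWS:
        print(no, residue(exp, pred_full), residue(exp, pred_cluster))
    print(mean_abs_residue(ROWS, 2), mean_abs_residue(ROWS, 3))
```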
Table 5. Statistically significant 4D-QSAR models.
Dataset | R2 | RMSE | Q2 (5-fold) | RMSEcv | R2pred
Complete dataset | 0.4599 | 0.2277 | 0.4137 | 0.2412 | 0.4353
G1 | 0.8033 | 0.1313 | 0.6600 | 0.1716 | 0.6480
G2 | 0.7005 | 0.1560 | 0.6095 | 0.1701 | 0.6581
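Table 5 reports the cross-validated Q² (5-fold) and RMSEcv of the 4D-QSAR models. The sketch below shows how both numbers follow from the prediction error sum of squares (PRESS) accumulated over k held-out folds, with Q² = 1 − PRESS/SS_tot and RMSEcv = √(PRESS/n). It makes two simplifications. First, ordinary least squares on a single descriptor stands in for the actual regression method. Second, folds are assigned deterministically by index modulo k rather than at random. `fit_line` and `kfold_q2` are hypothetical helpers.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x (stand-in for the real QSAR regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_q2(xs, ys, k=5):
    """Return (Q2, RMSEcv) from k-fold cross-validation.

    Fold f holds out indices i with i % k == f; each held-out compound is
    predicted by a model fitted on the remaining folds.
    """
    n = len(xs)
    press = 0.0
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        test = [i for i in range(n) if i % k == f]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    my = sum(ys) / n
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_tot, math.sqrt(press / n)


if __name__ == "__main__":
    xs = [float(x) for x in range(10)]  # toy descriptor values
    ys = [2 * x + 1 + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
    print(kfold_q2(xs, ys))
```

Because every compound is predicted by a model that never saw it, Q² is normally lower than the fitted R². In Table 5, Q² (5-fold) is lower than R2 for every dataset.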
Table 6. Structures and pIC50 values of the dataset compounds used in the QSAR studies.
No. | Structure | pIC50 exp | No. | Structure | pIC50 exp | No. | Structure | pIC50 exp
1 | [image] | 5.138 | 2 | [image] | 4.913 | 3 | [image] | 5.133
4 | [image] | 5.478 | 5 | [image] | 4.922 | 6 | [image] | 5.29
7 | [image] | 5.428 | 8 | [image] | 4.984 | 9 | [image] | 5.387
10 | [image] | 4.955 | 11 | [image] | 5.397 | 12 | [image] | 5.188
13 | [image] | 4.289 | 14 | [image] | 5.293 | 15 | [image] | 5.088
16 | [image] | 5.248 | 17 | [image] | 4.97 | 18 | [image] | 4.931
19 | [image] | 5.313 | 20 | [image] | 5.193 | 21 | [image] | 5.221
22 | [image] | 5.455 | 23 | [image] | 5.602 | 24 | [image] | 5.314
25 | [image] | 5.137 | 26 | [image] | 4.658 | 27 | [image] | 5.545
28 | [image] | 4.587 | 29 | [image] | 5.033 | 30 | [image] | 5.115
31 | [image] | 5.084 | 32 | [image] | 4.592 | 33 | [image] | 5.081
34 | [image] | 4.976 | 35 | [image] | 5.096 | 36 | [image] | 4.932
37 | [image] | 5.135 | 38 | [image] | 4.981 | 39 | [image] | 4.598
40 | [image] | 4.279 | 41 | [image] | 4.426 | 42 | [image] | 5.11
43 | [image] | 4.755 | 44 | [image] | 4.723 | 45 | [image] | 4.358
46 | [image] | 4.985 | 47 | [image] | 4.988 | 48 | [image] | 4.663
49 | [image] | 4.744 | 50 | [image] | 4.92 | 51 | [image] | 5.049
52 | [image] | 4.687 | 53 | [image] | 4.445 | 54 | [image] | 5.41
55 | [image] | 5.068 | 56 | [image] | 4.94 | 57 | [image] | 5.623
58 | [image] | 5.008 | 59 | [image] | 5.072 | 60 | [image] | 5.137
61 | [image] | 5.291 | 62 | [image] | 5.07 | 63 | [image] | 4.747
64 | [image] | 5.16
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Teles, H.R.; Ferreira, L.L.G.; Valli, M.; Coelho, F.; Andricopulo, A.D. Hierarchical Clustering and Target-Independent QSAR for Antileishmanial Oxazole and Oxadiazole Derivatives. Int. J. Mol. Sci. 2022, 23, 8898. https://doi.org/10.3390/ijms23168898

AMA Style

Teles HR, Ferreira LLG, Valli M, Coelho F, Andricopulo AD. Hierarchical Clustering and Target-Independent QSAR for Antileishmanial Oxazole and Oxadiazole Derivatives. International Journal of Molecular Sciences. 2022; 23(16):8898. https://doi.org/10.3390/ijms23168898

Chicago/Turabian Style

Teles, Henrique R., Leonardo L. G. Ferreira, Marilia Valli, Fernando Coelho, and Adriano D. Andricopulo. 2022. "Hierarchical Clustering and Target-Independent QSAR for Antileishmanial Oxazole and Oxadiazole Derivatives" International Journal of Molecular Sciences 23, no. 16: 8898. https://doi.org/10.3390/ijms23168898
