A QSAR Study for Antileishmanial 2-Phenyl-2,3-dihydrobenzofurans †

Bernal, Freddy A.; Schmidt, Thomas J.

doi:10.3390/molecules28083399


Freddy A. Bernal ‡ and Thomas J. Schmidt *

University of Münster, Institute of Pharmaceutical Biology and Phytochemistry (IPBP), PharmaCampus—Corrensstraße 48, 48149 Münster, Germany

* Author to whom correspondence should be addressed.

† This work is cordially dedicated to the memory of Prof. A. Wilhelm Alfermann, Düsseldorf, deceased on 23 February 2023, who spent a great part of his scientific life working on natural lignans.

‡ Current address: Transfer Group Anti-Infectives, Leibniz Institute for Natural Product Research and Infection Biology, HKI, Beutenbergstraße 11a, 07745 Jena, Germany.

Molecules 2023, 28(8), 3399; https://doi.org/10.3390/molecules28083399

Submission received: 8 March 2023 / Revised: 4 April 2023 / Accepted: 6 April 2023 / Published: 12 April 2023

(This article belongs to the Special Issue Computational Approaches: Drug Discovery and Design in Medicinal Chemistry and Bioinformatics II)


Abstract

Leishmaniasis, a parasitic disease that threatens the lives of millions of people around the globe, currently lacks effective treatments. We have previously reported on the antileishmanial activity of a series of synthetic 2-phenyl-2,3-dihydrobenzofurans and some qualitative structure–activity relationships within this set of neolignan analogues. In the present study, various quantitative structure–activity relationship (QSAR) models were therefore created to explain and predict the antileishmanial activity of these compounds. When QSAR models based on molecular descriptors and multiple linear regression, random forest, or support vector regression were compared with models based on 3D molecular structures and their molecular interaction fields (MIFs) with partial least squares regression, the latter (i.e., 3D-QSAR models) proved clearly superior. MIF analysis of the best-performing and statistically most robust 3D-QSAR model revealed the structural features most important for antileishmanial activity. This model can thus guide decision-making during further development by predicting the activity of potentially new leishmanicidal dihydrobenzofurans before synthesis.

Keywords:

2-phenyl-2,3-dihydrobenzofurans; Leishmania; 3D-QSAR; QSAR; neolignan analogues

1. Introduction

The World Health Organization (WHO) has recognized Leishmaniasis as a public health concern and one of the so-called Neglected Tropical Diseases (NTDs). An estimated 600,000 to 1 million people are infected every year with the various forms of Leishmaniasis, primarily in tropical and subtropical regions [1,2], and despite control and surveillance campaigns, the situation has recently worsened, with clear outbreaks due to management issues associated with the COVID-19 pandemic [1]. Even though the disease burden (in disability-adjusted life years, DALYs) decreased by 5.4% from 2015 to 2019 [1], it is still considerably high (>600,000 DALYs [3]), with more than 230,000 newly reported cases in 2021 [1]. The disease is caused by parasites of the genus Leishmania and occurs in several clinical forms related to the particular species infecting the host [2,4,5]. Current treatments are inadequate and display several drawbacks, including, but not limited to, high toxicity and poor efficacy [3,6,7]. Different institutions and research groups have put great effort into the search for antileishmanials [3,8,9,10,11,12]; however, effective drugs remain to be found, especially for Visceral Leishmaniasis, the most aggressive form of the disease.

The use of computational methods to aid in solving different problems in drug discovery pipelines is becoming more and more important, particularly with the advent of artificial intelligence [13,14,15,16,17,18,19]. In silico approaches have therefore been applied to the rational design and discovery of potential drugs against Leishmaniasis [20,21,22,23,24,25]. Because knowledge about validated targets for Leishmaniasis is limited, ligand-based methods investigating structure–activity relationships (SARs) may represent a suitable approach. Within our research program on fighting Leishmaniasis with natural products and natural product-like small molecules, we have reported a series of 2-phenyl-2,3-dihydrobenzofurans, synthetic analogues of natural dihydrobenzofuran neolignans, with antileishmanial potential [26]. Qualitative inspection of the compounds’ structures and activities in that study made it evident that an in-depth quantitative structure–activity relationship (QSAR) study of this series would be worthwhile. Therefore, we present herein a comparative QSAR study of antileishmanial 2-phenyl-2,3-dihydrobenzofurans, using different machine learning methods and molecular descriptors, as well as 3D-QSAR. The statistical performance of the various models was assessed exhaustively using a comprehensive set of established quality metrics and compared across the different approaches. Key structural features conferring activity were finally deduced from the best-performing model.

2. Results and Discussion

2.1. Data Set

The data set used in the present study comprises seventy congeneric trans-2-phenyl-2,3-dihydrobenzofurans with antileishmanial potential, previously synthesized and reported by us [26]. Based on their structural features, two groups, A and B, can easily be distinguished (Figure 1). The structures and antileishmanial activities of all individual compounds used in this study are listed in Tables S1 and S2 (Supplementary Materials). Compounds in class A have natural product-like structures closely resembling neolignans commonly found in plants [27,28], which are biosynthesized as dimers of phenylpropenoid building blocks. Compounds in class B are synthetic analogues lacking the characteristic dimeric nature of the natural products. The selected compounds span a wide range of antileishmanial activity, with IC₅₀ values against axenic amastigotes of Leishmania donovani ranging from 0.5 to >200 μM (i.e., covering almost three orders of magnitude), which makes them amenable to QSAR analysis.

2.2. QSAR Modeling

3D molecular models of all compounds (Tables S1 and S2) were obtained by energy minimization of the lowest-energy conformer from a conformational search, using the semi-empirical AM1 method. The resulting structures were used to calculate molecular descriptors for machine learning (ML)-based QSAR and were aligned for 3D-QSAR modeling. Using the Molecular Operating Environment (MOE) software [29], a total of 435 molecular descriptors was obtained. Feature selection through contingency analysis, as implemented in MOE (Table S3, Supplementary Materials), reduced this set to the 107 descriptors with the highest utility for QSAR modeling (see Section 3.1 for details; for the full list of descriptors see Table S9, Supplementary Materials). Multiple Linear Regression (MLR), Random Forest (RF), and Support Vector Regression (SVR) were used as learning algorithms for training the descriptor-based models. For 3D-QSAR, the same structures were aligned using Open3DAlign [30], and Open3DQSAR [31] was then employed to train models using Partial Least Squares (PLS) regression. Details for each model type are presented below. In each case, three different training/test set combinations were generated by random splitting of the samples (74/26 ratio unless otherwise stated).

2.2.1. Model Building

As a first approach, MLR was used to train models MLR1–MLR3. In all cases, at most 5 to 6 features were employed, thus staying within one-fifth of the number of samples in the training set, as recommended by the Organization for Economic Co-operation and Development (OECD) [32]. A genetic algorithm (GA) [32,33,34] was used for efficient feature selection. Several independent searches with different fixed numbers of features (3, 4, 5, and 6 descriptors) were performed. From each run, the top five models according to $Q^2$ (the coefficient of determination in leave-one-out cross-validation, CV) were kept for further analysis (the equations of models MLR1–MLR3 are shown in Table S4, Supplementary Materials).
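To make the $Q^2$ criterion concrete, the following minimal sketch computes the leave-one-out $Q^2$ of an ordinary least-squares MLR model as $Q^2 = 1 - \mathrm{PRESS}/\sum_i (y_i - \bar{y})^2$. It is an illustration only, not the authors' GA/MLR workflow; the function names and the synthetic data (52 training compounds, roughly 74% of 70, with 5 descriptors) are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept column; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def loo_q2(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / total sum of squares of the training set."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i                      # hold out compound i
        coef = fit_ols(X[mask], y[mask])
        press += (y[i] - predict_ols(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Hypothetical synthetic data: 52 "training compounds", 5 descriptors
rng = np.random.default_rng(0)
X = rng.normal(size=(52, 5))
y = 5.0 + X @ np.array([0.8, -0.5, 0.3, 0.0, 0.0]) + rng.normal(scale=0.3, size=52)
print(f"LOO Q2 = {loo_q2(X, y):.3f}")
```

In a GA search, a function of this kind would serve as the fitness used to rank candidate descriptor subsets.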

During our second descriptor-based approach, models were independently trained using RF and SVR (models RF1–RF3 and SVR1–SVR3, respectively). These algorithms, unlike MLR, are not limited to strict linear correlation and might hence perform better in the case of nonlinear SAR. Hyperparameter optimization was independently performed in each case to obtain the best possible outcome (according to CV).

Finally, a CoMFA-like method based on Molecular Interaction Field (MIF) calculations and PLS was applied to generate models 3D1–3D3. The three models differed in the composition of the training/test sets at a constant split ratio. MIFs were obtained using the MMFF94 force field, as implemented in the Open3DQSAR [31] software. Variable selection (see Section 3.5) led to data matrices typically exceeding 2000 variables. The optimum number of latent variables (LVs) for PLS was chosen by CV.

2.2.2. Model Validation

It is well known that a high $Q^2$ on its own does not ensure good predictive power [35]. Nor, necessarily, does a high $R^2_{pred}$, mainly because of its strong dependence on the selection of the training set [36]. Therefore, the performance of the generated models was validated through an exhaustive statistical assessment (Table S5, Supplementary Materials) using thirteen different metrics commonly accepted within the QSAR community (Table S6, Supplementary Materials). Such a set of statistical parameters ensured a comprehensive assessment of model performance, but it also made direct comparison between models impractical. Dimensionality reduction by Principal Component Analysis (PCA) offered a simple solution for a qualitative comparison among models (Figure S1, Supplementary Materials). This comparison showed that the SVR and MLR models performed similarly.
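The PCA-based comparison can be sketched with scikit-learn as below, assuming that the metrics are standardized before PCA so that no single metric dominates; the metric table holds placeholder values, not the values of Table S5, and the per-metric coefficient of variation (standard deviation divided by mean) is computed alongside.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder metric table: rows = models, columns = validation metrics
# (e.g. R2, Q2, R2_pred, MAE). NOT the values reported in the study.
models = ["MLR1", "RF1", "SVR1", "3D1"]
metrics = np.array([
    [0.80, 0.62, 0.65, 0.30],
    [0.95, 0.55, 0.70, 0.25],
    [0.85, 0.64, 0.68, 0.28],
    [0.92, 0.70, 0.78, 0.20],
])

# Standardize each metric (column), then project the models onto two PCs
Z = StandardScaler().fit_transform(metrics)
pca = PCA(n_components=2)
scores = pca.fit_transform(Z)
for name, (pc1, pc2) in zip(models, scores):
    print(f"{name}: PC1={pc1:+.2f}, PC2={pc2:+.2f}")
print("explained variance ratio:", pca.explained_variance_ratio_.round(2))

# Coefficient of variation of each metric across models (sample std / mean)
cv = metrics.std(axis=0, ddof=1) / metrics.mean(axis=0)
print("CV per metric:", cv.round(3))
```

Models that lie close together in the PC1/PC2 score plot have similar overall metric profiles.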

The relative variability of the validation metrics across models was low, as indicated by the corresponding coefficients of variation, showing that for most of the statistical parameters all models performed rather comparably (Table S5, Supplementary Materials). In addition to directly comparing all metrics of the validated models, two consensus scores (F1 and F2) were calculated (Figure 2). We have previously shown the utility of this strategy for validation and performance assessment in regression problems [37] (see Table S7, Supplementary Materials, for score definitions). The F1 score denotes the number of statistical parameters within typical or commonly established thresholds (i.e., the number of positive assessments). F2 assigns either a reward or a penalty for each statistical parameter, reflecting compliance with established thresholds (the higher F2, the better the model) [37]. Since F1 alone was observed to be potentially misleading for models with poor CV statistics, F2 was calculated only for models with $Q^2$ and $R^2_{pred}$ above 0.5, thus applying more stringent criteria.
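The counting logic behind an F1-type score and the F2 eligibility gate can be sketched as follows. The thresholds are assumptions taken from values commonly cited in the QSAR literature; the study's own criteria and the F2 reward/penalty scheme are defined in Table S7 and are not reproduced here, so only F1 and the gate are shown.

```python
# Hypothetical thresholds, commonly cited in the QSAR literature; the study's
# own criteria are defined in its Table S7 and may differ.
THRESHOLDS = {
    "R2": lambda v: v > 0.6,
    "Q2": lambda v: v > 0.5,
    "R2_pred": lambda v: v > 0.6,
    "k": lambda v: 0.85 <= v <= 1.15,
    "k_prime": lambda v: 0.85 <= v <= 1.15,
    "Rm2": lambda v: v > 0.5,
    "CCC": lambda v: v > 0.85,
}

def consensus_f1(metrics):
    """F1-style score: number of available metrics that satisfy their threshold."""
    return sum(1 for name, passes in THRESHOLDS.items()
               if name in metrics and passes(metrics[name]))

def eligible_for_f2(metrics):
    """Gate described in the text: F2 only for models with Q2 and R2_pred above 0.5."""
    return metrics["Q2"] > 0.5 and metrics["R2_pred"] > 0.5

# Illustrative model with a good fit but weak cross-validation
model = {"R2": 0.90, "Q2": 0.45, "R2_pred": 0.70, "k": 0.98,
         "k_prime": 1.01, "Rm2": 0.62, "CCC": 0.88}
print(consensus_f1(model), eligible_for_f2(model))
```

The example model passes six of the seven assumed criteria, yet it is excluded from F2 because of its low $Q^2$, which is exactly the situation in which F1 alone could be misleading.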

Both consensus scores revealed important differences among the models. According to F1, the 3D-QSAR models (3D1–3D3) clearly outperformed all other models without failing any criterion (i.e., they complied with all thresholds used herein). Models RF1 and RF2 complied with 10 out of 12 criteria; however, RF2 might suffer from overfitting, as suggested by its poor CV statistics ($Q^2$ < 0.5). Models SVR1 and SVR2 complied with 8 validation criteria, and although MLR1–MLR3 exhibited the same level of compliance, the SVR models generally obtained better F2 scores. Thus, even the best MLR model (MLR1) performed worse than the best SVR and RF models (SVR1 and RF1, respectively; Figure 2). According to both consensus scores, F1 and F2, MIF-based 3D-QSAR is the method of choice for modeling the data set under study.

The effect of the training/test split ratio on model performance was analyzed by changing it from 74:26 to 70:30 (model 3D4) and 80:20 (model 3D5). Both new models displayed higher F2 values than 3D1–3D3 (Figure 2). Thus, regardless of the training set size, the 3D-QSAR models are comparably good predictors.

Owing to their higher F2 scores, the quality of models 3D4 and 3D5 was further investigated by determining their robustness. The progressive scrambling method [38], as implemented in Open3DQSAR, was used for this final comparison. The method calculates a normalized correlation coefficient (${Q_0^*}^2$) obtained by fitting $Q^2$ and $R^2$ after progressive scrambling, which can be interpreted in the same manner as a normal $Q^2$ value [38] (i.e., the higher the better). The calculated ${Q_0^*}^2$ for models 3D4 and 3D5 was 0.61 and 0.59, respectively. This subtle difference suggests that both models are equally robust. Nevertheless, subsequent analysis was performed with 3D4 as the nominally best model.

2.2.3. Applicability Domain for Model 3D4

With a valid and robust model in hand, determining the applicability domain (AD) was necessary to fulfill another OECD requirement [32]. In its currently accepted definition, the AD is the response and chemical structure space in which the model makes predictions with a given reliability [39]. It is therefore fundamental to the correct use of any model for predicting new, unseen compounds. Among the plethora of existing methods for defining the AD [39,40], the leverage method [39,40,41,42] (a distance-based method) was used in the present research. The corresponding Williams plot (standardized residuals vs. leverage) is shown in Figure 3A. None of the test set compounds appears beyond the “warning leverage” (denoted h* and represented by the vertical dashed line), indicating that all of them are within the AD of the model. Leverage values higher than h* in the test set would have indicated unreliable predictions resulting from substantial extrapolation [39,41]. Notably, compounds 24 and 21 (see Table S1, Supplementary Materials, for structures), both members of the training set, have leverages higher than h*, reflecting their strong influence on the regression model. Both were accurately predicted (low standardized residuals). On the other hand, compound 13 (see Figure 3B) yielded a relatively large standardized residual, showing that its activity was not entirely well predicted by the model, although the residual still lies within an accepted range (<2.5σ). Compliance with the AD during model building and validation, as demonstrated above for all compounds used, is yet another strength of model 3D4. The combination of good validation statistics, proven robustness, and a well-defined AD therefore makes it a reliable model for predicting the antileishmanial activity of dihydrobenzofurans.
Activity predictions by model 3D4 are summarized in Table S8 (Supplementary Materials).
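The quantities behind a Williams plot can be sketched in NumPy. The sketch assumes the textbook leverage definition $h_i = \mathbf{x}_i^\top (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_i$ with an intercept column, the warning leverage $h^* = 3(p+1)/n$, and residuals standardized by the training-set residual standard error; for a PLS model, $\mathbf{X}$ would plausibly be the matrix of training-set PLS scores, but the in-house MATLAB implementation is not described, so this choice and all names below are hypothetical.

```python
import numpy as np

def _with_intercept(X):
    """Prepend a column of ones so that h* = 3(p + 1)/n is consistent."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def leverages(X_train, X_query):
    """h_i = x_i^T (X^T X)^-1 x_i, with X the training matrix (intercept included)."""
    A, B = _with_intercept(X_train), _with_intercept(X_query)
    XtX_inv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", B, XtX_inv, B)

def warning_leverage(n_train, p):
    """h* = 3(p + 1)/n, i.e. three times the mean leverage of the training set."""
    return 3.0 * (p + 1) / n_train

def standardized_residuals(residuals, train_residuals, p):
    """Residuals divided by the training-set residual standard error."""
    train_residuals = np.asarray(train_residuals, dtype=float)
    s = np.sqrt(np.sum(train_residuals ** 2) / (len(train_residuals) - p - 1))
    return np.asarray(residuals, dtype=float) / s

# Illustrative data: 52 training compounds described by p = 3 model variables
rng = np.random.default_rng(1)
p = 3
X_train = rng.normal(size=(52, p))
X_test = np.vstack([rng.normal(size=(17, p)), [[6.0, 6.0, 6.0]]])  # last row deliberately extreme

h_test = leverages(X_train, X_test)
h_star = warning_leverage(len(X_train), p)
outside = np.flatnonzero(h_test > h_star)  # test compounds beyond the warning leverage
print(f"h* = {h_star:.3f}; indices beyond h*: {outside.tolist()}")
```

Plotting `standardized_residuals(...)` against `h_test` (with a vertical line at `h_star` and horizontal lines at ±2.5) would reproduce the layout of a Williams plot.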

2.2.4. Model Interpretation

One of the most important goals of QSAR models, in addition to predicting the activity of new compounds, is their interpretation to rationalize the underlying SARs [43]. Interpretation is particularly straightforward for CoMFA and other MIF-based 3D-QSAR variants, because their results are easy to visualize [44]. Therefore, the key structural features affecting antileishmanial activity were analyzed by inspecting the MIF-derived contour maps of model 3D4. Such maps highlight MIF regions with a high impact on the PLS regression model and are generated by plotting the PLS coefficients of MIF regions whose absolute values exceed a certain threshold. The CoMFA maps for 3D4 are shown in Figure 3C,D around the structures of compounds 13 and 30, representing a potent and an inactive compound, respectively (see Figure 3B for chemical structures). Van der Waals interactions (Figure 3C) around the substituent at position C-5 showed a positive effect on activity and are probably the most important characteristic (green contours V₁ and V₂). This agreed very well with previously observed qualitative SARs [26]. Compounds bearing an acrylate unit were generally more active than those without it, which, according to the 3D-QSAR model, is partially due to the increased steric bulk in that region. Moreover, compounds with bulky alkoxy groups obtained by esterification of the acrylate moiety were more active (contour V₂). However, the steric MIFs also indicated a small negative effect on activity where this substituent was too large (as found for compound 16, yellow contour V₃). Substituents at positions C-3′ and C-5′ of the pendant phenyl ring were also found to affect antileishmanial activity positively (contour V₄). A similar trend was evident for the substituents on the carboxy group attached to C-3 (green contour V₅).
On the other hand, analysis of the electrostatic interaction field (Figure 3D) revealed that electron-rich chains on C-5 increased the activity (large red contour E₁). In contrast, some electron deficiency on the aromatic ring near C-5 might improve the activity (blue contour E₂). Electron deficiency on the pendant phenyl group also had a favorable effect on activity (blue contour E₃), and H-bond donors on C-3′ and C-5′ led to superior activity (red contours E₄). The MIFs also indicated that H-bonds formed by the carboxy group on C-3 might improve the activity (both donor and acceptor nature; contours E₅).

The steric field contributed more to explaining the variance in activity (62.43% steric vs. 37.57% electrostatic) in model 3D4, suggesting that increasing the size of the side chains up to a certain optimum played a more important role than, for instance, changing the electron density on the benzofuran moiety. All observations and conclusions drawn from this model were in full agreement with the reported qualitative SAR analyses [26].

3. Materials and Methods

3.1. Data Preparation

All compounds in the data set were prepared as follows. 2D structures of the trans-2-phenyl-2,3-dihydrobenzofurans were converted into 3D models with (2R)-configuration in the Molecular Operating Environment (MOE) software (version 2018.0101) [29]. Since all compounds were synthesized and tested as racemates, this choice does not imply that the R-enantiomers are the eutomers. Each structure was then energy-minimized using the Amber10:EHT force field. Subsequently, a conformational search was performed using the LowModeMD (LowMD) method in MOE within an energy window of 5 kcal/mol and an RMSD limit of 0.75 Å. The lowest-energy conformers were refined by energy minimization using the semi-empirical AM1 method with the MOPAC module of MOE. The resulting 3D structures were used to calculate the whole set of 435 molecular descriptors available in MOE. The suitability of the descriptors for QSAR was assessed by contingency analysis as implemented in MOE. Minimum thresholds of 0.6 for the contingency coefficient and 0.2 for Cramér’s V, uncertainty, and correlation coefficients were applied to select the 107 descriptors used in the QSAR study (Table S3; see Table S9, Supplementary Materials, for the final list of descriptors). Activity data (Tables S1 and S2, Supplementary Materials) were used as the negative decadic logarithm (pIC₅₀) of the half-maximal inhibitory concentration (IC₅₀ in mol/L).
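The pIC₅₀ transformation is a one-liner; the sketch below assumes IC₅₀ values supplied in μM and converts them to mol/L first. It handles numeric values only: how censored values such as ">200 μM" were encoded is not stated here, so that case is deliberately left out. The function name is hypothetical.

```python
import math

def pic50_from_ic50_um(ic50_um):
    """pIC50 = -log10(IC50 [mol/L]); the input is given in micromolar."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)

# Bounds of the activity range quoted for the data set (0.5 to 200 μM)
most_active = pic50_from_ic50_um(0.5)     # about 6.30
least_active = pic50_from_ic50_um(200.0)  # about 3.70
print(f"{most_active:.2f} {least_active:.2f} span={most_active - least_active:.2f} log units")
```

The span of about 2.6 log units is what the data set description summarizes as "almost three orders of magnitude".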

3.2. Multiple Linear Regression Models

The data set was divided into training and test sets as follows: the compounds were sorted in descending order of activity (pIC₅₀) and 18 different bins were defined. From each bin, a compound was randomly selected and assigned to a test set representing 26% of the samples. The process was repeated several times to obtain different training/test set compositions for model building.
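The activity-stratified split can be sketched as follows. The text does not say how the 18 bins were delimited; this sketch assumes contiguous bins of (nearly) equal size along the sorted activity list, and the function name and placeholder activities are hypothetical.

```python
import numpy as np

def binned_split(pic50, n_bins=18, seed=None):
    """Hypothetical activity-stratified split: one random test compound per activity bin."""
    pic50 = np.asarray(pic50)
    rng = np.random.default_rng(seed)
    order = np.argsort(pic50)[::-1]          # indices sorted by descending pIC50
    bins = np.array_split(order, n_bins)     # contiguous bins of (nearly) equal size
    test_idx = np.array([rng.choice(b) for b in bins])
    train_idx = np.setdiff1d(np.arange(len(pic50)), test_idx)
    return train_idx, test_idx

# Placeholder activities for 70 compounds (NOT the measured values)
pic50 = np.random.default_rng(42).uniform(3.7, 6.3, size=70)
splits = [binned_split(pic50, seed=s) for s in range(3)]   # three different compositions
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx), f"{len(test_idx) / len(pic50):.0%}")
```

With 70 compounds and 18 bins, each split yields 52 training and 18 test compounds (about 26%), and every activity range is represented in the test set.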

QSAR models were then built for these data sets using genetic algorithm-driven variable selection and multiple linear regression analysis (GA/MLR) [32,33,34]. The GA/MLR algorithm was obtained from the CCG/MOE SVL exchange website (script GA.svl) [45]. A fixed number of variables was used in each run, and models with 3 to 7 variables were generated for each training set. Each GA/MLR run generated a set of 100 models and was terminated after a maximum of 1000 evolution cycles. In each case, the five models with the highest $Q^2$ in leave-one-out (LOO) cross-validation (CV) were tested for external predictivity by predicting the activity of the test set compounds.

3.3. Random Forest Models

Random forest (RF) is an ensemble learning method based on a group of decision trees [46,47]. Each tree is trained on a bootstrapped sample of the data, typically considering a randomly selected subset of features at each node split. The final prediction is the average of the predictions of the individual trees. In classification, node splitting is driven by the reduction in the Gini index or Gini “impurity” [48]; in regression, as here, the reduction in variance (mean squared error) is typically used instead. Data sets prepared as described above were submitted to RF regression using Scikit-learn [49]. The number of trees in the forest, the minimum number of samples required at a leaf node, the minimum number of samples required to split an internal node, the maximum number of features considered for the best split, and the number of samples drawn from the training set for each bootstrap were optimized in this work. A coarse-to-fine approach was followed for this hyperparameter tuning: first, random sampling within the selected hyperparameter space was performed; second, an exhaustive grid search was carried out. $Q^2$ statistics (5-fold CV) guided the selection of the best combination of hyperparameters. The corresponding Scikit-learn implementations were used for this sequential process.
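A coarse-to-fine search of this kind might look as follows with scikit-learn's `RandomizedSearchCV` and `GridSearchCV`. The search ranges, the neighbourhood used in the fine stage, and the placeholder data are assumptions for illustration. Note also that `scoring="r2"` averages the per-fold $R^2$, which is a close proxy for, but not identical to, a $Q^2$ pooled over all out-of-fold predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, RandomizedSearchCV

# Placeholder data standing in for the descriptor training matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(52, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=52)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Stage 1 (coarse): random sampling over a broad hyperparameter space
coarse_space = {
    "n_estimators": [50, 100, 200],
    "min_samples_leaf": [1, 2, 4],
    "min_samples_split": [2, 4, 8],
    "max_features": [0.3, 0.5, 1.0],
    "max_samples": [0.6, 0.8, None],   # fraction of samples per bootstrap (None = all)
}
coarse = RandomizedSearchCV(RandomForestRegressor(random_state=0), coarse_space,
                            n_iter=10, cv=cv, scoring="r2", random_state=0)
coarse.fit(X, y)
b = coarse.best_params_

# Stage 2 (fine): exhaustive grid in the neighbourhood of the coarse optimum
fine_grid = {
    "n_estimators": sorted({max(25, b["n_estimators"] - 25), b["n_estimators"], b["n_estimators"] + 25}),
    "min_samples_leaf": sorted({max(1, b["min_samples_leaf"] - 1), b["min_samples_leaf"], b["min_samples_leaf"] + 1}),
    "min_samples_split": [b["min_samples_split"]],
    "max_features": [b["max_features"]],
    "max_samples": [b["max_samples"]],
}
fine = GridSearchCV(RandomForestRegressor(random_state=0), fine_grid, cv=cv, scoring="r2")
fine.fit(X, y)
print(fine.best_params_, round(fine.best_score_, 3))
```

The coarse stage keeps the number of fits small over a large space; the grid stage then refines only the most influential parameters around the coarse optimum.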

3.4. Support Vector Machines Models

Support Vector Machines (SVMs) separate a data set into classes of objects by defining so-called hyperplanes [50]. The data points located closest to a hyperplane are called support vectors. The best hyperplane is the one that maximizes the margin, i.e., the distance between the hyperplane and the support vectors on either side. Kernel functions are typically used to help find such hyperplanes by implicitly mapping the data from a lower- to a higher-dimensional space. In support vector regression (SVR), the same framework is used to fit a function for which deviations smaller than ε (the ε-tube) are not penalized, while larger deviations are penalized according to the regularization parameter C. Data sets prepared as described above were submitted to SVM regression using Scikit-learn [49]. The kernel function, the kernel coefficient gamma (if applicable), the width of the epsilon-tube, and the regularization parameter C [50] were optimized using the same protocol as described for RF.
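A minimal SVR tuning sketch with scikit-learn is shown below. It assumes descriptor standardization inside a pipeline (not stated in the text) and uses separate grids so that `gamma` is only searched for the RBF kernel, where it applies; the ranges and placeholder data are hypothetical, and only the grid stage of the coarse-to-fine protocol is shown.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data with a mildly nonlinear structure-activity relationship
rng = np.random.default_rng(0)
X = rng.normal(size=(52, 20))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=52)

# Scaling is assumed; make_pipeline names the SVR step "svr"
pipe = make_pipeline(StandardScaler(), SVR())
grid = [
    {"svr__kernel": ["rbf"], "svr__gamma": ["scale", 0.01, 0.1],
     "svr__C": [1, 10, 100], "svr__epsilon": [0.05, 0.1, 0.2]},
    {"svr__kernel": ["linear"], "svr__C": [0.1, 1, 10],
     "svr__epsilon": [0.05, 0.1, 0.2]},
]
search = GridSearchCV(pipe, grid, cv=KFold(n_splits=5, shuffle=True, random_state=0),
                      scoring="r2")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Passing a list of dictionaries to `GridSearchCV` avoids wasting fits on `gamma` values that the linear kernel ignores.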

3.5. 3D-QSAR Models

The 3D molecular structures obtained as described above were aligned using the automatic alignment algorithm implemented in Open3DAlign [30]. Molecular interaction fields (MIFs) were calculated using the MMFF94 force field with default probes (an sp³-hybridized carbon atom in an alkyl chain for the steric MIF and a single positive point charge for the electrostatic MIF), with a 1.0 Å grid step in a box of 28 × 30 × 23 Å (large enough to leave a 5 Å margin around the largest molecule). The number of variables obtained was reduced according to conventional cutoff limits (±30 kcal/mol). The remaining variables were scaled by the block unscaled weighting algorithm [51] implemented in Open3DQSAR [31] (version 2.281). A variable selection procedure comprising Smart Region Definition [52] and Fractional Factorial Design [53] was then carried out, with both algorithms applied directly in Open3DQSAR. Finally, models were generated by Partial Least Squares (PLS) regression [54,55] using LOO-CV. The PLS coefficients were exported and visualized in MOE as MIF contours. Different training and test sets were created by random splitting, with the test sets comprising 20, 26, or 30% of the samples.

3.6. Statistical Validation

In addition to the conventional $R^2$ and $Q^2$, model quality was assessed by calculating several statistical parameters [56], denoted as follows: $R_0^2$, ${R_0^{\prime}}^2$, $k$, and $k'$ [35]; $R_m^2$ [36,57]; mean absolute error (MAE) [58]; $R^2_{pred}$ (= $Q^2_{F1}$) [59]; $Q^2_{F2}$ [60]; $Q^2_{F3}$ [61]; and the concordance correlation coefficient (CCC) [62]. Definitions of these validation parameters are given in the Supplementary Materials (Table S6). An in-house MATLAB script was employed for the simultaneous calculation of all parameters. Thereafter, the models were scored using the global scoring functions F1 and F2, as previously reported [37] (see Table S7 for definitions).
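Several of these external-validation metrics can be computed as in the NumPy sketch below. It uses common literature definitions ($Q^2_{F1}$ with the training-set mean, $Q^2_{F2}$ with the test-set mean, $Q^2_{F3}$ normalized by set sizes, $k$ and $k'$ as regression-through-origin slopes, and Lin's CCC); these are assumed to match Table S6 but may differ in detail from the in-house MATLAB script, and the numbers are illustrative only.

```python
import numpy as np

def external_metrics(y_test, y_pred, y_train):
    """Selected external-validation metrics (common literature definitions, assumed)."""
    y_test, y_pred, y_train = (np.asarray(a, dtype=float) for a in (y_test, y_pred, y_train))
    press = np.sum((y_test - y_pred) ** 2)                       # prediction error sum of squares
    ss_train = np.sum((y_train - y_train.mean()) ** 2)
    q2_f1 = 1 - press / np.sum((y_test - y_train.mean()) ** 2)   # R2_pred: training-set mean
    q2_f2 = 1 - press / np.sum((y_test - y_test.mean()) ** 2)    # test-set mean
    q2_f3 = 1 - (press / len(y_test)) / (ss_train / len(y_train))
    mae = np.mean(np.abs(y_test - y_pred))
    k = np.sum(y_test * y_pred) / np.sum(y_pred ** 2)            # observed vs predicted, through origin
    k_prime = np.sum(y_test * y_pred) / np.sum(y_test ** 2)      # predicted vs observed, through origin
    dt, dp = y_test - y_test.mean(), y_pred - y_pred.mean()
    ccc = 2 * np.sum(dt * dp) / (np.sum(dt ** 2) + np.sum(dp ** 2)
                                 + len(y_test) * (y_test.mean() - y_pred.mean()) ** 2)
    return {"Q2_F1": q2_f1, "Q2_F2": q2_f2, "Q2_F3": q2_f3, "MAE": mae,
            "k": k, "k_prime": k_prime, "CCC": ccc}

# Illustrative pIC50 values, not data from the study
y_train = [4.1, 4.8, 5.3, 5.9, 6.2, 4.5]
y_test = [4.3, 5.0, 5.7, 6.0]
y_pred = [4.5, 4.9, 5.5, 6.1]
for name, value in external_metrics(y_test, y_pred, y_train).items():
    print(f"{name}: {value:.3f}")
```

Because $Q^2_{F1}$ uses the training-set mean while $Q^2_{F2}$ uses the test-set mean, the two diverge when the test set is not centered on the training data, which is one reason why several metrics are reported together.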

A comparison of the robustness of the two models with the best F2 scores was achieved using the statistical variation in the progressive scrambling method [38] as implemented in Open3DQSAR. Assessment of the applicability domain was carried out by the leverage approach [42,63], using an in-house MATLAB algorithm. The results were displayed as the corresponding Williams plot [42,63].

4. Conclusions

The promising antileishmanial potential of some 2-phenyl-2,3-dihydrobenzofurans, together with its evident dependence on structure, encouraged us to thoroughly explore the structure–activity relationships underlying an existing medium-sized data set.

To this end, a considerable number of different QSAR models for the antileishmanial activity of the studied compounds were created. Regression models were generated with three machine learning methods (MLR, RF, and SVR) trained on a matrix of 107 molecular descriptors, as well as with 3D-QSAR based on the compounds’ MIFs. A comprehensive quality assessment by various validation metrics clearly demonstrated that the 3D-QSAR models exhibited the best statistical quality, outperforming all descriptor-based models obtained with the other approaches. After evaluation of statistical robustness, model 3D4 was chosen to analyze the underlying MIFs for structural information quantitatively related to the antileishmanial activity. The significant role of an acrylate unit on C-5 was revealed. Furthermore, a positive steric effect of bulky ester groups on that acrylate was confirmed. Substitutions on C-3′, C-4′, or C-5′ causing electron deficiency on the 2-phenyl ring might increase the activity, while H-bond donors on C-3′ and C-5′ would also improve it. Finally, the assessment of the model's applicability domain confirmed that all studied compounds fall within it.

In summary, a complete statistical analysis and comparison of various QSAR models led to an exhaustively validated and robust final model able to predict the antileishmanial potency of 2-phenyl-2,3-dihydrobenzofurans. The major outcome of this research can thus be considered as a fundamental first-line tool for further analysis and development of this kind of compound to fight Leishmaniasis.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/molecules28083399/s1, Table S1: Antileishmanial activity and chemical structure of compounds from class A, Table S2: Antileishmanial activity and chemical structure of compounds from class B, Table S3: Feature selection by contingency analysis in MOE, Table S4: Equations describing models MLR1-MLR3, Table S5: Model performance assessment using different validation metrics, Table S6: Definition of validation parameters used for assessing the performance of the models, Table S7: Definition of consensus scoring functions, Figure S1: Comparison of models’ performance using different metrics and PCA, Table S8: Activity predictions by model 3D4, Table S9: Molecular descriptors considered for model building. References [26,35,37,57,58,60,61,62,64,65,66,67] are cited in the Supplementary Materials.

Author Contributions

Conceptualization, F.A.B. and T.J.S.; methodology, software utilization, validation, formal analysis, visualization, and writing—original draft preparation, F.A.B.; supervision, resources, writing—review and editing, T.J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available from authors on reasonable request.

Acknowledgments

F. Bernal would like to thank the Colombian Ministry of Science, Technology, and Innovation (formerly Colciencias) for his doctoral fellowship in Münster, Germany. This study is part of the activities within the Research Network Natural Products against Neglected Diseases (ResNet NPND, http://www.resnetnpnd.org/, accessed on 5 April 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Sample Availability

Not applicable.

References

WHO. Global Report on Neglected Tropical Diseases 2023; WHO: Geneva, Switzerland, 2023.
DNDi. New Hope for Leishmaniasis; DNDi R&D and Access Programmes in Focus: Geneva, Switzerland, 2022.
De Rycker, M.; Wyllie, S.; Horn, D.; Read, K.D.; Gilbert, I.H. Anti-Trypanosomatid Drug Discovery: Progress and Challenges. Nat. Rev. Microbiol. 2023, 21, 35–50.
Burza, S.; Croft, S.L.; Boelaert, M. Leishmaniasis. Lancet 2018, 392, 951–970.
Mann, S.; Frasca, K.; Scherrer, S.; Henao-Martínez, A.F.; Newman, S.; Ramanan, P.; Suarez, J.A. A Review of Leishmaniasis: Current Knowledge and Future Directions. Curr. Trop. Med. Rep. 2021, 8, 121–132.
Van den Kerkhof, M.; Mabille, D.; Chatelain, E.; Mowbray, C.E.; Braillard, S.; Hendrickx, S.; Maes, L.; Caljon, G. In Vitro and in Vivo Pharmacodynamics of Three Novel Antileishmanial Lead Series. Int. J. Parasitol. Drugs Drug Resist. 2018, 8, 81–86.
Olías-Molero, A.I.; de la Fuente, C.; Cuquerella, M.; Torrado, J.J.; Alunda, J.M. Antileishmanial Drug Discovery and Development: Time to Reset the Model? Microorganisms 2021, 9, 2500.
Furuse, Y. Analysis of Research Intensity on Infectious Disease by Disease Burden Reveals Which Infectious Diseases Are Neglected by Researchers. Proc. Natl. Acad. Sci. USA 2019, 116, 478–483.
Ferreira, L.L.G.; de Moraes, J.; Andricopulo, A.D. Approaches to Advance Drug Discovery for Neglected Tropical Diseases. Drug Discov. Today 2022, 27, 2278–2287.
Pinheiro, A.C.; de Souza, M.V.N. Current Leishmaniasis Drug Discovery. RSC Med. Chem. 2022, 13, 1029–1043.
Bekhit, A.A.; El-Agroudy, E.; Helmy, A.; Ibrahim, T.M.; Shavandi, A.; Bekhit, A.E.-D.A. Leishmania Treatment and Prevention: Natural and Synthesized Drugs. Eur. J. Med. Chem. 2018, 160, 229–244.
Kapil, S.; Singh, P.K.; Silakari, O. An Update on Small Molecule Strategies Targeting Leishmaniasis. Eur. J. Med. Chem. 2018, 157, 339–367.
Frye, L.; Bhat, S.; Akinsanya, K.; Abel, R. From Computer-Aided Drug Discovery to Computer-Driven Drug Discovery. Drug Discov. Today Technol. 2021, 39, 111–117.
Cerchia, C.; Lavecchia, A. New Avenues in Artificial-Intelligence-Assisted Drug Discovery. Drug Discov. Today 2023, 28, 103516.
Sabe, V.T.; Ntombela, T.; Jhamba, L.A.; Maguire, G.E.M.; Govender, T.; Naicker, T.; Kruger, H.G. Current Trends in Computer Aided Drug Design and a Highlight of Drugs Discovered via Computational Techniques: A Review. Eur. J. Med. Chem. 2021, 224, 113705.
Jiménez-Luna, J.; Grisoni, F.; Weskamp, N.; Schneider, G. Artificial Intelligence in Drug Discovery: Recent Advances and Future Perspectives. Expert Opin. Drug Discov. 2021, 16, 949–959.
Zhao, L.; Ciallella, H.L.; Aleksunes, L.M.; Zhu, H. Advancing Computer-Aided Drug Discovery (CADD) by Big Data and Data-Driven Machine Learning Modeling. Drug Discov. Today 2020, 25, 1624–1638.
Leelananda, S.P.; Lindert, S. Computational Methods in Drug Discovery. Beilstein J. Org. Chem. 2016, 12, 2694–2718.
Macalino, S.J.Y.; Gosu, V.; Hong, S.; Choi, S. Role of Computer-Aided Drug Design in Modern Drug Discovery. Arch. Pharm. Res. 2015, 38, 1686–1701.
Njogu, P.M.; Guantai, E.M.; Pavadai, E.; Chibale, K. Computer-Aided Drug Discovery Approaches against the Tropical Infectious Diseases Malaria, Tuberculosis, Trypanosomiasis, and Leishmaniasis. ACS Infect. Dis. 2016, 2, 8–31.
Peña-Guerrero, J.; Nguewa, P.A.; García-Sosa, A.T. Machine Learning, Artificial Intelligence, and Data Science Breaking into Drug Design and Neglected Diseases. WIREs Comput. Mol. Sci. 2021, 11, e1513.
Winkler, D.A. Use of Artificial Intelligence and Machine Learning for Discovery of Drugs for Neglected Tropical Diseases. Front. Chem. 2021, 9, 614073. [Google Scholar] [CrossRef]
Roca, C.; Sebastián-Pérez, V.; Campillo, N.E. In Silico Tools for Target Identification and Drug Molecular Docking in Leishmania. In Drug Discovery for Leishmaniasis; Rivas, L., Gil, C., Eds.; The Royal Society of Chemistry: London, UK, 2017; pp. 130–152. [Google Scholar]
Ferreira, L.L.G.; Andricopulo, A.D. Chemoinformatics Strategies for Leishmaniasis Drug Discovery. Front. Pharmacol. 2018, 9, 1278. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Halder, A.K.; Dias Soeiro Cordeiro, M.N. Advanced in Silico Methods for the Development of Anti-Leishmaniasis and Anti-Trypanosomiasis Agents. Curr. Med. Chem. 2020, 27, 697–718. [Google Scholar] [CrossRef] [PubMed]
Bernal, F.A.; Gerhards, M.; Kaiser, M.; Wünsch, B.; Schmidt, T.J. (±)-trans-2-Phenyl-2,3-Dihydrobenzofurans as Leishmanicidal Agents: Synthesis, in Vitro Evaluation and SAR Analysis. Eur. J. Med. Chem. 2020, 205, 112493. [Google Scholar] [CrossRef]
Teponno, R.B.; Kusari, S.; Spiteller, M. Recent Advances in Research on Lignans and Neolignans. Nat. Prod. Rep. 2016, 33, 1044–1092. [Google Scholar] [CrossRef] [Green Version]
Dewick, P.M. The Shikimate Pathway: Aromatic Amino Acids and Phenylpropanoids. In Medicinal Natural Products: A Biosynthetic Approach; John Wiley & Sons, Ltd.: Chichester, UK, 2009; pp. 137–186. ISBN 9780470742761. [Google Scholar]
Chemical Computing Group ULC Molecular Operating Environment (MOE). 1010 Sherbrooke St. West, Suite #910; Version 2018.01; Chemical Computing Group ULC Molecular Operating Environment: Montreal, QC, Canada, 2019.
Tosco, P.; Balle, T.; Shiri, F. Open3DALIGN: An Open-Source Software Aimed at Unsupervised Ligand Alignment. J. Comput. Aided. Mol. Des. 2011, 25, 777–783.
Tosco, P.; Balle, T. Open3DQSAR: A New Open-Source Software Aimed at High-Throughput Chemometric Analysis of Molecular Interaction Fields. J. Mol. Model. 2011, 17, 201–208.
OECD. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models; OECD Series on Testing and Assessment; OECD Publishing: Paris, France, 2007; ISBN 9789264085442.
Hopfinger, A.J.; Patel, H.C. Application of Genetic Algorithms to the General QSAR Problem and to Guiding Molecular Diversity Experiments. In Genetic Algorithms in Molecular Modeling; Devillers, J., Ed.; Academic Press Limited: London, UK, 1996; pp. 131–157. ISBN 9780122138102.
Sukumar, N.; Prabhu, G.; Saha, P. Applications of Genetic Algorithms in QSAR/QSPR Modeling. In Applications of Metaheuristics in Process Engineering; Valadi, J., Siarry, P., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 315–324. ISBN 9783319065083.
Golbraikh, A.; Tropsha, A. Beware of q²! J. Mol. Graph. Model. 2002, 20, 269–276.
Roy, P.P.; Roy, K. On Some Aspects of Variable Selection for Partial Least Squares Regression Models. QSAR Comb. Sci. 2008, 27, 302–313.
Bernal, F.A.; Schmidt, T.J. A Comprehensive QSAR Study on Antileishmanial and Antitrypanosomal Cinnamate Ester Analogues. Molecules 2019, 24, 4358.
Clark, R.D.; Fox, P.C. Statistical Variation in Progressive Scrambling. J. Comput. Aided. Mol. Des. 2004, 18, 563–576.
Netzeva, T.I.; Worth, A.P.; Aldenberg, T.; Benigni, R.; Cronin, M.T.D.; Gramatica, P.; Jaworska, J.S.; Kahn, S.; Klopman, G.; Marchant, C.A.; et al. Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure-Activity Relationships—The Report and Recommendations of ECVAM Workshop 52. Altern. Lab. Anim. 2005, 33, 155–173.
Gadaleta, D.; Mangiatordi, G.F.; Catto, M.; Carotti, A.; Nicolotti, O. Applicability Domain for QSAR Models. Int. J. Quant. Struct. Relatsh. 2016, 1, 45–63.
Gramatica, P. Principles of QSAR Models Validation: Internal and External. QSAR Comb. Sci. 2007, 26, 694–701.
Gramatica, P. On the Development and Validation of QSAR Models. In Computational Toxicology. Methods in Molecular Biology; Reisfeld, B., Mayeno, A., Eds.; Humana Press: Totowa, NJ, USA, 2013; Volume 930, pp. 499–526. ISBN 9781627030588.
Fujita, T.; Winkler, D.A. Understanding the Roles of the “Two QSARs”. J. Chem. Inf. Model. 2016, 56, 269–274.
Artese, A.; Cross, S.; Costa, G.; Distinto, S.; Parrotta, L.; Alcaro, S.; Ortuso, F.; Cruciani, G. Molecular Interaction Fields in Drug Discovery: Recent Advances and Future Perspectives. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2013, 3, 594–613.
Chemical Computing Group (CCG)—Support and Training. Available online: https://www.chemcomp.com/Support.htm (accessed on 5 February 2019).
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Fawagreh, K.; Gaber, M.M.; Elyan, E. Random Forests: From Early Developments to Recent Advancements. Syst. Sci. Control Eng. Open Access J. 2014, 2, 602–609.
Nembrini, S.; König, I.R.; Wright, M.N. The Revival of the Gini Importance? Bioinformatics 2018, 34, 3711–3718.
Pedregosa, F.; Weiss, R.; Brucher, M.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Smola, A.J.; Schölkopf, B. A Tutorial on Support Vector Regression. Stat. Comput. 2004, 14, 199–222.
Kastenholz, M.A.; Pastor, M.; Cruciani, G.; Haaksma, E.E.J.; Fox, T. GRID/CPCA: A New Computational Tool To Design Selective Ligands. J. Med. Chem. 2000, 43, 3033–3044.
Pastor, M.; Cruciani, G.; Clementi, S. Smart Region Definition: A New Way to Improve the Predictive Ability and Interpretability of Three-Dimensional Quantitative Structure-Activity Relationships. J. Med. Chem. 1997, 40, 1455–1464.
Baroni, M.; Costantino, G.; Cruciani, G.; Riganelli, D.; Valigi, R.; Clementi, S. Generating Optimal Linear PLS Estimations (GOLPE): An Advanced Chemometric Tool for Handling 3D-QSAR Problems. Quant. Struct. Relatsh. 1993, 12, 9–20.
Wold, S.; Sjöström, M.; Eriksson, L. PLS-Regression, a Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
Abdi, H. Partial Least Squares Regression and Projection on Latent Structure Regression. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 97–106.
Gramatica, P.; Sangion, A. A Historical Excursus on the Statistical Validation Parameters for QSAR Models: A Clarification Concerning Metrics and Terminology. J. Chem. Inf. Model. 2016, 56, 1127–1131.
Roy, K.; Chakraborty, P.; Mitra, I.; Ojha, P.K.; Kar, S.; Das, R.N. Some Case Studies on Application of “Rm2” Metrics for Judging Quality of Quantitative Structure-Activity Relationship Predictions: Emphasis on Scaling of Response Data. J. Comput. Chem. 2013, 34, 1071–1082.
Roy, K.; Das, R.N.; Ambure, P.; Aher, R.B. Be Aware of Error Measures. Further Studies on Validation of Predictive QSAR Models. Chemom. Intell. Lab. Syst. 2016, 152, 18–33.
Shi, L.M.; Fang, H.; Tong, W.; Wu, J.; Perkins, R.; Blair, R.M.; Branham, W.S.; Dial, S.L.; Moland, C.L.; Sheehan, D.M. QSAR Models Using a Large Diverse Set of Estrogens. J. Chem. Inf. Comput. Sci. 2001, 41, 186–195.
Schüürmann, G.; Ebert, R.-U.; Chen, J.; Wang, B.; Kühne, R. External Validation and Prediction Employing the Predictive Squared Correlation Coefficient—Test Set Activity Mean vs Training Set Activity Mean. J. Chem. Inf. Model. 2008, 48, 2140–2145.
Consonni, V.; Ballabio, D.; Todeschini, R. Evaluation of Model Predictive Ability by External Validation Techniques. J. Chemom. 2010, 24, 194–201.
Chirico, N.; Gramatica, P. Real External Predictivity of QSAR Models: How to Evaluate It? Comparison of Different Validation Criteria and Proposal of Using the Concordance Correlation Coefficient. J. Chem. Inf. Model. 2011, 51, 2320–2335.
Eriksson, L.; Jaworska, J.; Worth, A.P.; Cronin, M.T.D.; McDowell, R.M.; Gramatica, P. Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs. Environ. Health Perspect. 2003, 111, 1361–1375.
Roy, K.; Kar, S.; Das, R.N. Statistical Methods in QSAR/QSPR. In A Primer on QSAR/QSPR Modeling; Springer: Cham, Switzerland, 2015; pp. 37–59. ISBN 9780128015056.
Roy, P.P.; Paul, S.; Mitra, I.; Roy, K. On Two Novel Parameters for Validation of Predictive QSAR Models. Molecules 2009, 14, 1660–1701.
Roy, K.; Mitra, I.; Kar, S.; Ojha, P.K.; Das, R.N.; Kabir, H. Comparative Studies on Some Metrics for External Validation of QSPR Models. J. Chem. Inf. Model. 2012, 52, 396–408.
Lin, L.I.-K. A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics 1989, 45, 255–268.

Figure 1. General structures of trans-2-phenyl-2,3-dihydrobenzofurans investigated in the present study. For all structures of the set of compounds under study, see Tables S1 and S2, Supplementary Materials.

Figure 2. Consensus scores F1 and F2 for all generated models, calculated according to [37]. Bars are colored by model type.

Figure 3. Applicability domain and interpretation of 3D-QSAR model 3D4. (A) Williams plot defining the applicability domain (AD) of the model. Horizontal dashed lines mark standardized residuals of 2σ and 3σ; the vertical dashed line marks the warning leverage h* (see text for interpretation). (B) Chemical structures of representative potent (compound 13) and non-potent (compound 30) antileishmanials. (C,D) MIF regions where steric interactions affect the activity positively (green) or negatively (yellow) (C), and regions where electrostatic interactions with positively (blue) or negatively (red) charged groups positively affect the activity (D). MIFs with a strong impact on activity according to model 3D4 (LV = 5) are plotted around the structures of compounds 13 (dark gray) and 30 (light gray).
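The Williams plot in panel (A) combines two applicability-domain diagnostics: the leverage h of each compound (how far it lies from the centroid of the training set in model space) on the x-axis, and its standardized prediction residual on the y-axis. The following is a minimal NumPy sketch of these quantities under stated assumptions, not the procedure actually used for model 3D4. It assumes the widely used warning leverage h* = 3(p + 1)/n, an intercept column in the model matrix, and residuals scaled by the residual standard error of the training fit (some software uses studentized residuals instead). For a PLS model such as 3D4, X would plausibly be the matrix of latent-variable scores, so that p equals the number of LVs; this too is an assumption. All function names are hypothetical and the data are synthetic.

```python
import numpy as np


def hat_leverages(X_train, X_query):
    """Leverages h_i = a_i^T (A^T A)^+ a_i of query rows w.r.t. the training matrix.

    A column of ones (intercept) is prepended, so training leverages sum to p + 1
    and their mean is (p + 1) / n.
    """
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_query = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(A_train.T @ A_train)
    # Diagonal of A_query @ xtx_inv @ A_query.T without forming the full matrix.
    return np.einsum("ij,jk,ik->i", A_query, xtx_inv, A_query)


def warning_leverage(n_train, p):
    """Assumed convention: h* = 3(p + 1) / n."""
    return 3.0 * (p + 1) / n_train


def residual_std_error(y_obs, y_pred, n_params):
    """sqrt(SSE / (n - n_params)) of the training fit."""
    res = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.sum(res ** 2) / (len(res) - n_params))


def standardize(y_obs, y_pred, s):
    """Residuals expressed in units of the training residual standard error s."""
    return (np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)) / s


def williams_flags(h, std_res, h_star, k=3.0):
    """Masks for compounds beyond h* (extrapolation) and beyond +/- k sigma (outliers)."""
    return h > h_star, np.abs(std_res) > k


# Synthetic illustration (hypothetical data, not the compounds of this study).
rng = np.random.default_rng(0)
n, p = 30, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.3, size=n)

# Ordinary least-squares fit with intercept, standing in for the real QSAR model.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_fit = A @ coef

h = hat_leverages(X, X)
h_star = warning_leverage(n, p)
s = residual_std_error(y, y_fit, p + 1)
r = standardize(y, y_fit, s)
outside_ad, outlier = williams_flags(h, r, h_star)
print(f"h* = {h_star:.2f}; outside AD: {outside_ad.sum()}; |r| > 3: {outlier.sum()}")
```

Plotted with h on the x-axis and r on the y-axis, compounds to the right of h* are predicted by extrapolation in model space, whereas compounds beyond the ±2σ or ±3σ lines are poorly predicted regardless of their leverage. Test-set compounds would be passed as `X_query` against the same training matrix.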


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Bernal, F.A.; Schmidt, T.J. A QSAR Study for Antileishmanial 2-Phenyl-2,3-dihydrobenzofurans. Molecules 2023, 28, 3399. https://doi.org/10.3390/molecules28083399

AMA Style

Bernal FA, Schmidt TJ. A QSAR Study for Antileishmanial 2-Phenyl-2,3-dihydrobenzofurans. Molecules. 2023; 28(8):3399. https://doi.org/10.3390/molecules28083399

Chicago/Turabian Style

Bernal, Freddy A., and Thomas J. Schmidt. 2023. "A QSAR Study for Antileishmanial 2-Phenyl-2,3-dihydrobenzofurans" Molecules 28, no. 8: 3399. https://doi.org/10.3390/molecules28083399
