Article

Identification of Pharmacophore Groups with Antimalarial Potential in Flavonoids by QSAR-Based Virtual Screening

by Adriana de Oliveira Fernandes 1,2, Valéria Vieira Moura Paixão 1, Yria Jaine Andrade Santos 1, Eduardo Borba Alves 1, Ricardo Pereira Rodrigues 3, Daniela Aparecida Chagas-Paula 4, Aurélia Santos Faraoni 1, Rosana Casoti 5, Marcus Vinicius de Aragão Batista 6, Marcel Bermudez 2, Silvio Santana Dolabella 7,* and Tiago Branquinho Oliveira 1,*
1 Department of Pharmacy, Centre for Biological Health Sciences, Federal University of Sergipe, São Cristóvão 49100-000, SE, Brazil
2 Institute for Pharmaceutical and Medicinal Chemistry, University of Münster, 48149 Münster, NRW, Germany
3 Faculty of Pharmaceutical Sciences, State University of Campinas, Campinas 13083-871, SP, Brazil
4 Chemistry Institute, Federal University of Alfenas, Alfenas 37130-000, MG, Brazil
5 Department of Antibiotics, Federal University of Pernambuco, Recife 50670-901, PE, Brazil
6 Department of Biology, Centre for Biological Health Sciences, Federal University of Sergipe, São Cristóvão 49100-000, SE, Brazil
7 Department of Morphology, Centre for Biological Health Sciences, Federal University of Sergipe, São Cristóvão 49100-000, SE, Brazil
* Authors to whom correspondence should be addressed.
Drugs Drug Candidates 2025, 4(3), 33; https://doi.org/10.3390/ddc4030033
Submission received: 26 May 2025 / Revised: 27 June 2025 / Accepted: 1 July 2025 / Published: 4 July 2025
(This article belongs to the Section In Silico Approaches in Drug Discovery)

Abstract

Background/Objectives: Severe malaria, mainly caused by Plasmodium falciparum, remains a significant therapeutic challenge due to increasing drug resistance and adverse effects. Flavonoids, known for their wide range of bioactivities, offer a promising route for antimalarial drug discovery. The aim of this study was to elucidate key structural features associated with antimalarial activity in flavonoids and to develop accurate, interpretable predictive models. Methods: Curated databases of flavonoid structures and their activity against P. falciparum strains and enzymes were constructed. Molecular fingerprinting and decision tree analyses were used to identify key pharmacophoric groups. Subsequently, molecular descriptors were generated and reduced to build multiple classification and regression models. Results: These models demonstrated high predictive accuracy, with test set accuracies ranging from 92.85% to 100%, and R2 values from 0.64 to 0.97. Virtual screening identified novel flavonoid candidates with potential inhibitory activity. These were further evaluated using molecular docking and molecular dynamics simulations to assess binding affinity and stability with Plasmodium proteins (FabG, FabZ, and FabI). The predicted active ligands exhibited stable pharmacophore interactions with key protein residues, providing insights into binding mechanisms. Conclusions: This study provides highly predictive models for antimalarial flavonoids and enhances the understanding of structure–activity relationships, offering a strong foundation for further experimental validation.

1. Introduction

Malaria is a vector-borne disease caused by protozoan parasites of the genus Plasmodium, particularly Plasmodium falciparum, Plasmodium vivax, Plasmodium malariae, Plasmodium ovale, and Plasmodium knowlesi [1]. These parasites invade erythrocytes and typically induce flu-like symptoms, including fever, headache, vomiting, and arthralgia [2,3]. The World Malaria Report (2023) estimated that approximately 263 million cases occurred globally, with 89.7% occurring in Africa [4]. In Brazil, malaria is predominantly observed in the Amazon region and among rural or indigenous populations [5]. Owing to its strong association with impoverished tropical areas and insufficient sanitation infrastructure, malaria is classified as a neglected tropical disease [4].
Malaria has two distinct clinical forms: uncomplicated and severe. The uncomplicated form, which may be caused by any Plasmodium species, is typically characterised by milder symptoms. In contrast, severe malaria, predominantly associated with P. falciparum, manifests with more clinical features due to elevated parasitemia levels [6]. Standard treatment usually involves intravenous or intramuscular administration of antimalarial drugs. Artesunate is the preferred option in severe cases due to its favourable safety profile, although prolonged administration may suppress immune function [7]. Other therapeutic alternatives, such as artemether and quinine dihydrochloride, present limitations related to absorption and therapeutic efficacy [7]. Chloroquine, formerly a first-line antimalarial, has been largely discontinued due to widespread resistance and adverse effects [7,8].
Several key enzymes involved in the fatty-acid biosynthesis II (FAS-II) pathway, such as β-ketoacyl-ACP-reductase (FabG), β-hydroxyacyl-ACP-dehydratase (FabZ), and enoyl-ACP-reductase (FabI), constitute important molecular targets for antimalarial therapy [9,10]. Notably, these enzymes are absent in human hosts, enhancing their appeal as selective drug targets. The de novo pyrimidine biosynthesis pathway, and in particular the enzyme dihydroorotate dehydrogenase (DHODH), represents an additional target due to its critical role in Plasmodium metabolism [11]. Many antimalarial agents are derived from plant sources, such as quinine and artemisinin, highlighting the significant pharmacological potential of natural products [12,13,14]. Among these, flavonoids, a large class of polyphenolic compounds, have garnered attention for their diverse biological properties, including antiparasitic activity [14,15,16,17,18,19,20,21,22].
Flavonoids share a characteristic C6-C3-C6 skeleton consisting of two phenyl rings (A and B) connected by a three-carbon bridge that typically closes into a heterocyclic third ring (C). Their ability to modulate the activity of various enzymes, such as xanthine oxidase (XO), cyclooxygenase (COX), lipoxygenase (LOX), and phosphoinositide 3-kinase, makes them strong candidates for therapeutic applications, including antimalarial therapy [23,24].
Cheminformatic methodologies, particularly Structure–Activity Relationship (SAR) and Quantitative Structure–Activity Relationship (QSAR) modelling, are increasingly employed in rational drug design [25,26]. These approaches aim to correlate molecular descriptors with biological activity, facilitating the identification and prediction of novel bioactive compounds. The integration of QSAR with pharmacophore modelling, aimed at elucidating essential steric and electronic features required for target interaction, further improves the reliability of predictive models [27,28,29,30,31]. Recent advances in artificial intelligence (AI), especially in machine learning (ML), have proven highly beneficial and have become a well-established standard in computer-aided drug design [32,33,34,35,36]. Several studies have employed ML-based QSAR models to generate novel, accurate, and reliable antimalarial hits [37,38,39]. However, despite being a well-known class of polyphenols, flavonoids remain underexplored as potential inhibitors—particularly in the context of targeting FAS-II enzymes. In this context, the present study employed SAR, QSAR, molecular docking, and molecular dynamics simulations to investigate flavonoids as potential antimalarial agents against FAS-II enzymes, focusing on identifying important pharmacophoric features and constructing robust ML models for virtual screening.

2. Results and Discussion

Following the data mining to build the flavonoid databases, a total of 1945 molecular descriptors were calculated, which were reduced to only 4 for the FlavStr database (focusing on flavonoids targeting different P. falciparum strains for inhibition) and 3 for each enzyme model after selection with the RFc (Random Forest classification) and filtering by Pearson correlation. For the FlavStr database, the selected classification descriptors were ATS8m, MDEO-22, maxHdsCH, and minsCH3, while the regression descriptors were smr_VSA10, minsCH3, MDEO-22, and GGI8. Each enzyme model shared the same descriptors with both classification and regression algorithms. The FabG model retained the descriptors AATS8m, VE3_Dzv, and FNSA-1. The FabZ model retained the descriptors ATS3m, nHBint7, and SIC3. Lastly, FabI had the descriptors VE3_Dzv, maxHBint9, and TDB6u. Table 1 shows the correlations between the selected descriptors, evidencing that all values are below the 0.65 threshold. The highest correlation is observed for the descriptors ATS8m and MDEO-22 from FlavStr classification, with a value of 0.61, and the lowest is observed for ATS8m and maxHdsCH, with −0.2. For FlavStr regression, the highest correlation occurred between MDEO-22 and smr_VSA10 with 0.38, while the lowest occurred between GGI8 and minsCH3 with −0.08. For FabG, the highest correlation occurred between AATS8m and VE3_Dzv with 0.48, while the lowest occurred between FNSA-1 and VE3_Dzv with −0.06. For FabZ, the highest was seen between ATS3m and nHBint7 with 0.48, while the lowest was between ATS3m and SIC3 with −0.58. As for FabI, the highest happened between VE3_Dzv and maxHBint9 with 0.46, and the lowest between maxHBint9 and TDB6u with −0.53.
Most of the selected descriptors were two-dimensional (2D), except for FNSA-1 and TDB6u, and predominantly belonged to the topological descriptors class, which considers the connectivity between atoms. The descriptors that belong to this class included ATS8m and ATS3m (Broto–Moreau autocorrelation, weighted by mass), MDEO-22 (molecular distance between all secondary oxygens), GGI8 (topological charge index), AATS8m (average Broto–Moreau autocorrelation, weighted by mass), VE3_Dzv (Barysz matrix descriptor), and TDB6u (3D topological distance-based autocorrelation descriptor). In another similar class, electro-topological states, the descriptors maxHdsCH, minsCH3, nHBint7, and maxHBint9 were found. Lastly, the smr_VSA10 descriptor indicates molecular refractivity, the FNSA-1 is a charged partial surface area descriptor, and SIC3 is defined by a structural information content index.
This study aimed to evaluate the performance of both classification (c) and regression (r) models, employing linear and non-linear algorithms, to determine the most effective approach for this problem type. The scientific literature suggests a preference for classification models in QSAR. Studies that have employed regression models for this task tend to use PLS or Pace regression [30,31]. Previous investigations evaluating the antimalarial potential of sesquiterpene lactones concluded that establishing a robust relationship between subtle structural variations and their activity, which would ideally facilitate compound optimisation, was not feasible [30]. This observation may help to explain the findings regarding the PLS model, which, despite being considered a valid predictive model, did not perform as well for the FlavStr test set (Ts) and exhibited a high RMSE value. Nevertheless, PLS appears to perform better on smaller datasets, as in the study of Yousefinejad et al. [40], which used 42 1H-imidazol-2-yl-pyrimidine-4,6-diamine derivatives with antimalarial potential and reported above-average R2 values. Even for studies seeking the SAR of cyclopeptide alkaloids against P. falciparum based on PLS, a smaller library generates better results, which helps to explain why our enzyme models fared much better than the FlavStr model [41].
The models accepted based on the predefined inclusion criteria were as follows: multilayer perceptron classification (MLPc) and support vector machine regression (SVMr) for FlavStr; RFc and MLPr for FabG; SVMc and MLPr for FabZ; and k-nearest neighbours (KNN) and PLS for FabI. Their respective results are presented in Table 2 and Table 3. The complete results for all developed models are given in Tables S1 and S2, the ROC curves for each set in Figures S1–S60, and the corresponding scatter plots in Figures S61–S90.
The FlavDB comprised 310 flavonoid structures. Extensive searches in online databases and scientific literature, however, yielded no information regarding their antimalarial activity. Following the application of the applicability domain (APD) filter for each database, all 310 structures were considered reliable for both the FlavStr classification and regression models. For FabG, these numbers were 26 and 13, respectively. FabZ had 48 reliable structures for the classification model and 55 for the regression model. Lastly, the number of reliable structures for FabI was 28 and 29 for the respective models.
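A similarity-based applicability domain check of the kind mentioned above can be sketched as follows. This is an illustration only: it uses one commonly described definition of the threshold (mean plus Z times the standard deviation of the pairwise Euclidean distances that fall below the overall mean), and whether the Enalos node was configured this way in this study is an assumption. The function names, the value Z = 0.5, and the toy descriptor vectors are all hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def apd_threshold(reference, z=0.5):
    """Threshold = mean + z * std of the pairwise distances below the overall mean.

    `z` is an empirical cut-off; 0.5 is an assumed default, not a value from the study.
    """
    distances = [euclidean(a, b) for a, b in combinations(reference, 2)]
    overall_mean = sum(distances) / len(distances)
    close = [d for d in distances if d < overall_mean] or distances  # guard: all equal
    mean_close = sum(close) / len(close)
    sigma = math.sqrt(sum((d - mean_close) ** 2 for d in close) / len(close))
    return mean_close + z * sigma

def in_domain(query, reference, threshold):
    """A query is 'reliable' if its nearest reference compound is within the threshold."""
    return min(euclidean(query, r) for r in reference) <= threshold

# Toy reference set of descriptors already scaled to the range 0 to 1
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = apd_threshold(reference)
inside = in_domain((0.5, 0.5), reference, threshold)
outside = in_domain((3.0, 3.0), reference, threshold)
```

In this toy case the query at the centre of the reference compounds is accepted and the distant one is rejected, which mirrors how screened structures are either retained as reliable or discarded before prediction.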
The developed models were utilised to predict the activity of molecules that passed their respective applicability domain (APD) filters, thereby classifying them as either active or inactive based on their structures. The numeric activity intervals for both classes were defined based on the initial box-plot threshold, and the percentages of activities classified by the models are presented in Figure 1.
As a result of the classification algorithms, it was observed that the MLPc model for the FlavStr database classified all structures as active. In contrast, the RFc model for FabG, the SVMc model for FabZ, and the KNN model for FabI classified 0, 21, and 26 structures as active, and 26, 27, and 2 as inactive, respectively. Regarding the regression algorithms, the SVMr model for the FlavStr database considered 175 structures active and 135 inactive. The MLPr models for FabG and FabZ resulted in only 1 active structure for each model; therefore, 12 and 54 structures were considered inactive, respectively. The PLS model for FabI resulted in 20 active and 9 inactive structures.
The use of artificial neural networks (ANNs) and other supervised learning algorithms for QSAR predictions has been on the rise, particularly feed-forward networks, as exemplified by the study of Yousefinejad et al. [40]. The authors developed a simple ANN in MATLAB v. 7.6 to predict the activity of imidazole piperazines against P. falciparum strains W2 and 3D7. The architecture employed was based on a single hidden layer with two neurons, both utilising a sigmoid function, and only one output neuron with a linear function. Unlike the present study, the mean squared error (MSE) was used as the primary criterion to assess the quality of the model’s performance. As a result, the authors state that non-linear methods perform significantly better than linear ones, and that their developed ANN demonstrates good predictive ability. Despite the slight difference in the architecture of our MLP, this corroborates our findings, as superior performance was also detected with a lower number of hidden layers and the use of sigmoid activation functions.
Other studies employing MLPs have demonstrated benefits from using fewer hidden layers, although often with more hidden neurons to accommodate a higher number of variables [41,42]. Optimisation of MLP architecture has shown that the number of hidden layers and neurons significantly impacts performance, with increased hidden layers reducing error dispersion, though an excessive number of neurons per layer can lead to increased error over time [43]. Ultimately, optimal network architecture is problem-dependent, aligning with the principle of parsimony where network complexity should match problem complexity [44]. It is also pertinent that the use of sigmoid or hyperbolic tangent functions in hidden layers is commonly associated with vanishing gradient problems in multi-layered networks [45]. This phenomenon, driven by proportional weight modification during back-propagation, can impede weight updates as gradients become infinitesimally small. However, this issue is generally uncommon in smaller networks, such as the one developed in this study, being more frequently observed in deep learning architectures [46].
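The vanishing gradient argument can be made concrete with a short calculation. The derivative of the sigmoid function never exceeds 0.25, so the activation-derivative factor that back-propagation multiplies through n sigmoid layers is bounded by 0.25 to the power n. The sketch below only illustrates this bound; it ignores the weight terms that also enter the gradient, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid; its maximum (0.25) is reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def max_gradient_factor(n_hidden_layers):
    """Upper bound on the product of sigmoid derivatives across the hidden layers."""
    return sigmoid_derivative(0.0) ** n_hidden_layers

# Bound for shallow versus deep networks (weights are deliberately left out)
factors = {n: max_gradient_factor(n) for n in (1, 2, 5, 10)}
```

With one hidden layer the bound is 0.25, whereas with ten layers it falls below one millionth, which is consistent with the observation that the problem mainly affects deep architectures.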
Previous work has explored generalised antimalarial activity prediction using various classification algorithms (e.g., RF, SVM, KNN, XGBoost), with SVM and XGBoost achieving around 85% accuracy against P. falciparum [47]. However, the use of highly diverse molecular databases and a large number of descriptors in such studies has been identified as a potential issue that could compromise QSAR model quality [48]. To mitigate this, a follow-up study by the same authors developed more robust SVM and GRNN models with a reduced set of only 15 descriptors, yielding higher accuracies of 87.3% (SVM) and 88.9% (GRNN) for the test set.
Considering that previous authors defined their models as robust in terms of accuracy, and taking into account the size and variability of our datasets, it is possible to infer that all five of our classification models are also robust, as they exhibited accuracy values around or even exceeding those reported. Furthermore, our models are specific to flavonoids, which makes them more sensitive by avoiding overgeneralisation, while also being able to explore the chemical universe of the database with only 14 descriptors split into eight different models.
Regarding the descriptors, our findings revealed a remarkable presence of surface and topological descriptors, which encompass electro-topological states and molecular refractivity. As previously mentioned, all descriptors except for two were two-dimensional (2D), meaning they primarily consider the connectivity between atoms based on the presence and nature of chemical bonds. Consequently, molecules are perceived as an inter- and intra-atomic interaction graph, rendering them highly sensitive to structural features such as size, shape, symmetry, chains, cycles, and polarisability, among others [49,50]. For the description of flavonoids, topological indices appear to be the most frequent, probably due to their emphasis on lipophilicity and the distance between the molecule’s oxygens [51,52,53]. In contrast, 3D pharmacophoric models for prenylated flavonoids have highlighted the importance of descriptors such as shape (including flexibility and globularity), hydrophilic/hydrophobic volume, and surface area. Notably, flexibility, a 3D descriptor, was identified as crucial for bioactivity, enhancing receptor binding, particularly with membrane proteins [54].
The predominant use of two-dimensional (2D) descriptors could be advantageous for model simplification. In general, three-dimensional (3D) descriptors utilise more complex information and may increase a model’s uncertainty, potentially leading to overly speculative predictions, while also being more computationally demanding [55]. In this study, not only topological descriptors—the most common among the 2D descriptors—were selected, but also electro-topological state indices, which demonstrate the probability of interaction between a certain atom in a molecule and other atoms in other molecules. This helps to elucidate the extent to which an atom is exposed on the molecule’s surface or trapped within its structure [56]. The minsCH3 descriptor itself, selected for the regression models, could be important to corroborate our SAR findings by showing that methylations can influence receptor binding.
The training (Tr) dataset partitioning for all models was more restrictive than usual (around 80%), which may have contributed to lower accuracy and R2 results for both training and test (Ts) sets, particularly for regression algorithms [57]. This aligns with studies indicating that while various partitioning ratios can yield good results, the composition and splitting method of datasets are more significant than just the ratio [58,59]. The enzyme databases, being considerably smaller than FlavStr, could have resulted in higher accuracy and R2 values, but at the cost of reduced chemical representability, potentially diminishing model relevance. Furthermore, the inability to retrieve information regarding the protozoan’s life cycle during database construction complicated the interpretation of molecular mechanisms of action, as substances can exhibit activity specific to certain membranes or organelles [60]. It is also recognised that combining results from diverse assays, even without large, consistent datasets, can introduce noise into machine learning models [61].
Docking calculations for FabZ were performed with the 21 structures predicted as active by the SVMc model, using 10 runs per ligand. The same procedure was applied to the 26 structures predicted as active by the KNN model for FabI (Table S3). The docking protocol was validated by redocking, and only RMSD values below 2.0 Å were considered adequate.
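The redocking criterion can be illustrated with a minimal RMSD calculation. The sketch assumes that both poses list the same heavy atoms in the same order, that they share the receptor's coordinate frame (so no superposition is applied), and that coordinates are in ångströms; it does not correct for molecular symmetry. The coordinates are invented for illustration and the function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered sets of 3D coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("Both poses must contain the same number of atoms")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Invented coordinates: each atom of the redocked pose is displaced by 0.5
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked = [(0.5, 0.0, 0.0), (1.5, 0.5, 0.0), (1.5, 1.5, 0.5)]

value = rmsd(crystal, redocked)
pose_ok = value < 2.0  # acceptance cut-off stated in the text
```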
The ligands exhibiting the highest docking scores for FabZ were 6′-Hydroxyangolensin, Angolensin, Butein, 6′-Hydroxy-O-desmethylangolensin, and 4′-O-Methylequol. For FabI, these ligands were O-Desmethylangolensin, Dihydrodaidzein 7-O-glucuronide, Cyanidin, 3′,4′,5,7-Tetrahydroxyisoflavone, and Equol 4′-O-glucuronide (Figure 2). The same structures were used for the MD simulations.
For the SVMc model concerning FabZ, all five top-scoring ligands had the maximum predicted probability of being active (1.0). Conversely, for the KNN model concerning FabI, the predicted probabilities were 0.59, 0.59, 0.58, 0.62, and 0.58, respectively. The MLPr model for FabZ predicted a pIC50 of −2.19 for each of those structures, while for FabI’s PLS model, the values were −1.0, 0.54, −0.33, 0.70, and 0.30, respectively. Most of the compounds with the highest docking scores belong to the class of isoflavones and are structurally similar to either Angolensin, Cyanidin, or Equol.
Angolensin, a phytoestrogen found in Entandrophragma angolense, possesses a history of traditional use against malaria [62]. Butein, isolated from Rhus verniciflua, exhibits diverse biological activities, including significant anti-inflammatory and antioxidant properties [63]. Its antimalarial activity has been specifically assessed through binding to Falcipain-2, demonstrating superior intermolecular interactions and inhibitory effects compared to related chalcones [64]. While Dihydrodaidzein 7-O-glucuronide is less directly studied for malaria, Daidzein demonstrated promise in virtual screening against P. vivax’s Duffy binding protein, remaining stable in molecular dynamics simulations [65]. Cyanidin, from Peganum harmala, also demonstrated antiplasmodial activity against P. falciparum, with an IC50 value of 24 mg/L in extract form [66].
In the SAR analysis of side chains that could influence flavonoid activity, active structures predominantly belonged to the flavanol, isoflavone, flavonol, and flavanonol subclasses, with a notable presence of hydroxylations and methoxylations at positions 3′ and 4′, as well as at positions 3, 5, and 7 (Figure 3A–D). The gallate group and other aromatic side chains, particularly at position 3 or 7, appeared crucial for bioactivity, as these features were primary nodes in our FabG model and appeared in other branches of the FlavStr model. This aligns with findings from studies on Sudanese flora [67] and Mundulea sericea [68], which demonstrated antiplasmodial activity in plants rich in similar flavonoids, including Quercetin, Catechin, and Epicatechin-gallate.
Methoxylations in flavonoids exhibit a mixed correlation with antiplasmodial activity in the heme detoxification pathway; some Quercetin derivatives are active regardless of their methoxylation degree, whilst other compounds with lower methoxylation also exhibit high inhibition [69]. Experimental evidence supports epigallocatechin-gallate as a potent inhibitor of P. falciparum’s heat shock proteins [69]. Notably, this compound, present in our database, possesses a gallate group at position 3 and was classified as active by our decision tree, while epigallocatechin, lacking this group, was inactive [9]. Most substructures in our databases, being primarily flavonoids and natural products, contain Pan-Assay Interference Compounds (PAINS) motifs [70]. However, our virtual screening appears to have largely filtered these out, and the predicted lateral chains largely avoided the worst PAINS [71]. In conclusion, the gallate group likely contributes to antiplasmodial activity, though further chemical, kinetic, and biological tests are necessary for definitive confirmation.
The MD simulation trajectories are presented in Figures S91–S100. Every triplicate had at least one simulation considered stable, with most ligands having two out of three stable simulations and some even having three out of three, namely 3′,4′,5,7-Tetrahydroxyisoflavone, Cyanidin, Dihydrodaidzein 7-O-Glucuronide, and O-Desmethylangolensin. Stability was defined as a ligand remaining within the binding site with minimal movement and/or pose changes throughout the simulated time. In some cases, such as with O-Desmethylangolensin and 3′,4′,5,7-Tetrahydroxyisoflavone, the ligands showed a tendency to fold over their own axis, similar to a V-shape, in the early stages of the simulation, maintaining that position with much less movement until the final frame. This suggests that the initial docked pose might not have been ideal, and the ligands subsequently accommodated themselves in a more favourable pose throughout the simulation, which implies that a higher docking score does not always correlate with a better result. The interaction analyses conducted with LigandScout for both proteins and all ligands can be seen in Figure S101, while Figure 4 shows the four most stable ligands.
The ligand 6′-Hydroxyangolensin formed mainly hydrophobic (H) interactions with residues Phe102, Val143, and Trp179, hydrogen bond acceptor (HBA) interactions with Phe171, and hydrogen bond donor (HBD) interactions with Glu147 and Ala150 (Figure 4A). The ligand Angolensin formed H interactions with Ile139, Val143, Ala150, Phe169, and Leu170, HBA with Phe171, and HBD with Glu147 (Figure 4B). The ligand Butein formed H interactions with Ile139, Ala150, and Leu170, and HBD with Glu147 and Phe171. The ligand 6′-Hydroxy-O-desmethylangolensin formed H interactions with Ile139, Phe171, and Trp179, HBA with Gln145 and Phe171, HBD with His98 and Glu147, and also an aromatic ring interaction (AR) with Trp179. The ligand 4′-O-Methylequol formed H interactions with Phe134 and Ile139 and one HBA with Gln145. The ligand Dihydrodaidzein 7-O-Glucuronide formed H interactions with Tyr111, Ala217, Leu265, Tyr267, Met281, Ala319, Ile323, and Phe368, HBA with Ala217 and Leu315, and HBD with Ala322 (Figure 4C). The ligand O-Desmethylangolensin formed H interactions with Tyr111, Ala217, Leu265, Ala269, Met281, Ala312, Ala320, and Ile323, HBA with Tyr277, and HBD with Ser215 and Tyr277 (Figure 4D). The ligand Cyanidin formed H interactions with Tyr111 and Ala217, HBA with Asn324, and HBD with Ser215 and Asn324. The ligand 3′,4′,5,7-Tetrahydroxyisoflavone formed H interactions with Ala217, Met281, Ala319, Ala320, and Ile323, HBA with Tyr111, Tyr277, and Leu315, and HBD with Ser215, Leu215, and Ser217. Finally, the ligand Equol 4′-O-Glucuronide formed H interactions with Tyr111, Ala217, Leu265, Met281, Ala319, Ala322, and Ile323, HBA with Arg318, and HBD with Thr266 and Ala312.
Overall, some consistency in the interactions between the ligands and residues was observed, indicating which features could be important for binding and, consequently, for the inhibition of the proteins. Figure 5 shows the aligned pharmacophore axis for all five ligands in each binding pocket, evidencing a curved, hydrophobic-dense core with occasional aromatic stacking interaction, surrounded at the edges by HBA and HBD interactions, which points to regions rich in charged or polar residues. This fits well with flavonoids, and in particular with the proposed candidates.

3. Materials and Methods

3.1. Database Building

The database was constructed by mining data from two different bioactivity reviews, comprising 81 flavonoid structures. These structures are distributed in four sub-databases: FlavStr (which focuses on flavonoids with different P. falciparum strains as inhibition targets), FabG, FabZ, and FabI (which are important proteins for inhibition). FlavStr contains 80 structures, while the protein databases have a total of 22, with 13 of them present for both FabG and FabI, and 16 of them for FabZ. The 80 collected structures from the FlavStr database are classified into flavones (12.50%), flavonols (36.25%), flavanones (12.50%), flavanols (12.50%), isoflavones (8.75%), isoflavans (6.25%), biflavonoids (3.75%), and flavanonols (7.50%). The protein databases comprised only five subclasses among all three of them: flavanols (18.18%), flavanones (9.09%), flavones (13.63%), flavonols (45.4%), and isoflavones (13.63%) (Table S4).
From previous studies [9,21], the following information was extracted: flavonoid names, their Simplified Molecular Input Line Entry System (SMILES) notation, and their biological activity as half-maximal inhibitory concentration (IC50) in µM (subsequently converted to its negative logarithm, pIC50), along with their 2D structure drawn with MarvinSketch v. 19.22 [72]. The pIC50 values were used in a simple box plot calculation to separate them into two classes of activity, “active” and “inactive” [73], a task performed with the KNIME v. 4.0.2 [74] platform.
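The conversion and labelling step can be sketched as follows. This is a minimal illustration, not the KNIME workflow used in the study: it assumes the logarithm is taken directly on the µM value, and the cut-off is supplied by hand rather than derived from a box plot. The function names and the example IC50 values are hypothetical.

```python
import math

def pic50_from_micromolar(ic50_um):
    """Return -log10 of an IC50 given in µM (assumed convention)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um)

def label_activity(pic50, cutoff):
    """Label a compound 'active' when its pIC50 is at or above the cut-off."""
    return "active" if pic50 >= cutoff else "inactive"

# Illustrative IC50 values in µM (not taken from the databases)
examples = [0.2, 1.0, 50.0]
pic50_values = [pic50_from_micromolar(v) for v in examples]

# The cut-off is passed in explicitly; -0.17 is used here purely as an example value
labels = [label_activity(p, cutoff=-0.17) for p in pic50_values]
```

Under this convention an IC50 of 1 µM maps to a pIC50 of 0, and weaker inhibitors (larger IC50) give negative values.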
The FlavStr database contains chemical data from 13 strains of P. falciparum, divided into chloroquine-sensitive (3D7, D6, FcB1, HB3, NF54, and PoW) and chloroquine-resistant (7G8, Dd2, F32, FcM29, FCR-3, K1, W2) strains [75,76]. The frequency of the obtained strains, in percentages, was distributed as follows: 7.50% for 3D7, 4.10% for 7G8, 9.50% for D6, 9.50% for Dd2, 2.70% for F32, 4.10% for FcB1, 2.70% for PoW, and 10.20% for W2, totalling 57.10% frequency for chloroquine-sensitive strains and 42.80% for chloroquine-resistant strains. Regarding the protein databases, the quantification was 28.30% for FabG, 28.30% for FabI, and 34.80% for FabZ.
The pIC50 values for the FlavStr database ranged from −2.78 to 2.77, with a mean of −0.32 and a standard deviation of 1.34. For the FabG model, the pIC50 values ranged from −2.00 to 0.52, with a mean of −0.64 and a standard deviation of 0.72. For FabZ, the maximum, minimum, mean, and standard deviation values were 0.40, −1.60, −0.63, and 0.74, respectively. As for FabI, the aforementioned values were 0.70, −1.70, −0.25, and 0.78, respectively. Through the box plot calculation, two activity classes were created: “active” (ranging from −1.06 to 2.77 for FlavStr, −0.50 to 0.52 for FabG, −0.76 to 0.40 for FabZ, and −0.17 to 0.70 for FabI); and “inactive” (ranging from −2.18 to −1.07 for FlavStr, −2.00 to −0.6 for FabG, −1.60 to −0.77 for FabZ, and −1.70 to −0.18 for FabI).

3.2. Descriptors Calculation and Selection

The three-dimensional (3D) structure conformations were generated by OpenBabel v. 2.3.1 and optimised with MOPAC v. 2016 using the PM7 method [77]. With this information, all available descriptors from the RDKit v. 2021.03.1 package [78] and PaDEL v. 2.21 [79] software were calculated. Both software packages provide two-dimensional (2D) and 3D descriptors, spanning the constitutional, topological, geometric, electronic, quantum, and physicochemical classes. The descriptors were normalised to the range 0 to 1 by linear transformation, and descriptors with zero variance were then excluded. Following this, an RFc algorithm was used to select and reduce the number of descriptors, retaining only the most informative ones and avoiding data redundancy. Lastly, to verify the selection quality, the descriptors were submitted to a Pearson’s correlation filter, with a cut-off value of 0.65 [80].
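The normalisation, zero-variance, and correlation steps can be sketched in plain Python as shown below. This is an illustrative simplification rather than the workflow used in the study: the importance ranking that a Random Forest would supply is assumed to be given as an ordered list of names, and the descriptor names and values are invented.

```python
import math

def min_max_scale(column):
    """Linearly rescale a descriptor column to the range 0 to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

def variance(column):
    """Population variance of a column."""
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)

def pearson(x, y):
    """Pearson correlation coefficient between two non-constant columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def filter_descriptors(table, ranked_names, cutoff=0.65):
    """Keep descriptors in ranked order, skipping constant or highly correlated ones."""
    scaled = {name: min_max_scale(table[name]) for name in ranked_names}
    kept = []
    for name in ranked_names:
        if variance(scaled[name]) == 0.0:
            continue  # zero-variance descriptor carries no information
        if all(abs(pearson(scaled[name], scaled[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

# Invented descriptor table; the ranking stands in for Random Forest importances
table = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly collinear with d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
    "d4": [7.0, 7.0, 7.0, 7.0, 7.0],   # constant
}
selected = filter_descriptors(table, ranked_names=["d1", "d2", "d3", "d4"])
```

Processing descriptors in order of importance means that, of two correlated descriptors, the more informative one is retained.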

3.3. Activity Prediction Model

To build prediction models for the compounds in both databases, several supervised learning algorithms were applied for classification (c) and regression (r). The classification analysis used the following algorithms: decision tree (DTc), RFc, SVMc, k-nearest neighbours (KNN), and MLPc with backpropagation learning [40,41]. The regression analysis used the same algorithms, except that KNN was replaced by partial least squares (PLS) [31]. The algorithms were tuned by evaluating their parameters under cross-validation (CV): initially, the default Boolean and string parameters were tested and, when necessary, each parameter was adjusted empirically to improve the accuracy (classification models) or R²CV (regression models). For the numeric parameters, a parameter optimisation loop node was used to identify the values giving the highest accuracy and R²CV. Details of the parameters used to build and tune the models can be found in Tables S5–S46. All models used a 66% partition for the training sets (Tr), while the CV sets were partitioned by the leave-one-out (LOO) method, with the same seed (1562765658145) throughout. The main difference was the type of partitioning, which was stratified for the classification models, random for the FlavStr regression models, and linear for the enzyme regression models. All of these algorithms were built and evaluated using the KNIME software. The FlavStr and enzyme databases, separated into Tr and test sets (Ts), and their selected descriptors can be seen in Table S4.
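As an illustration of the partitioning scheme described above, the following sketch performs a stratified 66% split and then generates leave-one-out folds from the training indices. It is a plain-Python approximation, not the KNIME workflow: the function names are hypothetical, and although the same seed value is reused, Python's random generator will not reproduce the partitions produced by KNIME.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.66, seed=1562765658145):
    """Split compound indices into Tr and Ts, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


def leave_one_out(indices):
    """Yield (fit_indices, held_out_index) pairs, one per training compound."""
    for held_out in indices:
        yield [i for i in indices if i != held_out], held_out


# Hypothetical class labels for 100 compounds.
labels = ["active"] * 60 + ["inactive"] * 40
train, test = stratified_split(labels)
folds = list(leave_one_out(train))
```

Each leave-one-out fold fits on all but one training compound and predicts the one left out, so the number of folds equals the size of the training set.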
As evaluation metrics for the classification algorithms, overall accuracy (A), error index (E), Cohen’s Kappa (K), and receiver operating characteristic (ROC) curve values were taken into account, while the regression algorithms were evaluated based on their coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE) [81,82,83]. The selected descriptors for each type of model were submitted to an applicability domain (APD) assessment with the Enalos v. 2021 package [84], where the domains were verified by leverage (extent of extrapolation) and similarity (Euclidean distances), in order to define the chemical space covered by the models [85]. The calculations were based on the Ts sets and were used in the subsequent virtual screening assessment, where the algorithm was chosen based on the domain with the highest reliability. A confidence level of 100% for at least one domain was therefore required.
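The evaluation metrics listed above (apart from ROC values) can be written out explicitly. The sketch below assumes that the error index is the complement of accuracy (E = 1 − A) and that R² is computed as 1 − SS_res/SS_tot; both are common conventions, but the text does not confirm which definitions the software used. The example data are hypothetical.

```python
import math


def classification_metrics(actual, predicted):
    """Overall accuracy (A), error index (E), and Cohen's kappa (K)."""
    n = len(actual)
    accuracy = sum(a == p for a, p in zip(actual, predicted)) / n
    classes = set(actual) | set(predicted)
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(
        (actual.count(c) / n) * (predicted.count(c) / n) for c in classes
    )
    # Kappa is undefined when chance agreement is already perfect.
    kappa = float("nan") if expected == 1 else (accuracy - expected) / (1 - expected)
    return {"A": accuracy, "E": 1 - accuracy, "K": kappa}


def regression_metrics(observed, predicted):
    """Coefficient of determination (R2), MAE, and RMSE."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Hypothetical class predictions and pIC50 predictions.
actual = ["active"] * 4 + ["inactive"] * 4
predicted = ["active"] * 3 + ["inactive"] * 4 + ["active"]
cls = classification_metrics(actual, predicted)
reg = regression_metrics([-1.0, 0.0, 1.0, 2.0], [-0.5, 0.0, 1.0, 1.5])
```

Kappa corrects accuracy for chance agreement, which is why it is useful for breaking ties between models with similar A values on imbalanced classes.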

3.4. Virtual Screening Development

The best models for each type of prediction and database were chosen, which resulted in one classification and one regression model each for FlavStr, FabG, FabZ, and FabI. The selection criteria for the classification models were determined by an extensive search of the specialised literature and took into account their A values, with the highest K and lowest E values from the Ts set used to break possible ties. Thus, the defined acceptability thresholds were Tr ≥ 85%, Ts ≥ 80%, and CV ≥ 70%. For the regression models, the predictability and acceptance criteria from the Enalos package were used. These follow criteria already established in the literature [86], which state that a valid continuous model should satisfy all four of the following: R² > 0.6; R²CVext > 0.5; (R² − R₀²)/R² < 0.1 or (R² − R′₀²)/R² < 0.1; and 0.85 ≤ k ≤ 1.15 or 0.85 ≤ k′ ≤ 1.15, where R₀² and R′₀² are the coefficients of determination for the regressions through the origin and k and k′ are the corresponding slopes. To break possible ties, the lowest MAE and RMSE values for the Ts set were used.
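The four acceptance criteria for regression models can be checked programmatically. The sketch below is one straightforward reading of those criteria and is not the Enalos implementation: R² is taken as the squared Pearson correlation between observed and predicted values, the cross-validated R² is supplied by the caller, and the origin-constrained coefficients of determination are computed against the usual total sum of squares. Conventions for these quantities differ between papers and software, so the formulas should be treated as assumptions. The function name and data are hypothetical.

```python
def acceptance_check(observed, predicted, r2_cv):
    """Evaluate the four acceptance criteria on a test set."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    r2 = s_op ** 2 / (s_oo * s_pp)  # squared Pearson correlation

    # Slopes of the two regressions forced through the origin.
    sum_op = sum(o * p for o, p in zip(observed, predicted))
    k = sum_op / sum(p * p for p in predicted)       # observed ~ k * predicted
    k_prime = sum_op / sum(o * o for o in observed)  # predicted ~ k' * observed

    # Coefficients of determination for those origin-constrained fits.
    pairs = list(zip(observed, predicted))
    r02 = 1 - sum((o - k * p) ** 2 for o, p in pairs) / s_oo
    r02_prime = 1 - sum((p - k_prime * o) ** 2 for o, p in pairs) / s_pp

    checks = {
        "r2": r2 > 0.6,
        "r2_cv": r2_cv > 0.5,
        "origin_fit": (r2 - r02) / r2 < 0.1 or (r2 - r02_prime) / r2 < 0.1,
        "slope": 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15,
    }
    return {"r2": r2, "k": k, "k_prime": k_prime,
            "checks": checks, "valid": all(checks.values())}


# Hypothetical observed and predicted pIC50 values for a small test set.
observed = [-1.2, -0.5, 0.1, 0.6, 1.3]
predicted = [-1.0, -0.6, 0.0, 0.7, 1.2]
result = acceptance_check(observed, predicted, r2_cv=0.7)
```

A model is accepted only when every entry in `checks` is true, which mirrors the requirement that all four criteria hold simultaneously.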
Following the selection of the best models, a new database with different natural and semi-synthetic flavonoids was created (FlavDB), this time without previous information regarding their antiplasmodial activity (Table S47). This database gathered the names of the structures, followed by their SMILES notation, and the structures were retrieved from the ChEBI [87] and Phenol-Explorer [88] online repositories. The new database was then submitted to the already trained best models and, once again, was filtered by the APD of the selected descriptors, enabling their predictions to be measured.
After screening with the trained models, a second screening was performed by molecular docking with GOLD v. 5.4 [89]. The chemical structures from FlavDB were used as input ligands, and the enzymes previously mentioned were used as input targets, with their crystallographic structures obtained from the RCSB Protein Data Bank (PDB) repository [90]. Their active sites were inferred from the position of the co-crystallised ligand. To validate the software’s parameters and assess its predictive ability, a redocking procedure was performed: the ligands crystallised with the receptors were removed and redocked, and pose quality was assessed by root mean square deviation (RMSD) values.
The proteins used for the analyses were FabZ (PDB code: 3AZA) and FabI (PDB code: 3LT4). Both structures had inhibitors already co-crystallised, namely 8-(benzyloxy)-5-chloroquinoline (NAS91-10) and triclosan variant PB4 (TCL-PB4), respectively.
Hydrogens were added, and water molecules not participating in relevant interactions were removed from the protein. The active site of FabZ consists of His98, His133, and Glu147. For FabI, the key residues are Tyr267, Tyr277, Gly313, Pro314, Ile323, Phe368, Ile369, and Ala372. The binding-site radius selected for ligand placement was 6 Å for all calculations, and the number of runs was set to 50 for the redocking step and 10 for the docking step. The ChemPLP scoring function was used for all analyses.
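The RMSD used to judge the redocked poses can be computed directly from matched atom coordinates. The sketch below assumes that the crystallographic and redocked ligands list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is applied; it also ignores molecular symmetry, which dedicated tools usually handle. The coordinates are hypothetical.

```python
import math


def rmsd(reference, pose):
    """Root mean square deviation (in Å) between two matched coordinate lists."""
    if len(reference) != len(pose):
        raise ValueError("Both poses must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for ref_atom, pose_atom in zip(reference, pose)
        for a, b in zip(ref_atom, pose_atom)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical heavy-atom coordinates (Å) of a crystallographic ligand and
# its redocked pose, displaced by (0.3, 0.4, 0.0) Å for every atom.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked = [(x + 0.3, y + 0.4, z) for x, y, z in crystal]
deviation = rmsd(crystal, redocked)
```

A cut-off of about 2 Å is often quoted in the docking literature as indicating a successfully reproduced pose, although the threshold applied in this study is not stated in this section.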

3.5. Molecular Dynamics (MD) and Pharmacophore Assessment

To assess which side chains are relevant to activity, the substituent groups were submitted to a classification decision tree (DTc) algorithm in the WEKA v. 3.6.14 data analysis package [90] (Table S2). The databases were fingerprinted (1 = presence, 0 = absence) according to the presence of substituent groups, heteroatoms, and side chains. The proposed relationship relies on the premise that, if structures bearing a certain substituent show higher activity values, that substructure may be the one that promotes the bioactivity. The A, E, and K values were used as evaluation criteria [91].
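The presence/absence fingerprinting described above amounts to encoding each structure as a binary vector over a fixed list of substituent features. The following minimal sketch shows the idea; the feature labels, compound names, and substituent assignments are all hypothetical and do not reproduce the encoding in Table S2.

```python
def fingerprint(substituents, features):
    """Return 1 for each feature present in the structure and 0 otherwise."""
    return [1 if feature in substituents else 0 for feature in features]


# Hypothetical feature list: substituent type and position on the flavonoid core.
features = ["OH@3'", "OH@4'", "OCH3@7", "gallate@3"]

# Hypothetical compounds annotated with the substituents they carry.
compounds = {
    "compound_1": {"OH@4'", "OCH3@7"},
    "compound_2": {"OH@3'", "OH@4'", "gallate@3"},
}

matrix = {name: fingerprint(subs, features) for name, subs in compounds.items()}
```

Each row of the resulting matrix, paired with the activity class of the compound, is the kind of input a decision tree learner can split on feature by feature.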
MD simulations were performed to assess how, and for how long, the proposed candidates stay bound to the active site, as well as how they behave there. The proteins and ligands with the best docked poses were first pre-processed with MOE v. 2020.0901 [92]. This pre-processing consisted of protonating the protein–ligand complex in 3D, minimising its energy, correcting structural issues, and inspecting it for possible atom clashes. The force field used for all simulations was OPLS-AA. The systems were then processed with the Maestro package included with Desmond v. 2022.1 [93], where they were built and optimised for the simulations. Each simulation ran for 100 ns, recorded as 1000 frames (one frame every 100 ps), at a temperature of 300 K and a pressure of 1.01 bar, and all simulations were performed in triplicate with different random seeds (2007, 2008, and 2009). In addition, the same methods were applied to the wildtype proteins with their co-crystallised ligands, to understand how these ligands behave in a simulation and to allow comparison with our results. The simulation trajectories were inspected with VMD v. 1.9.3 [94], and the protein–ligand interactions and pharmacophores were assessed with LigandScout v. 4.4.3 [95].

4. Conclusions

This study evaluated the antimalarial potential of flavonoids using a combination of cheminformatics approaches. We systematically compared multiple ML algorithms for the prediction of P. falciparum inhibition and successfully identified highly accurate models. In addition to highlighting the most suitable algorithms for QSAR modelling, the study also revealed promising flavonoid-based hit candidates. While further experimental validation and pharmacokinetic profiling are required to confirm their therapeutic potential, the identification of these compounds underscores the predictive power of our approach. Despite relying on relatively small datasets, the models achieved high specificity and accuracy, and consistently identified key flavonoid subclasses and pharmacophore features—particularly hydroxyl, methoxy, and gallate groups—as critical to antimalarial activity. To our knowledge, flavonoid-based QSAR modelling against malaria remains a relatively unexplored area, and this work contributes valuable insights for future virtual screening efforts.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ddc4030033/s1, Figures S1–S60: ROC curves for all devised models; Figures S61–S90: Complete set of scatter plots for all devised models; Figures S91–S100: MD simulation trajectories for each receptor; Figure S101: Interactions between receptors and predicted ligands; Tables S1 and S2: Algorithms scores; Table S3: Best 20 ligands for each receptor; Table S4: Complete training databases with actual and predicted values; Tables S5–S46: Complete algorithms parameters; Table S47: FlavDB database.

Author Contributions

Conceptualisation, A.d.O.F., M.V.d.A.B. and T.B.O.; Data curation, A.d.O.F. and E.B.A.; Formal analysis, A.d.O.F., V.V.M.P., Y.J.A.S., E.B.A., R.P.R., D.A.C.-P., A.S.F., R.C., M.B. and T.B.O.; Funding acquisition, D.A.C.-P., S.S.D. and T.B.O.; Investigation, A.d.O.F., V.V.M.P., Y.J.A.S. and T.B.O.; Methodology, A.d.O.F., V.V.M.P., Y.J.A.S., E.B.A., R.P.R., D.A.C.-P., A.S.F., R.C., M.V.d.A.B., M.B., S.S.D. and T.B.O.; Project administration, T.B.O.; Resources, A.S.F., R.C., M.V.d.A.B., M.B., S.S.D. and T.B.O.; Supervision, M.V.d.A.B. and T.B.O.; Validation, A.d.O.F., R.P.R., M.V.d.A.B., M.B. and T.B.O.; Visualisation, A.d.O.F., M.V.d.A.B. and T.B.O.; Writing—original draft, A.d.O.F. and T.B.O.; Writing—review and editing, A.d.O.F., V.V.M.P., Y.J.A.S., E.B.A., R.P.R., D.A.C.-P., A.S.F., R.C., M.V.d.A.B., M.B., S.S.D. and T.B.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Basu, S.; Sahi, P.K. Malaria: An update. Indian J. Pediatr. 2017, 84, 521–528.
  2. Shin, H.I.; Bora, K.; Kim, Y.J.; Kim, T.Y.; Cho, S.H.; Lee, S.E. Diagnosis and molecular analysis in imported Plasmodium ovale curtisi and P. ovale wallikeri malaria cases from West and South Africa during 2013–2016. Korean J. Parasitol. 2020, 58, 61–65.
  3. Gozalo, A.S.; Robinson, C.K.; Holdridge, J.; Mahecha, O.F.L.; Elkins, W.R. Overview of Plasmodium spp. and Animal models in malaria research. Comp. Med. 2024, 74, 205–230.
  4. WHO. World Malaria Report 2024; World Health Organization: Geneva, Switzerland, 2024.
  5. Recht, J.; Siqueira, A.M.; Monteiro, W.M.; Herrera, S.M.; Herrera, S.; Lacerda, M.V.G. Malaria in Brazil, Colombia, Peru and Venezuela: Current challenges in malaria control and elimination. Malar. J. 2017, 16, 273.
  6. Bittaye, S.O.; Jagne, A.; Jaiteh, L.E.; Nadjm, B.; Amambua-Ngwa, A.; Sesay, A.K.; Singhateh, Y.; Effa, E.; Nyan, O.; Njie, R. Clinical manifestations and outcomes of severe malaria in adult patients admitted to a tertiary hospital in the Gambia. Malar. J. 2022, 21, 270.
  7. Dayananda, K.K.; Achur, R.N.; Gowda, D.C. Epidemiology, drug resistance, and pathophysiology of Plasmodium vivax malaria. J. Vector Borne Dis. 2018, 55, 1–8.
  8. Daily, J.P.; Minuti, A.; Khan, N. Diagnosis, treatment, and prevention of malaria in the US: A review. JAMA 2022, 328, 460–471.
  9. Tasdemir, D.; Lack, G.; Brun, R.; Rüedi, P.; Scapozza, L.; Perozzo, R. Inhibition of Plasmodium falciparum fatty acid biosynthesis: Evaluation of FabG, FabZ and FabI as drug targets for flavonoids. J. Med. Chem. 2006, 49, 3345–3353.
  10. Kumar, S.P.; Pandya, H.A.; Desai, V.H.; Jasrai, Y.T. Compound prioritization from inverse docking experiment using receptor-centric and ligand-centric methods: A case study on Plasmodium falciparum Fab enzymes. J. Mol. Recognit. 2014, 27, 215–229.
  11. Hoelz, L.V.; Calil, F.A.; Nonato, M.C.; Pinheiro, L.C.S.; Boechat, N. Plasmodium falciparum dihydroorotate dehydrogenase: A drug target against malaria. Future Med. Chem. 2018, 10, 1853–1874.
  12. Kingston, D.G.I.; Cassera, M.B. Antimalarial natural products. Prog. Chem. Org. Nat. Prod. 2022, 117, 1–106.
  13. White, N.J.; Chotivanich, K. Artemisinin-resistant malaria. Clin. Microbiol. Rev. 2024, 37, e0010924.
  14. Tajuddeen, N.; van Heerden, F.R. Antiplasmodial natural products: An update. Malar. J. 2019, 18, 404.
  15. Rakha, A.; Umar, N.; Rabail, R.; Butt, M.S.; Kieliszek, M.; Hassoun, A.; Aadil, R.M. Anti-inflammatory and anti-allergic potential of dietary flavonoids: A review. Biomed. Pharmacother. 2022, 153, 113945.
  16. Gupta, M.; Ahmad, J.; Ahamad, J.; Kundu, S.; Goel, A.; Mishra, A. Flavonoids as promising anticancer therapeutics: Contemporary research, nanoantioxidant potential, and future scope. Phytother. Res. 2023, 37, 5159–5192.
  17. Adewole, K.E. Nigerian antimalarial plants and their anticancer potential: A review. J. Integr. Med. 2020, 18, 92–113.
  18. Martínez-Castillo, M.; Pacheco-Yepez, J.; Flores-Huerta, N.; Guzmán-Téllez, P.; Jarillo-Luna, R.A.; Cárdenas-Jaramillo, L.M.; Campos-Rodríguez, R.; Shibayama, M. Flavonoids as natural treatment against Entamoeba histolytica. Front. Cell. Infect. Microbiol. 2018, 8, 209.
  19. Araújo, M.V.; Queiroz, A.C.; Silva, J.F.M.; Silva, A.E.; Silva, J.K.S.; Silva, G.R.; Silva, E.C.O.; Souza, S.T.; Fonseca, E.J.S.; Camara, C.A.; et al. Flavonoids induce cell death in Leishmania amazonensis: In vitro characterization by flow cytometry and Raman spectroscopy. Analyst 2019, 144, 5232–5244.
  20. Nabavi, S.F.; Sureda, A.; Daglia, M.; Izadi, M.; Rastrelli, L.; Nabavi, S.M. Flavonoids and Chagas’ disease: The story so far! Curr. Top. Med. Chem. 2017, 17, 460–466.
  21. Boniface, P.K.; Ferreira, E.I. Flavonoids as efficient scaffolds: Recent trends for malaria, leishmaniasis, Chagas disease and dengue. Phytother. Res. 2019, 33, 2473–2517.
  22. Baldim, J.L.; Alcântara, B.G.V.; Domingos, O.S.; Soares, M.G.; Caldas, I.S.; Novaes, R.D.; Oliveira, T.B.; Lago, J.H.G.; Chagas-Paula, D.A. The correlation between chemical structures and antioxidant, prooxidant and antitrypanosomatid properties of flavonoids. Oxid. Med. Cell. Longev. 2017, 2017, 3789856.
  23. Panche, A.N.; Diwan, A.D.; Chandra, S.R. Flavonoids: An overview. J. Nutr. Sci. 2016, 5, e47.
  24. Gallego-Delgado, J.; Ty, M.; Orengo, J.M.; Van Der Hoef, D.; Rodriguez, A. A surprising role for uric acid: The inflammatory malaria response. Curr. Rheumatol. Rep. 2014, 16, 401.
  25. Chen, H.; Kogej, T.; Engkvist, O. Cheminformatics in drug discovery, an industrial perspective. Mol. Inform. 2018, 37, e1800041.
  26. Zhang, L.; Tan, J.; Han, D.; Zhu, H. From machine learning to deep learning: Progress in machine intelligence for rational drug design. Drug Discov. Today 2017, 22, 1680–1685.
  27. Neves, B.J.; Braga, R.C.; Melo Filho, C.C.; Moreira Filho, J.T.; Muratov, E.N.; Andrade, C.H. QSAR-based virtual screening: Advances and applications in drug discovery. Front. Pharmacol. 2018, 9, 1275.
  28. Muratov, E.N.; Bajorath, J.; Sheridan, R.P.; Tetko, I.V.; Filimonov, D.; Poroikov, V.; Oprea, T.I.; Baskin, I.I.; Varnek, A.; Roitberg, A.; et al. QSAR without borders. Chem. Soc. Rev. 2020, 49, 3525–3564.
  29. Schneidman-Duhovny, D.; Dror, O.; Inbar, Y.; Nussinov, R.; Wolfson, H.J. Deterministic pharmacophore detection via multiple flexible alignment of drug-like molecules. J. Comput. Biol. 2008, 15, 737–754.
  30. Schmidt, T.J.; Nour, A.M.M.; Khalid, S.A.; Kaiser, M.; Brun, R. Quantitative-antiprotozoal relationships of sesquiterpene lactones. Molecules 2009, 14, 2062–2076.
  31. Sadgrove, N.J.; Oliveira, T.B.; Khumalo, G.P.; van Vuuren, S.F.; van Wyk, B.-E. Antimicrobial isoflavones and derivatives from Erythrina (Fabaceae): Structure activity perspective (SAR & QSAR) on experimental and mined values against Staphylococcus aureus. Antibiotics 2020, 9, 223.
  32. Lee, Y.W.; Choi, J.W.; Shin, E.H. Machine learning model for predicting malaria using clinical information. Comput. Biol. Med. 2021, 129, 104151.
  33. Mbaye, O.; Ba, M.L.; Sy, A. On the efficiency of machine learning models in malaria prediction. Stud. Health Technol. Inform. 2021, 27, 437–441.
  34. Eze, P.U.; Asogwa, C.O. Deep machine learning model trade-offs for malaria elimination in resource-constrained locations. Bioengineering 2021, 8, 150.
  35. Ford, C.T.; Janies, D. Ensemble machine learning modelling for the prediction of artemisinin resistance in malaria. F1000Res 2020, 9, 62.
  36. Borba, J.V.B.; Salazar-Alvarez, L.C.; Ferreira, L.T.; Silva-Mendonca, S.; Silva, M.F.B.D.; Sanches, I.H.; Clementino, L.D.C.; Magalhaes, M.L.; Rimoldi, A.; Calit, J.; et al. Innovative multistage ML-QSAR models for malaria: From data to discovery. ACS Med. Chem. Lett. 2024, 15, 1386–1395.
  37. Hesping, E.; Chua, M.J.; Pflieger, M.; Qian, Y.; Dong, L.; Bachu, P.; Liu, L.; Kurz, T.; Fisher, G.M.; Skinner-Adams, T.S.; et al. QSAR classification models for prediction of hydroxamate histone deacetylase inhibitor activity against malaria parasites. ACS Infect. Dis. 2020, 8, 106–117.
  38. Ncube, N.B.; Tukulula, M.; Govender, K.G. Leveraging computational tools to combat malaria: Assessment and development of new therapeutics. J. Cheminform. 2024, 16, 50.
  39. Yousefinejad, S.; Mahboubifar, M.; Eskandari, R. Quantitative structure-activity relationship to predict the antimalarial activity in a set of new imidazolpiperazines based on artificial neural networks. Malar. J. 2019, 18, 310.
  40. Kalafi, E.Y.; Nor, N.A.M.; Taib, N.A.; Ganggayah, M.D.; Town, C.; Dhillon, S.K. Machine learning and deep learning approaches in breast cancer survival prediction using clinical data. Folia Biol. 2019, 65, 212–220.
  41. Tuenter, E.; Segers, K.; Kang, K.B.; Viaene, J.; Sung, S.H.; Cos, P.; Maes, L.; Heyde, Y.V.; Pieters, L. Antiplasmodial activity, cytotoxicity and structure-activity relationship study of cyclopeptide alkaloids. Molecules 2017, 22, 224.
  42. Alkadri, S.; Ledwos, N.; Mirchi, N.; Reich, A.; Yilmaz, R.; Driscoll, M.; Del Maestro, R.F. Utilizing a multilayer perceptron artificial neural network to assess a virtual reality surgical procedure. Comput. Biol. Med. 2021, 136, 104770.
  43. Castro, W.; Oblitas, J.; Santa-Cruz, R.; Avila-George, H. Multilayer perceptron architecture optimisation using parallel computing techniques. PLoS ONE 2017, 12, e0189369.
  44. Stathakis, D. How many hidden layers and nodes? Int. J. Remote Sens. 2009, 30, 2133–2147.
  45. Lee, J.G.; Jun, S.; Cho, Y.W.; Lee, H.; Kim, G.B.; Seo, J.B.; Kim, N. Deep learning in medical imaging: General overview. Korean J. Radiol. 2017, 18, 570–584.
  46. Majumdar, A.; Gupta, M. Recurrent transforming learning. Neural Netw. 2019, 118, 271–279.
  47. Danishuddin, G.M.; Malik, M.Z.; Subbarao, N. Development and rigorous validation of antimalarial predictive models using machine learning approaches. SAR QSAR Environ. Res. 2019, 30, 543–560.
  48. Liu, Q.; Deng, J.; Liu, M. Classification models for predicting the antimalarial activity against Plasmodium falciparum. SAR QSAR Environ. Res. 2020, 31, 312–324.
  49. Grisoni, F.; Consonni, V.; Todeschini, R. Impact of molecular descriptors on computational models. Methods Mol. Biol. 2018, 1825, 171–209.
  50. Leach, A.R.; Gillet, V.J. Cheminformatics. In Comprehensive Medicinal Chemistry II; Taylor, J.B., Triggle, D.J., Eds.; Elsevier Science: Amsterdam, The Netherlands, 2006; Volume 3, pp. 235–264.
  51. Araya-Cloutier, C.; Vincken, J.P.; Van de Schans, M.G.M.; Hageman, J.; Schaftenaar, G.; Den Besten, H.M.W.; Gruppen, H. QSAR-based molecular signatures of prenylated (iso)flavonoids underlying antimicrobial potency against and membrane-disruption in Gram positive and Gram negative bacteria. Sci. Rep. 2018, 8, 9267.
  52. Damale, M.; Harke, S.N.; Khan, F.A.K.; Shinde, D.B.; Sangshetti, J.N. Recent advances in multidimensional QSAR (4D-6D): A critical review. Mini Rev. Med. Chem. 2014, 14, 35–55.
  53. Hall, L.H.; Mohney, B.; Kier, L.B. The electrotopological state: An atom index for QSAR. Quant. Struct. Act. Relatsh. 1991, 10, 43–51.
  54. Korjus, K.; Hebart, M.N.; Vicente, R. An efficient data partitioning to improve classification performance while keeping parameters interpretable. PLoS ONE 2016, 11, e0161788.
  55. Oliveira, T.B.; Gobbo-Neto, L.; Schmidt, T.J.; Da Costa, F.B. Study of chromatographic retention of natural terpenoids by cheminformatics tools. J. Chem. Inf. Model. 2015, 55, 26–38.
  56. Puzyn, T.; Mostrag-Szlichtyng, A.; Gajewicz, A.; Skrynski, M.; Worth, A.P. Investigating the influence of data splitting on the predictive ability of QSAR/QSPR models. Struct. Chem. 2011, 22, 795–804.
  57. Landrum, G.A.; Riniker, S. Combining IC50 or Ki values from different sources is a source of significant noise. J. Chem. Inf. Model. 2024, 64, 1560–1567.
  58. Kiyama, R. Estrogenic flavonoids and their molecular mechanisms of action. J. Nutr. Biochem. 2023, 114, 109250.
  59. Pfitscher, A.; Reiter, E.; Jungbauer, A. Receptor binding and transactivation activities of red clover isoflavones and their metabolites. J. Steroid Biochem. Mol. Biol. 2008, 112, 87–94.
  60. Happi, G.M.; Ngadjui, B.T.; Green, I.R.; Kouam, S.F. Phytochemistry and pharmacology of the genus Entandrophragma over the 50 years from 1967 to 2018: A ‘golden’ overview. J. Pharm. Pharmacol. 2018, 70, 1431–1460.
  61. Padmavathi, G.; Roy, N.K.; Bordoloi, D.; Arfuso, F.; Mishra, S.; Sethi, G.; Bishayee, A.; Kunnumakkara, A.B. Butein in health and disease: A comprehensive review. Phytomedicine 2017, 25, 118–127.
  62. Okoye, I.; Yu, S.; Caruso, F.; Rossi, M. X-ray structure determination, antioxidant voltammetry studies of Butein and 2′,4′-Dihydroxy-3,4-dimethoxychalcone. Computational studies of 4 structurally related 2′,4′-diOH chalcones to examine their antimalarial activity by binding to Falcipain-2. Molecules 2021, 26, 6511.
  63. Yasir, M.; Park, J.; Han, E.-T.; Park, W.S.; Han, J.-H.; Kwon, Y.-S.; Lee, H.-J.; Chun, W. Virtual screening of flavonoids against Plasmodium vivax Duffy binding protein utilizing molecular docking and molecular dynamics simulation. Curr. Comput. Aided Drug Des. 2024, 20, 616–627.
  64. Chabir, N.; Ibrahim, H.; Romdhane, M.; Valentin, A.; Moukarzel, B.; Mars, M.; Bouajila, J. Seeds of Peganum harmala L. chemical analysis, antimalarial and antioxidant activities, and cytotoxicity against human breast cancer cells. Med. Chem. 2015, 11, 94–101.
  65. Mahmoud, A.B.; Mäser, P.; Kaiser, M.; Hamburger, M.; Khalid, S. Mining sudanese medicinal plants for antiprotozoal agents. Front. Pharmacol. 2020, 11, 865.
  66. Chepkirui, C.; Ochieng, P.J.; Sarkar, B.; Hussain, A.; Pal, C.; Yang, L.J.; Coghi, P.; Akala, H.M.; Derese, S.; Ndakala, A.; et al. Antiplasmodial and antileishmanial flavonoids from Mundulea sericea. Fitoterapia 2021, 149, 104796.
  67. Ortiz, S.; Vásquez-Ocmín, P.G.; Cojean, S.; Bouzidi, C.; Michel, S.; Figadère, B.; Grougnet, R.; Boutefnouchet, S.; Maciuk, A. Correlation study on methoxylation pattern of flavonoids and their heme-targeted antiplasmodial activity. Bioorg. Chem. 2020, 104, 104243.
  68. Zininga, T.; Ramatsui, L.; Makhado, P.B.; Makumire, S.; Achlinou, I.; Hoppe, H.; Dirr, H.; Shonhai, A. (−)-Epigallocatechin-3-gallate inhibits the chaperone activity of Plasmodium falciparum Hsp70 chaperones and abrogates their association with functional partners. Molecules 2017, 22, 2139.
  69. Baell, J.B. Feeling nature’s PAINS: Natural products, natural product drugs, and Pan Assay Interference Compounds (PAINS). J. Nat. Prod. 2016, 79, 616–628.
  70. Dahlin, J.L.; Walters, M.A. How to triage PAINS-Full Research. Assay Drug Dev. Technol. 2016, 14, 168–174.
  71. MarvinSketch, v. 19.22; ChemAxon Ltd.: Budapest, Hungary, 2022. Available online: https://chemaxon.com/products/marvin (accessed on 30 June 2025).
  72. Luo, J.; Cisler, R.A. Discovering outliers or potential drug toxicities using a large-scale data-driven approach. Cancer Inform. 2016, 15, 211–217.
  73. KNIME, v. 4.0.2; KNIME AG: Zürich, Switzerland, 2022. Available online: https://www.knime.com/downloads (accessed on 30 June 2025).
  74. Molina-Cruz, A.; DeJong, R.J.; Ortega, C.; Haile, A.; Abban, E.; Rodrigues, J.; Jaramillo-Gutierrez, G.; Barillas-Mury, C. Some strains of Plasmodium falciparum, a human malaria parasite, evade complement-like system of Anopheles gambiae mosquitoes. Proc. Natl. Acad. Sci. USA 2012, 109, E1957–E1962.
  75. Awandare, G.A.; Nyarko, P.B.; Aniweh, Y.; Ayivor-Djanie, R.; Stoute, J.A. Plasmodium falciparum strains spontaneously switch evasion phenotype in suspension culture. Sci. Rep. 2018, 8, 5782.
  76. MOPAC2016. Paddington Circle, USA. 2022. Available online: http://openmopac.net/downloads.html (accessed on 30 June 2025).
  77. RDKit, v. 2021.03.1; Greg Landrum: Dallas, TX, USA, 2022. Available online: https://www.rdkit.org (accessed on 30 June 2025).
  78. Yap, C.W. PaDEL-Descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 2011, 32, 1466–1474.
  79. Akoglu, H. User’s guide to correlation coefficients. Turk. J. Emerg. Med. 2018, 18, 91–93.
  80. Saunders, L.J.; Russell, R.A.; Crabb, D.P. The coefficient of determination: What determines a useful R2 statistic? Investig. Ophthalmol. Vis. Sci. 2010, 53, 6830–6832.
  81. Russo, D.P.; Zorn, K.M.; Clark, A.M.; Zhu, H.; Ekins, S. Comparing multiple machine learning algorithms and metrics for estrogen receptor binding prediction. Mol. Pharm. 2018, 15, 4361–4370.
  82. Taraji, M.; Haddad, P.R.; Amos, R.I.J.; Talebi, M.; Szucs, R.; Dolan, J.W.; Pohl, C.A. Error measures in quantitative structure-activity relationship studies. J. Chromatogr. A 2017, 1524, 298–302.
  83. Enalos, v. 2021; NovaMechanics Ltd.: Nicosia, Cyprus, 2022. Available online: http://www.novamechanics.com (accessed on 30 June 2025).
  84. Kar, S.; Roy, K.; Leszczynski, J. Applicability domain: A step toward confident predictions and decidability for QSAR modelling. Methods Mol. Biol. 2018, 1800, 141–169.
  85. Tropsha, A.; Gramatica, P.; Gombar, V.J. The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci. 2003, 22, 69–77.
  86. Hastings, J.; Owen, G.; Dekker, A.; Ennis, M.; Kale, N.; Muthukrishnan, V.; Turner, S.; Swainston, N.; Mendes, P.; Steinbeck, C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016, 44, 1214–1219.
  87. Neveu, V.; Pérez-Jiménez, J.; Vos, F.; Crespy, V.; Du Chaffaut, L.; Mennen, L.; Knox, C.; Eisner, R.; Cruz, J.; Wishart, D.; et al. Phenol-Explorer: An online comprehensive database on polyphenol contents in foods. Database 2010, 2010, bap024.
  88. GOLD, v. 5.4; The Cambridge Crystallographic Data Centre: Cambridge, UK, 2022. Available online: https://www.ccdc.cam.ac.uk (accessed on 30 June 2025).
  89. Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.
  90. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: An update. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18.
  91. Kaur, A.; Kaur, I. An empirical evaluation of classification algorithms for fault prediction in open source projects. J. King Saud Univ. Comput. Info. Sci. 2018, 30, 2–17.
  92. Molecular Operating Environment (MOE), v. 2020.0901; Chemical Computing Group ULC: Montreal, QC, Canada, 2025.
  93. Bowers, K.J.; Chow, E.; Xu, H.; Dror, R.O.; Eastwood, M.P.; Gregersen, B.A.; Klepeis, J.L.; Kolossvary, I.; Moraes, M.A.; Sacerdoti, F.D.; et al. Scalable algorithms for molecular dynamics simulations on commodity clusters. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC06), Tampa, FL, USA, 11–17 November 2006.
  94. Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Molec. Graph. 1996, 14, 33–38.
  95. Wolber, G.; Langer, T. LigandScout: 3-D Pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J. Chem. Inf. Model. 2004, 45, 160–169.
Figure 1. Quantification (in percentages) of structures classified as active or inactive by each model.
Figure 2. Chemical structures of the five best ligands based on their respective ChemPLP score classification. (A) 6′-Hydroxyangolensin; (B) Angolensin; (C) Butein; (D) 6′-Hydroxy-O-desmethylangolensin; (E) 4′-O-Methylequol; (F) O-Desmethylangolensin; (G) Dihydrodaidzein 7-O-glucuronide; (H) Cyanidin; (I) 3′,4′,5,7-Tetrahydroxyisoflavone; (J) Equol 4′-O-glucuronide.
Figure 2. Chemical structure of the five best ligands based on their respective ChemPLP score classification. (A) 6′-Hydroxyangolensin; (B) Angolensin; (C) Butein; (D) 6′-Hydroxy-O-desmethylangolensin; (E) 4’-O-Methylequol; (F) O-Desmethylangolensin; (G) Dihydrodaidzein 7-O-glucuronide; (H) Cyanidin; (I) 3′-4′-5-7-Tetrahydroxyisoflavone; (J) Equol 4′-O-glucuronide.
Ddc 04 00033 g002
Figure 3. Decision trees generated with the WEKA software for the FlavStr and protein databases. Blue and red arrows indicate the presence or absence of a given side chain, respectively. The numbers in black give the position of that side chain relative to the basic flavonoid structure. (A) Decision tree built for the FlavStr database, showing the importance of cyclic side chains, methylations and hydroxylations for activity; (B) decision tree built for the FabG database, showing the importance of a gallate group for bioactivity; (C) decision tree built for the FabI database, where only an OH group at position 3′ of ring B matters for bioactivity; (D) decision tree built for the FabZ database, which is very similar to the FabI tree, except that the OH group must be at position 4′ for bioactivity.
Figure 4. Trajectory snapshots of the complexes formed between the proteins FabZ and FabI and two of their respective best ligands. The frames shown were chosen based on the mode (i.e., the most frequent value) of the per-frame RMSD. Yellow spheres represent hydrophobic interactions (H), green vectors represent hydrogen bond donors (HBD), red vectors represent hydrogen bond acceptors (HBA), and purple rings represent aromatic ring interactions (AR). (A) Ligand 6′-Hydroxyangolensin bound to the active site of FabZ; (B) ligand Angolensin bound to the active site of FabZ; (C) ligand Dihydrodaidzein 7-O-glucuronide bound to the active site of FabI; (D) ligand O-Desmethylangolensin bound to the active site of FabI.
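The frame selection described in the Figure 4 caption can be illustrated with a minimal Python sketch. RMSD is a continuous quantity, so a mode only exists after the values are rounded or binned; the one-decimal rounding, the function name `representative_frame`, and the toy RMSD values are assumptions made here for illustration and are not taken from the article.

```python
from collections import Counter


def representative_frame(rmsd_per_frame, decimals=1):
    """Return (frame_index, mode_value) for the most frequent rounded RMSD.

    Assumption: values are rounded to `decimals` places before counting,
    because a mode is only meaningful for discretised RMSD values.
    """
    if not rmsd_per_frame:
        raise ValueError("rmsd_per_frame must not be empty")
    rounded = [round(value, decimals) for value in rmsd_per_frame]
    # most_common(1) yields the value with the highest count
    mode_value, _count = Counter(rounded).most_common(1)[0]
    # Pick the first frame whose rounded RMSD equals the mode
    return rounded.index(mode_value), mode_value


# Hypothetical per-frame RMSD values (in angstroms), for illustration only
toy_rmsd = [1.52, 1.61, 1.58, 1.63, 2.10]
frame, value = representative_frame(toy_rmsd)
```

Taking the first matching frame is one possible choice; any frame whose rounded RMSD equals the mode would satisfy the same criterion.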
Figure 5. Three-dimensional representation of the aligned pharmacophore for each protein. Yellow spheres represent hydrophobic interactions (H), green vectors represent hydrogen bond donors (HBD), red vectors represent hydrogen bond acceptors (HBA), and purple rings represent aromatic ring interactions (AR). (A) Pharmacophore model for FabZ; (B) pharmacophore model for FabI.
Table 1. Pearson’s correlation values calculated for selected molecular descriptors by database.

FlavStr classification
| | ATS8m | minsCH3 | maxHdsCH | MDEO-22 |
|---|---|---|---|---|
| ATS8m | 1.00 | −0.02 | −0.21 | 0.61 |
| minsCH3 | −0.02 | 1.00 | 0.22 | 0.34 |
| maxHdsCH | −0.20 | 0.22 | 1.00 | −0.13 |
| MDEO-22 | 0.61 | 0.34 | −0.13 | 1.00 |

FlavStr regression
| | smr_VSA10 | minsCH3 | MDEO-22 | GGI8 |
|---|---|---|---|---|
| smr_VSA10 | 1.00 | 0.11 | 0.38 | 0.27 |
| minsCH3 | 0.11 | 1.00 | 0.34 | −0.08 |
| MDEO-22 | 0.38 | 0.34 | 1.00 | 0.51 |
| GGI8 | 0.27 | −0.08 | 0.51 | 1.00 |

FabG
| | AATS8m | VE3_Dzv | FNSA-1 |
|---|---|---|---|
| AATS8m | 1.00 | 0.48 | 0.48 |
| VE3_Dzv | 0.48 | 1.00 | −0.06 |
| FNSA-1 | 0.48 | −0.06 | 1.00 |

FabZ
| | ATS3m | nHBint7 | SIC3 |
|---|---|---|---|
| ATS3m | 1.00 | 0.48 | −0.58 |
| nHBint7 | 0.48 | 1.00 | −0.14 |
| SIC3 | −0.58 | −0.14 | 1.00 |

FabI
| | VE3_Dzv | maxHBint9 | TDB6u |
|---|---|---|---|
| VE3_Dzv | 1.00 | 0.46 | −0.45 |
| maxHBint9 | 0.46 | 1.00 | −0.53 |
| TDB6u | −0.45 | −0.53 | 1.00 |
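As an illustration of how a descriptor correlation matrix such as Table 1 can be obtained, the sketch below computes pairwise Pearson coefficients in plain Python. The descriptor names `desc_a`, `desc_b`, `desc_c` and their values are hypothetical placeholders, not data from the article, and the article's own software pipeline may differ.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Build a nested dict {name_a: {name_b: r}} over all descriptor pairs."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical descriptor columns, for illustration only
toy = {
    "desc_a": [1.0, 2.0, 3.0, 4.0],
    "desc_b": [2.0, 4.0, 6.0, 8.0],
    "desc_c": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(toy)
```

By construction the resulting matrix is symmetric with ones on the diagonal, which is a convenient sanity check when selecting weakly correlated descriptors.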
Table 2. Selected classification algorithms’ overall accuracy (A), Cohen’s Kappa (K), and error index (E) scores.

| Algorithm | Database | Dataset | A | K | E |
|---|---|---|---|---|---|
| MLP | FlavStr | Tr | 86.53% | 0.72 | 13.46% |
| | | Ts | 92.85% | 0.85 | 7.14% |
| | | CV | 73.07% | 0.45 | 26.92% |
| RF | FabG | Tr | 100.00% | 1.00 | 0.00% |
| | | Ts | 100.00% | 1.00 | 0.00% |
| | | CV | 87.50% | 0.71 | 12.50% |
| SVM | FabZ | Tr | 100.00% | 1.00 | 0.00% |
| | | Ts | 100.00% | 1.00 | 0.00% |
| | | CV | 100.00% | 1.00 | 0.00% |
| KNN | FabI | Tr | 100.00% | 1.00 | 0.00% |
| | | Ts | 100.00% | 1.00 | 0.00% |
| | | CV | 87.50% | 0.75 | 12.50% |
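The three scores in Table 2 can all be derived from a confusion matrix, as in the minimal sketch below. It assumes that the error index E is the complement of the accuracy (E = 1 − A), which is consistent with the tabulated values but not stated explicitly here; the function name and the example counts are hypothetical.

```python
def classification_scores(confusion):
    """Return (accuracy, kappa, error) from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    Assumption: the error index is taken as 1 - accuracy.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: fraction of samples on the diagonal
    observed = sum(confusion[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        kappa = float("nan")  # undefined when only one class occurs
    else:
        kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa, 1.0 - observed


# Hypothetical counts: rows = true (active, inactive), columns = predicted
accuracy, kappa, error = classification_scores([[20, 5], [10, 15]])
```

For these toy counts the accuracy is 0.70, the chance agreement is 0.50, and Cohen's Kappa is therefore 0.40, showing how Kappa discounts agreement that the class proportions alone would produce.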
Table 3. Selected regression algorithms’ R², RMSE, and acceptability criteria.

| Algorithm | Database | Dataset | R² | RMSE | Acceptability Criteria |
|---|---|---|---|---|---|
| SVM | FlavStr | Tr | 0.65 | 0.72 | |
| | | Ts | 0.64 | 0.86 | Passed |
| | | CV | 0.54 | 0.83 | |
| MLP | FabG | Tr | 0.97 | 0.13 | |
| | | Ts | 0.95 | 0.09 | Passed |
| | | CV | 0.94 | 0.19 | |
| MLP | FabZ | Tr | 0.99 | 0.03 | |
| | | Ts | 0.97 | 0.10 | Passed |
| | | CV | 0.99 | 0.03 | |
| PLS | FabI | Tr | 0.99 | 0.07 | |
| | | Ts | 0.94 | 0.14 | Passed |
| | | CV | 0.99 | 0.07 | |
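For reference, the two regression statistics reported in Table 3 can be sketched as follows. The code assumes the coefficient-of-determination form R² = 1 − SS_res/SS_tot; QSAR studies sometimes report the squared Pearson correlation instead, and which definition the article uses is not specified here. The acceptability criteria are not reproduced, and the example values are hypothetical.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired observed and predicted values.

    Assumption: R^2 is computed as 1 - SS_res / SS_tot.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: unexplained variation
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation around the observed mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical observed and predicted activities, for illustration only
r2, rmse = r2_and_rmse([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
```

With these toy values the residual sum of squares is 1.0 against a total of 5.0, giving R² = 0.8 and RMSE = 0.5.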
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Fernandes, A.d.O.; Paixão, V.V.M.; Santos, Y.J.A.; Alves, E.B.; Rodrigues, R.P.; Chagas-Paula, D.A.; Faraoni, A.S.; Casoti, R.; Batista, M.V.d.A.; Bermudez, M.; et al. Identification of Pharmacophore Groups with Antimalarial Potential in Flavonoids by QSAR-Based Virtual Screening. Drugs Drug Candidates 2025, 4, 33. https://doi.org/10.3390/ddc4030033