AICpred: Machine Learning-Based Prediction of Potential Anti-Inflammatory Compounds Targeting TLR4-MyD88 Binding Mechanism

Fry-Nartey, Lucindah N.; Akafia, Cyril; Nkonu, Ursula S.; Baiden, Spencer B.; Dorvi, Ignatus Nunana; Agyenkwa-Mawuli, Kwasi; Agyapong, Odame; Hayford, Claude Fiifi; Wilson, Michael D.; Miller, Whelton A.; Kwofie, Samuel K.

doi:10.3390/info16010034

Open Access Article


by Lucindah N. Fry-Nartey ^1,2,3; Cyril Akafia ^1,4; Ursula S. Nkonu ^1; Spencer B. Baiden ^1; Ignatus Nunana Dorvi ^1,3; Kwasi Agyenkwa-Mawuli ^1,2; Odame Agyapong ^1; Claude Fiifi Hayford ^1; Michael D. Wilson ^2,†; Whelton A. Miller III ^5,6,7,* and Samuel K. Kwofie ^1,3,*

^1 Department of Biomedical Engineering, School of Engineering Sciences, College of Basic and Applied Sciences, University of Ghana, Legon, Accra P.O. Box LG 77, Ghana
^2 Department of Parasitology, Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, Legon, Accra P.O. Box LG 581, Ghana
^3 West Africa Centre for Cell Biology of Infectious Pathogens, Department of Biochemistry, Cell and Molecular Biology, University of Ghana, Legon, Accra P.O. Box LG 54, Ghana
^4 Department of Psychiatry, Yale University, New Haven, CT 06511, USA
^5 Department of Medicine, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA
^6 Department of Molecular Pharmacology & Neuroscience, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA
^7 Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
^* Authors to whom correspondence should be addressed.
^† Deceased.

Information 2025, 16(1), 34; https://doi.org/10.3390/info16010034

Submission received: 31 July 2024 / Revised: 24 December 2024 / Accepted: 30 December 2024 / Published: 7 January 2025

(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)


Abstract

Toll-like receptor 4 (TLR4) has been implicated in the production of uncontrolled inflammation within the body, known as the cytokine storm. Studies that employ machine learning (ML) to predict potential inhibitors of TLR4 are limited. This study introduces AICpred, a robust, free, user-friendly, and easily accessible machine learning-based web application for predicting inhibitors of TLR4 by targeting the interaction between TLR4 and myeloid differentiation primary response 88 (MyD88). MyD88 is a crucial adaptor protein in the TLR4-induced hyper-inflammation pathway. Predictive models were trained using random forest, adaptive boosting (AdaBoost), eXtreme gradient boosting (XGBoost), k-nearest neighbours (KNN), and decision tree algorithms. To handle class imbalance within the training data, resampling techniques including random undersampling, the synthetic minority oversampling technique (SMOTE), and the random selection of 5000 instances of the majority class were employed. A 10-fold cross-validation strategy was used to evaluate model performance based on metrics including accuracy, balanced accuracy, and recall. The XGBoost model demonstrated superior performance, with accuracy, balanced accuracy, and recall scores of 0.994, 0.958, and 0.917, respectively, on the test set. The AdaBoost and decision tree models also performed well, with accuracies ranging from 0.981 to 0.992, balanced accuracies between 0.921 and 0.944, and recall scores between 0.845 and 0.891 on both training and test datasets. The XGBoost model was deployed as AICpred and used to screen compounds reported to mitigate the hyperinflammation-associated cytokine storm, a key factor in COVID-19. The models predicted Baricitinib, Ibrutinib, Nezulcitinib, MCC950, and Acalabrutinib as anti-TLR4 compounds with prediction probabilities above 0.90.
Additionally, compounds known to inhibit TLR4, including TAK-242 (Resatorvid) and a benzisothiazole derivative (M62812), were predicted as bioactive within the applicability domain with probabilities above 0.80. Compounds computationally inferred using AICpred can be explored as potential starting scaffolds for therapeutic agents against hyperinflammation. These predictions must be corroborated by experimental screening before further optimisation of the compounds. AICpred is the first tool of its kind targeting the inhibition of TLR4-MyD88 binding and is freely available at http://197.255.126.13:8080.

Keywords:

toll-like receptor 4 (TLR4); machine learning; anti-inflammatory; inflammatory; inhibitors; cytokine storm; web application

1. Introduction

Toll-like receptor 4 (TLR4) is a key transmembrane receptor that recognises both infectious and non-infectious inflammatory stimuli [1]. TLR4 signalling leads to pro-inflammatory action through the production of pro-inflammatory cytokines, which are small proteins responsible for controlling antibody production, recruiting immune cells, and regulating inflammation to combat infection [1,2]. The primary cytokines involved in signalling between cells of the immune system are interleukins (ILs), tumour necrosis factors (TNFs), interferons (IFNs), and chemokines, which can be classified as either pro-inflammatory or anti-inflammatory [3]. During viral infections, antiviral responses in neighbouring cells are activated, and the recruitment of innate and adaptive immune cells is triggered by damage-associated molecular patterns (DAMPs) and pathogen-associated molecular patterns (PAMPs) [4,5,6]. TLR4 recognises invading pathogens via PAMPs and endogenous molecules from damaged tissues via DAMPs, activating the body's protective self-defence mechanisms. Myeloid differentiation primary response 88 (MyD88) is an important adaptor protein in TLR4 signalling [7]. MyD88 interacts with TLR4 through its Toll/interleukin-1 receptor (TIR) domain [8]. This interaction initiates a signalling cascade leading to nuclear factor kappa B (NF-κB) activation and cytokine production [7,9]. NF-κB is a key mediator of pro-inflammatory activity in the body [9].

The balance between pro-inflammatory cytokines, which produce an inflammatory response, and anti-inflammatory cytokines, which control inflammation in the body, is complex and important [10]. During viral infections, the activation of a variety of immune cells disrupts this balance, causing a substantial increase in pro-inflammatory cytokines [11]. While pro-inflammatory cytokines act to limit further spread of the virus, their uncontrolled production can lead to severe undesirable outcomes, known as the cytokine storm (CS) [10]. CS is a critical factor in the poor prognosis of both infectious and non-infectious diseases, contributing to severe symptoms and increased fatality rates [1,12]. Given its involvement in CS, inhibiting TLR4 signalling, particularly by targeting the TLR4-MyD88 interaction, emerges as a promising therapeutic strategy for managing inflammatory conditions [13,14].

Machine learning (ML) techniques have previously been employed to identify anti-inflammatory compounds by developing predictive models for inhibitors of the cyclooxygenase-2 (COX-2) enzyme, a key mediator of the inflammatory response [15,16]. These models achieved accuracies of 0.746 to 0.754, sensitivity scores of 0.612 to 0.686, and specificity scores of 0.777 to 0.814. In addition, a deep learning (DL)-based model was developed to predict the anti-inflammatory activity of peptides, with an area under the curve (AUC) of 0.919 and a Matthews correlation coefficient (MCC) of 0.735 [17]. In a recent study, deep variational autoencoders and contrastive learning were used to identify anti-inflammatory peptides [18]. Although studies that target TLR4 specifically are limited, a recent study developed ML models using a compiled list of 78 compounds from the ChEMBL database to predict TLR4 inhibitors for the treatment of Mycoplasma pneumonia [19,20]. These studies, however, do not specifically target TLR4-MyD88 binding, which multiple studies have shown to be a plausible target for treating the cytokine storm [13,15,17,18]. Hence, there is a pressing need for accessible, user-friendly, and robust applications that specifically target the TLR4-MyD88 interaction.

In this study, we developed and hosted AICpred, the first robust ML-based application for predicting potential inhibitors of the TLR4-MyD88 binding mechanism. AICpred is user-friendly and easily accessible. We evaluated its potential by using it to predict the anti-TLR4 activity of compounds implicated in inhibiting CS. AICpred attained high performance on standard evaluation metrics, which makes it a potential resource in drug discovery pipelines. The predicted compounds could serve as initial backbones for experimentally evaluating their propensity to inhibit CS via the TLR4-MyD88 inflammatory mechanism. Although the precise interaction mechanism is unknown, studies have revealed that the SARS-CoV-2 spike protein interacts with TLR4 to cause hyperinflammation [21,22]. These observations have been supported by both in silico and in vitro studies [23,24,25]. This supports investigating AICpred-predicted compounds as therapeutics for the COVID-19-induced cytokine storm.

2. Materials and Methods

2.1. Methods

In this study, five ML algorithms, namely random forest, k-nearest neighbours (KNN), decision tree, eXtreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost), were employed to construct predictive models for identifying potential anti-inflammatory compounds targeting TLR4. Random forest is frequently used for QSAR prediction owing to its robustness, predictive ability, and ease of use [26]. The KNN algorithm, known for its simplicity and effectiveness in classification studies, also offers interpretability, allowing the identification of structural features that may contribute to biological activity [27]. Decision tree algorithms are likewise recognised for their interpretability, as their explicit decision rules link descriptors with biological activity [28,29]. Ensemble methods such as XGBoost and AdaBoost are particularly well-suited to classification problems, offering enhanced accuracy by minimising bias in imbalanced datasets and converging rapidly to global minima [30,31]. Their strong performance in QSAR prediction has been well documented [32,33]. XGBoost, which aggregates weak learners (trees) into a robust model (Figure 1), has been widely reported for its exceptional performance in QSAR modelling, making it an optimal choice for this study [32,33,34].
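To illustrate how boosting combines weak learners, the following minimal sketch implements discrete AdaBoost with one-split decision stumps in plain Python. It is not the XGBoost or scikit-learn code used in the study: XGBoost fits each new tree to gradients of a loss function, whereas this sketch uses AdaBoost's sample reweighting. The function names, the ±1 label encoding, and the number of rounds are illustrative assumptions.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """A decision stump: a single split on one descriptor, returning +1 or -1."""
    return polarity if row[feature] >= threshold else -polarity


def fit_stump(X, y, weights):
    """Search every (feature, threshold, polarity) for the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                error = sum(
                    w for w, row, label in zip(weights, X, y)
                    if stump_predict(row, feature, threshold, polarity) != label
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns (alpha, feature, threshold, polarity) tuples."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        error, feature, threshold, polarity = fit_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # keep log() finite for perfect or useless stumps
        alpha = 0.5 * math.log((1 - error) / error)  # vote weight of this weak learner
        ensemble.append((alpha, feature, threshold, polarity))
        # Misclassified samples gain weight and correctly classified samples lose weight.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, threshold, polarity))
            for w, row, label in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1
```

Each round up-weights the compounds that earlier stumps misclassified, so later stumps concentrate on the hard cases; the final prediction is a vote weighted by each stump's accuracy.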

Although support vector machines (SVMs) have been successfully used to develop high-performing QSAR models [36,37], they were excluded from this study. While SVM-based models can achieve strong predictive performance across various datasets, their computational cost, in both time and memory, grows substantially with dataset size, limiting their scalability for large-scale molecular screening [38,39,40]. In contrast, algorithms such as random forest, XGBoost, and AdaBoost handle large datasets efficiently and have demonstrated strong performance in QSAR modelling [41,42,43,44,45].

The predictive models were implemented in Python (version 3.7.12) using the scikit-learn library (version 1.0.2), an open-source machine learning framework. Bioassay data were obtained from PubChem for the cross-validation and testing of these models [46]. Model performance was evaluated using balanced accuracy, precision, F1 score, recall, the Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUROC). Additionally, the models were further validated against known TLR4 inhibitors to ensure their reliability. A summary of the study methodology is shown in Figure 2.

2.2. Dataset Extraction

The dataset of compounds with experimentally measured activity against TLR4 was downloaded from PubChem (Assay ID 861) [46]. This assay is part of a broader research effort comprising four additional assays aimed at identifying TLR4 inhibitors [51]. It identifies compounds that inhibit the toll-like receptor by measuring the inhibition of constitutive TLR4–MyD88 binding [51,52]. Compounds were classified using a threshold calculated as the average percent inhibition of all compounds plus three times their standard deviation [51]. Any compound whose percent inhibition exceeded this threshold was labelled as active. An activity score was then calculated by normalising the percent inhibition of each test compound to the highest observed inhibition value, with negative inhibition values assigned an activity score of zero [51,52]. The dataset consisted of 356 active and 195,623 inactive compounds.
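The labelling rule can be sketched as follows, assuming for illustration only that the threshold is the mean percent inhibition across all compounds plus three population standard deviations, and that activity scores are expressed on a 0 to 100 scale. The function name and both of these choices are assumptions, not the assay's published code.

```python
import statistics


def label_by_threshold(percent_inhibition):
    """Return (threshold, labels, activity_scores) for a list of percent-inhibition values."""
    mean = statistics.fmean(percent_inhibition)
    sd = statistics.pstdev(percent_inhibition)  # population SD (assumption; the assay may use sample SD)
    threshold = mean + 3 * sd
    # Active (1) when inhibition exceeds the threshold, otherwise inactive (0).
    labels = [1 if value > threshold else 0 for value in percent_inhibition]
    top = max(percent_inhibition)
    # Normalise to the highest observed inhibition; zero or negative inhibition scores zero.
    scores = [
        100 * value / top if value > 0 and top > 0 else 0.0
        for value in percent_inhibition
    ]
    return threshold, labels, scores
```

Because the threshold sits three standard deviations above the mean, only compounds that stand well clear of the bulk of the screening library are labelled active, which is one reason the resulting dataset is so heavily imbalanced.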

2.3. Descriptors Computation

The Mold2 Descriptor Generator Software (version 2.0.0), designed for the efficient and rapid calculation of a broad range of descriptors [53], was used to calculate molecular descriptors for each compound in the active and inactive sets. Molecular descriptors are mathematical representations of the properties of a molecule, obtained by a well-specified algorithm [54]. Because ML models require quantitative inputs, molecular descriptors provide a means to quantify the physicochemical properties of the compounds [55]. These descriptors serve as the features used to train and test the ML models.

A total of 777 2D molecular descriptors (features) were computed for each molecule in the entire dataset. These features were labelled D001 to D777, covering properties such as the counts for functional groups, structural features, and counts for atoms and bonds. For instance, descriptors D012 to D019 were features described by the number of multiple bonds, number of circuit structures, number of rotatable bonds, rotatable bond fractions, number of double bonds, number of aromatic bonds, and the sum of conventional bond orders, respectively. The descriptions of each of the 777 molecular descriptors are provided by the United States Food and Drug Administration [56].

2.4. Data Pre-Processing and Feature Selection

Data preprocessing is a crucial step in ML workflows, involving the cleaning and transformation of data to ensure it is suitable for analysis [57]. A custom Python script was used to remove null values, duplicates, and irrelevant non-compound features from the data. Additionally, the activity column was encoded in a binary format, where active and inactive compounds were assigned values of 1 and 0, respectively. Eighty percent (80%) of the data were allocated for training and validating the models, while the remaining twenty percent (20%) was held out as a test set.
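A minimal sketch of these cleaning steps and the 80/20 split in plain Python is shown below. The record layout `(compound_id, descriptors, activity)`, the `"Active"` label string, and the choice to identify duplicates by compound ID are hypothetical; with pandas, `DataFrame.dropna` and `DataFrame.drop_duplicates` perform the corresponding operations. The split is a simple random (non-stratified) split, which is consistent with, but not confirmed by, the 278/78 division of actives reported in Section 3.1.

```python
import random


def preprocess(records, test_fraction=0.2, seed=0):
    """records: iterable of (compound_id, descriptor_list, activity) tuples (hypothetical layout)."""
    seen_ids = set()
    X, y = [], []
    for compound_id, descriptors, activity in records:
        if activity is None or any(value is None for value in descriptors):
            continue  # drop rows containing null values
        if compound_id in seen_ids:
            continue  # drop duplicate compounds
        seen_ids.add(compound_id)
        X.append(list(descriptors))  # the ID column itself is not used as a feature
        y.append(1 if activity == "Active" else 0)  # binary encoding: active = 1, inactive = 0

    # Random 80/20 hold-out split with a fixed seed for reproducibility.
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    n_test = round(len(order) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test
```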

A basic feature selection process was employed in developing the predictive models. To ensure that the ML algorithms perform optimally, zero-variance features, which have the same value for every compound and therefore carry no useful information, were eliminated using the VarianceThreshold function from scikit-learn's feature selection module with a threshold of 0 [58].
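With a threshold of 0, `VarianceThreshold` keeps only columns whose values are not all identical in the training data. The plain-Python sketch below mirrors that behaviour, assuming the mask is learned on the training set only and then reused on the test set so that both end up with the same columns; the function names are illustrative.

```python
def zero_variance_mask(X_train):
    """True for each column whose training values are not all identical (variance > 0)."""
    return [len({row[j] for row in X_train}) > 1 for j in range(len(X_train[0]))]


def apply_mask(X, mask):
    """Keep only the columns flagged True in the mask."""
    return [[value for value, keep in zip(row, mask) if keep] for row in X]
```

Learning the mask on the training data alone avoids leaking information from the held-out test set into feature selection.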

The dataset exhibited a high class imbalance between active and inactive compounds, which can affect model performance by underrepresenting the minority class [59]. To address this, three resampling techniques were applied to the training data, yielding four different models per algorithm: one for each resampling technique and one with no resampling. The resampling techniques were as follows: (1) random undersampling, in which data points from the majority class were randomly selected without replacement to give an active-to-inactive ratio of 1:2 [60,61], implemented using the RandomUnderSampler function from Imbalanced-learn (version 0.10.1) with a sampling_strategy of 0.5; (2) the Synthetic Minority Oversampling Technique (SMOTE), which creates synthetic minority-class instances by interpolating between a minority sample and one of its k nearest minority-class neighbours in feature space [44]; and (3) the random selection of 5000 inactive compounds, combined with the original active compounds, for training.
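The three resampling strategies can be sketched in plain Python as below. These are simplified stand-ins, not the Imbalanced-learn implementations: the function names are hypothetical, SMOTE here uses Euclidean distance with k = 5 by default (Imbalanced-learn's default `k_neighbors`; the study's value is not stated), and `ratio` plays the role of `sampling_strategy` (actives divided by inactives after resampling). A ratio of 0.5 is consistent with the counts reported in Section 3.1 (278 actives with int(278 / 0.5) = 556 inactives for undersampling, and int(155,917 × 0.5) = 77,958 actives after SMOTE), although the ratio used for SMOTE is an inference.

```python
import math
import random


def _split_by_class(y):
    """Indices of actives (label 1) and inactives (label 0)."""
    positives = [i for i, label in enumerate(y) if label == 1]
    negatives = [i for i, label in enumerate(y) if label == 0]
    return positives, negatives


def random_undersample(X, y, ratio=0.5, seed=0):
    """Keep every active; draw inactives without replacement so that n_active / n_inactive = ratio."""
    positives, negatives = _split_by_class(y)
    kept = random.Random(seed).sample(negatives, int(len(positives) / ratio))
    idx = positives + kept
    return [X[i] for i in idx], [y[i] for i in idx]


def random_majority_subset(X, y, n_inactive=5000, seed=0):
    """Keep every active plus a fixed-size random sample of inactives."""
    positives, negatives = _split_by_class(y)
    idx = positives + random.Random(seed).sample(negatives, n_inactive)
    return [X[i] for i in idx], [y[i] for i in idx]


def smote(X, y, ratio=0.5, k=5, seed=0):
    """Add synthetic actives until n_active / n_inactive reaches ratio (basic SMOTE interpolation)."""
    rng = random.Random(seed)
    positives, negatives = _split_by_class(y)
    minority = [X[i] for i in positives]
    n_new = int(len(negatives) * ratio) - len(minority)
    synthetic = []
    for _ in range(max(n_new, 0)):
        a = rng.randrange(len(minority))
        # The k nearest minority-class neighbours of sample a (excluding a itself).
        others = sorted(
            (j for j in range(len(minority)) if j != a),
            key=lambda j: math.dist(minority[a], minority[j]),
        )
        b = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment from sample a to neighbour b
        synthetic.append([u + gap * (v - u) for u, v in zip(minority[a], minority[b])])
    return X + synthetic, y + [1] * len(synthetic)
```

Undersampling discards most inactive compounds, whereas SMOTE keeps them all and instead fills the minority class with interpolated points, which is why the two approaches produce training sets of very different sizes.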

2.5. Model Training

Model training was carried out on the Kaggle platform, leveraging its computational resources and support for ML tasks through Scikit-Learn, Pandas, NumPy, and other Python libraries. For each algorithm employed in this study, four models were trained: one on the original imbalanced data and one for each of the three resampling techniques described above. The random forest algorithm constructs multiple decision trees, each trained on a random subset of samples and features, and determines the final prediction by majority voting over the individual tree predictions [62]. Decision trees consist of a hierarchical structure of root, internal, and leaf nodes, with each internal node applying a decision rule to a single descriptor [28]. Boosting algorithms such as AdaBoost and XGBoost combine weak learners to create a stronger, more robust model (Figure 1) [63]. The K-nearest neighbours (KNN) algorithm is non-parametric and requires no explicit architecture or activation function [64]. The number of neighbours for the KNN model was optimised and set to 10, and the number of jobs was set to −1 so that all available processor cores were used.
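As an illustration of the non-parametric KNN classifier with 10 neighbours, the following plain-Python sketch scores a query compound by the fraction of its 10 nearest training compounds that are active. Euclidean distance and uniform neighbour weights are assumptions (they are the defaults of scikit-learn's `KNeighborsClassifier`); `n_jobs=-1` only parallelises the neighbour search and does not change the predictions, so it has no counterpart here. The function names are hypothetical.

```python
import math


def knn_predict_proba(X_train, y_train, query, k=10):
    """Fraction of the k nearest training compounds (Euclidean distance) that are active (label 1)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], query))[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def knn_predict(X_train, y_train, query, k=10):
    """Predict active (1) only when more than half of the k neighbours are active."""
    return 1 if knn_predict_proba(X_train, y_train, query, k) > 0.5 else 0
```

Because every prediction requires distances to all training compounds, KNN has no training cost but a high prediction cost, and with roughly 156,000 mostly inactive training compounds, the neighbourhood of a true active is easily dominated by inactives.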

2.6. Model Evaluation and Validation

The developed models were evaluated using a 10-fold cross-validation strategy and further validated on the held-out test set. The evaluation metrics used were balanced accuracy, precision, F1 score, recall, the Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUROC). The formulae and interpretation of these performance metrics are shown in Table 1. A confusion matrix based on the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts was used at each model evaluation stage [65].
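The section does not state whether the 10 folds were stratified. With only 278 actives in the training set, stratification matters, because each fold should then hold about 28 actives. The hypothetical helper below sketches stratified fold assignment in plain Python. In scikit-learn, `cross_val_score` and `cross_validate` use `StratifiedKFold` by default for classifiers when `cv` is an integer, so a call such as `cross_validate(model, X, y, cv=10)` would also stratify the folds.

```python
import random


def stratified_kfold(y, n_splits=10, seed=0):
    """Yield (train_indices, test_indices) with each class spread evenly across the folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(set(y)):
        members = [i for i, value in enumerate(y) if value == label]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % n_splits].append(index)  # deal round-robin within each class
    for k in range(n_splits):
        test = set(folds[k])
        train = [i for i in range(len(y)) if i not in test]
        yield train, sorted(test)
```

Without stratification, a purely random 10-way split of such a small active class could leave some folds with very few actives, making per-fold recall and precision noisy.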

Accuracy represents the ratio of correctly predicted instances to total instances; on imbalanced data it can be misleading unless resampling has been applied to restore class balance, in which case it remains a viable metric for model comparison [44,66]. Balanced accuracy, the mean of the per-class recall values, gives a truer measure of accuracy on imbalanced data [67]. Precision is the fraction of predicted positives that are truly positive, while recall is the fraction of actual positives that are correctly predicted [68]. The F1 score is the harmonic mean of precision and recall [69]. MCC yields a high score only when the model performs well in all four categories of the confusion matrix [70]. The AUROC summarises a model's ability to distinguish between classes across all classification thresholds [71]. The models were further validated with known (experimentally determined) inhibitors of TLR4. A total of five known TLR4 inhibitors, Resatorvid, M62812, ZINC25778142, (+)-Naltrexone, and (+)-Naloxone, were evaluated with the trained models to further assess their performance [47,48,49,50]. Table 2 lists these inhibitors and their half-maximal inhibitory concentrations (IC₅₀).
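The metric definitions above can be made concrete with the following plain-Python sketch, which computes each metric from confusion-matrix counts and estimates AUROC as the probability that a randomly chosen active is scored above a randomly chosen inactive. The function names are illustrative; in practice, scikit-learn's `balanced_accuracy_score`, `matthews_corrcoef`, and `roc_auc_score` provide these metrics.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # sensitivity, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),  # harmonic mean
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


def auroc(scores_pos, scores_neg):
    """Probability that a random active outranks a random inactive (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, a model with TP = 9, TN = 85, FP = 5, and FN = 1 has an accuracy of 0.94 but a precision of only about 0.64, illustrating why accuracy alone can flatter a model on imbalanced data.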

2.7. Applicability Domain Analysis

The applicability domain of a predictive model refers to the physicochemical, structural, or biological space represented by the training set, within which the model's predictions for new compounds are considered relevant and reliable [72]. Defining it is necessary to establish the boundary within which a given model's predictions can be regarded as reliable [73]. In this study, the applicability domain of the trained models was assessed by plotting standardised descriptor values for each compound in both the training and test sets. The descriptors for each compound were standardised as follows [74]:

$$S_{ki} = \frac{\left| X_{ki} - \bar{X}_{i} \right|}{\sigma_{X_{i}}} \qquad (1)$$

where

$k = 1, 2, \ldots, n$, where $n$ is the number of compounds;
$i = 1, 2, \ldots, m$, where $m$ is the number of molecular descriptors;
$S_{ki}$ = standardised descriptor $i$ for compound $k$ from the training or test set;
$X_{ki}$ = actual value of descriptor $i$ for compound $k$ from the training or test set;
$\bar{X}_{i}$ = mean value of descriptor $X_{i}$, computed from the training compounds only;
$\sigma_{X_{i}}$ = standard deviation of descriptor $X_{i}$, computed from the training compounds only.
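Equation (1) and the three-standard-deviation cutoff mentioned in the Discussion can be sketched as follows. This is a simplified rule assumed for illustration: a compound is flagged as outside the domain if any standardised descriptor exceeds 3, the sample standard deviation is used, and descriptors with zero training variance are ignored. The cited standardisation approach [74] may apply further decision rules that are not reproduced here, and the function name is hypothetical.

```python
import statistics


def applicability_domain(train, queries, cutoff=3.0):
    """Return True for each query compound inside the domain and False for outliers."""
    n_desc = len(train[0])
    columns = [[row[i] for row in train] for i in range(n_desc)]
    means = [statistics.fmean(col) for col in columns]   # training compounds only
    sds = [statistics.stdev(col) for col in columns]     # sample SD (assumption)
    inside = []
    for row in queries:
        # S_ki = |X_ki - mean_i| / sd_i, as in Equation (1).
        s_values = [abs(x - m) / sd for x, m, sd in zip(row, means, sds) if sd > 0]
        inside.append(max(s_values, default=0.0) <= cutoff)
    return inside
```

Passing the training set itself as `queries` gives the training-set outliers, and passing the test set gives the test-set outliers, so both percentages can be obtained from the same function.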

2.8. Web Server Development

An interactive web application was developed utilising the best-performing models with widely adopted tools and libraries. The backend was implemented using Flask [75], while the front end was designed with Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript [76].

2.9. Screening of COVID-19-Induced CS Inhibitors

The deployed models were used to screen inhibitors that clinical trials, clinical observations, in vitro assays, and animal experiments have shown to significantly alleviate the cytokine storm associated with COVID-19. These include inhibitors of Janus kinase (JAK), NOD-like receptor family pyrin domain-containing 3 (NLRP3), and Bruton's tyrosine kinase (BTK), whose inhibition has demonstrated anti-inflammatory effects in COVID-19 cases (Table 3) [77,78,79]. The Simplified Molecular Input Line Entry System (SMILES) string of each compound was obtained from PubChem and used as input to the deployed models [46].

3. Results

3.1. Data Pre-Processing

Following data cleaning, the dataset consisted of 194,888 inactive compounds and 356 active compounds, all with 777 features (descriptors). To facilitate model development and evaluation, the dataset was partitioned into training and test sets. The training set contained 155,917 inactive compounds and 278 active compounds, while the test set consisted of 38,971 inactive compounds and 78 active compounds. Dimensionality reduction was applied, resulting in a final training set of 156,195 rows and 645 columns and a testing set of 39,049 rows and 645 columns.

The three data resampling iterations were then performed to balance the training data (Figure 3); no resampling was performed on the held-out test set. Random undersampling produced a training set of 556 inactive and 278 active compounds (a 2:1 inactive-to-active ratio). The application of SMOTE increased the number of active compounds to 77,958, while the number of inactive compounds remained at 155,917. In the fourth iteration, a subset of 5000 inactive compounds was randomly selected and combined with all 356 active compounds; from this subset, 4000 inactive and 284 active compounds were used for training, while 1000 inactive and 72 active compounds were used for testing.

3.2. Model Development and Evaluation

The performance of the models was assessed using 10-fold cross-validation based on accuracy, balanced accuracy, recall, precision, F1 score, and MCC (Figure 4). The models were then tested on held-out data to further assess their performance on unseen data (Table 4). The AUROC was also calculated for each model from its true positive and false positive rates (Figure 5).

XGBoost models had the highest performances (Table 4), with evaluation metrics above 0.85 except the model trained with data from the random undersampled iteration (Table S1). In contrast, KNN showed the weakest performance with balanced accuracies as low as 0.51, precision scores as low as 0.01, and F1 scores as low as 0.02 (Table S2). Among the four different training and testing iterations, random undersampling iterations produced the poorest results with precisions as low as 0.03 for random forest (Table S3), 0.05 for decision tree (Table S4), and 0.01 for KNN. Performance of the AdaBoost models was comparable to the XGBoost models (Table S5).

3.3. Validation with Known Inhibitors of TLR4

The eight top-performing models were further validated using experimentally determined inhibitors of TLR4. Each model predicted at least one inhibitor as active, except the AdaBoost model trained on the non-resampled data. The XGBoost model trained on the random 5000 resampled data predicted all five known inhibitors as active, with prediction probabilities ranging from 0.83 to 0.997 (Table 5). Owing to its performance across metrics and its validation with known inhibitors, this XGBoost model was selected for deployment as a web server.

3.4. Results of Applicability Domain Analysis

According to the descriptor standardisation approach, 1.16% of the training data were classified as outliers while 0.09% of the external test data fell outside the model’s domain of applicability (Figure 6). These outliers represent compounds with molecular descriptors significantly deviating from the majority of the data, indicating that their predictions may be less reliable. The low percentage of test set outliers suggests that the model’s applicability domain is well-defined and capable of handling most of the new compounds with confidence.

3.5. Model Deployment

The models have been made accessible via a web server application named AICpred, which can be freely accessed at http://197.255.126.13:8080 (accessed on 24 December 2024) (Figure 7). Users can utilise this web server platform to predict potential inhibitors of TLR4 by submitting a compound in SMILES format or a text file containing the compound ID and its corresponding SMILES.
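The text-file input can be illustrated with a small parser. The exact upload format is not specified here, so this sketch assumes one compound per line, with the compound ID followed by its SMILES string and separated by whitespace (SMILES strings contain no spaces), and skips blank lines. The function name and error handling are hypothetical, and the example molecules are arbitrary placeholders rather than screened compounds.

```python
def parse_compound_file(text):
    """Return (compound_id, smiles) pairs from 'ID SMILES' lines, skipping blank lines."""
    compounds = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank line
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 'ID SMILES', got {line.strip()!r}")
        compound_id, smiles = fields
        compounds.append((compound_id, smiles))
    return compounds
```

For example, the text `"cpd1 CCO\ncpd2\tc1ccccc1\n"` (ethanol and benzene) yields two (ID, SMILES) pairs, each of which would then be converted to Mold2 descriptors before prediction.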

3.6. Evaluating COVID-19-Induced CS Inhibitors

The AICpred web server was employed to screen known inhibitors of JAK, NLRP3, and BTK. With the exception of Nezulcitinib, all inhibitors were within the domain of applicability of both models and were predicted as active against TLR4, with prediction probabilities ranging from 0.992 to 0.996 (Table 6).

4. Discussion

The role of the cytokine storm in disrupting immune balance, and its clinical implications in sepsis management and multiorgan failure, are well established [85]. To address this challenge, we developed AICpred, a machine learning-based web application designed to facilitate the identification of potential compounds targeting TLR4-MyD88 binding. Traditional ML algorithms have been reported to perform better on the majority class than on the minority class [86]. To mitigate class imbalance in our dataset, resampling techniques were employed to optimise model performance [87]. Furthermore, dimensionality reduction was applied to eliminate redundant and irrelevant features, which can improve computational efficiency and learning accuracy and provide deeper insight into the models [88].

Among the algorithms, the boosting algorithms (XGBoost and AdaBoost) and tree-based algorithms (decision trees and random forests) produced the best-performing models. All XGBoost models, except the model trained on the randomly undersampled data, achieved values above 0.85 for all evaluation metrics. Boosting algorithms are ensemble learning techniques that enhance the performance of individual base learners by fusing them into a composite whole [89], which makes them fast and able to mitigate overfitting and handle missing data [90]. Notably, XGBoost achieved excellent generalisability and accuracy even on imbalanced data, with accuracies reaching 0.99 and balanced accuracies above 0.75, outperforming the other models in robustness and prediction accuracy [91,92,93].

In this study, the resampling technique that produced one of the best-performing models for each algorithm (except KNN) was the random selection of 5000 inactive compounds versus the original number of actives (356). In contrast, random undersampling performed the weakest among all iterations, likely due to the loss of important information from the majority class [94]. SMOTE, a popular technique for handling class imbalance in many studies, was applied and performed well with XGBoost models, yielding an accuracy and balanced accuracy both above 0.90 [44,95,96,97].

The performance of our models was compared with that of previous studies. For example, ML models have been trained on 1565 inhibitors and 1671 non-inhibitors to predict inhibitors of the signal transducer and activator of transcription 3 (STAT3) to alleviate the cytokine storm [98]. That study used 2D descriptors, and its best random forest model yielded accuracies of 0.763 and 0.763. In contrast, the random forest models in our study achieved training and validation accuracies of 0.971 and 0.968, respectively. In addition, the XGBoost models in this study outperformed those reported in that study by over 25% [98]. In another study, a random forest model was developed to predict plant-derived compounds for anti-COVID-19 therapy [99]. Although the two models were comparable in accuracy, that random forest model outperformed the model presented in this study in recall (46% higher) and F1 score (29% higher).

To further ensure the reliability of our models, we analysed the applicability domain, which defines the range within which model predictions can be considered reliable [73]. Notably, the best models from each algorithm were trained on the data generated by randomly selecting 5000 inactive compounds in addition to the original active compounds. The applicability domain contains 99.91% of the external test set; that is, these test compounds lie within three standard deviations of the training-data mean. This structural similarity between the training and test data likely contributed to the models' strong performance and reliability [74,100].

The best-performing models for each algorithm were further validated with experimentally determined inhibitors of TLR4 (Table 5). These inhibitors have been shown in vivo or in vitro to inhibit the lipopolysaccharide (LPS)-induced production of pro-inflammatory cytokines. LPS, a major component of the outer membrane of Gram-negative bacteria, is a pathogen-associated molecular pattern (PAMP) that stimulates TLR4, leading to the production of pro-inflammatory cytokines such as tumour necrosis factor α (TNF-α) and interleukin 6 (IL-6) [101]. The XGBoost model trained on the random 5000 resampled data successfully predicted all five experimentally determined inhibitors as active, with prediction probabilities ranging from 0.83 to 0.997.

The deployed models were further used to predict the activity of inhibitors targeting JAK, NLRP3, and BTK (Table 6). Inhibition of these targets has been shown to have anti-inflammatory effects on the COVID-19-induced cytokine storm [77,78,79]. Interestingly, TLR4 has been reported to trigger activation of the NF-κB pathway, and other studies indicate that BTK and its upstream activator, haematopoietic cell kinase (HCK), are involved in toll-like receptor signalling [9,82]. The XGBoost model predicted all these inhibitors as active, with prediction probabilities ranging from 0.992 to 0.996. These findings suggest that the developed models could help prioritise anti-inflammatory compounds for studies aimed at alleviating the COVID-19-induced cytokine storm. However, experimental and clinical validation is required to gain deeper insight into the biology of hyperinflammation and the immune response.

The study focused on tree-based, ensemble, and non-parametric supervised methods, namely random forest, decision tree, AdaBoost, XGBoost, and KNN, because of their established performance in QSAR modelling. Future work will incorporate deep learning methods, which have been reported to outperform traditional ML techniques in applications such as drug-induced liver injury prediction and assessing the effects of endocrine-disrupting chemicals on human health [102,103]. Deep learning QSAR methods have achieved high accuracy (>0.90) in qualitative predictions and strong quantitative performance on large datasets (coefficient of determination (R²) = 0.80; predictive squared correlation coefficient (Q²) = 0.86) [102]. These approaches also show promise for predicting drug-target interactions and identifying potential inhibitors, using architectures such as hybrid convolutional and recurrent neural networks (CNN-RNN) and graph representation learning [104,105]. Beyond deep learning, ensemble methods that combine different models to improve prediction accuracy offer another plausible direction for future studies [106].
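For reference, R² and one widely used external-validation form of Q² (often written Q²_F1) are shown below. In these equations, y are observed values, ŷ are predicted values, and ȳ is a mean. Several Q² variants exist, and this section does not state which one [102] used. The Q²_F1 form is therefore shown as an illustrative assumption.

```latex
R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2},
\qquad
Q^2_{F1} = 1 - \frac{\sum_{j \in \mathrm{test}}\left(y_j - \hat{y}_j\right)^2}{\sum_{j \in \mathrm{test}}\left(y_j - \bar{y}_{\mathrm{train}}\right)^2}
```

Both quantities equal 1 for perfect predictions. Q²_F1 differs from R² in measuring test-set errors against the training-set mean, which makes it a check of external predictivity.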

By integrating the best-performing model into the AICpred web application, we enable researchers to use it for drug repurposing or for screening large compound libraries to identify novel therapeutic agents against diseases in which TLR4 plays a major mechanistic role. However, this study is limited by the highly imbalanced datasets used for training. This limitation highlights the need for more high-throughput screening studies to identify TLR4 inhibitors, and future studies may benefit from larger and more balanced datasets. Additionally, AICpred's computational predictions must be corroborated by in vitro and in vivo studies to gain insight into the compounds' mechanisms and their effects on mitigating hyperinflammation. The study did not incorporate information on mechanism of action (MOA) or binding modes and relied instead on the available bioactivity datasets [107]. Such information would have been valuable for post-computational interpretation [107,108]. Future studies may benefit from incorporating MOA and binding-mode information into the training and testing datasets to improve predictive performance.

5. Conclusions

The binding of TLR4 to key signalling partners in immune pathways, notably the adaptor protein MyD88, triggers inflammation-associated mechanisms. In this study, we developed and deployed an XGBoost model as a web-based application for the discovery of potential anti-inflammatory compounds as plausible inhibitors of the TLR4-MyD88 signalling pathway. ML models based on random forest, adaptive boosting (AdaBoost), eXtreme gradient boosting (XGBoost), k-nearest neighbours, and decision trees were trained and evaluated using a 10-fold cross-validation strategy. Among these, XGBoost performed best, achieving accuracy, balanced accuracy, precision, and recall of 0.994, 0.958, 1.000, and 0.917, respectively. The XGBoost predictions were further validated on experimentally determined TLR4 inhibitors and on compounds associated with the inflammation-mediated cytokine storm (CS). Baricitinib, Ibrutinib, Nezulcitinib, MCC950, and Acalabrutinib were predicted as inhibitors of TLR4-MyD88 binding with probabilities above 0.90. Similarly, TAK-242 (Resatorvid) and the benzisothiazole derivative M62812, both known TLR4 inhibitors, were predicted as active and fell within the applicability domain, with prediction probabilities above 0.80. The XGBoost model has been integrated into a web application, AICpred, which is accessible at http://197.255.126.13:8080 (accessed on 24 December 2024). The platform enables the screening of large compound libraries to help identify potential inhibitors of TLR4-MyD88 binding as candidate anti-inflammatory molecules. Targeting TLR4-MyD88 binding is a plausible starting point for designing therapeutic agents against CS, which is implicated in COVID-19 pathophysiology.
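The metrics reported above can be computed from confusion-matrix counts (TP, TN, FP, and FN, as defined in the list of abbreviations). The function below is an illustrative sketch and not the study's evaluation code. Returning 0.0 when a denominator is zero is an assumption about how degenerate cases are handled.

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Binary classification metrics from confusion-matrix counts (illustrative sketch)."""

    def ratio(num: float, den: float) -> float:
        # Guard against empty classes; 0.0 is an assumed convention.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # sensitivity, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "balanced_accuracy": (recall + specificity) / 2,  # mean of per-class recall
        "precision": ratio(tp, tp + fp),
        "recall": recall,
        "specificity": specificity,
        "mcc": ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation coefficient
    }
```

As a consistency check on the reported figures, a precision of 1.000 indicates no false positives on the external test set. Specificity is therefore 1.000, and balanced accuracy is (0.917 + 1.000)/2, about 0.958, which matches the reported value.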

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/info16010034/s1, Table S1: Performance of the eXtreme Gradient Boosting (XGBoost) classifier during a 10-fold cross-validation and on the external test data. The data shows results for all iterations of data resampling techniques undertaken in the study; Table S2: Performance of the K-Nearest Neighbors (KNN) classifier during a 10-fold cross-validation and on the external test data. The data shows results for all iterations of data resampling techniques undertaken in the study; Table S3: Performance of the random forest classifier during a 10-fold cross-validation and on the external test data. The data shows results for all iterations of data resampling techniques undertaken in the study; Table S4: Performance of the decision tree classifier during a 10-fold cross-validation and on the external test data. The data shows results for all iterations of data resampling techniques undertaken in the study; Table S5: Performance of the Adaptive Boosting (AdaBoost) classifier during a 10-fold cross-validation and on the external test data. The data shows results for all iterations of data resampling techniques undertaken in the study.

Author Contributions

Conceptualization, S.K.K., M.D.W., L.N.F.-N., C.A., U.S.N. and S.B.B.; methodology, S.K.K., L.N.F.-N., C.A., S.B.B., U.S.N. and I.N.D.; validation, L.N.F.-N., O.A., K.A.-M. and C.A.; formal analysis, L.N.F.-N., W.A.M.III and C.A.; writing (original draft preparation), L.N.F.-N.; writing (review and editing), L.N.F.-N., C.A., O.A., C.F.H., W.A.M.III and S.K.K.; supervision, S.K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Associated code is available at github.com/cyrilakafia/aicpred-code.git (accessed on 24 December 2024), and the web server is available at http://197.255.126.13:8080 (accessed on 24 December 2024).

Acknowledgments

This manuscript is dedicated to Michael David Wilson, a professor of parasitology who passed away during the revision of the manuscript. Wilson was the former deputy director of the Noguchi Memorial Institute for Medical Research, University of Ghana. He dedicated his life to mentoring young scientists in the subregion. We also acknowledge Kwabena Afari Fuachie and Donald Kwame Asiedu of MinoHealth AI Labs for technical support in developing the web server.

Conflicts of Interest

The authors declare no conflicts of interest.

List of Abbreviations

Abbreviation	Definition
AdaBoost	Adaptive boosting
AUC	Area under the curve
AUROC	Area under the receiver operating characteristic curve
BTK	Bruton’s tyrosine kinase
COVID-19	Coronavirus disease 2019
CS	Cytokine storm
DAMPs	Damage-associated molecular patterns
FN	False negative
FP	False positive
IC₅₀	Half-maximal inhibitory concentration
JAK	Janus kinase
KNN	k-nearest neighbours
MCC	Matthews correlation coefficient
ML	Machine learning
NF-κB	Nuclear factor kappa B
NLRP3	NOD-like receptor family pyrin domain-containing 3
PAMPs	Pathogen-associated molecular patterns
QSAR	Quantitative structure-activity relationship
SARS-CoV-2	Severe acute respiratory syndrome coronavirus 2
SMILES	Simplified Molecular Input Line Entry System
SMOTE	Synthetic minority oversampling technique
TLR4	Toll-like receptor 4
TN	True negative
TP	True positive
XGBoost	eXtreme gradient boosting

References

1. Molteni, M.; Gemma, S.; Rossetti, C. The Role of Toll-Like Receptor 4 in Infectious and Noninfectious Inflammation. Mediat. Inflamm. 2016, 2016, 6978936.
2. Coondoo, A. Cytokines in Dermatology—A Basic Overview. Indian J. Dermatol. 2011, 56, 368–374.
3. Wilson, M.S.; Metink-Kane, M.M. Cytokines, Inflammation and Pain. Bone 2012, 23, 1–7.
4. Tang, L.; Yin, Z.; Hu, Y.; Mei, H. Controlling Cytokine Storm Is Vital in COVID-19. Front. Immunol. 2020, 11, 570993.
5. Vardhana, S.A.; Wolchok, J.D. The Many Faces of the Anti-COVID Immune Response. J. Exp. Med. 2020, 217, e20200678.
6. Chousterman, B.G.; Swirski, F.K.; Weber, G.F. Cytokine Storm and Sepsis Disease Pathogenesis. Semin. Immunopathol. 2017, 39, 517–528.
7. Takeuchi, O.; Akira, S. MyD88 as a Bottle Neck in Toll/IL-1 Signaling. In Toll-like Receptor Family Members and Their Ligands; Beutler, B., Wagner, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; pp. 155–164. ISBN 978-3-642-59430-4.
8. Ohnishi, H.; Tochio, H.; Kato, Z.; Orii, K.E.; Li, A.; Kimura, T.; Hiroaki, H.; Kondo, N.; Shirakawa, M. Structural Basis for the Multiple Interactions of the MyD88 TIR Domain in TLR4 Signaling. Proc. Natl. Acad. Sci. USA 2009, 106, 10260–10265.
9. Liu, T.; Zhang, L.; Joo, D.; Sun, S.C. NF-κB Signaling in Inflammation. Signal Transduct. Target. Ther. 2017, 2, 17023.
10. Cicchese, J.M.; Evans, S.; Hult, C.; Joslyn, L.R.; Wessler, T.; Millar, J.A.; Marino, S.; Cilfone, N.A.; Mattila, J.T.; Linderman, J.J.; et al. Dynamic Balance of Pro- and Anti-Inflammatory Signals Controls Disease and Limits Pathology. Immunol. Rev. 2018, 285, 147–167.
11. Newton, K.; Dixit, V.M. Signaling in Innate Immunity and Inflammation. Cold Spring Harb. Perspect. Biol. 2012, 4, a006049.
12. Pal, R.; Chaudhary, M.J. Pharmacotherapeutics for Cytokine Storm in COVID-19. In Stem Cells; Verma, Y.K., Satija, N.K., Raghav, P.K., Tyagi, N., Kumar, S., Eds.; Academic Press: Cambridge, MA, USA, 2024; pp. 101–125. ISBN 978-0-323-95545-4.
13. Avbelj, M.; Horvat, S.; Jerala, R. The Role of Intermediary Domain of MyD88 in Cell Activation and Therapeutic Inhibition of TLRs. J. Immunol. 2011, 187, 2394–2404.
14. Zhang, Y.; Liang, X.; Bao, X.; Xiao, W.; Chen, G. Toll-like Receptor 4 (TLR4) Inhibitors: Current Research and Prospective. Eur. J. Med. Chem. 2022, 235, 114291.
15. Khan, M.F.; Rashid, R.B.; Rashid, M.A. Identification of Natural Compounds with Analgesic and Antiinflammatory Properties Using Machine Learning and Molecular Docking Studies. Lett. Drug Des. Discov. 2021, 19, 256–262.
16. Noordhuis, M.G.; Eijsink, J.J.H.; Roossink, F.; de Graeff, P.; Pras, E.; Schuuring, E.; Wisman, G.B.A.; de Bock, G.H.; van der Zee, A.G.J. Prognostic Cell Biological Markers in Cervical Cancer Patients Primarily Treated With (Chemo)Radiation: A Systematic Review. Int. J. Radiat. Oncol.*Biol.*Phys. 2011, 79, 325–334.
17. Alotaibi, F.; Attique, M.; Khan, Y.D. AntiFlamPred: An Anti-Inflammatory Peptide Predictor for Drug Selection Strategies. Comput. Mater. Contin. 2021, 69, 1039–1055.
18. Xu, Y.; Zhang, S.; Zhu, F.; Liang, Y. A Deep Learning Model for Anti-Inflammatory Peptides Identification Based on Deep Variational Autoencoder and Contrastive Learning. Sci. Rep. 2024, 14, 18451.
19. Zhu, Z.; Rahman, Z.; Aamir, M.; Shah, S.Z.A.; Hamid, S.; Bilawal, A.; Li, S.; Ishfaq, M. Insight into TLR4 Receptor Inhibitory Activity via QSAR for the Treatment of Mycoplasma Pneumonia Disease. RSC Adv. 2023, 13, 2057–2069.
20. Zdrazil, B.; Felix, E.; Hunter, F.; Manners, E.J.; Blackshaw, J.; Corbett, S.; de Veij, M.; Ioannidis, H.; Lopez, D.M.; Mosquera, J.F.; et al. The ChEMBL Database in 2023: A Drug Discovery Platform Spanning Multiple Bioactivity Data Types and Time Periods. Nucleic Acids Res. 2024, 52, D1180–D1192.
21. Kaushik, D.; Bhandari, R.; Kuhad, A. TLR4 as a Therapeutic Target for Respiratory and Neurological Complications of SARS-CoV-2. Expert Opin. Ther. Targets 2021, 25, 491–508.
22. Aboudounya, M.M.; Holt, M.R.; Heads, R.J. SARS-CoV-2 Spike S1 Glycoprotein Is a TLR4 Agonist, Upregulates ACE2 Expression and Induces pro-Inflammatory M1 Macrophage Polarisation. bioRxiv 2021.
23. Shirato, K.; Kizaki, T. SARS-CoV-2 Spike Protein S1 Subunit Induces pro-Inflammatory Responses via Toll-like Receptor 4 Signaling in Murine and Human Macrophages. Heliyon 2021, 7, e06187.
24. Zhao, Y.; Kuang, M.; Li, J.; Zhu, L.; Jia, Z.; Guo, X.; Hu, Y.; Kong, J.; Yin, H.; Wang, X.; et al. SARS-CoV-2 Spike Protein Interacts with and Activates TLR4. Cell Res. 2021, 31, 818–820.
25. Ma, Z.; Li, X.; Fan, R.L.Y.; Yang, K.Y.; Ng, C.S.H.; Lau, R.W.H.; Wong, R.H.L.; Ng, K.K.; Wang, C.C.; Ye, P.; et al. A Human Pluripotent Stem Cell-Based Model of SARS-CoV-2 Infection Reveals an ACE2-Independent Inflammatory Activation of Vascular Endothelial Cells through TLR4. Stem Cell Rep. 2022, 17, 538–555.
26. Davronova, R.; Adilovab, F. A Comparative Analysis of the Ensemble Methods for Drug Design. AIP Conf. Proc. 2020, 2365, 030001.
27. Rosas-Jimenez, J.G.; Garcia-Revilla, M.A.; Madariaga-Mazon, A.; Martinez-Mayorga, K. Predictive Global Models of Cruzain Inhibitors with Large Chemical Coverage. ACS Omega 2021, 6, 6722–6735.
28. Song, Y.Y.; Lu, Y. Decision Tree Methods: Applications for Classification and Prediction. Shanghai Arch. Psychiatry 2015, 27, 130.
29. Reddy, A.S.; Kumar, S.; Garg, R. Hybrid-Genetic Algorithm Based Descriptor Optimization and QSAR Models for Predicting the Biological Activity of Tipranavir Analogs for HIV Protease Inhibition. J. Mol. Graph. Model. 2010, 28, 852–862.
30. Yu, B.; Qiu, W.; Chen, C.; Ma, A.; Jiang, J.; Zhou, H.; Ma, Q. SubMito-XGBoost: Predicting Protein Submitochondrial Localization by Fusing Multiple Feature Information and eXtreme Gradient Boosting. Bioinformatics 2020, 36, 1074–1081.
31. Ding, Y.; Zhu, H.; Chen, R.; Li, R. An Efficient AdaBoost Algorithm with the Multiple Thresholds Classification. Appl. Sci. 2022, 12, 5872.
32. Noviandy, T.R.; Idroes, G.M.; Hardi, I. Machine Learning Approach to Predict AXL Kinase Inhibitor Activity for Cancer Drug Discovery Using XGBoost and Bayesian Optimization. J. Soft Comput. Data Min. 2024, 5, 46–56.
33. Robles, J.; Sotelo, F.; Rojas, C.; Hurtado, J.; Lopez, J. Performance Analysis of XGBoost Models with Ultrafast Shape Recognition Descriptors in Ligand-Based Virtual Screening. ACM Int. Conf. Proc. Ser. 2021, 7, 8–14.
34. Wu, Z.; Zhu, M.; Kang, Y.; Leung, E.L.H.; Lei, T.; Shen, C.; Jiang, D.; Wang, Z.; Cao, D.; Hou, T. Do We Need Different Machine Learning Algorithms for QSAR Modeling? A Comprehensive Assessment of 16 Machine Learning Algorithms on 14 QSAR Data Sets. Brief. Bioinform. 2021, 22, bbaa321.
35. Liu, J.-J.; Liu, J.-C. Permeability Predictions for Tight Sandstone Reservoir Using Explainable Machine Learning and Particle Swarm Optimization. Geofluids 2022, 2022, 2263329.
36. Bharti, D.R.; Lynn, A.M. QSAR Based Predictive Modeling for Anti-Malarial Molecules. Bioinformation 2017, 13, 154.
37. Darnag, R.; Minaoui, B.; Fakir, M. QSAR Models for Prediction Study of HIV Protease Inhibitors Using Support Vector Machines, Neural Networks and Multiple Linear Regression. Arab. J. Chem. 2017, 10, S600–S608.
38. Nalepa, J.; Kawulok, M. Selecting Training Sets for Support Vector Machines: A Review. Artif. Intell. Rev. 2018, 52, 857–900.
39. Liu, C.; Wang, W.; Wang, M.; Lv, F.; Konan, M. An Efficient Instance Selection Algorithm to Reconstruct Training Set for Support Vector Machine. Knowl. Based Syst. 2017, 116, 58–73.
40. Cervantes, J.; García Lamont, F.; López-Chau, A.; Rodríguez Mazahua, L.; Sergio Ruíz, J. Data Selection Based on Decision Tree for SVM Classification on Large Data Sets. Appl. Soft Comput. 2015, 37, 787–798.
41. Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why Do Tree-Based Models Still Outperform Deep Learning on Typical Tabular Data? Adv. Neural Inf. Process. Syst. 2022, 35, 507–520.
42. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
43. Genuer, R.; Poggi, J.M.; Tuleau-Malot, C.; Villa-Vialaneix, N. Random Forests for Big Data. Big Data Res. 2017, 9, 28–46.
44. Adams, J.; Agyenkwa-Mawuli, K.; Agyapong, O.; Wilson, M.D.; Kwofie, S.K. EBOLApred: A Machine Learning-Based Web Application for Predicting Cell Entry Inhibitors of the Ebola Virus. Comput. Biol. Chem. 2022, 101, 107766.
45. Wu, X.; Gong, J.; Ren, S.; Tan, F.; Wang, Y.; Zhao, H. A Machine Learning-Based QSAR Model Reveals Important Molecular Features for Understanding the Potential Inhibition Mechanism of Ionic Liquids to Acetylcholinesterase. Sci. Total Environ. 2024, 915, 169974.
46. Wang, Y.; Xiao, J.; Suzek, T.O.; Zhang, J.; Wang, J.; Bryant, S.H. PubChem: A Public Information System for Analyzing Bioactivities of Small Molecules. Nucleic Acids Res. 2009, 37, W623–W633.
47. Ii, M.; Matsunaga, N.; Hazeki, K.; Nakamura, K.; Takashima, K.; Seya, T.; Hazeki, O.; Kitazaki, T.; Iizawa, Y. A Novel Cyclohexene Derivative, Ethyl (6R)-6-[N-(2-Chloro-4-Fluorophenyl)Sulfamoyl]Cyclohex-1-Ene-1-Carboxylate (TAK-242), Selectively Inhibits Toll-Like Receptor 4-Mediated Cytokine Production through Suppression of Intracellular Signaling. Mol. Pharmacol. 2006, 69, 1288–1295.
48. Nakamura, M.; Shimizu, Y.; Sato, Y.; Miyazaki, Y.; Satoh, T.; Mizuno, M.; Kato, Y.; Hosaka, Y.; Furusako, S. Toll-like Receptor 4 Signal Transduction Inhibitor, M62812, Suppresses Endothelial Cell and Leukocyte Activation and Prevents Lethal Septic Shock in Mice. Eur. J. Pharmacol. 2007, 569, 237–243.
49. Švajger, U.; Brus, B.; Turk, S.; Sova, M.; Hodnik, V.; Anderluh, G.; Gobec, S. Novel Toll-like Receptor 4 (TLR4) Antagonists Identified by Structure- and Ligand-Based Virtual Screening. Eur. J. Med. Chem. 2013, 70, 393–399.
50. Wang, X.; Zhang, Y.; Peng, Y.; Hutchinson, M.R.; Rice, K.C.; Yin, H.; Watkins, L.R. Pharmacological Characterization of the Opioid Inactive Isomers (+)-Naltrexone and (+)-Naloxone as Antagonists of Toll-like Receptor 4. Br. J. Pharmacol. 2016, 173, 856–869.
51. Lee, H.K.; Brown, S.J.; Rosen, H.; Tobias, P.S. Application of β-Lactamase Enzyme Complementation to the High-Throughput Screening of Toll-like Receptor Signaling Inhibitors. Mol. Pharmacol. 2007, 72, 868–875.
52. Lee, H.-K.; Dunzendorfer, S.; Tobias, P.S. Cytoplasmic Domain-Mediated Dimerizations of Toll-like Receptor 4 Observed by β-Lactamase Enzyme Fragment Complementation. J. Biol. Chem. 2004, 279, 10564–10574.
53. Hong, H.; Xie, Q.; Ge, W.; Qian, F.; Fang, H.; Shi, L.; Su, Z.; Perkins, R.; Tong, W. Mold2, Molecular Descriptors from 2D Structures for Chemoinformatics and Toxicoinformatics. J. Chem. Inf. Model. 2008, 48, 1337–1344.
54. Kwofie, S.K.; Agyenkwa-Mawuli, K.; Adams, J.; Anteh, P.; Agyapong, O.; Wilson, M.D. Deep Neural Networks Predict Inhibitors of Schistosoma Mansoni Thioredoxin Glutathione Reductase (SmTGR). J. Comput. Biophys. Chem. 2022, 21, 237–247.
55. Bajusz, D.; Rácz, A.; Héberger, K. Chemical Data Formats, Fingerprints, and Other Molecular Descriptions for Database Analysis and Searching. Compr. Med. Chem. III 2017, 3–8, 329–378.
56. Mold2 News and Publications | FDA. Available online: https://www.fda.gov/science-research/mold2/mold2-news-and-publications (accessed on 7 September 2024).
57. Fan, C.; Chen, M.; Wang, X.; Wang, J.; Huang, B. A Review on Data Preprocessing Techniques Toward Efficient and Reliable Knowledge Discovery from Building Operational Data. Front. Energy Res. 2021, 9, 652801.
58. Dy, J.G.; Brodley, C.E. Feature Selection for Unsupervised Learning. J. Mach. Learn. Res. 2004, 5, 845–889.
59. Korkmaz, S. Deep Learning-Based Imbalanced Data Classification for Drug Discovery. J. Chem. Inf. Model. 2020, 60, 4180–4190.
60. Blagus, R.; Lusa, L. SMOTE for High-Dimensional Class-Imbalanced Data. BMC Bioinform. 2013, 14, 106.
61. Dubey, R.; Zhou, J.; Wang, Y.; Thompson, P.M.; Ye, J. Analysis of Sampling Techniques for Imbalanced Data: An N = 648 ADNI Study. Neuroimage 2014, 87, 220–241.
62. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
63. Schapire, R.E. The Boosting Approach to Machine Learning: An Overview. In Nonlinear Estimation and Classification; Springer: New York, NY, USA, 2003.
64. Zhang, Z. Introduction to Machine Learning: K-Nearest Neighbors. Ann. Transl. Med. 2016, 4, 218.
65. Steurer, M.; Hill, R.J.; Pfeifer, N. Metrics for Evaluating the Performance of Machine Learning Based Automated Valuation Models. J. Prop. Res. 2021, 38, 99–129.
66. Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from Class-Imbalanced Data: Review of Methods and Applications. Expert Syst. Appl. 2017, 73, 220–239.
67. García, V.; Mollineda, R.A.; Sánchez, J.S. Index of Balanced Accuracy: A Performance Measure for Skewed Class Distributions. Lecture Notes in Computer Science. In Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis, Póvoa de Varzim, Portugal, 10–12 June 2009; Volume 5524, pp. 441–448.
68. Binkhonain, M.; Zhao, L. A Review of Machine Learning Algorithms for Identification and Classification of Non-Functional Requirements. Expert Syst. Appl. X 2019, 1, 100001.
69. Lipton, Z.C.; Elkan, C.; Narayanaswamy, B. Thresholding Classifiers to Maximize F1 Score. arXiv 2014, arXiv:1402.1892.
70. Chicco, D.; Jurman, G. The Advantages of the Matthews Correlation Coefficient (MCC) over F1 Score and Accuracy in Binary Classification Evaluation. BMC Genom. 2020, 21, 6.
71. Sousa, L.; Suter, F.; Goldman, A.; Sakellariou, R.; Sinnen, O. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Introduction. Lecture Notes in Computer Science. In Proceedings of the European Conference on Parallel Processing, Bordeaux, France, 29 August–2 September 2011; Volume 6852, p. 154.
72. Kar, S.; Roy, K.; Leszczynski, J. Applicability Domain: A Step toward Confident Predictions and Decidability for QSAR Modeling. In Methods in Molecular Biology; Humana Press Inc.: Totowa, NJ, USA, 2018; Volume 1800, pp. 141–169.
73. Hanser, T.; Barber, C.; Marchaland, J.F.; Werner, S. Applicability Domain: Towards a More Formal Definition. SAR QSAR Environ. Res. 2016, 27, 893–909.
74. Roy, K.; Kar, S.; Ambure, P. On a Simple Approach for Determining Applicability Domain of QSAR Models. Chemom. Intell. Lab. Syst. 2015, 145, 22–29.
75. Relan, K. Deploying Flask Applications. In Building REST APIs with Flask: Create Python Web Services with MySQL; Relan, K., Ed.; Apress: Berkeley, CA, USA, 2019; pp. 159–182. ISBN 978-1-4842-5022-8.
76. Gasston, P. The Modern Web: Multi-Device Web Development with HTML5, CSS3, and JavaScript; No Starch Press: San Francisco, CA, USA, 2013.
77. Peterson, D.; Damsky, W.; King, B. The Use of Janus Kinase Inhibitors in the Time of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). J. Am. Acad. Dermatol. 2020, 82, e223.
78. Zhang, W.; Qin, C.; Fei, Y.; Shen, M.; Zhou, Y.; Zhang, Y.; Zeng, X.; Zhang, S. Anti-Inflammatory and Immune Therapy in Severe Coronavirus Disease 2019 (COVID-19) Patients: An Update. Clin. Immunol. 2022, 239, 109022.
79. Rodrigues, T.S.; de Sá, K.S.G.; Ishimoto, A.Y.; Becerra, A.; Oliveira, S.; Almeida, L.; Gonçalves, A.V.; Perucello, D.B.; Andrade, W.A.; Castro, R.; et al. Inflammasomes Are Activated in Response to SARS-CoV-2 Infection and Are Associated with COVID-19 Severity in Patients. J. Exp. Med. 2021, 218, e20201707.
80. Kalil, A.C.; Patterson, T.F.; Mehta, A.K.; Tomashek, K.M.; Wolfe, C.R.; Ghazaryan, V.; Marconi, V.C.; Ruiz-Palacios, G.M.; Hsieh, L.; Kline, S.; et al. Baricitinib plus Remdesivir for Hospitalized Adults with COVID-19. N. Engl. J. Med. 2021, 384, 795–807.
81. Singh, D.; Bogus, M.; Moskalenko, V.; Lord, R.; Moran, E.J.; Crater, G.D.; Bourdet, D.L.; Pfeifer, N.D.; Woo, J.; Kaufman, E.; et al. A Phase 2 Multiple Ascending Dose Study of the Inhaled Pan-JAK Inhibitor Nezulcitinib (TD-0903) in Severe COVID-19. Eur. Respir. J. 2021, 58, 2100673.
82. Treon, S.P.; Castillo, J.J.; Skarbnik, A.P.; Soumerai, J.D.; Ghobrial, I.M.; Guerrera, M.L.; Meid, K.; Yang, G. The BTK Inhibitor Ibrutinib May Protect against Pulmonary Injury in COVID-19–Infected Patients. Blood 2020, 135, 1912–1915.
83. Roschewski, M.; Lionakis, M.S.; Sharman, J.P.; Roswarski, J.; Goy, A.; Monticelli, M.A.; Roshon, M.; Wrzesinski, S.H.; Desai, J.V.; Zarakas, M.A.; et al. Inhibition of Bruton Tyrosine Kinase in Patients with Severe COVID-19. Sci. Immunol. 2020, 5, 110.
84. Zeng, J.; Xie, X.; Feng, X.L.; Xu, L.; Han, J.B.; Yu, D.; Zou, Q.C.; Liu, Q.; Li, X.; Ma, G.; et al. Specific Inhibition of the NLRP3 Inflammasome Suppresses Immune Overactivation and Alleviates COVID-19 like Pathology in Mice. EBioMedicine 2022, 75, 103803.
85. Reddy, H.; Javvaji, C.K.; Malali, S.; Kumar, S.; Acharya, S.; Toshniwal, S. Navigating the Cytokine Storm: A Comprehensive Review of Chemokines and Cytokines in Sepsis. Cureus 2024, 16, e54275.
86. Ganganwar, V. An Overview of Classification Algorithms for Imbalanced Datasets. Int. J. Emerg. Technol. Adv. Eng. 2012, 2, 42–47.
87. Ghorbani, R.; Ghousi, R. Comparing Different Resampling Methods in Predicting Students’ Performance Using Machine Learning Techniques. IEEE Access 2020, 8, 67899–67911.
88. Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature Selection in Machine Learning: A New Perspective. Neurocomputing 2018, 300, 70–79.
89. Tanha, J.; Abdi, Y.; Samadi, N.; Razzaghi, N.; Asadpour, M. Boosting Methods for Multi-Class Imbalanced Data Classification: An Experimental Review. J. Big Data 2020, 7, 70.
90. Sharma, G. Pros and Cons of Different Sampling Techniques. Int. J. Appl. Res. 2017, 3, 749–752.
91. Rout, N.; Mishra, D.; Mallick, M.K. An Advance Extended Binomial GLMBoost Ensemble Method with Synthetic Minority Over-Sampling Technique for Handling Imbalanced Datasets. Int. J. Electr. Comput. Eng. 2023, 13, 4357–4368.
92. Wang, S.; Yao, X. Diversity Analysis on Imbalanced Data Sets by Using Ensemble Models. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009—Proceedings, Nashville, TN, USA, 30 March–2 April 2009; pp. 324–331.
93. Srisongkram, T.; Khamtang, P.; Weerapreeyakul, N. Prediction of KRASG12C Inhibitors Using Conjoint Fingerprint and Machine Learning-Based QSAR Models. J. Mol. Graph. Model. 2023, 122, 108466.
94. Idakwo, G.; Thangapandian, S.; Luttrell, J.; Li, Y.; Wang, N.; Zhou, Z.; Hong, H.; Yang, B.; Zhang, C.; Gong, P. Structure–Activity Relationship-Based Chemical Classification of Highly Imbalanced Tox21 Datasets. J. Cheminform. 2020, 12, 66.
95. Song, J.; Xu, Z.; Cao, L.; Wang, M.; Hou, Y.; Li, K. The Discovery of New Drug-Target Interactions for Breast Cancer Treatment. Molecules 2021, 26, 7474.
96. Doğuç, Ö.; Silahtaroğlu, G.; Canbolat, Z.N.; Hambarde, K.; Yiğitbaşı, A.A.; Gökay, H.; Yılmaz, M. Diagnosis of COVID-19 Via Patient Breath Data Using Artificial Intelligence. Emerg. Sci. J. 2023, 7, 105–113.
97. Kumari, C.; Abulaish, M.; Subbarao, N. Using SMOTE to Deal with Class-Imbalance Problem in Bioactivity Data to Predict mTOR Inhibitors. SN Comput. Sci. 2020, 1, 150.
98. Dhall, A.; Patiyal, S.; Sharma, N.; Devi, N.L.; Raghava, G.P.S. Computer-Aided Prediction of Inhibitors against STAT3 for Managing COVID-19 Associated Cytokine Storm. Comput. Biol. Med. 2021, 137, 104780.
99. Erlina, L.; Paramita, R.I.; Kusuma, W.A.; Fadilah, F.; Tedjo, A.; Pratomo, I.P.; Ramadhanti, N.S.; Nasution, A.K.; Surado, F.K. Virtual Screening of Indonesian Herbal Compounds as COVID-19 Supportive Therapy: Machine Learning and Pharmacophore Modeling Approaches. BMC Complement. Med. Ther. 2022, 22, 207.
100. Mathea, M.; Klingspohn, W.; Baumann, K. Chemoinformatic Classification Methods and Their Applicability Domain. Mol. Inform. 2016, 35, 160–180.
101. Lu, Y.C.; Yeh, W.C.; Ohashi, P.S. LPS/TLR4 Signal Transduction Pathway. Cytokine 2008, 42, 145–151.
102. Heo, S.; Safder, U.; Yoo, C. Deep Learning Driven QSAR Model for Environmental Toxicology: Effects of Endocrine Disrupting Chemicals on Human Health. Environ. Pollut. 2019, 253, 29–38.
103. Gini, G.; Zanoli, F. Machine Learning and Deep Learning Methods in Ecotoxicological QSAR Modeling. In Ecotoxicological QSARs; Roy, K., Ed.; Springer: New York, NY, USA, 2020; pp. 111–149. ISBN 978-1-0716-0150-1.
104. Yu, L.; Xue, L.; Liu, F.; Li, Y.; Jing, R.; Luo, J. The Applications of Deep Learning Algorithms on in Silico Druggable Proteins Identification. J. Adv. Res. 2022, 41, 219–231.
105. Li, G.; Zhao, B.; Su, X.; Yang, Y.; Hu, P.; Zhou, X.; Hu, L. Discovering Consensus Regions for Interpretable Identification of RNA N6-Methyladenosine Modification Sites via Graph Contrastive Clustering. IEEE J. Biomed. Health Inform. 2024, 28, 2362–2372.
106. Kwon, S.; Bae, H.; Jo, J.; Yoon, S. Comprehensive Ensemble in QSAR Prediction for Drug Discovery. BMC Bioinform. 2019, 20, 521.
107. Serra, A.; Önlü, S.; Coretto, P.; Greco, D. An Integrated Quantitative Structure and Mechanism of Action-Activity Relationship Model of Human Serum Albumin Binding. J. Cheminform. 2019, 11, 38.
108. Wu, L.; Liu, Z.; Auerbach, S.; Huang, R.; Chen, M.; McEuen, K.; Xu, J.; Fang, H.; Tong, W. Integrating Drug’s Mode of Action into Quantitative Structure-Activity Relationships for Improved Prediction of Drug-Induced Liver Injury. J. Chem. Inf. Model. 2017, 57, 1000–1006.

Figure 1. Schematic diagram of XGBoost architecture. The terms “a_n” and “r_n” are the regularisation parameters and residuals computed for the nth tree in the architecture. The final model prediction is a weighted aggregation of the predictions made by its constituent trees [35].

Figure 2. Schematic diagram of the development of the ML models. The compound data obtained from PubChem were first analysed to characterise their chemical space. Molecular descriptors were then computed with the Mold2 software. Before model development, the data were pre-processed to handle duplicates, null values, and irrelevant non-compound features. The models were validated and their performance metrics computed, and they were further validated using known inhibitors of TLR4 [47,48,49,50].
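The pre-processing step in Figure 2 (handling duplicates, null values, and non-compound features) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: each compound is taken to be a dictionary of descriptor values keyed by a hypothetical `cid` field, and the set `NON_DESCRIPTOR_COLUMNS` is a hypothetical example of metadata columns that carry no descriptor information.

```python
from typing import Dict, List

# Hypothetical metadata columns that are not molecular descriptors.
NON_DESCRIPTOR_COLUMNS = {"cid", "name", "smiles"}


def preprocess(records: List[Dict[str, object]], id_key: str = "cid") -> List[Dict[str, object]]:
    """Drop incomplete rows and duplicate compounds, then strip metadata columns."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if any(v is None for v in rec.values()):
            continue  # drop compounds with missing (null) values
        cid = rec.get(id_key)
        if cid in seen_ids:
            continue  # duplicate compound: keep only its first complete occurrence
        seen_ids.add(cid)
        # keep descriptor values and the activity label only
        cleaned.append({k: v for k, v in rec.items() if k not in NON_DESCRIPTOR_COLUMNS})
    return cleaned
```

Checking for nulls before de-duplicating means that an incomplete first copy of a compound does not block a later complete copy from being kept.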

Figure 3. Data resampling techniques. The ratio of active to inactive compounds for (A) the original data without resampling, (B) undersampling of the inactive class, (C) the synthetic minority oversampling technique (SMOTE), and (D) random selection of 5000 inactive compounds.
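Two of the resampling schemes in Figure 3 can be sketched in plain Python, assuming each compound is a list of descriptor values: random selection of a fixed number of inactive compounds while keeping every active one (panel D), and SMOTE-style oversampling (panel C), which creates synthetic minority samples by interpolating between a sample and one of its `k` nearest minority-class neighbours. The function names, the seed handling, and the Euclidean distance are assumptions for illustration; a library such as imbalanced-learn would normally be used in practice.

```python
import random
from typing import List, Tuple


def random_subset_inactives(
    actives: List[List[float]], inactives: List[List[float]], n_inactive: int = 5000, seed: int = 0
) -> Tuple[List[List[float]], List[List[float]]]:
    """Keep all actives and draw n_inactive inactives without replacement."""
    rng = random.Random(seed)
    chosen = rng.sample(inactives, min(n_inactive, len(inactives)))
    return list(actives), chosen


def smote(minority: List[List[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (squared Euclidean distance), excluding x itself
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to its neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Plain undersampling of the inactive class (panel B) is the same random draw as `random_subset_inactives` with `n_inactive=len(actives)`.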

Figure 4. Performance of the best model for each algorithm. A clustered bar plot showing the accuracy, precision, and recall of the best random forest, decision tree, AdaBoost, XGBoost, and KNN models on the external test data.

Figure 5. Receiver operating characteristic (ROC) curves for the best model of each algorithm. Each curve shows the true positive rate against the false positive rate on the external test set for the XGBoost, decision tree, random forest, AdaBoost, and KNN models trained with the random 5000 data resampling technique.
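An ROC curve such as those in Figure 5 is obtained by sweeping the decision threshold over the predicted probabilities and recording the false positive rate (FPR) and true positive rate (TPR) at each step; the AUROC reported in Table 4 is the area under that curve. The sketch below assumes binary labels (1 = active) and at least one sample of each class; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs obtained by lowering the threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # samples with tied scores cross the threshold together
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, the curve passes through (0, 0.5) and (0.5, 1) and the area is 0.75.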

Figure 6. Applicability domain of the developed models. Standardised descriptor values are plotted against the logarithm of the corresponding PubChem Compound IDs (CIDs) for the dataset formed by the 5000 randomly selected inactive compounds together with all original active compounds. The plot includes both training and held-out test compounds.
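One common way to turn standardised descriptor values into an in/out applicability-domain decision is the standardisation approach, in which each descriptor of a query compound is scaled by the training mean and standard deviation and compared with a cutoff of 3. The caption does not state the exact rule used for AICpred, so the sketch below is an assumption: it implements that standardisation rule (including the $\bar{S} + 1.28\,\sigma_S$ fallback for borderline compounds), skips zero-variance descriptors, and uses hypothetical function names.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_scaler(train: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-descriptor mean and sample standard deviation of the training set."""
    cols = list(zip(*train))
    return [statistics.fmean(c) for c in cols], [statistics.stdev(c) for c in cols]


def in_domain(x: Sequence[float], means: Sequence[float], sds: Sequence[float], cutoff: float = 3.0) -> bool:
    """Standardisation-based applicability domain check for one compound."""
    # absolute standardised values; descriptors with zero training variance are skipped
    s = [abs(v - m) / sd for v, m, sd in zip(x, means, sds) if sd > 0]
    if not s:
        return True
    if max(s) <= cutoff:
        return True  # every descriptor lies within the training range
    if min(s) > cutoff:
        return False  # every descriptor lies outside: clear outlier
    # borderline case: approximate the 90th percentile of the standardised values
    s_new = statistics.fmean(s) + 1.28 * statistics.stdev(s)
    return s_new <= cutoff
```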

Figure 7. Web application interface. The interface shows the prediction of Baricitinib (PubChem CID: 44205240) using the web application AICpred. Here, the XGBoost classifier predicts Baricitinib as active against TLR4 with a prediction probability of 0.996. The applicability domain analysis on the right shows that Baricitinib is within the domain of applicability of the XGBoost model.

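Table 1. Performance metrics for model evaluation. Accuracy, balanced accuracy, recall, precision, F1 score, and the Matthews correlation coefficient (MCC) were used to evaluate the developed ML models at both the cross-validation and testing stages. TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.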

Metric	Mathematical Definition	Interpretation
Accuracy	$\frac{TP + TN}{TP + FP + FN + TN}$	1 = perfect, 0 = poor
Recall	$\frac{TP}{TP + FN}$	1 = perfect, 0 = poor
Precision	$\frac{TP}{TP + FP}$	1 = perfect, 0 = poor
F1 score	$2 \times \frac{\mathrm{recall} \times \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$	1 = perfect, 0 = poor
Balanced accuracy	$\frac{\mathrm{sensitivity} + \mathrm{specificity}}{2}$, with sensitivity $= \frac{TP}{TP + FN}$ and specificity $= \frac{TN}{TN + FP}$	1 = perfect, 0 = poor
MCC	$\frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$	+1 = perfect, −1 = poor
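The definitions in Table 1 translate directly into code. The sketch below computes every metric from the four confusion-matrix counts; the function name and the convention of returning 0.0 when a denominator is zero are assumptions for illustration (libraries such as scikit-learn handle these edge cases with their own, configurable rules).

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Metrics of Table 1 from confusion-matrix counts (0.0 when undefined)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    balanced_accuracy = (recall + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }
```

With 8 TP, 2 FP, 2 FN, and 88 TN, accuracy is 0.96 while balanced accuracy is about 0.889 and MCC about 0.778, which illustrates why accuracy alone overstates performance on imbalanced data such as this study's active/inactive split.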

Table 2. Experimentally determined Toll-like receptor 4 (TLR4) inhibitors and their half-maximal inhibitory concentration (IC₅₀) values retrieved from the literature.

Known Inhibitor	IC₅₀	Reference
Resatorvid/TAK-242	11–33 nM	[47]
M62812	7 µM	[48]
ZINC25778142	16.6 µM	[49]
(+)-Naloxone	105.5 µM	[50]
(+)-Naltrexone	94.4 µM	[50]

Table 3. NOD-like receptor family pyrin domain-containing 3 (NLRP3), Janus kinase (JAK), and Bruton’s tyrosine kinase (BTK) inhibitors screened using the deployed XGBoost model.

Inhibitor	Target	PubChem CID	Reference
Baricitinib	JAK1 and JAK2	44205240	[80]
Nezulcitinib	All JAK isoforms	146421275	[81]
Ibrutinib	BTK	24821094	[82]
Acalabrutinib	BTK	71226662	[83]
MCC950	NLRP3	9910393	[84]

Table 4. Performance of the best model for each algorithm during 10-fold cross-validation (CV) and on the external test data. All best-performing models were trained on the dataset produced by the random 5000 resampling technique.

Model	Process	Accuracy	Balanced Accuracy	Precision	Recall	F1 Score	AUROC	MCC
Random Forest	CV	0.972	0.787	1.000	0.573	0.723	0.971	0.743
Random Forest	Test	0.968	0.770	0.975	0.542	0.696	0.987	0.714
Decision Trees	CV	0.981	0.939	0.839	0.891	0.863	0.939	0.854
Decision Trees	Test	0.981	0.938	0.842	0.889	0.865	0.938	0.855
AdaBoost	CV	0.987	0.921	0.958	0.845	0.897	0.983	0.893
AdaBoost	Test	0.992	0.944	0.985	0.889	0.934	0.998	0.931
XGBoost	CV	0.994	0.958	1.000	0.915	0.955	0.998	0.954
XGBoost	Test	0.994	0.958	1.000	0.917	0.957	0.999	0.955
KNN	CV	0.942	0.604	0.685	0.215	0.323	0.757	0.360
KNN	Test	0.936	0.611	0.548	0.236	0.330	0.778	0.332
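The CV rows in Table 4 come from 10-fold cross-validation, in which each fold is held out once while the model is trained on the other nine. Because active compounds are the minority class, a stratified split that preserves the active/inactive ratio in every fold is a reasonable choice; the text does not say whether stratification was used, so this sketch (with a hypothetical function name) is an assumption rather than the authors' exact procedure.

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[int], k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) splits preserving class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    counter = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:
            # deal indices round-robin so every fold receives a near-equal share of each class
            folds[counter % k].append(i)
            counter += 1
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, sorted(test)))
    return splits
```

Each index appears in exactly one test fold, so averaging a metric over the ten folds uses every compound for evaluation exactly once.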

Table 5. Model validation with known inhibitors. The activity of five known TLR4 inhibitors was predicted using the best-performing XGBoost model, trained on the dataset produced by the random 5000 resampling technique. The model's prediction probability is also presented.

Known Inhibitor	IC₅₀	Reference	XGBoost Prediction Probability
Resatorvid/TAK-242	11–33 nM	[47]	0.830
M62812	1–3 µg/mL	[48]	0.983
ZINC25778142	16.6 µM	[49]	0.997
(+)-Naloxone	105.5 µM	[50]	0.996
(+)-Naltrexone	94.4 µM	[50]	0.996

Table 6. Activity prediction of JAK, NLRP3, and BTK inhibitors using the AICpred web server. The table also indicates whether each inhibitor lies within the applicability domain of the model.

Inhibitor	XGBoost Prediction (Prediction Probability)	Within Applicability Domain
Baricitinib	Active (0.996)	Yes
Nezulcitinib	Active (0.994)	No
Ibrutinib	Active (0.995)	Yes
Acalabrutinib	Active (0.993)	Yes
MCC950	Active (0.996)	Yes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fry-Nartey, L.N.; Akafia, C.; Nkonu, U.S.; Baiden, S.B.; Dorvi, I.N.; Agyenkwa-Mawuli, K.; Agyapong, O.; Hayford, C.F.; Wilson, M.D.; Miller, W.A., III; et al. AICpred: Machine Learning-Based Prediction of Potential Anti-Inflammatory Compounds Targeting TLR4-MyD88 Binding Mechanism. Information 2025, 16, 34. https://doi.org/10.3390/info16010034

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
