Molecules
  • Review
  • Open Access

9 February 2023

Comparative Studies on Resampling Techniques in Machine Learning and Deep Learning Models for Drug-Target Interaction Prediction

School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang 11800, Malaysia
*
Author to whom correspondence should be addressed.

Abstract

The prediction of drug-target interactions (DTIs) is a vital step in drug discovery, and the success of machine learning and deep learning methods in accurately predicting DTIs plays a major role in it. However, the datasets used with learning algorithms are usually high-dimensional and extremely imbalanced. To address this issue, the dataset must be resampled accordingly. In this paper, we compare several data resampling techniques for overcoming class imbalance in machine learning methods, and we study the effectiveness of deep learning methods in overcoming class imbalance in DTI prediction, framed as binary classification, using ten (10) cancer-related activity classes from BindingDB. We find that Random Undersampling (RUS) severely affects the performance of a model, especially when the dataset is highly imbalanced, which renders RUS unreliable. We also find that SVM-SMOTE can be used as a go-to resampling method when paired with the Random Forest and Gaussian Naïve Bayes classifiers, as a high F1 score is recorded for all activity classes that are severely and moderately imbalanced. Additionally, the deep learning method Multilayer Perceptron recorded high F1 scores for all activity classes even when no resampling method was applied.

1. Introduction

Drug discovery plays an important role in the pharmaceutical and medical fields. In drug discovery, the prediction of drug-target interactions (DTIs) is the key to identifying potential drugs. Drugs in DTIs are usually chemical compounds, while targets are proteins [1]. Through the prediction of DTIs, high profits can be obtained, especially in the pharmaceutical field [1]. Hence, the demand to identify further potential interactions between drugs and targets has sparked interest among researchers in pharmaceutical labs in performing DTI predictions [1]. DTI prediction can be done through several computational methods such as molecular docking and machine learning [1]. To apply these computational methods, the chemical compounds are usually represented as Simplified Molecular Input Line Entry System (SMILES) strings [2]. The SMILES notation is a user-friendly and easy-to-interpret notation often used by scientists to represent the molecular structure of a chemical compound in computers [2]. Chemical compounds can also be represented as molecular fingerprints. Extended-Connectivity Fingerprints (ECFP) is the latest fingerprint methodology that is widely used in computational chemistry, for example in drug-target interaction prediction, similarity searching and clustering [3].
Machine learning is currently one of the most significant and rapidly evolving topics in computer-aided drug discovery [4]. Machine learning is the preferred choice for DTI prediction as it enables large-scale testing of candidates within a short span of time, hence, making it easier for scientists and researchers to predict DTIs [1]. Machine learning can further be classified into supervised and unsupervised learning [4]. Supervised machine learning algorithms such as Naïve Bayes (NB), Random Forest (RF), Support Vector Machine (SVM) and k-Nearest Neighbour (kNN) are widely used in drug discovery, specifically in drug-target interaction prediction [4,5]. Unsupervised machine learning algorithms such as k-Means Clustering and Hierarchical Clustering can also be used for drug-target interaction prediction [4].
Furthermore, in 2013, Merck posted a multi-problem Quantitative Structure-Activity Relationship (QSAR) machine learning challenge [6]. The challenge was won by a deep learning network with a relative accuracy of approximately 14% over Merck’s in-house systems and even resulted in an article in The New York Times [6]. Since the challenge, advanced chemocentric machine learning methods with a focus on emerging deep learning technologies are being presented [6]. Deep learning architectures such as Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) appear to be well-suited for DTI prediction because they allow multitask learning and automatically construct complex features [6,7]. Deep learning is currently the most popular technique in drug discovery [8].
The accuracy of computational methods such as machine learning and deep learning plays a big part in determining whether a prediction is successful, and the predictive accuracy of any algorithm usually depends on the dataset being used. Including too many noisy variables in a dataset may reduce the accuracy of the prediction and lead to over-fitting, which often produces promising but non-reproducible results [9]. Usually, real datasets cannot be directly fed into a learning algorithm due to class imbalance [10]. In binary classification, class imbalance occurs when one class (the majority) is represented by far more samples than the other (the minority) [11]. When an imbalanced dataset is used, the classification of data may be negatively affected [10]. Machine learning algorithms are generally biased by imbalanced data because most standard learning algorithms expect a balanced class distribution [10]. Therefore, classifiers tend to learn poorly from imbalanced data [10,11]. Furthermore, according to [12], major problems in DTI prediction are the lack of true negative interactions and extreme class imbalance. These problems often devastate the predictive performance of even powerful learning algorithms [12].
According to [11,13], there are two possible ways to solve the problem of imbalanced classes, either by modifying the learning classification algorithms or modifying the data that is being presented to them. The authors in [11] have decided to modify the imbalanced data instead of modifying the learning algorithms because most machine learning algorithms are trained based on the assumption that the ratios of each class are equal. Furthermore, according to [14], one strategy to address class imbalance in learning algorithms is to generate one or more datasets, each with a different class distribution from the original dataset. Hence, two main categories of data resampling are utilized, undersampling and oversampling [13,14]. Undersampling involves the process of discarding instances from the majority class, while oversampling adds new instances to the minority class in order to achieve a balanced dataset, as discussed in [10,11,13,14].
One example of an oversampling strategy that can be applied is the Synthetic Minority Oversampling Technique (SMOTE) [13,14]. SMOTE oversamples the minority class in the dataset by synthesizing fake minority data into the original dataset, allowing the minority class to be balanced with the majority class [11,15]. Additionally, if new instances are added randomly to the minority class of the original dataset, the technique is called Random Oversampling (ROS) [13,14]. If the random discarding of instances from the majority class is performed, the process is known as Random Undersampling (RUS). Other resampling strategies include cross-validation and Adaptive Synthetic Resampling (ADASYN) [9,14,15]. Cross-validation is a resampling method where parts of the original dataset are sequentially left out and a multivariable analysis is conducted repeatedly until the entire sample has been assessed [16].
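The two random strategies described above can be illustrated with a minimal sketch. It assumes a binary problem and balances the classes exactly; the function names `random_undersample` and `random_oversample` and the toy data are hypothetical and are not taken from the study's code.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly discard majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    keep = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    # Keep only as many majority rows as there are minority rows.
    keep += rng.sample(majority_idx, counts[minority])
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_extra = max(counts.values()) - counts[minority]
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data (invented): 2 actives (label 1) and 8 inactives (label 0).
X = [[i] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

X_rus, y_rus = random_undersample(X, y)
X_ros, y_ros = random_oversample(X, y)
print(Counter(y_rus))  # 2 rows per class: 6 inactives were discarded
print(Counter(y_ros))  # 8 rows per class: actives were duplicated
```

In this toy case undersampling throws away six of the eight inactive rows, which is the kind of data loss that becomes severe when the minority class is very small.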
Nevertheless, the problems of imbalanced datasets in the binary classification of DTI prediction have not been properly solved yet. Class imbalance in general was addressed by various authors in other fields. For example, the authors in [10] performed an analysis of resampling an imbalanced class of a heart failure dataset, while in [11], the authors performed experiments to compare various resampling strategies on a clinical dataset. Furthermore, the authors of [15] have also performed an analysis on the in-silico prediction of blood-brain permeability of compounds using machine learning and resampling methods to overcome class imbalance. However, no comparative studies have been done on chemical datasets to determine the best resampling method to be used in the prediction of drug-target interactions. As mentioned earlier, the authors in [12] focused on the binary classification of multiple activity classes at once, instead of a single activity class. Moreover, the authors even proposed a new learning method where DTI prediction is addressed as a multi-output prediction task by learning ensembles of multi-output bi-clustering trees (eBICT) instead of using the available resampling techniques to balance their dataset, which leaves the problem of class imbalance in the binary classification of a single activity class in DTI prediction unanswered.
The effectiveness of deep learning algorithms in DTI prediction is also a popular topic of research in chemoinformatics. As previously mentioned, deep learning is currently the most successful technique in drug discovery and it has been proven to be successful in many other chemoinformatic fields, such as drug toxicity prediction and drug synergy prediction [8]. Moreover, in recent years, deep learning algorithms such as Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have won numerous contests in pattern recognition and machine learning [17]. Deep learning algorithms are more advanced and complex and they are suited for drug-target interaction prediction because they allow multi-task learning as well as the extraction of complex features [7]. Furthermore, deep networks also provide hierarchical representations of a compound. However, the authors in [18] stated that in the existing methods to predict DTIs, this is generally treated as a binary classification problem which leads to severe class imbalance. Although the problem was addressed, the authors have instead decided to develop an ensemble classifier to overcome class imbalance by integrating several resampling techniques within the classifier itself [18]. Moreover, deep learning algorithms are also known to perform better than baseline machine learning algorithms and they are mostly used to represent complex features of a compound [7,19].
At the same time, [1,17] did not address the class imbalance problem in their datasets for DTI prediction, and yet both these works yielded high accuracy values without the use of any of the resampling methods mentioned by [13,14]. Hence, we would also like to take this opportunity to experiment with deep learning algorithms to study the effectiveness of deep learning methods without the use of any resampling method and compare its results with the machine learning algorithms using resampling techniques to overcome the class imbalance problem in DTI prediction. Furthermore, recent DTI problems are also becoming more advanced and they are now treated as a multiclass classification problem rather than binary classification [6,7,18]. Thus, in this study, we will address the class imbalance problem as a binary classification problem in which we will be focusing on a single activity class for both machine learning and deep learning methods in the hope of overcoming class imbalance in the datasets.

3. Results and Discussion

3.1. Machine Learning vs Machine Learning with Sampling

After conducting all the experiments with various machine learning approaches with the presence and absence of resampling techniques, the performance of the machine learning models was assessed using the accuracy, precision and recall metrics for each activity class and the detailed results are thoroughly explained in the Supplementary Materials. Based on all the results discussed in terms of accuracy, precision and recall, we can say that the accuracy, precision and recall metrics are not enough to fully evaluate a model. The performance of the models highly depends on the dataset that is being fed to them for learning and predicting, regardless of whether it is balanced or unbalanced.
In this study, the dataset that we have used is highly imbalanced, with the number of inactive compounds almost two times more than the number of active compounds for each of the activity classes. It is also noticeable from the results that different activity classes perform better or worse depending on the resampling method used. This could be due to the difference in the number of active compounds per activity class. According to [44], the classification of the degree of imbalance for imbalanced data is as shown in Table 1.
Table 1. The classification of the degree of imbalance for imbalanced data.
To further investigate and determine the best machine learning classifier and resampling method combination, the results are sorted by the number of active compounds per activity class and the respective percentage of minority class for each cancer-related activity class (Table 2), which can be computed using (Equation (2)):
$$\text{Percentage of Minority} = \frac{\text{Total Number of Minority Instances}}{\text{Total Number of Instances}} \times 100\% \tag{2}$$
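As a small worked example of this percentage, the sketch below computes it from a list of binary labels. The helper name `minority_percentage` is hypothetical, and the total of 1000 compounds is invented for illustration; only the figure of 41 active compounds is taken from the text.

```python
def minority_percentage(labels):
    """Share of the less frequent class, as a percentage of all instances."""
    n_pos = sum(1 for v in labels if v == 1)
    n_neg = len(labels) - n_pos
    return min(n_pos, n_neg) / len(labels) * 100


# 41 actives among 1000 compounds; the inactive count (959) is an invented value.
labels = [1] * 41 + [0] * 959
pct = minority_percentage(labels)
print(f"{pct:.2f}%")  # 4.10%
```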
Table 2. The degree of imbalance and the percentage of minority class sorted by the number of active compounds per activity class.
In general, Random Undersampling (RUS) performs the worst among all the resampling methods when applied to any of the classifiers developed, with a drop almost always observed in the precision and recall of each classifier model. However, RUS seemed to perform quite well in terms of precision and recall with the RF classifier for the PDGFR-B activity class (Supplementary Materials, Figure S8). Note that PDGFR-B has only 41 active compounds (Table 2), so when RUS is implemented, the inactive compounds are severely undersampled and data are lost. Hence, although good recall and precision are recorded, the model is unreliable, as most of the information about the activity class is lost due to severe undersampling.
Furthermore, SMOTE, ADASYN, BorderlineSMOTE, SVM-SMOTE and SMOTETomek performed well with the RF classifier across all activity classes in terms of accuracy. Nonetheless, no single resampling method stands out as the best, as different activity classes perform better with different resampling techniques. It is also important to understand that high accuracy does not mean that the model is making predictions correctly. The precision and recall values also play an important role in evaluating a model’s performance: high precision means that most predicted interactions are true interactions, and high recall means that most true interactions are found. Hence, in Table 3, it is found that the best resampling method varies across activity classes when evaluated under different metrics. For example, for the activity classes BRAF, CDK-6, HER2, PDGFR-A and VEGFR-1, RF achieves its best precision when no resampling is applied, whereas the other activity classes require resampling. It is also interesting to note that for the activity classes that are severely and moderately imbalanced, namely MEK1, PDGFR-B, KRAS and PD-1 (Table 2), the precision is at its best when resampling methods are applied to the RF classifier, as depicted in Table 3.
Table 3. Best machine learning and resampling method pair for all cancer-related activity classes in terms of accuracy, precision and recall.
In terms of recall, the results differ from the accuracy and precision values. Here, the combination of the GNB classifier with SVM-SMOTE performs the best for the activity classes BRAF, CDK-6, HER2 and PD-1, while other activity classes perform the best with SMOTE (KRAS), BorderlineSMOTE (PDGFR-A) and SMOTETomek (MEK1). Nonetheless, from Table 3 we can observe that there is no single resampling method that stands out as the best across all activity classes. Nevertheless, after analysing the results for all the classifiers with different resampling methods across all activity classes, it is obvious that the RF classifier is the best classifier in terms of its average accuracy, precision and recall, and for each activity class, the best resampling method differs as well. To visualize this, the average performance of the classifiers was computed as shown in Figure 1.
Figure 1. Average accuracy, precision and recall of all machine learning classifiers in general.
Based on Figure 1, the RF classifier performs the best in terms of accuracy, precision and recall, with an average accuracy of 96.19%, an average precision of 86.26% and an average recall of 83.29%. In conclusion, RF is the best classifier in general, and GNB is the second-best classifier among the other classifiers in terms of precision and recall, while SVM and DT are considered the weakest classifiers due to their low accuracy, precision and recall recorded throughout all the experiments conducted.
To further determine the best resampling method for each activity class with its pairing classifier, further evaluation based on the recall and precision values was done in order to compute the F1 score. Since the results obtained so far are not enough for us to determine the best overall resampling method, the F1 score was computed (Supplementary Materials, Figures S11 and S12). Based on the F1 scores, we have managed to determine the best machine learning classifier and resampling method pair for each cancer-related activity class. The results, along with the respective F1 scores are shown in Table 4.
Table 4. Best machine learning classifier and resampling method pair for each activity with their respective F1 score.
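The F1 score used for this ranking is the harmonic mean of precision and recall. The following sketch shows the calculation; the function name `f1_from` and the input values are invented for illustration and are not results from this study.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented values: a model with high precision but modest recall.
balanced = f1_from(0.90, 0.60)
lopsided = f1_from(1.00, 0.50)
print(round(balanced, 4))  # 0.72
print(round(lopsided, 4))  # 0.6667
```

Because the harmonic mean is pulled towards the smaller of the two values, a pair can only reach a high F1 score when precision and recall are both high, which is why it is a stricter criterion than either metric alone.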
However, there are a few exceptions that we would like to highlight, especially for the PDGFR-B and VEGFR-1 activity classes. It is observed that when the RF classifier is applied with RUS for PDGFR-B, the F1 score is the highest with a value of 93.50%. As discussed earlier, RUS is an unreliable resampling method as it severely undersamples the inactive data, which might result in a huge loss in data. Thus, we have decided to choose the next highest F1 score value for PDGFR-B, namely, 85.68% using the GNB classifier with SVM-SMOTE as the resampling method.
Furthermore, for VEGFR-1, the pairing is denoted as RF + None, which indicates that the F1 score is the highest when no resampling method is used with the RF classifier, while with other resampling methods, a huge drop in the F1 score can be observed (Supplementary Materials, Figure S11). To summarize, it is found that no specific resampling method performs well across all activity classes in the dataset. However, it was observed that the SVM-SMOTE resampling method showed promising results when paired with the RF and GNB classifiers on severely and moderately imbalanced activity classes such as MEK1, PDGFR-A, PDGFR-B, HER2 and KRAS.

3.2. Deep Learning with No Resampling

3.2.1. Convolutional Neural Network (CNN)

Generally, CNN performs well on all activity classes in terms of accuracy, with values ranging from 93.89% to 99.76%, which in general is better than all the ML classifiers even when no resampling methods are applied (see Figure 2). However, a drop in the precision values is observed among all activity classes, especially for the severely imbalanced activity classes MEK1 and PDGFR-B (Table 2). A huge drop in both the precision and recall values is also observed in the PDGFR-A and VEGFR-1 activity classes.
Figure 2. The performance of CNN in terms of accuracy, precision, recall and F1 score without sampling.
Additionally, MEK1 also records a 100% recall value despite a drop in the precision value, which means that CNN was able to find all the true positives (positive interactions between the drug and target).

3.2.2. Multilayer Perceptron (MLP)

From Figure 3 we can observe that MLP in general performs well in terms of accuracy and F1 score. We have also found that when activity classes that are severely and moderately imbalanced, mainly CDK-6, KRAS, MEK1, and PDGFR-B (Table 2) were fed through the layers in MLP, the precision value recorded for all of them was 100% (Figure 3). Furthermore, with activity classes that are severely imbalanced (CDK-6 and PDGFR-B), a 100% accuracy, precision, recall and F1 score was recorded.
Figure 3. The performance of MLP in terms of accuracy, precision, recall and F1 score without sampling.
Both deep learning methods, CNN and MLP, performed exceptionally well in terms of accuracy across all the activity classes even when no resampling methods were applied. When comparing the performance of CNN and MLP, MLP on average was able to correctly predict positive interactions between a drug and target, especially for activity classes that are severely and moderately imbalanced. The F1 scores of MLP across all activity classes are also better than CNN. A further comparison in terms of the F1 score between the machine learning classifiers with their respective pairing with resampling methods and with MLP was done to study the effectiveness of deep learning methods in overcoming class imbalance in the binary classification of DTI prediction. The results of the comparison are summarized in Table 5.
Table 5. Comparison between the F1 score of machine learning paired with resampling and the F1 score of MLP.
From Table 5 we can observe that MLP performs better for almost all activity classes, except for the severely imbalanced activity class, PD-1 and the moderately imbalanced activity classes, PDGFR-A and VEGFR-1, whereby the F1 score is high for PD-1 when ADASYN is used with the RF classifier and when SVM-SMOTE is used with the GNB classifier for PDGFR-A. It is also interesting to note that for VEGFR-1, which is moderately imbalanced, no resampling is required when the RF classifier is used compared to MLP, where there was also no resampling method. Nonetheless, on average, MLP, the deep learning method where no resampling was applied, performed better than machine learning classifiers paired with various resampling methods with an average F1 score of 92.36%.

4. Materials and Methods

The overall methodology of this study is visualized in Figure 4.
Figure 4. The four main phases of the study: data acquisition, data preprocessing, predictive modelling and performance evaluation.

4.1. Data Acquisition

The data used in this study were obtained from BindingDB. BindingDB is a public, web-accessible database consisting of over 2 million binding data for over 8816 target proteins and 1 million small molecules [45]. However, in this paper, we selected only 10 activity classes to demonstrate DTI prediction in terms of binary classification, narrowing the scope of our study to target proteins in cancers. The activity classes selected are popular proteins that are used to detect and treat cancer in the human body. Many target proteins play a major role in detecting and treating various types of cancer, such as the HER2 protein for breast cancer and the BRAF protein in lung cancer [46,47,48]. The selected activity classes are listed in Table 6, which also gives each class’s abbreviation and the number of active compounds that interact with it.
Table 6. Selected Activity Classes.

4.2. Data Preprocessing

To prepare the selected activity classes for the prediction of drug-target interactions (DTIs), the SMILES notation of each active compound for each activity class in the BindingDB dataset was converted into chemical fingerprints. Before converting the SMILES notations into chemical fingerprints, we first performed SMILES canonicalization. The SMILES notation, or Simplified Molecular Input Line Entry System, is the most popular notation used by scientists to represent a chemical compound [2]. A single structure can be represented by many different SMILES strings, as one can start from any atom in a molecule to derive a SMILES string. Hence, it is important to represent each chemical compound with a unique string, which is where SMILES canonicalization comes in. This process eliminates redundancy in the SMILES string representation of the chemical compounds. Redundancy happens when compounds that are similar form a different conformation that affects the SMILES notation. To perform SMILES canonicalization, we used an open-source chemoinformatic extension called RDKit within KNIME, a popular data analytics software used in drug discovery [49,50]. The SMILES notation of each active compound in each activity class file was used as the input and converted into canonical SMILES.
The fingerprint representation that we used is the Extended-Connectivity Fingerprints (ECFP) representation. The idea of ECFP is to encode the structure of a molecule in a bit string, where '1' represents the presence, and '0' the absence, of a particular substructure in the molecule [3]. In this paper, the canonical SMILES notations were converted to ECFP fingerprints of diameter 4 (ECFP4). ECFP4 is widely used in the field of chemoinformatics, specifically in DTI prediction, as demonstrated in [51,52]. The conversion from canonical SMILES to ECFP4 was also performed in KNIME, using the Chemistry Development Kit (CDK) extension. CDK is an open-source library that is used in chemoinformatics, specifically in drug discovery [53]. CDK fingerprints were chosen over the fingerprints in RDKit due to a compatibility issue with KNIME: when RDKit was used to convert canonical SMILES to ECFP4, KNIME issued multiple warnings and crashed before the conversion process was complete. Thus, the CDK extension was chosen as the alternative to perform the conversion.
The first step in the conversion from SMILES to ECFP4 was to read all the files containing the SMILES notation (for each activity class) and then, using the notation as an input, convert it to canonical SMILES. Then, the canonical SMILES were converted into 1024-bit binary ECFP4 fingerprints. Finally, the fingerprints were written into a new CSV file for further processing. The overall process is shown in the KNIME workflow in Figure 5.
Figure 5. The workflow developed to convert the SMILES notation to canonical SMILES using the RDKit extension and then into chemical fingerprints (ECFP4) using the CDK extension in KNIME.
After the conversion from SMILES to ECFP4, the inactive data for each of the activity classes were generated through a dataset generation program: if a compound is active against the specified activity class, its target column is set to '1', and if it is inactive, its target column is set to '0'. The general idea of this program is to compare the active compounds of a specified activity class against those of all the other activity classes in order to mark the active and inactive compounds. If a match is found, the compound is denoted as '1', and if no match is found, it is denoted as '0'. This process was repeated ten times, once for each activity class. Finally, to clean the data for further use, duplicates and missing data were removed. The final numbers of active and inactive compounds for each activity class at the end of this phase are listed in Table 7.
Table 7. The number of active and inactive compounds per activity class after data preprocessing and cleaning.
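The labelling step described above can be sketched as follows. This is one plausible reading of the procedure, not the authors' program: it assumes compounds are pooled across all activity classes and identified by a string key (SMILES here, for readability, rather than fingerprints). The function name `build_binary_dataset` and the toy compounds are hypothetical.

```python
def build_binary_dataset(actives_by_class, target_class):
    """Label every pooled compound 1 if active against target_class, else 0."""
    target_actives = set(actives_by_class[target_class])
    labelled = {}
    for compounds in actives_by_class.values():
        for smiles in compounds:
            # Dictionary keys are unique, so duplicate compounds collapse to one row.
            labelled[smiles] = 1 if smiles in target_actives else 0
    return labelled


# Invented toy input: three activity classes with a shared compound ("CCO").
actives_by_class = {
    "HER2": ["CCO", "c1ccccc1"],
    "BRAF": ["CCN", "CCO"],
    "KRAS": ["CC(=O)O"],
}

# One binary dataset per activity class, mirroring the per-class repetition.
datasets = {name: build_binary_dataset(actives_by_class, name) for name in actives_by_class}
print(datasets["HER2"])
# {'CCO': 1, 'c1ccccc1': 1, 'CCN': 0, 'CC(=O)O': 0}
```

Under this reading, the "inactive" compounds for a class are simply the actives of the other classes, so they are assumed rather than experimentally confirmed negatives.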

4.3. Predictive Modeling

The next step was to develop baseline machine learning models with the implementation of various resampling techniques. In this study, we have developed four baseline machine learning methods infused with six different data resampling techniques using the Python programming language with the help of the scikit-learn and imbalanced-learn libraries, as well as two baseline deep learning methods without the use of any resampling techniques.

4.3.1. Machine Learning vs Machine Learning with Resampling

The four machine learning methods that are developed are Support Vector Machine (SVM), Random Forest (RF), Gaussian Naïve Bayes (GNB) and Decision Tree (DT), and the six resampling methods that will be used are Synthetic Minority Oversampling Technique (SMOTE), Random Undersampling (RUS), Adaptive Synthetic (ADASYN), BorderlineSMOTE, SVM-SMOTE and SMOTETomek. A brief explanation of each resampling method is given in Table 8.
Table 8. Brief descriptions of the resampling methods used in this study.
Before applying any of the resampling techniques mentioned above within the machine learning models, the models were optimized by tuning their hyperparameters. Hyperparameter tuning is done so that the best parameters for each model are determined before any data are fed into it for learning. The parameters are defined based on the scikit-learn documentation for each of the machine learning classifiers to be tuned [60]. For the RF classifier, the parameters that were considered for tuning are the number of estimators (n_estimators), the minimum number of samples required to split a node (min_samples_split), the minimum number of samples required at a leaf node (min_samples_leaf), the maximum depth of the tree (max_depth) and the function to measure the quality of a split (criterion). For the GNB classifier, the parameter that was tuned is var_smoothing, a user-defined value that is added to the distribution variance derived from the training dataset in order to smooth the Gaussian curve when making predictions [60]. For the SVM classifier, the parameters considered for tuning are the C value, which is a regularization parameter; the kernel, which determines the mapping function the SVM uses to perform predictions; and the gamma value, the kernel coefficient that defines the influence of individual training points [61]. For the DT classifier, the parameters are similar to those of the RF classifier, with the addition of max_features and ccp_alpha: max_features is the maximum number of features considered when looking for the best split, and ccp_alpha (Cost Complexity Pruning Alpha) controls how many nodes are pruned from the tree [61].
Hyperparameter tuning of the parameters defined above was done using the GridSearchCV function in scikit-learn [60]. To use GridSearchCV, we first need to create a dictionary of the parameters that we want to tune (as discussed above), and pass it into the function with the associated machine learning classifier. Then, the model will be fitted with the X (features, or ECFP4 fingerprints) and Y (target activity) data for the function to iterate and check for all combinations of all the parameters. The dataset used to perform hyperparameter tuning is a dummy dataset of our BindingDB dataset containing a random number of active and inactive compounds. At the end of this process, the best parameters are printed out and they are summarized in Table 9.
Table 9. Machine learning classifiers with their optimized parameters after hyperparameter tuning.
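The grid search described above can be sketched as follows for the RF classifier. The parameter values in `param_grid`, the 3-fold setting and the random binary "fingerprints" are all assumptions chosen to keep the example small; they are not the values reported in Table 9.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for 1024-bit fingerprints (X) and activity labels (y).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 1024))
y = np.array([1] * 15 + [0] * 45)

# Hypothetical grid: parameter names follow the text, values are illustrative.
param_grid = {
    "n_estimators": [10, 20],
    "max_depth": [None, 5],
    "criterion": ["gini", "entropy"],
}

# GridSearchCV fits one model per parameter combination and cross-validation fold.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```

After fitting, `search.best_estimator_` holds a classifier refitted on all the data with the best combination, which is the object that would then be carried forward to the resampling experiments.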
After tuning and determining the best estimator for all the machine learning models as depicted in the table above, the classifiers were evaluated with k-fold cross-validation where k is 10. The 10-fold cross-validation was applied to make sure that all the data are used for testing. In this approach, the dataset is separated into ten (10) different folds. In each iteration, one fold is set aside for testing while the rest are used for training. This is repeated 10 times until every fold has been used once for testing.
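The fold rotation can be made concrete with a short, library-free sketch. The generator name `k_fold_indices` is hypothetical, and the shuffling and fold assignment shown here are one simple choice; the section does not state whether the study used plain or stratified folds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(25, k=10))
tested = sorted(j for _, test_idx in splits for j in test_idx)
print(len(splits), tested == list(range(25)))  # 10 True
```

In practice scikit-learn's `KFold` or `StratifiedKFold` would normally be used instead; the stratified variant keeps the active-to-inactive ratio similar in every fold, which matters for highly imbalanced classes.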

4.3.2. Deep Learning with No Sampling

The two (2) deep learning models developed are Convolutional Neural Network (CNN) and Multilayer Perceptron (MLP). The deep learning models are developed with the help of the TensorFlow library for Python programming with no resampling technique applied. The architecture of the CNN that was developed is similar to the CNN found in [51], with the addition of two Batch Normalization layers. A screenshot of the model’s architecture is shown in Figure 6 below.
Figure 6. Overall architecture of the developed CNN model.
The architecture of the MLP is similar to the one developed in [43], except that three linear layers of 256, 128 and 64 neurons were used instead of four. Both deep learning models were evaluated with 10-fold cross-validation and trained for 10 epochs with a batch size of 256. This process was repeated 10 times, once for each activity class.
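The MLP configuration described above (three hidden layers of 256, 128 and 64 neurons, 10 epochs, batch size 256, 10-fold cross-validation) can be approximated in a few lines. Note the substitution: the study built its models with TensorFlow, whereas this sketch uses scikit-learn's `MLPClassifier` as a lightweight stand-in, so the activation functions, optimizer settings and output layer are scikit-learn defaults rather than the authors' choices. The data are synthetic.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for fingerprints and activity labels (assumption).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 128)).astype(float)
y = (X[:, :8].sum(axis=1) > 4).astype(int)

# Three hidden layers of 256, 128 and 64 neurons. With the default "adam"
# solver, max_iter is the number of epochs, so max_iter=10 means 10 epochs.
mlp = MLPClassifier(
    hidden_layer_sizes=(256, 128, 64),
    batch_size=256,
    max_iter=10,
    random_state=0,
)

cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Stopping after 10 epochs triggers a ConvergenceWarning by design.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    scores = cross_val_score(mlp, X, y, cv=cv, scoring="f1")

print(f"F1 per fold: {np.round(scores, 3)}")
```

No resampling step appears anywhere in this sketch, mirroring the setup in which the deep learning models were trained directly on the imbalanced data.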

4.4. Performance Evaluation

To evaluate the performance of each of the machine learning and deep learning models with or without resampling strategies, we have used the scikit-learn library to compute the performance metrics of the models, mainly the Accuracy, Precision, Recall and F1 values. The formula for accuracy is (Equation (3)):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
where TP means True Positive (the number of drug-target pairs correctly predicted as interactions), TN means True Negative (the number of negative pairs correctly predicted as non-interactions) [62], FP means False Positive (the number of negative drug-target pairs incorrectly classified as interactions), and FN means False Negative (the number of positive drug-target pairs incorrectly classified as non-interactions) [62]. Hence, the accuracy of a model is the number of correct predictions, of both interactions and non-interactions, over the total number of predictions. The formula for the precision value is (Equation (4)):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$
Thus, the precision value indicates the proportion of drug-target pairs predicted as interactions that are true interactions. Conversely, the recall value measures the model’s ability to detect positive samples, which in this case means the proportion of true drug-target interactions that the model identifies. The formula to calculate the recall value is (Equation (5)):
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{5}$$
The F1 score is defined as the harmonic mean of the precision and recall values, and this metric is essential for comparing the performance of the models with and without a resampling method. F1 can be computed using (Equation (6)):
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6}$$
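As a small worked example of Equations (3) to (6), the sketch below computes the four metrics from confusion-matrix counts and compares them with the corresponding scikit-learn functions. The ten labels are hypothetical and chosen only so that the counts are easy to verify by hand.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 1 = interaction (active), 0 = non-interaction (inactive).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Confusion-matrix counts, following the definitions of TP, TN, FP and FN.
pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)
tn = sum(t == 0 and p == 0 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)

# Equations (3) to (6).
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)                   # 3 5 1 1
print(accuracy, precision, recall, f1)  # 0.8 0.75 0.75 0.75

# The scikit-learn functions give the same values.
sk_values = (
    accuracy_score(y_true, y_pred),
    precision_score(y_true, y_pred),
    recall_score(y_true, y_pred),
    f1_score(y_true, y_pred),
)
print(sk_values)
```

Here accuracy (0.8) is higher than F1 (0.75) because the six inactive pairs contribute five true negatives, which is one reason F1 is the more informative metric when the inactive class dominates.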
A comparison and analysis will be made using the results obtained from the experiments of drug-target interaction (DTI) prediction using various resampling methods in machine learning models as well as DTI prediction in deep learning models without the use of any resampling methods for all 10 cancer-related activity classes selected.

5. Conclusions

In this study, machine learning and deep learning approaches were used to predict drug-target interactions on an imbalanced dataset by comparing different resampling techniques, namely, Synthetic Minority Oversampling Technique (SMOTE), Random Undersampling (RUS), Adaptive Synthetic (ADASYN), BorderlineSMOTE, SVM-SMOTE and SMOTE with Tomek Links (SMOTETomek). The imbalanced dataset consists of 10 different activity classes, all of which are target proteins in cancer. The data collected in this study can be used as a benchmark dataset for predicting drug-target interactions (DTIs) in cancer, especially for identifying and discovering new anticancer drugs in the near future. It was found that the use of Random Undersampling (RUS) in predicting drug-target interactions severely affects the performance of a model, especially when the dataset is highly imbalanced. Although high recall and F1 scores were observed for severely imbalanced activity classes, RUS is considered unreliable. This is because, in drug-target interaction prediction, both the active and inactive data within a dataset are crucial for identifying new drug-target pairs. Hence, results obtained with RUS may be misleading, since most of the data are lost due to undersampling, which renders it an unreliable resampling method for DTI prediction. Conversely, SVM-SMOTE can be used as a go-to resampling method when dealing with imbalanced datasets, especially when it is paired with the Random Forest (RF) and Gaussian Naïve Bayes (GNB) classifiers. With SVM-SMOTE, a consistently high F1 score (over 85%) was recorded for almost all activity classes that are severely and moderately imbalanced. Last but not least, it is also important to note that the deep learning method Multilayer Perceptron (MLP) recorded a consistently high F1 score of over 90% across all activity classes even when no resampling method was applied for DTI prediction.
However, there is still room for additional resampling methods as well as their use with other deep learning and hybrid algorithms in predicting DTIs in cancer.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/molecules28041663/s1, Figure S1: The accuracy, precision and recall values for BRAF; Figure S2: The accuracy, precision and recall values for CDK-6; Figure S3: The accuracy, precision and recall values for HER2; Figure S4: The accuracy, precision and recall values for KRAS; Figure S5: The accuracy, precision and recall values for MEK1; Figure S6: The accuracy, precision and recall values for PD-1; Figure S7: The accuracy, precision and recall values for PDGFR-A; Figure S8: The accuracy, precision and recall values for PDGFR-B; Figure S9: The accuracy, precision and recall values for VEGFR-1; Figure S10: The accuracy, precision and recall values for VEGFR-2; Figure S11: F1 scores for all activity classes with the RF classifier; Figure S12: F1 scores for all activity classes with the GNB classifier.

Author Contributions

Conceptualization, A.K.A.K. and N.H.A.H.M.; methodology, A.K.A.K.; software, A.K.A.K.; validation, N.H.A.H.M.; formal analysis, A.K.A.K.; investigation A.K.A.K.; resources, A.K.A.K. and N.H.A.H.M.; data curation, A.K.A.K.; writing—original draft preparation, A.K.A.K.; writing—review and editing, A.K.A.K. and N.H.A.H.M.; visualization, A.K.A.K.; supervision, N.H.A.H.M.; project administration, N.H.A.H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Ministry of Higher Education (MOHE), Malaysia via the Fundamental Research Grant Scheme (FRGS) FRGS/1/2019/ICT02/USM/02/4 – 203.PKOMP.6711800.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gao, K.Y.; Fokoue, A.; Luo, H.; Iyengar, A.; Dey, S.; Zhang, P. Interpretable Drug Target Prediction Using Deep Neural Representation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3371–3377.
  2. Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Model 1988, 28, 31–36.
  3. Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model 2010, 50, 742–754.
  4. Lo, Y.-C.; Rensi, S.E.; Torng, W.; Altman, R.B. Machine Learning in Chemoinformatics and Drug Discovery. Drug Discov. Today 2018, 23, 1538–1546.
  5. Mitchell, B.O.J. Machine Learning Methods in Chemoinformatics. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2014, 4, 468–481.
  6. Gawehn, E.; Hiss, J.A.; Schneider, G. Deep Learning in Drug Discovery. Mol. Inform. 2016, 35, 3–14.
  7. Unterthiner, T.; Mayr, A.; Klambauer, G.; Steijaert, M.; Wegner, J.K.; Ceulemans, H.; Hochreiter, S. Deep Learning for Drug Target Prediction. In Proceedings of the Conference Neural Information Processing Systems Foundation, Montreal, QC, Canada, 8–13 December 2014.
  8. Mayr, A.; Klambauer, G.; Unterthiner, T.; Steijaert, M.; Wegner, J.K.; Ceulemans, H.; Clevert, D.-A.; Hochreiter, S. Large-Scale Comparison of Machine Learning Methods for Drug Target Prediction on ChEMBL. Chem. Sci. 2018, 9, 5441–5451.
  9. Molinaro, A.M.; Simon, R.; Pfeiffer, R.M. Prediction Error Estimation: A Comparison of Resampling Methods. Bioinformatics 2005, 21, 3301–3307.
  10. Khaldy, M.A.; Kambhampati, C. Resampling Imbalanced Class and the Effectiveness of Feature Selection Methods for Heart Failure Dataset. Int. Robot. Autom. J. 2018, 4, 37–45.
  11. Poolsawad, N.; Kambhampati, C.; Cleland, J.G.F. Balancing Class for Performance of Classification with a Clinical Dataset. In Proceedings of the World Congress on Engineering, London, UK, 2–4 July 2014; Volume 1.
  12. Pliakos, K.; Vens, C. Drug-Target Interaction Prediction with Tree-Ensemble Learning and Output Space Reconstruction. BMC Bioinform. 2020, 21, 49.
  13. Johnson, J.M.; Khoshgoftaar, T.M. Survey on Deep Learning with Class Imbalance. J. Big Data 2019, 6, 27.
  14. Hasanin, T.; Khoshgoftaar, T.M.; Leevy, J.L.; Bauder, R.A. Severely Imbalanced Big Data Challenges: Investigating Data Sampling Approaches. J. Big Data 2019, 6, 107.
  15. Wang, Z.; Yang, H.; Wu, Z.; Wang, T.; Li, W.; Tang, Y.; Liu, G. In Silico Prediction of Blood–Brain Barrier Permeability of Compounds by Machine Learning and Resampling Methods. ChemMedChem 2018, 13, 2189–2201.
  16. Ransohoff, D.F. Rules of Evidence for Cancer Molecular-Marker Discovery and Validation. Nat. Rev. Cancer 2004, 4, 309–314.
  17. Korotcov, A.; Tkachenko, V.; Russo, D.P.; Ekins, S. Comparison of Deep Learning with Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets. Mol. Pharm. 2017, 14, 4462–4475.
  18. Ezzat, A.; Wu, M.; Li, X.-L.; Kwoh, C.-K. Drug-Target Interaction Prediction via Class Imbalance-Aware Ensemble Learning. BMC Bioinform. 2016, 17, 509.
  19. Yaseen, B.T.; Kurnaz, S. Drug–Target Interaction Prediction Using Artificial Intelligence. Appl. Nanosci. 2021.
  20. Gao, D.; Chen, Q.; Zeng, Y.; Jiang, M.; Zhang, Y. Applications of Machine Learning in Drug Target Discovery. Curr. Drug Metab. 2020, 21, 790–803.
  21. Carracedo-Reboredo, P.; Liñares-Blanco, J.; Rodríguez-Fernández, N.; Cedrón, F.; Novoa, F.J.; Carballal, A.; Maojo, V.; Pazos, A.; Fernandez-Lozano, C. A Review on Machine Learning Approaches and Trends in Drug Discovery. Comput. Struct. Biotechnol. J. 2021, 19, 4538–4558.
  22. Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; et al. Applications of Machine Learning in Drug Discovery and Development. Nat. Rev. Drug Discov. 2019, 18, 463–477.
  23. Xu, L.; Ru, X.; Song, R. Application of Machine Learning for Drug–Target Interaction Prediction. Front Genet 2021, 12.
  24. Bagherian, M.; Sabeti, E.; Wang, K.; Sartor, M.A.; Nikolovska-Coleska, Z.; Najarian, K. Machine Learning Approaches and Databases for Prediction of Drug–Target Interaction: A Survey Paper. Brief Bioinform. 2021, 22, 247–269.
  25. Faulon, J.-L.; Misra, M.; Martin, S.; Sale, K.; Sapra, R. Genome Scale Enzyme–Metabolite and Drug–Target Interaction Predictions Using the Signature Molecular Descriptor. Bioinformatics 2008, 24, 225–233.
  26. Ding, Y.; Tang, J.; Guo, F. Identification of Drug-Target Interactions via Multiple Information Integration. Inf. Sci. (N.Y.) 2017, 418–419, 546–560.
  27. Lavecchia, A. Machine-Learning Approaches in Drug Discovery: Methods and Applications. Drug Discov. Today 2015, 20, 318–331.
  28. Patel, L.; Shukla, T.; Huang, X.; Ussery, D.W.; Wang, S. Machine Learning Methods in Drug Discovery. Molecules 2020, 25, 5277.
  29. Madhukar, N.S.; Khade, P.K.; Huang, L.; Gayvert, K.; Galletti, G.; Stogniew, M.; Allen, J.E.; Giannakakou, P.; Elemento, O. A Bayesian Machine Learning Approach for Drug Target Identification Using Diverse Data Types. Nat. Commun. 2019, 10, 5221.
  30. Yao, Z.-J.; Dong, J.; Che, Y.-J.; Zhu, M.-F.; Wen, M.; Wang, N.-N.; Wang, S.; Lu, A.-P.; Cao, D.-S. TargetNet: A Web Service for Predicting Potential Drug–Target Interaction Profiling via Multi-Target SAR Models. J. Comput. Aided Mol. Des. 2016, 30, 413–424.
  31. Li, Z.-C.; Huang, M.-H.; Zhong, W.-Q.; Liu, Z.-Q.; Xie, Y.; Dai, Z.; Zou, X.-Y. Identification of Drug–Target Interaction from Interactome Network with ‘Guilt-by-Association’ Principle and Topology Features. Bioinformatics 2016, 32, 1057–1064.
  32. Yu, H.; Chen, J.; Xu, X.; Li, Y.; Zhao, H.; Fang, Y.; Li, X.; Zhou, W.; Wang, W.; Wang, Y. A Systematic Prediction of Multiple Drug-Target Interactions from Chemical, Genomic, and Pharmacological Data. PLoS ONE 2012, 7, e37608.
  33. Ezzat, A.; Wu, M.; Li, X.; Kwoh, C.-K. Computational Prediction of Drug-Target Interactions via Ensemble Learning. In Methods in Molecular Biology; Humana Press Inc.: New York, NY, USA, 2019; Volume 1903, pp. 239–254.
  34. Chen, H.; Engkvist, O.; Wang, Y.; Olivecrona, M.; Blaschke, T. The Rise of Deep Learning in Drug Discovery. Drug Discov. Today 2018, 23, 1241–1250.
  35. Lavecchia, A. Deep Learning in Drug Discovery: Opportunities, Challenges and Future Prospects. Drug Discov. Today 2019, 24, 2017–2032.
  36. Lipinski, C.F.; Maltarollo, V.G.; Oliveira, P.R.; da Silva, A.B.F.; Honorio, K.M. Advances and Perspectives in Applying Deep Learning for Drug Design and Discovery. Front. Robot. AI 2019, 6.
  37. Abbasi, K.; Razzaghi, P.; Poso, A.; Ghanbari-Ara, S.; Masoudi-Nejad, A. Deep Learning in Drug Target Interaction Prediction: Current and Future Perspectives. Curr. Med. Chem. 2021, 28, 2100–2113.
  38. Rifaioglu, A.S.; Atas, H.; Martin, M.J.; Cetin-Atalay, R.; Atalay, V.; Doğan, T. Recent Applications of Deep Learning and Machine Intelligence on in Silico Drug Discovery: Methods, Tools and Databases. Brief Bioinform. 2019, 20, 1878–1912.
  39. Hasan Mahmud, S.M.; Chen, W.; Jahan, H.; Dai, B.; Din, S.U.; Dzisoo, A.M. DeepACTION: A Deep Learning-Based Method for Predicting Novel Drug-Target Interactions. Anal. Biochem. 2020, 610, 113978.
  40. Lee, I.; Keum, J.; Nam, H. DeepConv-DTI: Prediction of Drug-Target Interactions via Deep Learning with Convolution on Protein Sequences. PLoS Comput. Biol. 2019, 15, e1007129.
  41. Dara, S.; Dhamercherla, S.; Jadav, S.S.; Babu, C.M.; Ahsan, M.J. Machine Learning in Drug Discovery: A Review. Artif. Intell. Rev. 2022, 55, 1947–1999.
  42. Wang, H.; Zhou, G.; Liu, S.; Jiang, J.-Y.; Wang, W. Drug-Target Interaction Prediction with Graph Attention Networks. arXiv 2021, arXiv:2107.06099.
  43. Tayebi, A.; Yousefi, N.; Yazdani-Jahromi, M.; Kolanthai, E.; Neal, C.; Seal, S.; Garibay, O. UnbiasedDTI: Mitigating Real-World Bias of Drug-Target Interaction Prediction by Using Deep Ensemble-Balanced Learning. Molecules 2022, 27, 2980.
  44. Google Developers. Imbalanced Data | Data Preparation and Feature Engineering for Machine Learning. Available online: https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data (accessed on 12 July 2022).
  45. Gilson, M.K.; Liu, T.; Baitaluk, M.; Nicola, G.; Hwang, L.; Chong, J. BindingDB in 2015: A Public Database for Medicinal Chemistry, Computational Chemistry and Systems Pharmacology. Nucleic Acids Res. 2016, 44, D1045–D1053.
  46. Charlton, P.; Spicer, J. Targeted Therapy in Cancer. Medicine 2016, 44, 34–38.
  47. Mohamed, A.; Krajewski, K.; Cakar, B.; Ma, C.X. Targeted Therapy for Breast Cancer. Am. J. Pathol. 2013, 183, 1096–1112.
  48. Chan, B.A.; Hughes, B.G.M. Targeted Therapy for Non-Small Cell Lung Cancer: Current Standards and the Promise of the Future. Transl. Lung Cancer Res. 2015, 4, 36–54.
  49. Mazanetz, M.P.; Marmon, R.J.; Reisser, C.B.T.; Morao, I. Drug Discovery Applications for KNIME: An Open Source Data Mining Platform. Curr. Top Med. Chem. 2012, 12, 1965–1979.
  50. Landrum, G.; Tosco, P.; Kelley, B.; Sriniker; Gedeck; Vianello, R.; Nadine, S.; Kawashima; Dalkel. RDKit: Open-Source Chemoinformatics. 2021. Available online: https://zenodo.org/record/5773460#.Y-Sf3HbMJPY (accessed on 8 April 2022).
  51. Ismail, H.; Ahamed Hassain Malim, N.H.; Mohamad Zobir, S.Z.; Wahab, H.A. Comparative Studies On Drug-Target Interaction Prediction Using Machine Learning and Deep Learning Methods With Different Molecular Descriptors. In Proceedings of the 2021 International Conference of Women in Data Science at Taif University (WiDSTaif), Taif, Saudi Arabia, 30–31 March 2021; pp. 1–6.
  52. Feng, Q.; Dueva, E.; Cherkasov, A.; Ester, M. PADME: A Deep Learning-Based Framework for Drug-Target Interaction Prediction. arXiv 2018, arXiv:1807.09741.
  53. Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43, 493–500.
  54. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  55. Lemaitre, G.; Nogueira, F.; Aridas, C.K. Imbalanced-Learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. J. Mach. Learn. Res. 2016, 18, 559–563.
  56. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
  57. Han, H.; Wang, W.-Y.; Mao, B.-H. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3644, pp. 878–887.
  58. Nguyen, H.M.; Cooper, E.W.; Kamei, K. Borderline Over-Sampling for Imbalanced Data Classification. Int. J. Knowl. Eng. Soft Data Paradig. 2011, 3, 4.
  59. Batista, G.E.A.P.A.; Bazzan, A.L.C.; Monard, M.C. Balancing Training Data for Automated Annotation of Keywords: A Case Study. Second Brazilian Workshop on Bioinformatics 2003, 2, 10–18.
  60. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Müller, A.; Nothman, J.; Louppe, G.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2012, 12, 2825–2830.
  61. Agrawal, T. Hyperparameter Optimization in Machine Learning; Apress: Berkeley, CA, USA, 2021; ISBN 978-1-4842-6578-9.
  62. Wang, C.; Wang, W.; Lu, K.; Zhang, J.; Chen, P.; Wang, B. Predicting Drug-Target Interactions with Electrotopological State Fingerprints and Amphiphilic Pseudo Amino Acid Composition. Int. J. Mol. Sci. 2020, 21, 5694.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
