A Novel Autoencoder-Based Feature Selection Method for Drug-Target Interaction Prediction with Human-Interpretable Feature Weights

Ozsert Yigit, Gozde; Baransel, Cesur

doi:10.3390/sym15010192

by Gozde Ozsert Yigit * and Cesur Baransel

Department of Computer Engineering, Gaziantep University, 27410 Gaziantep, Turkey

* Author to whom correspondence should be addressed.

Symmetry 2023, 15(1), 192; https://doi.org/10.3390/sym15010192

Submission received: 13 December 2022 / Revised: 5 January 2023 / Accepted: 6 January 2023 / Published: 9 January 2023

Abstract

Drug-target interaction prediction provides important information that could be exploited for drug discovery, drug design, and drug repurposing. Chemogenomic approaches for predicting drug-target interaction assume that similar receptors bind to similar ligands. Capturing this similarity in so-called “fingerprints” and combining the target and ligand fingerprints provide an efficient way to search for protein-ligand pairs that are more likely to interact. In this study, we constructed drug and target fingerprints by employing features extracted from DrugBank. However, the number of extracted features is quite large, necessitating an effective feature selection mechanism, since some features can be redundant or irrelevant to drug-target interaction prediction problems. Although such feature selection methods are readily available in the literature, they usually act as black boxes and do not provide any quantitative information about why a specific feature is preferred over another. To alleviate this lack of human interpretability, we proposed a novel feature selection method in which we used an autoencoder as a symmetric learning method, and we compared the proposed method to some popular feature selection algorithms, such as Kbest, Variance Threshold, and Decision Tree. The results of a detailed performance study, in which we trained six Multi-Layer Perceptron (MLP) networks of different sizes and configurations for prediction, demonstrate that the proposed method yields superior results compared to the aforementioned methods.

Keywords:

drug-target interaction prediction; autoencoder; multi-layer perceptron; dimensionality reduction; feature selection; DrugBank

1. Introduction

Drugs are chemicals of either natural or synthetic origin that are used to prevent, treat, and diagnose disease. Drugs act on so-called targets in the body, which are macromolecules with an established function in the pathophysiology of a disease. There are four major classes of drug targets in organisms, namely proteins (including receptors and enzymes), nucleic acids (including DNA and RNA), carbohydrates, and lipids. For various reasons that will not be discussed here, the majority of drugs on the market target proteins [1].

Receptors are flexible protein molecules scattered across the surface of target cells. Small drug-like molecules, called ligands, bind to these receptors to form a drug-protein complex, which in turn controls the function of the receptors to produce a therapeutic physiological response [2]. The chemical signals that the receptor receives from the ligand alter the protein conformation and cellular functions at the binding site through a process referred to as drug-target interaction (DTI). Ligands generally form noncovalent bonds and have transient interactions with unique binding sites whose shapes are complementary to their ligands. Therefore, knowledge about a ligand binding site conveys critical information to the drug designer regarding the factors that allow a very specific ligand to bind with high affinity and specificity. It also improves the prediction of protein–ligand and protein–protein interactions (PPI) [3].

Developing a new drug molecule has always been a long, multi-dimensional, and costly process. Reportedly, out of 5000–10,000 compounds examined, only one molecule reaches the market [4]. Over the years, protein–protein interactions have been widely studied through various experimental techniques such as NMR (Nuclear Magnetic Resonance), spectrography, crystallography, and cryo-electron microscopy. The relevant data are stored in popular databases such as the PDB (Protein Data Bank). Detailed physicochemical and biological properties of many ligand molecules are also available in various chemical compound databases. Today, many other databases containing huge amounts of data are at the disposal of researchers and could be exploited for drug discovery, drug design, and drug repurposing. Consequently, we have witnessed significant developments in the field of Computer-Aided Drug Design (CADD), which, broadly stated, aims to predict whether a given drug molecule will bind to a target and, if so, how strongly. There is a plethora of methods [5,6] for predicting drug-target interactions. Of special interest to us in this paper are the methods based on the structural similarity between the drug and the target. These methods create a subspace combining the chemical structural spaces of drugs and proteins in which possible interactions are predicted [7].

In [8], the authors provide “A comprehensive review of feature-based methods for drug target interaction prediction”. Regarding the limitations of feature-based methods, the paper states “…The process of calculating and extracting features is time-consuming which increases the complexity. Moreover, the process of selecting and extracting optimal features is a complex task. The features calculated for the drugs and the targets need to be evaluated to find the optimal set of features that provide greater accuracy. But, various feature selection and feature extraction methods are complex adding to the complexity of the method”. In this paper, we address this issue and propose an autoencoder-based method for selecting the most representative features of drugs and targets for drug-target interaction prediction. Consequently, a researcher can reduce the dimensionality of the original dataset without having to worry about leaving out a more important feature in favor of a less useful one. The proposed method also addresses the problem of class imbalance, which is known to affect the performance of drug-target interaction prediction methods [9]. Imbalanced data refers to classification problems in which the classes are not represented equally; we observed that the number of samples in the positive class was far smaller than the number of samples in the negative class. The class imbalance problem arises whenever the numbers of available samples for the classes differ significantly. In this case, the accuracy of classification methods that assume a balanced representation of the classes deteriorates, and these methods tend to misclassify instances of the underrepresented class.

Table 1 lists relevant studies from the literature; the studies that use autoencoders for feature selection in DTI prediction are grouped separately.

The majority of the studies in the table are black-box deep-learning methods, whereas autoencoder-based methods are comparatively rare. Our study differs from the studies in Table 1 in two fundamental respects. First, we propose a white-box method. Second, for feature selection we use the original feature values themselves rather than the transformed values of the latent space representation produced by the autoencoder. Consequently, we can select a subset of the original features based on their importance, as indicated by the weights between the input layer and the first hidden layer of the autoencoder after training. Previous studies ([23,24] in Table 1) that use autoencoders for feature selection rely on latent space representations; thus, their results are less human-interpretable than ours, which provide a complete ranking of all the original features by importance. The proposed method yields good prediction performance at modest computational complexity and cost and is envisioned to contribute to the efforts of health institutions in drug discovery.

This paper is organized as follows. Section 2 introduces the proposed method. Section 3 presents a performance evaluation study, Section 4 discusses its results, and Section 5 concludes the paper.

2. The Proposed Method

The methodology we employed to develop the proposed drug-target interaction prediction method is shown in Figure 1. Accordingly, we introduce our approach in four consecutive steps:

First, we explain how we constructed drug-target feature vectors and how we extracted and transformed relevant data from the available databases.
Then, we explain how we used these vectors in training an autoencoder that allows us to reduce the dimensionality of the feature vectors.
Next, we train a classification network for predicting whether a given drug and a target are likely to interact.
Lastly, we experiment with different network structures and discuss the outcomes of these experiments.

2.1. Construction of Drug-Target Interaction Vectors

First, we construct the drug feature vectors [d₁, d₂, …, d_n] and the target feature vectors [f₁, f₂, …, f_m] independently. Then, for every drug-target pair, we concatenate the two vectors to obtain a drug-target pair vector of length n + m, [d₁, d₂, …, d_n, f₁, f₂, …, f_m], which is used as input to both the autoencoder and the classifier. Since the data are not readily available in the required format in a single database, they have to be collected, transformed, and combined appropriately.

Our primary data source in this study is the DrugBank database, a comprehensive, freely accessible online database containing information on drugs and targets. The latest release of DrugBank [25] contains 13,431 drug entries, including 2617 approved small-molecule drugs and 1345 approved biotech (protein/peptide) drugs. DrugBank data are indexed by DrugBank Identification Numbers; for most, but not all, entries, a query returns the structure of the item in the so-called canonical SMILES (SMI) format [26]. Filtering out the entries with no SMI representation left us with 5771 drugs. SMI representations cannot be used directly as input vectors to an autoencoder. Therefore, we employed the Rcpi (Molecular Informatics Toolkit for Compound-Protein Interaction in Drug Discovery) [27] library in RStudio to perform the necessary conversions. We used eight feature descriptors to extract drug features, namely, Drug Amino Acid Count (20 features), Drug ALOGP (3 features), Drug Atom Count (1 feature), Drug Aromatic Atoms Count (1 feature), Drug Apol (1 feature), Drug Autocorrelation Charge (5 features), Drug Autocorrelation Mass (5 features), and Drug Autocorrelation Polarizability (5 features). Consequently, each of the 5771 drugs at our disposal is represented by 41 features (Figure 2).

After extracting drug features with Rcpi, we extracted target features from the target sequences using PROFEAT, employing three feature descriptors, namely, Composition, Transition, and Distribution (CTD, 504 features), Amino Acid Composition (AAC, 20 features), and Pseudo-Amino Acid Composition (PAAC, 150 features), which yields 674 features in total for each of the 4845 targets.
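The descriptor sizes listed above fix the input width used later for the autoencoder. The Python sketch below tallies them and builds a column-to-descriptor lookup for a pair vector, which helps when reading a ranked feature list. The column ordering is an assumption not stated in the text: the drug block comes first, as in the concatenation [d₁, …, d_n, f₁, …, f_m], and the descriptors appear in the order listed. The helper column_origins is hypothetical.

```python
# Descriptor sizes as listed in the text. ASSUMPTION: drug block first,
# descriptors in the listed order within each block.
DRUG_DESCRIPTORS = [
    ("Drug Amino Acid Count", 20),
    ("Drug ALOGP", 3),
    ("Drug Atom Count", 1),
    ("Drug Aromatic Atoms Count", 1),
    ("Drug Apol", 1),
    ("Drug Autocorrelation Charge", 5),
    ("Drug Autocorrelation Mass", 5),
    ("Drug Autocorrelation Polarizability", 5),
]
TARGET_DESCRIPTORS = [("CTD", 504), ("AAC", 20), ("PAAC", 150)]

n_drug = sum(size for _, size in DRUG_DESCRIPTORS)      # 41 drug features
n_target = sum(size for _, size in TARGET_DESCRIPTORS)  # 674 target features
n_input = n_drug + n_target                             # 715 autoencoder inputs


def column_origins():
    """Return one (block, descriptor) tuple per column of a drug-target pair vector."""
    origins = []
    for block, descriptors in (("drug", DRUG_DESCRIPTORS), ("target", TARGET_DESCRIPTORS)):
        for name, size in descriptors:
            origins.extend([(block, name)] * size)
    return origins
```

For instance, under this ordering column 41 is the first CTD feature of the target block.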

By pairing each of the 5771 drug vectors with each of the 4845 target vectors, we obtain 5771 × 4845 = 27,960,495 distinct pair vectors. Examples of the drug feature matrix, the target feature matrix, and the concatenated drug-target matrix are provided in Table 2, Table 3, and Table 4, respectively. The vector (D₁, D₂, …, D₅₇₇₁) contains the drugs, and the vector (F_D₁(i), F_D₂(i), …, F_D₄₁(i)) contains the feature values of drug D_i. Similarly, (T₁, T₂, …, T₄₈₄₅) is the target vector, and (F_T₁(j), F_T₂(j), …, F_T₆₇₄(j)) represents the features of target T_j. The Drug-Target Interaction Matrix therefore contains 27,960,495 rows, which form the input data of the autoencoder.
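Materializing this matrix in memory is costly: 27,960,495 rows × 715 columns is about 2.0 × 10¹⁰ values, roughly 160 GB as 64-bit floats or 80 GB as 32-bit floats. The sketch below is our illustration, not the authors' code (the text does not say how the matrix was stored), and pair_batches is a hypothetical helper. It generates the concatenated rows in batches on demand, mapping pair index p to drug p // n_targets and target p mod n_targets.

```python
import numpy as np

N_PAIRS = 5771 * 4845  # 27,960,495 drug-target pairs, as in the text


def pair_batches(drug_feats, target_feats, batch_size=1024):
    """Yield (pair_indices, rows) where each row is [drug features | target features].

    Pair index p corresponds to drug p // n_targets and target p % n_targets,
    so the full (n_drugs * n_targets) x (n + m) matrix is never held in memory."""
    drug_feats = np.asarray(drug_feats, dtype=np.float32)
    target_feats = np.asarray(target_feats, dtype=np.float32)
    n_targets = len(target_feats)
    total = len(drug_feats) * n_targets
    for start in range(0, total, batch_size):
        p = np.arange(start, min(start + batch_size, total))
        d_idx, t_idx = np.divmod(p, n_targets)
        yield p, np.hstack([drug_feats[d_idx], target_feats[t_idx]])


# Toy check: 3 drugs with 2 features, 4 targets with 3 features -> 12 rows of length 5.
drugs = np.arange(6).reshape(3, 2)
targets = np.arange(100, 112).reshape(4, 3)
rows = np.vstack([batch for _, batch in pair_batches(drugs, targets, batch_size=5)])
```

Each batch can be fed directly to a mini-batch training loop, so the memory footprint is set by batch_size rather than by the number of pairs.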

Next, we need to label each entry in the Drug-Target Interaction Matrix either as an interacting pair (marked as 1) or as a non-interacting pair (marked as 0) so that the data can be used to train the classifier. However, the entries with no interaction far outnumber the entries with an interaction, which leads to the class imbalance problem that directly affects the generalization ability of machine learning algorithms [9]. There are two basic approaches to handling class imbalance. The first is to add copies of instances from the under-represented class (called oversampling or, more formally, sampling with replacement), and the second is to delete instances from the over-represented class (called undersampling). Machine learning-based studies generally use a random sampling approach in which samples are randomly chosen from the majority class until both classes contain the same number of samples [11,28]. However, studies that do not attempt to alleviate the imbalance problem also exist in the literature [4–29].

We employed oversampling in this paper, using the Scikit-Learn library. Two parameters must be set for oversampling, namely, replace (whether samples are drawn with replacement, which must be enabled so that minority-class instances can be duplicated) and n_samples (the number of samples to draw from the minority class to create a balanced dataset).
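A minimal sketch of minority-class oversampling with scikit-learn's sklearn.utils.resample, which is the function that takes the replace and n_samples parameters (we assume this is the function meant above). The helper name oversample_minority is hypothetical, and the sketch assumes a binary label vector.

```python
import numpy as np
from sklearn.utils import resample


def oversample_minority(X, y, random_state=0):
    """Balance a binary dataset by drawing minority-class rows with replacement
    until the minority class is as large as the majority class."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    if counts.min() == counts.max():
        return X, y  # already balanced
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    X_min, y_min = X[y == minority], y[y == minority]
    X_maj, y_maj = X[y == majority], y[y == majority]
    X_up, y_up = resample(X_min, y_min,
                          replace=True,            # sample with replacement
                          n_samples=len(y_maj),    # match the majority class size
                          random_state=random_state)
    return np.vstack([X_maj, X_up]), np.concatenate([y_maj, y_up])


# Toy example: 95 non-interacting (0) and 5 interacting (1) pairs.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 95 + [1] * 5)
X_bal, y_bal = oversample_minority(X, y)
```

When this is combined with a train/test split, resampling only the training portion avoids placing copies of the same minority row in both sets; the text above does not say at which stage oversampling was applied.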

2.2. The Modified Autoencoder

In the simplest form, an autoencoder is a symmetric neural network that is trained to copy its input to its output, as closely as possible, by using a hidden layer (Figure 3).

For data compression, we constrain the hidden layer so that the number of neurons in the hidden layer is smaller than the number of neurons in the input layer. This constraint forces the autoencoder to capture the most salient features of the input data in a space of lower dimensionality than the input space. An autoencoder whose internal representation has a smaller dimensionality than the input data is known as an under-complete autoencoder.

After training, the under-complete autoencoder reconstructs the input at the output such that the value of each output node x̂_i is as close as possible to the value of the corresponding input node x_i. Note that, in Figure 3, dimensionality reduction occurs because the three-dimensional input vector (x₁, x₂, x₃) is mapped to a compressed two-dimensional representation (h₁, h₂), called the latent space representation. Not all input nodes contribute equally to the latent space representation. For example, in the extreme case of w(3,1) = w(3,2) = 0, feature x₃ has no influence on the latent representation, so we can regard it as redundant and eliminate it from the input dataset without any adverse effect on further processing. This is the essence of our approach to reducing the dimensionality of drug-target interaction vectors in this paper: we train an autoencoder network on drug-target pair vectors, order the features by decreasing importance according to the weights accumulated during training, and eliminate the features associated with relatively small weights.

During the training, we feed the input data (the rows of the Drug-Target Pair Matrix) in batches to the autoencoder and keep track of the weight accumulation on every link connecting a node in the input layer to a node in the hidden layer. If the input layer has n nodes, the hidden layer has m nodes, and the accumulated weight of the link between the input node (i) and the hidden layer node (j) is w(i,j) after the training, then the weight W(i) of the node (i) in the input layer (i.e., the weight of the feature it represents) is:

W(i) = \frac{1}{m} \sum_{j=1}^{m} w(i, j) \tag{1}

Then, we construct the vector of feature weights F = (W(1), …, W(n)). Sorting this vector in descending order gives researchers a quantitative way to compare the relative importance of one feature to another and to pick the most important features to work with.
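Eq. (1) and the ranking step reduce to a row mean and a sort of the input-to-hidden weight matrix. The NumPy sketch below assumes that w is stored as an (n × m) array with w(i, j) in row i and column j; the helper names are ours, and the toy weights are invented to mirror the Figure 3 example, in which both links leaving x₃ have zero weight. Because it follows Eq. (1) literally, positive and negative weights on the same input can offset each other.

```python
import numpy as np


def feature_weights(w):
    """Eq. (1): W(i) = (1/m) * sum_j w(i, j) for an (n x m) input-to-hidden weight matrix."""
    return np.asarray(w, dtype=float).mean(axis=1)


def rank_features(w):
    """Return feature indices sorted by decreasing W(i), plus the W(i) values."""
    W = feature_weights(w)
    order = np.argsort(-W, kind="stable")  # descending; ties keep the original order
    return order, W


def select_top(X, w, n_keep):
    """Keep the n_keep columns of X with the largest W(i); N is the user's choice."""
    order, _ = rank_features(w)
    keep = np.sort(order[:n_keep])  # restore the original column order
    return X[:, keep], keep


# Figure 3-style toy example: 3 inputs, 2 hidden nodes, both weights of x3 are zero.
w = np.array([[0.5, 0.3],
              [0.1, 0.1],
              [0.0, 0.0]])
order, W = rank_features(w)
```

Here W evaluates to (0.4, 0.1, 0.0), so x₃ is ranked last and is the first candidate for elimination.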

The proposed feature selection method employs a simple autoencoder with a single hidden layer of 13 nodes. The input layer has 715 nodes (41 drug features plus 674 target features, as explained in Section 2.1). The encoder and decoder use ReLU and sigmoid activation functions, respectively, and the loss function is binary cross-entropy. We experimented with two batch sizes, 25 and 100, of which the batch size of 25 delivered better results (Table 5). The autoencoder is trained on the input vectors until convergence.
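The configuration above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the authors' implementation, since the text does not name the framework, optimizer, or learning rate. The sketch uses mini-batch gradient descent with a fixed learning rate and a fixed number of epochs instead of a convergence test, and it assumes the inputs are min-max scaled to [0, 1], which binary cross-entropy requires of its targets. It returns the input-to-hidden weight matrix on which Eq. (1) operates.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(x, out, eps=1e-7):
    """Binary cross-entropy averaged over samples and features."""
    out = np.clip(out, eps, 1.0 - eps)
    return float(-np.mean(x * np.log(out) + (1.0 - x) * np.log(1.0 - out)))


def train_autoencoder(X, n_hidden=13, batch_size=25, epochs=20, lr=1.0, seed=0):
    """Train a ReLU-encoder / sigmoid-decoder autoencoder on X scaled to [0, 1].

    ASSUMPTIONS: plain SGD, fixed lr and epoch count. Returns the
    (n_features x n_hidden) input-to-hidden weights and the full-data
    reconstruction loss after each epoch."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(0.0, np.sqrt(2.0 / n_features), (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        perm = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            xb = X[perm[start:start + batch_size]]
            z1 = xb @ W1 + b1
            h = np.maximum(z1, 0.0)          # ReLU encoder
            out = sigmoid(h @ W2 + b2)       # sigmoid decoder
            # Gradient of the mean BCE w.r.t. the decoder pre-activation
            g2 = (out - xb) / xb.size
            g1 = (g2 @ W2.T) * (z1 > 0)      # back-propagate through the ReLU
            W2 -= lr * (h.T @ g2)
            b2 -= lr * g2.sum(axis=0)
            W1 -= lr * (xb.T @ g1)
            b1 -= lr * g1.sum(axis=0)
        h = np.maximum(X @ W1 + b1, 0.0)
        losses.append(bce(X, sigmoid(h @ W2 + b2)))
    return W1, losses


# Usage on random stand-in data (real inputs would be the 715-column pair vectors).
rng = np.random.default_rng(1)
X_demo = rng.random((200, 20))
W1, losses = train_autoencoder(X_demo, n_hidden=5, epochs=30)
feature_importance = W1.mean(axis=1)   # Eq. (1) applied to the learned weights
```

In practice a framework with an adaptive optimizer would converge faster; the plain-NumPy version is kept only to make every gradient step explicit.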

The following pseudocode (Figure 4) summarizes the steps of the proposed algorithm.

Note that we used the rows of the Drug-Target Pair Matrix for training and subsequent feature selection. Alternatively, we could have applied the same technique to drugs and targets independently and performed feature selection before constructing the Drug-Target Pair Matrix. The rationale behind our preference is as follows. Previous studies confirmed that small variations in protein sequence only slightly affect protein 3D structure and that, despite accumulated mutations, specific intra- and inter-molecular interactions in protein families and superfamilies are conserved. In other words, protein structure is more conserved than protein sequence, and proteins with a common ancestor generally have similar 3D structures [30]. Thus, we assume that the aforementioned intra- and inter-molecular interactions manifest themselves in the feature values, and the possibility of a significant relationship involving an individually insignificant drug feature and/or target feature should not be dismissed a priori.

2.3. Ablation Study for the Classifier

In general, drug-target interaction prediction is a binary classification problem in which interacting drug-target pairs form the positive class and non-interacting pairs form the negative class. We use Multi-Layer Perceptrons (MLPs) for classification in the drug-target interaction prediction problem. The accuracy of an MLP is closely related to the number of hidden layers and the number of neurons in each hidden layer [31]. Without a proper setting, an MLP may either be insufficiently discriminative or easily get stuck in local minima. In the absence of hard and fast rules for setting the number of hidden layers and the number of nodes in them, these numbers are generally determined by trial and error.

We set aside 20% of the available data for testing and experimented with six different MLP structures for performance evaluation (Table 6). We consider architectures s1–s4 comparatively small networks, whereas architectures s5 and s6 are relatively large. All architectures in this ablation study use two hidden layers.
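The evaluation protocol can be illustrated with scikit-learn, as a sketch under assumptions: the text does not state which library implements the MLPs, the hidden layer sizes (64, 32) are placeholders for the architectures s1–s6 in Table 6, and the data are synthetic. evaluate_mlp is a hypothetical helper that holds out a 20% test split (stratified, which is our choice), fits a two-hidden-layer MLPClassifier, and reports test accuracy and F1-score.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def evaluate_mlp(X, y, hidden_layer_sizes=(64, 32), seed=0):
    """Hold out 20% of the rows for testing and report test accuracy and F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    clf = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes,  # two hidden layers
                        max_iter=500, random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    return accuracy_score(y_test, y_pred), f1_score(y_test, y_pred)


# Synthetic stand-in data: 400 rows, 30 features, label depends on two features.
rng = np.random.default_rng(0)
X = rng.random((400, 30))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
acc, f1 = evaluate_mlp(X, y)
```

Comparing architectures then amounts to calling evaluate_mlp with different hidden_layer_sizes on the same feature-selected data.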

Other important parameters of the MLP classification models and their values are given in Table 7.

3. Performance Evaluation

3.1. The Performance Metric

In this section, we use seven feature selection algorithms and compare the drug-protein interaction prediction results obtained on six MLP structures, using accuracy and F-score as performance metrics. Accuracy alone is not a good metric when the dataset is class-imbalanced. The F1-score is a better metric in this setting because it considers not only the number of prediction errors but also the type of errors that the model makes. It combines the precision and recall of a classifier into a single metric by taking their harmonic mean and is commonly used to compare the performance of classifiers. The highest possible F-score is 1.0, indicating perfect precision and recall. The Scikit-Learn library provides the f1_score function for calculating F-scores. The formulas for accuracy, precision, and recall are given below:

\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}

\text{precision} = \frac{TP}{TP + FP} \tag{3}

\text{recall} = \frac{TP}{TP + FN} \tag{4}

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. For further details, see [32].
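To make the point about accuracy concrete, the pure-Python sketch below computes Eqs. (2)–(4) and the F1-score, F1 = 2 · precision · recall / (precision + recall), from confusion-matrix counts. The counts are invented for illustration, and treating an undefined ratio (0/0) as 0.0 is a convention chosen for this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall (Eqs. 2-4) and F1, the harmonic mean of
    precision and recall. Undefined ratios (0/0) are reported as 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# 10 interacting and 990 non-interacting pairs.
# Classifier A never predicts an interaction:
metrics_a = classification_metrics(tp=0, tn=990, fp=0, fn=10)
# Classifier B flags 20 pairs, 8 of them correctly:
metrics_b = classification_metrics(tp=8, tn=978, fp=12, fn=2)
```

Classifier A reaches 99% accuracy yet has an F1-score of 0, whereas classifier B has slightly lower accuracy (98.6%) but an F1-score of about 0.53, which is why F-scores are reported alongside accuracy.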

3.2. Feature Selection Approaches

In this section, we empirically compare seven feature selection algorithms for drug-protein interaction prediction. As a baseline, we first performed no feature selection on the dataset (No FS), which allows the six MLP networks to be compared with one another on the same data.

The next three methods are univariate feature selection methods, which use statistical tests to choose the features that have the strongest relationship with the output variable. They analyze each feature separately to determine the strength of its relationship with the response variable [33]. The SelectKBest class from the Scikit-Learn library was used with the chi-squared (chi2) score function to select the top-scoring 10, 50, and 100 features from the dataset.

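    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2

    # X: non-negative feature matrix (chi2 requires X >= 0), y: interaction labels,
    # n_feature: number of features to keep (10, 50, or 100 in this study)
    test = SelectKBest(score_func=chi2, k=n_feature)
    fit = test.fit(X, y)
    # summarize scores
    np.set_printoptions(precision=3)
    # print(fit.scores_)
    features = fit.transform(X)  # columns of X with the n_feature highest chi2 scores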

The selection of representative cases is performed according to the results of a grid search, which is summarized in Table 8.

In Table 8, bold numbers indicate the top accuracy values for a given architecture, and the underlined numbers indicate the second best. Accordingly, we selected the Kbest 10, Kbest 50, and Kbest 100 as the representatives of the univariate feature selection methods for comparison.

The VarianceThreshold method removes all features whose variance does not exceed a given threshold. With the default threshold of 0, only zero-variance features, i.e., features that have the same value in all samples, are removed. Here, we used the VarianceThreshold method from the Scikit-Learn library with default values [34].

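    from sklearn.feature_selection import VarianceThreshold

    # Default threshold 0.0: drop only features that are constant across all samples
    selector = VarianceThreshold()
    fit = selector.fit(X)  # unsupervised; the labels y are not needed
    features = fit.transform(X)
    _, k = features.shape  # k = number of retained features
    n_feature = k
    features_index = fit.get_support(indices=True)  # column indices of retained features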

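The Scikit-Learn library uses the Classification and Regression Tree (CART) algorithm to train decision trees. CART is a greedy algorithm: it searches for the best split at the top level and then repeats the process at each subsequent level. By default, Scikit-Learn uses Gini impurity as the split criterion [35]. In the code below, an ExtraTreesClassifier (an ensemble of extremely randomized trees, which draws candidate split thresholds at random instead of searching for the optimal one) provides impurity-based feature importances, and SelectFromModel keeps the features whose importance reaches the selection threshold.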

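    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.feature_selection import SelectFromModel

    # Ensemble of 50 extremely randomized trees (Gini impurity by default)
    clf = ExtraTreesClassifier(n_estimators=50)
    clf = clf.fit(X, y)
    # clf.feature_importances_ holds the impurity-based importance of each feature
    model = SelectFromModel(clf, prefit=True)  # keeps features with importance >= mean importance
    features = model.transform(X)
    _, n_feature = features.shape
    features_index = model.get_support(indices=True)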

The last two methods represent the proposed autoencoder (AE) method. We performed a grid search to select the best-performing autoencoder case, starting with a step size of 50 and then reducing it to 10 to locate good candidates more precisely (Table 9). If the next accuracy result was not better than the current one, we retained the current candidate. For example, 150 is a good candidate, but increasing the neuron size to 160 yields better performance. However, 170 is not better than 160; thus, 160 was marked as a viable choice before proceeding to a neuron size of 200 with the larger step size.
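The stepping rule above can be sketched as follows. This is our reading of the procedure, not the authors' code: coarse_to_fine_search and the evaluate callback are hypothetical, with evaluate(n) standing for training and testing the classifier on the n top-ranked autoencoder features and returning a score such as accuracy. The toy scores reproduce the 150/160/170 example from the text.

```python
def coarse_to_fine_search(evaluate, start, stop, coarse_step=50, fine_step=10):
    """Coarse-to-fine search over the number of selected features.

    For each coarse point c, step forward by fine_step while the score keeps
    improving (stopping before the next coarse point); the last improving size
    is kept as the viable candidate for c. Returns all viable candidates and
    the best (size, score) pair among them."""
    viable = {}
    for c in range(start, stop + 1, coarse_step):
        best_n, best_score = c, evaluate(c)
        n = c + fine_step
        while n < c + coarse_step and n <= stop:
            score = evaluate(n)
            if score <= best_score:   # not better: retain the current candidate
                break
            best_n, best_score = n, score
            n += fine_step
        viable[c] = (best_n, best_score)
    best = max(viable.values(), key=lambda pair: pair[1])
    return viable, best


# Toy scores: 160 beats 150, and 170 does not beat 160.
toy_scores = {150: 0.80, 160: 0.82, 170: 0.81}
viable, best = coarse_to_fine_search(lambda n: toy_scores.get(n, 0.0), 150, 200)
```

With these toy scores, the coarse point 150 yields 160 as its viable candidate, matching the example in the text.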

Grid search results indicate the AE 350 case as the best across all of the architectures.

We summarize the accuracy and F-score values for all feature selection methods and for all structures in Table 10, where the bold values represent the best accuracy and the underlined values represent the second-best accuracy values amongst the methods under each structure.

The accuracies in Table 10 are also provided graphically in Figure 5.

4. Results and Discussion

The performance study yields the following observations.

All methods delivered consistently better results as the MLP structures became larger in width and depth. The proposed approach was no exception.
When we used all 715 features without elimination, the performance was competitive (second best on four of the six MLP structures). However, the proposed approach performed better with only 350 features, in terms of both accuracy and F-scores.
The proposed method delivers the best accuracy values with 350 features out of 715, and with larger sizes of MLP. We surmise that the capacity of the autoencoder was not sufficient for proper encoding on smaller networks.
The input data are heavily imbalanced, with fewer than approximately 13,000 interacting pairs versus about 27 million non-interacting pairs. Although we used oversampling to alleviate this problem, the adverse effects of class imbalance persist, as indicated by the very low F-scores. In this regard, AE 360 delivered the best F-scores (albeit still small, 0.1), outranking the other methods by about 0.003 on average.

5. Conclusions

In this paper, we employed the weights of the trained autoencoder, rather than the outputs of the output layer, for feature selection. After training the autoencoder, the weights between the input layer and the first hidden layer are summed, averaged, and sorted. These sorted weights supply useful information about the relative importance of each input feature for dimension reduction. The N input features with the highest importance were selected and an MLP network was used to evaluate the performance of the selected features in classification, where the value of N is left to the discretion of the user.

Note that we distinguish between classification and feature selection tasks in this paper and focus on feature selection rather than classification. Although the classification performance itself was poor, the test results clearly illustrate the effectiveness of using an autoencoder as a feature selection algorithm. Better classification results could be obtained by changing the structure of the classification network or by using different classification approaches. Since the proposed method yielded competitive results in general, we expect that it can be a promising feature selection algorithm for the DTI prediction problem.

Additionally, note that we did not sacrifice human interpretability for performance in feature selection. This facet of the DTI prediction problem is clearly of practical importance, yet it is seldom addressed in other studies in the literature. Here, we proposed a white-box feature selection method that provides the researcher with quantitative information about why a specific feature is preferred over another, without compromising performance or explainability. We believe that the proposed approach can be used not only for DTI prediction but for other feature selection problems as well.

Finally, we must acknowledge that none of the tested feature selection algorithms, including the proposed method, was able to yield satisfactorily high F-scores, quite possibly due to the severely imbalanced nature of the dataset. In that regard, note that the so-called noninteracting cases may not be noninteracting at all and may harbor interactions yet to be discovered; under the current binary encoding, nothing distinguishes them from confirmed noninteractions. If the interaction data were encoded as 1 (confirmed interaction), 0 (not known), and −1 (confirmed noninteraction), we might be able to obtain more definitive and practically useful results.

Author Contributions

Conceptualization, G.O.Y.; methodology, G.O.Y.; software, G.O.Y.; validation, G.O.Y.; formal analysis, C.B.; investigation, G.O.Y. and C.B.; resources, G.O.Y.; data curation, G.O.Y.; writing-original draft preparation, G.O.Y. and C.B.; writing-review and editing, G.O.Y. and C.B.; visualization, G.O.Y. and C.B.; supervision, C.B.; project administration, G.O.Y.; funding acquisition, C.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is unavailable due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Live Science. Available online: https://www.livescience.com/37247-dna.html (accessed on 10 November 2021).
2. Padmanabhan, S. Handbook of Pharmacogenomics and Stratified Medicines; Academic Press: Cambridge, MA, USA, 2014; Chapter 18.
3. Sachdev, K.; Gupta, M.K. A comprehensive review of feature-based methods for drug target interaction prediction. J. Biomed. Inform. 2019, 93, 103159.
4. Chen, R.; Liu, X.; Jin, S.; Lin, J.; Liu, J. Machine Learning for Drug-Target Interaction Prediction. Molecules 2018, 23, 2208.
5. Lindsay, M.A. Target discovery. Nat. Rev. Drug Discov. 2003, 2, 831–838.
6. Yang, Y.; Adelstein, S.J.; Kassis, A.I. Target discovery from data mining approaches. Drug Discov. Today 2009, 14, 147–154.
7. Mousavian, Z.; Masoudi-Nejad, A. Drug–target interaction prediction via chemogenomic space: Learning-based methods. Expert Opin. Drug Metab. Toxicol. 2014, 10, 1273–1287.
8. Johnson, M.A.; Maggiora, G.M. Concepts and Applications of Molecular Similarity; Wiley: Hoboken, NJ, USA, 1991.
9. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
10. Ezzat, A.; Wu, M.; Li, X.; Kwoh, C. Drug-target interaction prediction via class imbalance-aware ensemble learning. In Proceedings of the 15th International Conference on Bioinformatics (INCOB 2016), Queenstown, Singapore, 21–23 September 2016.
11. Zhang, Y.; Jiang, Z.; Chen, C.; Wei, Q.; Gu, H.; Yu, B. DeepStack-DTIs: Predicting drug-target interactions using LightGBM feature selection and deep-stacked ensemble classifier. Interdiscip. Sci. Comput. Life Sci. 2022, 14, 311–330.
12. Hasan Mahmud, S.M.; Chen, W.; Jahan, H.; Dai, B.; Din, S.U.; Dzisoo, A.M. DeepACTION: A deep learning-based method for predicting novel drug-target interactions. Anal. Biochem. 2020, 610, 11.
13. You, J.; McLeod, R.D.; Hu, P. Predicting drug-target interaction network using deep learning model. Comput. Biol. Chem. 2019, 80, 90–101.
14. Beck, B.R.; Shin, B.; Choi, Y.; Park, S.; Kang, K. Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model. Comput. Struct. Biotechnol. J. 2020, 18, 784–790.
15. Wang, L.; You, Z.; Chen, X.; Xia, S.-X.; Liu, F.; Yan, X.; Zhou, Y.; Song, K.-J. A computational-based method for predicting drug-target interactions by using a stacked autoencoder deep neural network. J. Comput. Biol. 2018, 25, 361–373.
16. Monteiro, N.R.; Ribeiro, B.; Arrais, J. Drug-target interaction prediction: End-to-end deep learning approach. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 18, 2364–2374.
17. Redkar, S.; Mondal, S.; Joseph, A.; Hareesha, K.S. A machine learning approach for drug-target interaction prediction using wrapper feature selection and class balancing. Mol. Inform. 2020, 39, 1900062.
18. Eslami Manoochehri, H.; Nourani, M. Drug-target interaction prediction using semi-bipartite graph model and deep learning. BMC Bioinform. 2020, 21, 248.
19. Peng, J.; Li, J.; Shang, X. A learning-based method for drug-target interaction prediction based on feature representation learning and deep neural network. BMC Bioinform. 2020, 21 (Suppl. S13), 394.
20. Wang, Y.-B.; You, Z.-H.; Yang, S.; Yi, H.-C.; Chen, Z.-H.; Zheng, K. A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network. BMC Med. Inform. Decis. Mak. 2020, 20 (Suppl. S2), 49.
21. Meena, T.; Roy, S. Bone fracture detection using deep supervised learning from radiological images: A paradigm shift. Diagnostics 2022, 12, 2420.
22. Pal, D.; Reddy, P.B.; Roy, S. Attention UW-Net: A fully connected model for automatic segmentation and annotation from Chest X-ray. Comput. Biol. Med. 2022, 150, 106083.
23. Xu, X.; Gu, H.; Wang, Y.; Wang, J.; Qin, P. Autoencoder-based feature selection method for classification of anticancer drug response. Front. Genet. 2019, 10, 233.
24. Abid, A.; Balin, M.F.; Zou, J. Concrete autoencoders for differentiable feature selection and reconstruction. arXiv 2019, arXiv:1901.09346.
25. DrugBank. DrugBank Fall 2019 Feature Release. Available online: https://go.drugbank.com/ (accessed on 4 March 2021).
26. O’Boyle, N.M. Towards a Universal SMILES representation: A standard method to generate canonical SMILES based on the InChI. J. Cheminform. 2012, 4, 22.
27. Package Rcpi. Available online: https://www.rdocumentation.org/packages/Rcpi/versions/1.8.0 (accessed on 4 March 2021).
28. Yu, H.; Chen, J.; Xu, X.; Li, Y.; Zhao, H.; Fang, Y.; Li, X.; Zhou, W.; Wang, W.; Wang, Y. A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data. PLoS ONE 2012, 7, e37608.
29. Xia, Z.; Xia, Z.; Wu, L.Y.; Zhou, X.; Wong, S. Semi-Supervised Drug-Protein Interaction Prediction from Heterogeneous Biological Spaces. BMC Syst. Biol. 2010, 4, S6.
Ballesteros, J.; Palczewski, K. G protein-coupled receptor drug discovery: Implications from the crystal structure of rhodopsin. Curr. Opin. Drug Discov. Dev. 2001, 4, 561–574. [Google Scholar]
Imrie, C.E.; Durucan, S. A River flow prediction using artificial neural networks: Generalization beyond the calibration range. J. Hydrol. 2000, 233, 138–153. [Google Scholar] [CrossRef]
Fletcher, R.H.; Suzanne, W. Clinical Epidemiology: The Essentials, 4th ed.; Lippincott Williams & Wilkins: Baltimore, MD, USA, 2005; p. 45.126. ISBN 0-7817-5215-9. [Google Scholar]
Raschka, S.; Mirjalili, V. Python Machine Learning: MAchine Learning and Deep Learning with Python, Scikit-Learn, and TensorFlow, 2nd ed.; Packt: Birmingham, UK, 2017. [Google Scholar]
Scikit Learn. Available online: https://scikit-learn.org/stable/modules/feature_selection.html (accessed on 4 March 2021).
GeÌron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed.; O’Reilly: Sebastopol, CA, USA, 2019. [Google Scholar]

Figure 1. Method Development Graph.

Figure 2. Data preparation steps for the drug Oseltamivir, with AlogP features.

Figure 3. A simple autoencoder.

Figure 4. Pseudocode of feature selection.

Figure 5. Classification accuracies of the MLP prediction models on the DTI dataset.

Table 1. Studies in the literature.

No	Author	Imbalance Solving	Dimension Reduction	Method	Description	Performance
Studies that use deep learning methods for the DTI prediction problem
1	Sachdev and Gupta, 2019 [3]	No	PCA	SVM	Proposes a novel technique that uses ensemble classifiers to improve drug-target interaction prediction.	AUC 0.96
2	Ezzat et al., 2016 [10]	Yes	No	SVM, RF, DT	Proposes an ensemble learning method to predict DTIs.	AUC 0.90
3	Zhang et al., 2022 [11]	Yes	LightGBM	DeepStack-DTIs	Proposes a novel method, DeepStack-DTIs, for predicting DTIs using deep learning.	Accuracy 97.54%
4	Hasan Mahmud et al., 2020 [12]	Yes/MMIB	LASSO (Least Absolute Shrinkage and Selection Operator)	DeepACTION	Proposes a deep-learning method to predict potential or unknown DTIs.	AUC 0.98
5	You et al., 2019 [13]	No	LASSO	LASSO-DNN	Shows that efficient representations of drug and target features are key to building learning models that predict DTIs.	AUC 0.89, Accuracy 0.81
6	Beck et al., 2020 [14]	No	No	MT-DTI (Molecule Transformer-Drug Target Interaction)	Uses a pre-trained deep-learning-based drug-target interaction model called MT-DTI to identify commercially available drugs that could act on viral proteins of SARS-CoV-2.	Generally similar results compared to the conventional 3D structure-based prediction model.
7	Wang et al., 2018 [15]	Yes	-	Stacked AutoEncoder-RF	Presents a new computational method for predicting DTIs from drug molecular structure and protein sequence by using the stacked autoencoder of deep learning.	Accuracy 0.94
8	Monteiro et al., 2021 [16]	Yes	-	Deep Neural Network Architecture	Presents a deep-learning architecture that exploits the ability of CNNs to obtain 1D representations from protein sequences and SMILES strings.	Accuracy 0.91, Sensitivity 0.82, F1-score 0.87
9	Redkar et al., 2019 [17]	Yes	Synthetic minority oversampling technique (SMOTE)	RF, SVM	Addresses the class imbalance and high dimensionality of DTI datasets to develop a predictive model for DTI prediction.	Accuracy 95.9%
10	Manoochehri and Nourani, 2020 [18]	Yes	-	Deep Neural Network	Models DTI prediction as link prediction in a semi-bipartite graph and uses deep learning as the learning tool.	Improves AUROC by 14%
11	Peng et al., 2020 [19]		Denoising Autoencoder	Deep CNN	Implements a learning-based method for DTI prediction based on feature representation learning and deep neural networks.	AUROC 0.94, AUPR 0.94
12	Wang et al., 2020 [20]		SPCA	Deep Long Short-Term Memory	Develops a deep-learning-based model for DTI prediction.	AUC 0.99
Studies that use deep learning methods in other medical areas
13	Meena and Roy, 2022 [21]	-	-	Deep Learning based methods	Reviews some fracture detection and classification approaches and claims that CNN-based models performed very well.	-
14	Pal et al., 2022 [22]	Yes	-	UW-Net	To reduce reliance on tedious and error-prone manual annotation of chest X-rays, the authors produce a probabilistic map for automatic annotation from a small dataset.	F1-score 95.7%
Studies that use autoencoders for feature selection
15	Xu et al., 2019 [23]	Yes	Autoencoder & Boruta	SVM-RFE, KNN, Autohidden, Naive Bayes	Determines a small set of features for a random forest to predict drug response by using the Boruta algorithm.	AUC 0.70
16	Abid et al., 2019 [24]	No	Concrete Autoencoder		Proposes a new method for differentiable, end-to-end feature selection via backpropagation.	Concrete autoencoders show good performance.
17	Our Proposal	Resampling Method	Autoencoder, DT, Kbest, Variance Threshold	MLP	Presents a new autoencoder-based feature selection approach for DTI prediction problem.	Accuracy 0.92

Table 2. Example of the drug feature matrix.

	F_D₁	F_D₂	…	F_D₄₁
D₁	1	2	…	3
D₂	4	5	…	9
…	…	…	…	…
D₅₇₇₁	6	7	…	8

Table 3. Example of the target feature matrix.

	F_T₁	F_T₂	…	F_T₆₇₄
T₁	A	B	…	C
T₂	D	E	…	F
…	…	…	…	…
T₄₈₄₅	X	Y	…	Z

Table 4. Drug-Target Interaction Matrix.

	F_D₁ (f₁)	F_D₂ (f₂)	…	F_D₄₁	F_T₁	F_T₂	…	F_T₆₇₄ (f₇₁₅)
x₁	1	2	…	3	A	B	…	C
x₂	1	2	…	3	D	E	…	F
…	…	…	…	…	…	…	…	…
x_27,960,495	6	7	…	8	X	Y	…	Z
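Table 4 pairs every drug row of Table 2 with every target row of Table 3 and concatenates the two feature vectors, which gives 41 + 674 = 715 features per pair and 5771 × 4845 = 27,960,495 pairs. The sketch below reproduces that construction on toy values standing in for the DrugBank-derived features; the helper name `build_dti_vectors` is hypothetical.

```python
from itertools import product


def build_dti_vectors(drug_features, target_features):
    """Yield drug-row + target-row concatenations in drug-major order.

    drug_features: per-drug feature lists (41 features each in Table 2)
    target_features: per-target feature lists (674 features each in Table 3)
    Yielding one pair at a time avoids holding ~28 million vectors in memory.
    """
    for drug_row, target_row in product(drug_features, target_features):
        yield list(drug_row) + list(target_row)


# Toy stand-ins (not real data): 2 drugs with 3 features, 2 targets with 2 features.
drugs = [[1, 2, 3], [4, 5, 9]]
targets = [[10, 11], [12, 13]]
pairs = list(build_dti_vectors(drugs, targets))
print(pairs[0])  # [1, 2, 3, 10, 11]  drug 1 with target 1
print(pairs[1])  # [1, 2, 3, 12, 13]  drug 1 with target 2

# Sizes reported in Tables 2-4
n_drugs, n_targets = 5771, 4845
n_drug_feats, n_target_feats = 41, 674
print(n_drugs * n_targets)            # 27960495 rows
print(n_drug_feats + n_target_feats)  # 715 columns
```

The drug-major ordering matches Table 4, where x₁ and x₂ share the features of D₁ and differ only in the target block.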

Table 5. The parameters for Autoencoder.

Parameter	Value
Number of neurons in the hidden layer	13
Input Dimension Size	715
Activation Functions	(ReLU, Sigmoid)
Loss Function	Binary Cross Entropy
Batch Size	(25, 100)
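Table 5 specifies an autoencoder with one hidden layer, ReLU and sigmoid activations and a binary cross-entropy loss. The dependency-free sketch below trains such a network on toy data, under several assumptions not stated in the table: ReLU is used in the encoder and sigmoid in the decoder, inputs are already scaled to [0, 1] (binary cross-entropy needs targets in that range), and updates are per-sample rather than batches of 25 or 100. The `feature_weights` rule (summing absolute encoder weights per input) is a hypothetical reading of "feature weights"; it is not the authors' modified autoencoder from Section 2.2.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """One hidden layer: ReLU encoder, sigmoid decoder, binary cross-entropy loss."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # W1: hidden x inputs (encoder); W2: inputs x hidden (decoder)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_inputs)]
        self.b2 = [0.0] * n_inputs

    def forward(self, x):
        z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.W1, self.b1)]
        h = [relu(z) for z in z1]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(self.W2, self.b2)]
        return z1, h, y

    def loss(self, x):
        """Mean binary cross-entropy between the input and its reconstruction."""
        _, _, y = self.forward(x)
        eps = 1e-12  # avoids log(0)
        return -sum(xi * math.log(yi + eps) + (1 - xi) * math.log(1 - yi + eps)
                    for xi, yi in zip(x, y)) / len(x)

    def train_step(self, x, lr):
        """One plain gradient-descent update on a single sample."""
        z1, h, y = self.forward(x)
        n = len(x)
        # Sigmoid + BCE: gradient w.r.t. decoder pre-activations is (y - x) / n
        dz2 = [(yi - xi) / n for yi, xi in zip(y, x)]
        # Back-propagate to the hidden layer before W2 is modified
        dh = [sum(self.W2[d][j] * dz2[d] for d in range(n)) for j in range(len(h))]
        dz1 = [g if z > 0 else 0.0 for g, z in zip(dh, z1)]  # ReLU derivative
        for d in range(n):
            for j in range(len(h)):
                self.W2[d][j] -= lr * dz2[d] * h[j]
            self.b2[d] -= lr * dz2[d]
        for j in range(len(h)):
            for i in range(n):
                self.W1[j][i] -= lr * dz1[j] * x[i]
            self.b1[j] -= lr * dz1[j]

    def feature_weights(self):
        """HYPOTHETICAL per-input score: sum of absolute encoder weights leaving it."""
        return [sum(abs(row[i]) for row in self.W1) for i in range(len(self.W1[0]))]


# Toy data scaled to [0, 1]; Table 5 uses 715 inputs and 13 hidden neurons instead.
data = [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1], [1, 1, 0, 0, 1, 1]]
ae = TinyAutoencoder(n_inputs=6, n_hidden=3, seed=0)
before = sum(ae.loss(x) for x in data) / len(data)
for epoch in range(200):
    for x in data:
        ae.train_step(x, lr=0.5)
after = sum(ae.loss(x) for x in data) / len(data)
weights = ae.feature_weights()
ranking = sorted(range(len(weights)), key=weights.__getitem__, reverse=True)
print(f"loss {before:.3f} -> {after:.3f}; features ranked by weight: {ranking}")
```

A framework such as Keras would replace the hand-written gradients in practice; the pure-Python version only makes each step of the forward pass, loss and update explicit.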

Table 6. Structures employed in the ablation study for the MLP classifier.

Structure	Designation	Layers
S1	MLP-5-2	Input + 5 hidden neurons + 2 hidden neurons + 2 output neurons
S2	MLP-5-3	Input + 5 hidden neurons + 3 hidden neurons + 2 output neurons
S3	MLP-10-10	Input + 10 hidden neurons + 10 hidden neurons + 2 output neurons
S4	MLP-20-10	Input + 20 hidden neurons + 10 hidden neurons + 2 output neurons
S5	MLP-50-20	Input + 50 hidden neurons + 20 hidden neurons + 2 output neurons
S6	MLP-150-50	Input + 150 hidden neurons + 50 hidden neurons + 2 output neurons
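The six structures in Table 6 differ mainly in capacity. Assuming fully connected layers with one bias per neuron (the section does not state this explicitly), the hypothetical helper below counts trainable parameters for the 715-feature input used without feature selection and for the 350-feature input of the Autoencoder 350 setting.

```python
# Hidden-layer widths of the six structures in Table 6; every model has 2 output neurons.
STRUCTURES = {
    "S1": (5, 2),
    "S2": (5, 3),
    "S3": (10, 10),
    "S4": (20, 10),
    "S5": (50, 20),
    "S6": (150, 50),
}


def count_parameters(n_inputs, hidden, n_outputs=2):
    """Weights plus biases of a fully connected MLP with the given layer widths."""
    widths = [n_inputs, *hidden, n_outputs]
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(widths, widths[1:]))


for name, hidden in STRUCTURES.items():
    print(name, count_parameters(715, hidden), count_parameters(350, hidden))
```

For example, MLP-5-2 on 350 inputs has 350·5 + 5 + 5·2 + 2 + 2·2 + 2 = 1773 parameters, while MLP-150-50 on 715 inputs has 115,052, so S1 to S6 span roughly two orders of magnitude in size.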

Table 7. The intervals for values of the parameters in MLP models.

Parameter	Value
Number of neurons in the hidden layer	[2–20]
Learning rate	(0–1)
Momentum	(0–1)
Maximum iterations	10,000
Number of epochs	500
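Table 7 lists a learning rate η and a momentum μ, both in (0, 1). Assuming the classical (heavy-ball) momentum update common in MLP trainers, v ← μv − η∇f(w) followed by w ← w + v (the optimizer actually used is not named in this section), a one-parameter toy problem shows how the two values interact over 500 epochs. The function name `momentum_descent` is hypothetical.

```python
def momentum_descent(grad, w0, learning_rate, momentum, epochs):
    """Heavy-ball gradient descent: v <- momentum*v - lr*grad(w); w <- w + v."""
    w, v = w0, 0.0
    for _ in range(epochs):
        v = momentum * v - learning_rate * grad(w)
        w = w + v
    return w


def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)


w_final = momentum_descent(grad, w0=0.0, learning_rate=0.1, momentum=0.9, epochs=500)
print(round(w_final, 6))  # 3.0
```

With μ close to 1 the velocity accumulates past gradients and the iterate oscillates before settling, which is why both values are usually tuned together rather than independently.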

Table 8. Classification accuracies on the Drug-Target Interaction dataset (Kbest, grid search 10–350).

	S1	S2	S3	S4	S5	S6
Kbest 10	0.56	0.62	0.69	0.66	0.69	0.71
Kbest 50	0.6	0.7	0.64	0.7	0.8	0.8
Kbest 100	0.24	0.68	0.65	0.68	0.74	0.82
Kbest 160	0.22	0.56	0.62	0.6	0.69	0.71
Kbest 200	0.24	0.63	0.67	0.54	0.6	0.56
Kbest 240	0.2	0.45	0.53	0.25	0.67	0.61
Kbest 300	0.32	0.3	0.32	0.2	0.43	0.56
Kbest 350	0.3	0.54	0.65	0.54	0.43	0.52
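Table 8 grid-searches the Kbest filter over the number of kept features k. Assuming Kbest refers to scikit-learn's `SelectKBest` with its default `f_classif` scorer (the one-way ANOVA F statistic; the scorer actually used is not stated here), the selection can be sketched without dependencies as below. The helper names and the zero-variance guard are choices made for this sketch, not scikit-learn behavior.

```python
def anova_f_scores(X, y):
    """One-way ANOVA F statistic of each column of X against class labels y."""
    n = len(X)
    classes = sorted(set(y))
    k = len(classes)
    scores = []
    for col in range(len(X[0])):
        values = [row[col] for row in X]
        grand_mean = sum(values) / n
        ss_between = 0.0
        ss_within = 0.0
        for c in classes:
            group = [v for v, label in zip(values, y) if label == c]
            mean_c = sum(group) / len(group)
            ss_between += len(group) * (mean_c - grand_mean) ** 2
            ss_within += sum((v - mean_c) ** 2 for v in group)
        if ss_within == 0:  # sketch-specific guard for constant classes
            scores.append(float("inf") if ss_between > 0 else 0.0)
        else:
            scores.append((ss_between / (k - 1)) / (ss_within / (n - k)))
    return scores


def select_k_best(X, y, k):
    """Indices of the k highest-scoring columns, best first."""
    scores = anova_f_scores(X, y)
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


# Toy data: column 0 separates the classes, columns 1 and 2 do not.
X = [[1.0, 0.5, 1], [1.2, 2.0, 2], [3.0, 0.6, 1], [3.1, 1.9, 2]]
y = [0, 0, 1, 1]
scores = anova_f_scores(X, y)
print(select_k_best(X, y, 1))  # [0]
```

Each row of Table 8 corresponds to repeating this selection with a different k (10, 50, ..., 350) before training the six MLP structures.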

Table 9. Classification accuracies on the Drug-Target Interaction dataset (Autoencoder, grid search 10–350).

	S1	S2	S3	S4	S5	S6
Autoencoder 10	0.12	0.23	0.12	0.14	0.13	0.18
Autoencoder 50	0.15	0.41	0.16	0.23	0.26	0.28
Autoencoder 100	0.38	0.32	0.17	0.28	0.35	0.39
Autoencoder 160	0.42	0.27	0.25	0.38	0.47	0.46
Autoencoder 200	0.35	0.15	0.24	0.34	0.43	0.45
Autoencoder 240	0.38	0.26	0.37	0.48	0.73	0.75
Autoencoder 300	0.43	0.39	0.45	0.56	0.73	0.78
Autoencoder 350	0.85	0.68	0.69	0.89	0.9	0.92

Table 10. Classification accuracies and F-scores on the Drug-Target Interaction dataset for all feature selection approaches.

Approach	Accuracy S1	Accuracy S2	Accuracy S3	Accuracy S4	Accuracy S5	Accuracy S6	F-score S1	F-score S2	F-score S3	F-score S4	F-score S5	F-score S6
No feature selection	0.74	0.69	0.7	0.75	0.86	0.86	0.0024	0.001	0.002	0.0024	0.003	0.003
Kbest 10	0.56	0.62	0.69	0.66	0.69	0.71	0.0014	0.0028	0.0021	0.0022	0.0026	0.003
Kbest 50	0.6	0.7	0.64	0.7	0.8	0.8	0.0017	0.003	0.0022	0.0021	0.0026	0.0025
Kbest 100	0.24	0.68	0.65	0.68	0.74	0.82	0.0012	0.0026	0.0025	0.0026	0.0034	0.004
Variance Threshold	0.45	0.73	0.72	0.75	0.81	0.75	0.0011	0.002	0.002	0.023	0.074	0.01
Decision Tree	0.55	0.66	0.71	0.71	0.81	0.81	0.0017	0.0025	0.0023	0.0023	0.0035	0.0032
Autoencoder 350	0.85	0.68	0.69	0.89	0.9	0.92	0.0031	0.002	0.002	0.005	0.08	0.1
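Every F-score in Table 10 is at most 0.1, and most are below 0.01, while accuracies range from 0.24 to 0.92. A gap of this kind is consistent with a dataset in which interacting pairs are rare: accuracy is dominated by correctly rejected non-interacting pairs, whereas F1 = 2PR/(P + R) depends only on precision P and recall R for the positive class. The confusion counts below are hypothetical and chosen only to illustrate the effect; they are not taken from the paper.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy over all pairs and F1 of the positive (interacting) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical counts: 10,000 pairs, only 10 of which truly interact.
acc, f1 = accuracy_and_f1(tp=5, fp=495, fn=5, tn=9495)
print(f"accuracy={acc:.3f}, F1={f1:.4f}")  # accuracy=0.950, F1=0.0196
```

Here half of the true interactions are found, yet 495 false alarms push precision down to 0.01, so F1 stays near 0.02 even though accuracy reaches 0.95.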

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ozsert Yigit, G.; Baransel, C. A Novel Autoencoder-Based Feature Selection Method for Drug-Target Interaction Prediction with Human-Interpretable Feature Weights. Symmetry 2023, 15, 192. https://doi.org/10.3390/sym15010192

