Comparison of Target Features for Predicting Drug-Target Interactions by Deep Neural Network Based on Large-Scale Drug-Induced Transcriptome Data

Uncovering drug-target interactions (DTIs) is pivotal to understanding drug mode-of-action (MoA), avoiding adverse drug reactions (ADRs), and seeking opportunities for drug repositioning (DR). For decades, in silico prediction of DTIs has largely depended on structural information of both targets and compounds, e.g., docking or ligand-based virtual screening. Recently, the application of deep neural networks (DNNs) is opening a new path to uncovering novel DTIs for thousands of targets. One important question is which target features are most relevant to DTI prediction. As an early attempt to answer this question, we objectively compared three canonical target features extracted from: (i) gene expression profiles by gene knockdown (GEPs); (ii) the protein-protein interaction (PPI) network; and (iii) the pathway membership (PM) of a target gene. For drug features, the large-scale drug-induced transcriptome dataset, the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset, was used. All of these features are closely related to protein function or drug MoA, yet their utility has been only sparsely investigated. In particular, few studies have compared the three types of target features in DNN-based DTI prediction under the same evaluation scheme. Among the three target features, PM and the PPI network show similar performances, both superior to GEPs. DNN models based on both features consistently outperformed other machine learning methods such as naïve Bayes, random forest, and logistic regression.


Introduction
In silico prediction of drug-target interactions (DTIs) is becoming more data-driven than conventional modeling-based approaches such as docking or molecular dynamics simulation. Deep neural networks (DNNs) are increasingly being applied to highly complex and challenging problems such as protein folding [1]. The vast majority of drugs and compounds are expected to interact with multiple targets, i.e., polypharmacology [2]. While millions of DTIs have been identified, and more keep being revealed, it is still costly and time-consuming to validate DTIs experimentally, even by high-throughput screening (HTS) [3]. It is very likely that unknown DTIs still exist for both approved drugs and clinical candidate compounds. Such hidden DTIs could critically impact the drug development process, including unexpected clinical outcomes, or may broaden indications through drug repositioning.
In order to facilitate the identification of DTIs, a number of in silico approaches have been developed [3][4][5]. There are three major approaches to compound virtual screening: ligand-based virtual screening (LBVS), structure-based virtual screening (SBVS) such as docking, and chemogenomic methods. Ligand-based approaches are based on the idea that structurally similar compounds tend to show similar bioactivities.

Extraction of Drug Features
We obtained the LINCS L1000 dataset, which includes ~205,034 gene expression profiles perturbed by more than 20,000 compounds in 71 human cell lines. LINCS L1000 was generated using the Luminex L1000 technology, in which the expression levels of 978 landmark genes are measured by fluorescence intensity [17]. The LINCS L1000 dataset provides five different levels of data depending on the stage of the processing pipeline. The Level 1 dataset contains the raw expression values from the Luminex L1000 platform; the Level 2 dataset gives gene expression values for the 978 landmark genes after deconvolution; the Level 3 dataset provides normalized expression values for the landmark genes as well as imputed values for an additional ~12,000 genes; the Level 4 dataset contains z-scores relative to all the samples or to the vehicle controls in the plate; and the Level 5 dataset contains expression signatures obtained by merging the z-scores of replicates. We used the Level 5 data marked as exemplar signatures, which constitute a relatively more robust, and thus reliable, set of differentially expressed genes (DEGs). We took the concatenated expression values of the 978 landmark genes for both drug-induced expression profiles (DEPs) and their untreated controls, resulting in a vector of 978 + 978 = 1956 elements. Since there are multiple untreated controls, we took the median values of the replicates.
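As a minimal sketch of this feature construction (the function name and random values standing in for L1000 Level 5 signatures are our own illustrative assumptions):

```python
import numpy as np

def build_drug_feature(treated_profile, control_replicates):
    """Concatenate a drug-induced expression profile (978 landmark genes)
    with the element-wise median of its untreated control replicates,
    yielding a 1956-dimensional drug feature vector."""
    control = np.median(np.asarray(control_replicates), axis=0)  # (978,)
    return np.concatenate([treated_profile, control])            # (1956,)

# Toy example: random values stand in for L1000 Level 5 signatures.
rng = np.random.default_rng(0)
treated = rng.normal(size=978)
controls = rng.normal(size=(3, 978))   # three control replicates
vec = build_drug_feature(treated, controls)
```

The same concatenation scheme is reused for the knockdown GEP target features described below.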

Expression Profiles by Gene Knockdown (GEPs)-Based Target Features
Knockdown GEPs were obtained from the LINCS Data Portal (http://lincsportal.ccs.miami.edu/dcic-portal/). The target features based on knockdown GEPs were extracted similarly to the drug features, as a concatenated vector of 1956 elements from the landmark GEPs of gene knockdown and their control GEPs in LINCS. In order to filter out potentially spurious signals from off-target effects, we used only the consensus gene signatures (CGS) in this study.

Target Features by Protein-Protein Interaction (PPI) Network
Target features were extracted using the node2vec method [24], which generates a vector representation of each node in a given network. Network neighborhoods are explored by biased random walks controlled by two parameters, p and q: the return parameter p controls the probability of revisiting the preceding node (proportional to 1/p), while the in-out parameter q controls the probability of moving farther outward (proportional to 1/q). We set the return hyperparameter p = 1 and the in-out hyperparameter q = 2. The algorithmic details of node2vec, as well as its source code, are available in [24].
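The biased walk controlled by p and q can be sketched as follows; the toy graph, the function name, and the use of Python's `random` module are our illustrative assumptions, not part of the node2vec implementation referenced in [24]:

```python
import random

def biased_step(graph, prev, curr, p=1.0, q=2.0, rng=random):
    """One node2vec-style biased random-walk step.
    graph: dict mapping node -> set of neighbours.
    Unnormalised weights: 1/p to return to `prev`, 1 to a common
    neighbour of `prev`, and 1/q to move farther outward."""
    neighbours = sorted(graph[curr])
    weights = []
    for x in neighbours:
        if x == prev:
            weights.append(1.0 / p)   # step back inward
        elif x in graph[prev]:
            weights.append(1.0)       # stay close to the preceding node
        else:
            weights.append(1.0 / q)   # explore farther outward
    return rng.choices(neighbours, weights=weights, k=1)[0]

# Toy graph: triangle a-b-c plus a pendant node d attached to c.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
nxt = biased_step(g, prev="b", curr="c", p=1.0, q=2.0)
```

With q = 2 as used here, outward moves are half as likely as staying near the previous node, biasing the walk toward local neighborhoods.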

Pathway Membership (PM)-Based Target Features
We extracted vector representations of target genes using the Global Vectors for Word Representation (GloVe) method by Pennington et al. [25]. The C2 (curated gene sets) and C5 (Gene Ontology gene sets) collections from MSigDB (v6.2) were used to define pathway membership [26,27]. GloVe counts global co-occurrence across diverse pathways at different levels, whereas node2vec captures the local context of network neighborhoods based on random walks. The GloVe package was downloaded from https://nlp.stanford.edu/projects/glove/.
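As an illustration of the input that GloVe consumes, a pathway-level co-occurrence count can be sketched as follows (toy gene sets stand in for MSigDB C2/C5, and the GloVe training step itself is omitted):

```python
from itertools import combinations
from collections import Counter

def pathway_cooccurrence(gene_sets):
    """Count how often each gene pair appears together across pathways.
    A GloVe-style embedding is then fitted to these counts, so genes
    sharing many pathways end up close in the feature space."""
    counts = Counter()
    for genes in gene_sets:
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    return counts

# Toy gene sets standing in for MSigDB C2/C5 pathways.
sets_ = [["EGFR", "ERBB2", "KRAS"], ["EGFR", "ERBB2"], ["KRAS", "TP53"]]
cooc = pathway_cooccurrence(sets_)
```

Here EGFR and ERBB2 co-occur in two pathways, so GloVe would embed them more closely than KRAS and TP53, which co-occur only once.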

Construction of Deep Neural Networks (DNNs) and Machine Learning Models
The DNNs were implemented using TensorFlow 1.13 [28] and Keras 2.2.4 [29]. We adopted Adam [30] to optimize the DNNs, and applied dropout (rate = 0.5) on each fully connected (FC) layer [31] and early stopping at 20 epochs in order to prevent overfitting.
Naïve Bayes (NB), logistic regression (LR), and random forest (RF) methods were implemented using the Python package scikit-learn (https://scikit-learn.org). NB was run under the default settings. The LR model was built after experimenting with both L1 and L2 regularization as well as several values of the regularization strength (C = 0.01, 0.1, and 1.0). For pathway membership, the LR model used L2 regularization at C = 0.1, which showed the best performance among the conditions tested; similarly, the LR model for PPI used L2 regularization at C = 0.01. For the RF model, the maximum number of features (N) was tested at F/2, F, and 2F, where F = √(2 × 978 + 256) ≈ 47 is the scikit-learn default (2 × 978 = number of landmark genes for drug and control; 256 = length of the target features from PM or PPI). We set N = 23 (= F/2), which showed better performance, and generated 10,000 trees in our RF models.
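The baseline setup can be sketched with scikit-learn as follows; the toy data, the reduced tree count (100 instead of 10,000), and the 5-fold split are illustrative simplifications of the conditions described above:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data standing in for the concatenated drug + target feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))
y = rng.integers(0, 2, size=200)

nb = GaussianNB()                                              # default settings
lr = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)    # best PM setting
rf = RandomForestClassifier(n_estimators=100, max_features=23) # N = F/2 = 23
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in [("NB", nb), ("LR", lr), ("RF", rf)]}
```

On the random toy data all three classifiers hover around chance level; the point of the sketch is the shared evaluation harness, not the scores.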

Extraction of Drug and Target Features
In this work, the input features consist of: (i) drug features; and (ii) target features, from which DTI predictions are made for the corresponding drugs and targets. For drug features, we chose DEPs of the 978 landmark genes from LINCS L1000, because DEPs are expected to provide rich information on MoA and have been used effectively for DTI prediction [14][15][16]. DEPs were kept common to all the DNN models and other machine learning techniques, so that the different target features are compared under the same conditions. We used concatenated expression profiles of the 978 landmark genes for both drug-treated and untreated controls, resulting in a feature vector of length 978 × 2 = 1956.
We took three canonical types of target features that represent distinct information on target properties [23,32,33] but had not been objectively compared for DTI prediction: (i) knockdown GEPs (GEP); (ii) the PPI network (PPI); and (iii) pathway membership (PM). First, knockdown GEPs were available for 4500 genes across 17 cell lines. Again, GEPs were generated as 1956-element feature vectors by concatenating the two landmark expression profiles of gene knockdown and control. Knockdown GEPs reflect transcriptomic changes upon genetic perturbation, which may be correlated with DEPs of drugs targeting the same gene(s).
Second, PPI features were extracted by the node2vec method [24] using the PPI network of STRING v11, with 13,592 genes and 297,050 interactions at a confidence cutoff ≥ 0.7 [34]. In node2vec, node features are extracted by applying biased random walks in a network. Accordingly, the method is designed to preserve local neighborhoods, i.e., neighboring nodes of the network are embedded close together in the resulting feature space.
Third, pathway membership was used to extract target features. A pathway is defined as a collection of functionally related genes, and is a standard tool for functional interpretation, including drug MoA [35]. Since pathway member genes are functionally related but not necessarily connected in a PPI network, PM features represent a more global context than PPI-based ones. For feature extraction by PM, we chose Global Vectors for Word Representation (GloVe) by Pennington et al. [25], which tends to embed targets more closely the more frequently they share common pathways. GloVe is based on word embedding methods, which have also been applied to extract gene features using Gene Ontology or coexpression [33,36].

Overall Architecture of DNNs
Our DNN models consist of several blocks of feedforward deep neural networks (FF-DNNs), with both drug and target features required as input vectors. Focusing on comparative analysis of target features, we did not try more complex types of DNNs such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs).
The overall architecture is shown in Figure 1. Its characteristics include: (i) two separate blocks of input features for drug and target, respectively; and (ii) the output values of the input blocks are concatenated and further connected to 1-2 hidden layers. The input block for the drug consists of 1-3 fully connected (FC) hidden layers, as does the target block for GEP. For the PPI network and pathway membership target features, vector representations of genes are already produced by node2vec and GloVe, respectively; therefore, these target features are used directly instead of adding hidden layers.
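A minimal NumPy forward pass illustrating this two-block layout (one drug FC layer and one joint hidden layer; the layer sizes and random weights are our own toy choices, not the optimized architectures of Figure S3):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(drug_vec, target_vec, params):
    """Forward pass of the two-block architecture: the drug block (FC
    layers on the 1956-dim DEP vector) runs separately, its output is
    concatenated with the target feature vector (used as-is for PPI/PM),
    and the joint hidden layers produce the interaction probability."""
    h = drug_vec
    for W, b in params["drug_block"]:          # 1-3 FC layers for the drug
        h = relu(h @ W + b)
    z = np.concatenate([h, target_vec])        # merge drug and target
    for W, b in params["joint_block"]:         # 1-2 joint hidden layers
        z = relu(z @ W + b)
    W_out, b_out = params["out"]
    logit = z @ W_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))        # sigmoid DTI score

# Toy shapes: 1956-dim drug input, 256-dim precomputed target features.
rng = np.random.default_rng(0)
params = {
    "drug_block": [(rng.normal(scale=0.01, size=(1956, 128)), np.zeros(128))],
    "joint_block": [(rng.normal(scale=0.01, size=(128 + 256, 64)), np.zeros(64))],
    "out": (rng.normal(scale=0.01, size=(64,)), 0.0),
}
score = forward(rng.normal(size=1956), rng.normal(size=256), params)
```

Note that only the GEP variant would insert FC layers on the target side as well; for PPI and PM, `target_vec` enters the concatenation directly.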

Comparison of Target Features
We then compared the prediction performances of the three target features. For an objective comparison, the cross-validation and test cycles were performed only on the common target space: 965 targets were shared by the three types of target features (Figure S1). We therefore first carried out a comparative evaluation of target features limited to these 965 common targets.
For cross-validation, we collected the DTI dataset from seven DTI databases: ChEMBL Target, Therapeutic Targets Database, MATADOR, KEGG Drug, IUPHAR, PharmGKB, and KiDB [37][38][39][40][41][42][43]. We also collected independent DTI datasets for testing from Binding MOAD and DrugBank [44,45]. DTI information was kept mutually exclusive between the cross-validation and test steps by allowing each DTI to be allocated only once. We used known DTIs as positives and an equal number of randomly sampled negative drug-target pairs; the cross-validation and test datasets are summarized in Tables S2 and S6, respectively.
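The balanced negative sampling described above can be sketched as follows (the function and the toy drug/target names are hypothetical; the paper's actual sampling procedure may differ in detail):

```python
import random

def sample_negatives(drugs, targets, positives, n, rng=random):
    """Draw n random drug-target pairs not in the known-positive set,
    used as presumed negatives for a balanced (1:1) training set."""
    positives = set(positives)
    negatives = set()
    while len(negatives) < n:
        pair = (rng.choice(drugs), rng.choice(targets))
        if pair not in positives:
            negatives.add(pair)
    return sorted(negatives)

# Toy universe of drugs and targets.
drugs = [f"drug{i}" for i in range(10)]
targets = [f"gene{i}" for i in range(20)]
pos = [("drug0", "gene0"), ("drug1", "gene5")]
neg = sample_negatives(drugs, targets, pos, n=len(pos))
```

The caveat, discussed later in the paper, is that such "negatives" are only presumed: an unsampled pair may simply be an untested true interaction.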
We constructed a series of DNNs under different hyperparameter combinations, i.e., the number of hidden neurons, activation function, type of target feature, and learning rate (Table S1). Learning parameters were optimized for each target feature via 10-fold cross-validation across all 24 parameter combinations (Table S1). The best parameter combination varied depending on the type of target feature (Figure S2). Accordingly, optimized DNNs for each feature were constructed as shown in Figure S3 for the final comparison of target features. The DNNs were re-trained using the full training set and evaluated against the test set (Figure 2). PM and PPI show similar performance, with AUROC (area under the receiver operating characteristic curve) = 0.80 and 0.79, respectively, both substantially better than GEP (AUROC = 0.71). In order to check the degree of independence among the target features, we calculated the correlations between their distance matrices. PM and PPI appear moderately correlated with each other, but uncorrelated with GEP (Figure S4).

Comparison with DNN and Other Machine Learning Methods
Since PM and PPI performed similarly better than GEP, we next compared these two features against other machine learning methods. The comparison was performed in the common target space of 1955 genes shared between PM and PPI (Figure S1). Accordingly, DNN models were reconstructed and evaluated on the new target space (Table S3) under a 10-fold cross-validation and test scheme. The DNN models were compared with naïve Bayes (NB), logistic regression (LR), and random forest (RF); the machine learning classifiers were trained and evaluated under the same conditions as the DNNs. Again, PM and PPI showed similar performance, with PM only slightly better. DNNs showed consistently better AUROC (Figure 3A,B) and AUPR (area under the precision-recall curve; Figure 3C,D) than the NB, LR, and RF classifiers (Table S4). The same trend holds for precision at the top k% for k = 0.1-20% (Figure 3E,F), although RF models showed comparable performance at the top 0.1-1%. Overall, the performance of the DNNs improved compared to the previous section (Figures 2 and 3), probably due to the expanded training dataset.
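The precision-at-top-k% metric used in Figure 3E,F can be sketched as a minimal NumPy function (the variable names and toy scores below are our own, not from the paper):

```python
import numpy as np

def precision_at_top_k(scores, labels, k_percent):
    """Fraction of true positives among the top k% highest-scoring
    drug-target pairs, i.e., precision at the head of the ranking."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    n_top = max(1, int(round(len(scores) * k_percent / 100.0)))
    top = np.argsort(scores)[::-1][:n_top]   # indices of highest scores
    return labels[top].mean()

# Toy ranking: six predicted pair scores with binary ground truth.
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0]
p = precision_at_top_k(scores, labels, 50.0)  # top 3 pairs: 2 of 3 correct
```

This head-of-ranking view matters in practice because only the top-ranked predicted DTIs are carried forward to experimental validation.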

Construction of PM-Based DNN Model Using the Full Dataset
In our experiments, PM performed best among the three target features tested. Since the model above was constructed using a limited dataset, a final model was built using the full dataset (Table S5) under the optimal architecture (Figure S3C). The three PM-based DNNs share the same architecture and hyperparameters, differing only in the input dataset. Performance tends to improve with a larger input dataset, although the two DNNs with the largest inputs show little difference (Figure 4A).
In order to evaluate performance on novel drugs not seen during model building, we collected an additional DTI dataset from the Comparative Toxicogenomics Database (CTD) [46]. We took 576 DTIs for 64 compounds not included in the training dataset. In the previous sections, the positive and negative datasets were balanced (1:1 ratio) to avoid training bias, as is common in related DTI prediction studies. In reality, DTIs are highly skewed toward negatives because positive DTIs are sparse. We therefore also evaluated our DNN model at different negative/positive ratios (1:1 to 100:1), which better simulates the real-world situation of sparse DTIs. For each of the 64 compounds, the evaluation was repeated 100 times by sampling 1-100 times as many negatives as positive DTIs, and the median AUROCs were calculated. The median AUROC remained robust at around 0.82 across varying negative/positive ratios (Figure 4B).
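The repeated evaluation at a fixed negative/positive ratio can be sketched as follows (toy score distributions stand in for model outputs, and `median_auroc` is our hypothetical helper, not code from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def median_auroc(pos_scores, neg_pool, ratio, n_repeats=100, seed=0):
    """Median AUROC over repeated evaluations: in each repeat, `ratio`
    times as many negatives as positives are drawn from the pool,
    approximating the sparse real-world DTI setting."""
    rng = np.random.default_rng(seed)
    pos_scores = np.asarray(pos_scores)
    aurocs = []
    for _ in range(n_repeats):
        neg = rng.choice(neg_pool, size=ratio * len(pos_scores), replace=True)
        y = np.r_[np.ones(len(pos_scores)), np.zeros(len(neg))]
        s = np.r_[pos_scores, neg]
        aurocs.append(roc_auc_score(y, s))
    return float(np.median(aurocs))

# Toy scores: positives score higher on average than pooled negatives.
rng = np.random.default_rng(1)
auc = median_auroc(rng.normal(1.0, 1.0, 50), rng.normal(0.0, 1.0, 500), ratio=10)
```

Because AUROC is insensitive to class ratio when the score distributions are fixed, a stable median across ratios (as in Figure 4B) mainly indicates that the sampled negatives behave consistently.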



Discussion
Recently, the deep neural network (DNN) has been gaining increasing interest as a means of solving complex problems such as protein folding, image-based diagnosis, and genomic data interpretation, as well as DTI prediction [47][48][49]. DTI prediction is critical for efficient hit discovery, target deconvolution after phenotypic screening, and identifying novel indications in drug repositioning. The drug-induced transcriptome provides rich information on drug MoA, and is considered an alternative to modeling-based methods such as docking for predicting DTIs. For DTI prediction by DNN, we took large-scale DEPs from LINCS L1000 as drug features.
Unlike other related works based on protein sequences [50][51][52], we used target features reflecting three distinct aspects of protein function and compared their relative performances. In spite of its preliminary nature, our study suggests that DNN models consistently outperform several canonical machine learning methods, with PM (and PPI) showing the best results. Although the network architectures and hyperparameters may not have been explored exhaustively, our results may provide useful information for further improvement. Unexpectedly, GEPs showed the worst performance, even though DEPs and GEPs are of the same nature and were generated on the same platform. The L1000 dataset has several limitations in that only 1000 landmark genes are actually measured, while values for ~12,000 genes are inferred and unavailable for the remaining ~8000 genes (https://www.biorxiv.org/content/10.1101/332825v2), for both DEPs and GEPs. The performance could also be influenced by the network type or the gene sets used for target features from the PPI network or pathway membership, respectively. Above all, our DNN model may not have been trained sufficiently due to the limited number of DTI training data. This suggests that there is plenty of room for improvement with larger datasets and the integration of different features.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4923/11/8/377/s1. Figure S1: Target coverage by the three target features of PPI network, knockdown GEPs, and pathway membership, where the number of associated unique targets is indicated; Figure S2: Performance distribution depending on each hyperparameter: (A) activation function before concatenation; (B) number of hidden neurons for drug features (DEPs); (C) number of hidden neurons after concatenation; and (D) learning rate. The best parameter combination was selected as the one showing the minimum median loss in 10-fold cross-validation for each parameter and feature type; Figure S3: Optimized architecture of DNN models for (A) knockdown GEPs; (B) PPI network; and (C) pathway membership; Figure S4: The degree of Pearson (Spearman) correlations among the distance matrices of target features; Table S1: Hyperparameters for building DNN models; Table S2: Data sources for preparing the training, validation, and test datasets; Table S3: A common dataset of PM and PPI to compare DNN and other machine learning methods; Table S4: Performance comparison between DNN and other machine learning methods in AUROC (AUPR); Table S5: Datasets for training and testing the final DNN model.