Protein-Protein Interactions Prediction Using a Novel Local Conjoint Triad Descriptor of Amino Acid Sequences

Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large number of PPIs have been verified by high-throughput techniques over the past decades, the set of currently known PPI pairs is still far from complete. Furthermore, wet-lab experimental techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods that predict PPIs efficiently and accurately. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) with a novel local conjoint triad descriptor (LCTD) feature representation. LCTD incorporates the advantages of the local descriptor and the conjoint triad; thus, it is capable of accounting for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also discover hierarchical representations of the data. On the PPIs data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance, with an accuracy of 93.12%, a precision of 93.75%, a sensitivity of 93.83%, and an area under the receiver operating characteristic curve (AUC) of 97.92%, and it requires only 718 s. These results indicate that DNN-LCTD is very promising for predicting PPIs and can be a useful supplementary tool for future proteomics studies.


Introduction
Protein-protein interactions (PPIs) play critical roles in virtually all cellular processes, including immune response [1], DNA transcription and replication [2], and signal transduction [3]. Therefore, correctly identifying PPIs not only helps elucidate protein functions but also improves our understanding of the various biological processes in cells [4][5][6]. In recent years, biologists have used high-throughput technologies to detect PPIs, such as mass spectrometry (MS), tandem affinity purification (TAP) [7], and the yeast two-hybrid system (Y2H) [8,9]. Unfortunately, these wet-lab experiments are costly and labor-intensive, suffer from high false-positive and false-negative rates, and have limited coverage. Hence, it is imperative to develop reliable computational models that predict PPIs on a large scale [10].
So far, a number of computational methods have been developed for detecting PPIs. Most of these methods are based on genomic information, such as Gene Ontology and annotations [11], phylogenetic profiles, and gene fusion [12]. Methods that employ the 3D structural information of proteins [13,14] and the sequence conservation between interacting proteins [15] have also been reported. However, these methods depend heavily on prior knowledge of the proteins, such as functional domains, structural information, and physicochemical properties [16,17]. In other words, they are hard to apply unless such prior knowledge is available. Compared with the abundant protein sequence data, other types of data, including 3D structures, Gene Ontology annotations, and domain-domain interactions, are still limited.
Many researchers have developed sequence-based methods for detecting PPIs [18][19][20][21][22][23][24], and experimental results have shown that the information in amino acid sequences alone is sufficient to identify new PPIs. Among them, Shen et al. [18] achieved excellent results with a support vector machine (SVM). They grouped the 20 standard amino acids into 7 classes according to their dipoles and side-chain volumes, used the conjoint triad (CT) method to extract feature information from amino acid sequences based on this classification, and then used an SVM predictor to predict PPIs. Their method yields a high prediction accuracy of 89.3% on human PPIs. However, it only considers the neighboring effect of adjacent residues, whereas PPIs usually occur in non-continuous segments of amino acid sequences. Guo et al. [19] developed an SVM-based method that uses auto covariance (AC) to extract feature information from discontinuous amino acid segments in the sequence, and obtained a good result with an accuracy of 86.55% on Saccharomyces cerevisiae (S. cerevisiae). Yang et al. [20] introduced a local descriptor (LD) to encode amino acid sequences and used k-nearest neighbors (kNN) for prediction. In that study, they grouped the 20 standard amino acids into 7 classes as done by Shen et al. [18], divided each entire protein sequence into ten segments of varying length, and extracted information from each segment. Finally, they applied kNN to predict PPIs, achieving an accuracy of 86.15% on S. cerevisiae. You et al. [21] proposed a novel multi-scale continuous and discontinuous (MCD) descriptor based on LD [20]. To discover more information from amino acid sequences, the MCD descriptor applies a binary coding scheme to construct segments of varying length and extracts feature vectors from these segments.
The minimum redundancy maximum relevance criterion [25], which reduces feature redundancy and computational complexity, is then used to select an optimal feature subset. Finally, an SVM is employed to predict new PPIs. This approach obtains a high accuracy of 91.36% on S. cerevisiae. Recently, Du et al. [22] employed deep neural networks (DNNs), a popular machine learning technique, together with amphiphilic pseudo amino acid composition (APAAC) [26] to predict new PPIs. They first extracted features from each of the two amino acid sequences with APAAC, then fed the APAAC features of the two proteins into two separate DNNs and fused the two DNNs to predict PPIs. Their method obtains an accuracy of 92.5% on PPIs of S. cerevisiae.
The LD descriptor [20] only considers the neighboring effect of two adjacent amino acid types. Hence, it cannot fully capture information about neighboring amino acids, although it can effectively capture information from discontinuous segments of amino acid sequences. The CT method [18], on the other hand, considers the neighboring effect of three adjacent amino acid types but ignores discontinuous information. Given these observations, we combine the advantages of the local descriptor [20] and the conjoint triad method [18] and introduce a novel feature representation method called the local conjoint triad descriptor (LCTD). LCTD can account for interactions between sequentially distant but spatially close amino acid residues better than LD [20] and CT [18]. DNNs, a powerful machine learning technique, can not only reduce the impact of noise in raw data and automatically extract high-level abstractions, but have also outperformed traditional models [27,28]. Motivated by these characteristics, we employ DNNs to detect PPIs based on the LCTD feature representation of amino acid sequences, and call the resulting approach DNN-LCTD. Specifically, DNN-LCTD extracts feature information from the amino acid sequences with LCTD, then trains a neural network with three hidden layers on the LCTD feature sets, using a graphics processing unit (GPU) to accelerate training. Finally, the trained network is used to predict new PPIs. On PPIs of S. cerevisiae, DNN-LCTD achieves 93.12% accuracy, 93.83% sensitivity, 93.75% precision, and an area under the receiver operating characteristic curve (AUC) of 97.92%, while requiring only 718 s.
Experimental results on five other independent datasets, Caenorhabditis elegans (4013 interacting pairs), Escherichia coli (6954 interacting pairs), Helicobacter pylori (1420 interacting pairs), Homo sapiens (1412 interacting pairs), and Mus musculus (313 interacting pairs), further demonstrate the effectiveness of DNN-LCTD.

Results and Discussion
In this section, we briefly introduce the evaluation metrics used for performance comparison. We then provide the recommended experimental configuration. Finally, we analyze and discuss the experimental results and compare them with those of related work.

Evaluation Metrics
To reasonably evaluate the performance of DNN-LCTD, five-fold cross-validation is adopted. Cross-validation reduces the risk that the reported performance merely reflects overfitting to a single train/test split and gives a more reliable estimate of generalization performance [29]. Seven evaluation metrics are used to quantitatively measure the prediction performance of DNN-LCTD: overall prediction accuracy (ACC), precision (PE), recall (RE), specificity (SPE), Matthews correlation coefficient (MCC), F1 score, and the area under the receiver operating characteristic curve (AUC). All except AUC are defined as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\mathrm{PE} = \frac{TP}{TP + FP} \tag{2}$$
$$\mathrm{RE} = \frac{TP}{TP + FN} \tag{3}$$
$$\mathrm{SPE} = \frac{TN}{TN + FP} \tag{4}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{5}$$
$$F_1 = \frac{2 \times \mathrm{PE} \times \mathrm{RE}}{\mathrm{PE} + \mathrm{RE}} \tag{6}$$

where TP (true positive) is the number of true PPIs that are correctly predicted, FN (false negative) is the number of true interacting pairs that are incorrectly predicted as non-interacting, TN (true negative) is the number of true non-interacting pairs that are correctly predicted, and FP (false positive) is the number of true non-interacting pairs that are incorrectly predicted as interacting. MCC measures the quality of a binary classification: an MCC of 1 indicates perfect prediction, 0 indicates prediction no better than random, and −1 indicates complete disagreement between prediction and observation. The F1 score is the harmonic mean of precision and recall; a larger F1 denotes better performance. The receiver operating characteristic (ROC) curve graphically illustrates the diagnostic ability of a binary classifier by plotting the true positive rate against the false positive rate at different thresholds [30,31]. AUC is the area under the ROC curve and is widely used to compare predictors; the larger the AUC, the better the predictor.
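The metric definitions above can be checked with a short sketch. The Python below is illustrative only and assumes the confusion-matrix counts are already available; the function names `ppi_metrics` and `auc` are hypothetical. AUC is computed here as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair (ties count one half), which equals the area under the ROC curve.

```python
import math
from typing import Dict, Sequence

def ppi_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Threshold-based metrics computed from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pe = tp / (tp + fp) if tp + fp else 0.0    # precision
    re = tp / (tp + fn) if tp + fn else 0.0    # recall (sensitivity)
    spe = tn / (tn + fp) if tn + fp else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * pe * re / (pe + re) if pe + re else 0.0
    return {"ACC": acc, "PE": pe, "RE": re, "SPE": spe, "MCC": mcc, "F1": f1}

def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts, not taken from the tables in this paper.
print(ppi_metrics(tp=90, fn=10, tn=80, fp=20))
```

The pairwise AUC loop is quadratic in the number of pairs and is meant only to make the definition concrete; for tens of thousands of pairs, a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.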

Experimental Setup
DNN-LCTD is implemented on the TensorFlow platform (https://www.tensorflow.org). The flowchart of DNN-LCTD is shown in Figure 1. DNN-LCTD first encodes the amino acid sequences using the novel LCTD. We then train a neural network with three hidden layers on the encoded feature sets using a GPU. Finally, we apply the trained DNN to predict new PPIs. The hyper-parameters of the DNN model strongly affect the experimental results. Deep learning algorithms have ten or more hyper-parameters that must be specified properly, and trying all combinations of their values is impractical [32]. We summarize the recommended configuration of DNN-LCTD in Table 1. For the comparing methods, we use grid search to obtain the optimal parameters, which are shown in Table 2.

Figure 1. The flowchart of DNN-LCTD for predicting protein-protein interactions. Abbreviations in this figure include database of interacting proteins (DIP), protein information resource, local conjoint triad descriptor (LCTD), protein-protein interactions (PPIs), and graphics processing unit (GPU). No neg is the number of non-interacting protein pairs, and No pos is the number of interacting protein pairs. Y/N means yes/no.
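To illustrate why exhaustive search over DNN hyper-parameters is impractical while grid search remains feasible for the comparing methods, the following Python sketch enumerates a parameter grid with `itertools.product`. The grid values, the parameter names, and the toy `score_fn` are hypothetical placeholders; in the experiments described here the score would presumably be a cross-validated metric such as accuracy.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

def grid_size(param_grid: Dict[str, List]) -> int:
    """Number of combinations an exhaustive grid search must evaluate."""
    return math.prod(len(values) for values in param_grid.values())

def grid_search(param_grid: Dict[str, List],
                score_fn: Callable[[Dict], float]) -> Tuple[Dict, float]:
    """Evaluate every parameter combination and return the best one with its score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical SVM-style grid; the lambda is a stand-in for cross-validated accuracy.
svm_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
best, score = grid_search(svm_grid, lambda p: -abs(p["C"] - 10) - abs(p["gamma"] - 0.01))
print(best, grid_size(svm_grid))

# Ten hyper-parameters with only three candidate values each.
print(grid_size({f"h{i}": [1, 2, 3] for i in range(10)}))
```

The last call returns 59,049 (3 to the 10th power), each of which would require a full training run, which is why the DNN hyper-parameters are instead tuned through targeted experiments.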

Results on PPIs of S. cerevisiae
To achieve good experimental results, the hyper-parameters of the deep neural network are first optimized. Table 1 provides the recommended hyper-parameters, which were chosen through a large number of experiments. Given the large number of samples used in this work, five-fold cross-validation is adopted to reduce the impact of data dependency and to minimize the risk of overfitting. Thus, five models are generated, one for each fold. Table 3 reports the results of DNN-LCTD on the five individual folds (folds 1-5) and the overall average over the five folds. From Table 3, we can observe that the prediction accuracies are all approximately 93.1% or higher, the precisions are ≥93.35%, the recalls are almost all ≥93.4%, the specificities are ≥92.75%, and the F1 scores are ≥92.4%. To evaluate the performance of DNN-LCTD comprehensively, MCC and AUC are also calculated. DNN-LCTD achieves superior prediction performance, with an average accuracy of 93.11%, precision of 93.75%, recall of 92.40%, specificity of 92.75%, MCC of 86.24%, F1 of 93.06%, and AUC of 97.95%.

Table 2. Optimal parameters of comparing methods.

Method            Name        Parameters
Guo's work [19]   SVM + AC

Many sequence-based methods have been employed to predict PPIs. We compare the prediction performance of DNN-LCTD with that of other existing approaches on S. cerevisiae, including Guo et al. [19], Yang et al. [20], Zhou et al. [33], You et al. [21], and Du et al. [22]. The details of these approaches were introduced in Section 1. From Table 3, we can observe that DeepPPI [22] achieves the best performance among the comparing methods (excluding DNN-LCTD). DeepPPI first uses the APAAC descriptor to encode the amino acid sequence of each protein, feeds the APAAC features of the two proteins into two separate DNNs to extract high-level features, and finally fuses the extracted features to predict PPIs. Its average prediction accuracy is 92.58% ± 0.38%, precision is 94.21% ± 0.45%, recall is 90.95% ± 0.41%, MCC is 85.41% ± 0.76%, F1 is 92.55% ± 0.39%, and AUC is 97.55% ± 0.16%. This result shows that DeepPPI [22] predicts new PPIs successfully using DNNs with APAAC [26]. DNN-LCTD, in contrast, encodes the amino acid sequence of each protein with the LCTD descriptor, concatenates the LCTD features of the two proteins into a longer feature vector, and uses the concatenated features as the input of a single DNN. The average accuracy, recall, MCC, F1, and AUC of DNN-LCTD are 0.53%, 1.45%, 0.83%, 0.51%, and 0.4% higher than those of DeepPPI, respectively, although its precision is lower (93.75% versus 94.21%). We attribute this gain to LCTD extracting more feature information from amino acid sequences than APAAC. The larger margin of DNN-LCTD over the other comparing approaches can be attributed to the merits of both DNNs and LCTD; the contributions of LCTD and DNNs are investigated further in Sections 2.4 and 2.5. Because the S. cerevisiae dataset contains a large number of samples, even a small improvement in prediction performance matters: for example, a 0.53% gain in accuracy over 34,514 protein pairs corresponds to roughly 183 additional correctly predicted pairs.
Based on these experimental results, we conclude that DNN-LCTD predicts PPIs more effectively than the other comparing methods, and that the proposed LCTD descriptor can explore more patterns from continuous and discontinuous amino acid segments. However, the way the negative PPIs set is constructed may bias the estimate of prediction performance [34]. To assess the validity of a negative set generated by pairing non-co-localized proteins [19], we perform an additional test on a simulated dataset of S. cerevisiae. Specifically, we first construct the negative set by pairing proteins whose subcellular localizations differ, and we randomly select 17,257 such pairs as the negative set of the simulated dataset. Next, we construct the positive set by pairing proteins whose subcellular localizations are the same, regardless of whether they interact, and randomly select 17,257 such pairs as the positive set. The simulated testing dataset thus includes 34,514 protein pairs, half positive and half negative. We then randomly divide these pairs into five folds and apply the DNN trained on the dataset of Table 3 to predict the PPIs in each fold. Table 4 reports the evaluation results on this simulated dataset. From Table 4, we can see that the accuracy, recall, MCC, and F1 are much lower than the corresponding values in Table 3. The specificity in Table 4 remains high because the negative pairs in the training dataset (used in Table 3) and in the simulated testing dataset are constructed in the same way. The low recall shows that the model does not simply label every co-localized pair as interacting, which indicates that the constructed negative set is reasonable.
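The construction of the simulated dataset can be sketched as follows. This Python is a hypothetical illustration, not the authors' pipeline: it assumes each protein has a single localization label in a dictionary (real annotations can list several compartments), and `build_simulated_set` is an illustrative name. Enumerating all protein pairs is quadratic in the number of proteins, so for a full proteome, sampling pairs directly would be more memory-efficient.

```python
import itertools
import random
from typing import Dict, List, Tuple

def build_simulated_set(localization: Dict[str, str], n_per_class: int,
                        seed: int = 0) -> List[Tuple[Tuple[str, str], int]]:
    """Label same-localization pairs 1 (whether or not they interact) and
    different-localization pairs 0, then sample n_per_class pairs of each."""
    same, different = [], []
    for a, b in itertools.combinations(sorted(localization), 2):
        (same if localization[a] == localization[b] else different).append((a, b))
    rng = random.Random(seed)
    positives = rng.sample(same, n_per_class)
    negatives = rng.sample(different, n_per_class)
    return [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]

# Toy example with hypothetical proteins and compartments.
loc = {"P1": "nucleus", "P2": "nucleus", "P3": "mitochondrion",
       "P4": "mitochondrion", "P5": "cytoplasm"}
simulated = build_simulated_set(loc, n_per_class=2)
print(simulated)
```

In the experiment described above, `n_per_class` would be 17,257. A model that had merely learned "same compartment means interaction" would score near-perfect recall on such a set, which is what the test is designed to expose.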

Comparison with Different Descriptors
To further investigate the contribution of the novel local conjoint triad descriptor, we separately train DNNs based on CT [18], AC [19], LD [20,33], MCD [21], APAAC [22], and LCTD. We then use a paired t-test at the 95% confidence level to check the statistical significance of the differences between LCTD and each of LD, MCD, AC, CT, and APAAC across the five cross-validation folds, and report the results in Figure 2 and Table 5. In Table 5, • means that LCTD is significantly better than the other descriptor on a particular evaluation metric. From Figure 2 and Table 5, we can observe that LCTD outperforms the other descriptors on nearly all evaluation metrics. The ACC, MCC, F1, and AUC of DNN-LCTD are 1.76%, 3.48%, 1.86%, and 2.85% higher than those of DNN-MCD; 2.92%, 5.81%, 3.05%, and 1.62% higher than those of DNN-LD; 3.62%, 7.25%, 3.56%, and 2.06% higher than those of DNN-AC; 1.27%, 7.74%, 9.41%, and 1.99% higher than those of DNN-CT; and 3.02%, 5.99%, 3.03%, and 2.06% higher than those of DNN-APAAC, respectively. These improvements can be attributed to LCTD extracting more useful feature information from amino acid sequences by combining the advantages of LD [20,33] and the conjoint triad (CT) descriptor [18]. From these results, we conclude that the novel LCTD captures the feature information of amino acid sequences more fully for PPIs prediction.
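The paired t-test over five folds can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: it uses a two-sided test with the tabulated critical value t(0.975, df = 4) ≈ 2.776, and the per-fold accuracies below are hypothetical, not values from Table 5. SciPy's `scipy.stats.ttest_rel` performs the same test and also returns a p-value.

```python
import math
import statistics
from typing import Sequence

# Two-sided critical value of Student's t for df = 4 (five folds) at alpha = 0.05.
T_CRIT_DF4 = 2.776

def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of the per-fold differences a - b (sample SD with n - 1).
    Raises ZeroDivisionError if every difference is identical (SD of zero)."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))

# Hypothetical per-fold accuracies of two descriptors evaluated on the same five folds.
lctd = [0.93, 0.94, 0.92, 0.93, 0.94]
other = [0.91, 0.92, 0.91, 0.92, 0.91]
t = paired_t(lctd, other)
print(round(t, 2), t > T_CRIT_DF4)
```

Pairing by fold matters here: both descriptors are scored on identical test folds, so the test works on the differences and removes fold-to-fold variation that an unpaired test would count as noise.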

Comparison with Existing Methods
To further investigate the effectiveness of DNNs, we separately train different state-of-the-art predictors on the S. cerevisiae dataset, all using LCTD to encode amino acid sequences: support vector machine (SVM) [35], k-nearest neighbors (kNN) [36], random forest (RF) [37], and AdaBoost [38]. We then compare their prediction performance using the evaluation metrics introduced above. Five-fold cross-validation is again employed to reduce the impact of data dependency and enhance the reliability of the experiments. The results are shown in Figure 3. From Figure 3, we can see that DNN-LCTD obtains a high average accuracy of 93.11%, whereas the average accuracies of AdaBoost, kNN, random forest, and SVM are 92.83%, 86.87%, 92.28%, and 92.76%, respectively. DNNs achieve the highest prediction performance on all evaluation metrics except RE and SPE. In practice, grid search is used to find the optimal parameters of these comparing algorithms. We also report the training speed of the different methods in Table 6. DNN-LCTD on a central processing unit (CPU) is 2, 25, and 39 times faster than random forest, AdaBoost, and SVM, respectively. To speed up the training of DNN-LCTD further, a GPU is employed: training DNN-LCTD on a GPU is 3 times faster than on a CPU, and 4, 9.5, 97.5, and 148 times faster than kNN, random forest, AdaBoost, and SVM, respectively. From these experimental results, we conclude that DNN-LCTD can predict PPIs from amino acid sequences accurately and efficiently.
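The five-fold protocol used throughout these comparisons can be sketched in Python as follows. This is an illustrative assumption rather than the authors' code: it stratifies by label so that each fold keeps the 1:1 ratio of positive to negative pairs, and the function name `stratified_k_fold` is hypothetical. Each of the five (train, test) splits yields one trained model, and the reported figures are averages over the five test folds.

```python
import random
from typing import Iterator, List, Sequence, Tuple

def stratified_k_fold(labels: Sequence[int], k: int = 5,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k folds, dealing each class round-robin."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for rank, idx in enumerate(members):
            folds[rank % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test

# 17,257 positive and 17,257 negative pairs, as in the S. cerevisiae dataset.
labels = [1] * 17257 + [0] * 17257
splits = list(stratified_k_fold(labels))
print([len(test) for _, test in splits])
```

Because the same `seed` produces the same folds, every predictor compared in Figure 3 and every descriptor in Table 5 can be evaluated on identical splits, which is also what a paired significance test requires.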

Results on Independent Datasets
To further assess the practical prediction ability of DNN-LCTD and the other comparing methods, we first train the models with their optimal configurations (details in Section 2.2) on the S. cerevisiae dataset (34,514 protein pairs). Five independent datasets that contain only interacting pairs, namely Caenorhabditis elegans (4013 interacting pairs), Escherichia coli (6954 interacting pairs), Helicobacter pylori (1420 interacting pairs), Homo sapiens (1412 interacting pairs), and Mus musculus (313 interacting pairs), are then used as test sets to evaluate the trained models. The prediction results are shown in Table 7. From Table 7, the accuracies of DNN-LCTD on C. elegans, E. coli, H. pylori, H. sapiens, and M. musculus are 93.17%, 94.62%, 87.38%, 94.18%, and 92.65%, respectively. DNN-LCTD has a higher accuracy than DeepPPI [22] and SVM + LD [33] on E. coli, H. sapiens, and M. musculus, and the accuracy of SVM + LD [33] is far lower than that of DNN-LCTD on C. elegans and H. pylori. These prediction accuracies are satisfactory except on H. pylori. A likely reason is that, because the models are trained on S. cerevisiae, they tend to favor species that are closer to S. cerevisiae, and S. cerevisiae is more closely related to the other four species than to H. pylori. These prediction results indicate that DNN-LCTD generalizes well for predicting PPIs.
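Because the five independent test sets contain only interacting pairs, there are no true negatives or false positives, so the reported accuracy reduces to the fraction of pairs predicted as interacting, that is, the recall TP / (TP + FN). The hypothetical Python helper below makes this explicit; the 0.5 decision threshold is an assumption. For example, 92.65% on M. musculus corresponds to roughly 290 of the 313 pairs.

```python
from typing import Sequence

def accuracy_on_positive_only(probabilities: Sequence[float], threshold: float = 0.5) -> float:
    """Accuracy on a test set whose pairs are all interacting, i.e. TP / (TP + FN)."""
    if not probabilities:
        raise ValueError("empty test set")
    tp = sum(1 for p in probabilities if p >= threshold)
    return tp / len(probabilities)

# Hypothetical predicted interaction probabilities for four interacting pairs.
print(accuracy_on_positive_only([0.9, 0.4, 0.7, 0.5]))
```

One consequence of this setup is that a trivial model predicting "interacting" for every pair would score 100% on these sets, so these results are most informative alongside the cross-validated specificity on S. cerevisiae.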

Materials and Methods
In this section, we briefly introduce the datasets used in our experiments, namely S. cerevisiae and five other independent datasets. We then describe LCTD, a novel feature representation descriptor, in detail. Finally, we briefly introduce deep neural networks (DNNs), including their characteristics and the training techniques we use.

PPIs Datasets
To reliably evaluate the performance of DNN-LCTD, a benchmark dataset is necessary. We adopt the S. cerevisiae dataset used by Du et al. [22], which was collected from the Database of Interacting Proteins (DIP; version 20160731) [39]. Proteins with fewer than 50 amino acids or with ≥40% sequence identity were excluded [19], leaving 17,257 positive protein pairs. The choice of negative examples affects PPI prediction results. The common approach relies on annotations of cellular localization [40,41]: the negative set is obtained by pairing proteins whose subcellular localizations are different. This strategy must meet the following requirements [18,19]: (1) the non-interacting pairs cannot appear in the positive dataset, and (2) the contribution of proteins to the negative set should be as balanced as possible, which means that proteins without subcellular localization information, or those denoted as 'putative' or 'hypothetical', are excluded when constructing the negative set. This strategy generates 48,594 negative pairs. In the end, the S. cerevisiae dataset contains 34,514 protein pairs: half (17,257) are from the positive dataset, and the other half (17,257 negative pairs) are randomly selected from the whole negative set. Five other independent PPIs datasets, Caenorhabditis elegans (4013 interacting pairs), Escherichia coli (6954 interacting pairs), Helicobacter pylori (1420 interacting pairs), Homo sapiens (1412 interacting pairs), and Mus musculus (313 interacting pairs) [33], are used as independent test datasets to assess the generalization ability of DNN-LCTD. These datasets are available at http://ailab.ahu.edu.cn:8087/DeepPPI/index.html.
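The negative-set strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the benchmark: the 'putative'/'hypothetical' check is applied to a single localization annotation string per protein (the `EXCLUDED_TERMS` tuple and `build_negative_set` are hypothetical names), and pairs are compared without regard to order so that requirement (1) holds for (A, B) and (B, A) alike.

```python
import itertools
import random
from typing import Dict, Iterable, List, Tuple

# Assumption: these words are looked up in the localization annotation text.
EXCLUDED_TERMS = ("putative", "hypothetical")

def build_negative_set(localization: Dict[str, str],
                       positives: Iterable[Tuple[str, str]],
                       n_select: int, seed: int = 0):
    """Return (pool, sample): all admissible non-co-localized pairs and a random subset."""
    usable = sorted(p for p, loc in localization.items()
                    if loc and not any(term in loc.lower() for term in EXCLUDED_TERMS))
    known = {frozenset(pair) for pair in positives}   # requirement (1), order-insensitive
    pool: List[Tuple[str, str]] = [
        (a, b) for a, b in itertools.combinations(usable, 2)
        if localization[a] != localization[b] and frozenset((a, b)) not in known
    ]
    return pool, random.Random(seed).sample(pool, n_select)

# Toy annotations: C is 'putative' and D has no localization, so both are dropped.
loc = {"A": "nucleus", "B": "mitochondrion", "C": "putative nucleus",
       "D": "", "E": "cytoplasm", "F": "nucleus"}
pool, negatives = build_negative_set(loc, positives=[("B", "A")], n_select=2)
print(pool, negatives)
```

For the S. cerevisiae benchmark described above, the pool corresponds to the 48,594 candidate negatives and `n_select` to the 17,257 pairs drawn to balance the positives.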

Feature Vector Extraction
Whether the encoded features are reliable strongly affects the performance of PPIs prediction. The main challenge is how to represent a protein pair effectively by a fixed-length feature vector in which the essential information about the interacting proteins is fully encoded. Various sequence-based methods have been proposed to predict new PPIs, but a common limitation is that they cannot adequately capture interaction information from continuous and discontinuous amino acid segments at the same time. To overcome this problem, we introduce a novel local conjoint triad descriptor (LCTD), which combines the advantages of the local descriptor (LD) [20,33] and the conjoint triad (CT) [18] sequence representation approaches. To introduce LCTD clearly, we first briefly describe the CT [18] and LD [20,33] feature representation methods in the following two subsections.

Conjoint Triad (CT) Method
Shen et al. [18] introduced the conjoint triad (CT) method. To represent the 20 standard amino acids compactly and to account for synonymous mutations, they first divided the 20 standard amino acids into 7 groups based on the dipoles and volumes of their side chains, as shown in Table 8. The conjoint triad method then extracts sequence information by treating any three consecutive amino acids as a unit, so that it captures the properties of each amino acid together with those of its neighbors [18]. The descriptor vectors are generated as follows.

Table 8. Division of amino acids into seven groups based on the dipoles and volumes of the side chains.

First, each amino acid in the protein sequence is replaced by the index of its group. For instance, the protein sequence "VCCPPVCVVCPPVCVPVPPCCV" is replaced by 0112201001220102022110. A protein sequence is then represented by the binary space (V, F), where V is the vector space of the sequence features and each feature v_i represents one triad type [18]. For example, v_1, v_7, and v_10 represent the triad units 100, 010, and 310, respectively. F is the frequency vector corresponding to V, and its ith element f_i is the frequency with which triad type v_i appears in the amino acid sequence [18]. Because the amino acids are grouped into seven classes, V has 7 × 7 × 7 = 343 dimensions, so i = 0, 1, …, 342. The detailed definition is illustrated in Figure 4. Each protein thus has a corresponding F vector. However, the value of f_i depends on the length of the amino acid sequence: a longer sequence generally has larger values of f_i, which complicates comparisons between proteins of different lengths. Shen et al. therefore normalized the frequencies as follows:

$$d_i = \frac{f_i - \min\{f_0, f_1, \ldots, f_{342}\}}{\max\{f_0, f_1, \ldots, f_{342}\}} \tag{7}$$

where d_i is the normalized value, which lies in the range [0, 1], and f_i is the frequency of the conjoint triad unit v_i in the protein sequence.
Finally, they concatenated the vectors of the two proteins to represent the interaction features. Thus, a 686-dimensional vector (343 for each protein) is generated for each pair of proteins.
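The CT encoding and its normalization can be sketched in Python as follows. This is an illustrative sketch, not Shen et al.'s code: the `toy_groups` mapping only covers the three letters of the worked example above (V to 0, C to 1, P to 2), and a real run would substitute the full seven-group assignment of Table 8. The index formula a + 7b + 49c for triad "abc" is chosen because it reproduces the text's examples, where 100, 010, and 310 map to v_1, v_7, and v_10.

```python
from typing import Dict, List, Sequence

N_GROUPS = 7

def triad_counts(codes: Sequence[int], n_groups: int = N_GROUPS) -> List[int]:
    """f_i: occurrences of each triad type; triad 'abc' maps to index a + 7b + 49c."""
    f = [0] * n_groups ** 3
    for a, b, c in zip(codes, codes[1:], codes[2:]):
        f[a + n_groups * b + n_groups ** 2 * c] += 1
    return f

def conjoint_triad(seq: str, group_of: Dict[str, int]) -> List[float]:
    """343-dimensional normalized CT vector, d_i = (f_i - min f) / max f."""
    f = triad_counts([group_of[aa] for aa in seq])
    lo, hi = min(f), max(f)
    return [(x - lo) / hi if hi else 0.0 for x in f]

def ct_pair(seq_a: str, seq_b: str, group_of: Dict[str, int]) -> List[float]:
    """686-dimensional pair vector: the two 343-dimensional vectors concatenated."""
    return conjoint_triad(seq_a, group_of) + conjoint_triad(seq_b, group_of)

# Toy grouping that only reproduces the worked example (V -> 0, C -> 1, P -> 2).
toy_groups = {"V": 0, "C": 1, "P": 2}
d = conjoint_triad("VCCPPVCVVCPPVCVPVPPCCV", toy_groups)
print(len(d), d[56], d[113])
```

In the example sequence, the triad 122 (index 1 + 14 + 98 = 113) occurs twice, the maximum, so it normalizes to 1.0, while the triad 011 (index 56) occurs once and normalizes to 0.5.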

Local Conjoint Triad Descriptor (LCTD)
From the construction of the LD descriptor [20,33], we can see that it only considers the neighboring effect of two adjacent amino acid types. Therefore, it cannot fully capture information about neighboring amino acids, although it does capture information from discontinuous segments of the amino acid sequence. Meanwhile, the conjoint triad method [18] considers the neighboring effect of three adjacent amino acid types but ignores discontinuous information. We therefore propose to integrate the merits of LD [20,33] and conjoint triad (CT) [18] into a novel feature representation of amino acid sequences called LCTD. LCTD first groups the 20 standard amino acids into 7 groups based on the dipoles and volumes of their side chains, as shown in Table 8. It then divides the entire protein sequence into 10 segments, as done by LD [20,33]. Next, for each local region, we calculate four descriptors: composition (C), transition (T), distribution (D), and conjoint triad (CT). C represents the composition of each amino acid group. T represents the frequency with which an amino acid of one group is followed by an amino acid of another group. D describes the distribution pattern along the entire region by measuring the locations of the first residue and of the residues at 25%, 50%, 75%, and 100% of the occurrences of a given group [33,44]. CT considers the properties of each amino acid together with those of its vicinal amino acids, treating any three consecutive amino acids as a unit [18]. These descriptors are introduced in Sections 3.2.1 and 3.2.2. For each local region, the four descriptors (C, T, D, and CT) are calculated and concatenated, giving 63 + 343 = 406 descriptors: 7 for C, 21 (7 × 6/2) for T, 35 (7 × 5) for D, and 343 for CT. The descriptors from all 10 regions are then concatenated into a 4060-dimensional vector (10 × 406). Finally, LCTD concatenates the vectors of the two proteins, so that an 8120-dimensional vector is constructed to encode each protein pair.
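The corresponding equations are as follows:

$$D_{A_i} = C \oplus T \oplus D \oplus CT \quad (i = 1, 2, \ldots, 10) \tag{8}$$
$$D_{B_i} = C \oplus T \oplus D \oplus CT \quad (i = 1, 2, \ldots, 10) \tag{9}$$
$$D_A = D_{A_1} \oplus D_{A_2} \oplus \cdots \oplus D_{A_{10}} \tag{10}$$
$$D_B = D_{B_1} \oplus D_{B_2} \oplus \cdots \oplus D_{B_{10}} \tag{11}$$
$$D_{AB} = D_A \oplus D_B \tag{12}$$

where A and B are a pair of proteins, ⊕ is the vector concatenation operator, C, T, D, and CT are computed on segment i of the corresponding protein, D_{A_i} and D_{B_i} are the feature vectors of segment i of A and B, D_A and D_B are the feature vectors extracted from A and B, respectively, and D_AB is the feature vector of the two amino acid sequences. These 8120-dimensional feature vectors are used as the input of the DNNs for training and prediction.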
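The LCTD pipeline above, per-region C, T, D, and CT followed by the concatenations, can be sketched in Python. Several details are assumptions, labeled in the code as well: the seven groups are the classes commonly attributed to Shen et al. [18] and should be replaced by Table 8 if they differ; the ten regions follow the split usually described for LD (quarters A to D, then the first, last, and middle 50% and 75% of the sequence), since the LD subsection is not reproduced here; D positions use a ceiling rule to pick the k-th occurrence; and CT is normalized per region. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List

# ASSUMPTION: classes commonly attributed to Shen et al. [18]; substitute Table 8 if it differs.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}

def composition(codes: List[int]) -> List[float]:
    """C: fraction of residues in each group (7 values)."""
    n = max(len(codes), 1)
    return [codes.count(g) / n for g in range(7)]

def transition(codes: List[int]) -> List[float]:
    """T: fraction of adjacent pairs switching between groups g and h, either order (21 values)."""
    pairs = list(zip(codes, codes[1:]))
    n = max(len(pairs), 1)
    return [sum(1 for x, y in pairs if {x, y} == {g, h}) / n for g, h in combinations(range(7), 2)]

def distribution(codes: List[int]) -> List[float]:
    """D: relative positions of the first, 25%, 50%, 75% and 100% occurrences of each group (35 values)."""
    out: List[float] = []
    for g in range(7):
        positions = [i + 1 for i, c in enumerate(codes) if c == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                out.append(0.0)
                continue
            k = max(math.ceil(frac * len(positions)), 1)  # ASSUMPTION: k-th occurrence, 1-based
            out.append(positions[k - 1] / len(codes))
    return out

def triads(codes: List[int]) -> List[float]:
    """CT: normalized triad frequencies, d_i = (f_i - min f) / max f (343 values)."""
    f = [0] * 343
    for a, b, c in zip(codes, codes[1:], codes[2:]):
        f[a + 7 * b + 49 * c] += 1
    lo, hi = min(f), max(f)
    return [(x - lo) / hi if hi else 0.0 for x in f]

def region_descriptor(segment: str) -> List[float]:
    """C + T + D + CT for one region: 7 + 21 + 35 + 343 = 406 values."""
    codes = [GROUP_OF[aa] for aa in segment]
    return composition(codes) + transition(codes) + distribution(codes) + triads(codes)

def ten_regions(seq: str) -> List[str]:
    """ASSUMED LD-style split: quarters A-D, first/last/middle 50%, first/last/middle 75%."""
    n = len(seq)
    q = [round(i * n / 4) for i in range(5)]   # quarter boundaries
    m0, m1 = round(n / 8), round(7 * n / 8)     # middle 75%: from 12.5% to 87.5%
    spans = [(q[0], q[1]), (q[1], q[2]), (q[2], q[3]), (q[3], q[4]),
             (q[0], q[2]), (q[2], q[4]), (q[1], q[3]),
             (q[0], q[3]), (q[1], q[4]), (m0, m1)]
    return [seq[s:e] for s, e in spans]

def lctd_protein(seq: str) -> List[float]:
    """D_A: the ten region descriptors concatenated (4060 values)."""
    return [x for region in ten_regions(seq) for x in region_descriptor(region)]

def lctd_pair(seq_a: str, seq_b: str) -> List[float]:
    """D_AB = D_A concatenated with D_B (8120 values)."""
    return lctd_protein(seq_a) + lctd_protein(seq_b)

# Arbitrary example sequences made of standard amino acids.
seq_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
seq_b = "GSHMSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEY"
features = lctd_pair(seq_a, seq_b)
print(len(features))
```

Every component lies in [0, 1], so the 8120-dimensional vector can be fed to the DNN without further scaling; whether the original implementation also rescales features is not stated here.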

Deep Neural Network
Deep learning, a popular class of machine learning algorithms, uses artificial neural networks with multiple nonlinear layers. It is inspired by the biological neural networks that constitute animal brains. A key characteristic of deep learning is that it can learn suitable features from the raw data without their being designed by human engineers, and discover hierarchical representations of the data [45]. The depth of a neural network corresponds to its number of hidden layers, and its width is the maximum number of neurons in any one of its layers [27]. A neural network with many hidden layers (three or more) is called a deep neural network [27].
The basic structure of a DNN consists of an input layer, multiple hidden layers, and an output layer; the specific configuration of our network is shown in Figure 7. Input data (x) are given to the DNN, and the output values are computed sequentially along the layers of the network. Each neuron of a hidden or output layer is connected to all neurons of the previous layer [27]. Each neuron computes a weighted sum of its inputs and applies a nonlinear activation function to calculate its output f(x) [27]. The computation in each layer transforms the representation from the layer below into a slightly more abstract representation [46]. Common nonlinear activation functions include sigmoid, hyperbolic tangent, and the rectified linear unit (ReLU) [47]; sigmoid and ReLU are used in this study.
In this work, we use mini-batch gradient descent [48] with the Adam algorithm [49], which reduces sensitivity to the specific choice of learning rate [27], and we speed up training with a GPU. Dropout is employed to reduce overfitting: the activations of some neurons are randomly set to zero in each forward pass during training, as shown in Figure 7 [27], where a dotted line means that the corresponding neuron is not activated or computed. The ReLU activation function [47] and the cross-entropy loss are employed because both can accelerate model training and yield better prediction results [50]. Batch normalization is also employed to reduce the dependence of training on parameter initialization, speed up training, and minimize the risk of overfitting. The loss is calculated with the following equations:

$$H_{i1} = \sigma_1(W_0 X_i + b_0) \tag{13}$$
$$H_{i(j+1)} = \sigma_1(W_j H_{ij} + b_j) \quad (i = 1, \ldots, n,\; j = 1, \ldots, h-1) \tag{14}$$
$$\hat{y}_i = \sigma_2(W_h H_{ih} + b_h) \tag{15}$$
$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right] \tag{16}$$

where n is the number of PPIs in a training batch, σ1 is the ReLU activation function, σ2 is the sigmoid activation function of the output layer, X_i is the ith input of the training batch, H_ij is the output of the jth hidden layer for the ith input, ŷ_i is the predicted output, y_i is the corresponding desired output, h is the depth of the DNN (the number of hidden layers), and W_j and b_j are the weight matrix and bias vector of layer j.
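A minimal NumPy sketch of the forward pass and loss is given below. It assumes row-vector inputs, so each layer computes `H @ W + b` (the transpose of the W H convention in the equations), He-style random initialization, and inverted dropout; the layer sizes are hypothetical because Table 1 is not reproduced here. Batch normalization and the Adam update are omitted; in TensorFlow they would typically come from `tf.keras.layers.BatchNormalization` and `tf.keras.optimizers.Adam`.

```python
import numpy as np

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def init_params(sizes, rng):
    """He-style initialization for layers sizes[0] -> sizes[1] -> ... -> sizes[-1]."""
    weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, k)) for m, k in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(k) for k in sizes[1:]]
    return weights, biases

def forward(X, weights, biases, drop_rate=0.0, rng=None):
    """ReLU hidden layers (sigma_1), sigmoid output (sigma_2); inverted dropout when drop_rate > 0."""
    H = X
    for W, b in zip(weights[:-1], biases[:-1]):
        H = relu(H @ W + b)
        if drop_rate > 0.0:
            keep = rng.random(H.shape) >= drop_rate   # zero out roughly drop_rate of the units
            H = H * keep / (1.0 - drop_rate)          # rescale so the expected activation is unchanged
    return sigmoid(H @ weights[-1] + biases[-1]).ravel()

def bce_loss(y_hat, y, eps=1e-12):
    """Mean binary cross-entropy over the batch."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))

rng = np.random.default_rng(0)
sizes = [8120, 256, 128, 64, 1]           # hypothetical: 8120-dim LCTD input, three hidden layers
weights, biases = init_params(sizes, rng)
X = rng.random((4, 8120))                 # a toy batch of four protein-pair vectors
y = np.array([1.0, 0.0, 1.0, 0.0])
y_hat = forward(X, weights, biases, drop_rate=0.5, rng=rng)
loss = bce_loss(y_hat, y)
print(y_hat, loss)
```

At prediction time `forward` would be called with `drop_rate=0.0`; thanks to the inverted-dropout rescaling during training, no extra scaling is needed then.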

Conclusions
In this article, we propose an efficient approach for predicting PPIs from protein primary sequences that combines a novel local conjoint triad feature representation with DNNs. LCTD takes both continuous and discontinuous segments of protein sequences into account at the same time, so the feature sets it produces can capture more essential interaction information from continuous and discontinuous binding patterns within a protein sequence. We then train a DNN with the LCTD feature sets as inputs and use the trained DNN to predict new PPIs. The experimental results indicate that DNN-LCTD is very promising for predicting PPIs and can serve as a useful supplementary tool to other approaches.
The high prediction accuracy can be partially attributed to a biased selection of positive and negative training data. In practice, the available PPIs are incomplete and have high rates of false positives and false negatives. Furthermore, constructing the negative dataset from subcellular localization information may also introduce bias. How to construct a high-quality negative set and how to reduce the impact of noise and bias in PPIs data are directions for future work. Another possible reason for the high accuracy is that DNNs can model complex relationships between molecules through their hidden layers and reduce the impact of noise and bias in PPIs data.