Open Access Article
Molecules 2018, 23(8), 1923

Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences

1,2,* , 1,2
School of Computer Science and Technology, Tianjin University, Nankai District, Tianjin 300072, China
Tianjin Key Laboratory of Cognitive Computing and Application, Nankai District, Tianjin 300072, China
Author to whom correspondence should be addressed.
Received: 13 June 2018 / Revised: 16 July 2018 / Accepted: 28 July 2018 / Published: 1 August 2018


Machine learning based predictions of protein–protein interactions (PPIs) could provide valuable insights into protein functions, disease occurrence, and therapy design on a large scale. However, most of these methods rely on intensive feature engineering, which makes the prediction task tedious. Emerging deep learning technology, which enables automatic feature engineering, has achieved great success in various fields, but the over-fitting and generalization of its models have not yet been well investigated in most scenarios. Here, we present a deep neural network framework (DNN-PPI) for predicting PPIs using features learned automatically from protein primary sequences alone. Within the framework, the sequences of the two proteins in a pair are passed sequentially through the encoding, embedding, convolution neural network (CNN), and long short-term memory (LSTM) neural network layers. Then, the two outputs from the previous layer are concatenated into a single vector, which serves as the input of the fully connected neural network. Finally, the Adam optimizer is applied to learn the network weights by back-propagation. Different types of features, including semantic associations between amino acids, position-related sequence segments (motifs), and their long- and short-term dependencies, are captured in the embedding, CNN, and LSTM layers, respectively. When trained on Pan’s human PPI dataset, the model achieved a prediction accuracy of 98.78% with a Matthews correlation coefficient (MCC) of 97.57%. The prediction accuracies for six external datasets ranged from 92.80% to 97.89%, superior to those achieved with previous methods. When applied to Escherichia coli, Drosophila, and Caenorhabditis elegans datasets, DNN-PPI obtained prediction accuracies of 95.949%, 98.389%, and 98.669%, respectively. The performances in cross-species testing among the four species above were consistent with their evolutionary distances. However, when tested on Mus musculus, the models trained on those species all obtained prediction accuracies of over 92.43%, which is difficult to achieve and worthy of further study. These results suggest that DNN-PPI generalizes remarkably well and is a promising tool for identifying protein interactions.
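The abstract reports accuracy and MCC side by side, so a short worked example of how the two relate may help. The sketch below computes both from confusion-matrix counts using the standard definitions. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names are ours.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all protein pairs that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (an assumption, not the paper's data): a balanced
# test set of 1000 interacting and 1000 non-interacting pairs.
tp, fn = 990, 10   # interacting pairs: predicted correctly / missed
tn, fp = 985, 15   # non-interacting pairs: predicted correctly / false alarms

acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc_value:.4f}")
print(f"MCC      = {mcc_value:.4f}")
```

For an exactly balanced test set with equal numbers of false positives and false negatives, the MCC reduces to 2 × accuracy − 1. The reported pair of 98.78% accuracy and 97.57% MCC is close to that relation, which is what one would expect if the evaluation set is roughly balanced; this is an observation about the numbers, not a statement about how the authors computed them.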
Keywords: convolution neural networks; long short-term memory neural networks; protein–protein interaction; model generalization

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Li, H.; Gong, X.-J.; Yu, H.; Zhou, C. Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences. Molecules 2018, 23, 1923.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Molecules, EISSN 1420-3049, published by MDPI AG, Basel, Switzerland