Open Access Article

Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model

1 The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Department of Information Systems, King Abdulaziz University, Jeddah 21589, Saudi Arabia
* Author to whom correspondence should be addressed.
Genes 2019, 10(11), 924; https://doi.org/10.3390/genes10110924
Received: 31 August 2019 / Revised: 5 November 2019 / Accepted: 6 November 2019 / Published: 12 November 2019
(This article belongs to the Section Technologies and Resources for Genetics)
Self-interacting proteins (SIPs) are of paramount importance in current molecular biology. A number of traditional biological experimental methods for predicting SIPs have been developed in the past few years. However, these methods are costly, time-consuming, and inefficient, which often limits their use for predicting SIPs. The development of computational methods has therefore become necessary. In this paper, we propose, for the first time, a novel deep learning model combined with a natural language processing (NLP) method to predict potential SIPs from protein sequence information. More specifically, each protein sequence is first assembled de novo from k-mers. We then obtain a global vectors representation of each protein sequence using the NLP technique. Finally, based on knowledge of known self-interacting and non-interacting proteins, a multi-grained cascade forest model is trained to predict SIPs. Comprehensive experiments on yeast and human datasets yielded accuracies of 91.45% and 93.12%, respectively. The experimental results show that amino acid semantic information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work has potential applications to various biological classification problems.
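The first two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that assembling a sequence by k-mers means splitting it into overlapping k-mers with stride 1, and that the global vectors representation starts from a GloVe-style co-occurrence table in which two tokens d positions apart contribute 1/d. The values k = 3 and window = 2, the function names, and the example sequence are hypothetical choices for illustration.

```python
from collections import defaultdict


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (stride 1).

    k=3 is an illustrative assumption, not a value taken from the paper.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def cooccurrence(tokens, window=2):
    """Build a symmetric, distance-weighted co-occurrence table.

    Tokens that are d positions apart (d <= window) add 1/d to the
    count of the pair in both directions, as in GloVe-style counting.
    """
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            counts[(center, tokens[j])] += 1.0 / d
            counts[(tokens[j], center)] += 1.0 / d
    return dict(counts)


# Hypothetical toy sequence, used only to show the data flow.
tokens = kmers("MKVLAAGIV", k=3)
table = cooccurrence(tokens, window=2)
```

In a full pipeline, a table like this would be the input from which embedding vectors are fitted, and the resulting per-protein features would then be passed to the classifier. Those later steps are omitted here because the abstract does not specify them.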
Keywords: self-interacting proteins; de novo protein sequence; global vector representation; multi-grained cascade forest

MDPI and ACS Style

Chen, Z.-H.; You, Z.-H.; Zhang, W.-B.; Wang, Y.-B.; Cheng, L.; Alghazzawi, D. Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model. Genes 2019, 10, 924.

