Open Access Article
Int. J. Mol. Sci. 2014, 15(7), 12731-12749; doi:10.3390/ijms150712731

A Novel Feature Extraction Scheme with Ensemble Coding for Protein–Protein Interaction Prediction

1,3,* , 2,3,†
Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China
Institute of Information Engineering, Anhui Xinhua University, Hefei 230088, China
School of Computer Science and Technology, Anhui University, Hefei 230601, China
School of Mathematical Science, Anhui University, Hefei 230601, China
† These authors contributed equally to this work.
* Author to whom correspondence should be addressed.
Received: 7 May 2014 / Revised: 23 June 2014 / Accepted: 14 July 2014 / Published: 18 July 2014
(This article belongs to the Collection Proteins and Protein-Ligand Interactions)


Protein–protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search for the optimal feature combination. The resulting DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, it achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy of the DXEC-RF method reaches 80%. We extended the experiment to larger and more realistic datasets, on which the method maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for PPI prediction. The prediction service of the DXEC-RF classifier is available at DXECPPI/index.jsp.
Keywords: protein–protein interaction; random forest; ensemble coding; DX score
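
The precision, recall, and accuracy figures quoted in the abstract follow the standard confusion-matrix definitions. The sketch below is a minimal illustration of those definitions and is not the authors' evaluation code. The class name `ConfusionCounts` and the example counts are hypothetical: they were chosen so that a balanced test set (an assumption, not stated in the abstract) gives values close to the reported Gold Yeast percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of predictions on a binary interacting / non-interacting task."""
    tp: int  # interacting pairs predicted as interacting
    fp: int  # non-interacting pairs predicted as interacting
    tn: int  # non-interacting pairs predicted as non-interacting
    fn: int  # interacting pairs predicted as non-interacting


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted interactions that are true interactions."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of true interactions that the predictor recovers."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all pairs that are classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical counts for illustration only (10,000 positives and
# 10,000 negatives assumed); they are not taken from the paper.
example = ConfusionCounts(tp=8074, fp=3941, tn=6059, fn=1926)

if __name__ == "__main__":
    print(f"precision = {precision(example):.4f}")
    print(f"recall    = {recall(example):.4f}")
    print(f"accuracy  = {accuracy(example):.4f}")
```

Under the balanced-set assumption, the three reported Gold Yeast values are mutually consistent: a recall near 0.81 together with a precision near 0.67 implies an accuracy near 0.71.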

Figure 1

This is an open access article distributed under the Creative Commons Attribution License (CC BY 3.0).

MDPI and ACS Style

Du, X.; Cheng, J.; Zheng, T.; Duan, Z.; Qian, F. A Novel Feature Extraction Scheme with Ensemble Coding for Protein–Protein Interaction Prediction. Int. J. Mol. Sci. 2014, 15, 12731-12749.

Int. J. Mol. Sci. EISSN 1422-0067. Published by MDPI AG, Basel, Switzerland.