Int. J. Mol. Sci. 2016, 17(2), 218; doi:10.3390/ijms17020218

A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data

1 School of Control Science and Engineering, Shandong University, Jinan 250061, China
2 School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, Weihai 264209, China
* Author to whom correspondence should be addressed.
Academic Editor: Christo Z. Christov
Received: 4 January 2016 / Accepted: 1 February 2016 / Published: 6 February 2016
(This article belongs to the Section Physical Chemistry, Theoretical and Computational Chemistry)

Abstract

The Golgi apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein sub-Golgi localizations may assist in drug development and in understanding the mechanisms by which the GA is involved in various cellular processes. In this paper, a new computational method is proposed for distinguishing cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search for the optimal features among the CSP-based features and the g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Under jackknife cross-validation, the proposed method achieves a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthews Correlation Coefficient (MCC) of 0.765, which markedly outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method in identifying Golgi-resident protein types. Furthermore, the CSP-based feature extraction method may provide guidelines for protein function prediction.
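As an illustration of one of the feature types named in the abstract, the following is a minimal sketch of a g-gap dipeptide composition, assuming the usual definition: the frequency of each ordered pair of residues separated by g positions. The function name, the handling of non-standard residues, and the normalization by the number of valid pairs are assumptions made for this example and are not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so that feature indices are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-element list of g-gap dipeptide frequencies.

    A g-gap dipeptide is an ordered pair of residues with g residues
    between them (g=0 gives the ordinary adjacent dipeptide).
    Pairs containing non-standard residues are skipped (an assumption).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    # Toy sequence, purely illustrative.
    features = g_gap_dipeptide_composition("ACAC", g=0)
    print(len(features), sum(features))
```

For a sequence made only of standard residues, the number of valid pairs equals the sequence length minus g minus 1, so the 400 frequencies sum to 1.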
Keywords: Golgi apparatus proteins; common spatial patterns; synthetic minority over-sampling technique; recursive feature elimination; random forest
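The four evaluation measures reported in the abstract can be computed from a binary confusion matrix. The sketch below uses their standard definitions; the function name and the counts in the usage example are hypothetical and are not the paper's data, and which Golgi class is treated as "positive" is left as an assumption.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts.

    tp/fn: positives predicted correctly / incorrectly.
    tn/fp: negatives predicted correctly / incorrectly.
    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; return 0.0 then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=8, fn=2, tn=9, fp=1).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a more informative summary when the two classes differ greatly in size, as in an imbalanced benchmark.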

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Yang, R.; Zhang, C.; Gao, R.; Zhang, L. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data. Int. J. Mol. Sci. 2016, 17, 218.

Int. J. Mol. Sci. EISSN 1422-0067. Published by MDPI AG, Basel, Switzerland.