Open Access Article

Prediction of Extracellular Matrix Proteins by Fusing Multiple Feature Information, Elastic Net, and Random Forest Algorithm

by Minghui Wang 1,2, Lingling Yue 1,2, Xiaowen Cui 1,2, Cheng Chen 1,2, Hongyan Zhou 1,2, Qin Ma 3 and Bin Yu 1,2,4,*
1 College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
2 Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
3 Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
4 School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(2), 169; https://doi.org/10.3390/math8020169
Received: 3 January 2020 / Revised: 22 January 2020 / Accepted: 23 January 2020 / Published: 31 January 2020
Extracellular matrix (ECM) proteins play an important role in a range of cellular biological processes, and studying them helps to further our understanding of their biological functions. We propose ECMP-RF (extracellular matrix proteins prediction by random forest) to predict ECM proteins. First, protein sequence features are extracted by combining encoding based on grouped weight, pseudo amino acid composition, a pseudo position-specific scoring matrix, a local descriptor, and an autocorrelation descriptor. Second, the synthetic minority oversampling technique (SMOTE) is used to handle the class imbalance in the data, and the elastic net (EN) is used to reduce the dimensionality of the feature vectors. Finally, a random forest (RF) classifier is used to predict ECM proteins. Leave-one-out cross-validation gives a balanced accuracy of 97.3% on the training dataset and 97.9% on the testing dataset. Compared with other state-of-the-art predictors, ECMP-RF performs significantly better.
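The class-balancing, dimension-reduction and evaluation steps summarised in the abstract can be illustrated with a short, dependency-free Python sketch. This is not the authors' ECMP-RF code: `smote`, `elastic_net`, `soft_threshold` and `balanced_accuracy` are hypothetical helper names. SMOTE is shown in its basic form (interpolating between a minority sample and one of its k nearest minority neighbours); the elastic net is fitted by plain cyclic coordinate descent on a least-squares loss, assuming standardised features and a centred numeric response; and balanced accuracy is assumed to be the mean of sensitivity and specificity. The sequence descriptors and the random forest classifier are not reproduced.

```python
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Basic SMOTE (sketch): create n_new synthetic minority samples.

    Each new point is x + lam * (nb - x), where x is a randomly chosen
    minority sample, nb is one of its k nearest minority neighbours
    (Euclidean distance) and lam is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Other minority samples, nearest first (squared Euclidean distance).
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])),
        )
        nb = minority[rng.choice(others[:k])]
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nb)])
    return synthetic


def soft_threshold(z: float, t: float) -> float:
    """Shrink z towards zero by t (proximal operator of t*|w|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X: Sequence[Vector], y: Sequence[float], alpha: float = 0.1,
                l1_ratio: float = 0.5, n_iter: int = 200) -> Vector:
    """Cyclic coordinate descent (sketch) for the objective

        (1/2n)*||y - Xw||^2 + alpha*(l1_ratio*||w||_1 + (1 - l1_ratio)/2*||w||^2)

    Assumes standardised columns of X and a centred y, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            denom = sum(c * c for c in col) / n + alpha * (1 - l1_ratio)
            if denom == 0.0:
                continue  # all-zero column under a pure L1 penalty
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / denom
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
    return w


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> float:
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

In a pipeline of the kind described, SMOTE would be applied to the minority (ECM) class, the elastic net fitted on the balanced, standardised feature matrix, and only the columns whose coefficients are nonzero, for example `[j for j, wj in enumerate(w) if abs(wj) > 1e-12]`, passed on to the classifier. The L1 part of the penalty is what drives uninformative coefficients exactly to zero, while the L2 part keeps groups of correlated descriptors from being dropped arbitrarily.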
Keywords: extracellular matrix protein; multi-information fusion; synthetic minority oversampling technique; elastic net; random forest
MDPI and ACS Style

Wang, M.; Yue, L.; Cui, X.; Chen, C.; Zhou, H.; Ma, Q.; Yu, B. Prediction of Extracellular Matrix Proteins by Fusing Multiple Feature Information, Elastic Net, and Random Forest Algorithm. Mathematics 2020, 8, 169.
