Recognition of Protein Pupylation Sites by Adopting Resampling Approach

by Tao Li 1,2, Yan Chen 1, Taoying Li 1,* and Cangzhi Jia 3
1 School of Transportation Management, Dalian Maritime University, Dalian 116026, China
2 China Waterborne Transport Research Institute, Beijing 100088, China
3 College of Science, Dalian Maritime University, Dalian 116026, China
* Author to whom correspondence should be addressed.
Molecules 2018, 23(12), 3097
Received: 13 October 2018 / Revised: 21 November 2018 / Accepted: 22 November 2018 / Published: 27 November 2018
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
As posttranslational modification sites are studied in greater depth, protein ubiquitination has become a key problem in understanding the molecular mechanism of posttranslational modification. Pupylation is a widely used process in which a prokaryotic ubiquitin-like protein (Pup) is attached to a substrate through a series of biochemical reactions. However, experimental methods for identifying pupylation sites are often time-consuming and laborious. This study proposes an improved approach for predicting pupylation sites. First, the Pearson correlation coefficient was used to reflect the correlation among different amino acid pairs, calculated from the frequency of each amino acid. Second, the multiple types of features were filtered separately, ranked in descending order of their Pearson correlation coefficient values. Third, to obtain a balanced dataset, the K-means principal component analysis (KPCA) oversampling technique was employed to synthesize new positive samples, and a fuzzy undersampling method was employed to reduce the number of negative samples. Finally, the performance of the method was verified by jackknife and 10-fold cross-validation tests. The average results of 10-fold cross-validation showed a sensitivity (Sn) of 90.53%, a specificity (Sp) of 99.8%, an accuracy (Acc) of 95.09%, and a Matthews correlation coefficient (MCC) of 0.91. Moreover, an independent test dataset was used to further measure performance; the predictions achieved an Acc of 83.75% and an MCC of 0.49, outperforming previous predictors. The improved performance and stability of the proposed method show that it is an effective way to predict pupylation sites.
Keywords: fuzzy undersampling; machine learning; principal component analysis; protein pupylation; sequence information
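
The four evaluation measures quoted in the abstract (Sn, Sp, Acc, and MCC) all follow from the counts in a binary confusion matrix. The sketch below uses their standard textbook definitions and is not taken from the authors' code. The function name `binary_metrics` and the example counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math


def _ratio(numerator, denominator):
    # Assumed convention: return 0.0 when the denominator is zero.
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts.

    tp/fn: pupylated sites predicted correctly / missed.
    tn/fp: non-pupylated sites predicted correctly / falsely flagged.
    """
    sn = _ratio(tp, tp + fn)                      # sensitivity (recall on positives)
    sp = _ratio(tn, tn + fp)                      # specificity (recall on negatives)
    acc = _ratio(tp + tn, tp + fn + tn + fp)      # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, denom)        # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    for name, value in binary_metrics(tp=90, fn=10, tn=95, fp=5).items():
        print(f"{name}: {value:.4f}")
```

Unlike Acc, MCC draws on all four cells of the confusion matrix, which makes it the more informative figure when negative samples far outnumber positive ones, the imbalance that the resampling steps are meant to address.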
MDPI and ACS Style

Li, T.; Chen, Y.; Li, T.; Jia, C. Recognition of Protein Pupylation Sites by Adopting Resampling Approach. Molecules 2018, 23, 3097.
