Open Access Article

Developing Computational Model to Predict Protein-Protein Interaction Sites Based on the XGBoost Algorithm

by Aijun Deng 1,2,3,†, Huan Zhang 4,†, Wenyan Wang 4, Jun Zhang 5, Dingdong Fan 2, Peng Chen 5,* and Bing Wang 1,4,5,*
1 Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (Anhui University of Technology), Ministry of Education, Ma’anshan 243002, China
2 School of Metallurgical Engineering, Anhui University of Technology, Ma’anshan 243032, China
3 Department of Engineering, University of Leicester, Leicester LE1 7RH, UK
4 School of Electrical and Information Engineering, Anhui University of Technology, Ma’anshan 243032, China
5 Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Int. J. Mol. Sci. 2020, 21(7), 2274; https://doi.org/10.3390/ijms21072274
Received: 3 February 2020 / Revised: 10 March 2020 / Accepted: 23 March 2020 / Published: 25 March 2020
(This article belongs to the Special Issue Recent Advances in Biomolecular Recognition)
The study of protein-protein interactions is of great biological significance, and predicting protein-protein interaction sites can improve the understanding of cellular biological activity and aid drug development. However, the distribution of interaction and non-interaction sites is usually imbalanced because only a small number of protein interactions have been confirmed experimentally, which greatly affects the predictive capability of computational methods. In this work, two strategies for processing imbalanced data, based on the XGBoost algorithm, were proposed to re-balance the original dataset using the inherent relationship between positive and negative samples for the prediction of protein-protein interaction sites. A feature extraction method based on the evolutionary conservation of proteins was applied to represent the protein interaction sites, and the influence of overlapping regions of positive and negative samples on prediction performance was considered. Our method showed good prediction performance, with a prediction accuracy of 0.807 and an MCC of 0.614, on an original dataset with 10,455 surface residues but only 2297 interface residues. Experimental results demonstrated the effectiveness of our XGBoost-based method.
Keywords: protein interaction sites; unbalanced data sets; overlapping regions; XGBoost
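
The abstract reports both accuracy and the Matthews correlation coefficient (MCC) on an imbalanced dataset. The following minimal Python sketch illustrates why the two are usually reported together. It is not the authors' code: it assumes that the 2297 interface residues are a subset of the 10,455 surface residues, and the function names and the all-negative baseline are hypothetical, introduced only for illustration.

```python
import math


def imbalance_ratio(n_total, n_positive):
    """Number of negative samples per positive sample."""
    return (n_total - n_positive) / n_positive


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional value when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Counts quoted in the abstract. Assumption: interface residues are
# counted among the surface residues.
N_SURFACE = 10455
N_INTERFACE = 2297

ratio = imbalance_ratio(N_SURFACE, N_INTERFACE)

# Hypothetical baseline that labels every residue as non-interface.
tp, fn = 0, N_INTERFACE
tn, fp = N_SURFACE - N_INTERFACE, 0
baseline_accuracy = accuracy(tp, tn, fp, fn)
baseline_mcc = mcc(tp, tn, fp, fn)

print(f"negatives per positive: {ratio:.2f}")
print(f"all-negative baseline accuracy: {baseline_accuracy:.3f}")
print(f"all-negative baseline MCC: {baseline_mcc:.3f}")
```

Under that assumption, a classifier that never predicts an interface residue already reaches an accuracy of about 0.78 while its MCC is 0, so the reported MCC of 0.614 carries information that the accuracy of 0.807 alone does not.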

MDPI and ACS Style

Deng, A.; Zhang, H.; Wang, W.; Zhang, J.; Fan, D.; Chen, P.; Wang, B. Developing Computational Model to Predict Protein-Protein Interaction Sites Based on the XGBoost Algorithm. Int. J. Mol. Sci. 2020, 21, 2274.
