Open Access Article
Genes 2018, 9(8), 394;

A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers

School of Computer Science and Technology, Tianjin University, Nankai, Tianjin 300072, China
Tianjin Key Laboratory of Cognitive Computing and Application, Nankai, Tianjin 300072, China
Beijing KEDONG Electric Power Control System Co. LTD, Qinghe, Beijing 100192, China
Author to whom correspondence should be addressed.
Received: 3 June 2018 / Revised: 24 July 2018 / Accepted: 24 July 2018 / Published: 1 August 2018


Various machine learning-based approaches that use sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most critical tasks. Revealing the significance and contribution of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework that orchestrates multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features, including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and the auto-cross covariance of secondary structures (AC_Struc), which were extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely-coupled classifiers, such as support vector machine (SVM) and random forest. A logistic regression model is then applied to evaluate the contributions of these individual classifiers and to make the final prediction. On the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, it achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework can flexibly orchestrate various prediction models while achieving good performance.
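The stacking step described in the abstract can be sketched in a few lines. The following is a minimal, pure-Python illustration under stated assumptions, not the authors' MSFBinder implementation: each feature view gets its own base classifier, out-of-fold scores from those classifiers become the inputs of a logistic regression meta-learner, and the learned meta-weights indicate how much each view contributes. As an explicit assumption, a hypothetical nearest-centroid scorer (`centroid_scorer`) stands in for the SVM and random forest base models, the two-view data are synthetic, and every function name here (`fit_stack`, `predict`, `stratified_folds`, `fit_logistic`) is hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Scorer = Callable[[Vector], float]


def sigmoid(s: float) -> float:
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def centroid_scorer(X: Sequence[Vector], y: Sequence[int]) -> Scorer:
    # HYPOTHETICAL stand-in for an SVM / random forest base model:
    # score = distance to negative centroid - distance to positive centroid.
    dim = len(X[0])

    def centroid(label: int) -> Vector:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    pos, neg = centroid(1), centroid(0)

    def dist(a: Vector, b: Vector) -> float:
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    return lambda x: dist(x, neg) - dist(x, pos)


def fit_logistic(Z: List[Vector], y: Sequence[int], lr: float = 0.1,
                 epochs: int = 500) -> Tuple[Vector, float]:
    # Meta-learner: batch gradient descent on the mean log-loss.
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def stratified_folds(y: Sequence[int], k: int) -> List[int]:
    # Deal each class round-robin over k folds so every fold sees both labels.
    seen: dict = {}
    folds = []
    for t in y:
        folds.append(seen.get(t, 0) % k)
        seen[t] = seen.get(t, 0) + 1
    return folds


def fit_stack(views: List[List[Vector]], y: List[int], k: int = 5):
    # views[v][i] is the feature vector of sample i in view v.
    n, folds = len(y), stratified_folds(y, k)
    meta = [[0.0] * len(views) for _ in range(n)]
    for f in range(k):
        train = [i for i in range(n) if folds[i] != f]
        for v, X in enumerate(views):
            scorer = centroid_scorer([X[i] for i in train], [y[i] for i in train])
            for i in range(n):
                if folds[i] == f:
                    meta[i][v] = scorer(X[i])  # out-of-fold score only
    w, b = fit_logistic(meta, y)
    base = [centroid_scorer(X, y) for X in views]  # refit on all data
    return base, w, b


def predict(model, sample_views: List[Vector]) -> float:
    # Probability that the sample belongs to the positive class (label 1).
    base, w, b = model
    z = [score(x) for score, x in zip(base, sample_views)]
    return sigmoid(b + sum(wi * zi for wi, zi in zip(w, z)))


# Synthetic toy data: view A separates the classes, view B is pure noise.
rng = random.Random(42)
y = [1] * 40 + [0] * 40
view_a = [[rng.gauss(1.0 if t else -1.0, 0.3)] for t in y]
view_b = [[rng.gauss(0.0, 1.0)] for _ in y]
model = fit_stack([view_a, view_b], y)
print("meta-weights (view A, view B):", model[1])
print("P(positive | x_A=1.0, x_B=0.0):", predict(model, [[1.0], [0.0]]))
```

Training the meta-learner on out-of-fold scores, rather than on scores from base models that have already seen each sample, keeps the meta-weights from simply rewarding overfit base models; in this toy run the informative view should receive a much larger weight than the noise view. For real experiments, scikit-learn's `StackingClassifier` (with `LogisticRegression` as `final_estimator`) implements the same out-of-fold stacking pattern, although routing each view to its own base model then requires per-estimator column selection.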
Keywords: DNA-binding proteins; model stacking; logistic regression; multi-view features

Figure 1

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Liu, X.-J.; Gong, X.-J.; Yu, H.; Xu, J.-H. A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers. Genes 2018, 9, 394.

Genes EISSN 2073-4425. Published by MDPI AG, Basel, Switzerland.