Open Access Article
Molecules 2017, 22(10), 1602; doi:10.3390/molecules22101602

Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods

1
School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
2
School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
3
Center of Potential Illness, Qinhuangdao Hospital of Traditional Chinese Medicine, Qinhuangdao 066001, China
4
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
5
State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300074, China
*
Author to whom correspondence should be addressed.
Received: 15 August 2017 / Revised: 19 September 2017 / Accepted: 20 September 2017 / Published: 22 September 2017
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Abstract

DNA-binding proteins play vital roles in cellular processes such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. Current prediction methods are mainly based on machine learning, and their accuracy depends largely on the feature extraction method. Using an efficient feature representation method is therefore important for improving classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method that combines three feature representation methods, namely K-Skip-N-Grams, information theory, and sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. A support vector machine is used as the classifier. The mixed feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors obtained from the combination of all three feature extraction methods show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature method. On the test set, the feature vectors that combine SSF and K-Skip-N-Grams show the best performance. Overall, mixed features are superior to single features.
Keywords: DNA-binding protein; mixed feature representation methods; support vector machine
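
The abstract names K-Skip-N-Grams as one of the three feature representations but does not define it. As a rough illustration only, the sketch below assumes n = 2 and a common reading of the idea: count ordered residue pairs that have between 0 and k residues skipped between them, then normalize by the total number of counted pairs. The function name `k_skip_bigram_features`, the default `k=2`, and the 20-letter alphabet are assumptions made for this example and are not taken from the paper.

```python
from itertools import product

# Assumption: the 20 standard amino acids; other symbols (e.g. "X") are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def k_skip_bigram_features(sequence, k=2):
    """Return a 400-dimensional vector of normalized k-skip bigram counts.

    A pair (a, b) is counted whenever b follows a with 0..k residues
    skipped in between. This is an assumed definition, not the paper's.
    """
    counts = dict.fromkeys(PAIRS, 0)
    seq = sequence.upper()
    total = 0
    for gap in range(k + 1):
        step = gap + 1  # distance between the two residues of the pair
        for i in range(len(seq) - step):
            pair = seq[i] + seq[i + step]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
                total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


if __name__ == "__main__":
    vec = k_skip_bigram_features("MKVLAAGIVGLS", k=2)
    print(len(vec), round(sum(vec), 6))
```

A fixed-length vector of this kind could be concatenated with other feature vectors and passed to a classifier such as a support vector machine, which is the general workflow the abstract describes.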
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Qu, K.; Han, K.; Wu, S.; Wang, G.; Wei, L. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods. Molecules 2017, 22, 1602.

Molecules, EISSN 1420-3049, published by MDPI AG, Basel, Switzerland.