Open Access Article
Entropy 2016, 18(10), 379; doi:10.3390/e18100379

A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen–Shannon Divergence

1 Institute of Computer Science, University of Göttingen, Göttingen 37077, Germany
2 Institute of Bioinformatics, University Medical Center Göttingen, Göttingen 37077, Germany
* Author to whom correspondence should be addressed.
Academic Editors: Carlos M. Travieso-González and Jesús B. Alonso-Hernández
Received: 30 July 2016 / Revised: 19 October 2016 / Accepted: 20 October 2016 / Published: 24 October 2016
(This article belongs to the Special Issue Entropy on Biosignals and Intelligent Systems)

Abstract

Knowledge of protein-DNA interactions is essential to fully understand the molecular activities of life. Many research groups have developed tools, based on either structure or sequence, to predict the DNA-binding residues in proteins. Structure-based methods usually achieve good results but require knowledge of the 3D structure of the protein, while sequence-based methods can be applied to proteins in high throughput but require good features. In this study, we present a new information-theoretic feature derived from the Jensen–Shannon divergence (JSD) between the amino acid distribution of a site and the background distribution of non-binding sites. Our new feature indicates how much a given site differs from a non-binding site, and is thus informative for detecting binding sites in proteins. We conduct the study with five-fold cross-validation on 263 proteins using the Random Forest classifier. We evaluate our new features by combining them with popular existing features such as the position-specific scoring matrix (PSSM), orthogonal binary vector (OBV), and secondary structure (SS). We observe that adding our features significantly boosts the performance of the Random Forest classifier, with a clear increase in sensitivity and Matthews correlation coefficient (MCC).
Keywords: entropy; Jensen–Shannon divergence; Random Forest; DNA-binding sites
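
The abstract describes a feature based on the Jensen–Shannon divergence between the amino acid distribution of a site and a background distribution. The following is a minimal sketch of that general idea, not the authors' implementation: the function names, the pseudocount smoothing, and the use of an alignment column as the source of the site distribution are all assumptions made for illustration. A uniform background stands in for the background of non-binding sites, which would have to be estimated from data.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order shared by all distributions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1.0):
    """Estimate the amino acid distribution of one site from a string of
    residues (hypothetical input: one column of a sequence alignment).
    The pseudocount is an assumed smoothing choice; gaps are ignored."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits: the average KL divergence of
    p and q from their midpoint m. With log base 2 it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Placeholder background (assumption): uniform over the 20 amino acids.
    background = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    conserved = column_distribution("KKKKKKKK")
    diverse = column_distribution(AMINO_ACIDS)
    print(jensen_shannon_divergence(conserved, background))
    print(jensen_shannon_divergence(diverse, background))
```

In this sketch, a site whose distribution departs strongly from the background receives a higher score than a site that resembles the background, which is the property the abstract relies on for detecting binding sites.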

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Dang, T.K.L.; Meckbach, C.; Tacke, R.; Waack, S.; Gültas, M. A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen–Shannon Divergence. Entropy 2016, 18, 379.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Entropy, EISSN 1099-4300. Published by MDPI AG, Basel, Switzerland.