Open Access Article
Int. J. Mol. Sci. 2018, 19(2), 511; https://doi.org/10.3390/ijms19020511

A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties

G. Pan 1,2,†, L. Jiang 1,2,†, J. Tang 1,2,3 and F. Guo 1,2,*
1 School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
2 Tianjin University Institute of Computational Biology, Tianjin University, Tianjin 300350, China
3 Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
† These authors contributed equally to this work.
* Author to whom correspondence should be addressed.
Received: 29 December 2017 / Revised: 1 February 2018 / Accepted: 2 February 2018 / Published: 8 February 2018
(This article belongs to the Special Issue DNA Methylation)

Abstract

DNA methylation is an important biochemical process that is closely connected with many types of cancer. Research on DNA methylation helps us understand its regulatory mechanisms and epigenetic reprogramming, so recognizing methylation sites in a DNA sequence has become very important. Since high-throughput sequencing technology became widely used in research and industry, many computational methods, especially machine learning methods, have been developed over the past several decades. To accurately identify whether a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria are used to evaluate the prediction results of our method: area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity. On the benchmark dataset, our method reached 0.8632 for AUC, 0.8017 for ACC, 0.5558 for MCC, and 0.7268 for SN. Additionally, the best AUC values on two scBS-seq profiled mouse embryonic stem cells datasets were 0.8896 and 0.9511, respectively. Compared with other outstanding methods, our method achieved higher prediction accuracy, improving AUC by at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service and can be downloaded from https://figshare.com/s/0697b692d802861282d3.
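
The five evaluation criteria named in the abstract can be illustrated with a short sketch. The Python below is not the authors' released code. It uses the standard textbook definitions of ACC, SN, specificity, and MCC from confusion-matrix counts, and computes AUC as the fraction of (methylated, unmethylated) pairs ranked correctly. The function names `binary_metrics` and `auc`, and the counts and scores in the usage lines, are assumptions made up for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, SN, SP and MCC from confusion-matrix counts.

    tp/fn count methylated sites predicted correctly/incorrectly;
    tn/fp count unmethylated sites predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as half a win.
    This O(n_pos * n_neg) form is for clarity, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not taken from the paper.
example = binary_metrics(tp=50, fp=10, tn=30, fn=10)
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

MCC is reported alongside ACC because it stays informative when methylated and unmethylated sites are imbalanced, a situation in which accuracy alone can look deceptively high.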
Keywords: DNA methylation; scBS-seq profiled mouse embryonic stem cells; k-gram; multivariate mutual information; discrete wavelet transform; PseAAC; sparse Bayesian learning; support vector machine; feature selection
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
MDPI and ACS Style

Pan, G.; Jiang, L.; Tang, J.; Guo, F. A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties. Int. J. Mol. Sci. 2018, 19, 511.

Int. J. Mol. Sci. EISSN 1422-0067. Published by MDPI AG, Basel, Switzerland.