Open Access Article

4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N4-Methylcytosine Sites in the Mouse Genome

1 Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
2 School of Software, Shandong University, Jinan 250101, China
* Authors to whom correspondence should be addressed.
Cells 2019, 8(11), 1332; https://doi.org/10.3390/cells8111332
Received: 21 August 2019 / Revised: 21 October 2019 / Accepted: 24 October 2019 / Published: 28 October 2019
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
DNA N4-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand the biological functions of 4mC, it is crucial to gain knowledge of its genomic distribution. Recently, a few computational studies, in particular machine learning (ML) approaches, have been applied to the prediction of 4mC sites. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mCs in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome; it combines four different ML algorithms with seven feature encodings. The probabilistic values predicted from those feature encodings are then used as a feature vector and input once more to the ML algorithms, whose corresponding models are integrated into an ensemble. Our benchmarking results demonstrated that 4mCpred-EL achieved an accuracy of 0.795 and a Matthews correlation coefficient (MCC) of 0.591, significantly outperforming seven other classifiers by 1.5–5.9% and 3.2–11.7%, respectively. Additionally, 4mCpred-EL attained an overall accuracy of 79.80% in the independent evaluation, which is 1.8–5.1% higher than that yielded by the seven other classifiers. We provide a user-friendly web server, also named 4mCpred-EL, which could be used as a pre-screening tool for the identification of potential 4mC sites in the mouse genome.
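For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and MCC are commonly computed from a binary confusion matrix. It is not the authors' evaluation code: the function names and the toy labels are hypothetical and chosen only for illustration, and the labels are not data from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = 4mC site, 0 = non-4mC site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy labels for illustration only (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when one class is predicted much better than the other; this is one reason the two values are usually reported together.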
Keywords: machine learning; DNA methylation; mouse genome; N4-methylcytosine identification

MDPI and ACS Style

Manavalan, B.; Basith, S.; Shin, T.H.; Lee, D.Y.; Wei, L.; Lee, G. 4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N4-Methylcytosine Sites in the Mouse Genome. Cells 2019, 8, 1332.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
