Article

A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data

1
Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
2
Department of Computer Science & Engineering, Aliah University, Newtown WB-700160, India
3
Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
4
MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
*
Author to whom correspondence should be addressed.
Genes 2020, 11(8), 931; https://doi.org/10.3390/genes11080931
Received: 21 July 2020 / Revised: 3 August 2020 / Accepted: 6 August 2020 / Published: 12 August 2020
DNA methylation changes have been useful for cancer biomarker discovery, classification, and potential treatment development. Existing methods use either differentially methylated CpG sites or combinations of CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address them, in this study we introduce a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method to interpret DNA methylation accurately by integrating DNA methylation data with the corresponding TCGA gene expression data. We demonstrate it on uterine cervical cancer. First, we pre-filtered outliers from the dataset and then predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with the empirical Bayes test in Limma. We then applied a deep learning method, “nnet”, to classify the cervical cancer labels from those DEGs and computed classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760; 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with a false discovery rate (FDR) < 0.001. After deep learning analysis, we obtained an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancer labels. This performance was better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using Cytoscape and report the five top in-degree genes (PAIP2, GRWD1, VPS4B, CRADD and LLPH) and the five top out-degree genes (MRPL35, FAM177A1, STAT4, ASPSCR1 and FABP7). We then performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using WebGestalt (WEB-based Gene SeT AnaLysis Toolkit). In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach to better interpret DNA methylation and gene expression data in disease studies.
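The regression and cross-validation steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the regression model, so a per-gene ordinary least-squares fit of expression on methylation is assumed here, and the function names (`fit_line`, `predict_expression`, `kfold_indices`) are hypothetical. The Limma and “nnet” steps are not reproduced.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b).

    Assumption: one methylation value and one expression value per sample.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        # Constant predictor: fall back to the mean of y with zero slope.
        return float(my), 0.0
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def predict_expression(methylation, expression):
    """Fitted expression values for one gene, predicted from its methylation."""
    a, b = fit_line(methylation, expression)
    return [a + b * m for m in methylation]


def kfold_indices(n_samples, k=10):
    """Split sample indices into k contiguous folds whose sizes differ by at most one.

    Samples should be shuffled beforehand so tumor and normal labels mix across folds.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy values for a single hypothetical gene (not data from the study).
predicted = predict_expression([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])

# 63 tumor + 152 normal samples = 215 samples split into 10 folds.
folds = kfold_indices(63 + 152, k=10)
```

In a full pipeline, each fold would serve once as the held-out test set while a classifier is trained on the remaining nine, and the reported accuracy would be the average over the ten folds.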
Keywords: uterine cervical cancer; DNA methylation; linear regression; deep learning; differentially expressed genes
MDPI and ACS Style

Mallik, S.; Seth, S.; Bhadra, T.; Zhao, Z. A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data. Genes 2020, 11, 931. https://doi.org/10.3390/genes11080931
