Open Access Review

Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges

by Samarendra Das 1,2,3, Craig J. McClain 4,5,6,7,8 and Shesh N. Rai 2,3,5,6,9,*
1 Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
2 School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
3 Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
4 Department of Medicine, University of Louisville, Louisville, KY 40202, USA
5 Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA
6 Alcohol Research Center, University of Louisville, Louisville, KY 40202, USA
7 Department of Pharmacology and Toxicology, University of Louisville, Louisville, KY 40202, USA
8 Robley Rex Louisville VAMC, Louisville, KY 40206, USA
9 Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
* Author to whom correspondence should be addressed.
Entropy 2020, 22(4), 427; https://doi.org/10.3390/e22040427
Received: 24 February 2020 / Revised: 18 March 2020 / Accepted: 3 April 2020 / Published: 10 April 2020
(This article belongs to the Special Issue Statistical Inference from High Dimensional Data)
Over the last decade, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome-wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview of the statistical structure and steps of gene set analysis approaches used for microarray, RNA-sequencing and genome-wide association data analysis. We also classify gene set analysis approaches and tools by type of genomic study, null hypothesis, sampling model and nature of the test statistic, among other criteria. Rather than reviewing the approaches individually, we trace their generation-wise evolution for microarrays, RNA-sequencing and genome-wide association studies and discuss their relative merits and limitations. We identify the key biological and statistical challenges in current gene set analysis, which statisticians and biologists will need to address collectively in order to develop the next generation of gene set analysis approaches. This study can also serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors.
Keywords: gene set analysis; microarrays; RNA-sequencing; genome wide association study; competitive; self-contained; sampling model; null hypothesis
MDPI and ACS Style

Das, S.; McClain, C.J.; Rai, S.N. Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges. Entropy 2020, 22, 427.
