massiveGST: A Mann–Whitney–Wilcoxon Gene-Set Test Tool That Gives Meaning to Gene-Set Enrichment Analysis
Abstract
1. Introduction
2. Materials and Methods
2.1. The Normalized Enrichment Score
2.2. Enrichments Prioritization
2.3. Enrichments Visualization
2.4. Web-Based Service
2.5. R Package
3. Results
3.1. Computational Time: Comparison with Literature Methods
3.2. Usage of the Online Web-Tool
4. Conclusions, Limitations, and Future Research
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Software Availability
Abbreviations
GSEA | Gene-Set Enrichment Analysis |
EA | Enrichment Analysis |
ES | Enrichment Score |
NES | Normalized Enrichment Score |
GST | Gene-Set Test |
mGST | massive Gene-Set Test |
MWW | Mann–Whitney–Wilcoxon |
RST | Rank-Sum Test |
KS | Kolmogorov–Smirnov |
wKS | weighted-KS |
MI | Mutual Information |
TCGA | The Cancer Genome Atlas |
k-NN | k-Nearest Neighbor |
Appendix A
Study | DOI | Control | Treatment | Length | GSEA (R) | fGSEA (R) | CP/wKS (R) | mGST (R) | cPR (R) | mGST (online) | GT3/MWW (online) | GT3/wKS (online) | WG/wKS (online) | File Name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BLCA | 10.1016/j.cell.2017.09.007 | Basal-squamous (142) | Luminal (246) | 19,664 | 319.59 | 2.09 | 1.79 | 0.21 | 0.01 | 0.93 | 19.97 | 16.78 | 90.26 | BLCA_Wald_pv.rnk |
BRCA | 10.1016/j.ccell.2018.03.014 | Basal (190) | Her2 (82) | 19,579 | 314.71 | 1.85 | 1.79 | 0.22 | 0.01 | 0.91 | 12.92 | 13.45 | 78.38 | BRCA_BH2_Wald_pv.rnk |
BRCA | 10.1016/j.ccell.2018.03.014 | Basal (190) | LumA (562) | 19,657 | 317.99 | 1.92 | 2.04 | 0.23 | 0.01 | 0.91 | 13.15 | 13.07 | 80.55 | BRCA_BLA_Wald_pv.rnk |
BRCA | 10.1016/j.ccell.2018.03.014 | Basal (190) | LumB (209) | 19,626 | 321.35 | 2.04 | 1.99 | 0.21 | 0.01 | 0.89 | 12.89 | 13.88 | 77.99 | BRCA_BLB_Wald_pv.rnk |
BRCA | 10.1016/j.ccell.2018.03.014 | Her2 (82) | LumA (562) | 19,650 | 383.10 | 2.08 | 1.87 | 0.23 | 0.01 | 0.91 | 13.17 | 14.21 | 86.20 | BRCA_H2LA_Wald_pv.rnk |
BRCA | 10.1016/j.ccell.2018.03.014 | Her2 (82) | LumB (209) | 19,592 | 380.24 | 2.01 | 1.85 | 0.21 | 0.01 | 0.91 | 13.95 | 15.17 | 75.52 | BRCA_H2LB_Wald_pv.rnk |
BRCA | 10.1016/j.ccell.2018.03.014 | LumA (562) | LumB (209) | 19,652 | 380.88 | 2.59 | 1.93 | 0.36 | 0.01 | 0.92 | 15.83 | 16.92 | 95.24 | BRCA_LALB_Wald_pv.rnk |
KIRC | 10.1038/nature12222 | 1 (147) | 2 (90) | 19,639 | 393.28 | 6.07 | 3.56 | 0.21 | 0.01 | 0.91 | 18.02 | 21.32 | 77.28 | KIRC_1_2_Wald_pv.rnk |
KIRC | 10.1038/nature12222 | 1 (147) | 3 (94) | 19,609 | 376.26 | 2.88 | 1.90 | 0.37 | 0.01 | 0.91 | 18.09 | 17.42 | 76.29 | KIRC_1_3_Wald_pv.rnk |
KIRC | 10.1038/nature12222 | 1 (147) | 4 (86) | 19,613 | 407.56 | 3.55 | 2.83 | 0.38 | 0.01 | 0.91 | 16.54 | 18.38 | 98.32 | KIRC_1_4_Wald_pv.rnk |
KIRC | 10.1038/nature12222 | 2 (90) | 3 (94) | 19,633 | 401.31 | 3.50 | 2.57 | 0.38 | 0.01 | 0.91 | 17.44 | 20.10 | 81.18 | KIRC_2_3_Wald_pv.rnk |
KIRC | 10.1038/nature12222 | 2 (90) | 4 (86) | 19,638 | 382.03 | 1.86 | 1.38 | 0.21 | 0.01 | 0.92 | 17.76 | 20.13 | 81.32 | KIRC_2_4_Wald_pv.rnk |
KIRC | 10.1038/nature12222 | 3 (94) | 4 (86) | 19,609 | 389.74 | 3.28 | 2.21 | 0.40 | 0.01 | 0.91 | 18.32 | 19.36 | 80.15 | KIRC_3_4_Wald_pv.rnk |
LGG | 10.1016/j.cell.2015.12.028 | IDHwt (97) | IDHmut (419) | 19,661 | 387.23 | 2.92 | 2.31 | 0.23 | 0.01 | 0.91 | 15.16 | 15.19 | 79.88 | LGG_IDH_Wald_pv.rnk |
LUAD | 10.1038/nature13385 | inflammatory (141) | proliferative (89) | 19,542 | 383.93 | 2.45 | 2.08 | 0.43 | 0.01 | 0.90 | 9.63 | 10.79 | 95.65 | LUAD_InflamProl_Wald_pv.rnk |
LUAD | 10.1038/nature13385 | proximal (78) | TRU (63) | 19,469 | 376.99 | 2.05 | 1.78 | 0.22 | 0.01 | 0.90 | 9.76 | 10.10 | 82.60 | LUAD_ProxTRU_Wald_pv.rnk |
LUSC | 10.1038/nature11404 | basal (43) | classical (65) | 19,560 | 383.54 | 4.31 | 2.67 | 0.21 | 0.01 | 0.91 | 9.54 | 10.65 | 92.81 | LUSC_BC_Wald_pv.rnk |
LUSC | 10.1038/nature11404 | basal (43) | primitive (27) | 19,554 | 378.42 | 2.85 | 2.05 | 0.22 | 0.01 | 0.90 | 9.90 | 12.12 | 82.13 | LUSC_BP_Wald_pv.rnk |
LUSC | 10.1038/nature11404 | basal (43) | secretory (44) | 19,554 | 389.57 | 3.06 | 2.56 | 0.24 | 0.01 | 0.90 | 10.36 | 10.93 | 88.52 | LUSC_BS_Wald_pv.rnk |
LUSC | 10.1038/nature11404 | classical (65) | secretory (44) | 19,481 | 421.33 | 5.30 | 3.07 | 0.21 | 0.01 | 0.90 | 11.99 | 13.16 | 80.77 | LUSC_CS_Wald_pv.rnk |
LUSC | 10.1038/nature11404 | primitive (27) | secretory (44) | 19,481 | 411.08 | 5.09 | 3.02 | 0.21 | 0.12 | 0.89 | 11.32 | 12.44 | 97.50 | LUSC_PS_Wald_pv.rnk |
PAAD | 10.1016/j.ccell.2017.07.007 | classical (54) | exocrine (62) | 19,395 | 381.77 | 1.75 | 1.46 | 0.20 | 0.01 | 0.89 | 13.04 | 12.93 | 76.34 | PAAD_classical_exocrine_Wald_pv.rnk |
PAAD | 10.1016/j.ccell.2017.07.007 | classical (54) | QM (34) | 19,334 | 393.19 | 1.97 | 1.56 | 0.50 | 0.01 | 0.88 | 9.18 | 9.48 | 78.63 | PAAD_classical_QM_Wald_pv.rnk |
PAAD | 10.1016/j.ccell.2017.07.007 | exocrine (62) | QM (34) | 19,366 | 412.21 | 1.49 | 1.25 | 0.21 | 0.01 | 0.88 | 10.27 | 11.63 | 80.78 | PAAD_exocrine_QM_Wald_pv.rnk |
STAD | 10.1038/nature13480 | C1 (49) | C2 (59) | 19,648 | 442.49 | 1.73 | 1.66 | 0.22 | 0.01 | 0.91 | 12.84 | 11.81 | 85.34 | STAD_C1C2_Wald_pv.rnk |
STAD | 10.1038/nature13480 | C1 (49) | C3 (98) | 19,679 | 442.85 | 1.83 | 1.59 | 0.22 | 0.12 | 0.91 | 11.21 | 12.46 | 91.14 | STAD_C1C3_Wald_pv.rnk |
STAD | 10.1038/nature13480 | C1 (49) | C4 (48) | 19,651 | 351.94 | 1.56 | 1.41 | 0.22 | 0.00 | 0.91 | 12.46 | 13.18 | 90.15 | STAD_C1C4_Wald_pv.rnk |
STAD | 10.1038/nature13480 | C2 (59) | C3 (98) | 19,681 | 357.27 | 2.38 | 1.62 | 0.56 | 0.01 | 0.91 | 14.10 | 13.64 | 78.59 | STAD_C2C3_Wald_pv.rnk |
STAD | 10.1038/nature13480 | C2 (59) | C4 (48) | 19,664 | 339.91 | 2.52 | 1.83 | 0.21 | 0.01 | 0.91 | 13.01 | 14.27 | 86.03 | STAD_C2C4_Wald_pv.rnk |
STAD | 10.1038/nature13480 | C3 (98) | C4 (48) | 19,681 | 344.40 | 2.17 | 2.05 | 0.22 | 0.12 | 0.91 | 15.15 | 14.13 | 80.49 | STAD_C3C4_Wald_pv.rnk |
average | | | | | 378.87 | 2.70 | 2.06 | 0.27 | 0.02 | 0.91 | 13.57 | 14.30 | 84.20 | |
standard deviation | | | | | 33.05 | 1.14 | 0.54 | 0.10 | 0.03 | 0.01 | 3.01 | 3.13 | 6.69 | |
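The average row can be turned into relative timings with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the first mGST average (0.27) belongs with the R tools and the second (0.91) with the online tools, as the two-level table header suggests, and the helper name `ratios_to_reference` is hypothetical. The table does not state the time unit, so only unitless ratios are computed.

```python
# Average timings copied from the "average" row of the table above.
# Grouping into R tools and online tools is an assumption based on the header.
r_tools = {"GSEA": 378.87, "fGSEA": 2.70, "CP/wKS": 2.06, "mGST": 0.27, "cPR": 0.02}
online_tools = {"mGST": 0.91, "GT3/MWW": 13.57, "GT3/wKS": 14.30, "WG/wKS": 84.20}


def ratios_to_reference(times, reference):
    """Divide every average time by the average time of the reference tool."""
    ref = times[reference]
    return {name: t / ref for name, t in times.items()}


r_ratios = ratios_to_reference(r_tools, "mGST")
online_ratios = ratios_to_reference(online_tools, "mGST")

for group, ratios in (("R", r_ratios), ("online", online_ratios)):
    for name, ratio in ratios.items():
        print(f"{group:6s} {name:8s} {ratio:8.2f}x the mGST average")
```

A ratio above 1 means the tool took longer than mGST on average within its group; a ratio below 1 (cPR here) means it was faster.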
References
- Mootha, V.; Lindgren, C.; Eriksson, K.; Subramanian, A.; Sihag, S.; Lehar, J.; Puigserver, P.; Carlsson, E.; Ridderstråle, M.; Laurila, E.; et al. PGC1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003, 34, 267–273.
- Wu, D.; Smyth, G. Camera: A competitive gene set test accounting for inter-gene correlation. Nucleic Acids Res. 2012, 40, e133.
- Tian, L.; Greenberg, S.; Kong, S.; Altschuler, J.; Kohane, I.; Park, P. Discovering statistically significant pathways in expression profiling studies. Proc. Natl. Acad. Sci. USA 2005, 102, 13544–13549.
- Das, S.; McClain, C.J.; Rai, S.N. Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges. Entropy 2020, 22, 427.
- Subramanian, A.; Tamayo, P.; Mootha, V.; Mukherjee, S.; Ebert, B.; Gillette, M.; Paulovich, A.; Pomeroy, S.; Golub, T.; Lander, E.; et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 2005, 102, 15545–15550.
- Barbie, D.A.; Tamayo, P.; Boehm, J.S.; Kim, S.Y.; Moody, S.E.; Dunn, I.F.; Schinzel, A.C.; Sandy, P.; Meylan, E.; Scholl, C.; et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 2009, 462, 108–112.
- Mann, H.; Whitney, D. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Statist. 1947, 18, 50–60.
- Wilcoxon, F. Individual Comparisons by Ranking Methods. Biom. Bull. 1945, 1, 80–83.
- Korotkevich, G.; Sukhov, V.; Budin, N.; Shpak, B.; Artyomov, M.N.; Sergushichev, A. Fast gene set enrichment analysis. bioRxiv 2021.
- Yu, G.; Wang, L.G.; Han, Y.; He, Q.Y. clusterProfiler: An R Package for Comparing Biological Themes Among Gene Clusters. OMICS 2012, 16, 284–287.
- Pagnotta, S.M. massiveGST: Competitive Gene Sets Test with the Mann–Whitney–Wilcoxon Test. R Package Version 1.0.0. 2022. Available online: https://CRAN.R-project.org/package=massiveGST (accessed on 11 April 2022).
- Cerulo, L.; Pagnotta, S.M. Massive Gene-Sets Test. 2019. Available online: http://www.massiveGeneSetsTest.org (accessed on 11 April 2022).
- Gerstner, N.; Kehl, T.; Lenhof, K.; Müller, A.; Mayer, C.; Eckhart, L.; Grammes, N.L.; Diener, C.; Hart, M.; Hahn, O.; et al. GeneTrail 3: Advanced high-throughput enrichment analysis. Nucleic Acids Res. 2020, 48, W515–W520.
- Liao, Y.; Wang, J.; Jaehnig, E.J.; Shi, Z.; Zhang, B. WebGestalt 2019: Gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 2019, 47, W199–W205.
- Stöckel, D.; Kehl, T.; Trampert, P.; Schneider, L.; Backes, C.; Ludwig, N.; Gerasch, A.; Kaufmann, M.; Gessler, M.; Graf, N.; et al. Multi-omics enrichment analysis using the GeneTrail2 web service. Bioinformatics 2016, 32, 1502–1508.
- Frattini, V.; Pagnotta, S.; Fan, J.; Russo, M.; Lee, S.; Garofano, L.; Zhang, J.; Shi, P.; Lewis, G.; Sanson, H.; et al. A metabolic function of FGFR3-TACC3 gene fusions in cancer. Nature 2018, 553, 222.
- Bamber, D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J. Math. Psychol. 1975, 12, 387–415.
- Schneider, K.; Venn, B.; Mühlhaus, T. TMEA: A Thermodynamically Motivated Framework for Functional Characterization of Biological Responses to System Acclimation. Entropy 2020, 22, 1030.
- Liberzon, A.; Subramanian, A.; Pinchback, R.; Thorvaldsdóttir, H.; Tamayo, P.; Mesirov, J.P. Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011, 27, 1739–1740.
- Sales, G.; Romualdi, C. parmigene: A parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 2011, 27, 1876–1877.
- Colaprico, A.; Silva, T.C.; Olsen, C.; Garofano, L.; Cava, C.; Garolini, D.; Sabedot, T.S.; Malta, T.M.; Pagnotta, S.M.; Castiglioni, I.; et al. TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2015, 44, e71.
- Love, M.I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014, 15, 550.
- Geistlinger, L.; Csaba, G.; Santarelli, M.; Ramos, M.; Schiffer, L.; Turaga, N.; Law, C.; Davis, S.; Carey, V.; Morgan, M.; et al. Toward a gold standard for benchmarking gene set enrichment analysis. Brief. Bioinform. 2020, 22, 545–556.
- Garofano, L.; Migliozzi, S.; Oh, Y.T.; D’Angelo, F.; Najac, R.D.; Ko, A.; Frangaj, B.; Caruso, F.P.; Yu, K.; Yuan, J.; et al. Pathway-based classification of glioblastoma uncovers a mitochondrial subtype with therapeutic vulnerabilities. Nat. Cancer 2021, 2, 141–156.
- Huang, D.W.; Sherman, B.T.; Lempicki, R.A. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2008, 37, 1–13.
- Bender, E. Challenges: Crowdsourced solutions. Nature 2016, 533, S62–S64.
- Lim, W.K.; Lyashenko, E.; Califano, A. Master Regulators Used As Breast Cancer Metastasis Classifier. In Proceedings of the Pacific Symposium on Biocomputing, Kohala Coast, HI, USA, 5–9 January 2009; pp. 504–515.
- Chanda, P.; Costa, E.; Hu, J.; Sukumar, S.; Van Hemert, J.; Walia, R. Information Theory in Computational Biology: Where We Stand Today. Entropy 2020, 22, 627.
- Sarkar, S.; Hubbard, J.B.; Halter, M.; Plant, A.L. Information Thermodynamics and Reducibility of Large Gene Networks. Entropy 2021, 23, 63.
EA Tool | Reference | Year | Test | Available as |
---|---|---|---|---|
camera | [2] | 2012 | MWW | R function in limma package |
GSEA | [5] | 2005 | wKS | R package |
fGSEA | [9] | 2021 | wKS | R package |
clusterProfiler | [10] | 2012 | wKS | R package |
massiveGST | [11,12] | 2022 | MWW | R package/web |
GeneTrail3 | [13] | 2020 | wKS/MWW | web |
WebGestalt | [14] | 2019 | wKS | web |
NES | Odds | logit2NES |
---|---|---|
0.20 | 0.25 | −2.00 |
0.30 | 0.43 | −1.22 |
0.40 | 0.67 | −0.58 |
0.50 | 1.00 | 0.00 |
0.60 | 1.50 | 0.58 |
0.65 | 1.86 | 0.90 |
0.75 | 3.00 | 1.58 |
0.90 | 9.00 | 3.17 |
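The rows above are consistent with reading the NES as a probability in (0, 1), converting it to odds as NES / (1 − NES), and taking the base-2 logarithm of the odds. The following minimal sketch reproduces the table under that reading; the function names `nes_to_odds` and `nes_to_logit2` are hypothetical and are not taken from the massiveGST package.

```python
import math


def nes_to_odds(nes: float) -> float:
    """Odds corresponding to an NES read as a probability in (0, 1)."""
    if not 0.0 < nes < 1.0:
        raise ValueError("NES must lie strictly between 0 and 1")
    return nes / (1.0 - nes)


def nes_to_logit2(nes: float) -> float:
    """Base-2 logarithm of the odds (the logit2NES column)."""
    return math.log2(nes_to_odds(nes))


# Recompute the three columns for the NES values listed in the table.
print("NES | Odds | logit2NES")
for nes in (0.20, 0.30, 0.40, 0.50, 0.60, 0.65, 0.75, 0.90):
    print(f"{nes:.2f} | {nes_to_odds(nes):.2f} | {nes_to_logit2(nes):.2f}")
```

On this scale an NES of 0.50 maps to 0, and values symmetric around 0.50 (for example 0.40 and 0.60) map to logit2NES values of equal magnitude and opposite sign. For NES = 0.65 the unrounded computation gives about 0.893, whereas taking the logarithm of the rounded odds 1.86 gives about 0.895; the latter would round to the tabulated 0.90.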
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cerulo, L.; Pagnotta, S.M. massiveGST: A Mann–Whitney–Wilcoxon Gene-Set Test Tool That Gives Meaning to Gene-Set Enrichment Analysis. Entropy 2022, 24, 739. https://doi.org/10.3390/e24050739