Recommendations of scRNA-seq Differential Gene Expression Analysis Based on Comprehensive Benchmarking
Abstract
1. Introduction
2. Materials and Methods
2.1. Analysis Datasets
2.2. Simulation Methods
2.3. Diagnostic Plot Methods
2.4. DE Benchmarking Methods
2.5. Simulation Performance Methods
2.6. Real Data DE Application Methods
3. Results
3.1. Diagnostic Plots
3.2. Type-I Error Rate Control
3.3. FC Bias
3.4. FC Correlations
3.5. FDR Control
3.6. Power
3.7. AUROC and PRAUC
3.8. Computation Time
3.9. Heatmap
3.10. Real Data Application Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Code Availability Statement
References
- Svensson, V.; da Veiga Beltrame, E.; Pachter, L. A curated database reveals trends in single-cell transcriptomics. Database 2020, 2020, 1–7.
- Cao, J.; O’Day, D.R.; Pliner, H.A.; Kingsley, P.D.; Deng, M.; Daza, R.M.; Zager, M.A.; Aldinger, K.A.; Blecher-Gonen, R.; Zhang, F.; et al. A human cell atlas of fetal gene expression. Science 2020, 370, 7721.
- Jindal, A.; Gupta, P.; Jayadeva; Sengupta, D. Discovery of rare cells from voluminous single cell expression data. Nat. Commun. 2018, 9, 4719.
- Nguyen, A.; Khoo, W.H.; Moran, I.; Croucher, P.I.; Phan, T.G. Single Cell RNA Sequencing of Rare Immune Cell Populations. Front. Immunol. 2018, 9, 1553.
- Schirmer, L.; Velmeshev, D.; Holmqvist, S.; Kaufmann, M.; Werneburg, S.; Jung, D.; Vistnes, S.; Stockley, J.H.; Young, A.; Steindel, M.; et al. Neuronal vulnerability and multilineage diversity in multiple sclerosis. Nature 2019, 573, 75–82.
- Reyfman, P.A.; Walter, J.M.; Joshi, N.; Anekalla, K.R.; McQuattie-Pimentel, A.C.; Chiu, S.; Fernandez, R.; Akbarpour, M.; Chen, C.I.; Ren, Z.; et al. Single-Cell Transcriptomic Analysis of Human Lung Provides Insights into the Pathobiology of Pulmonary Fibrosis. Am. J. Respir. Crit. Care Med. 2019, 199, 1517–1536.
- Soneson, C.; Robinson, M.D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 2018, 15, 255–261.
- Benidt, S.; Nettleton, D. SimSeq: A nonparametric approach to simulation of RNA-sequence datasets. Bioinformatics 2015, 31, 2131–2140.
- Assefa, A.T.; Vandesompele, J.; Thas, O. SPsimSeq: Semi-parametric simulation of bulk and single-cell RNA-sequencing data. Bioinformatics 2020, 36, 3276–3278.
- Crowell, H.L.; Soneson, C.; Germain, P.L.; Calini, D.; Collin, L.; Raposo, C.; Malhotra, D.; Robinson, M.D. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat. Commun. 2020, 11, 6077.
- Zappia, L.; Phipson, B.; Oshlack, A. Splatter: Simulation of single-cell RNA sequencing data. Genome Biol. 2017, 18, 174.
- Li, W.V.; Li, J.J. A statistical simulator scDesign for rational scRNA-seq experimental design. Bioinformatics 2019, 35, i41–i50.
- Zhang, M.; Liu, S.; Miao, Z.; Han, F.; Gottardo, R.; Sun, W. IDEAS: Individual level differential expression analysis for single-cell RNA-seq data. Genome Biol. 2022, 23, 33.
- Squair, J.W.; Gautier, M.; Kathe, C.; Anderson, M.A.; James, N.D.; Hutson, T.H.; Hudelle, R.; Qaiser, T.; Matson, K.J.E.; Barraud, Q.; et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 2021, 12, 5692.
- Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26, 139–140.
- Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015, 43, e47.
- Love, M.I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014, 15, 550.
- Brooks, M.E.; Kristensen, K.; Van Benthem, K.J.; Magnusson, A.; Berg, C.W.; Nielsen, A.; Skaug, H.J.; Machler, M.; Bolker, B.M. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. R J. 2017, 9, 378–400.
- He, L.; Davila-Velderrain, J.; Sumida, T.S.; Hafler, D.A.; Kellis, M.; Kulminski, A.M. NEBULA is a fast negative binomial mixed model for differential or co-expression analysis of large-scale multi-subject single-cell data. Commun. Biol. 2021, 4, 629.
- Finak, G.; McDavid, A.; Yajima, M.; Deng, J.; Gersuk, V.; Shalek, A.K.; Slichter, C.K.; Miller, H.W.; McElrath, M.J.; Prlic, M.; et al. MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015, 16, 278.
- Miao, Z.; Zhang, X. Differential expression analyses for single-cell RNA-Seq: Old questions on new data. Quant. Biol. 2016, 4, 243–260.
- Jaakkola, M.K.; Seyednasrollah, F.; Mehmood, A.; Elo, L.L. Comparison of methods to detect differentially expressed genes between single-cell populations. Brief. Bioinform. 2017, 18, 735–743.
- Dal Molin, A.; Baruzzo, G.; Di Camillo, B. Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods. Front. Genet. 2017, 8, 62.
- Reich, D.S.; Lucchinetti, C.F.; Calabresi, P.A. Multiple Sclerosis. N. Engl. J. Med. 2018, 378, 169–180.
- Lassmann, H. Multiple Sclerosis Pathology. Cold Spring Harb. Perspect. Med. 2018, 8, a028936.
- Trapp, B.D.; Peterson, J.; Ransohoff, R.M.; Rudick, R.; Mork, S.; Bo, L. Axonal transection in the lesions of multiple sclerosis. N. Engl. J. Med. 1998, 338, 278–285.
- Schirmer, L.; Antel, J.P.; Bruck, W.; Stadelmann, C. Axonal loss and neurofilament phosphorylation changes accompany lesion development and clinical progression in multiple sclerosis. Brain Pathol. 2011, 21, 428–440.
- Lederer, D.J.; Martinez, F.J. Idiopathic Pulmonary Fibrosis. N. Engl. J. Med. 2018, 379, 797–798.
- Wynn, T.A. Fibrotic disease and the T(H)1/T(H)2 paradigm. Nat. Rev. Immunol. 2004, 4, 583–594.
- Korsunsky, I.; Millard, N.; Fan, J.; Slowikowski, K.; Zhang, F.; Wei, K.; Baglaenko, Y.; Brenner, M.; Loh, P.R.; Raychaudhuri, S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 2019, 16, 1289–1296.
- Sing, T.; Sander, O.; Beerenwinkel, N.; Lengauer, T. ROCR: Visualizing classifier performance in R. Bioinformatics 2005, 21, 3940–3941.
- Grau, J.; Grosse, I.; Keilwagen, J. PRROC: Computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 2015, 31, 2595–2597.
- Subramanian, A.; Tamayo, P.; Mootha, V.K.; Mukherjee, S.; Ebert, B.L.; Gillette, M.A.; Paulovich, A.; Pomeroy, S.L.; Golub, T.R.; Lander, E.S.; et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 2005, 102, 15545–15550.
- Liberzon, A.; Subramanian, A.; Pinchback, R.; Thorvaldsdottir, H.; Tamayo, P.; Mesirov, J.P. Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011, 27, 1739–1740.
- Korotkevich, G.; Sukhov, V.; Budin, N.; Shpak, B.; Artyomov, M.N.; Sergushichev, A. Fast gene set enrichment analysis. bioRxiv 2021.
- Beutel, T.; Dzimiera, J.; Kapell, H.; Engelhardt, M.; Gass, A.; Schirmer, L. Cortical projection neurons as a therapeutic target in multiple sclerosis. Expert Opin. Ther. Targets 2020, 24, 1211–1224.
- Lauranzano, E.; Pozzi, S.; Pasetto, L.; Stucchi, R.; Massignan, T.; Paolella, K.; Mombrini, M.; Nardo, G.; Lunetta, C.; Corbo, M.; et al. Peptidylprolyl isomerase A governs TARDBP function and assembly in heterogeneous nuclear ribonucleoprotein complexes. Brain 2015, 138, 974–991.
- Gilgun-Sherki, Y.; Melamed, E.; Offen, D. The role of oxidative stress in the pathogenesis of multiple sclerosis: The need for effective antioxidant therapy. J. Neurol. 2004, 251, 261–268.
- Gonsette, R.E. Neurodegeneration in multiple sclerosis: The role of oxidative stress and excitotoxicity. J. Neurol. Sci. 2008, 274, 48–53.
- Ascherio, A.; Munger, K.L. Environmental risk factors for multiple sclerosis. Part I: The role of infection. Ann. Neurol. 2007, 61, 288–299.
- Homer, R.J.; Elias, J.A.; Lee, C.G.; Herzog, E. Modern concepts on the role of inflammation in pulmonary fibrosis. Arch. Pathol. Lab. Med. 2011, 135, 780–788.
- Kuwano, K. Involvement of epithelial cell apoptosis in interstitial lung diseases. Intern. Med. 2008, 47, 345–353.
- Noble, P.W.; Homer, R.J. Idiopathic pulmonary fibrosis: New insights into pathogenesis. Clin. Chest Med. 2004, 25, 749–758.
- Bouland, G.A.; Mahfouz, A.; Reinders, M.J.T. Differential analysis of binarized single-cell RNA sequencing data captures biological variation. NAR Genom. Bioinform. 2021, 3, lqab118.
- Murphy, A.E.; Skene, N.G. A balanced measure shows superior performance of pseudobulk methods over mixed models and pseudoreplication approaches in single-cell RNA-sequencing analysis. bioRxiv 2022.
- Zimmerman, K.D.; Espeland, M.A.; Langefeld, C.D. A practical solution to pseudoreplication bias in single-cell studies. Nat. Commun. 2021, 12, 738.
Metric | Good | Intermediate | Poor |
---|---|---|---|
Power.median | k-means class containing the maximum median power | Otherwise | k-means class containing the minimum median power |
FDP.median | No more than 75% of FDPs (false discovery proportions) on one side (above or below) of 0.05, and 0.0167 < median FDP < 0.15 | Otherwise | Median FDP ≥ 0.25, median FDP ≤ 0.01, or at least one FDP missing |
missFDP | 0 | >0 and <0.5 | ≥0.5 |
AUROC.median | ≥0.9 | ≥0.7 and <0.9 | <0.7 |
PRAUC.median | ≥0.8 | ≥0.4 and <0.8 | <0.4 |
FPR.median | | | |
Time.median | ≤10 | >10 and ≤500 | >500 |
Abs(FC bias.median) ( is 1.2 for Schirmer et al. [5] or 1.4 for Reyfman et al. [6]) | | | |
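The threshold rules in the criteria table above are mechanical enough to transcribe directly. The following is a minimal Python sketch of that transcription, not the authors' code; the function names are hypothetical. It assumes that missFDP is the fraction of replicates with a missing FDP (the table does not define it), that an FDP exactly equal to 0.05 counts as neither above nor below, and that an empty FDP list is rated Poor. Time.median is compared without units because the table does not state them. Power.median (a k-means grouping across methods), FPR.median (no criteria given) and the FC bias row are left out.

```python
from statistics import median
from typing import Optional, Sequence

# Hypothetical helpers transcribing the Good/Intermediate/Poor criteria table.
# Power.median, FPR.median and Abs(FC bias.median) are intentionally not covered.
GOOD, INTERMEDIATE, POOR = "Good", "Intermediate", "Poor"


def rate_auroc(value: float) -> str:
    """AUROC.median: >=0.9 Good, [0.7, 0.9) Intermediate, <0.7 Poor."""
    if value >= 0.9:
        return GOOD
    return INTERMEDIATE if value >= 0.7 else POOR


def rate_prauc(value: float) -> str:
    """PRAUC.median: >=0.8 Good, [0.4, 0.8) Intermediate, <0.4 Poor."""
    if value >= 0.8:
        return GOOD
    return INTERMEDIATE if value >= 0.4 else POOR


def rate_miss_fdp(value: float) -> str:
    """missFDP, assumed here to be the fraction of replicates with a missing FDP."""
    if value == 0:
        return GOOD
    return INTERMEDIATE if value < 0.5 else POOR


def rate_time(value: float) -> str:
    """Time.median; the table gives no unit, so none is assumed here."""
    if value <= 10:
        return GOOD
    return INTERMEDIATE if value <= 500 else POOR


def rate_fdp(fdps: Sequence[Optional[float]]) -> str:
    """FDP.median from per-replicate FDPs; None marks a missing FDP."""
    # Poor: any missing FDP (an empty input is also treated as Poor, an assumption).
    if not fdps or any(f is None for f in fdps):
        return POOR
    med = median(fdps)
    if med >= 0.25 or med <= 0.01:
        return POOR
    # Good: at most 75% of FDPs strictly above 0.05 and at most 75% strictly below it.
    n = len(fdps)
    share_above = sum(f > 0.05 for f in fdps) / n
    share_below = sum(f < 0.05 for f in fdps) / n
    if max(share_above, share_below) <= 0.75 and 0.0167 < med < 0.15:
        return GOOD
    return INTERMEDIATE


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    print(rate_auroc(0.93), rate_prauc(0.55), rate_time(42.0))  # Good Intermediate Intermediate
    print(rate_fdp([0.03, 0.04, 0.06, 0.07]))  # Good
```

Checking the Poor conditions for FDP first mirrors the table: a single missing FDP is disqualifying regardless of how well calibrated the remaining replicates are.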
Method | Covariates? | Documentation? | Fixed Effect Matrix? | Random Design Matrix? | Download Link |
---|---|---|---|---|---|
t-test | No | Textbook | No | No | N/A |
u-test | No | Textbook | No | No | N/A |
ancova | Yes | Textbook | No¹ | No | N/A |
edgeR | Yes | Vignette, user's guide, reference | Yes | No | https://bioconductor.org/packages/release/bioc/html/edgeR.html (last accessed 22 April 2022) |
limma | Yes | Quick start, user's guide, reference | Yes | No | https://bioconductor.org/packages/release/bioc/html/limma.html (last accessed 8 February 2022) |
DESeq2 | Yes | Quick start, user's guide, reference | Yes | No | https://bioconductor.org/packages/release/bioc/html/DESeq2.html (last accessed 11 February 2022) |
MAST | Yes | Intro, MAST examples, reference | Yes | Yes | https://www.bioconductor.org/packages/release/bioc/html/MAST.html (last accessed 10 February 2022) |
glmmTMB | Yes | Multiple vignettes and reference | Yes | Yes | https://cran.r-project.org/web/packages/glmmTMB/index.html (last accessed 1 April 2022) |
NEBULA | Yes | Vignette and reference | Yes | No | https://cran.r-project.org/web/packages/nebula/index.html (last accessed 2 June 2022) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gagnon, J.; Pi, L.; Ryals, M.; Wan, Q.; Hu, W.; Ouyang, Z.; Zhang, B.; Li, K. Recommendations of scRNA-seq Differential Gene Expression Analysis Based on Comprehensive Benchmarking. Life 2022, 12, 850. https://doi.org/10.3390/life12060850