Identifying Single-Cell Expression Quantitative Trait Loci Using a Bootstrap Penalized Hurdle Model
Abstract
1. Introduction
2. Materials and Methods
2.1. Hurdle Model Structure
2.2. Penalization
2.3. Model Inference
3. Simulation Study
3.1. Data Generated from Hurdle Poisson Model
3.2. Data Generated from Zero-Inflated Negative Binomial Model
4. Case Study
5. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Preprocessing Steps for Genotype Data in Case Study
Appendix B. Derivation of Equation (5)






| Gene | Gene ID | Chromosome | Number of cis-SNPs |
|---|---|---|---|
| Gene 1 | ENSG00000198885 | 2 | 124 |
| Gene 2 | ENSG00000119844 | 2 | 607 |
| Gene 3 | ENSG00000072182 | 2 | 558 |
| Gene 4 | ENSG00000279490 | 16 | 834 |
| Gene 5 | ENSG00000205084 | 16 | 784 |
| Gene 6 | ENSG00000103187 | 16 | 2857 |

| Gene | Chromosome | Width | Number of cis-SNPs | Number of Th1 eSNPs | Number of Th2 eSNPs | Number of Th17 eSNPs |
|---|---|---|---|---|---|---|
| CD69 | 12 | 8416 | 2366 | 5 | 5 | 4 |
| DUSP1 | 5 | 3100 | 2459 | 8 | 2 | 0 |
| FOS | 14 | 3405 | 2629 | 21 | 10 | 4 |
| GZMH | 14 | 3220 | 2912 | 0 | 2 | 7 |
| IL32 | 16 | 16,896 | 2517 | 22 | 21 | 9 |
| JUN | 1 | 3257 | 2286 | 9 | 5 | 5 |
| JUNB | 19 | 1830 | 2634 | 37 | 33 | 11 |
| JUND | 19 | 1929 | 2723 | 24 | 8 | 4 |
| KLF6 | 10 | 9286 | 3627 | 15 | 10 | 1 |
| NKG7 | 19 | 1096 | 3400 | 0 | 0 | 0 |
| SOCS3 | 17 | 3300 | 3603 | 11 | 6 | 0 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wu, D.; Datta, S. Identifying Single-Cell Expression Quantitative Trait Loci Using a Bootstrap Penalized Hurdle Model. Genes 2026, 17, 625. https://doi.org/10.3390/genes17060625

