Machine Learning Identification of Cell-Type-Specific Molecular Signatures Distinguishing COVID-19 from Other Lower Respiratory Tract Diseases
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Description
2.2. Outline of the Machine Learning-Based Framework
2.3. Performance Metrics
2.3.1. Weighted F1 Score
2.3.2. Additional Performance Metrics
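
The abbreviation list includes the Matthews Correlation Coefficient (MCC), so the sketch below assumes it is among the additional metrics. It uses the widely used multi-class generalization of MCC, written in terms of the total sample count, the number of correct predictions, and the per-class true and predicted counts. The function name `multiclass_mcc` and the toy labels are illustrative assumptions, not the study's code.

```python
import math
from collections import Counter

def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from label sequences."""
    s = len(y_true)                                        # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)   # correct predictions
    t_counts = Counter(y_true)                             # true count per class
    p_counts = Counter(y_pred)                             # predicted count per class
    classes = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in classes)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in classes)
    denom = math.sqrt(cov_tt * cov_pp)
    # Return 0.0 when the coefficient is undefined (e.g., one class only).
    return cov_tp / denom if denom else 0.0

# Hypothetical labels, used only to exercise the function.
labels_true = ["COVID-19", "COVID-19", "COVID-19", "LRTD", "LRTD", "Non-LRTD"]
labels_pred = ["COVID-19", "COVID-19", "LRTD", "LRTD", "LRTD", "COVID-19"]
print(round(multiclass_mcc(labels_true, labels_pred), 4))
```

The value ranges from -1 to 1, with 1 for perfect agreement and values near 0 for predictions no better than chance.
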
2.4. Enrichment Analysis
3. Results
3.1. Feature Ranking Results
3.2. Cross-Algorithm Union Analysis
3.3. Results of Enrichment Analysis
4. Discussion
4.1. Functional Enrichment Reveals Convergent and Divergent Pathological Programs
4.2. Cell-Type-Specific Molecular Signatures and Pathological Mechanisms
4.2.1. Alveolar Epithelial Cell Injury, Dysfunctional Regeneration, and Metabolic Alterations
4.2.2. Innate Immune Cells: Unique IFN-γ-Driven Macrophage Response and Neutrophil Dysregulation
4.2.3. Adaptive Immune Cells: Hyperactivation, Exhaustion, and Enhanced Plasmablast Differentiation
4.3. Limitations of This Study
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| COVID-19 | Coronavirus Disease 2019 |
| LRTD | Lower respiratory tract disease |
| ARDS | Acute respiratory distress syndrome |
| SARS-CoV-2 | Severe acute respiratory syndrome coronavirus 2 |
| ML | Machine learning |
| IFS | Incremental feature selection |
| SMOTE | Synthetic Minority Oversampling Technique |
| AT1 | Alveolar type 1 epithelial cell |
| AT2 | Alveolar type 2 epithelial cell |
| Lasso | Least Absolute Shrinkage and Selection Operator |
| CATBoost | Categorical Boosting |
| XGBoost | eXtreme Gradient Boosting |
| SKB | SelectKBest |
| LightGBM | Light Gradient Boosting Machine |
| AdaBoost | Adaptive Boosting |
| ExtraTrees | Extremely Randomized Trees |
| MCC | Matthews Correlation Coefficient |








| Cell Type | COVID-19 (Number of Cells) | LRTD (Number of Cells) | Non-LRTD (Number of Cells) |
|---|---|---|---|
| Macrophages | 5207 | 2787 | 1778 |
| T cells | 4088 | 2337 | 2206 |
| Neutrophils | 2693 | 1643 | 3293 |
| B cells | 653 | 137 | 37 |
| Plasma cells | 1251 | 301 | 39 |
| Mast cells | 145 | 86 | 86 |
| AT1 | 3376 | 3356 | 199 |
| AT2 | 1483 | 891 | 6 |
| Endothelium | 4298 | 1959 | 991 |
| Fibroblasts | 4690 | 1267 | 149 |
| Ciliated cells | 1597 | 808 | 358 |
| Basal cells | 248 | 636 | 17 |
| Smooth muscle cells | 2927 | 613 | 70 |
| Secretory cells | 734 | 243 | 57 |
| Mesothelium | 392 | 1129 | 130 |
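
The counts in the table above differ widely between disease groups for several cell types (for example, AT2 has 1483 COVID-19 cells but only 6 Non-LRTD cells). As a rough illustration of how that imbalance could be quantified, the sketch below computes the ratio of the largest to the smallest group for a few rows copied from the table. The dictionary layout and the name `imbalance_ratio` are illustrative assumptions rather than part of the study's pipeline.

```python
# Cell counts copied from the table above: (COVID-19, LRTD, Non-LRTD).
cell_counts = {
    "Macrophages": (5207, 2787, 1778),
    "B cells": (653, 137, 37),
    "Mast cells": (145, 86, 86),
    "AT2": (1483, 891, 6),
}

def imbalance_ratio(counts):
    """Largest group size divided by smallest group size."""
    return max(counts) / min(counts)

ratios = {cell: imbalance_ratio(c) for cell, c in cell_counts.items()}

# Print cell types from most to least imbalanced.
for cell, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cell}: {ratio:.1f}")
```

Ratios of this size are the kind of situation that oversampling methods such as SMOTE, which appears in the abbreviation list, are intended to address.
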

| Cell Type / Feature Ranking Algorithm | AdaBoost | CATBoost | ExtraTrees | Lasso | LightGBM | RF_ZL | Ridge | SKB | XGBoost |
|---|---|---|---|---|---|---|---|---|---|
| AT1 | RF | RF | RF | RF | RF | RF | Ridge | Ridge | RF |
| AT2 | RF | Ridge | RF | RF | RF | RF | Ridge | RF | RF |
| Basal cells | RF | Ridge | Ridge | RF | Ridge | Ridge | Ridge | Ridge | Ridge |
| B cells | RF | Ridge | Ridge | Ridge | RF | Ridge | Ridge | Ridge | RF |
| Ciliated cells | RF | Ridge | RF | RF | RF | RF | Ridge | RF | RF |
| Endothelium | RF | Ridge | Ridge | RF | Ridge | Ridge | Ridge | Ridge | Ridge |
| Fibroblasts | RF | RF | RF | RF | RF | RF | Ridge | RF | RF |
| Macrophages | RF | Ridge | Ridge | RF | Ridge | Ridge | Ridge | Ridge | Ridge |
| Mast cells | RF | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge |
| Mesothelium | RF | Ridge | RF | RF | Ridge | RF | Ridge | RF | Ridge |
| Neutrophils | RF | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge |
| Plasma cells | RF | RF | RF | RF | RF | RF | RF | RF | RF |
| Secretory cells | RF | RF | RF | RF | RF | RF | Ridge | RF | RF |
| Smooth muscle cells | RF | Ridge | Ridge | RF | RF | Ridge | Ridge | RF | Ridge |
| T cells | RF | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge | Ridge |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Bao, Y.; Zhou, X.; Chen, L.; Feng, K.; Guo, W.; Huang, T.; Cai, Y.-D. Machine Learning Identification of Cell-Type-Specific Molecular Signatures Distinguishing COVID-19 from Other Lower Respiratory Tract Diseases. Life 2026, 16, 771. https://doi.org/10.3390/life16050771

