Article

A Machine Learning Approach to Understanding the Genetic Role in COVID-19 Prognosis: The Influence of Gene Polymorphisms Related to Inflammation, Vitamin D, and ACE2

by Sofía Jaurrieta-Largo 1,†, José Pablo Miramontes-González 2,3,†, Luis Corral-Gudino 2,3, Miriam Gabella-Martín 2, Sofía Pérez-Arroyo 2, Ana M. Torres 4,5, Jorge Mateo 4,5, José Luis Pérez-Castrillón 2,3,*,‡ and Ricardo Usategui-Martín 6,7,*,‡
1 Department of Pneumonology, University Clinical Hospital of Valladolid, 47003 Valladolid, Spain
2 Department of Internal Medicine, Río Hortega University Hospital, 47012 Valladolid, Spain
3 Department of Medicine, Faculty of Medicine, University of Valladolid, 47003 Valladolid, Spain
4 Medical Analysis Expert Group, Castilla-La Mancha Institute of Health Research (IDISCAM), 45071 Toledo, Spain
5 Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16071 Cuenca, Spain
6 Department of Cell Biology, Faculty of Medicine, University of Valladolid, 47005 Valladolid, Spain
7 Unit of Excellence IOBA, University of Valladolid, 47005 Valladolid, Spain
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
‡ These authors also contributed equally to this work.
Int. J. Mol. Sci. 2025, 26(16), 7975; https://doi.org/10.3390/ijms26167975
Submission received: 29 June 2025 / Revised: 10 August 2025 / Accepted: 17 August 2025 / Published: 18 August 2025
(This article belongs to the Special Issue Molecular Progression of Genome-Related Diseases: 2nd Edition)

Abstract

The genetic background influences the outcomes of COVID-19. This study aimed to evaluate the influence of polymorphisms in genes linked to the RAAS system, cytokine production, and vitamin D on COVID-19 severity, with the goal of gaining a deeper understanding of the genetic etiology related to COVID-19. This study involved 338 COVID-19 patients and employed machine learning methods to identify the genetic variants that most significantly affect COVID-19 severity. The results revealed that polymorphisms in the IL6, IL6R, IL1α, IL1R, IFNγ, TNFα, CRP, VDR, VDBP, and ACE2 genes are the most significant genetic factors influencing COVID-19 prognosis, particularly in terms of the risks of COVID-19 pneumonia, mortality, rehospitalization, and associated mortality. The machine learning methods achieved an AUC of 0.86 for predicting COVID-19 pneumonia, mortality, and mortality related to rehospitalization, as well as an AUC of 0.85 for rehospitalization within the first year. These results confirm the crucial role of genetic background in COVID-19 prognosis, facilitating the identification of patients at increased risk. In summary, this research demonstrates that genetics-driven machine learning models can pinpoint patients at heightened risk by primarily focusing on genetic variants associated with ACE2, inflammation, and vitamin D.

1. Introduction

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. According to WHO data, the COVID-19 pandemic resulted in 777 million cases and 7 million deaths worldwide [2]. The renin–angiotensin–aldosterone system (RAAS) plays a crucial role in its pathogenesis [3]. SARS-CoV-2 is an angiotensin I-converting enzyme 2 (ACE2) tropic virus; the viral “spike” (S) protein binds to cells of the nasopharyngeal mucosa and to alveolar pneumocytes, which carry ACE2 receptors on their surfaces [4,5]. ACE2, angiotensin II (Ang-II), and Ang 1–7 play crucial roles in regulating fibrosis, inflammation, and thrombosis, thereby modifying edema, permeability, and pulmonary damage [6,7,8]. It has been reported that patients develop a cytokine storm during the progression of the disease. This inflammatory response correlates with the severity of COVID-19 and is characterized by an increase in interleukins (ILs), IFN-γ, TNF-α, and other cytokines. The hyperinflammatory response has also been linked to mortality rates [9,10,11,12]. One of the factors that could modulate the inflammatory response is vitamin D [13,14]. Vitamin D reduces cytokine production to regulate the inflammatory response, which is crucial in respiratory infections, as it helps repair lung epithelial cells [14,15]. In this sense, a correlation has been reported between vitamin D deficiency, thrombotic events, and mortality rates in COVID-19 patients [16,17,18].
The clinical spectrum of COVID-19 ranges from mild to extremely severe cases [1,19]. It has been hypothesized that viral infection drives an exacerbated inflammatory response, leading to severe lung injury that may necessitate hospital admission and mechanical ventilation, increasing the risk of multi-organ failure and death [10]. It has been reported that a genetic etiology is related to the severity of COVID-19. A predisposing genetic background may underlie variations in disease severity among individuals [20,21,22]. Several receptors and metabolic pathways are involved in the pathogenesis of COVID-19 infection. This analysis will consider genes responsible for the renin–angiotensin system, genes that regulate inflammatory cytokines associated with cytokine storms, and genes that determine vitamin D levels and its transport. Using machine learning methods could help identify the genetic variables that have the most significant influence on disease severity, improving the ability to identify high-risk patients. Machine learning fundamentally involves algorithms that take in data, perform computational analysis to predict output values within acceptable accuracy limits, identify patterns and trends, and ultimately learn from past experiences [23,24,25]. Machine learning involves analyzing complex distributions to identify probabilistic associations and the minimal set of features that capture the key patterns in the data, thereby building a predictive model. It has shown better results than traditional, model-based statistical methods [23,24,25].
In this scenario, this study aimed to evaluate the influence of genetic polymorphisms associated with the RAAS system, cytokine production, and vitamin D on COVID-19 severity, with the goal of gaining a deeper understanding of the genetic etiology related to COVID-19 and developing a machine learning-based risk prediction algorithm that enhances the ability to identify high-risk patients.

2. Results

This study involved 338 COVID-19 patients. Table 1 presents the general characteristics of the included patients. A total of 248 patients (73.3%) were diagnosed with COVID-19 pneumonia based on compatible radiographic patterns, and 76 patients (22.4%) passed away during the initial hospitalization. During the first year of clinical follow-up, 77 patients (29.3%) required rehospitalization, while 25 patients died (9.5%). The clinical and analytical variables, as well as the treatments used in the included COVID-19 patients, are summarized in the Supplementary Tables S1–S3. The genotypic distribution of the analyzed single nucleotide polymorphisms (SNPs) according to the risk of COVID-19 pneumonia, mortality, rehospitalization, and mortality related to rehospitalization is shown in Supplementary Table S4.
Table 2 compares the machine learning methods evaluated for predicting the risk of COVID-19 pneumonia, mortality, rehospitalization, and mortality related to rehospitalization based on the genotypic distribution of the polymorphisms. In all cases, XGB was the machine learning method that yielded the best performance for predicting outcomes (Table 2). The XGB method achieved the highest scores across all assessed metrics, including balanced accuracy, recall, precision, area under the curve (AUC), F1 score, Matthews correlation coefficient (MCC), degenerated Youden index (DYI), and kappa. The AUC was 0.86 for predicting COVID-19 pneumonia and mortality, 0.85 for rehospitalization, and 0.86 for mortality associated with rehospitalization (Table 2).
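As an illustration of how several of the metrics listed above relate to one another, the following minimal Python sketch derives them from the four cells of a binary confusion matrix. The counts are invented toy values, not study data, and the function name binary_metrics is hypothetical. The DYI is left out because this article does not state its formula, and the sketch assumes every denominator is non-zero.

```python
from math import sqrt

def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes every denominator below is non-zero (both classes are present
    and the model predicts both classes at least once).
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)              # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "balanced_accuracy": balanced_accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }

# Toy counts for illustration only (not taken from Table 2).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because balanced accuracy, MCC, and kappa all account for both classes, they are less sensitive to class imbalance than plain accuracy, which is relevant when only a minority of patients experience an outcome.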
Figure 1 summarizes the risk of COVID-19 pneumonia according to genotypic distribution. Figure 1A illustrates the order of influence of genetic polymorphisms, showing that genetic variants in the IL6R, VDBP, CRP, IL6, VDR, and IFN-γ genes were the most significant in determining the risk of COVID-19 pneumonia. The most influential SNPs were rs2228145 of the IL6R gene, particularly the AA genotype, followed by the AA genotype of rs7041 of the VDBP gene. Other crucial polymorphisms included rs1205 in the CRP gene, rs1800795 and rs1800797 in the IL6 gene, rs731236 in the VDR gene, rs2282679 in the VDBP gene, and rs2430561 in the IFN-γ gene. The ROC curve showed that the XGB model achieved a larger area under the curve than the other methods, indicating greater accuracy in predicting the risk of COVID-19 pneumonia (Figure 1B). The AUC was 0.86. The radar plots indicated that the scores obtained on the training subsets resembled those obtained on the test subsets, and the XGB system exhibited the largest area (Figure 1C).
The TT genotype of the rs2069827 polymorphism in the IL6 gene was the most significant genetic variant in predicting the risk of mortality caused by COVID-19. The second most influential factor was the CC genotype of the CRP rs2794521 polymorphism. Also crucial were the rs2074192, rs35697037, and rs2285666 polymorphisms in the ACE2 gene; rs1800872 in IL10; rs2228570 in the VDR gene; rs1800587 and rs17561 in the IL1A gene; rs1800796 and rs1800797 in the IL6 gene; and rs7041 in the VDBP gene (Figure 2A). The ROC curve showed that the XGB model had the largest area under the curve for predicting mortality caused by COVID-19, with an AUC of 0.86 (Figure 2B). The XGB model also exhibited a larger area in the radar plots for both the training and test subsets (Figure 2C).
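For readers less familiar with the AUC values quoted here, the short Python sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented toy values, and roc_auc is a hypothetical helper name rather than part of the study's software.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores.

    Every (positive, negative) pair contributes 1 if the positive case has
    the higher score, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = outcome occurred, 0 = outcome did not occur.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))
```

Under this reading, an AUC of 0.86 means that in roughly 86% of positive-negative patient pairs the model assigns the higher risk score to the patient who actually had the outcome.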
In the first year, 29.3% of patients were rehospitalized, and among them, 32.4% died. The rs1544410, rs731236, and rs7975232 polymorphisms in the VDR gene were the most influential factors affecting the risk of rehospitalization, particularly the TT, AA, and AA genotypes, respectively (Figure 3A). Additionally, significant SNPs were identified in the IL6 gene (rs1800796, rs1800797, and rs1800795), the IL1B gene (rs1143634), the CRP gene (rs2794521), and the IL6R gene (rs2228145) (Figure 3A). The AA genotype of the IL6R rs2228145 polymorphism was most significantly linked to mortality related to rehospitalization (Figure 4A). Polymorphisms in the IL1A (rs17561 and rs1800587), ACE2 (rs35697037, rs2074192, and rs879922), IL1B (rs1143634), IL10 (rs1800896 and rs1800872), and IL8 (rs2227306) genes were also crucial genetic factors for predicting mortality associated with rehospitalization in the first year following the first COVID-19 diagnosis (Figure 4A). The XGB model was the best for predicting rehospitalization and associated mortality, with the largest area under the ROC curve (AUC of 0.85 for rehospitalization and 0.86 for mortality related to rehospitalization) (Figure 3B and Figure 4B). Furthermore, the XGB model produced the largest area in the radar plots for the training and test subsets (Figure 3C and Figure 4C).

3. Discussion

COVID-19 outcomes range from mild to severe, and the severity of the disease may partly depend on genetic background [1,20]. This study employs a machine learning methodology to identify the genetic variants that most significantly influence COVID-19 severity and to develop a genetically based risk prediction algorithm that enhances the ability to identify high-risk patients. In this context, our results identified polymorphisms in genes such as IL6, IL6R, IL1A, IL1R, IFN-γ, TNF-α, CRP, VDR, VDBP, and ACE2 as the most significant genetic factors influencing COVID-19 prognosis, especially regarding the risks of COVID-19 pneumonia, mortality, rehospitalization, and related mortality. The machine learning methods yielded an AUC of 0.86 for predicting COVID-19 pneumonia, mortality, and rehospitalization-associated mortality, as well as an AUC of 0.85 for rehospitalization within the first year. These results confirm the crucial role of the genetic background in COVID-19 prognosis, enabling the identification of patients at increased risk. Nineteen genetic variants in ten different genes were the most influential in COVID-19 prognosis, particularly IL6R (rs2228145), IL6 (rs1800797 and rs1800796), CRP (rs2794521 and rs1800947), IFN-γ (rs2430561), IL1A (rs17561 and rs1800587), TNF-α (rs1800629), IL1R (rs419598), ACE2 (rs35697037 and rs2285666), VDR (rs1544410, rs7975232, and rs2228570), and VDBP (rs7041, rs2282679, and rs4588). These results highlight the crucial role in COVID-19 outcomes of polymorphisms in genes related to inflammation, vitamin D, and ACE2.
Disease severity varies, ranging from asymptomatic infection to cases that require intensive care unit admission and mechanical ventilation. Factors such as age, multiple comorbidities, and sex have been linked to COVID-19 severity [1]. This variability has also been associated with genetic factors; SNPs in specific genes could impact the variation in the clinical spectrum of COVID-19 [26,27]. Research has focused principally on the role of genetic factors in the disease’s progression and severity, including polymorphisms in the ACE, IL6, TNF-α, or VDR genes [27]. Our results reinforce the critical importance of genetic background in COVID-19 pathogenesis, particularly genetic variants related to ACE2, inflammation, and vitamin D. SARS-CoV-2 enters epithelial cells by binding to the ACE2 receptor. Furthermore, ACE2 regulates fibrosis, inflammation, and thrombosis, all of which impact edema, permeability, and lung damage [6,7,8]. Thus, multiple genetic polymorphisms in ACE2 that alter its structure or expression rate have been linked to COVID-19 susceptibility and severity [26,27,28,29,30]. Using machine learning methods, this study showed that, within the ACE2 gene, the rs35697037 and rs2285666 polymorphisms were the most significant SNPs in determining COVID-19 outcomes, particularly hospitalization and mortality.
SARS-CoV-2 infection activates both innate immunity, with alveolar macrophages playing a key role, and adaptive immunity to prevent viral proliferation. However, the virus has mechanisms to bypass these defense systems, which can lead to tissue damage and the recruitment of immune cells responsible for the cytokine storm that may persist after viral clearance. The body’s inability to control this response, which may be genetically mediated, can result in poor disease outcomes [31]. The principal pathological mechanism in COVID-19 is the excessive release of cytokines, known as a cytokine storm, which leads to lung injury and multi-organ damage [32]. In this scenario, multiple studies have linked polymorphisms in inflammation-related genes to disease severity, particularly genetic variants that alter cytokine expression [26,27,33,34]. Our results with machine learning methods confirm their importance in patient outcomes. Additionally, our results underscore the crucial role of SNPs in the CRP gene. CRP concentrations have previously been associated with disease progression [35], but few studies have explored the impact of SNPs in the CRP gene on the severity of COVID-19.
Vitamin D, aside from its effects on bone metabolism, regulates the expression of genes involved in various biological functions, including organ development, cell cycle control, phosphocalcic metabolism, detoxification, and the regulation of innate and adaptive immunity [36]. One hallmark of vitamin D’s effects is the regulation of genes involved in inflammatory processes. Accordingly, there is an interplay between vitamin D signaling and other signaling cascades that are involved in inflammation [37]. A rapid increase in serum 25(OH)D3 concentrations has been related to a decrease in innate immunity markers, including eotaxin, IL12, monocyte chemoattractant protein-1, and macrophage inflammatory protein-1beta. Vitamin D metabolism is enzymatically regulated and depends on polymorphisms in the multiple genes involved [38]. Polymorphisms have mainly been described in genes such as CYP2R1 [39], GC (which encodes the transporter protein DBP) [40], CYP24A1 [41], DHCR7 [42], and VDR (vitamin D receptor) [43]. In this scenario, polymorphisms influencing vitamin D activity, such as those in the VDR and VDBP genes, have also been associated with COVID-19 infection and its clinical progression [27,44,45,46,47]. Our results demonstrated the significant role of genetic variants in VDR and VDBP in the risk of COVID-19 pneumonia, mortality, and rehospitalization, supporting the view that vitamin D metabolism is essential in the pathology of COVID-19.
This study highlights the importance of genetic background in determining the severity of COVID-19. The results point to polymorphisms in ACE2, inflammatory, and vitamin D genes as determinants of the risk of COVID-19 pneumonia, mortality, rehospitalization, and related mortality. This research also presents machine learning methods as a tool that allows the construction of genetic-based models to identify patients with a worse prognosis. Our results involved predictive algorithms that combined the influence of several SNPs. The models obtained have an AUC of 0.85 or higher, indicating the remarkable predictive power of the models for identifying COVID-19 outcomes and patients at increased risk. The main limitation of our study is that the cohort size did not permit a more detailed analysis. Additional studies in different patient series would be necessary. In addition, to delve deeper into this topic, functional studies in molecular and cellular biology would need to be developed. Nonetheless, this study provides, for the first time, a genetically based machine learning model to identify COVID-19 severity. Additionally, this research lays the groundwork for future studies to incorporate a broader range of patients and develop more precise and effective algorithms for predicting COVID-19 outcomes. In conclusion, this study highlights that genetics-based machine learning models can identify patients with an increased risk, primarily considering genetic variants in ACE2, inflammation, and vitamin D.

4. Materials and Methods

4.1. Patients

Consecutive patients with COVID-19 who developed acute respiratory distress syndrome (ARDS) were included in this study. The patients were diagnosed in the Río Hortega University Clinical Hospital (Valladolid, Spain) between March and November 2020. The primary inclusion criteria were age over 18 years, a microbiological diagnosis of COVID-19, symptoms of lung involvement, an admission radiological image compatible with the diagnosis, available laboratory and clinical variables upon admission, and a signed informed consent form. The radiological images were classified into three groups according to Litmanovich et al. [48]: (1) typical presentation of COVID-19 pneumonia; (2) indeterminate, less typical findings of COVID-19 pneumonia that may occur in a variety of infectious and non-infectious processes; and (3) atypical or uncommon findings of COVID-19 pneumonia, making it necessary to consider alternative diagnoses. Exclusion criteria were age under 18 years, a diagnosis of active tumor disease, and other diseases with a life expectancy of less than 6 months. Patients with atypical or uncommon findings in radiological images were also excluded, as were patients who did not sign the informed consent form.
Demographic variables, medical history, and previous pharmacological and non-pharmacological treatments were collected, along with laboratory parameters including complete blood count, biochemistry, coagulation profile, and inflammatory markers. Clinical features at the onset of infection, complications, and progression during hospitalization were also collected, including pharmacological and non-pharmacological treatments received during the hospital admission. Study participants were clinically followed up for a year. Additionally, venous blood samples were collected in tubes containing EDTA during hospital admission. This study employed a double-masked (blinded) design to minimize bias.

4.2. DNA Isolation and Polymorphism Genotyping

Venous blood samples were collected in EDTA tubes, and the genomic DNA was extracted from peripheral blood leukocytes using the Purelink Genomic DNA Mini Kit (Invitrogen, Paisley, UK). DNA was quantified and diluted to a final concentration of 100 ng/μL. Genotyping was performed using TaqMan 5’-exonuclease allelic discrimination assays that contain sequence-specific forward and reverse primers to amplify the polymorphic sequences and two probes labeled with VIC and FAM dyes to detect both alleles of each polymorphism [49]. PCR reactions were performed using TaqMan Universal PCR Master Mix according to the manufacturer’s instructions in a StepOne Plus Real-Time PCR system (Thermo Fisher Scientific, Applied Biosystems, MA, USA). To assess reproducibility, a random selection of 5% of the samples was re-genotyped, and all of these genotypes matched the initially obtained genotypes. We analyzed the genotypic distribution of SNPs in genes involved in inflammatory activation, vitamin D metabolism, and the RAAS system. The genotypic distribution of the SNPs in the 338 COVID-19 patients is shown in Supplementary Table S4. Genetic polymorphisms were selected according to the following considerations: (1) functionality (previously described or possible effect) and (2) distribution along the gene, with preference given to those located in exons or contiguous regions. The GeneCards (www.genecards.org, accessed on 4 March 2021) and NCBI (www.ncbi.nlm.nih.gov/snp, accessed on 4 March 2021) databases were utilized to identify the genes and pathways associated with each included polymorphism.

4.3. Machine Learning Analysis

Machine learning methods were employed to analyze the associations between genetic distributions of polymorphisms, COVID-19 pneumonia (diagnosed by compatible radiographic patterns), and the risk of death during the initial hospitalization. Additionally, the patients were clinically followed for one year and assessed for rehospitalization and risk of death related to COVID-19. The XGB method was proposed as the primary approach for data analysis due to its scalability, rapid execution, and high accuracy. Additionally, its design facilitates parallel computing [50]. Other machine learning methods from the literature were used to benchmark the efficacy and performance of this system: Support Vector Machine (SVM) [51], Decision Tree (DT) [52], Gaussian Naïve Bayes (GNB) [53], and K-Nearest Neighbors (KNN) [54]. Models derived from these methodologies were created using MATLAB (The MathWorks, Natick, MA, USA; MATLAB R2023) [55].
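The article does not describe how genotypes were encoded as model inputs. Since the feature rankings refer to individual genotypes (for example, the AA genotype of rs2228145), one plausible representation is a one-hot encoding with one binary column per SNP and genotype. The Python sketch below illustrates that assumption only; the genotype calls are invented toy values, the function name one_hot_genotypes is hypothetical, and the study itself was implemented in MATLAB.

```python
def one_hot_genotypes(samples, snp_ids):
    """Encode genotype calls as 0/1 columns named '<rsID>_<genotype>'.

    samples: list of dicts mapping an rsID to a genotype string,
             or to None when the call is missing.
    Returns (column_names, rows) with one row of 0/1 values per sample.
    """
    columns = sorted(
        {f"{snp}_{s[snp]}" for s in samples for snp in snp_ids if s.get(snp)}
    )
    rows = []
    for s in samples:
        present = {f"{snp}_{s[snp]}" for snp in snp_ids if s.get(snp)}
        rows.append([1 if col in present else 0 for col in columns])
    return columns, rows

# Toy genotype calls for illustration only (not study data).
patients = [
    {"rs2228145": "AA", "rs7041": "AC"},
    {"rs2228145": "AC", "rs7041": "AA"},
    {"rs2228145": "AA", "rs7041": None},  # missing call: all rs7041 columns stay 0
]
columns, rows = one_hot_genotypes(patients, ["rs2228145", "rs7041"])
print(columns)
for row in rows:
    print(row)
```

An additive encoding (0, 1, or 2 copies of a reference allele) would be a more compact alternative, but it could not attribute importance to a single genotype in the way the figures do.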
Supplementary Figure S1 summarizes the steps taken to implement the machine learning algorithms. This research employed nested cross-validation in conjunction with Bayesian optimization techniques to effectively and reliably tune the hyperparameters of machine learning models. In the nested cross-validation process, the outer loop assessed the model’s overall performance while the inner loop concentrated on optimizing hyperparameters. Bayesian optimization was utilized in the inner loop to effectively navigate the space of critical hyperparameters, such as maximum tree depth (max_depth), number of estimators (n_estimators), learning rate (learning_rate), and regularization terms (lambda and alpha). Bayesian optimization employed a probabilistic model rooted in a Gaussian process to uncover optimal hyperparameter combinations. This approach utilized insights from prior iterations, reducing the necessity for exhaustive evaluations and concentrating on the most promising areas [56,57]. This method mitigated the risk of overfitting by keeping the test data in the outer loop separate from the optimization process. It enhanced model stability through uniform evaluations over various data partitions. The synergy of these strategies yielded models with improved performance and robust generalizability. Drawing from a systematic feature relevance analysis, we implemented a hybrid approach to identify the most impactful variables. We first assessed feature importance using a preliminary XGB model, which generated scores based on gain, coverage, or weight. This was followed by iterative feature selection methods, including Recursive Feature Elimination (RFE) with XGBoost as the base estimator, which aimed to trim the feature set while preserving strong predictive performance. Finally, we evaluated the effect of omitting specific features through cross-validation, ensuring that only those significantly enhancing the model’s performance were retained.
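The division of labor between the two loops of nested cross-validation can be sketched in a few lines of Python. To stay self-contained, the sketch makes several simplifying assumptions that differ from the study: a toy one-dimensional k-nearest-neighbors classifier stands in for XGB, a plain grid search stands in for Bayesian optimization, accuracy stands in for the reported metrics, and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(X, y, fit_rows, eval_rows, k):
    """Fit on fit_rows (for k-NN: store them) and score on eval_rows."""
    train_X = [X[i] for i in fit_rows]
    train_y = [y[i] for i in fit_rows]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in eval_rows)
    return hits / len(eval_rows)

def nested_cv(X, y, k_grid, outer_k=5, inner_k=3, seed=0):
    outer_scores, chosen = [], []
    for o, test_rows in enumerate(k_fold_indices(len(X), outer_k, seed)):
        held_out = set(test_rows)
        dev_rows = [i for i in range(len(X)) if i not in held_out]
        # Inner loop: tune the hyperparameter using development rows only.
        inner_folds = [
            [dev_rows[j] for j in fold]
            for fold in k_fold_indices(len(dev_rows), inner_k, seed + o + 1)
        ]

        def inner_score(k):
            scores = []
            for val_rows in inner_folds:
                val = set(val_rows)
                fit_rows = [i for i in dev_rows if i not in val]
                scores.append(accuracy(X, y, fit_rows, val_rows, k))
            return mean(scores)

        best_k = max(k_grid, key=inner_score)
        chosen.append(best_k)
        # Outer loop: score the tuned model on rows never seen during tuning.
        outer_scores.append(accuracy(X, y, dev_rows, test_rows, best_k))
    return mean(outer_scores), chosen

# Synthetic, perfectly separable toy data (not study data).
X = [i / 29 for i in range(30)]
y = [1 if x > 0.5 else 0 for x in X]
score, chosen = nested_cv(X, y, k_grid=[1, 3, 5])
print(f"outer-loop accuracy: {score:.3f}; k chosen per outer fold: {chosen}")
```

The key property is that each outer test fold is excluded from dev_rows before any tuning happens, so the outer score is not influenced by the hyperparameter search.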
To minimize overfitting in XGBoost, we implemented several techniques. These included using explicit regularization through the lambda and alpha parameters, controlling the maximum tree size (max_depth), lowering the learning rate (learning_rate), and applying early stopping to end training if validation metrics did not improve after several consecutive iterations. Bootstrap validation was conducted, creating various data subsets via resampling to assess uncertainty in performance metrics and guarantee consistency. Furthermore, the model was validated on an independent external cohort to evaluate its generalizability across diverse clinical settings. The data were randomly split into training (70%) and testing (30%) sets to ensure a balanced representation of key classes and characteristics. The simulations were rigorously executed over 100 iterations, carefully accounting for mean and standard deviation values, thus reducing the potential impact of noise and ensuring the achievement of statistically valid conclusions [58].
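The repeated 70/30 evaluation described above can be illustrated with a short Python sketch. It assumes that the "balanced representation of key classes" refers to a split stratified by outcome label, which the article does not state explicitly. The labels are synthetic, a majority-class baseline stands in for the trained model, and all function names are hypothetical.

```python
import random
from statistics import mean, stdev

def stratified_split(y, test_fraction, rng):
    """Split row indices so each class keeps roughly the same share in both parts."""
    train_rows, test_rows = [], []
    for label in sorted(set(y)):
        rows = [i for i, value in enumerate(y) if value == label]
        rng.shuffle(rows)
        n_test = round(len(rows) * test_fraction)
        test_rows.extend(rows[:n_test])
        train_rows.extend(rows[n_test:])
    return train_rows, test_rows

def repeated_holdout(y, evaluate, n_iter=100, test_fraction=0.3, seed=0):
    """Repeat the stratified split n_iter times; return mean and SD of the score."""
    rng = random.Random(seed)
    scores = [
        evaluate(*stratified_split(y, test_fraction, rng)) for _ in range(n_iter)
    ]
    return mean(scores), stdev(scores)

# Synthetic labels for illustration only: 70 positive and 30 negative cases.
y = [1] * 70 + [0] * 30

def majority_baseline(train_rows, test_rows):
    """Stand-in for a trained model: always predict the most common training label."""
    labels = [y[i] for i in train_rows]
    guess = max(set(labels), key=labels.count)
    return sum(y[i] == guess for i in test_rows) / len(test_rows)

avg, sd = repeated_holdout(y, majority_baseline)
print(f"accuracy over 100 splits: mean={avg:.3f}, sd={sd:.3f}")
```

Reporting the standard deviation alongside the mean shows how much a metric depends on the particular split; with a real model, replacing majority_baseline with a function that fits on train_rows and scores on test_rows would give the corresponding spread.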

4.4. Ethical Aspects

This study involving human subjects was conducted following the tenets of the Declaration of Helsinki (2008) and received approval from the Ethics Committee of Río Hortega University Hospital in Valladolid (PI216-20, approval date: 29 May 2020 and 2 March 2021). This study fully complied with the ethical standards of the World Medical Association, as well as Spanish data protection laws (LO 15/1999) and related regulations (RD 1720/2007). All patients who agreed to participate provided signed written consent.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26167975/s1.

Author Contributions

S.J.-L. contributed to the acquisition of data, data curation, data analysis, drafting, and critical revision of this article. J.P.M.-G. contributed to the acquisition of data, data curation, data analysis, drafting, and critical revision of this article. L.C.-G. contributed to the acquisition of data, data curation, data analysis, and critical revision of the draft article. M.G.-M. contributed to the acquisition of data, data curation, data analysis, and critical revision of the draft article. S.P.-A. contributed to the acquisition of data, data curation, data analysis, and critical revision of the draft article. A.M.T. contributed to the acquisition of data, data curation, data analysis, and critical revision of the draft article. J.M. contributed to the acquisition of data, data curation, data analysis, and critical revision of the draft article. J.L.P.-C. contributed to the conceptualization, data analysis, interpretation, drafting of this article, supervision, and critical revision of it. R.U.-M. contributed to the conceptualization, data analysis, interpretation, drafting, and critical revision of this article. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by Gerencia Regional de Salud, Castilla y León, Spain (GRS 2255/A/20 and GRS COVID91/A/20). It was also supported by the Institute of Technology (University of Castilla-La Mancha), the Chair of Artificial Intelligence (sponsored by Bayer), and the Castilla-La Mancha Institute of Health Research (IDISCAM).

Institutional Review Board Statement

This study involving human subjects was conducted in accordance with the tenets of the Declaration of Helsinki (2008) and received approval from the Ethics Committee of Río Hortega University Hospital in Valladolid (PI216-20, approval date: 29 May 2020 and 2 March 2021). This study fully complied with the ethical standards of the World Medical Association, as well as Spanish data protection laws (LO 15/1999) and related regulations (RD 1720/2007). All patients who agreed to participate provided signed written consent.

Informed Consent Statement

All patients who agreed to participate provided signed written consent.

Data Availability Statement

All data needed to evaluate the conclusions in this paper are present in this paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

Acknowledgments

We express our gratitude to the patients for their participation and the operating nurses for assisting in blood sample collection.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Guan, W.; Ni, Z.; Hu, Y.; Liang, W.; Ou, C.; He, J.; Liu, L.; Shan, H.; Lei, C.; Hui, D.S.C.; et al. Clinical Characteristics of Coronavirus Disease 2019 in China. N. Engl. J. Med. 2020, 382, 1708–1720.
2. Gong, Z.; Song, T.; Hu, M.; Che, Q.; Guo, J.; Zhang, H.; Li, H.; Wang, Y.; Liu, B.; Shi, N. Natural and socio-environmental factors in the transmission of COVID-19: A comprehensive analysis of epidemiology and mechanisms. BMC Public Health 2024, 24, 2196.
3. Ingraham, N.E.; Barakat, A.G.; Reilkoff, R.; Bezdicek, T.; Schacker, T.; Chipman, J.G.; Tignanelli, C.J.; Puskarich, M.A. Understanding the renin-angiotensin-aldosterone-SARS-CoV axis: A comprehensive review. Eur. Respir. J. 2020, 56, 2000912.
4. Yan, R.; Zhang, Y.; Li, Y.; Xia, L.; Guo, Y.; Zhou, Q. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science 2020, 367, 1444–1448.
5. Shang, J.; Ye, G.; Shi, K.; Wan, Y.; Luo, C.; Aihara, H.; Geng, Q.; Auerbach, A.; Li, F. Structural basis of receptor recognition by SARS-CoV-2. Nature 2020, 581, 221–224.
6. Imai, Y.; Kuba, K.; Rao, S.; Huan, Y.; Guo, F.; Guan, B.; Yang, P.; Sarao, R.; Wada, T.; Leong-Poi, H.; et al. Angiotensin-converting enzyme 2 protects from severe acute lung failure. Nature 2005, 436, 112–116.
7. Annoni, F.; Orbegozo, D.; Rahmania, L.; Irazabal, M.; Mendoza, M.; De Backer, D.; Taccone, F.S.; Creteur, J.; Vincent, J.-L. Angiotensin-converting enzymes in acute respiratory distress syndrome. Intensive Care Med. 2019, 45, 1159–1160.
8. Kuba, K.; Imai, Y.; Rao, S.; Gao, H.; Guo, F.; Guan, B.; Huan, Y.; Yang, P.; Zhang, Y.; Deng, W.; et al. A crucial role of angiotensin converting enzyme 2 (ACE2) in SARS coronavirus–induced lung injury. Nat. Med. 2005, 11, 875–879.
9. Mehta, P.; McAuley, D.F.; Brown, M.; Sanchez, E.; Tattersall, R.S.; Manson, J.J. COVID-19: Consider cytokine storm syndromes and immunosuppression. Lancet 2020, 395, 1033–1034.
10. Jose, R.J.; Manuel, A. COVID-19 cytokine storm: The interplay between inflammation and coagulation. Lancet Respir. Med. 2020, 8, e46–e47.
11. Gustine, J.N.; Jones, D. Immunopathology of Hyperinflammation in COVID-19. Am. J. Pathol. 2021, 191, 4–17.
12. Tan, L.Y.; Komarasamy, T.V.; Balasubramaniam, V.R. Hyperinflammatory Immune Response and COVID-19: A Double Edged Sword. Front. Immunol. 2021, 12, 742941. Available online: https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2021.742941/full (accessed on 24 April 2025).
13. Minton, K. Vitamin D shuts down T cell-mediated inflammation. Nat. Rev. Immunol. 2022, 22, 1.
14. Colotta, F.; Jansson, B.; Bonelli, F. Modulation of inflammatory and immune responses by vitamin D. J. Autoimmun. 2017, 85, 78–97.
15. Zdrenghea, M.T.; Makrinioti, H.; Bagacean, C.; Bush, A.; Johnston, S.L.; Stanciu, L.A. Vitamin D modulation of innate immune responses to respiratory viral infections. Rev. Med. Virol. 2017, 27, e1909.
16. Mohan, M.; Cherian, J.J.; Sharma, A. Exploring links between vitamin D deficiency and COVID-19. PLoS Pathog. 2020, 16, e1008874.
17. Weir, E.K.; Thenappan, T.; Bhargava, M.; Chen, Y. Does vitamin D deficiency increase the severity of COVID-19? Clin. Med. 2020, 20, e107–e108.
18. Radujkovic, A.; Hippchen, T.; Tiwari-Heckler, S.; Dreher, S.; Boxberger, M.; Merle, U. Vitamin D Deficiency and Outcome of COVID-19 Patients. Nutrients 2020, 12, 2757.
19. Fu, L.; Wang, B.; Yuan, T.; Chen, X.; Ao, Y.; Fitzpatrick, T.; Li, P.; Zhou, Y.; Lin, Y.; Duan, Q.; et al. Clinical characteristics of coronavirus disease 2019 (COVID-19) in China: A systematic review and meta-analysis. J. Infect. 2020, 80, 656–665.
20. Camps-Vilaró, A.; Pinsach-Abuin, M.; Degano, I.R.; Ramos, R.; Martí-Lluch, R.; Elosua, R.; Subirana, I.; Solà-Richarte, C.; Puigmulé, M.; Pérez, A.; et al. Genetic characteristics involved in COVID-19 severity. The CARGENCORS case-control study and meta-analysis. J. Med. Virol. 2024, 96, e29404.
  21. Velavan, T.P.; Pallerla, S.R.; Rüter, J.; Augustin, Y.; Kremsner, P.G.; Krishna, S.; Meyer, C.G. Host genetic factors determining COVID-19 susceptibility and severity. eBioMedicine 2021, 72, 103629. Available online: https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(21)00422-9/fulltext (accessed on 23 April 2025).
  22. Niemi, M.E.K.; Daly, M.J.; Ganna, A. The human genetic epidemiology of COVID-19. Nat. Rev. Genet. 2022, 23, 533–546.
  23. Waring, J.; Lindvall, C.; Umeton, R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artif. Intell. Med. 2020, 104, 101822.
  24. Handelman, G.S.; Kok, H.K.; Chandra, R.V.; Razavi, A.H.; Lee, M.J.; Asadi, H. eDoctor: Machine learning and the future of medicine. J. Intern. Med. 2018, 284, 603–619.
  25. Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160.
  26. Abobaker, A.; Nagib, T.; Alsoufi, A. The impact of certain genetic variants (single nucleotide polymorphisms) on incidence and severity of COVID-19. J. Gene Med. 2021, 23, e3310.
  27. Ren, H.; Lin, Y.; Huang, L.; Xu, W.; Luo, D.; Zhang, C. Association of genetic polymorphisms with COVID-19 infection and outcomes: An updated meta-analysis based on 62 studies. Heliyon 2024, 10, e23662.
  28. Gómez, J.; Albaiceta, G.M.; García-Clemente, M.; López-Larrea, C.; Amado-Rodríguez, L.; Lopez-Alonso, I.; Hermida, T.; Enriquez, A.I.; Herrero, P.; Melón, S.; et al. Angiotensin-converting enzymes (ACE, ACE2) gene variants and COVID-19 outcome. Gene 2020, 762, 145102.
  29. Saengsiwaritt, W.; Jittikoon, J.; Chaikledkaew, U.; Udomsinprasert, W. Genetic polymorphisms of ACE1, ACE2, and TMPRSS2 associated with COVID-19 severity: A systematic review with meta-analysis. Rev. Med. Virol. 2022, 32, e2323.
  30. Sabater Molina, M.; Nicolás Rocamora, E.; Bendicho, A.I.; Vázquez, E.G.; Zorio, E.; Rodriguez, F.D.; Gil Ortuño, C.; Rodríguez, A.I.; Sánchez-López, A.J.; Jara Rubio, R.; et al. Polymorphisms in ACE, ACE2, AGTR1 genes and severity of COVID-19 disease. PLoS ONE 2022, 17, e0263140.
  31. Merad, M.; Blish, C.A.; Sallusto, F.; Iwasaki, A. The immunology and immunopathology of COVID-19. Science 2022, 375, 1122–1127.
  32. Sanders, J.M.; Monogue, M.L.; Jodlowski, T.Z.; Cutrell, J.B. Pharmacologic Treatments for Coronavirus Disease 2019 (COVID-19): A Review. JAMA 2020, 323, 1824–1836.
  33. Darbeheshti, F.; Mahdiannasser, M.; Uhal, B.D.; Ogino, S.; Gupta, S.; Rezaei, N. Interindividual immunogenic variants: Susceptibility to coronavirus, respiratory syncytial virus and influenza virus. Rev. Med. Virol. 2021, 31, e2234.
  34. Yip, J.Q.; Oo, A.; Ng, Y.L.; Chin, K.L.; Tan, K.-K.; Chu, J.J.H.; AbuBakar, S.; Zainal, N. The role of inflammatory gene polymorphisms in severe COVID-19: A review. Virol. J. 2024, 21, 327.
  35. Vogi, V.; Haschka, D.; Forer, L.; Schwendinger, S.; Petzer, V.; Coassin, S.; Tancevski, I.; Sonnweber, T.; Löffler-Ragg, J.; Puchhammer-Stöckl, E.; et al. Severe COVID-19 disease is associated with genetic factors affecting plasma ACE2 receptor and CRP concentrations. Sci. Rep. 2025, 15, 4708.
  36. Bischoff-Ferrari, H.A.; Dawson-Hughes, B.; Stöcklin, E.; Sidelnikov, E.; Willett, W.C.; Edel, J.O.; Stähelin, H.B.; Wolfram, S.; Jetter, A.; Schwager, J.; et al. Oral supplementation with 25(OH)D3 versus vitamin D3: Effects on 25(OH)D levels, lower extremity function, blood pressure, and markers of innate immunity. J. Bone Miner. Res. 2012, 27, 160–169.
  37. Malmberg, H.-R.; Hanel, A.; Taipale, M.; Heikkinen, S.; Carlberg, C. Vitamin D Treatment Sequence Is Critical for Transcriptome Modulation of Immune Challenged Primary Human Cells. Front. Immunol. 2021, 12, 754056.
  38. Bouillon, R.; Bikle, D. Vitamin D Metabolism Revised: Fall of Dogmas. J. Bone Miner. Res. 2019, 34, 1985–1992.
  39. Manousaki, D.; Dudding, T.; Haworth, S.; Hsu, Y.-H.; Liu, C.-T.; Medina-Gómez, C.; Voortman, T.; van der Velde, N.; Melhus, H.; Robinson-Cohen, C.; et al. Low-Frequency Synonymous Coding Variation in CYP2R1 Has Large Effects on Vitamin D Levels and Risk of Multiple Sclerosis. Am. J. Hum. Genet. 2017, 101, 227–238.
  40. Scazzone, C.; Agnello, L.; Bivona, G.; Lo Sasso, B.; Ciaccio, M. Vitamin D and Genetic Susceptibility to Multiple Sclerosis. Biochem. Genet. 2021, 59, 1–30.
  41. Cools, M.; Goemaere, S.; Baetens, D.; Raes, A.; Desloovere, A.; Kaufman, J.M.; De Schepper, J.; Jans, I.; Vanderschueren, D.; Billen, J.; et al. Calcium and bone homeostasis in heterozygous carriers of CYP24A1 mutations: A cross-sectional study. Bone 2015, 81, 89–96.
  42. Kuan, V.; Martineau, A.R.; Griffiths, C.J.; Hyppönen, E.; Walton, R. DHCR7 mutations linked to higher vitamin D status allowed early human migration to Northern latitudes. BMC Evol. Biol. 2013, 13, 144.
  43. Saccone, D.; Asani, F.; Bornman, L. Regulation of the vitamin D receptor gene by environment, genetics and epigenetics. Gene 2015, 561, 171–180.
  44. Karcıoğlu Batur, L.; Dokur, M.; Koç, S.; Karabay, M.; Akcay, Z.N.; Gunger, E.; Hekim, N. Investigation of the Relationship between Vitamin D Deficiency and Vitamin D-Binding Protein Polymorphisms in Severe COVID-19 Patients. Diagnostics 2024, 14, 1941.
  45. Jiang, H.; Chi, X.; Sun, Y.; Li, H. Vitamin D Binding Protein: A Potential Factor in Geriatric COVID-19 Acute Lung Injury. J. Inflamm. Res. 2024, 17, 4419–4429.
  46. Dobrijevic, Z.; Robajac, D.; Gligorijevic, N.; Šunderic, M.; Penezic, A.; Miljuš, G.; Nedic, O. The association of ACE1, ACE2, TMPRSS2, IFITM3 and VDR polymorphisms with COVID-19 severity: A systematic review and meta-analysis. EXCLI J. 2022, 21, 818–839.
  47. Tentolouris, N.; Achilla, C.; Anastasiou, I.A.; Eleftheriadou, I.; Tentolouris, A.; Basoulis, D.; Kosta, O.; Lambropoulos, A.; Yavropoulou, M.P.; Chatzikyriakidou, A.; et al. The Association of Vitamin D Receptor Polymorphisms with COVID-19 Severity. Nutrients 2024, 16, 727.
  48. Litmanovich, D.E.; Chung, M.; Kirkbride, R.R.; Kicska, G.; Kanne, J.P. Review of Chest Radiograph Findings of COVID-19 Pneumonia and Suggested Reporting Language. J. Thorac. Imaging 2020, 35, 354–360.
  49. Schleinitz, D.; Distefano, J.K.; Kovacs, P. Targeted SNP genotyping using the TaqMan® assay. In Disease Gene Identification: Methods and Protocols; Humana Press: Totowa, NJ, USA, 2011; Volume 700, pp. 77–87.
  50. Chen, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Cano, I.; Zhou, T. Xgboost: Extreme gradient boosting. Sci. Res. 2015, 1, 1–4.
  51. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215.
  52. Charbuty, B.; Abdulazeez, A. Classification Based on Decision Tree Algorithm for Machine Learning. J. Appl. Sci. Technol. Trends 2021, 2, 20–28.
  53. Ampomah, E.K.; Nyame, G.; Qin, Z.; Addo, P.C.; Gyamfi, E.O.; Gyan, M. Stock Market Prediction with Gaussian Naïve Bayes Machine Learning Algorithm. Informatica 2021, 45, 3407. Available online: https://www.informatica.si/index.php/informatica/article/view/3407 (accessed on 23 April 2025).
  54. Cubillos, M.; Wøhlk, S.; Wulff, J.N. A bi-objective k-nearest-neighbors-based imputation method for multilevel data. Expert Syst. Appl. 2022, 204, 117298. Available online: https://re.public.polimi.it/handle/11311/1252307 (accessed on 23 April 2025).
  55. MATLAB Runtime—MATLAB Compiler. Available online: https://es.mathworks.com/products/compiler/matlab-runtime.html (accessed on 23 April 2025).
  56. Feurer, M.; Hutter, F. Hyperparameter Optimization. In Automated Machine Learning: Methods, Systems, Challenges; Hutter, F., Kotthoff, L., Vanschoren, J., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 3–33. ISBN 978-3-030-05318-5.
  57. Chen, R.-C.; Dewi, C.; Huang, S.-W.; Caraka, R.E. Selecting critical features for data classification based on machine learning methods. J. Big Data 2020, 7, 52.
  58. Herland, M.; Khoshgoftaar, T.M.; Wald, R. A review of data mining using big data in health informatics. J. Big Data 2014, 1, 2.
Figure 1. Risk of COVID-19 pneumonia according to genotypic distribution. (A) Genetic polymorphisms ranked by their influence on the risk of COVID-19 pneumonia. The X-axis represents the importance (gain); higher values indicate a greater relative weight in the prediction, without implying direction of effect or relative risk. Red: SNPs in genes related to inflammation; green: SNPs related to vitamin D metabolism; blue: SNPs in the ACE2 gene. (B) ROC curves for the assessed machine learning methods. (C) Radar plots for the training phase and the test phase. SVM: Support Vector Machine, DT: Decision Tree, GNB: Gaussian Naïve Bayes, KNN: K-Nearest Neighbors, XGB: Extreme Gradient Boosting.
Figure 2. Mortality during the initial hospitalization according to genotypic distribution. (A) Genetic polymorphisms ranked by their influence on the risk of mortality. The X-axis represents the importance (gain); higher values indicate a greater relative weight in the prediction, without implying direction of effect or relative risk. Red: SNPs in genes related to inflammation; green: SNPs related to vitamin D metabolism; blue: SNPs in the ACE2 gene. (B) ROC curves for the assessed machine learning methods. (C) Radar plots for the training phase and the test phase. SVM: Support Vector Machine, DT: Decision Tree, GNB: Gaussian Naïve Bayes, KNN: K-Nearest Neighbors, XGB: Extreme Gradient Boosting.
Figure 3. Risk of rehospitalization according to genotypic distribution. (A) Genetic polymorphisms ranked by their influence on the risk of rehospitalization. The X-axis represents the importance (gain); higher values indicate a greater relative weight in the prediction, without implying direction of effect or relative risk. Red: SNPs in genes related to inflammation; green: SNPs related to vitamin D metabolism; blue: SNPs in the ACE2 gene. (B) ROC curves for the assessed machine learning methods. (C) Radar plots for the training phase and the test phase. SVM: Support Vector Machine, DT: Decision Tree, GNB: Gaussian Naïve Bayes, KNN: K-Nearest Neighbors, XGB: Extreme Gradient Boosting.
Figure 4. Risk of mortality on rehospitalization according to genotypic distribution. (A) Genetic polymorphisms ranked by their influence on the risk of mortality on rehospitalization. The X-axis represents the importance (gain); higher values indicate a greater relative weight in the prediction, without implying direction of effect or relative risk. Red: SNPs in genes related to inflammation; green: SNPs related to vitamin D metabolism; blue: SNPs in the ACE2 gene. (B) ROC curves for the assessed machine learning methods. (C) Radar plots for the training phase and the test phase. SVM: Support Vector Machine, DT: Decision Tree, GNB: Gaussian Naïve Bayes, KNN: K-Nearest Neighbors, XGB: Extreme Gradient Boosting.
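Panel (B) of Figures 1–4 shows ROC curves, and the abstract summarizes them by their AUC. As a minimal sketch of what that number measures, the AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores below are hypothetical illustrations; they are not the study's data or its implementation.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = event (e.g., pneumonia), 0 = no event.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients; a sort-based formulation would be preferable for larger data.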
Table 1. General characteristics of the study cohort.
| General Characteristics | Patients |
| --- | --- |
| Age, mean (SD) (years) | 73.26 (13.09) |
| Sex (male; female), n (%) | 182 (53.84); 156 (46.15) |
| Days from symptom onset to hospital admission, mean (SD) | 5.77 (5.45) |
| COVID-19 pneumonia, n (%) | 248 (73.3) |
| Days of admission, mean (SD) | 18.21 (22.75) |
| Death due to COVID-19, n (%) | 76 (22.48) |
| Dependency (yes; moderate; mild; independent), n (%) | 42 (12.42); 47 (13.90); 169 (50); 80 (23.67) |
| Rehospitalization in the first year, n (%) | 77 (29.3) |
| Smoking history (active; ex-smoker; never smoker), n (%) | 12 (3.56); 57 (16.91); 268 (79.52) |
| Dementia, n (%) | 41 (12.20) |
| Hypertension, n (%) | 196 (58.16) |
| Dyslipidemia, n (%) | 133 (39.58) |
| Myocardial infarction, n (%) | 15 (4.45) |
| Heart failure, n (%) | 20 (5.97) |
| Cerebral ictus, n (%) | 18 (5.36) |
| Diabetes mellitus, n (%) | 68 (20.24) |
| COPD, n (%) | 8 (2.43) |
| Asthma, n (%) | 32 (9.46) |
| Obstructive sleep apnea, n (%) | 20 (5.93) |
| Chronic kidney disease (stages 4–5), n (%) | 26 (7.15) |
| Tumor without metastasis, n (%) | 39 (11.57) |
| Tumor with metastasis, n (%) | 5 (1.48) |
| ACEi, n (%) | 82 (24.40) |
| ARBs, n (%) | 77 (22.91) |
| Statins, n (%) | 81 (24.11) |
| Metformin, n (%) | 36 (10.71) |
| DPP-4 inhibitors, n (%) | 30 (8.92) |
| Insulin, n (%) | 20 (5.95) |
| Inhaled corticosteroids, n (%) | 33 (9.82) |
| Corticosteroids, n (%) | 10 (2.97) |
| Immunosuppressors/immunomodulators, n (%) | 16 (4.76) |
| COVID-19 confirmed by test, n (%) | 338 (100) |
Table 2. Performance of the machine learning methods tested to predict the risk of COVID-19 pneumonia, mortality, rehospitalization, and mortality on rehospitalization according to the genotypic distribution of the polymorphisms in the patients included in this study.
| Outcome | Method | BA (%) | Recall | Precision | AUC | F1 Score | MCC | DYI | Kappa |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| COVID-19 pneumonia | SVM | 79.09 | 79.18 | 78.53 | 0.79 | 78.85 | 70.18 | 79.09 | 70.41 |
| | DT | 77.23 | 77.32 | 76.68 | 0.76 | 77.00 | 68.53 | 77.23 | 68.75 |
| | GNB | 71.16 | 71.25 | 70.65 | 0.71 | 70.95 | 63.14 | 71.16 | 63.35 |
| | KNN | 82.04 | 82.13 | 81.45 | 0.82 | 81.79 | 72.79 | 82.04 | 73.04 |
| | XGB | 86.10 | 86.20 | 85.48 | 0.86 | 85.84 | 76.40 | 86.10 | 76.65 |
| Mortality | SVM | 79.20 | 79.29 | 78.64 | 0.79 | 78.96 | 70.28 | 79.20 | 70.51 |
| | DT | 77.13 | 77.22 | 76.58 | 0.76 | 76.90 | 68.44 | 77.13 | 68.67 |
| | GNB | 70.18 | 70.26 | 69.68 | 0.70 | 69.97 | 62.27 | 70.18 | 62.48 |
| | KNN | 81.17 | 81.27 | 80.59 | 0.81 | 80.93 | 72.02 | 81.17 | 72.26 |
| | XGB | 86.00 | 86.10 | 85.39 | 0.86 | 85.74 | 76.31 | 86.00 | 76.56 |
| Rehospitalization | SVM | 79.26 | 79.35 | 78.69 | 0.78 | 79.02 | 70.33 | 79.26 | 70.56 |
| | DT | 77.48 | 77.58 | 76.93 | 0.77 | 77.25 | 68.75 | 77.48 | 68.98 |
| | GNB | 72.78 | 72.86 | 72.25 | 0.72 | 72.56 | 64.46 | 72.78 | 64.68 |
| | KNN | 81.94 | 82.04 | 81.36 | 0.81 | 81.69 | 72.71 | 81.94 | 72.95 |
| | XGB | 85.86 | 85.96 | 85.25 | 0.85 | 85.61 | 76.19 | 85.86 | 76.44 |
| Mortality (rehospitalization) | SVM | 79.85 | 79.94 | 79.28 | 0.80 | 79.61 | 70.85 | 79.85 | 71.09 |
| | DT | 78.12 | 78.21 | 77.56 | 0.78 | 77.88 | 69.31 | 78.12 | 69.54 |
| | GNB | 70.83 | 70.91 | 70.32 | 0.71 | 70.62 | 62.85 | 70.83 | 63.05 |
| | KNN | 81.17 | 81.27 | 80.59 | 0.81 | 80.93 | 72.02 | 81.17 | 72.26 |
| | XGB | 86.85 | 86.72 | 86.23 | 0.86 | 86.59 | 77.06 | 86.85 | 77.32 |
BA: Balanced Accuracy. AUC: Area Under the Curve. MCC: Matthews Correlation Coefficient. DYI: Degenerated Youden Index. SVM: Support Vector Machine. DT: Decision Tree. GNB: Gaussian Naïve Bayes. KNN: K-Nearest Neighbors. XGB: Extreme Gradient Boosting.
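For readers less familiar with the metrics in Table 2, the following minimal sketch shows how most of them follow from a binary confusion matrix, using the standard textbook definitions. The function name `binary_metrics` and the counts passed to it are hypothetical and are not taken from the study. The classical Youden index J is shown for orientation only: this section does not define the "degenerated" variant, and in Table 2 the DYI column coincides with BA in every row.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    the classifier predicts both classes at least once).
    """
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    youden_j = recall + specificity - 1  # classical Youden index
    n = tp + fp + tn + fn
    observed = (tp + tn) / n             # observed agreement
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (observed - expected) / (1 - expected)  # Cohen's kappa
    return {
        "balanced_accuracy": balanced_accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "youden_j": youden_j,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to illustrate the formulas.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions J = 2·BA − 1, so J and BA always rank classifiers identically even though their numerical values differ.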
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
