Exome-Wide Association Study Reveals Host Genetic Variants Likely Associated with the Severity of COVID-19 in Patients of European Ancestry

Host genetic variability plays a pivotal role in modulating COVID-19 clinical outcomes. Despite the functional relevance of protein-coding regions, rare variants located here are less likely to completely explain the considerable numbers of acutely affected COVID-19 patients worldwide. Using an exome-wide association approach, with individuals of European descent, we sought to identify common coding variants linked with variation in COVID-19 severity. Herein, cohort 1 compared non-hospitalized (controls) and hospitalized (cases) individuals, and in cohort 2, hospitalized subjects requiring respiratory support (cases) were compared to those not requiring it (controls). 229 and 111 variants differed significantly between cases and controls in cohorts 1 and 2, respectively. This included FBXO34, CNTN2, and TMCC2 previously linked with COVID-19 severity using association studies. Overall, we report SNPs in 26 known and 12 novel candidate genes with strong molecular evidence implicating them in the pathophysiology of life-threatening COVID-19 and post-recovery sequelae. Of these few notable known genes include, HLA-DQB1, AHSG, ALOX5AP, MUC5AC, SMPD1, SPG7, SPEG, GAS6, and SERPINA12. These results enhance our understanding of the pathomechanisms underlying the COVID-19 clinical spectrum and may be exploited to prioritize biomarkers for predicting disease severity, as well as to improve treatment strategies in individuals of European ancestry.


Introduction
More than two years after its first outbreak, the coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remains a prominent healthcare challenge and disruptor of social and economic activities worldwide. It shows a complex array of manifestations ranging from the complete absence of symptoms to severe clinical outcomes, such as acute respiratory distress syndrome (ARDS) [1]. It is a multisystemic disease often marked by cardiovascular, respiratory, neurological, gastrointestinal, renal, immunological, and hematological derangements [2][3][4][5][6][7]. The pathophysiological course of COVID-19 is modulated by the host immune response and progression to severe disease or death occurs due to an uncontrolled cytokine storm and hyperinflammation of the immune system [8]. The severity of clinical outcomes in COVID-19 patients are likely modified by multiple factors ranging from advanced age, male gender, and preexisting comorbidities, such as hypertension and obesity [9][10][11][12]. Moreover, ethnicity-specific differences are also noted in the clinical phenotype of SARS-CoV-2 infection [13,14]. However, Life 2022, 12, 1300 2 of 18 these, as well as the discrepancies in socioeconomic attributes or access to healthcare and vaccination do not completely explain the striking individual and population specific disparities observed in the acuteness of COVID-19 clinical presentation worldwide [15][16][17]. For example, many SARS-CoV-2 infected individuals are known to remain asymptomatic despite significantly higher viral loads than their symptomatic counterparts [18]. Furthermore, middle-aged individuals (40-59 years) are known to develop severe disease even without the presence of underlying comorbidities [19]. The spectrum of COVID-19 clinical manifestation is multifactorial and plausibly explained, at least in part, by the variation in host genetic attributes.
Since uninfected individuals or population controls may develop morbid COVID-19 subsequently upon contracting SARS-CoV-2 infection, we used asymptomatic COVID-19 patients as controls in a GWAS using a COVID-19 dataset generated by AncestryDNA [28], to uncover the genetic variants governing susceptibility to severe COVID-19 in individuals of European ancestry [29]. This study identified 621 SNPs that differed significantly between asymptomatic and acutely afflicted COVID-19 patients, and were associated with IFN and IL signaling, as well as obesity and cholesterol metabolism that are well-known COVID-19 comorbidities. In addition, it highlighted the putative ancestral genomic differences between patients with exacerbated COVID-19 phenotype versus those that were asymptomatic; the latter contained higher genomic fractions of Ancestral North Eurasian (ANE) and Eastern Hunter-Gatherer (EHG), and lower Western Hunter-Gatherer (WHG) fractions. A similar approach using asymptomatic individuals as controls was used in a GWAS on the Chinese population that revealed significant associations at the chromosome loci, 11q23.3 and 11q14.2 with severe COVID-19 and noted significant expression quantitative trail locus (eQTL) associations for RNA Exonuclease 2 (REXO2), C11orf71, Nicotinamide N-Methyltransferase (NNMT1), and Cell Adhesion Molecule 1 Precursor (CADM1) at 11q23. 3 and Cathepsin C (CTSC) at 11q14.2 [30]. Another trans-ancestry GWAS of Europeans, South Asian, and East Asians from UAE used non-hospitalized COVID-19 patients as controls comparing them to those that were hospitalized [31]. This revealed prominent associations at tension, airway obstruction, emphysema, T-cell mediated inflammation, and inflammatory cytokines [31].
Genetic determinants are known to modify the susceptibility to immune-mediated disorders and severe viral infections, such as from respiratory viruses, eg. Influenza virus A and SARS-CoV-1 [32][33][34][35]. As in case of other complex diseases, in addition to a large number of variants in regulatory elements and non-coding regions of the genome [36,37], variations in the coding regions may be involved in controlling the spectrum and severity of COVID-19 [38]. Already rare pathogenic single nucleotide variants in genes governing host immunity, such as Toll Like Receptor 7 (TLR7), Toll Like Receptor Adaptor Molecule 1 (TICAM1), Interferon Regulatory Factor 3 (IRF3), and Interferon Alpha and Beta Receptor Subunit 1 (IFNAR1) are reported in severe cases of COVID-19, including in young patients without pre-existing comorbidities [39,40]. Nevertheless, the monogenic inheritance of rare variants is unlikely to completely account for the large numbers of patients with life-threatening complications in COVID-19 worldwide.
Accordingly, we conducted a case-control based association study to query an exome dataset of 1,432,135 SNPs in 2692 COVID-19 patients of European ancestry from the GEN-COVID consortium, University of Sienna, Italy. We sought to identify common variations in the coding regions of the genome that may be associated with variable prognosis in SARS-CoV-2 infected patients. To this end we employed two strategies, first in cohort 1 we compared non-hospitalized patients (controls; N = 493) to those that were hospitalized (cases; N = 2199) and in cohort 2 we evaluated hospitalized COVID-19 patients on respiratory support (cases; N = 1877) to those not requiring the same (controls; N = 815).

Dataset
A novel whole-exome dataset of COVID-19 patients was obtained from the GEN-COVID consortium, University of Siena, Italy (Sienna_COVID). It comprised of the data from 2960 COVID-19 patients and assessed 1,432,135 SNPs. The patients were graded on a scale of 0-5 based on their hospitalization status and the requirement of respiratory support (Table 1). Hospitalized without respiratory support (HWRS) 2 Hospitalized with O2 supplementation (HWOS) 3 Hospitalized with CPAP-biPAP (HWCB) 4 Hospitalized intubated (HI) 5 Dead (D) The Sienna_COVID dataset was first merged with the genomic data of individuals from the 1000 Genomes Project (https://www.internationalgenome.org/data/, accessed on 22 April 2017). The merged dataset (Sienna_1K) comprised of 5464 individuals assessing 229,021 SNPs that were common between the two datasets. VCFtools v0.1.16 [41], and PLINK v1.90 [42], were used for all file conversions and manipulations.
Population structure within the Sienna_1K dataset was identified using Principal Component Analysis (PCA) implemented in PLINK v1.9 using-pca command. The PC1 and PC2 are plotted in RStudio v1.4.1717 ( Figure 1). To control for population stratification and avoid genetic structure in the sample, we restricted our analysis to the COVID-19 patients that cluster with individuals of European ancestry from the 1000 Genomes Project [43]. Coordinates for the patients under analysis were delineated based on the clusters formed by GBR, FIN, IBS, CEU, and TSI genomes. 2692 COVID-19 patient exomes within the European cluster were selected (PC1 ranging from −0.0050 to 0.0050 and PC2 ranging from −0.0100 to 0) for downstream analysis (Figure 1 inset), and those outside this cluster were removed. Congruently, the data for 2692 COVID-19 patients of European descents were extracted from the original dataset (Sienna_COVID) and a new dataset assessing 1,432,135 SNPs was generated (European_Sienna_COVID). and PC2 are plotted in RStudio v1.4.1717 ( Figure 1). To control for population stratification and avoid genetic structure in the sample, we restricted our analysis to the COVID-19 patients that cluster with individuals of European ancestry from the 1000 Genomes Project [43]. Coordinates for the patients under analysis were delineated based on the clusters formed by GBR, FIN, IBS, CEU, and TSI genomes. 2692 COVID-19 patient exomes within the European cluster were selected (PC1 ranging from −0.0050 to 0.0050 and PC2 ranging from −0.0100 to 0) for downstream analysis (Figure 1 inset), and those outside this cluster were removed. Congruently, the data for 2692 COVID-19 patients of European descents were extracted from the original dataset (Sienna_COVID) and a new dataset assessing 1,432,135 SNPs was generated (European_Sienna_COVID). African, South Asian, European, Latin American, and East Asian populations were designated with different shades of yellow, red, blue, pink, and green, respectively. We selected COVID-19 patients that cluster with European genomes (PC1 ranging from −0.0050 to 0.0050 and PC2 ranging from −0.0100 to 0) for downstream analysis. PCA was performed in PLINK v1.9. The PC1 and PC2 were plotted in RStudio v1.4.1717.
Among the 2692 patients considered, the number of individuals in each of the categories described here (Table 1)

Exome-Wide Association Studies
To identify genetic variants with significant frequency variation between hospitalized (HWRS + HWOS + HWCB + HI + D) versus non-hospitalized (NH) patients (cohort 1), and hospitalized patients on respiratory support (HWOS + HWCB + HI + D) versus those without it (NH + HWRS) (cohort 2), multiple regression-based case-control analyses were performed on the European_Sienna_COVID dataset in PLINK v1.9. Age, sex, and the first African, South Asian, European, Latin American, and East Asian populations were designated with different shades of yellow, red, blue, pink, and green, respectively. We selected COVID-19 patients that cluster with European genomes (PC1 ranging from −0.0050 to 0.0050 and PC2 ranging from −0.0100 to 0) for downstream analysis. PCA was performed in PLINK v1.9. The PC1 and PC2 were plotted in RStudio v1.4.1717.
Among the 2692 patients considered, the number of individuals in each of the categories described here (Table 1)

Exome-Wide Association Studies
To identify genetic variants with significant frequency variation between hospitalized (HWRS + HWOS + HWCB + HI + D) versus non-hospitalized (NH) patients (cohort 1), and hospitalized patients on respiratory support (HWOS + HWCB + HI + D) versus those without it (NH + HWRS) (cohort 2), multiple regression-based case-control analyses were performed on the European_Sienna_COVID dataset in PLINK v1.9. Age, sex, and the first two principal components (PC1 and PC2) were employed as the covariates in the association analysis. To this end, COVID-19 patients were divided into two age groups: ≤50 years and >50 years. All covariates were included in a logistic regression model using -covar flag alongside the -logistic command. SNP-phenotype association was statistically delineated separately for all 1,432,135 SNPs to obtain Odd's Ratio and respective p-value for each SNP after controlling for the covariates. The p-value < 0.001 was considered statistically significant. The −log10 p-values of all assessed SNPs for both cohort 1 and cohort 2 were plotted as Manhattan plots using 'qqman' package in R v3.5.2 [44]. Significant SNPs were annotated using SNPnexus (https://www.snp-nexus.org/v4/, accessed on 15 August 2022) web-based server for GRCh38/hg38 [45].
covar flag alongside the --logistic command. SNP-phenotype association was statistically delineated separately for all 1,432,135 SNPs to obtain Odd's Ratio and respective p-value for each SNP after controlling for the covariates. The p-value < 0.001 was considered statistically significant. The −log10 p-values of all assessed SNPs for both cohort 1 and cohort 2 were plotted as Manhattan plots using 'qqman' package in R v3.5.2 [44]. Significant SNPs were annotated using SNPnexus (https://www.snp-nexus.org/v4/, accessed on 15 August 2022) web-based server for GRCh38/hg38 [45].

Exome-Wide Association Analyses
We assessed the genomes of non-hospitalized COVID-19 patients (N = 493; controls) against those hospitalized (N = 2199; cases) in cohort 1. And compared hospitalized COVID-19 patients on respiratory support (N = 1877; cases) with those not requiring it (N = 815; controls) in cohort 2. Out of 1,432,135 SNPs employed in exome-wide genetic analysis, 229 (cohort 1) and 111 (cohort 2) SNPs showed significant association with the severity of COVID-19 (p-value < 0.001) after controlling for the covariates (Figure 2A and Figure  2B, respectively). The top 20 highly significant SNPs for cohorts 1 and 2 are listed in Tables 2 and 3, respectively with their rsIDs, associated genes, Odd's Ratio (OR), and p-value (P). The top 20 highly significant SNPs for cohorts 1 and 2 are listed in Tables 2 and 3, respectively with their rsIDs, associated genes, Odd's Ratio (OR), and p-value (P).
SNPs that showed highly significant association (p-value < 0.001) in cohort 1 were found to be associated with pathways (https://reactome.org/, accessed on 15 August 2022), such as infectious diseases, NOTCH signaling, immunoregulatory interactions between lymphoid and non-lymphoid cells, and extracellular matrix (ECM) organization ( Figure 3A). SNPs with highly significant association (p-value < 0.001) in cohort 2, were linked with pathways involved in regulation of major histocompatibility complex (MHC) class II signaling, adaptive immune system modulation, carbohydrate metabolism, and RUNX transcription factor mediated regulation ( Figure 3B).
Among the SNPs that varied significantly between cases and controls, 11 were found to be common between cohorts 1 and 2 (Table 4). Overall, we identified variants in 26 genes with strong links to COVID-19 severity via case-control or cellular studies (Table 5). In addition, we report 12 novel candidates with molecular function and supporting evidence highly suggestive of their contribution to COVID-19 pathology and post recovery sequelae.  pathways involved in regulation of major histocompatibility complex (MHC) class II signaling, adaptive immune system modulation, carbohydrate metabolism, and RUNX transcription factor mediated regulation ( Figure 3B).

A.
B.

Pharmacogenomic Annotation of Significant SNPs
SNPs that showed pronounced variation in our exome-wide association analysis were further annotated for their potential impact on drug responses using pharmacogenomic data obtained and curated from PharmGKB [46]. We found two notable SNPs, one on each cohort. According to PharmGKB, rs2228130 is associated with toxicity of gemcitabine, a chemotherapy drug and rs492602 is associated with alcoholism.

Discussion
Multiple large-scale case-control GWASs have investigated the genetic basis of the disparity in clinical severity in SARS-CoV-2 infected patients using unaffected individuals, population controls [20,26,27], or asymptomatic and mildly affected individuals as controls [29][30][31]. The exome or protein-coding region is a conserved component of the genome that may harbor potentially damaging variants, which may modulate the clinical phenotype in SARS-CoV-2 infection. Consistent with this, rare pathogenic variants in TLR7 and eight candidate loci, including, TLR3, IRF3, and IRF7 governing type I IFN response have been identified in patients with critical COVID-19 [39,40]. Further a meta-analysis of worldwide exome and genome data implicated rare disease-causing variants in TLR7 in exacerbating the COVID-19 clinical course [47]. In contrast, largescale exome-wide association studies mostly using population controls did not find rare deleterious exonic variations to be significantly enriched in patients with life-threatening COVID-19 [48,49]. Given that COVID-19 is a genetically complex disorder with polygenic risk inheritance, we surmise that rare pathogenic exonic variants with large effect sizes are unlikely to completely explain the unfavorable disease course noted in large numbers of afflicted patients worldwide. Accordingly, we used an exome-wide association study to identify common genetic variants that are enriched in acutely affected COVID-19 patients of European ancestry. First in cohort 1, we compared non-hospitalized patients to those hospitalized and second, in cohort 2 we evaluated hospitalized COVID-19 patients supported by ventilation to those not needing the same.
Our results included SNPs in the genes FBXO34, Contactin 2 (CNTN2), and Transmembrane And Coiled-Coil Domain Family 2 (TMCC2) which have been previously linked with COVID-19 severity [31,50]. FBXO34 showed significant association with the acuteness of SARS-CoV-2 infection primarily in the European ancestry [31]. While its biological functions are not well understood, F-box proteins at large are a component of the Skip1-Cullin 1-F-box (SCF) E3 ubiquitin ligase complex that participate in proteasome-mediated protein turnover, which are manipulated during infection by many viruses including human immunodeficiency virus (HIV) [51]. Moreover, using GWAS, FBXO34 has also been linked with plasma protein levels in cardiovascular disease risk among Europeans, as well as with blood cell count [52][53][54][55]. CNTN2 and TMCC2 occur in the same genomic region that is significantly associated with risk of poor COVID-19 outcomes in individuals with high European ancestry in Brazil [50].
Specifically, in cohort 1 we identified SNPs in the following genes, Killer Cell Immunoglobulin like Receptor, Two Ig Domains and Long Cytoplasmic Tail 1 (KIR2DL1) that is an inhibitory receptor upregulated in COVID-19 patients with acute respiratory distress reflecting reduced antiviral activity of natural killer cells in them [56]. MBL Associated Serine Protease 2 (MASP2) that promotes complement cascade activation; variants in MASP2 resulting in its reduced expression are noted in asymptomatic elderly COVID-19 patients [57]. Alpha-2 HS glycoprotein (AHSG) that modulates inflammation via attenuating macrophage activation and neutrophil degranulation and is significantly downregulated in severe COVID-19 [58]. The mitochondrial ATPase family AAA domain containing 3A (ATAD3A) that was upregulated in lymph nodes from COVID-19 autopsy cases [59]; pathogenic variants in ATAD3A have also been linked to type I interferonopathy [60]. β-1,3-N-acetyl-glucosaminyltransferase 8 (B3GNT8) that encodes for a glycosyltransferase responsible for anchor point creation in poly-N-acetyl-lactosamine, glycan extensions that could modulate SARS-CoV-2 cellular invasion. Rare variants in B3GNT8 are linked with milder COVID-19 presentations [61].
Arachidonate 5-Lipoxygenase Activating protein (ALOX5AP) encodes for 5-LOX activating protein (FLAP), an activating cofactor for the lipid mediator, Arachidonate 5-lipoxygenase (ALOX5) that generates leukotriene B 4 (LTB 4 ) associated with poor respiratory pathologies such as pneumonia, ARDS, and severe lung injury [62,63]. Congruently, increased ALOX5 activity and ALOX5AP expression are observed in bronchoalveolar lavage (BAL) neutrophils in critical COVID-19 patients [64]. Increased expression of ALOX5AP, ALOX5, and plasma LTB 4 are also noted in diabetic COVID-19 cases requiring intensive care [65]. G Protein-Coupled Receptor Class C Group 5 Member C (GPRC5C) whose structural variants have been associated with poor prognosis in COVID-19 [66]. Mitogen-activated Protein Kinase 7 (MAP2K7) encodes for an augmenter of the c-Jun kinase pathway during T-cell activation and is elevated in severe COVID-19 [67][68][69]. Mucin 5 AC (MUC5AC) that encodes for a gel forming secreted glycoprotein, which is upregulated in the airway mucus of seriously ill SARS-CoV-2 infected patients [70]. Receptor-interacting kinase 3 (RIPK3), a serine-threonine kinase implicated in non-caspase dependent apoptosis termed as necroptosis that leads to ARDS after trauma and sepsis [71]; increased serum levels of RIPK3 are reported in morbid cases of COVID-19 [72]. Ring Finger Protein 213 (RNF213), an E3 ubiquitin ligase and component of the proteasomal degradation machinery that is suppressed in monocytes from severe COVID-19 patients [73]. Sphingomyelin phosphodiesterase 1 (SMPD1) that encodes for a sphinogomyelinase, which catalyzes the synthesis of ceramide, a bioactive lipid mediator involved in response to cellular damage. Upregulation of SMPD1 activity and ceramide have been noted in COVID-19 patients requiring intensive care [74]. Spastic Paraplegia 7 (SPG7) shows increased expression upon SARS-CoV-2 infection [75]. It encodes for a component of the mitochondrial permeability transition pore (mPTP) that is strongly elevated in critical COVID-19 patients and correlated with increased levels of cardiac injury markers in them [76]. Further SARS-CoV-2 proteins localize to the mitochondria, and directly interact with SPG7 to disrupt mitochondrial morphology, energetics, and function in cardiomyocytes [76].
In cohort 2 our results included SNPs in the following genes, Kruppel Like Factor 6 (KLF6) that encodes for an inflammation regulator with high expression in the macrophages in BAL from grievously ill COVID-19 patients [77]. Ubiquilin like (UBQLNL) that encodes for a regulator of proteostasis and showed enhanced expression in seriously ill COVID-19 patients [78]. Fucosyltransferase 4 (FUT4), a marker for premature/immature neutrophils that showed elevated expression in severe SARS-CoV-2 infection and is likely associated with poor outcomes in sepsis [79,80]. Growth-Arrest Specific 6 (GAS6) that encodes for a ligand functional in the restorative program initiated to counterbalance pro-inflammatory immune response [81]. GAS6 levels were found to be elevated following SARS-CoV-2 infection and increased with exacerbation of severity [82]. Notably patients with high levels of GAS6 at initial stages were found to have the worst disease prognosis [82]. Interestingly, GAS6-AS1 encodes for a long non-coding RNA that is downregulated in SARS-CoV-2 infection in vitro and likely interacts with immune modulatory and SARS-CoV-2 interacting proteins such as Adenosine Deaminase RNA Specific (ADAR) and A-Kinase Anchoring Protein 8 Like (AKAP8L) [83]. Serpin Family A Member 12 (SERPINA12) that is involved in the inhibition of kallikrein-dependent inflammation and is downregulated in critical cases of COVID-19 promoting uncontrolled inflammation and worsening of disease outcome [84,85]. Integrin subunit alpha L (ITGAL), an integrin involved in monocyte migration across endothelium in anti-viral immune response was strongly activated in non-resident macrophages in severe COVID-19 [86]. Striated Muscle Enriched Protein Kinase (SPEG) that is associated with cardiomyopathy and COVID-19 mortality [87,88]. Mucin 4 (MUC4) that is downregulated in the blood of critically ill COVID-19 patients [89].
The MHC class I and II performs pivotal roles in the host adaptive immunity by modulating the antigen presentation on the cell surface for T-cell recognition [90]. Genetic variation in human leukocyte antigens (HLA) at the MHC loci modifies the immune response to viral infections, including those caused by SARS-CoV-1 [91], influenza [92], and Middle East respiratory syndrome (MERS) [93]. We identified SNPs in Class II HLA genes, HLA-DRB1, and HLA-DQB1 previously noted to be strongly repressed in a dominant population of dendritic cells [94], in acute cases of COVID-19 [95], in cohort 1 and 2, respectively. Reduced allele frequency of HLA-DRB1 has also been observed in severe COVID-19 [96]. In addition, our results in cohort 1 included HLA-DOB, involved in antigen processing and loading. Its alleles are overrepresented in symptomatic SARS-CoV-2 infected females in a Brazilian cohort [97].
Other findings of interest included variants in novel candidate genes, so far not demonstrated to be directly associated with variability in COVID-19 presentation but with strong accessory evidence for the same. These include Lysine Acetyltransferase 2B (KAT2B), an epigenetic regulator of TGFβ signaling that is a key pathway in cardiovascular development and disease [98]. Interestingly, SARS-CoV-2 infection is known to trigger aberrant TGFβ signaling that, in turn, mounts a chronic and sustained immune response likely exacerbating disease prognosis [99]. While the molecular mechanisms of KAT2B-dependent regulation of the TGFβ pathway in COVID-19 remain to be understood we surmise host variants in KAT2B may modify disease progression and severity following SARS-CoV-2 infection. One Cut Homeobox 1 (ONECUT1) encodes for a transcription factor enriched in the liver and variants in it are associated with different forms of diabetes that is a known risk factor for severe outcomes in COVID-19 [100,101]. Misshapen (Msn)/NIK related kinase 1 (MINK1) encodes for a component of the MAP kinase cascade that is involved in the phosphorylation and priming of the nucleotide-binding domain, leucine-rich-repeat containing family, pyrin-domain containing 3 (NLRP3) inflammasome [102]. Notably NLRP3 activation is central to inflammation and the pathogenesis of ARDS in severe COVID-19 [103]. Spectrin alpha, Erythrocytic 1 (SPTA1) encodes for a structural protein in red blood cells (RBCs). In SARS-CoV-2 infected individuals increased levels and oxidation of SPTA1 peptides are noted reflecting structural aberration of RBCs [104], which may potentially contribute to the severity of hypoxemia, thromboembolism, and coagulation defects noted in the manifestation of COVID-19. Transporter 2, ATP binding cassette Family B Member (TAP2) encodes for a component of the immunoproteasome that replaces the proteasome in haematopoietic cells, as part of the first line of defense against pathogens following pro-inflammatory cytokine IFN-γ stimulation [105]. Furthermore, high viral load in SARS-CoV-2 infection activates the expression of TAP2 and other immunoproteasome components in lungs of COVID-19 patients and may result in worsening of disease prognosis [106]. Unc-93 Homolog B1, TLR Signaling Regulator (UNC93B1) modulates activation of human plasmatoid predendritic cells, which play an important role in SARS-CoV-2 induced immune response [107]. A detrimental clinical course in COVID-19 pathology has been linked with an outburst of pro-inflammatory processes or a 'cytokine storm' [108]. Consistent with this we identified variants in TGFβ Induced Factor Homeobox 1 (TGIF1), related to IFN signaling that is strongly upregulated following SARS-CoV-2 infection [109]. We also found variants in Colony subunit factor 2 Receptor Subunit β (CSF2RB) that encodes the common subunit of the receptor for IL-3, IL-5, and granulocyte-monocyte stimulating factor (GM-CSF). Increased levels of various cytokines including GM-CSF occur in later stages of SARS-CoV-2 infection producing a self-amplifying cytokine loop that leads to ARDS and mortality [110]. Hepatocyte Nuclear Factor 4 alpha (HNF4A) that encodes for an intestinal transcriptional regulator which promotes expression of Angiotensin 1 Converting Enzyme 2 (ACE2) and suppresses Transmembrane Serine Protease 2 (TMPRSS2), which are SARS-CoV-2 receptor and involved in viral protein priming, respectively, in the intestine [111]. Gastrointestinal symptoms of varying severity are noted in subsets of COVID-19 patients [112], and may be associated with variation in HNF4A in them.
Our analysis in cohorts 1 and 2 identified SNPs in 17 and 9 genes, respectively, that are already associated with variability in SARS-CoV-2 infection severity (Table 5). We surmise that this is because cohort 2 interrogates the entire pool of hospitalized patients distinguished only by the requirement of respiratory support, which has been used as a surrogate for more severe disease in the present study and may be regulated by variations at fewer genetic loci. In contrast, cohort 1 compares individuals who were critically ill necessitating hospital care versus milder, non-hospitalized patients, which are likely distinguished by larger genetic differences modifying the disease severity. Polymorphisms in AHSG, B3GNT8, DSG2, CFAP57, HLA-DQB1, ONECUT1, and SPG7 were among the top 20 SNPs delineated in cohort 1. In cohort 2 variants in TGIF1 and GAS6 were among the most prominent 20 SNPs identified. Notable candidate genes common to both cohorts included ONECUT1 and HNF4A.
Further, we also identified SNPs in genes, such as FK506 binding protein (FKBP6) that is a component of the synaptonemal complex with an indispensable role in meiotic chromosome pairing and fertility in males [113]. Dipeptidase 3 (DPEP3) that encodes a membrane-bound protein in testicular germ cells, which may be important for testicular function and is downregulated in convalescent male COVID-19 patients, likely contributing to compromised fertility in them [114]. Desmoglein 2 (DSG2) that occurs at the intercalated discs coupling adjacent cardiomyocytes and is implicated in arrhythmogenic cardiomyopathy [115]. High levels of circulating anti-DSG2 autoantibodies have been reported in recuperating COVID-19 male subjects, compared to healthy individuals, as well as a subgroup showed elevated levels compared to those in arrhythmogenic right ventricular cardiomyopathy [116]. These data further suggest that host genetic variability not only modifies the course of COVID-19 severity but may modulate features such as cardiovascular disease and male infertility that are also noted in post-COVID-19 sequelae in some patients [117,118].
Finally, we note the absence of information on the SARS-CoV-2 strains for the enrolled patients as a limitation of this approach. Different SARS-CoV-2 genetic strains are associated with distinct clinical outcomes [119]. Since 'hospitalization' has been used as a proxy for severe COVID-19 in this study, variance in hospitalization rates due to different COVID-19 strains can potentially obfuscate results obtained here.
Together with existing literature, our results improve the current understanding of genetic factors modulating the spectrum and gravity of the COVID-19 phenotype in individuals of European descent. The novel findings from this work warrant further validatory analysis and studies to facilitate a comprehensive molecular understanding of the COVID-19 pathology. Together the notable candidates from this work may be useful as biomarkers to inform on adverse prognosis and may facilitate the development and deployment of effective therapies, among COVID-19 patients of European ancestry.

Conclusions
Using exome-wide genetic analysis with data from individuals of European descent we identified common variants in candidate genes that are significantly associated with the severity of COVID-19. Our findings include FBXO34, CNTN2, and TMCC2 discovered earlier in other case-control analyses studies. Overall we report SNPs in 26 genes with existing molecular links and 12 novel candidates with biological function or strong supporting evidences suggestive of a high possibility of involvement in modifying the COVID-19 clinical phenotype. These results not only broaden the current knowledge of molecular mechanisms underlying the COVID-19 pathophysiology but may be utilized in delineating a battery of biomarkers predictive of disease outlook in individuals with European ancestry.

Data Availability Statement:
The data employed in this study can be obtained upon request from the GEN-COVID consortium, University of Siena, Italy.