Association of the Transmembrane Serine Protease-2 (TMPRSS2) Polymorphisms with COVID-19

SARS-CoV-2 uses the ACE2 receptor and the cellular protease TMPRSS2 for entry into target cells. The present study aimed to establish whether TMPRSS2 polymorphisms are associated with COVID-19. The study included 609 patients with COVID-19 confirmed by RT-PCR test and 291 individuals who were negative for SARS-CoV-2 infection by RT-PCR test and had no anti-SARS-CoV-2 antibodies. Four TMPRSS2 polymorphisms (rs12329760, rs2298659, rs456298, and rs462574) were determined using 5′-exonuclease TaqMan assays. Under different inheritance models, the rs2298659 (pcodominant2 = 0.018, precessive = 0.006, padditive = 0.019), rs456298 (pcodominant1 = 0.014, pcodominant2 = 0.004, pdominant = 0.009, precessive = 0.004, padditive = 0.0009), and rs462574 (pcodominant1 = 0.017, pcodominant2 = 0.004, pdominant = 0.041, precessive = 0.002, padditive = 0.003) polymorphisms were associated with a high risk of developing COVID-19. Two risk haplotypes (ATGC and GAAC) and two protective haplotypes (GAGC and GAGT) were detected. High levels of lactic acid dehydrogenase (LDH) were observed in patients with the rs462574AA and rs456298TT genotypes (p = 0.005 and p = 0.020, respectively), whereas a high heart rate was present in patients with the rs462574AA genotype (p = 0.028). Our data suggest that the rs2298659, rs456298, and rs462574 polymorphisms, independently and as haplotypes, are associated with the risk of COVID-19. The rs456298 and rs462574 genotypes are related to high levels of LDH and to a high heart rate.

TMPRSS2 is a member of the transmembrane serine protease family, a group of proteins with conserved serine protease domains located on the cell membrane. TMPRSS2 is an essential enzyme that cleaves the hemagglutinin of many influenza virus subtypes and the coronavirus S protein [6,7]. TMPRSS2 deficiency protects mice against H1N1 and H7N9 influenza A virus infections [6,8]. TMPRSS2 can cleave the S protein and thus facilitate the entry of SARS-CoV-2 into the cell [2]. It has recently been reported that cell lines expressing TMPRSS2 are highly susceptible to SARS-CoV, MERS-CoV, and SARS-CoV-2 [9]. The gene encoding TMPRSS2 is located on chromosome 21 and is highly polymorphic. The prevalence and mortality of the COVID-19 pandemic show marked geographic variability, suggesting that genetic differences in populations, and the presence of various types of viruses, play an essential role in this variability [10][11][12][13][14]. Recently, an in-depth genetic analysis of chromosome 21 using genome-wide association study data established that five polymorphisms within TMPRSS2 and near the MX1 gene were associated with a reduced risk of developing severe COVID-19 [15]. In 2020, our research group performed a bioinformatics analysis and reported candidate genes and polymorphisms for study in patients with COVID-19 [16]. In that study, we compared the frequencies of these polymorphisms in several populations, including Mexican individuals from Los Angeles, CA, USA. For the present study, we selected four polymorphisms (rs12329760, rs2298659, rs456298, and rs462574) with a high frequency (more than 15%) in Mexican Americans and with possible functional effects. Thus, the present study aimed to evaluate the association of these TMPRSS2 polymorphisms with COVID-19.
All participants or their relatives signed the institutional consent letter. The study was conducted following the Declaration of Helsinki and approved by the Ethics and Research Committees of the Instituto Nacional de Cardiología Ignacio Chávez (protocol 20-1202, approved 8 January 2021).

Sample Handling
Trained personnel collected the patients' blood samples, which were then transported to the laboratory for processing. Samples were centrifuged, and the serum was separated in a class II biological safety cabinet. Personnel handling the samples used personal protective equipment, including disposable gloves, a lab coat, and a surgical mask. Samples were collected and processed in a laboratory that adheres to the guidelines established in the Official Mexican Standards NOM-007-SSA3-2011, NOM-087-SEMARNAT-SSA1-2002, NOM-010-SSA2-2010, NOM-006-SSA2-2013, and NMX-EC-15189-IMNC-2015.

Genetic Analysis
Genomic DNA of COVID-19 patients and controls was isolated from 5 mL of peripheral blood (containing EDTA) using a QIAamp DNA Blood Mini kit (QIAGEN, Hilden, Germany). DNA integrity was verified in 1% agarose gels stained with ethidium bromide. DNA was then quantified by automated spectrophotometry (NanoDrop ND-1000 spectrophotometer), and aliquots at a concentration of 10 ng/µL were prepared. Based on our previous results and the functional prediction analysis [16], we selected four TMPRSS2 polymorphisms (rs12329760, rs2298659, rs456298, and rs462574), which were determined using 5′-exonuclease TaqMan genotyping assays on an ABI Prism 7900HT Fast Real-Time PCR system (Applied Biosystems, Foster City, CA, USA).

Statistical Analysis
Data are expressed as frequencies, median (interquartile range), or mean ± standard deviation. Categorical variables were analyzed using the chi-squared test, whereas continuous variables were compared using either Student's t-test or the Mann-Whitney U test, as appropriate. The chi-squared test was used to assess Hardy-Weinberg equilibrium. The association of the TMPRSS2 polymorphisms with COVID-19 was analyzed by logistic regression under the following inheritance models: additive (major allele homozygotes vs. heterozygotes vs. minor allele homozygotes), codominant 1 (major allele homozygotes vs. heterozygotes), codominant 2 (major allele homozygotes vs. minor allele homozygotes), heterozygote (heterozygotes vs. major allele homozygotes + minor allele homozygotes), dominant (major allele homozygotes vs. heterozygotes + minor allele homozygotes), and recessive (major allele homozygotes + heterozygotes vs. minor allele homozygotes). Linkage disequilibrium and haplotype analysis were performed with Haploview software (version 4.1, Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA). The biochemical markers in the COVID-19 patients were compared among the different genotypes. These data were expressed as means ± SD, and comparisons were performed by ANOVA with the least significant difference (LSD) as a post hoc test; p values < 0.05 were considered statistically significant. We used SPSS software v15.0 (SPSS Inc., Chicago, IL, USA) for all analyses.
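The inheritance models listed above differ only in how the three genotypes are coded as a predictor before regression. The following sketch is illustrative and is not the SPSS procedure used in the study: it assumes genotypes are stored as minor-allele counts (0, 1, 2), and the names `MODELS`, `encode_genotype`, and `crude_odds_ratio` are hypothetical. For the binary models it computes an unadjusted odds ratio from a 2 × 2 table, which corresponds to a logistic regression without covariates; the counts are invented toy data.

```python
from typing import Optional, Sequence

# Predictor value for minor-allele counts 0, 1, 2 under each model.
# None means the genotype is left out of that comparison.
MODELS = {
    "additive":     (0, 1, 2),
    "codominant1":  (0, 1, None),
    "codominant2":  (0, None, 1),
    "heterozygote": (0, 1, 0),
    "dominant":     (0, 1, 1),
    "recessive":    (0, 0, 1),
}


def encode_genotype(minor_allele_count: int, model: str) -> Optional[int]:
    """Map a minor-allele count (0, 1, or 2) to the model's predictor value."""
    return MODELS[model][minor_allele_count]


def crude_odds_ratio(cases: Sequence[int], controls: Sequence[int], model: str) -> float:
    """Unadjusted odds ratio for a binary inheritance model (not additive)."""
    if model == "additive":
        raise ValueError("the additive model is not binary; fit a regression instead")
    case_codes = [encode_genotype(g, model) for g in cases]
    control_codes = [encode_genotype(g, model) for g in controls]
    a = case_codes.count(1)      # exposed cases
    b = case_codes.count(0)      # unexposed cases
    c = control_codes.count(1)   # exposed controls
    d = control_codes.count(0)   # unexposed controls
    return (a * d) / (b * c)


# Toy data (invented): minor-allele counts per individual.
toy_cases = [0] * 10 + [1] * 20 + [2] * 10
toy_controls = [0] * 20 + [1] * 15 + [2] * 5
print(crude_odds_ratio(toy_cases, toy_controls, "dominant"))             # 3.0
print(round(crude_odds_ratio(toy_cases, toy_controls, "recessive"), 2))  # 2.33
```

The additive model is deliberately excluded from `crude_odds_ratio`, because its predictor takes three ordered values and requires an actual regression fit to estimate a per-allele effect.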

Characteristics of the COVID-19 Patients
Six hundred and nine COVID-19 patients were included in the study, with a mean age of 50.3 ± 14.56 years. The most frequent comorbidities were obesity in 47.7% (n = 293), hypertension in 28.3% (n = 174), and T2DM in 26.3% (n = 162) (Table 1). The levels of biochemical markers are also shown in this table. The most common symptoms were cough (64.3%), dyspnea (48.7%), headache (41.3%), fatigue (34.7%), and myalgia (34.7%) (Figure 1).
Figure 2 shows the allele and genotype frequencies of the TMPRSS2 polymorphisms in COVID-19 patients and controls. The observed and expected frequencies of the four polymorphisms were in Hardy-Weinberg equilibrium (p > 0.05). Although the distribution of rs12329760 was similar in both groups, significant differences between COVID-19 patients and healthy controls were observed in the allele and genotype distributions of the rs2298659 (p = 0.0054 and p = 0.0016, respectively), rs456298 (p = 0.0001 and p = 0.0006, respectively), and rs462574 (p = 0.00002 and p = 0.00042, respectively) polymorphisms.
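The Hardy-Weinberg check can be sketched as a chi-squared goodness-of-fit test with one degree of freedom. This is a minimal illustration, not the study's actual computation: the function name `hwe_chi_square` is hypothetical, the genotype counts are invented, and the p-value uses the identity that the upper tail of a 1-df chi-squared distribution equals erfc(sqrt(x/2)).

```python
import math


def hwe_chi_square(n_major_hom: int, n_het: int, n_minor_hom: int):
    """Return (chi-square, p-value) for a 1-df Hardy-Weinberg goodness-of-fit test."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)   # major-allele frequency
    q = 1.0 - p                               # minor-allele frequency
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic locus: the test is undefined")
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts that match Hardy-Weinberg proportions exactly.
print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0)
```

A p-value above 0.05, as reported for all four polymorphisms, means the observed genotype counts do not deviate significantly from the expected proportions.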

Discussion
The role that TMPRSS2 plays in the entry of SARS-CoV-2 into the cell is well known. A comprehensive comparative genetic analysis of 81,000 human genomes suggests an association of TMPRSS2 polymorphisms with COVID-19 susceptibility [17]. In the present study, we analyzed the distribution of four TMPRSS2 polymorphisms in a Mexican cohort of patients with COVID-19. Three polymorphisms (rs2298659, rs456298, and rs462574) and two haplotypes (ATGC and GAAC) were associated with a high risk of COVID-19. Conversely, two haplotypes (GAGC and GAGT) were associated with a low risk of COVID-19. The distribution of some markers of damage differed among the genotypes of these three polymorphisms.
Variations in genes related to the mechanisms used by SARS-CoV-2 to enter the host cells have an important impact on the variability of the infection observed in different ethnic groups [18]. Several studies, including case-control studies, genome-wide association studies, and some using bioinformatics tools, have suggested the participation of some polymorphisms in the susceptibility to COVID-19 [16,[19][20][21][22]. In this context, the TMPRSS2 gene has been studied with different results. Recently, Li et al. published a systematic review of ACE2 and TMPRSS2 polymorphisms associated with COVID-19 [23]. This review included 33 articles with 33,923 patients from 160 regions and 50 countries (principally Caucasians) and identified 12 TMPRSS2 polymorphisms associated with COVID-19. Irham et al., using data from multiple genomes, established that some variants of the TMPRSS2 gene affect the expression of the TMPRSS2 protease in lung tissue, suggesting that these variants could be involved in susceptibility to SARS-CoV-2 infection [24]. Latini et al. analyzed 131 COVID-19 patients by exome sequencing and reported three TMPRSS2 polymorphisms (rs75603675, rs114363287, and rs12329760) with a different distribution in patients than in controls [25]. Considering the variability in the frequencies of the TMPRSS2 polymorphisms reported to be associated with COVID-19, we determined only those with possible functional effects and a high frequency in a Mexican population from Los Angeles, as reported previously in a bioinformatics analysis [16]. Only one of the polymorphisms included in our study had been analyzed previously in other populations. This polymorphism, rs12329760, is located in the coding region (exon 6) and produces an amino acid change in codon 160 (Val to Met). This polymorphism could be damaging, but it does not seem to have any effect at the post-translational level. Its prevalence varies between 10 and 65%, with a higher frequency in Asian populations [16,26].
Wulandari et al. reported an association of rs12329760 with COVID-19 in an Indian population [27]. A similar result was reported by Andolfo et al. in a population of European genetic ancestry [15]. In contrast, a study in a German population did not detect an association of this polymorphism with the risk of SARS-CoV-2 infection or the severity of COVID-19 [28]. This result agrees with our finding of no association of this polymorphism with COVID-19 in the Mexican population.
The polymorphisms associated with COVID-19 in our population were rs2298659, rs456298, and rs462574. The rs2298659 polymorphism does not generate an amino acid change (Gly296Gly), but the in-silico analysis showed that it could have an impact at the mRNA level: the A allele could create a binding site for the splicing protein SRp40, affecting splicing and possibly generating one of the 20 TMPRSS2 isoforms reported so far (https://www.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000184012;r=21:41464300-41531116;t=ENST00000332149) (accessed on 25 June 2022). SRp40 is expressed in all cells, tissues, and organs, including the lung, where airway epithelial cells are found [29][30][31]. The rs456298 and rs462574 polymorphisms are located in the 3′-UTR region and have possible functional effects. The A allele of rs456298 can create a binding site for hsa-miR-450b-5p, whereas the T allele disrupts this binding site, which could increase the levels of TMPRSS2 mRNA and protein. The expression of miR-450b-5p has been found to be reduced in the lung tissues of bleomycin-treated mice [32]. According to the web-based tool RegulomeDB (https://regulomedb.org/regulame-search/) (accessed on 11 July 2022), this polymorphism affects the state of chromatin, with high transcription of the gene in samples from the stomach, mucosa of the rectum, liver, pancreas, intestine, B cells, lung, natural killer cells, kidney, and heart. For rs462574, in-silico analysis (with the SNP function prediction program) shows that the G allele creates binding sites for hsa-miR-127-3p and hsa-miR-557; thus, the A allele disrupts these binding sites and could increase the levels of TMPRSS2 mRNA and protein. These microRNAs could regulate the translation of the TMPRSS2 protease, with important effects on the priming of the SARS-CoV-2 S protein and, consequently, on the process of infection by this virus (Figure 3). Patients with the rs462574 AA and rs456298 TT genotypes presented high concentrations of LDH.
In a pooled analysis that included nine studies, Henry et al. reported that elevated LDH concentrations are associated with the severity and mortality of COVID-19 [33]. This result was corroborated in a systematic review of 34 studies [34]. On the other hand, patients with rs462574 AA genotype showed a high heart rate, a condition associated with survival chances in patients older than 70 years [35].
The four polymorphisms analyzed were in high linkage disequilibrium, and four haplotypes were associated with COVID-19, two with a high risk (ATGC and GAAC) and two with a low risk (GAGC and GAGT). The two haplotypes associated with low risk included the three alleles with low frequencies in COVID-19 patients (rs462574 G, rs456298 A, and rs2298659 G alleles). The study of haplotypes is important since these haplotypes could be delimiting a region associated with COVID-19.
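For readers unfamiliar with linkage disequilibrium, the pairwise measures D, D′, and r² can be sketched from haplotype and allele frequencies. This is a simplified illustration under the assumption that phased haplotype frequencies are already known; it is not the procedure implemented in Haploview, which works from genotype data. The function name `pairwise_ld` and the input frequencies are hypothetical.

```python
def pairwise_ld(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # Deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Invented frequencies: allele A always travels with allele B.
print(pairwise_ld(0.5, 0.5, 0.5))   # (0.25, 1.0, 1.0)
# Invented frequencies: the two loci are independent.
print(pairwise_ld(0.25, 0.5, 0.5))  # (0.0, 0.0, 0.0)
```

Values of D′ close to 1 between neighboring polymorphisms indicate that their alleles are inherited together, which is why the associated alleles can be summarized as haplotypes.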
The association of TMPRSS2 polymorphisms with COVID-19 infection provides not only a reason to evaluate the risk of infection but also a basis for possible prevention or treatment by TMPRSS2 inhibition. Considering the critical role of TMPRSS2 in SARS-CoV-2 entry into cells [2], several inhibitors of this protease have been under study. Nafamostat and camostat are synthetic serine protease inhibitors that can block viral entry into the host cell. In vitro experiments have shown that camostat mesylate and nafamostat mesylate significantly reduce infection by SARS-CoV-2 [36,37].
Figure 3. Hypothesis based on in-silico analysis of three SNPs located in the TMPRSS2 gene. TMPRSS2 can cleave the S protein and thus facilitate the entry of SARS-CoV-2 into the cell. According to in-silico analysis, the A allele of rs2298659G/A could create a binding site for SRp40, a splicing protein involved in the generation of protein isoforms (20 mRNA isoforms have been described). The rs456298A/T and rs462574G/A polymorphisms (located in the 3′-UTR) could create binding sites for miR-450b-5p, miR-1324, and miR-127-3p/miR-557. The rs456298T allele of TMPRSS2 could disrupt the binding site for miR-450b-5p, whereas the rs462574A allele could disrupt the binding sites for miR-127-3p and miR-557. These alleles could increase the TMPRSS2 protein levels, with a consequent increase in priming of the spike protein. This could facilitate the entry of SARS-CoV-2 into the cell and increase the risk of COVID-19.
A strength of the study was the inclusion of a control group of individuals who were exposed to the intensive care unit and were not infected, as demonstrated by the absence of anti-SARS-CoV-2 antibodies. However, some limitations should be considered: (a) the number of controls was smaller than the number of patients; (b) we analyzed only four polymorphisms of the gene, although we selected those that, after a bioinformatic analysis, showed a possible functional effect and a minor allele frequency high enough to detect statistical differences in our population; and (c) it is not possible to establish whether the differences in LDH levels and heart rate are specific to COVID-19 patients or are also present in healthy controls, because these variables were not determined in the latter group.

Conclusions
In conclusion, our data establish that the rs2298659, rs456298, and rs462574 polymorphisms are associated with the risk of COVID-19. We detected two risk haplotypes (ATGC and GAAC) and two protective haplotypes (GAGC and GAGT). High LDH levels, high heart rate, and high temperature were more frequent in patients carrying specific genotypes of these polymorphisms.
Funding: This research received no external funding. Open Access funding for this article was supported by Instituto Nacional de Cardiología Ignacio Chávez.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics and Research Committees of the Instituto Nacional de Cardiología Ignacio Chávez (protocol 20-1202, approved 8 January 2021).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author.