Allelic Variations in the Human Genes TMPRSS2 and CCR5, and the Resistance to Viral Infection by SARS-CoV-2

During the first wave of COVID-19 infection in Italy, the number of cases and the mortality rates were among the highest compared to the rest of Europe and the world. Several studies demonstrated a severe clinical course of COVID-19 associated with old age, comorbidities, and male gender. However, there are cases of virus infection resistance in subjects living in close contact with infected subjects. Thus, to explain the predisposition to virus infection and to COVID-19 disease progression, we must consider, in addition to the genetic variability of the virus and other environmental or comorbidity conditions, the allelic variants of specific human genes, directly or indirectly related to the life cycle of the virus. Here, we analyzed three human genetic polymorphisms belonging to the TMPRSS2 and CCR5 genes in a sample population from Sicily (Italy) to investigate possible correlations with the resistance to viral infection and/or to COVID-19 disease progression as recently described in other human populations. Our results did not show any correlations of the rs35074065, rs12329760, and rs333 polymorphisms with SARS-CoV-2 infection or with COVID-19 disease severity. Further studies on other human genetic polymorphisms should be performed to identify the major human determinants of SARS-CoV-2 viral resistance.


Introduction
The infection caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) induces coronavirus disease 2019 (COVID-19), which has spread rapidly around the world in the last two years. The pandemic originated in China and spread across the world with different levels of contagion and mortality in different geographical areas.
The causes of this diversification should be found in several factors, such as (1) genetic variability of the virus, (2) genotype of human genes involved in the viral infection process, (3) geographical and climatic features, and (4) high levels of specific environmental pollutants [1].
Furthermore, an extremely heterogeneous picture emerged in Italy, with contagion values and mortality of many northern regions that were significantly higher than those of central and southern Italy [2,3]. During the first wave of infection, Italy had one of the highest numbers of SARS-CoV-2 related contagions and one of the highest mortality rates. Across the world, the most severe clinical course of COVID-19 has been associated with old age, comorbidities, and male gender [4][5][6][7].
Thus, prognostic markers for the early identification of high-risk subjects, as well as knowledge of the environmental parameters that facilitate virus diffusion, are important for implementing specific actions. In addition, polymorphisms in specific human genes may have contributed to the high variability in the contagiousness and lethality rates of COVID-19.
The interpretation of the case-fatality rate (CFR) of this disease is extremely complex because several factors influence it, including high average age and population density, management policy, the number of swabs performed, and the presence of comorbidities, especially in older patients. Several published studies showed that, in Italy, the death rate was positively correlated with the advanced age of patients; however, this alone cannot explain why mortality rates from COVID-19 in Italy are higher than in some countries, like Japan, where the average population age is higher [8][9][10].
A study on the danger of SARS-CoV-2 virus must consider not only the environmental factors and the genetic variability of the virus but also the allelic variants of some specific human genes that can determine a genetic predisposition to the viral infection. Two human genes related to viral infection have been identified, ACE2 and TMPRSS2, whose protein products are directly involved in the viral life cycle. One works as a cell receptor (ACE2) for virus entry into the competent human cells, and the other (TMPRSS2), a serine protease enzyme, allows the priming of the viral spike protein (S). Thus, both proteins are involved in the internalization of the virus into the human cell [7,11,12].
Many papers in the last few years have analyzed the role of angiotensin-converting enzyme 2 (ACE2) and type II transmembrane serine protease (TMPRSS2) in the mechanism of coronavirus infection. The pivotal role of ACE2 as the receptor of the viral S protein for virus entry into host cells was demonstrated, as was the role of the serine protease TMPRSS2 in activating the spike protein and facilitating viral entry [13,14]. Studies of the human proteins involved in the viral life cycle contribute to a better understanding of the variability in susceptibility to SARS-CoV-2 infection, in the severity of disease symptoms, and in mortality rates in the affected populations; however, there is still much to uncover [15][16][17].
Polymorphisms in the TMPRSS2 gene were analyzed regarding their involvement in SARS-CoV-2 infection, and some common variants seem to modulate TMPRSS2 expression, allowing a mild-to-moderate effect in infection susceptibility [4,18]. Genetic variants related to reduced TMPRSS2 expression may confer less individual susceptibility to infection with a better disease outcome. Indeed, greater frequency of alleles with high expression levels of TMPRSS2 was related to higher COVID-19 prevalence and mortality rates in Europe and the Americas, contrary to the lower prevalence of the disease and mortality observed in Southeast Asia, where alleles characterized by a low level of expression of TMPRSS2 gene were described [19].
In addition to the TMPRSS2 gene, other studies showed the involvement of CC chemokine receptor 5 (CCR5) in genetic susceptibility to SARS-CoV-2 infection. The CCR5-∆32 I/D polymorphism was identified as a promising candidate to predict the severity of SARS-CoV-2 infection. The specific deletion seems to be protective against COVID-19 disease [20]. As CCR5 is not a receptor recognized by SARS-CoV-2, the most likely explanation for the protective effect of the ∆32 allele in COVID-19 is an attenuated inflammatory response among the CCR5-deletion carriers [21,22].
A relevant fact to be considered concerns the variability of the viral infection, and of the clinical symptoms of the disease, observed in cohabiting subjects (families, hospital communities, and retirement homes for the elderly) in which only one person was infected by SARS-CoV-2 and developed the disease, while the other members of the group showed no virus or disease symptoms. Thus, if we consider families or, in general, communities, the specific genotype of each cohabiting subject could have a role in the different responses to viral infection. Here, taking into consideration the results previously described for some polymorphisms of TMPRSS2 and CCR5 genes in different human populations (see above), we analyzed three polymorphisms (rs35074065, rs12329760, and rs333) to detect the presence of an allele or a genotype that could confer resistance to viral infection and/or result in less severe symptoms of COVID-19 disease.

Subjects
A total of 102 subjects were recruited for the present study (Table 1). The subjects belong to families in which some members were infected in the first pandemic wave (March and April 2020), when the original strain of SARS-CoV-2 circulated in Italy. Their genomic DNA samples were collected in 2021, from May to December. None of the subjects analyzed were vaccinated at the time of the infection, as vaccines were not yet available.
Notes to Table 1: (1) hospitalized and living in the same hospital ward; (2) cohabiting subjects belonging to 10 different families.
The subjects belong to two different groups living in the same place, with several of them infected by the SARS-CoV-2 virus and others not. One group is composed of subjects who were hospitalized in the intellectual disability ward of the Oasi M. Santissima Scientific Hospitalization and Care Institutes (IRCCS; Troina, Italy), and the other group consists of subjects belonging to 10 different families, each composed of three to five cohabiting members, where at least one member was infected by SARS-CoV-2 virus and the other members not.

TMPRSS2 Gene
Two different polymorphisms in the TMPRSS2 gene were analyzed, rs35074065 and rs12329760. The former consists of a deletion of the C nucleotide (delC) and has an allele frequency in the European population of 17.4%. The latter is a nucleotide substitution (C > T), with the T allele showing a frequency of 22.0% in the European population (dbSNP database at www.ncbi.nlm.nih.gov/snp/, accessed on 21 April 2021).
The genotypic analysis of these two polymorphisms showed no significant correlation between susceptibility to SARS-CoV-2 infection and the alleles detected at rs35074065 (Figure 1) and rs12329760 (Figure 2) in the human TMPRSS2 gene.

CCR5 Gene
The rs333 polymorphism, consisting of a deletion of 32 bp in the CCR5 gene, was analyzed. This allele, ∆32, shows a frequency in the European population of 11.0% (dbSNP database at www.ncbi.nlm.nih.gov/snp/, accessed on 21 April 2021). The genotypic analysis of the polymorphism showed no significant correlation with susceptibility to SARS-CoV-2 infection (Figure 3).

TMPRSS2, CCR5 Gene Polymorphisms and COVID-19 Disease Severity
The three polymorphisms rs35074065, rs12329760, and rs333 were analyzed in relation to the genotypes observed in intellectual disability patients with respect to the severity of the disease. We distinguished three levels of disease severity in addition to the asymptomatic cases (pauci-symptomatic, symptomatic, and severe) as follows:
No significant correlation was found between disease severity and the genotypes of the three polymorphisms analyzed (Figure 4).

Genotype Concordance between Infected and Non Infected Members in Cohabiting Subjects
The main aim of the work was to verify whether a specific genotype, among those considered, confers on the carrier resistance to the viral infection or, conversely, higher susceptibility to it. Studying individuals who live in the same "environment" (a home) made it possible to control for the environmental variables that could interfere with the infection process. Thus, we checked whether there were families in which infected subjects cohabited with healthy members who had different genotypes.
This type of information (Table 3) showed that none of the genotypes analyzed has a direct correlation with COVID-19 disease or with resistance to SARS-CoV-2 infection. In fact, except for the T/T genotype of the rs12329760 polymorphism, which was, however, very rare in the population analyzed, all the other genotypes showed a random distribution between individuals who were positive and negative for viral infection, in accordance with the statistical analyses on the whole sample or on the family and hospitalized groups considered separately (Figures 1-3). As an example, the C/C genotype of rs12329760 is present in both infected and non-infected members in five families (Table 3).

Discussion
Environmental factors, such as temperature, humidity, and air pollutants, have been correlated with the incidence of new cases, mortality rates, and the spread of the virus in different Italian regions [1]. In fact, PM10 and PM2.5 airborne particulate matter at high concentrations can influence the circulation of the virus in the air by acting as a carrier and can exert a boosting action, thus favoring the wide diffusion of the epidemic [23]. Despite the widespread diffusion of the virus, and independent of the mode of transmission (interpersonal infection or environmental spread), some subjects showed no viral infection even though they cohabited with infected subjects, some of whom died from COVID-19.
Thus, genetic variation among cohabiting subjects should be considered to explain the different responses to SARS-CoV-2 infection. Thus far, it is still unclear why the severity of COVID-19 disease is so highly variable between infected individuals: indeed, some do not develop any symptoms (asymptomatic), while others develop mild ones (paucisymptomatic). In other cases, individuals may present with a respiratory syndrome that may result in intensive care. Here, we analyzed three polymorphisms of the human TMPRSS2 and CCR5 genes to highlight possible associations between alleles/genotypes and resistance/sensitivity to viral infection or different symptoms related to COVID-19 disease.
Subjects from families and a hospital community in Sicily (Italy) with different levels of susceptibility or responses to SARS-CoV-2 infection were included in the study. Cohabitation (in a community or in a household) represents one of the main risk factors for COVID-19 infection. Indeed, exposure to the same viral strain may result in either infection sensitivity or resistance or in highly variable levels of disease severity. Thus, when we take into consideration households or communities, the genetic variability of each household or community member should play a role in the differing responses to viral infection.
It was previously described that polymorphisms of the CCR5 gene may have an impact on susceptibility to SARS-CoV-2 infection [5]. Similarly, a significantly higher CCR5-∆32 frequency was observed in symptomatic SARS-CoV-2 positive patients compared to asymptomatic subjects. Moreover, these researchers found that the CCR5 deletion may predict the severity of SARS-CoV-2 infection [4]. Countries with a higher CCR5-∆32 frequency were also reported to show the highest viral infection and COVID-19 mortality rates [24,25]. In our work, the statistical data indicate that the CCR5-∆32 mutation is not a risk factor for SARS-CoV-2 infection or for the disease course (see Figures 3 and 4), at least in the Sicilian population we analyzed. This is similar to what was observed in Germany [26].
Moreover, we analyzed the three polymorphisms, rs35074065 and rs12329760 in the TMPRSS2 gene and rs333 in the CCR5 gene, in a cohort of patients with intellectual disability residing in rehabilitation departments of the Research Institute "IRCCS Oasi Maria SS", based in Troina, Italy. Patients were divided into subgroups matched by COVID-19 disease severity: asymptomatic, pauci-symptomatic, symptomatic, and severe, and in this case no statistically significant differences were found between groups (Figure 4) even when we considered only female patients (data not shown).
No statistical significance was obtained when grouping asymptomatic and pauci-symptomatic vs. symptomatic and severe patients, or when using allele frequencies for statistical analysis (data not shown). No correlation was found between age and disease severity (r = 0.19896181, p = 0.139439), even when considering only female patients (90.7%; r = 0.211702, p = 0.147545). Similarly, when comparing infected (COVID-19 positive) and non-infected (COVID-19 negative) individuals, the differences in genotypic or allele frequencies of the loci rs333, rs35074065, and rs12329760 failed to reach statistical significance (Figures 1-3), even when considering females and males separately (Table 2).
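To make the distinction between genotypic and allele frequencies concrete, the following minimal Python sketch derives allele counts and frequencies from genotype counts for a biallelic locus. The function names and the genotype counts in the usage note are illustrative assumptions, not data or code from this study.

```python
from typing import Dict, Tuple

# A genotype is written as a pair of alleles, e.g. ("C", "T").
Genotype = Tuple[str, str]


def allele_counts(genotype_counts: Dict[Genotype, int]) -> Dict[str, int]:
    """Count alleles: each subject contributes the two alleles of its genotype."""
    counts: Dict[str, int] = {}
    for (allele_1, allele_2), n_subjects in genotype_counts.items():
        counts[allele_1] = counts.get(allele_1, 0) + n_subjects
        counts[allele_2] = counts.get(allele_2, 0) + n_subjects
    return counts


def allele_frequencies(genotype_counts: Dict[Genotype, int]) -> Dict[str, float]:
    """Convert allele counts to frequencies (they sum to 1)."""
    counts = allele_counts(genotype_counts)
    total = sum(counts.values())  # equals 2 x number of subjects
    return {allele: n / total for allele, n in counts.items()}
```

With hypothetical counts of 6 C/C, 3 C/T, and 1 T/T subjects, `allele_counts` gives 15 C and 5 T alleles, and `allele_frequencies` gives 0.75 and 0.25. Allele counts computed separately for infected and non-infected subjects form the 2 × 2 table used in an allele-level comparison.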
Although the sample size analysis indicated that our population (about 102 subjects; see Figures 1-3) was appropriate for the statistical results obtained, a larger sample would certainly have given the work more statistical power (power = 0.132740, 0.0922357, and 0.15933 for rs35074065, rs12329760, and rs333, respectively). Unfortunately, to our knowledge, there are not many families in which some members were infected by SARS-CoV-2 and others were not; furthermore, with the advent of vaccination, it is no longer possible to obtain new samples because, in a correlation analysis, the protective effect of the vaccine would be confounded with that of the gene polymorphisms.
Finally, the difference between the observed genotype frequencies of the three polymorphisms in the two compared groups (infected and not infected, Figures 1-3) is very low, predominantly 1%. The chi-square tests are consistent with this lack of association, as shown by the high p values obtained (p = 0.991, 0.527, and 0.995 for rs35074065, rs12329760, and rs333, respectively). In summary, our data indicate that, in the samples from Sicily (Italy), the three analyzed human polymorphisms (rs35074065, rs12329760, and rs333) are not correlated with an increased risk of SARS-CoV-2 infection or with the disease course.
Thus, these polymorphisms cannot be the cause of the different responses to viral infection detected among subjects cohabiting in the same house or in the same community building. Further studies on other human polymorphisms are necessary to identify the major determinants of the SARS-CoV-2 viral resistance observed in these subjects. Work is in progress to analyze samples from different countries, including vaccinated and non-vaccinated subjects, to show whether the three polymorphisms studied here, or others identified elsewhere, are relevant in vaccinated individuals.

Subjects
Biological materials and data were collected from May to December 2021; however, infection of the analyzed subjects occurred in the first pandemic wave (Mar/May 2020). The present study was conducted on a cohort of 102 subjects from Sicily (Italy) with a personal or familial history related to the first wave of COVID-19 infection (Supplementary Table S1). We analyzed people living in the same place, belonging to the same family, or hospitalized in the same hospital ward, some of whom were infected by the SARS-CoV-2 virus and showed symptoms of the disease (65 cases), while other cohabitants showed no infection or symptoms (37 cases).
The collected biological samples consisted of peripheral blood or exfoliative buccal cells. All procedures performed in the study were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Signed informed consent forms were obtained from the analyzed subjects. This study was approved by the local ethics committee "Comitato Etico IRCCS Sicilia-Oasi Maria SS", approval code: 2021/05/04/CE-IRCCS-OASI/43 as of 5 May 2021.

Nasopharyngeal Swabs for SARS-CoV-2 Detection
We used the diagnostic Vita PCR SARS-CoV-2 platforms (Menarini Diagnostics, Wokingham, UK, and Credo Diagnostics, Singapore), following a protocol previously described [33], or the COVID-19 CE-IVD RT-PCR test by Applied Biosystems. This tool detects two SARS-CoV-2 RNA target sequences, one in the (specific) virus nucleocapsid (N) gene region and the other in the conserved region of SARS-like viruses (including SARS-CoV-2, SARS-CoV, and SARS-like bat coronaviruses). Thus, the genotype of SARS-CoV-2 was not determined, as the diagnostic tool was not specific for virus strain identification.

DNA Preparation and Analysis
Genomic DNA was prepared from whole peripheral blood or buccal exfoliated cells using the MagCore Compact Automated Nucleic Acid Extractor (RBC Bioscience, New Taipei City, Taiwan). DNA samples were obtained following the standard manufacturer's protocols suggested for each type of biological material.
• CCR5 gene, polymorphism rs333:
The genotypes of the polymorphisms rs12329760 and rs35074065 were obtained by sequencing the amplified DNA segments (Figure 5A,B). PCR products were purified with ExoSAP before direct sequencing. Sequencing was performed on a SeqStudio using a BigDye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Waltham, MA, USA). The genotypes of the polymorphism rs333 (CCR5-∆32) were obtained by capillary electrophoresis with SeqStudio and GeneMapper 4.0 software (Figure 5C). The forward primer for the ∆32 bp deletion (rs333) of the CCR5 gene was modified at the 5′ end by the addition of a dye label (6FAM). After PCR amplification, 1 µL of the products was diluted in 12 µL of deionized formamide and 1 µL of GeneScan 350 ROX (molecular weight DNA marker). The DNA was denatured at 95 °C for 3 min and loaded on a SeqStudio Genetic Analyser (Applied Biosystems) for amplicon length determination.
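The logic of genotyping rs333 by amplicon length can be sketched in Python as follows. The wild-type and ∆32 alleles give amplicons that differ by 32 bp, so each sized peak is assigned to one of the two alleles. The function name, the size tolerance, and the 200 bp wild-type amplicon length used in the usage note are hypothetical placeholders (the real length depends on the primers); this is an illustration, not the procedure implemented in GeneMapper.

```python
from typing import Iterable

DELETION_BP = 32  # length of the CCR5-delta32 deletion (rs333), from the text


def call_rs333_genotype(peak_sizes_bp: Iterable[float],
                        wild_type_bp: float,
                        tolerance_bp: float = 2.0) -> str:
    """Classify a sample from the fragment sizes of its electrophoresis peaks.

    wild_type_bp is the expected amplicon length of the undeleted allele;
    the deleted allele is expected 32 bp shorter. Peaks outside the
    tolerance window of either expected size are ignored.
    """
    deleted_bp = wild_type_bp - DELETION_BP
    has_wild_type = False
    has_deletion = False
    for size in peak_sizes_bp:
        if abs(size - wild_type_bp) <= tolerance_bp:
            has_wild_type = True
        elif abs(size - deleted_bp) <= tolerance_bp:
            has_deletion = True
    if has_wild_type and has_deletion:
        return "wt/del32"      # heterozygous carrier
    if has_wild_type:
        return "wt/wt"         # homozygous wild type
    if has_deletion:
        return "del32/del32"   # homozygous deletion
    return "no call"           # no peak near either expected size
```

For example, assuming a 200 bp wild-type amplicon, `call_rs333_genotype([200.1, 168.3], wild_type_bp=200.0)` returns `"wt/del32"`, because one peak matches the wild-type size and the other is 32 bp shorter.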

Statistical Analysis
The experimental design evaluated: (1) The differences between four COVID-19 categories (asymptomatic, pauci-symptomatic, symptomatic, and severe) and the genotypes of each of the three loci analyzed in 66 subjects with intellectual disability using a 4 × 3 contingency table.
(2) The genotypes and allele frequencies between infected and non-infected patients using a 3 × 2 contingency table McNemar test.
p values less than 0.05 were considered statistically significant. If the total N for a 2 × 2 chi-square table was less than about 40, the Yates continuity correction was used to compensate for deviations from the theoretical probability distribution. The Pearson correlation coefficient was calculated using online software (https://www.socscistatistics.com, accessed on 5 May 2022). Sample size: the statistical appropriateness of the chi-squared tests used in this study was assessed a posteriori by first calculating the effect size and then calculating the corresponding sample size required with α = 0.05 (www.statskingdom.com/34test_power_chi2.html, accessed on 5 May 2022).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: All data are in Table S1.
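As an illustration of the contingency-table tests described under Statistical Analysis, the sketch below computes a Pearson chi-square test of independence in plain Python. It applies the Yates correction only to 2 × 2 tables with a total N below 40, as stated in the text. It is not the McNemar test, and it is not the code used by the online tools cited above. Using Cohen's w = sqrt(chi2 / N) as the effect size is an assumption, since the text does not name the measure; all function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-y) * sum_{k < df/2} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc term plus a finite sum over half-integer gamma values.
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def chi_square_test(table: Sequence[Sequence[float]]
                    ) -> Tuple[float, int, float, float]:
    """Return (chi2, df, p, cohen_w) for an r x c table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    n = sum(row_totals)
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs at least one observation")
    # Yates continuity correction: 2 x 2 tables with N below 40 (rule from the text).
    use_yates = n_rows == 2 and n_cols == 2 and n < 40
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if use_yates:
                diff = max(0.0, diff - 0.5)
            chi2 += diff ** 2 / expected
    df = (n_rows - 1) * (n_cols - 1)
    cohen_w = math.sqrt(chi2 / n)  # assumed effect-size measure
    return chi2, df, chi2_sf(chi2, df), cohen_w
```

A 3 × 2 table has rows for the three genotypes and columns for infected and non-infected subjects, giving 2 degrees of freedom; a 4 × 3 table of severity category by genotype gives 6.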