Analysis of ACE2 Genetic Variability among Populations Highlights a Possible Link with COVID-19-Related Neurological Complications

Angiotensin-converting enzyme 2 (ACE2) has been recognized as the entry receptor of the novel severe acute respiratory syndrome coronavirus 2 (SARS-Cov-2). Structural and sequence variants in the ACE2 gene may affect its expression in different tissues and determine a differential response to SARS-Cov-2 infection and the COVID-19-related phenotype. The present study investigated the genetic variability of ACE2 in terms of single nucleotide variants (SNVs), copy number variations (CNVs), and expression quantitative trait loci (eQTLs) in a cohort of 268 individuals representative of the general Italian population. The analysis identified five SNVs (rs35803318, rs41303171, rs774469453, rs773676270, and rs2285666) in the Italian cohort. Of these, rs35803318 and rs2285666 displayed a significantly different frequency distribution in the Italian population with respect to worldwide populations. The analysis of eQTLs located in and targeting ACE2 revealed a broad distribution of eQTL variants across different brain tissues, suggesting a possible link between ACE2 genetic variability and the neurological complications in patients with COVID-19. Further research is needed to clarify the possible relationship between ACE2 expression and the susceptibility to neurological complications in patients with COVID-19. In fact, patients at higher risk of neurological involvement may need different monitoring and treatment strategies in order to prevent severe, permanent brain injury.


Introduction
Angiotensin-converting enzyme 2 (ACE2) has recently caught the attention of the scientific community, since it has been recognized as the entry receptor of the novel pathogenic severe acute respiratory syndrome coronavirus 2 (SARS-Cov-2) [1]. ACE2 is a protein encoded by the gene of the same name (ACE2), which maps to chromosome X (Xp22.2) and consists of 18 exons. ACE2 is classified as an

Materials and Methods
The study was performed on 268 DNA samples representative of the Italian general population, which were partially available at the Genomic Medicine Laboratory of Santa Lucia Foundation Hospital and partially derived from international databases. The Italian cohort was composed of 100 samples analyzed by array comparative genomic hybridization (aCGH) to assess the presence of structural genomic variations, and 168 samples utilized for identifying common and rare variants located in the coding or splice site regions of the genome. The genetic data for these samples were partially derived from whole exome sequencing (WES), available at the Genomic Medicine Laboratory of IRCCS Santa Lucia Foundation Hospital, and partially extracted from the Ensembl database [13][14][15]. Italian patients had an average age of 46 ± 15 years and a female/male ratio of 45:55. The research was approved by the Ethics Committee of Santa Lucia Foundation of Rome (CE/PROG.650, approved on 1 March 2018), and was performed according to the Declaration of Helsinki. The participants provided signed informed consent.
The CNV analysis was performed with Chromosome Analysis Suite (ChAS) 3.1 (Affymetrix, Santa Clara, CA, United States), using the Cytoscan750k_Array Single Sample analysis "NA33_hg19" as the reference file and an average resolution of 100 Kb.
Concerning SNVs, we analyzed the variants (namely SNPs and indels) located within the coding and splice site DNA regions, because they are the most likely to affect protein function. We therefore selected the SNVs of interest by extracting the variants localized in the exonic and splice site regions of ACE2 whose frequency data were available from the 1000 Genomes and GnomAD databases [15][16][17][18]. The frequency cutoff for selecting the genetic variants of interest was set at minor allele frequency (MAF) > 0.0001. This approach selected 34 putative variants located within the ACE2 sequence (Table S1). The Ensembl database [15] was also utilized to download the allele frequency data of the 34 SNVs of interest in the worldwide populations, in order to compare the frequencies in the Italian cohort with those observed in African, American, Asian, and European populations. The presence of the 34 SNVs in the Italian samples was evaluated by analyzing the output files derived from WES and from the Ensembl database. For WES results, a coverage of 20X was considered for the analysis of the ACE2 sequence. The variant call format (VCF) files obtained by WES analysis were first scanned with vcfR [19], and then subjected to analysis by "genomic variants filtering by deep learning models in NGS" (GARFIELD-NGS) [20]. In particular, vcfR is a package that enables visualization, manipulation, and quality control of VCF data [19]. GARFIELD-NGS is an informatics tool that relies on deep learning models to discriminate false from true variants in exome sequencing experiments [20]. The allelic frequency distribution of the detected SNVs, and the existence of significant differences between the Italian and worldwide populations, were assessed statistically. All statistical analyses were performed in an R environment [21].
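The variant selection step described above (coding or splice site variants with MAF > 0.0001) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which relied on Ensembl, vcfR, and GARFIELD-NGS; the `Variant` record, the consequence labels, and the example entries are hypothetical and serve only to show the filtering logic.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.0001  # cutoff stated in the text (strictly greater than)

# Hypothetical consequence labels standing in for "coding or splice site"
CODING_OR_SPLICE = {"missense", "synonymous", "frameshift", "splice_region"}


@dataclass
class Variant:
    rsid: str          # variant identifier
    consequence: str   # annotated consequence (hypothetical labels)
    maf: float         # minor allele frequency from a reference database


def select_variants(variants, maf_cutoff=MAF_CUTOFF):
    """Keep coding/splice-site variants whose MAF exceeds the cutoff."""
    return [
        v for v in variants
        if v.consequence in CODING_OR_SPLICE and v.maf > maf_cutoff
    ]


# Made-up records for illustration only (not data from the study)
example = [
    Variant("rsA", "missense", 0.02),
    Variant("rsB", "intron", 0.30),           # dropped: not coding/splice
    Variant("rsC", "synonymous", 0.00005),    # dropped: MAF below cutoff
    Variant("rsD", "splice_region", 0.0001),  # dropped: MAF equal to cutoff
]

selected = select_variants(example)
print([v.rsid for v in selected])
```

Note that the comparison is strict, so a variant whose MAF is exactly 0.0001 is excluded, in line with the "MAF > 0.0001" criterion.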
A two-sided Fisher's exact test was used to calculate a p-value (p) assessing the different allelic distributions of the detected SNVs in the Italian cohort with respect to the other populations. The significance threshold was set at p < 0.05. In addition, multiple testing correction (false discovery rate) was performed by calculating the q-value (q) [22] and setting a significance cutoff at q < 0.05. Considering that ACE2 maps to the X chromosome, the statistical analysis (two-sided Fisher's exact test) was also performed by stratifying the cohorts according to gender to evaluate sex-related effects.
Moreover, the SNVs detected in the Italian population were subjected to bioinformatic predictive analysis to assess their potential impact on ACE2 protein function and splicing mechanisms. For this purpose, VarSite [23], Human Splicing Finder (HSF) [24], and the Uniprot database [25] were interrogated. VarSite analyzes and predicts the effect of amino acid changes on the protein structure. HSF evaluates the effects of variants on the splicing mechanisms. The Uniprot annotation database was utilized to retrieve the organization of the topological and functional domains of the protein.
Concerning eQTL analysis, the Genotype-Tissue Expression (GTEx) database [26] was utilized to retrieve the eQTL variants with a significant effect on ACE2 expression in different tissues, and the Biomart tool [27] was used to extract the significant eQTLs according to the affected tissue. GTEx is a public resource for studying tissue-specific gene expression and regulation, as well as their relationship to genetic variation [26]. The significance threshold for eQTL analysis was set at p < 0.05. Biomart is a web-based tool that allows data to be extracted in a uniform way and filtered for different queries [27]. In this study, Biomart was utilized to extract and filter the eQTLs significantly distributed in brain tissues.
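The allele frequency comparison described above can be sketched in Python as follows. The study itself ran its statistics in R and used the q-value procedure of [22]; this sketch instead uses a hand-written two-sided Fisher's exact test and the Benjamini-Hochberg adjustment, which is a related but different false discovery rate method, so the adjusted values would not necessarily match the published q-values. The allele counts are made up for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be two populations; columns variant vs. wild-type allele counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up allele counts: (variant, wild-type) in a cohort vs. three populations
cohort = (12, 240)
others = {"pop_1": (40, 960), "pop_2": (150, 850), "pop_3": (2, 998)}

names = list(others)
pvals = [fisher_exact_two_sided(*cohort, *others[n]) for n in names]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(f"{name}: p = {p:.3g}, BH-adjusted = {q:.3g}")
```

For an X-linked gene such as ACE2, how alleles are counted in males (one copy) and females (two copies) affects the contingency table; the sketch leaves that choice to the caller, since the text does not specify it.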
The entire analytical workflow of the study is illustrated in Figure 1. Moreover, all raw data utilized for statistical and computational analysis are available at the following link: https://github.com/Andreater/Data-and-RMD-for-ACE2-article.

Results and Discussion
The ultimate goal of the study was to search for variants potentially affecting ACE2 expression and function, which may contribute to the spread of SARS-Cov-2 among worldwide populations, and may be relevant to the clinical variability and outcome displayed by patients with COVID-19. The analysis of CNVs in ACE2 did not reveal any significant variation in our study cohort, making it unlikely that frequent copy number variations impact ACE2 expression. Concerning SNVs, the screening of the 34 previously selected SNVs was performed on 168 Italian samples and revealed the presence of five variants: rs35803318 (C/T), rs41303171 (T/C), rs774469453 (A/-), rs773676270 (T/C), and rs2285666 (C/T). These variants presented some differences in frequency distribution in the Italian cohort with respect to worldwide populations (Table 1 and Figure 2).
As expected, the Italian cohort showed overlapping frequency distributions with the European population, whereas significant differences were observed with respect to African, American, and Asian populations (Table 2). The statistical significance was also confirmed after correction for multiple testing (q-value) in these cohorts (Table 2).

In particular, rs35803318 (C/T) is a synonymous variant whose allelic frequency was significantly different from the frequencies observed in African and Asian populations, whereas it overlapped with the frequencies recorded in the European and American groups (Tables 1 and 2). In fact, rs35803318 appeared to be more frequent in Italian, European, and American populations, compared to the very low frequency observed in the African and Asian cohorts (Figure 2). Concerning rs41303171, the Italian population showed overlapping frequencies with American, European, and Asian populations but not with the African cohort, in which the variant allele was almost absent (Figure 2). The frequency of rs774469453 in the Italian population overlapped with all investigated populations except for the American group, which reported a slightly higher frequency (Figure 2). The rs773676270 variant did not display significant differences in the frequency distribution of wild-type and variant alleles between the Italian and the other worldwide populations (Table 2). The variant allele was extremely rare among all populations, with a slightly higher frequency observed in the Italian cohort (Table 1, Figure 2). However, none of these variants (rs41303171, rs774469453, and rs773676270) showed significantly different distributions among populations after correction for multiple testing (Table 2). By contrast, rs2285666 was the only common variant, whereas the other variants discussed above were very rare among the investigated populations (Figure 2). The frequency of rs2285666 in the Italian population was significantly different with respect to the African, American, Asian, and even European populations (Table 2). After correction for multiple testing, the significance was maintained for the American and Asian populations (Table 2).
In particular, the variant allele of rs2285666 showed the lowest frequency in the Italian cohort compared to the other populations (Table 1, Figure 2). The frequencies of the five SNVs were similarly distributed among male and female patients in each population, as well as between different populations, indicating that there are no gender effects underlying the frequency distribution of ACE2 variants. Altogether, the analysis of the frequency distribution of ACE2 coding variants in the Italian population with respect to worldwide populations showed a low rate of coding variants in the ACE2 gene in the Italian cohort, suggesting that the susceptibility to SARS-Cov-2 infection may depend on other genetic variants outside ACE2, or in other genes. On this subject, it would be interesting to investigate the possible contribution of non-coding variants located in the regulatory regions of ACE2 (such as promoters and enhancers) to the risk of SARS-Cov-2 infection. In addition, further research should be performed to assess possible population-specific effects that may explain the variable susceptibility to SARS-Cov-2 infection in the Italian as well as in the worldwide populations. Consistent with our findings, a similar study investigated the genetic variability of ACE2 in the Chinese population, finding a different frequency distribution of ACE2 variants with respect to other populations [28]. Moreover, that study found a higher allelic frequency of eQTL variants associated with higher ACE2 expression in tissues, suggesting a different susceptibility or response to SARS-Cov-2 infection with respect to other populations under similar conditions [28]. However, these data are not sufficient to provide direct evidence of a relationship between ACE2 genetic variants and a differential susceptibility to SARS-Cov-2 infection among populations.
In fact, the data obtained on Italian and Chinese populations should be replicated in larger cohorts and in case/control studies, in order to assess their potential association with susceptibility to SARS-Cov-2 infection. Moreover, further functional studies should be carried out to demonstrate and explain any such associations.
Concerning the functional analysis of the five SNVs identified in the Italian cohort, most results were inconclusive or not significant for predicting the functional impact of such variants on the resulting protein. The rs35803318 variant is synonymous, so it does not result in an amino acid change and is thus unlikely to affect protein function. Moreover, it is located in a region coding for the transmembrane portion of the protein, which normally does not interact with the SARS-Cov-2 S protein. The rs41303171 variant is a missense variant, resulting in the amino acid change from asparagine (Asn), with a neutral side chain, to aspartate (Asp), which carries a negatively charged side chain and is therefore more hydrophilic. However, interrogation of VarSite showed that the change from Asn to Asp is not a large one, indicating that it may or may not result in a change to the protein's function. The prediction analysis of the impact of rs41303171 on protein function was therefore inconclusive, so it is not possible to predict the functional impact of this variant at the moment. However, it is interesting to note that rs41303171 is localized in the region coding for the extracellular portion of ACE2, which normally interacts with the SARS-Cov-2 S protein. It will therefore be interesting to evaluate the impact of this variant by functional experiments in the future. The rs774469453 variant is a single nucleotide deletion located in an intronic splice region. Therefore, it was subjected to HSF analysis in order to test for a potential alteration of splicing. The HSF interrogation showed that the variant allele of rs774469453 may create an exonic splicing silencer (ESS) site, but this prediction was not significant, and therefore the variant probably does not affect splicing. The rs773676270 variant is a synonymous variant localized in a region encoding the extracellular portion of the protein, which interacts with the S protein of SARS-Cov-2.
Interestingly, the interrogation of HSF reported that this variant may affect splicing by activating an exonic cryptic acceptor site or altering an exonic splicing enhancer (ESE) site. These findings suggest that rs773676270 should be further investigated, together with other variants that have a larger effect on ACE2 function. The rs2285666 variant is located in the splice site region of ACE2. However, the prediction analysis by HSF did not reveal significant splicing alterations. Subsequently, all previously discussed variants were evaluated as potential eQTLs in the GTEx database, and only rs2285666 was classified as a significant eQTL in several brain tissues, namely the amygdala, anterior cingulate cortex, basal ganglia, cortex, cerebellum, hippocampus, and hypothalamus (Figure 3).
Figure 3. Violin plots representing the relationship between rs2285666 genotypes and ACE2 mRNA expression in different brain tissues. The x-axis reports the genotypes, with the corresponding counts in brackets; the y-axis reports the normalized expression of ACE2. A p-value for the variant significance is also reported. The figure was obtained from Genotype-Tissue Expression (GTEx) [26].
As shown in Figure 3, the homozygous genotype for the variant allele may increase the expression of ACE2 in multiple brain tissues, and consequently, may affect ACE2 functions in the brain. This finding suggests that the genetic variability of ACE2 may have a greater impact on COVID-19-related symptoms and SARS-Cov-2 tissue tropism, rather than on the susceptibility to SARS-Cov-2 infection. On this subject, evidence of genetic variants in TMPRSS and a large presence of eQTLs in the lung may suggest that the genetic variability of TMPRSS might have a role in determining the different susceptibility to SARS-Cov-2 infection among populations. However, these are preliminary observations, which have to be confirmed by further investigations.
Considering the broad effect of rs2285666 in multiple brain tissues, we decided to look at the distribution across brain tissues of the eQTL variants located in and targeting ACE2, which may affect ACE2 expression and, in turn, contribute to the neurological symptoms and complications observed in patients with COVID-19. Interestingly, literature studies have highlighted the crucial role of ACE2 in brain physiology and pathophysiology, including marked regulatory effects on blood pressure, cardiac hypertrophy, stress response, anxiety, cognition, and brain injury [2,29,30]. The eQTL analysis on the GTEx portal and Biomart identified 29 significant eQTLs, which have been predicted to affect ACE2 expression in the brain at different levels (Table S2). Most of them (23 eQTLs) have a significant effect on multiple brain tissues, suggesting that they may affect ACE2-related brain functions as a whole. The remaining six eQTLs showed a more tissue-specific effect, indicating that they may be involved in the alteration of brain functions regulated by restricted areas of the brain. Interestingly, the tissues most enriched in significant ACE2-associated eQTLs were the basal ganglia, cortex, hypothalamus, and substantia nigra, whereas the amygdala and cerebellum appeared to be less affected. These findings suggest that the alteration of ACE2 expression may be involved in the different neurological symptoms (seizures, stroke, encephalitis, dizziness, headache, confusion, alteration of body temperature, anosmia, and ataxia) observed in COVID-19 patients, depending on the affected brain area. Interestingly, none of the eQTL variants located in and targeting ACE2 were reported in lung tissue, which instead appeared particularly enriched in TMPRSS-associated eQTL variants.
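The tissue-level summary described above (number of significant eQTLs per brain tissue, and multi-tissue versus tissue-specific variants) can be sketched in Python as follows. This is only an illustration of the tallying logic, not the GTEx or Biomart query used in the study; the record layout `(variant_id, tissue, p_value)` and the example values are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_eqtls(records, alpha=0.05):
    """Tally significant eQTLs by tissue and by breadth of effect.

    records: iterable of (variant_id, tissue, p_value) tuples.
    Returns (per-tissue counts, multi-tissue variants, single-tissue variants).
    """
    tissues_by_variant = defaultdict(set)
    for variant_id, tissue, p_value in records:
        if p_value < alpha:  # threshold stated in the text (p < 0.05)
            tissues_by_variant[variant_id].add(tissue)

    # Number of distinct significant variants observed in each tissue
    per_tissue = Counter(
        tissue for tissues in tissues_by_variant.values() for tissue in tissues
    )
    multi = sorted(v for v, t in tissues_by_variant.items() if len(t) > 1)
    single = sorted(v for v, t in tissues_by_variant.items() if len(t) == 1)
    return per_tissue, multi, single


# Made-up records for illustration only (not data from Table S2)
records = [
    ("v1", "cortex", 0.01),
    ("v1", "hypothalamus", 0.02),
    ("v2", "cortex", 0.001),
    ("v3", "amygdala", 0.20),  # not significant, so ignored
]

per_tissue, multi, single = summarize_eqtls(records)
print(per_tissue.most_common())
print("multi-tissue:", multi)
print("tissue-specific:", single)
```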
These findings raise the need for further investigation of the role of ACE2 genetic variability in the susceptibility and clinical outcome of patients with COVID-19, especially concerning neurological symptoms. Indeed, such studies will be useful for identifying patients at a higher risk of neurological complications, who may need different monitoring and treatment strategies in order to prevent fatal outcomes or severe, permanent brain injury.
Supplementary Materials: The following supplementary files are available online at http://www.mdpi.com/2073-4425/11/7/741/s1, Table S1: Selection of the variants within the coding and splice site regions of ACE2 with frequency data, annotated in 1000 Genomes. The reported variants were retrieved from Ensembl GRCh37 assembly. Chr: chromosome; bp: base pairs; MAF: Minor Allele Frequency; AA: amino acid; Table S2: List of the 29 significant eQTL variants distributed in relation to brain tissues.