Molecular Profile of Variants Potentially Associated with Severe Forms of COVID-19 in Amazonian Indigenous Populations

Viruses 2024, 16, 359

Coronavirus disease 2019 (COVID-19) is an infection caused by SARS-CoV-2. Genome-wide association studies (GWASs) have suggested a strong association between genetic factors and the severity of the disease. However, most of these studies have been conducted in European populations, and little is known about the genetic variability of indigenous peoples with respect to SARS-CoV-2 infection. The objective of the study is to investigate genetic variants in the genes AQP3, ARHGAP27, ELF5, IFNAR2, LIMD1, OAS1 and UPK1A, selected for their association with the severity of COVID-19, in a sample of indigenous people from the Brazilian Amazon, in order to describe both new and previously studied variants. We performed whole-exome sequencing of 64 healthy indigenous people from the Brazilian Amazon and compared the allele frequencies of this population with data from other continental populations. A total of 66 variants were identified in the seven genes studied, including a high-impact variant in the ARHGAP27 gene (rs201721078) and three new variants found in the Amazonian indigenous population (INDG), located in the AQP3, IFNAR2 and LIMD1 genes, with low, moderate and modifier impact, respectively.


Introduction
In December 2019, the first cases of COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), were diagnosed in Wuhan, China [1,2]. Within a short time, the infection spread to several regions and reached many countries [3,4]. According to data from the World Health Organization (WHO), as of 7 June 2023, 767,750,853 cases of SARS-CoV-2 infection and 6,941,095 COVID-19 deaths had been confirmed worldwide [5]. Although the pandemic situation has improved with the administration of vaccine doses, the disease still represents a global public health problem [6].
There is great clinical diversity among patients with COVID-19. The disease ranges from asymptomatic infection to mild and severe forms and can lead to death [3,4]. Clinical studies relate this heterogeneity to genetic influences on the individual response to infection. The association of genetic factors with the severity and clinical course of COVID-19 has been little investigated. However, genome-wide association studies (GWASs) have been conducted to better understand the relationship between genes and the severity of this disease [7][8][9], which may enable the development of more specialized therapies for the risk group [7].
The association of genetic factors with the severity of COVID-19 has been addressed in different studies [7][8][9], and GWASs have demonstrated a strong association of the 3p21.31 locus with the severity of SARS-CoV-2 infection [7,8]. A previous study by our research group [10] investigated the relationship of the genes SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, XCR1 and ABO with the most severe forms of the disease.
In populations of European origin, seven new genes (AQP3, ARHGAP27, ELF5, IFNAR2, LIMD1, OAS1 and UPK1A) have been related to the severity of COVID-19. That study also addressed the difference in disease severity between the sexes and the association of androgenic hormones with the severity of SARS-CoV-2 infection [9].
Therefore, the objective of our study is to investigate genetic variants in the genes AQP3, ARHGAP27, ELF5, IFNAR2, LIMD1, OAS1 and UPK1A, which were selected due to the association of these genes with the severity of COVID-19, in a sample of Amazonian indigenous peoples.

Consent and Ethics
This study was approved by the National Research Ethics Committee (CONEP) and the Research Ethics Committee of the Center for Tropical Medicine (opinion 20654313.6.0000.5172 and CAAE 33934420.0.0000.5634). The representatives of the groups participating in the study were informed about the stages of the study and signed the Free and Informed Consent Form (TCLE). Samples were collected in accordance with the Declaration of Helsinki.

Study Population
This study used blood samples collected from 64 healthy indigenous people from the Brazilian Amazon region, belonging to the following groups: Asurini do Tocantins, Asurini do Xingu, Araweté, Arara, Juruna, Awa-Guajá, Kayapó/Xikrin, Munduruku, Karipuna, Phurere, Wajãpi and Zo'é. The blood samples were collected before the pandemic, and the participants were healthy and did not have COVID-19.
Ancestry-informative markers (AIMs) were genotyped in three multiplex PCR reactions to confirm the ancestry of the participants and to estimate admixture among continental populations (European, African and Asian) [11][12][13]. The amplicons were analyzed by electrophoresis on an ABI Prism 3130 sequencer using GeneMapper ID v.3.2, and individual ancestry proportions were estimated with the STRUCTURE v.2.33 software. Allele frequencies in the INDG population were obtained by direct allele counting and compared with those of five continental populations (AFR, AMR, EAS, EUR and SAS) from the 1000 Genomes Project database (http://www.1000genomes.org; accessed on 30 March 2022).
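Allele counting of this kind can be illustrated with a short sketch. It assumes that each diploid genotype is coded as the number of alternative alleles carried (0, 1 or 2) and that missing calls are left out of both numerator and denominator; the function name `allele_frequency` and this coding are illustrative assumptions, not the authors' pipeline.

```python
from typing import Optional, Sequence


def allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Alternative-allele frequency by direct allele counting.

    Each entry is the number of alternative alleles in one diploid
    individual (0, 1 or 2); None marks a missing call and is excluded.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("genotypes must be 0, 1, 2 or None")
    alt_alleles = sum(called)          # copies of the alternative allele
    total_alleles = 2 * len(called)    # two alleles per diploid individual
    return alt_alleles / total_alleles


if __name__ == "__main__":
    # Hypothetical site: 6 homozygous and 16 heterozygous carriers among 64 people.
    genotypes = [2] * 6 + [1] * 16 + [0] * 42
    print(allele_frequency(genotypes))  # 28 / 128 = 0.21875
```

In this hypothetical example, 28 alternative alleles out of 128 give a frequency of about 21.9%.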

DNA Extraction and Preparation of the Exome Library
Blood was collected from each participant in 5 mL tubes, and DNA was extracted with the Roche Applied Science DNA extraction kit (Roche, Penzberg, Germany) according to the manufacturer's instructions. The samples were quantified on a NanoDrop 1000 to check the quality of the genetic material. The exome library was prepared with the Nextera Rapid Capture Exome kit (Illumina, San Diego, CA, USA) and the SureSelect Human All Exon V6 kit (Agilent, Santa Clara, CA, USA). Sequencing was performed on the NextSeq 500® (Illumina®, San Diego, CA, USA) using the NextSeq 500 High Output v2 kit (300 cycles) (Illumina®, San Diego, CA, USA).

Bioinformatics Analysis
The quality of the FASTQ reads was assessed with FastQC v.0.11 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and low-quality reads were filtered out with fastx_tools v.0.13. The high-quality reads were then mapped and aligned to the GRCh38 reference genome using BWA v.0.7. The variants were identified in GATK v. 3
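The read-filtering step can be sketched as follows. The fastx_toolkit quality filter keeps a read when at least a given percentage of its bases reach a minimum Phred quality; the sketch reimplements that rule in plain Python. The thresholds (Q20 in at least 80% of bases), the Phred+33 encoding and all helper names are hypothetical, since the parameters used in the study are not reported.

```python
import io
from typing import Iterator, List, TextIO, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file with four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def passes_quality(qual: str, min_q: int = 20, min_percent: float = 80.0,
                   offset: int = 33) -> bool:
    """True if at least min_percent of bases have Phred quality >= min_q."""
    if not qual:
        return False
    scores = [ord(c) - offset for c in qual]  # assumed Phred+33 encoding
    good = sum(1 for s in scores if s >= min_q)
    return 100.0 * good / len(scores) >= min_percent


def filter_reads(handle: TextIO, min_q: int = 20,
                 min_percent: float = 80.0) -> List[FastqRecord]:
    """Keep only the reads that satisfy the quality rule."""
    return [r for r in read_fastq(handle)
            if passes_quality(r[2], min_q, min_percent)]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    fastq = "@r1\nACGTA\n+\nIIIII\n@r2\nACGTA\n+\nII###\n"
    kept = filter_reads(io.StringIO(fastq))
    print([h for h, _, _ in kept])
```

Here the second read is discarded because only 40% of its bases reach Q20.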

Statistical Analyses
Two statistical approaches were used. First, Fisher's exact test was applied to compare allele frequencies between the INDG population and the other world populations; results were considered statistically significant when p ≤ 0.05. Second, Wright's fixation index (FST) was used to estimate population differentiation. The statistical analyses were performed in Arlequin v.3.518, and the graphics were produced in R Studio.
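The per-variant comparison can be illustrated with a 2×2 table of allele counts: alternative and reference alleles in INDG versus one continental population. The sketch below computes a two-sided Fisher's exact test from the hypergeometric distribution using only the standard library. The table layout and the function name are assumptions for illustration; in practice an established implementation such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: populations (e.g. INDG vs. one 1000 Genomes population).
    Columns: allele counts (alternative, reference).
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so floating-point ties with p_obs are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical allele counts: INDG 28 alt / 100 ref vs. another population 2 alt / 126 ref.
    print(fisher_exact_two_sided(28, 100, 2, 126))
```

With the threshold stated above, a variant would be flagged as significantly different for that population pair when the returned p-value is ≤ 0.05.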

Selection of Variants
Seven genes were analyzed in this study (AQP3, ARHGAP27, ELF5, IFNAR2, LIMD1, OAS1 and UPK1A), selected based on the study by Cruz et al. [9]. The variants were selected according to three main criteria: (a) a read depth of at least 10 (fastx_tools v.0.13, http://hannonlab.cshl.edu/fastx_toolkit/; accessed on 20 January 2022); (b) an allele frequency reported for the continental populations of the 1000 Genomes Project Consortium [15]; and (c) a modifier, moderate or high impact, as classified by SnpEff [16]. SnpEff is a tool that annotates variants and predicts their effects on genes.
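The three selection criteria can be expressed as a simple filter. In this sketch each variant carries its read depth, its SnpEff impact class (HIGH, MODERATE, LOW or MODIFIER) and any 1000 Genomes frequencies found for it; the `Variant` record, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# SnpEff impact classes retained by criterion (c); LOW is dropped.
KEPT_IMPACTS = {"HIGH", "MODERATE", "MODIFIER"}


@dataclass
class Variant:
    """Hypothetical minimal record for one annotated variant."""
    variant_id: str
    gene: str
    depth: int    # number of reads covering the site
    impact: str   # SnpEff impact: HIGH, MODERATE, LOW or MODIFIER
    kg_freqs: Dict[str, float] = field(default_factory=dict)  # 1000 Genomes population -> frequency


def select_variants(variants: List[Variant], min_depth: int = 10) -> List[Variant]:
    """Apply criteria (a) read depth, (b) 1000 Genomes frequency, (c) impact class."""
    selected = []
    for v in variants:
        if v.depth < min_depth:                   # (a) at least 10 reads
            continue
        if not v.kg_freqs:                        # (b) frequency reported in 1000 Genomes
            continue
        if v.impact.upper() not in KEPT_IMPACTS:  # (c) modifier, moderate or high impact
            continue
        selected.append(v)
    return selected


if __name__ == "__main__":
    demo = [
        Variant("var1", "GENE_A", 25, "HIGH", {"AFR": 0.01, "EUR": 0.0}),
        Variant("var2", "GENE_A", 6, "MODERATE", {"EAS": 0.3}),  # too few reads
        Variant("var3", "GENE_B", 40, "LOW", {"AMR": 0.2}),      # low impact
        Variant("var4", "GENE_B", 18, "MODIFIER", {}),           # no 1000 Genomes frequency
    ]
    print([v.variant_id for v in select_variants(demo)])  # ['var1']
```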

Results
In our study, we identified 66 variants distributed across the seven genes: 7 in AQP3, 14 in ARHGAP27, 6 in ELF5, 14 in IFNAR2, 11 in LIMD1, 9 in OAS1 and 5 in UPK1A. Seventeen variants were excluded due to low coverage (Supplementary Table S1). After applying these quality criteria and impact prediction, we identified 45 variants that were included in the study (Table 1) and three variants exclusive to the indigenous population (Table 2), and we compared them with the other world populations described in the 1000 Genomes database: African (AFR), American (AMR), East Asian (EAS), European (EUR) and South Asian (SAS). The 45 variants in Table 1 are characterized by chromosomal position, SNP ID, nucleotide change and SnpEff classification; low-impact variants were excluded, except for the new variants. A high-impact variant was identified in the ARHGAP27 gene (rs201721078) at position 45404053 with a frequency of 21.8% in the INDG population, whereas it is rare in the rest of the world.
We also identified three new variants in the Amazonian indigenous people (Table 2).The first was in the AQP3 gene at position 33442882 with a low impact and allele frequency of 8.3%.The second variant was identified in the IFNAR2 gene at position 33262799 with a moderate impact and a frequency of 8.3%, and the third was found in the LIMD1 gene at position 45676789 with a modifier impact and frequency of 6.6%.
In addition, 10 moderate-impact variants were found in five of the seven genes studied: three in ARHGAP27 (rs2959953, rs12949256 and rs117139057), three in IFNAR2 (rs1051393, rs2229207 and the new variant), one in LIMD1 (rs267237), two in OAS1 (rs1131454 and rs2660) and one in UPK1A (rs2267586).
Among the variants identified, 17 showed significant differences when compared with other world populations (AFR, AMR, SAS, EAS and EUR), including East Asians, who are genetically closest to indigenous people. Two of these variants were identified in the ARHGAP27 gene (rs201721078 and rs2959953), three in AQP3 (rs2228332, rs591810 and rs2231231), one in UPK1A (rs2285421), four in OAS1 (rs2660, rs7967461, rs11352835 and rs1051042) and seven in IFNAR2 (rs1051393, rs2229207, rs1131668, rs9984273, rs2834158, rs149186597, rs79402470, rs3216172 and rs397789038) (Table 3). The frequencies of the other variants did not differ significantly between the INDG population and the other continental populations (Supplementary Table S2). The genetic differences between the study populations (Figure 1) were analyzed with a multidimensional scaling (MDS) plot based on the fixation index (FST) of the genetic variants. The MDS plot showed the greatest similarity between the INDG and AMR populations, mainly due to the indigenous contribution to the AMR populations. Differences were also observed among the EUR, AFR and SAS populations, and the EAS population was the most differentiated from the other populations.
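The MDS analysis of population differences can be sketched as classical (Torgerson) MDS applied to a matrix of pairwise distances between populations. As an assumption for illustration, the distances here are pairwise FST values computed with a simple Nei-style estimator, (H_T - H_S) / H_T summed over variants, from per-population allele frequencies; this is not necessarily the estimator implemented in Arlequin, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np


def pairwise_fst(freqs: Dict[str, Sequence[float]]) -> Tuple[List[str], np.ndarray]:
    """Pairwise FST between populations from alternative-allele frequencies.

    freqs maps a population label to the frequencies of the same variants,
    in the same order. For each pair, FST = (sum H_T - sum H_S) / sum H_T,
    where H_T uses the mean frequency of the pair and H_S is the mean of the
    two within-population heterozygosities (no sample-size correction).
    """
    pops = list(freqs)
    k = len(pops)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            p1 = np.asarray(freqs[pops[i]], dtype=float)
            p2 = np.asarray(freqs[pops[j]], dtype=float)
            p_bar = (p1 + p2) / 2
            h_t = 2 * p_bar * (1 - p_bar)
            h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
            total = h_t.sum()
            fst = (total - h_s.sum()) / total if total > 0 else 0.0
            dist[i, j] = dist[j, i] = fst
    return pops, dist


def classical_mds(dist: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Classical MDS: double-centre the squared distances, keep the top eigenvectors."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # FST matrices need not be Euclidean, so negative eigenvalues are clipped to 0.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


if __name__ == "__main__":
    # Hypothetical frequencies of three variants in four populations.
    freqs = {"P1": [0.22, 0.10, 0.50], "P2": [0.01, 0.30, 0.45],
             "P3": [0.02, 0.35, 0.60], "P4": [0.05, 0.05, 0.20]}
    pops, d = pairwise_fst(freqs)
    for pop, (x, y) in zip(pops, classical_mds(d)):
        print(f"{pop}: {x:.3f} {y:.3f}")
```

Each row of the returned coordinates is one population, which would then be drawn as a point in the MDS plot.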

Discussion
GWASs of genetic variants associated with the severity of COVID-19 have been conducted in several world populations [7][8][9]. However, information on genetic variants related to COVID-19 in indigenous people is lacking. Therefore, studies that relate genetics to the individual response to the disease within these population groups are very important.
These population groups have been affected by diseases such as measles, influenza and tuberculosis [10,17]. Factors such as their unique genetic profile, as well as the presence of rare or little-known mutations in indigenous people, may contribute to the incidence of infectious diseases in these populations [17]. Geographical isolation and consanguineous marriages can also lead to allele frequencies in these populations that differ from those of other world populations [18,19].
A recent study by our research group investigated genetic variants in the genes SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, XCR1 and ABO, which are involved in the severity of COVID-19, in indigenous people [10]. The results of that study showed that the variants identified by Ellinghaus et al. [7] were not found in indigenous people. In addition, these data suggested low genetic variability in these populations, evidencing their unique genetic profile. Another study, conducted in 34 Spanish hospitals with 11,939 COVID-19-positive cases, identified the relationship of the AQP3, ARHGAP27, ELF5, IFNAR2, LIMD1, OAS1 and UPK1A genes with the severity of the disease. That study also suggested a difference in COVID-19 severity between the sexes: because males showed a greater propensity to develop critical SARS-CoV-2 infection, its results also indicated a relationship between androgenic hormones and disease severity [9]. These data suggest that the genes AQP3, ARHGAP27, ELF5, IFNAR2, LIMD1, OAS1 and UPK1A, which were related to sex differences in the development of severe COVID-19 among Spanish patients, may also be involved in androgenic hormonal pathways.
We also identified three new variants that may be related to the severity of COVID-19 among indigenous people of the Brazilian Amazon. The first was identified in the AQP3 gene. Previous studies have demonstrated the relationship of aquaporins (AQPs) with diseases such as cancer, metabolic syndrome and epilepsy [20][21][22]. Although information on AQPs in infectious diseases is limited, the cellular inability to maintain fluid movement in the human body can alter homeostasis [23]. Therefore, AQPs may be important for the control of homeostasis in infectious diseases such as COVID-19 [24]. In a recent GWAS, 49 variants were associated with the most severe forms of COVID-19, and the AQP3 gene showed a strong relationship with critical infection, which corroborates other studies [9,25].
The second new variant was identified in IFNAR2, and the variants rs1051393, rs1131668 and rs12482556 showed high allele frequencies in the indigenous population. Recent studies have reported the association of the IFNAR2 gene with the most severe forms of COVID-19, as well as the relationship of genetic variants with more critical cases of the disease [26][27][28]. In the OAS1 gene, five genetic variants (rs10774671, rs2660, rs7967461, rs11352835 and rs1051042) showed high allele frequencies in this population group. Recent studies point to the association of the OAS1 gene, and of rs10774671 in particular, with the severity of COVID-19 [29].
The third new variant was found in LIMD1, a gene reported to be involved in cellular processes and in the progression of diseases such as cancer [30][31][32]. However, its relationship with COVID-19 has not yet been elucidated, and future studies are needed to establish the association of this gene with the severity of the disease.
Finally, a high-impact variant not previously associated with COVID-19 was identified in the ARHGAP27 gene (rs201721078). Taken together, these results may provide important information for assessing the risk of developing more severe forms of COVID-19 in indigenous populations of the Amazon.

Conclusions
This study investigated genes potentially related to the severity of COVID-19 in Amazonian indigenous populations. The results highlight the need for further research to determine the impact of these new and high-impact variants in patients with SARS-CoV-2 infection in the indigenous populations of the Amazon, with the aim of elucidating the biological role of these variants in the severity of COVID-19 in indigenous people and contributing to the development of personalized medicine that respects the particularities of the studied population.

Figure 1. Differences in the allele frequencies of the studied variants between the indigenous population and the continental populations, shown in a multidimensional scaling (MDS) plot.

Table 1. Description of the variants with high, moderate and modifier impact in the AQP3, ARHGAP27, ELF5, IFNAR2, LIMD1, OAS1 and UPK1A genes.

Table 2. Description of new variants.
a Reference allele; b variant allele; * variants without a described SNP ID.

Table 3. Significant differences in allele frequencies between the indigenous population and the world populations (AFR, AMR, EUR, EAS and SAS).