Molecular Epidemiology in Amerindians of the Brazilian Amazon Reveals New Genetic Variants in DNA Repair Genes

Native American populations from the Brazilian Amazon have low genetic diversity and a genetic profile distinct from that of populations from other continents. Despite this, few studies have been conducted in these groups, and their genetic data are not described in any of the existing international databases. Characterizing the genomic profile of a population not only informs population genetics but also helps advance diagnostic and therapeutic-response studies, optimizing clinical applicability. Genetic variants in DNA repair genes have been associated with the modulation of susceptibility to various pathologies, as well as with their prognosis and therapy. This is the first study to investigate DNA repair genes in Amerindians from the Brazilian Amazon region. We investigated 13 important DNA repair genes in the exomes of 63 Native Americans and compared our results with those of 5 continental populations whose data are available in the Genome Aggregation Database. Our results showed that 57 variants already described in the literature were differentially distributed in the Amerindian populations relative to the continental populations, 7 of which have significant clinical relevance. In addition, 9 new variants were described, suggesting that they are unique to these populations. Our study reinforces the understanding that the Amazonian Native American population presents a unique genetic profile, and our findings may contribute to the creation of public policies that improve the quality of life of these groups and of the Brazilian population, which has a high degree of admixture with Amerindian groups.


Introduction
Genome integrity depends not only on faithful replication but also, essentially, on the proper repair of DNA throughout its synthesis and processing. It is estimated that approximately 10,000 DNA bases are chemically modified every day in each cell [1,2]. These modifications can be spontaneous or induced by chemicals or radiation, and they can give rise to mutations such as nucleotide substitutions, amplifications, deletions, rearrangements, or chromosomal loss [3,4]. To prevent such mutations, cells have evolved different DNA repair mechanisms that respond to specific types of damage [5][6][7]. At least four DNA repair mechanisms have been well described in humans: base excision repair (BER), nucleotide excision repair (NER), homologous recombination repair (HRR), and mismatch repair (MMR) [8][9][10][11].
Given the importance of their functions, several studies have associated DNA repair pathway genes with susceptibility to diseases such as xeroderma pigmentosum [12,13], Cockayne syndrome and trichothiodystrophy [13], and cancer [14][15][16][17]. In cancer, polymorphisms in these genes have been linked not only to tumor development but also to mechanisms of resistance to pharmacological treatment [16,18,19,20,21].
Despite the amount of information available in the literature on imbalances in the expression of DNA repair genes, no studies have reported the frequency of mutations in these genes in Amerindian populations, especially in the Amazon. However, precision medicine studies have demonstrated the importance of screening not only genetically homogeneous populations, such as Europeans, but also Amerindian peoples worldwide. Populations with a high proportion of Amerindian genomic ancestry have been shown to have an increased predisposition to acute lymphoblastic leukemia, stomach cancer, and tuberculosis [22][23][24][25].
Thus, genomic studies of DNA repair genes in Amerindian populations may provide information on their molecular profile and may lead to the discovery of new variants. Findings from molecular epidemiology studies may also prompt investigation of the clinical implications of the molecular markers described, allowing them to be used as diagnostic and treatment tools for Amerindian populations and for admixed populations with marked Amerindian ancestry, such as the Brazilian population [26]. Finally, data of this type underpin inferences about human evolutionary history [27]. The aim of this study is to characterize the molecular profile of genes in DNA damage repair pathways in Amerindian populations from the Brazilian Amazon and to compare these findings with data from continental populations described in the Genome Aggregation Database (gnomAD).

Study Population and Ethics
The study was approved by the National Research Ethics Committee (CONEP; available at: http://conselho.saude.gov.br/comissoes-cns/conep/) and by the Research Ethics Committee of the Tropical Medicine Center of the Federal University of Pará (CAE: 20654313.6.0000.5172). All individuals and community leaders signed an informed consent form.
The Amerindian population data were compared with data from five continental populations obtained from the Genome Aggregation Database (available at: https://gnomad.broadinstitute.org/; accessed on 15 February 2022), a public catalog of human variation and genotype data. This sample included 8128 individuals from Africa (AFR), 56,885 from Europe (EUR, non-Finnish), 17,296 from the Americas (AMR), 9197 from East Asia (EAS), and 15,308 from South Asia (SAS).

Extraction of the DNA and Preparation of the Exome Library
DNA was extracted from peripheral blood samples using the phenol-chloroform method described by Sambrook et al. [28]. DNA was quantified with a NanoDrop 8000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA), and its integrity was assessed by 2% agarose gel electrophoresis.
After alignment, the generated file was indexed and sorted (SAMtools v. …).
All statistical analyses, including the discriminant analysis of principal components (DAPC), were performed in RStudio v.3.5.1 (R Foundation for Statistical Computing, Vienna, Austria). Differences in allele frequencies between populations were assessed with Fisher's exact test, and the false discovery rate (FDR) procedure proposed by Benjamini and Hochberg [29] was used to correct for multiple comparisons. Results were considered statistically significant at p ≤ 0.05.
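Fisher's exact test and the Benjamini-Hochberg correction are standard procedures, and the sketch below shows one way to apply them to the allele-frequency comparisons. It assumes (this is not stated in the text) that each comparison is a 2x2 table of alternative and reference allele counts, with 2 × 63 = 126 alleles for NAT and the gnomAD allele count (AC) and allele number (AN) for each continental population. The function names and the example counts are hypothetical. In R, which the authors used, `fisher.test()` and `p.adjust(method = "BH")` provide equivalent computations; this version uses only the Python standard library.

```python
from math import exp, lgamma
from typing import List, Sequence


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are populations (e.g. NAT and one gnomAD population); columns are
    alternative and reference allele counts. The p-value sums the
    hypergeometric probabilities of every table with the same margins that
    is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = log_comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # P(x alternative alleles in row 1 | fixed row and column totals)
        return exp(log_comb(row1, x) + log_comb(row2, col1 - x) - log_denom)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance treats floating-point near-ties as equally extreme.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: NAT versus EUR (2 x 56,885 alleles) and AFR (2 x 8128).
    p_values = [fisher_exact_two_sided(40, 86, 20000, 93770),
                fisher_exact_two_sided(10, 116, 1500, 14756)]
    print(benjamini_hochberg(p_values))
```

Working in log space keeps the hypergeometric terms finite even for gnomAD-sized allele numbers, where the raw binomial coefficients have tens of thousands of digits.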

Selection of the Genetic Variations
We analyzed 13 genes belonging to 4 different DNA repair pathways in humans: base excision repair (BER), nucleotide excision repair (NER), homologous recombination repair (HRR), and mismatch repair (MMR). A total of 432 variants were found, to which the following selection criteria were applied: (I) high coverage, with a minimum of 10 reads (fastx_tools v.0.13; http://hannonlab.cshl.edu/fastx_toolkit/; accessed on 25 March 2022); (II) a predicted impact of "modifier", "moderate", or "high" according to SnpEff (https://pcingola.github.io/SnpEff/; accessed on 25 March 2022); and (III) a significant difference in allele frequency between the Amerindian (NAT) and continental populations (p ≤ 0.05).
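The three selection criteria can be expressed as a single filter. This is a minimal sketch under stated assumptions: the `VariantRecord` structure and its field names are hypothetical, criterion (III) is read as "significant against at least one continental population" (the text does not say whether one or all comparisons must be significant), and the sketch does not decide whether raw or FDR-adjusted p-values are supplied. SnpEff's four putative impact classes are HIGH, MODERATE, LOW, and MODIFIER, so criterion (II) amounts to excluding LOW.

```python
from dataclasses import dataclass
from typing import List, Sequence

# SnpEff putative impact classes are HIGH, MODERATE, LOW and MODIFIER;
# criterion (II) keeps every class except LOW.
KEPT_IMPACTS = {"HIGH", "MODERATE", "MODIFIER"}


@dataclass
class VariantRecord:
    """Hypothetical per-variant summary; field names are illustrative only."""
    variant_id: str
    read_depth: int             # number of reads covering the site
    impact: str                 # SnpEff putative impact class
    p_values: Sequence[float]   # NAT versus each continental population


def passes_selection(v: VariantRecord, min_depth: int = 10, alpha: float = 0.05) -> bool:
    """Apply criteria (I) to (III); (III) uses an 'any comparison' reading (assumed)."""
    covered = v.read_depth >= min_depth                 # criterion (I)
    impactful = v.impact.upper() in KEPT_IMPACTS        # criterion (II)
    differs = any(p <= alpha for p in v.p_values)       # criterion (III)
    return covered and impactful and differs


def select_variants(records: Sequence[VariantRecord]) -> List[VariantRecord]:
    """Keep only the records that satisfy all three criteria, in input order."""
    return [v for v in records if passes_selection(v)]
```

Changing `any` to `all` in criterion (III) would instead require a significant difference against every continental population.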

Results
After applying our selection criteria to the whole-exome data of the 63 Amerindians investigated, 55 genetic variants were analyzed, distributed across 13 genes of the human DNA repair pathways: DNA2, NEIL1, NEIL2, NEIL3, TOP3A, XPC, XRCC1, XRCC3, ERCC1, ERCC2/XPD, ERCC5, MSH3, and MSH4. Table 1 presents the genetic variants and their characteristics: chromosomal location, genomic position, wild-type and mutant alleles, detailed genomic region, gene, and SNP ID. It also lists the allele frequencies in the continental populations investigated and the corresponding allele frequencies calculated for the Amerindian population (NAT).
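The NAT allele frequencies in Table 1 are computed from the 63 sequenced individuals. A minimal sketch of that calculation, assuming diploid VCF-style genotype calls (the text does not describe the file format): the frequency is the number of alternative alleles divided by the number of called alleles, which is 2 × 63 = 126 when every individual is genotyped. The function name is hypothetical, and missing calls are simply excluded from the denominator.

```python
from typing import Iterable


def alt_allele_frequency(genotypes: Iterable[str]) -> float:
    """Alternative-allele frequency from VCF-style diploid genotype strings.

    Accepts unphased ("0/1") or phased ("0|1") calls. Fully or partially
    missing calls (containing ".") are skipped, so the denominator counts
    only called alleles. Any allele other than "0" counts as alternative,
    which lumps all ALT alleles together at multiallelic sites.
    """
    alt = called = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue
        called += len(alleles)
        alt += sum(1 for allele in alleles if allele != "0")
    if called == 0:
        raise ValueError("no called genotypes at this site")
    return alt / called
```

For example, if 62 of the 63 individuals were homozygous reference and one was heterozygous, the NAT frequency would be 1/126.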
Thirteen of the investigated variants were differentially distributed in the NAT population when compared with each of the five continental populations investigated: rs10823209 (DNA2); rs7689099 (NEIL3); rs2294913 (TOP3A); rs3212038, rs1799796, rs861531, and rs861537 (XRCC3); rs6151734 and rs1105524 (MSH3); rs3765682 (MSH4); and rs907187, rs2293464, and rs1805404 (PARP1). Overall, for the other SNVs presented, the Amazonian Indigenous population differed from two or three of the continental populations investigated.
Table 1. Allele frequencies of the variants investigated in the Amerindian individuals (NAT) and in the continental populations (African (AFR), American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS)) described in the gnomAD database.
Figure 1 shows a discriminant analysis of principal components (DAPC) scatterplot of the six populations analyzed. DAPC allows the visualization of well-defined clusters according to the similarity or difference of the frequency distributions of each investigated SNV. Figure 1 demonstrates a marked distance between the Amerindian population (NAT) and the African population.
Table 2 shows the pairwise comparisons of allele frequencies between Amerindians and each of the five continental populations of the gnomAD database for the repair-gene variants investigated, seven of which have a clinical impact on the development of some pathology or on therapeutic efficacy according to the ClinVar database (National Center for Biotechnology Information, NCBI), and three of which have a high impact according to SnpEff. For rs1799796 (XRCC3) and rs184967 (MSH3), the allele frequencies in the NAT population differed significantly from those in the American and European populations; these SNVs are associated, respectively, with susceptibility to breast cancer and with hereditary cancer predisposition syndrome.
Supplementary Table S1 presents the same analysis as Table 2 for the 48 markers that are statistically significant but neither clinically significant according to the ClinVar database (NCBI) nor classified as high-impact. Finally, Table 3 describes the nine novel genetic variants found in the investigated Amerindian populations, including their chromosomal/genomic position, wild-type and mutant alleles, predicted impact, variant type, resulting amino acid change, detailed region, and the repair pathway to which each gene belongs. These variants have not previously been described in the literature and are located in the DNA2, PARP2, TOP3A, ERCC2, ERCC5, and MSH3 genes; one has a predicted high impact, one has a modifier impact, and the others have a moderate impact. For all missense variants, we also used two additional impact prediction tools, SIFT and PolyPhen; Table 4 presents these predictions for each possible gene transcript in the investigated genomic region.

Discussion
The study of genetic variation in repair genes has aided the understanding of various pathological mechanisms, describing risks associated with both individual and population genotypes following the exposure of cells to xenobiotics that lead to genomic instability [10,30,31,32,33,34,35,36,37].
The historical process of human colonization of the globe is known to have played an important role in shaping current patterns of genetic diversity, and it partially explains geographic variation in susceptibility to certain complex diseases [38,39]. Predisposition to certain illnesses may therefore be intrinsically related to genomic ancestry [40][41][42]. Recent investigations have shown that Brazilian Amerindian populations have a unique genetic profile, which nonetheless remains undescribed in major population databases [23,43,44].
Therefore, we analyzed the complete exonic regions of 13 DNA repair genes, never previously investigated in Amerindian populations, in individuals representative of tribes from the Brazilian Amazon. Our allele frequency description and DAPC analysis showed that the African, South Asian, and Amazonian Native American populations were positioned at opposite extremes, being the most genetically distinct with respect to the investigated variants, which corroborates findings on the history of human populations [45]. Our results also showed that at least seven of the investigated variants have a high clinical impact, both in disease predisposition and in modulating therapeutic response. The markers rs1799796 (XRCC3) and rs184967 (MSH3) showed significantly different distributions in the NAT population compared with the AMR and EUR populations.
The XRCC3 gene encodes a protein involved in the HRR pathway and in the repair of strand breaks caused by X-rays [11,46]. Consistent with this function, mutations in this gene are related to the development of various neoplasms, such as osteosarcoma [47], bladder cancer [48], and thyroid cancer [9], among others [11]. The rs1799796 variant investigated here is primarily associated with susceptibility to breast cancer, an association reinforced by Niu and colleagues (2021), who performed a meta-analysis of 13 major previously published studies [49][50][51].
The MSH3 gene is part of the MMR system, whose function is to repair DNA after interstrand crosslinking and to recognize and correct base-base mismatches and insertion/deletion loops generated during DNA replication and homologous recombination [11,52]. Combinations of genetic variants in this gene can disrupt cellular responses to DNA damage, directly influencing an individual's sensitivity to carcinogens; thus, mutations in MSH3 can modulate processes ranging from carcinogenesis and metastatic progression to therapeutic response [17,53]. The rs184967 variant (c.2846A>G; Gln940Arg) has been associated with an increased risk of proximal colon cancer (p = 0.005) and with worse progression-free survival [54]. This polymorphism is also associated with familial breast cancer [55,56] and with colon and rectal cancer [54].
Finally, we found nine new variants, three of which are INDELs; one of these, which causes a reading frame shift, has a predicted high impact. This mutation is located in the TOP3A gene, which is responsible for homologous recombination-mediated repair of double-stranded DNA during DNA synthesis and is an essential component of the mitochondrial DNA replication process [57]. The remaining novel variants are distributed among four important DNA repair pathways in humans (Table 3). These data reinforce the importance of studying Amerindian populations to determine the clinical impact of these mutations, since they occur in genes critical for maintaining genomic integrity.
The advantages of studying populations with low genetic diversity, such as the one investigated here, for screening complex diseases include a high degree of linkage disequilibrium, reduced haplotype complexity, and greater potential for identifying rare variants [58,59]. In 2012, Kuhn and collaborators compared the genomic profile of the Xavante tribe with that of other populations, including Brazilian populations, and demonstrated that this Indigenous population had remained genetically isolated, potentially providing a unique opportunity for hereditary disease-mapping studies [22].
This is the first study to investigate DNA repair genes in Amerindian populations from the Brazilian Amazon, which are genetically unique and not yet described in any of the available databases on human genetic variability. Knowledge of the patterns of human genetic diversity is important in many areas of medical genetics and can help improve the understanding of susceptibility, diagnosis, prognosis, and therapeutic management for Native American populations, as well as for populations with high Amerindian ancestry, such as the Brazilian population.
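The frameshift noted for the novel TOP3A variant follows from a simple length rule: within a coding exon, an insertion or deletion whose length is not a multiple of three shifts the downstream reading frame, and SnpEff assigns frameshift variants a HIGH putative impact. A minimal sketch of that rule, assuming VCF-style REF/ALT alleles; the helper is hypothetical and ignores the transcript context (exon boundaries, splice sites, stop codons) that annotators such as SnpEff take into account.

```python
def classify_small_variant(ref: str, alt: str) -> str:
    """Classify a VCF-style REF/ALT pair by its length change (hypothetical helper).

    Assumes the site lies inside a protein-coding exon; real annotators also
    consider splice sites, exon boundaries and stop codons.
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        # Same length: single- or multi-nucleotide substitution.
        return "SNV" if len(ref) == 1 else "MNV"
    kind = "insertion" if delta > 0 else "deletion"
    # A length change that is not a multiple of 3 shifts the reading frame.
    effect = "frameshift" if delta % 3 else "in-frame"
    return f"{effect} {kind}"
```

For example, `classify_small_variant("AT", "A")` returns "frameshift deletion", whereas a deletion of six bases is classified as "in-frame deletion".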

Conclusions
Our study reinforces the understanding that the Amazonian Native American population presents a unique genetic profile. The characterization of repair genes in these populations is an important tool for future studies of their association with complex diseases, both in these populations and in populations with a high degree of admixture with these groups. Furthermore, our data may contribute to the creation of public policies that improve the quality of life of the Amerindian populations investigated here.