Characterization of DNA Polymerase Genes in Amazonian Amerindian Populations

Due to their continued geographic isolation, the Amerindian populations of the Brazilian Amazon have a genetic profile distinct from those of other continental populations. Few studies have investigated the genetic variants present in these populations, especially using next-generation sequencing. Knowledge of a population's molecular profile is one of the bases for inferences about human evolutionary history. It can also help validate molecular biomarkers of susceptibility to complex and rare diseases and improve precision medicine protocols for these populations and for populations with high Amerindian ancestry, such as Brazilians. DNA polymerases play essential roles in DNA replication, repair, and recombination, and their influence on various clinical phenotypes has been demonstrated in the specialized literature. Thus, the aim of this study was to characterize the molecular profile of the POLA1, POLE, POLG, POLQ, and REV3L genes in Amerindian populations from the Brazilian Amazon, comparing these findings with genomic data from five continental populations described in the gnomAD database and with data from the Brazilian population described in ABraOM. We performed whole exome sequencing (WES) of 63 Indigenous individuals. Our study describes, for the first time in the investigated Amerindian populations, the allele frequencies of 45 variants previously reported in other continental populations. Our results also describe eight variants unique to the investigated Amerindian populations, with predicted moderate, modifier, and high clinical impact.
Our findings demonstrate the unique genetic profile of the Indigenous population of the Brazilian Amazon, reinforce the need for further studies of these populations, and may contribute to the creation of public policies that improve the quality of life not only of this population but also of the Brazilian population as a whole.


Introduction
The human genome encodes information needed to protect its own integrity [1,2]. DNA replication is carried out accurately and efficiently largely because of enzymes called DNA polymerases, which not only synthesize a daughter strand from a DNA template but can also "check their work" immediately after synthesis, correcting base mismatches in a process called proofreading [2,3]. Despite this, eukaryotic DNA polymerases are estimated to make approximately one error for every 10^4–10^5 nucleotides during each S phase, and DNA is further altered by environmental stressors such as ultraviolet sunlight, lifestyle habits such as smoking, and dietary factors [4,5]. Additionally, a large proportion of DNA changes are inevitably caused by endogenous mutagens, including reactive oxygen species and the

Study Population and Ethics
The study was approved by the National Research Ethics Committee (CONEP; available at: http://conselho.saude.gov.br/comissoes-cns/conep/ accessed on 5 October 2022) and by the Research Ethics Committee of the Tropical Medicine Center of the Federal University of Pará (CAE: 20654313.6.0000.5172). Because we are dealing with Amerindian groups, the leaders of the investigated communities were contacted, and they signed an informed consent form for participation in the study.
For comparison, we also used data from five continental populations available in the Genome Aggregation Database (gnomAD; available at: https://gnomad.broadinstitute.org/; accessed on 15 February 2022): 8,128 individuals from Africa (AFR), 56,885 from Europe (EUR, non-Finnish), 17,296 from the Americas (AMR), 9,197 from East Asia (EAS), and 15,308 from South Asia (SAS).
Additionally, we included data from the Brazilian Online Mutation Archive (ABraOM; available at: https://abraom.ib.usp.br; accessed on 5 October 2022), which represents the population of southeastern Brazil, specifically individuals residing in the state of São Paulo.

Extraction of the DNA
Peripheral blood was collected from each participant, and DNA was extracted using the phenol-chloroform technique described by Green and Sambrook [27]. The extracted DNA was quantified with a NanoDrop 8000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA).
Sequencing reads were aligned to the GRCh38 reference genome, which is commonly used as the reference in next-generation sequencing studies.

Statistical Analysis
Allele frequencies in the NAT population were obtained by direct allele counting. Differences in allele frequency between the NAT population and each of the other investigated populations (ABraOM, AFR, EUR, AMR, EAS, and SAS) were assessed with Fisher's exact test. All statistical analyses were performed in R v.3.5.1 (R Foundation for Statistical Computing, Vienna, Austria), run in RStudio. The false discovery rate (FDR) was used to correct for multiple comparisons.
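The counting, testing, and correction steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' R code: genotypes are assumed to be VCF-style strings such as "0/1", each comparison is reduced to a 2×2 table of variant and reference allele counts, the two-sided p-value follows the usual Fisher's exact test rule (sum the probabilities of all tables with the same margins that are no more probable than the observed one), and the FDR correction is assumed to be the Benjamini-Hochberg procedure, since the text does not name the FDR method. All function names are hypothetical.

```python
from math import exp, lgamma, log


def allele_counts(genotypes):
    """Count variant (alt) and reference alleles from VCF-style genotype strings.

    Assumption: any allele other than "0" is counted as variant (multi-allelic
    sites are collapsed); missing calls such as "./." are skipped.
    """
    alt = ref = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # skip missing genotype calls
        for allele in alleles:
            if allele == "0":
                ref += 1
            else:
                alt += 1
    return alt, ref


def _log_comb(n, k):
    # log of the binomial coefficient C(n, k), via log-gamma to avoid huge integers
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: NAT, comparison population. Columns: variant alleles, reference alleles.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = _log_comb(n, row1)

    def log_prob(x):
        # hypergeometric log-probability that the top-left cell equals x
        return _log_comb(col1, x) + _log_comb(n - col1, row1 - x) - denom

    log_obs = log_prob(a)
    tolerance = log(1 + 1e-7)  # relative tolerance for ties, as in R's fisher.test
    p = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        lp = log_prob(x)
        if lp <= log_obs + tolerance:
            p += exp(lp)
    return min(p, 1.0)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a NAT-versus-EUR comparison, `a` and `b` would be the NAT variant and reference allele counts (out of 126 alleles for 63 diploid individuals) and `c` and `d` the corresponding gnomAD counts; the p-values from all comparisons would then be passed together to `benjamini_hochberg`. Working in log space with `math.lgamma` avoids building very large binomial coefficients when a reference population contributes tens of thousands of alleles.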

Selection of Genetic Variants
We selected five genes that are components of the human DNA replication and repair pathways. The following selection criteria were applied to the exome sequencing results before the pairwise analysis: (I) the variant site had to be covered by at least 10 reads; (II) the predicted impact had to be "modifier", "moderate", or "high" according to SnpEff (https://pcingola.github.io/SnpEff; accessed on 25 June 2021); and (III) the difference in allele frequency between the NAT population and the continental populations had to be significant (p-value ≤ 0.05). After the selection criteria were applied, 55 genetic variants were statistically analyzed.
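The three selection criteria can be expressed as a simple filter. The sketch below assumes that each candidate variant already carries its read depth, its SnpEff impact class, and its NAT-versus-population p-values, and it interprets the significance criterion as "significant in at least one comparison", which the text does not state explicitly. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# SnpEff impact classes retained by the selection; SnpEff's fourth class, LOW, is excluded.
RETAINED_IMPACTS = {"HIGH", "MODERATE", "MODIFIER"}


@dataclass
class CandidateVariant:
    variant_id: str
    gene: str
    read_depth: int   # number of reads covering the site
    impact: str       # SnpEff putative impact, e.g. "MODERATE"
    p_values: dict = field(default_factory=dict)  # population -> NAT-vs-population p-value


def passes_selection(v, min_depth=10, alpha=0.05):
    """Apply the coverage, impact, and significance criteria (assumed interpretation)."""
    enough_coverage = v.read_depth >= min_depth
    retained_impact = v.impact.upper() in RETAINED_IMPACTS
    # Assumption: one significant comparison is enough to keep the variant.
    differs_somewhere = any(p <= alpha for p in v.p_values.values())
    return enough_coverage and retained_impact and differs_somewhere


def select_variants(variants, **kwargs):
    """Keep only the variants that satisfy all criteria, preserving input order."""
    return [v for v in variants if passes_selection(v, **kwargs)]
```

Keeping the thresholds as keyword arguments (`min_depth`, `alpha`) makes it easy to check how sensitive the number of retained variants is to the chosen cut-offs.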

Results
Our results identified 432 variants in total, of which only 45 remained after the selection criteria were applied. Table 1 shows these 45 variants with their allele frequencies in the Amerindian, continental, and Brazilian populations, along with the chromosomal position, the mutant allele (variant), the wild-type allele (reference), and the gene. Quantitatively, the Indigenous population (NAT) was most similar to the Latin American population (AMR) for most of the markers, whereas the African population (AFR) had the allele frequencies most different from those of the NAT population. Compared with the southeastern Brazilian population represented by the ABraOM data, 14 single nucleotide variants (SNVs) had significantly different frequencies in the Amazonian Amerindian population (rs4883537, rs4883538, rs4883543, rs5744750, rs4883544, rs4883613, and rs4883555 of the POLE gene; rs2072267l, rs2307433, and rs2307438 of the POLG gene; rs3218636, rs3218651, and rs61757738 of the POLQ gene; and rs11376056 of the REV3L gene).
We emphasize that the frequencies of some variants identified in this work in the Indigenous population have never been calculated or described for other populations. Table 2 therefore lists new and possibly unique variants of the investigated Amerindian populations: for each of the eight new variants, it gives the gene, the chromosomal position (Chrom) with the detailed genomic region, the mutant allele (Variant), the wild-type allele (Reference), the predicted impact, the resulting change in the protein, and the type of variant (Var type). Among the eight new variants, one is an insertion/deletion polymorphism (INDEL) with high impact, three are SNVs with modifier impact, and four are missense SNVs with moderate impact.
In the pairwise comparison of allele frequencies between NAT and each of the other populations (the five continental populations and the southeastern Brazilian population described in ABraOM), a total of 14 variants met our selection criteria. Table 3 shows these comparisons, with statistically significant results highlighted in bold. Three variants of the POLE gene differed significantly: two (rs4883544 and rs4883613) had allele frequencies in the NAT population different from those of the European, South Asian, and southeastern Brazilian populations, and one (rs5744761) differed only from the EUR and ABraOM populations. For the REV3L gene, the frequency of rs462779 in the Indigenous population differed significantly from those of the AMR, SAS, and ABraOM populations. Notably, the INDEL rs3087377 of the POLG gene stood out among all the variants analyzed: its frequency in the Amerindian population differed from those of four of the five continental populations investigated in the gnomAD database (AMR, EAS, EUR, and SAS) and from that of the Brazilian population of ABraOM (p-Value AMR = 0.0365; p-Value EAS = 0.0366; p-Value EUR = 0.0004; p-Value SAS = 0.0006; p-Value ABraOM = 0.058), making it the marker that most clearly demonstrated the genetic differentiation of the studied population from the others.
Regarding the distribution of these markers among populations, the population that differed most from the NAT population was the European population, with a total of seven markers, all highly significant (rs4883537, rs4883538, rs4883543, rs4883544, rs5744761, and rs4883613 in the POLE gene; rs3087377 in the POLG gene). In contrast, the population with the fewest significantly different markers relative to the Amazonian Indigenous population was the African population, with only rs11573344 in POLA1 (p-Value = 0.0134). All variants described here have modifier or moderate predicted impact according to SnpEff, which may indicate a relevant phenotypic effect in individuals carrying them. Supplementary Table S1 shows the same analysis as Table 3 for the markers that were not statistically significant. Figure 1 illustrates only the SNVs that were statistically significant in Table 3, with a mosaic plot for each variant showing the percentage distribution of the wild-type allele ("Reference" or "Ref") and the mutant allele ("Variant" or "Var") in each investigated population.
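The population rankings above amount to counting, for each comparison population, how many markers reach significance, and the mosaic plots display each population's reference and variant allele percentages. A minimal sketch of both steps follows, assuming p-values are supplied as a nested dictionary; whether raw or FDR-adjusted values are passed is left to the caller, since the text does not specify which values were thresholded in Table 3. Function names and the example inputs in the note below are hypothetical.

```python
from collections import Counter


def count_significant_by_population(p_values, alpha=0.05):
    """Tally significant NAT-vs-population comparisons per population.

    p_values maps variant -> {population: p-value}; the result maps
    population -> number of variants with p <= alpha.
    """
    tally = Counter()
    for per_population in p_values.values():
        for population, p in per_population.items():
            if p <= alpha:
                tally[population] += 1
    return tally


def allele_percentages(alt_count, ref_count):
    """Percentage of reference ("Ref") and variant ("Var") alleles, as drawn in a mosaic plot."""
    total = alt_count + ref_count
    return {"Ref": 100 * ref_count / total, "Var": 100 * alt_count / total}
```

With hypothetical inputs such as `{"v1": {"EUR": 0.001, "AFR": 0.2}, "v2": {"EUR": 0.03, "AFR": 0.01}}`, `count_significant_by_population(...).most_common(1)` identifies EUR as the most differentiated population, which is the same kind of count used to rank populations in the paragraph above.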

Discussion
In humans, replication is coordinated with DNA repair mechanisms, which remove DNA damage or allow cells to tolerate it [28]. DNA polymerases play an important role in both processes and also participate in cell cycle checkpoint signaling, which pauses cell division until damaged DNA is repaired and replication is completed [29,30]. Previous studies have linked genetic variations in DNA polymerase genes to biological dysfunctions that can trigger oncogenesis and promote tumor progression in various tissues [7,31–34]. The study by Tomasetti et al. demonstrated that, for some tumor types, 77% of mutations in cancer driver genes can be attributed to errors during DNA replication; such mutations can inhibit the expression of genes important for successful replication or inhibit cell death, a well-known hallmark of cancer [7,35].
Although the replication and repair pathway genes are known to be essential for normal biological function, no studies have screened these genes in Brazilian Amerindian populations. In fact, there is a gap in knowledge of the genetic and epidemiological profile of these populations, which is noteworthy given that there are approximately 355,000 Amerindians living in 383 demarcated lands in the country [21]. Studies have shown that the settlement of the Indigenous populations triggered a population reduction, which was important in determining the number of gene lineages and founder haplotypes observed in these populations [36]. Therefore, the genetic structure of contemporary Indigenous populations is relevant to the distribution of complex diseases and rare Mendelian diseases, because most communities constitute relatively small, semi-independent gene pools [36]. Some deleterious mutations are also known to be shared among Amerindian populations, either because of a common founding history or because of more recent genetic exchange [36–38].
In addition, the lack of genomic knowledge about Brazilian Indigenous populations limits what is known about the Brazilian population itself, which has a high degree of interethnic admixture with these Indigenous groups [39–41]. Recent studies have shown that a high proportion of Amerindian genomic ancestry is associated with modulation of disease predisposition and therapeutic response in certain pathologies [25,42–44].
Thus, we characterized the molecular profile of the POLA1, POLE, POLG, POLQ, and REV3L genes, which are involved in DNA replication and repair pathways, in Amerindian populations from the Brazilian Amazon, comparing these findings with genomic data from five continental populations described in the gnomAD database and with data from the Brazilian population described in the ABraOM database.
Our data demonstrated that the investigated Amerindian populations carry variants whose allele frequencies diverge not only from those of the five continental populations but also from those of the southeastern Brazilian population. This is consistent with the low genetic variability reported for these groups and indicates that data generated for other world populations may not be extrapolated or applicable to them [26,44,45].
A study by our research group on north Brazilian Amerindian populations and DNA repair genes revealed nine new variants unique to this population, some of which had predicted high or modifier clinical impact [26]. That study also showed that previously described variants had high allele frequencies in the Indigenous population, seven of which were associated with modulation of disease predisposition or therapeutic response. This demonstrates that knowledge of different patterns of human genetic diversity is important in many areas of medical genetics and can be used to improve the understanding of susceptibility, diagnosis, prognosis, and therapeutic management for Indigenous populations [26].
In the present study, we also found new variants never described in other populations and possibly unique to NAT populations. Among them, three had clinical impact predicted as modifier by SnpEff (in the POLA1, POLE, and REV3L genes), and one INDEL had high predicted clinical impact. This INDEL lies in the POLQ (polymerase θ) gene, which is involved in translesion DNA synthesis (TLS), an important pathway that contributes to cell survival by bypassing DNA lesions that have not been repaired by other processes [12,34,46,47]. TLS is performed by seven polymerases that are recruited to stalled replication forks, allowing damaged cells to complete genome replication and thereby modulating the consequences of unrepaired DNA damage. These TLS polymerases include POLQ and REV3L, both investigated here [12,34].
Our results also showed that the distribution of the modifier variants rs4883544 and rs4883613 of the POLE gene in the NAT population differed significantly from those in the EUR, SAS, and ABraOM populations. POLE (DNA polymerase ε) is an essential enzyme for replication and also has 3′–5′ exonuclease activity, which corrects errors in DNA synthesis and helps maintain genomic stability [48]. Genome sequencing of cancer patients has shown that 3% of colorectal cancers and 7% of endometrial cancers carry mutations in the exonuclease domain of POLE, and these tumors have very high burdens of single-nucleotide mutations [10,11,49].
Finally, rs3087377 of the POLG gene was the variant that stood out most among our results, as its distribution in the NAT population differed from that in four of the five continental populations investigated (p-Value AMR = 0.0365, p-Value EAS = 0.0366, p-Value EUR = 0.0004, p-Value SAS = 0.0006) and from that in the Brazilian population described in ABraOM (p-Value = 0.058). POLG (polymerase γ) is the polymerase responsible for both replication and repair of mitochondrial DNA [50]. Different mutations in the POLG gene result in a wide variety of clinical phenotypes, such as seizures, neurodegenerative disorders, and liver dysfunction [51–53].
This is the first study to characterize the molecular profile of the POLA1, POLE, POLG, POLQ, and REV3L genes in Amerindian populations from the Brazilian Amazon, whose genetic component has not yet been described in any database of human genetic variability. Molecular epidemiology studies such as this one are the basis for inferences about human evolutionary history, so the lack of such data leaves a gap in the understanding of several processes investigated by population genetics [26,54,55]. Understanding the genetic variability of Amerindians can aid the investigation of clinically important molecular markers and can have a substantial scientific and public health impact for these peoples and for populations admixed with them, such as the Brazilian population in general. We therefore also aim to contribute to the creation of public policies that improve the quality of life of the Amerindian populations of the Brazilian Amazon.
Despite the importance of our data, further studies are needed to broaden the understanding of the genetic profile of these populations. We also suggest case-control studies to validate the findings in Amerindian populations and to uncover new insights into the clinical impact of the genetic variations described here.

Conclusions
Ethnic groups of the Brazilian Amazon show a distinctive distribution of important molecular markers in DNA replication and repair pathways. Our findings may contribute to future association studies of complex diseases in these populations and in the Brazilian population itself. The investigated Amerindian populations have a unique genetic profile, with genetic variants never previously described in other populations. We hope these data can help establish public policies for this group and for populations admixed with it, such as the Brazilian population.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14010053/s1, Table S1: Description of non-significant results of the comparison between the allele frequencies of Amerindian populations (NAT), the Brazilian population described in ABraOM, and continental populations (African (AFR), American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS)) described in the gnomAD database.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study, as described in the Materials and Methods section of the manuscript.

Data Availability Statement:
The authors confirm that the data supporting the findings of this study are available within the article and its Supplementary Material. Raw data of the studied genes are available from the corresponding author, upon reasonable request.