Characterization of PCLO Gene in Amazonian Native American Populations

Genetic variations in PCLO have been associated with different pathologies in the global literature, but there are no data on this gene in Native American populations. Amazonian Native American populations have lower genetic diversity and are more genetically distinct from other continental groups. We investigated 18 genetic variants in the PCLO gene in Amazonian indigenous individuals and compared our results with those reported for global populations in the publicly available 1000 Genomes Project, gnomAD and ABraOM databases. The results showed that PCLO variants, especially rs17156844, rs550369696, rs61741659 and rs2877, have a significantly higher frequency in Amerindian populations than in other continental populations. These data outline the distinctive genetic profile of the Native American population of the Brazilian Amazon region.


Introduction
The PCLO gene on chromosome 7q21.11 (GRCh38) encodes a presynaptic cytomatrix protein called piccolo (piccolo presynaptic cytomatrix protein), which is involved in establishing synaptic active zones and in synaptic vesicle trafficking [1][2][3]. PCLO is studied in neuroscience because it plays a role in modulating monoaminergic transmission, e.g., transmission of serotonin (5-HT), adrenaline, noradrenaline and dopamine [4,5]. Piccolo is highly expressed in the brain and adrenal glands, as well as in adipose tissue, gallbladder, pancreas, stomach, thyroid and esophagus [6][7][8]. Unstable expression of PCLO and genetic variation caused by mutations in the gene have been associated with different pathologies in the global literature, such as major depressive disorder [9,10], bipolar disorder [11], cancer [5] and diabetes [12,13]. Few studies have investigated the genetic profile of PCLO in general populations, and there are no data on this gene in Amerindian populations. Information on this gene in clinical databases, such as ClinVar from the National Center for Biotechnology Information (NCBI; accessed on 13 January 2022, at: https://www.ncbi.nlm.nih.gov/clinvar/) and the Brazilian Initiative on Precision Medicine (https://bipmed.org/, accessed on 13 January 2022), is also scarce.
In the latest census, in 2010, the Brazilian government estimated the Native Amerindian population at approximately 897,000 individuals belonging to 305 ethnic groups. This population has been reported in all Brazilian states, but it is highly concentrated in the north of the country, within the Amazon region [14]. Great efforts have been made to understand the social and cultural patterns of this group; however, little research has been done on its genetic profile. In addition, genetic studies have indicated that highly admixed populations, particularly those with high Amerindian ancestry, are more susceptible to certain diseases, such as cancer and tuberculosis [15][16][17][18]. Investigating genetic variants such as single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (INDELs) enables the discovery and characterization of new molecular markers in this population and allows findings from previous studies in other world populations to be validated. Variation in the PCLO gene can then be compared within and between populations. Additionally, such molecular markers may have clinical applications, serving as diagnostic and therapeutic tools for the Amerindian population and for admixed populations with strong Amerindian ancestry, such as the Brazilian population [17].
Therefore, this study aimed to characterize the PCLO molecular profile in Amerindian populations from the Brazilian Amazon, and to compare these findings with data from the general Brazilian population described in ABraOM, as well as with the five continental populations available in the 1000 Genomes Project and Genome Aggregation Database (gnomAD).

Study Populations
The study was approved by the National Research Ethics Committee (CONEP; available at: http://conselho.saude.gov.br/comissoes-cns/conep/; accessed on 13 January 2022) and by the Research Ethics Committee of the Tropical Medicine Center of the Federal University of Pará (CAE: 20654313.6.0000.5172). All individuals and community leaders signed an informed consent form (TCLE).
The Amerindian population data were compared with representatives of five continental populations obtained from the 1000 Genomes Project, a public catalog of human variation and genotype data (available at: http://www.1000genomes.org; accessed on 13 January 2022). This sample included 661 individuals from Africa (AFR), 503 from Europe (EUR), 347 from the Americas (AMR), 504 from East Asia (EAS) and 489 from South Asia (SAS).
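The comparisons in this study are framed as allele frequencies, but an exact test operates on allele counts. The following is a minimal sketch of how a reported 1000 Genomes frequency can be turned back into counts using the sample sizes listed above, assuming a diploid autosomal locus (PCLO is on chromosome 7) and no missing genotypes, so that each individual contributes two alleles. The helper name and the example frequency are hypothetical and are not taken from the study.

```python
# Sample sizes of the 1000 Genomes super-populations used in this study.
SAMPLE_SIZES = {"AFR": 661, "EUR": 503, "AMR": 347, "EAS": 504, "SAS": 489}


def allele_counts(freq: float, n_individuals: int) -> tuple[int, int]:
    """Return (alternate, reference) allele counts for a biallelic variant.

    Hypothetical helper. Assumes a diploid autosomal locus with no missing
    genotypes, so the total number of alleles is 2 * n_individuals.
    Published frequencies are rounded, so the alternate count is rounded
    to the nearest integer (Python's round() sends exact .5 ties to even).
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    total_alleles = 2 * n_individuals
    alt = round(freq * total_alleles)
    return alt, total_alleles - alt


# Hypothetical alternate-allele frequency of 0.10 in the AMR sample.
print(allele_counts(0.10, SAMPLE_SIZES["AMR"]))  # (69, 625)
```

Rounding introduces a small error when a frequency is published with few decimal places; where a database reports allele counts directly (gnomAD provides allele count and allele number fields), those counts are preferable.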
To work with larger samples, we also added to our analysis the population data available in the Genome Aggregation Database (gnomAD; available at: https://gnomad.broadinstitute.org/) for the same five continental populations (AFR, EUR, AMR, EAS and SAS). Finally, the Native American population of the Amazon was also compared with the Brazilian population, whose data were obtained from the ABraOM database (ABraOM, São Paulo, São Paulo; available at: https://abraom.ib.usp.br/), a repository of genomic variants gathered by whole-exome sequencing of individuals from São Paulo, the largest city in Brazil, located in the Southeast Region. All databases were accessed on 13 January 2022 to extract genomic data.

Extraction of the DNA and Preparation of the Exome Library
DNA was extracted from a peripheral blood sample using the phenol-chloroform method described by Sambrook et al. [19]. The Nanodrop-8000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA) was used to quantify the genetic material and its integrity was evaluated by 2% agarose gel electrophoresis.
After alignment to the reference, the generated file was sorted and indexed (SAMtools v.

Statistical Analyses
All statistical analyses, including the Discriminant Analysis of Principal Components (DAPC), were performed in R v.3.5.1 (R Foundation for Statistical Computing, Vienna, Austria) using RStudio. Differences in allele frequencies between populations were tested with Fisher's exact test, and the false discovery rate (FDR) procedure proposed by [20] was used to correct for multiple comparisons. Results were considered statistically significant when the p-value was less than or equal to 0.05 (p ≤ 0.05).
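A minimal, dependency-free sketch of the two testing steps described above, for one variant compared against several populations: a two-sided Fisher's exact test on a 2×2 table of allele counts, followed by FDR adjustment. It assumes that [20] refers to the Benjamini-Hochberg step-up procedure (the method behind R's `p.adjust(method = "BH")`). This is an illustration, not the authors' R code, and the function names and allele counts below are hypothetical.

```python
from math import comb


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of top-left count `a` given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: the two populations (e.g. NAT and a reference population).
    Columns: alternate and reference allele counts.
    The p-value sums the probabilities of every table with the same margins
    that is no more likely than the observed table; the tiny relative
    tolerance mirrors the one R's fisher.test uses for floating-point ties.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = _table_prob(a, row1, col1, n)
    p_value = 0.0
    for k in range(lo, hi + 1):
        p_k = _table_prob(k, row1, col1, n)
        if p_k <= p_obs * (1 + 1e-7):
            p_value += p_k
    return min(1.0, p_value)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone (the step-up part of the procedure).
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order of p-values
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical allele counts (alt, ref) for one variant: NAT vs. three populations.
nat = (30, 70)
others = {"AMR": (60, 634), "EUR": (40, 966), "AFR": (5, 1317)}
raw = {pop: fisher_exact_two_sided(*nat, *counts) for pop, counts in others.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
print(adjusted)
```

The reference counts in the example are chosen so that each row sums to twice the corresponding 1000 Genomes sample size (694, 1006 and 1322 alleles); in practice, which p-values are pooled into one FDR family (per variant, per population, or across the whole study) is a design choice that changes the adjusted values.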

Results
A total of 18 variants were identified in the Amazonian NAT population: 16 SNVs (single nucleotide variants) and two INDELs (insertion-deletion variants). Based on the location and type of each genetic change, 12 PCLO variants were classified as moderate impact, two as modifier impact and four as low impact. Table 1 summarizes the characteristics of the markers investigated in NAT and in the populations described in the 1000 Genomes Project, gnomAD and ABraOM databases. Table 2 shows the detailed allele frequencies of the investigated PCLO markers. Frequency data for each variant in the five continental populations were taken from the 1000 Genomes Project database, and the frequency of each marker in the Brazilian population was taken from ABraOM, so that both could be compared with the frequencies in NAT populations. Six of the eighteen variants were not found in NAT populations. The rs550369696 variant was found only in the NAT population. The rs762371134 variant has no frequency data in the 1000 Genomes Project, and its frequency data are available only from gnomAD; its 1000 Genomes frequencies are therefore shown as "ND" (no data). Table 3 shows the detailed frequencies of the investigated PCLO markers according to the gnomAD browser, together with the frequencies of the same markers in the Brazilian population described in ABraOM. For the American (AMR) population of the 1000 Genomes Project, three of the investigated variants differed significantly from the NAT population (rs17156844, rs550369696 and rs2877) (Table 4). Notably, SNV rs550369696 is differentially distributed in the NAT population compared with all of the continental populations investigated.
Likewise, for SNV rs17156844, five of the six comparison populations, including ABraOM, differed significantly from the NAT population. For rs2877, the number of differing populations falls to four: three continental populations (AMR, AFR and EUR) plus ABraOM. In the NAT-AFR comparison, the African population was the one in which PCLO markers differed the most, with 10 of the 18 variants showing notably higher frequencies in Native American populations (rs17156844, rs2715150, rs550369696, rs2877, rs976714, rs2522833, rs10261848, rs28680905, rs17148149 and rs12668093). The gnomAD results (Table 5) showed some differences from the 1000 Genomes Project data presented above. Only the AMR (p = 0.0021) and EUR (p < 0.0001) populations differed significantly from the NAT population for SNV rs17156844. For rs550369696, the Native American population did not show higher frequencies than any of the continental populations or ABraOM. The NAT-AFR comparison, however, remained significant, with five of the eighteen investigated variants (rs2715150, rs2522833, rs10261848, rs28680905 and rs12668093) being more frequent in NAT than in the African population, all of which gave the same result in the 1000 Genomes NAT-AFR analysis. In the comparison with the ABraOM population, four markers (rs2715150, rs61741659, rs762371134 and rs2522833) showed higher frequencies in the Amerindian population; with the exception of rs762371134, all of these gave the same results in the analysis above. Comparing the two databases gives a more accurate view of the genomic data of the investigated population, as the databases use different sample sizes for each continental population. The differences and similarities between the databases are therefore within the expected range.
The Discriminant Analysis of Principal Components (DAPC) scatterplot is presented in Figure 1. Based on the genetic structure of the PCLO gene, the analysis divided the studied populations into well-defined clusters, using data for the five continental populations described in the 1000 Genomes Project and gnomAD databases. It showed the greatest distance between the population of interest (NAT) and the African population, and greater proximity between the NAT population and the EAS and SAS populations.
Figure 1. Discriminant Analysis of Principal Components (DAPC) scatterplot of the studied populations based on PCLO variants.

Discussion
PCLO encodes Piccolo, a protein that builds the presynaptic cytoskeletal matrix, which is believed to be involved in modulating neurotransmitters' release [21,22]. Studies demonstrated that loss of Piccolo and Bassoon (another component of the presynaptic zone) leads to aberrant degradation of several presynaptic proteins, culminating in the degeneration of synapses [6]. Genetic variations in the PCLO gene in humans can lead not only to synaptic dysfunction, but also to loss of brain and cerebellar volume, suggesting severe neuronal loss [23]. Studies on this gene demonstrate that the imbalance in its expression can modulate the development of different diseases [5,13,24].
To date, few genetic screening investigations have been carried out in Amerindian populations, especially those of the Brazilian Amazon; thus, this group is epidemiologically and genetically understudied [14]. Latin American populations are a complex study group because of their history of admixture, which gives them high levels of genetic diversity [25]. The study by Ribeiro-dos-Santos [26] sequenced the genome of an individual from a South American tribe and identified more than 60,000 new genetic variants, including variants specific to South American native populations. These results demonstrated the need for a deeper understanding of the genetic variability of South American Amerindian populations [26]. Other recent investigations have shown that Brazilian Amerindians have a unique genetic profile [17,27,28,29]. Studies have increasingly shown that next-generation sequencing methodologies are important for identifying new and rare variants. It is also known that differences in polymorphism frequencies between ethnic groups can have important consequences, such as associations with complex diseases, which justifies studying the molecular epidemiology of a population [30]. This is the first study to investigate the PCLO gene in Amazonian Amerindians, a population not described in the largest available databases of human genetic variability, the 1000 Genomes Project and gnomAD. Brazilian efforts to identify genetic variants, such as ABraOM, show the importance of investigating not only the Brazilian population in general, but also the various indigenous ethnic groups spread across the country. Our results showed that, although many investigated variants have similar frequency distributions in the continental populations and in the population described in ABraOM, at least 10 variants differed significantly between Native American and African populations in the 1000 Genomes analysis.
In the NAT-AFR comparison using gnomAD, variants rs2715150, rs2522833, rs10261848, rs28680905 and rs12668093 showed the same result as in the 1000 Genomes analysis, corroborating the genetic differentiation between Native American and other continental populations. Likewise, the DAPC analysis of PCLO showed that the NAT and AFR populations are the most genetically distinct, consistent with what is known about the history of human populations, in which Amerindian and African groups represent the extremes of the evolutionary process [31].
In both analyses, we also found that variants rs17156844 (1000 Genomes p = 0.0060; gnomAD p = 0.0021) and rs2877 (1000 Genomes p = 0.0013; gnomAD p = 0.0232) are distributed differently in the indigenous population, with higher frequencies in Amazonian Native Americans than in the American (AMR) population. These data agree with the findings of Wang and collaborators [32], who analyzed the genetic diversity and population structure of Native Americans from Central, North and South America and partitioned the individuals' genotypes using a six-cluster model, which corresponded mainly to the isolated Ache and Suruí populations of South America. The authors concluded that South American indigenous populations are even more genetically isolated from other American populations [32]. In addition, when comparing regions within the Americas, the highest FST value was observed in eastern South America, indicating an overall lower level of Native American genetic variability in that region [32].
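FST measures the share of genetic variation that lies between subpopulations rather than within them. As a simplified illustration only (not the estimator used in [32]), Nei's G_ST, an FST analogue, for one biallelic locus is F_ST = (H_T − H_S) / H_T, where H_S is the mean expected heterozygosity 2p(1 − p) within subpopulations and H_T is the expected heterozygosity at the pooled allele frequency. The sketch below assumes equal weights for subpopulations and ignores sample-size corrections; the function names are hypothetical.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def nei_gst(freqs: list[float]) -> float:
    """Nei's G_ST (an F_ST analogue) for one biallelic locus.

    `freqs` holds the frequency of the same allele in each subpopulation.
    Subpopulations are weighted equally and sample-size corrections are
    ignored, so this is a descriptive sketch, not an unbiased estimator.
    """
    if not freqs:
        raise ValueError("need at least one subpopulation")
    # Mean within-subpopulation heterozygosity.
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    # Heterozygosity expected if all subpopulations were pooled.
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic everywhere: nothing to partition
    return (h_t - h_s) / h_t


print(nei_gst([0.0, 1.0]))  # 1.0 (fixed difference between two groups)
print(nei_gst([0.2, 0.4]))  # small value: modest differentiation
```

Identical frequencies give 0 and fixed differences give 1; values tend to rise as isolated groups drift apart and lose within-group diversity.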
All populations included in the study had lower allele frequencies than the NAT population for at least one PCLO marker in both analyses. The NAT-ABraOM comparison indicates that variants rs17156844 (p < 0.0001), rs2715150 (1000 Genomes p = 0.0500; gnomAD p = 0.05006), rs61741659 (1000 Genomes p = 0.0250; gnomAD p = 0.02507), rs2877 (p < 0.0001) and rs2522833 (1000 Genomes p = 0.0500; gnomAD p = 0.0500) differed significantly between these populations. Our results are consistent with the observations of Santos et al. [33] and Amador et al. [34], who investigated allelic and genotypic proportions in the Brazilian population for different purposes; both showed that the proportions of ancestral contributions in Brazilian individuals vary across the country's regions. The Southeast Region of Brazil (where the ABraOM samples come from) has a lower ancestral contribution from Native American populations, and its population is therefore expected to differ more from NAT individuals than populations from, for example, the North Region [33,34]. Thus, this result highlights that, although the Brazilian population carries substantial Native American ancestry throughout its territory, its genetic profile may vary within the country, so it is important to characterize its genome regionally to uncover these particularities.
Finally, knowledge of the different patterns of genetic diversity in human populations is important in many areas of health and genetics, as it can be used, for example, to investigate and validate new insights in the study of complex diseases, improving the understanding of predisposition, diagnosis, prognosis and therapy for indigenous populations and for highly admixed populations such as the Brazilian one. We therefore aim to contribute to the creation of public policies that help improve the quality of life of indigenous populations. Furthermore, the knowledge produced also provides a basis for inferences about human evolutionary history [35,36].

Conclusions
The results showed that PCLO variants, especially rs17156844, rs550369696, rs61741659 and rs2877, have a significantly higher frequency in Native American populations than in other continental populations and in the population of Southeast Brazil. The data generated in the present study contribute to the understanding of the genetic influence of PCLO in a poorly studied group and provide an important basis for future association studies aimed at identifying individuals more susceptible to pathologies due to PCLO genetic alterations in Native American populations, as well as in other populations with a high degree of admixture with these groups. This is the first study to characterize PCLO in exome data from NAT populations of the Brazilian Amazon region; we hope that these data may help in establishing future public policies for this population.