Molecular Profile of Variants in CDH1, TP53, PSCA, PRKAA1, and TTN Genes Related to Gastric Cancer Susceptibility in Amazonian Indigenous Populations

Gastric Cancer is a disease associated with environmental and genetic factors and is one of the most prevalent cancers around the world, with a high incidence in Brazil. However, despite being a highly studied neoplastic type, few efforts are aimed at populations with a unique background and genetic profile, such as the indigenous peoples of the Brazilian Amazon. Our study characterized the molecular profile of five genes associated with the risk of developing gastric cancer by whole-exome sequencing of 64 indigenous individuals belonging to 12 different indigenous populations in the Amazon. The analysis of the five genes found a total of 207 variants, of which 15 are new in our indigenous population, and among these are two with predicted high impact, present in the TTN and CDH1 genes. In addition, at least 20 variants showed a significant difference in the indigenous population in comparison with other world populations, and three have already been associated with some type of cancer. Our study reaffirms the unique genetic profile of the indigenous population of the Brazilian Amazon and may contribute to the design of early diagnosis strategies for complex diseases such as cancer, improving the quality of life of individuals potentially suffering from the disease.


Introduction
Gastric Cancer is a multifactorial disease associated with genetic alterations and environmental factors. It is the fifth most common cancer and the fourth leading cause of cancer death worldwide, with a high incidence in East Asia and Latin America [1]. In the northern region of Brazil, Gastric Cancer is the second most frequent cancer in men and the fifth in women, according to data released by the National Institute of Cancer (2023) [2].
This scenario may reflect the ancestral genetic profile of this population. Studies demonstrate that genomic ancestry has a great influence on the clinical presentation and the incidence of Gastric Cancer in different populations according to their historical formation [3,4]. The Brazilian population is one of the most admixed in the world, with important contributions from three ancestral populations: Europeans, Africans and Indigenous people. This admixture directly affects the frequencies of genetic variants that may act on predisposition to diseases with clinical manifestations, such as cancer [5]. The admixed Brazilian population is known to carry approximately 30% indigenous genomic contribution; therefore, studies in this ancestral population may help reduce the burden of Gastric Cancer in the country [6,7].
Despite Gastric Cancer being a highly studied neoplasm, few studies have been conducted in genetically heterogeneous populations or in native peoples, and even fewer in the Brazilian indigenous population [8]. It is therefore worthwhile to analyze the genetic profile of this native population of the Brazilian Amazon, in order to investigate genetic variants already described in the world literature, as well as unknown or unscreened variants that may be associated with the development of Gastric Cancer.
Many studies demonstrate that single nucleotide variants (SNVs) in target genes can contribute to the carcinogenesis process [9,10]. For Gastric Cancer in particular, genes related to metabolic pathways, cell adhesion, proliferation and survival stand out for their strong participation in tumor development [11,12]. Thus, the objective of this study was to investigate the exome sequences of the TP53, CDH1, PSCA, PRKAA1 and TTN genes in the indigenous population of the Brazilian Amazon and to characterize the variants that may be associated with the risk of Gastric Cancer in this population.

Population Analysis for the Study
The study included 64 indigenous people from the Amazon in the northern region of Brazil, representing 12 different indigenous peoples: Asurini of Xingu, Asurini of Tocantins, Arara, Araweté, Awa-Guajá, Juruna, Kayapó/Xikrin, Karipuna, Munduruku, Phurere, Wajãpi and Zo'é. All participating individuals were grouped into a single group called Indigenous (INDG) for statistical analyses. Genetic ancestry data were obtained from a panel of 64 ancestry-informative markers (AIMs), as described by Ramos et al. All participants or their community leaders were informed about the research to be carried out and signed an Informed Consent Form (TCLE). The study was approved by the National Research Ethics Committee (CONEP) and by the Research Ethics Committee of the Tropical Medicine Center of the Federal University of Pará (CAE: 20654313.6.0000.5172).
The allele frequencies of the indigenous population were compared with those of other continental populations: Europe (EUR), Africa (AFR), East Asia (EAS), South Asia (SAS) and Americas (AMR), present in the 1000 Genomes Database, version 3 (available at: http://www.1000genomes.org; accessed on 15 May 2023). The comparison included 503 subjects from Europe, 661 from Africa, 504 from East Asia, 489 from South Asia and 347 from the Americas.

DNA Extraction and Exome Analysis
DNA extraction was performed according to the phenol-chloroform method [13] with modifications. The extraction product was quantified with a Nanodrop-8000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA), and the quality of the extracted material was assessed by 2% agarose gel electrophoresis.
The exome library was prepared using Nextera Rapid Capture Exome (Illumina®, San Diego, CA, USA) and SureSelect Human All Exon V6 (Agilent Technologies, Santa Clara, CA, USA), following the protocol provided by the manufacturer. Sequencing was performed on the NextSeq 500® platform (Illumina®, San Diego, CA, USA) using the NextSeq 500 High-output v2 Kit, 300 cycles (Illumina®, San Diego, CA, USA).

Selection of Genes
The selection of genes was carried out by consulting the PubMed database (pubmed.ncbi.nlm.nih.gov; accessed on 21 May 2023). The five selected genes (TP53, CDH1, PSCA, PRKAA1 and TTN) are among the most commonly cited in the literature related to susceptibility to Gastric Cancer.

Selection of Variants
Variants were selected using two criteria. First, each variant had to have a minimum read coverage of 10 in the subjects in which it was found. Second, the predicted impact of the variants was considered, keeping only those with high, moderate or modifier impact according to the SnpEff classification (https://pcingola.github.io/SnpEff/; accessed on 25 May 2023). The exome analysis identified 307 variants, as shown in Supplementary Table S1. After selection based on the criteria above, a total of 237 variants remained for further investigation.
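The two selection criteria can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names and the toy entries are hypothetical. The impact labels follow SnpEff's four classes (HIGH, MODERATE, LOW, MODIFIER).

```python
from dataclasses import dataclass

# Impact classes kept by the selection step (SnpEff's LOW class is dropped).
KEPT_IMPACTS = {"HIGH", "MODERATE", "MODIFIER"}
MIN_DEPTH = 10  # minimum read coverage required for a variant


@dataclass
class Variant:
    # Hypothetical record layout, for illustration only.
    gene: str
    position: int
    depth: int    # read coverage at the variant site
    impact: str   # SnpEff impact class


def passes_filters(v: Variant) -> bool:
    """Return True when a variant meets both selection criteria."""
    return v.depth >= MIN_DEPTH and v.impact in KEPT_IMPACTS


def select_variants(variants):
    """Keep only the variants that pass the coverage and impact filters."""
    return [v for v in variants if passes_filters(v)]


# Toy records, not study data.
toy = [
    Variant("TTN", 1001, 35, "HIGH"),
    Variant("TTN", 1002, 8, "MODERATE"),    # fails the coverage criterion
    Variant("CDH1", 2001, 22, "LOW"),       # fails the impact criterion
    Variant("TP53", 3001, 10, "MODIFIER"),  # coverage of exactly 10 passes
]
selected = select_variants(toy)
print([(v.gene, v.position) for v in selected])
```

In this sketch the coverage threshold is inclusive, which is one reading of "a minimum of 10".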

Statistical and Bioinformatics Analysis
The allele frequencies of the study population were obtained by gene counting and compared with those of the other continental populations already investigated (EUR, AMR, EAS, SAS and AFR). Fisher's exact test was used to evaluate the statistical significance of frequency differences between populations. Population differentiation at the polymorphisms was assessed using Wright's fixation index (FST). A p-value of ≤0.05 was considered significant. All analyses were performed in RStudio v.3.5.1.
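As a rough illustration of the comparison described above, the sketch below computes an allele frequency by direct counting and a two-sided Fisher's exact test on a 2x2 table of allele counts. The function names and the toy counts are assumptions made for illustration and are not taken from the study, whose analyses were run in RStudio. The two-sided p-value here sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def allele_frequency(n_hom_alt, n_het, n_hom_ref):
    """Alternate-allele frequency by direct counting of diploid genotypes."""
    n_alleles = 2 * (n_hom_alt + n_het + n_hom_ref)
    return (2 * n_hom_alt + n_het) / n_alleles


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Toy genotype counts (hom-alt, het, hom-ref) for 64 individuals.
freq = allele_frequency(5, 20, 39)
print(f"alternate-allele frequency: {freq:.4f}")

# Toy allele counts: rows are populations, columns are alt/ref alleles.
p_value = fisher_exact_two_sided(30, 98, 100, 906)
print(f"p = {p_value:.3g}, significant at 0.05: {p_value <= 0.05}")
```

Working on allele counts rather than individuals means each diploid subject contributes two observations to the table.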

Results
Of the 237 variants evaluated after the selection process, 207 belong to the TTN gene, one to the PRKAA1 gene, 13 to the PSCA gene, seven to the CDH1 gene and nine to the TP53 gene. Of these, three variants were predicted to have high impact, 126 modifier impact and 108 moderate impact. The relative frequency plot (Figure 1) shows the distribution of impact classes across the five genes studied, in which the contribution of the TTN gene is visibly higher than that of the other genes.
Quantitatively, all three high-impact variants (100%) are in the TTN gene, two of them new in the indigenous population. TTN also accounts for 80% of the modifier-impact variants (101 variants) and 95% of the moderate-impact variants (103 variants). The PSCA, PRKAA1 and TP53 genes each account for 0.9% of the moderate-impact variants (one variant each), and the CDH1 gene for 1.8% (two variants). Among modifier-impact variants, the PSCA gene accounts for 9.5% (12 variants), the CDH1 gene for 3.8% (5 variants) and the TP53 gene for 6.3% (8 variants). No modifier variant was identified in the PRKAA1 gene.
Table 1 describes the characteristics of the variants predicted by SnpEff to have high and modifier impact, including the affected gene, the reference ID, chromosomal region, clinical impact and allele frequencies in the Indigenous population and in the five continental populations described in the 1000 Genomes database (AFR, AMR, EAS, EUR and SAS). The variants with moderate impact are described in Supplementary Table S1.
Table 1. Descriptions of the variants in the TTN, PSCA, PRKAA1, CDH1 and TP53 genes with high and modifier impact, with allele frequencies in the continental populations (African (AFR), American (AMR), East Asian (EAS), European (EUR) and South Asian (SAS)) described in the 1000 Genomes database.
Among the variants described in Table 1, rs556408709 in the TTN gene has a high impact, characterized by a nucleotide change from C to T. Two other high-impact variants are new and are also located on chromosome 2: the first, at position 178597932, is an insertion/deletion causing a frameshift, and the second, at position 178651537, is also an insertion/deletion, altering a splice acceptor site. In addition, another 15 new variants are exclusive to the Indigenous population, 14 in the TTN gene and one in the CDH1 gene (Table 2).
The multidimensional scaling (MDS) plot, built from the FST values and the genotypes of the populations for the 237 variants in the genes, shows a division into three groups (Figure 2). It is possible to observe the genotypic distance of the Indigenous population compared with the other populations. Considering only the exome data of the genes in this study, the indigenous population appears genetically closer to Americans and more distant from Europeans, which points to the isolation of this group.
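The pairwise FST values that feed an MDS plot of this kind can be sketched as follows. This is an illustration only: the text does not state which FST estimator was used, so the sketch assumes the basic heterozygosity form FST = (HT - HS) / HT with equal population weights, combined across variants as a ratio of sums. The function names and the toy allele frequencies are hypothetical.

```python
from itertools import combinations


def fst_pair(freqs_a, freqs_b):
    """FST between two populations from per-variant allele frequencies.

    HT is the expected heterozygosity at the pooled frequency and HS the
    mean within-population heterozygosity. Returns 0.0 when no variant
    is polymorphic in the pooled sample.
    """
    num = 0.0
    den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def fst_matrix(freqs_by_pop):
    """Pairwise FST for every unordered pair of populations."""
    return {
        (a, b): fst_pair(freqs_by_pop[a], freqs_by_pop[b])
        for a, b in combinations(freqs_by_pop, 2)
    }


# Toy allele frequencies at three variants, not study data.
toy_freqs = {
    "INDG": [0.10, 0.80, 0.05],
    "AMR": [0.15, 0.60, 0.10],
    "EUR": [0.40, 0.20, 0.30],
}
for (a, b), value in fst_matrix(toy_freqs).items():
    print(f"{a}-{b}: FST = {value:.3f}")
```

The resulting pairwise matrix can then be passed as a dissimilarity matrix to any MDS routine to obtain two-dimensional coordinates for plotting.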

Discussion
Gastric Cancer is one of the leading causes of death in Latin America. Several investigations of susceptibility to Gastric Cancer have been carried out, with whole-exome sequencing being one of the most promising methods. However, for the more than 1 million indigenous people living in Brazilian territory according to the Brazilian Institute of Geography and Statistics (IBGE), little attention has been paid to understanding the high incidence of cancer cases in this population [15].
The choice of the native peoples of the Brazilian Amazon as the subject of a sequencing study derives from the unique genetic profile observed in this group when compared with other world populations. This differentiation is known to be due, in part, to the past colonization process, which directly influenced the genetic variability of these peoples and shifted the frequencies of polymorphisms that influence predisposition to different diseases [16], including Gastric Cancer. Thus, this research aimed to understand the genomic profile of the indigenous population (INDG) by evaluating five genes associated with Gastric Cancer in 12 different indigenous communities.
Our results describe 237 variants not previously reported in these populations, at least five of which stand out for their predicted high clinical impact or statistical significance. The rs556408709 variant in the TTN gene, although its frequency does not differ greatly from that of other world populations, has high clinical relevance according to ClinVar (NCBI), with a prognosis not yet reported. The rs397517782 and rs397517532 variants, also in the TTN gene, were statistically significant in the INDG population when compared to AFR, AMR, EUR and SAS. TTN encodes titin, a structural protein present in striated muscle tissues; it has been studied in different populations, has been correlated with poor prognosis for Gastric Cancer in Chinese patients, and is among the genes most prominently related to various cancers, being found in about 56% of tumors [17-19]. Variants in specific genes, such as TTN, can influence the cell proliferation that leads to cancer; studies of the accumulation of mutations in genes leading to gastric adenocarcinoma found TTN among the 10 most mutated genes [20].
Additionally, rs1625895, present in TP53, showed statistical significance compared with the EUR and SAS populations. This intronic variant is associated with certain types of cancer, such as lung, colorectal and ovarian cancer, and shows a significant gene-environment interaction in predisposition to oral cancer [21,22]. The TP53 gene encodes the p53 tumor suppressor protein, a transcription factor that regulates the cell cycle, repair mechanisms and apoptosis, and is often described in cancer as essential for protection against the development of neoplasms [23]. Germline variants in the TP53 gene may influence the clinical presentation of diseases such as Li-Fraumeni syndrome, characterized by an autosomal dominant predisposition to multiple types of early-onset cancer [24,25]. Other studies have described the genetic variability of the TP53 gene in Brazilian indigenous peoples other than our study population, finding possible associations with the cancer progression process [26].
Another important result of our analyses was the 17 new variants found in the indigenous population, which are therefore possibly exclusive to it. Among these, four are of the INDEL type, two of which are predicted to have a high clinical impact. Sixteen of the new variants were found in the TTN gene, including the two high-impact variants, and the remaining new variant, of the INDEL type, was found in the CDH1 gene, which is highly associated with the development of Gastric Cancer. This tumor suppressor gene, located on chromosome 16q22.1, encodes the E-cadherin glycoprotein, which acts together with other proteins in the cell adhesion required to form complex tissues; mutations in CDH1 that cause loss of function of its protein can lead to cell-cell adhesion instability and to failures in the signaling pathways in which it acts [11]. More than 100 pathogenic germline variants have already been described in the CDH1 gene, and studies such as that by Luo and collaborators (2018) [27] demonstrate the relevance of the molecular mechanisms of CDH1 mutation carriers for diagnosis and treatment. Therefore, since CDH1 is a Gastric Cancer target gene, the new variant found in our investigation may have an important role in cell cycle modulation, and further investigations are needed to understand its biological impact.
The multidimensional scaling plot presented in this study has characteristics that differentiate it from other analyses that follow the same standard methodology. This difference is most apparent in the genetic proximity of the INDG population to the AFR population relative to the other populations analyzed in the study. The AMR population also shows high similarity, which is expected given the genetic contribution of the INDG population to the historical formation of admixed American peoples.
Finally, these results underline the need for studies that seek a genomic understanding of indigenous communities in order to better understand the mechanisms of susceptibility to and progression of Gastric Cancer. Our findings contribute to knowledge of the genetic profile of Indigenous populations and may help in the development of new case-control studies that assess the clinical impact of both previously described variants and variants exclusive to Indigenous populations of the Amazon, and may also support studies carried out in admixed groups of the Brazilian population.

Conclusions
This study evaluated the presence of genetic variants in five genes associated with Gastric Cancer susceptibility in different indigenous groups in the Brazilian Amazon. Our results demonstrate the presence of genetic variants in well-characterized genes and identify a large number of variants, including new ones in other genes, such as TTN, showing the importance of studies applied to the indigenous population.
The findings of our study allow us to reaffirm the unique genetic profile of the indigenous population of the Amazon and of admixed populations in northern Brazil. Our data can contribute to the design of markers that help in the early diagnosis of complex diseases such as cancer, improving the quality of life of individuals potentially susceptible to the disease.

Figure 1. The relative contribution of variants, discriminated according to high, modifier and moderate impact, in the TTN, PSCA, PRKAA1, CDH1 and TP53 genes.

Figure 2. Multidimensional scaling plot of the Indigenous population and the continental populations according to the variants in the TTN, PSCA, PRKAA1, CDH1 and TP53 genes.

Table 2. Description of new variants found in the Indigenous population from the Brazilian Amazon in genes relevant to increased susceptibility to gastric cancer.
(-) No annotation; (*) Chromosomal locations refer to the GRCh38 assembly of the human reference genome.