The Search for Cancer Biomarkers: Assessing the Distribution of INDEL Markers in Different Genetic Ancestries

Cancer is a multifactorial group of diseases with high incidence and is one of the leading causes of death worldwide. In Brazil, cancer incidence and impact vary greatly among geographic regions, partly because of the genetic heterogeneity of the country's population, which is composed mainly of European (EUR), Native American (NAM), African (AFR), and Asian (ASN) ancestries. Genetic markers commonly present different allele frequencies across populations, and such differences might influence cancer incidence; however, data for admixed populations such as the Brazilian population are still limited. We therefore analyzed the allelic and genotypic distribution of 12 INDEL polymorphisms of interest in populations from the five Brazilian geographic regions and in populations representing EUR, NAM, AFR, and ASN ancestries, and assessed genotype-dependent tissue expression in silico. Genotypes were obtained by multiplex PCR, statistical analyses were performed in R, and tissue expression data for each marker were extracted from the GTEx portal. All analyzed markers presented statistically significant differences in at least one population comparison, and we found 39 tissues in which gene expression differed according to genotype. Here, we point out differences in genotype distribution and gene expression of potential biomarkers for cancer risk, and we reinforce the importance of this type of study in populations with different genetic backgrounds.


Introduction
Cancer is one of the leading causes of death worldwide [1] and is considered a group of complex diseases involving environmental, epigenetic, and genetic factors [2,3]. Around 18 million new cancer cases are estimated to have occurred worldwide in 2018 [1]. In Brazil, the National Cancer Institute (INCA) estimated 625 thousand new cases for each year from 2020 to 2022, with great variation in magnitude and in cancer types among the country's geographic regions [4]. This occurs partly because Brazil has one of the most genetically heterogeneous populations in the world, composed mainly of Native American, European, and African contributions [5]. In addition, Brazil hosts the largest Japanese community outside Japan, estimated at around 1.5 million people [6], which allows some degree of admixture between this community and the broader Brazilian population, mainly in the regions where it is concentrated, the North and Southeast of Brazil.
In the global literature, several studies address cancer-related genetic markers, mostly case-control association studies in which these markers are used to predict the risk of developing and/or the prognosis of a given type of cancer in different populations [7,8]. Genetic markers commonly present different allele frequencies among ethnic populations (also called continental populations) [9]. However, data on the distribution of such markers in admixed populations, such as the Brazilian population, are still limited.
In this work, we describe the allelic and genotypic distribution of 12 insertion/deletion (INDEL) polymorphisms, located in genes involved in important metabolic pathways associated with carcinogenesis, in populations from the five Brazilian geographic regions and in populations representing Europeans, Africans, Native Americans, and Asians. These genes and polymorphisms have been studied and associated with various types of cancer in different populations, such as bladder cancer [10], oral cancer [11], hepatocellular carcinoma [12], breast cancer [13–15], chronic lymphoblastic leukemia [16], colorectal cancer [17–21], thyroid cancer [22], and gastric cancer [23,24]. These markers were therefore chosen based on the importance of each gene and its potential influence on tumor development.

Materials and Methods
This study included 1411 unrelated, cancer-free adult individuals recruited from Brazilian states and the Distrito Federal in 2009 and 2010: 480 individuals from Pará (n = 360), Amazonas (n = 60), and Rondônia (n = 60), representing the North region; 370 individuals from Ceará (n = 135), Rio Grande do Norte (n = 175), Maranhão (n = 8), and Pernambuco (n = 52), representing the Northeast region; 186 individuals from Goiás (n = 101), Mato Grosso do Sul (n = 49), and the Distrito Federal (n = 36), representing the Midwest region; 184 individuals from São Paulo, representing the Southeast region; and 191 individuals from Rio Grande do Sul, representing the South region. More details on the sampling approach can be found in previous studies [25,26].
In addition, we investigated a sample of 896 individuals representative of the main ethnic groups that contributed to the Brazilian population: 222 Native Americans (NAM) from nine tribes of the Brazilian Amazon (Tiriyó, Waiãpi, Zoé, Urubu-Kaapor, Awa-Guajá, Parakanã, Wai Wai, Gavião, Zoró) [27]; 211 Africans (AFR) from five countries (Angola, Mozambique, Congo Republic, Cameroon, Ivory Coast) [28]; 270 Europeans (EUR) from two countries (Portugal and Spain) [25,29]; and 193 Asians (ASN) from Japan [30]. Using a panel of ancestry-informative markers (AIMs), we previously estimated the genomic ancestry of each group [30]. Informed consent for DNA analysis was obtained from all participants. The project was approved by the Ethics Committee of the Instituto de Ciências da Saúde, Universidade Federal do Pará.

DNA Extraction and Quantification
Peripheral blood samples were collected from all individuals in the study, and DNA was extracted as described previously [31]. DNA was quantified with a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA).

Genotyping of Investigated Polymorphisms
Polymorphisms were genotyped in a single multiplex reaction using the QIAGEN® Multiplex PCR Kit master mix (Qiagen, Hilden, Germany); primers are described in Table 1. The PCR was prepared following the protocol described by Cavalcante et al. [32]. All polymorphisms are functional insertion/deletion (INDEL) variants of DNA fragments.
Multiplex PCR products were separated and analyzed by capillary electrophoresis on an ABI 3130 Genetic Analyzer, using GS-500 LIZ as the molecular size standard, the G5 virtual filter matrix, and POP-7 polymer (instrument and reagents from Thermo Fisher Scientific). Samples were then analyzed with GeneMapper® 3.7 software (also from Thermo Fisher Scientific).

Data Analyses
Allelic and genotypic frequencies were obtained by direct counting. Deviations from Hardy-Weinberg equilibrium (HWE) were tested in Arlequin 3.1 [33], with Bonferroni correction. Differences in genotypic frequencies among Brazilian regions and parental populations were assessed with the chi-squared test (χ² test, df = 2), and the false discovery rate (FDR) method was used to correct for multiple comparisons. Statistical analyses were performed in the statistical package R [34], and p-values ≤ 0.05 were considered significant. In addition, to infer possible influences on cancer development, we queried the Genotype-Tissue Expression (GTEx) Portal (https://gtexportal.org/home/, accessed on 1 May 2022) [35] to obtain, for each variant, genotype-dependent expression of the corresponding gene in different tissues.
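The direct counting and between-population comparison described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code. It assumes each population is summarized as INS/INS, INS/DEL, DEL/DEL counts, applies Pearson's chi-squared test to the resulting 2 × 3 table (df = 2, for which the p-value is exactly exp(-χ²/2)), and assumes the FDR correction is the Benjamini-Hochberg procedure, which is what R's `p.adjust(method = "fdr")` applies. The function names and genotype counts are hypothetical.

```python
import math

def ins_allele_freq(genotypes):
    """INS allele frequency by direct counting from (INS/INS, INS/DEL, DEL/DEL) counts."""
    ins_ins, ins_del, del_del = genotypes
    n = ins_ins + ins_del + del_del
    return (2 * ins_ins + ins_del) / (2 * n)

def genotype_chi2(pop_a, pop_b):
    """Pearson chi-squared test on a 2 x 3 genotype table (df = 2).

    For df = 2 the chi-squared survival function is exactly exp(-x / 2),
    so no statistics library is needed.
    """
    table = (pop_a, pop_b)
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(pop_a, pop_b)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("genotype class absent from both populations; df = 2 does not apply")
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical genotype counts (INS/INS, INS/DEL, DEL/DEL) for one marker
pop_x = (50, 30, 20)
pop_y = (20, 30, 50)
stat, p = genotype_chi2(pop_x, pop_y)
print(f"INS freq: {ins_allele_freq(pop_x):.2f} vs {ins_allele_freq(pop_y):.2f}")
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
print(bh_adjust([p, 0.04, 0.03]))
```

Each adjusted p-value would then be compared with the 0.05 threshold stated above.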

Results
The observed allele and genotype frequencies for the 12 markers investigated in the Brazilian population and the continental populations (AFR, NAM, EUR and ASN) are shown in Tables S1-S3; and the distribution of the genotypes is plotted in Figure 1.
When HWE was assessed with correction for multiple testing, no marker deviated from HWE in the admixed Brazilian populations. However, the markers in CASP8, TP53, and XRCC1 in the Amerindian population, UGT1A1 in the African population, NFKB1 in the European population, and MDM2 and IL4 in the Asian population deviated from HWE, indicating that the genotype frequencies of these markers in these populations depart from Hardy-Weinberg expectations.
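The HWE check can be illustrated for a single biallelic INDEL. This sketch is not Arlequin's implementation: Arlequin documents its HWE test as an exact test estimated by a Markov chain, whereas for a biallelic marker the same exact test can be computed directly by enumerating every heterozygote count compatible with the observed allele counts. The Bonferroni step assumes, hypothetically, that the number of tests equals the number of markers tested within one population; function names and counts are illustrative.

```python
import math

def hwe_exact_p(ins_ins, ins_del, del_del):
    """Two-sided exact HWE p-value for one biallelic marker.

    Enumerates all heterozygote counts compatible with the observed allele
    counts and sums the probabilities of outcomes no more likely than the
    observed one.
    """
    n = ins_ins + ins_del + del_del
    n_ins = 2 * ins_ins + ins_del
    n_del = 2 * n - n_ins
    rare = min(n_ins, n_del)
    # Part of the log-probability that depends only on n and the allele counts
    log_const = (math.lgamma(n + 1) + math.lgamma(n_ins + 1)
                 + math.lgamma(n_del + 1) - math.lgamma(2 * n + 1))

    def log_prob(het):
        hom_rare = (rare - het) // 2
        hom_common = n - het - hom_rare
        return (log_const + het * math.log(2) - math.lgamma(het + 1)
                - math.lgamma(hom_rare + 1) - math.lgamma(hom_common + 1))

    # Heterozygote counts always share the parity of the rare-allele count
    probs = {h: math.exp(log_prob(h)) for h in range(rare % 2, rare + 1, 2)}
    observed = probs[ins_del]
    p = sum(prob for prob in probs.values() if prob <= observed * (1 + 1e-9))
    return min(1.0, p)

def bonferroni_deviations(pvalues, alpha=0.05):
    """Flag markers whose HWE p-value stays significant after Bonferroni correction."""
    threshold = alpha / len(pvalues)
    return {marker: p <= threshold for marker, p in pvalues.items()}

# Hypothetical counts (INS/INS, INS/DEL, DEL/DEL) for two markers in one population
counts = {"marker_A": (25, 50, 25), "marker_B": (50, 5, 45)}
pvals = {m: hwe_exact_p(*c) for m, c in counts.items()}
print(pvals)
print(bonferroni_deviations(pvals))
```

Here marker_B, with far fewer heterozygotes than expected from its allele counts, would be flagged as deviating from HWE.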

We then compared the genotypic distribution of the 12 markers among continental populations; the following results are noteworthy. Among the biometabolism and cell energy markers (in the UGT1A1, CYP19A1, and CYP2E1 genes), CYP19A1 and UGT1A1 did not differ among populations in most comparisons, differing only in EUR vs. NAM, and CYP2E1 differed only in AFR vs. ASN. Among the genomic stability and cell death markers (TYMS, XRCC1, CASP8, MDM2, and TP53), XRCC1 and CASP8 differed significantly in all comparisons except AFR vs. EUR; TYMS and TP53 differed only in the AFR vs. ASN and NAM vs. ASN comparisons, respectively; and MDM2 differed between NAM and each of the other groups, but not in the remaining comparisons.
Among the immune response and inflammatory process markers (IL1A, IL4, NFKB1, and PAR1), IL4 and PAR1 differed significantly in all comparisons; IL1A differed in all comparisons except AFR vs. EUR; and NFKB1 differed only in AFR vs. EUR and NAM vs. EUR. Given the size of the table, p-values for these comparative analyses are shown in Table S4.
Moreover, we calculated δ (delta) values, i.e., mean allele frequency differences across the investigated markers, among continental populations (Table 2). The mean δ was 32% between NAM and AFR, 23% between NAM and EUR, and 19% between EUR and AFR. In the comparisons involving ASN, the mean δ was 14% between ASN and NAM, 21% between ASN and EUR, and 26% between ASN and AFR.
In the comparison of geographic regions, the marker in IL4 differed significantly between the North and all other Brazilian regions. Additionally, the distribution of the IL1A marker differed significantly between the North and the South, Southeast, and Northeast regions, and between the Midwest and South regions. The NFKB1 polymorphism differed significantly between the North and the South and Southeast regions, but was similar in all other comparisons.
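The δ values above can be illustrated with a short calculation. This minimal sketch assumes that δ for one marker is the absolute difference in allele frequency between two populations and that the reported figure is the mean of δ across markers, expressed as a percentage. The marker names and frequencies are hypothetical placeholders, not values from Table 2.

```python
from itertools import combinations

def mean_delta(freqs_a, freqs_b):
    """Mean absolute allele-frequency difference (delta) over shared markers, in %."""
    shared = sorted(set(freqs_a) & set(freqs_b))
    deltas = [abs(freqs_a[m] - freqs_b[m]) for m in shared]
    return 100 * sum(deltas) / len(deltas)

# Hypothetical INS allele frequencies per marker (not the study's data)
populations = {
    "POP1": {"marker_A": 0.50, "marker_B": 0.20, "marker_C": 0.70},
    "POP2": {"marker_A": 0.30, "marker_B": 0.40, "marker_C": 0.60},
    "POP3": {"marker_A": 0.45, "marker_B": 0.25, "marker_C": 0.90},
}

# Every pairwise comparison, as in the NAM/AFR/EUR/ASN comparisons above
for (name_a, fa), (name_b, fb) in combinations(populations.items(), 2):
    print(f"{name_a} vs {name_b}: mean delta = {mean_delta(fa, fb):.1f}%")
```

Because |p_a - p_b| equals |(1 - p_a) - (1 - p_b)|, δ is the same whether INS or DEL frequencies are tabulated.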

Discussion
This study aimed to investigate and describe the frequencies of markers of interest (located in genes involved in important metabolic pathways associated with carcinogenesis) in populations from the five geographic regions of Brazil and in populations representing European, African, Native American, and Asian ancestries. The markers were grouped according to gene function.
In a previous study [5], the markers of immune response and inflammatory processes were described in the same populations investigated here, except for the Asian population. Beyond the results reported in that paper, we highlight that the ASN population differed from all other continental populations for the markers in IL1A, IL4, and PAR1. The NFKB1 marker differed only between AFR and EUR and between NAM and EUR.
In the comparisons of geographic regions, the IL4 marker differed significantly between the North and all other Brazilian regions. Our analysis also showed statistically significant differences in the distribution of the IL1A marker between the North and the South, Southeast, and Northeast regions, as well as between the Midwest and South regions. The NFKB1 polymorphism also differed significantly between the North and the South and Southeast regions. All other distributions of this group of markers were similar among regions.
Regarding the investigated variants in biometabolism and cell energy genes, few studies have analyzed their distribution in populations of different genetic ancestries. However, a study by Fritsche et al. [36] compared genotypes of the 96-bp INDEL in the CYP2E1 gene in individuals with African (African American), European (European American), and Asian (Taiwanese) genetic backgrounds and observed statistically significant differences between Europeans and Asians and between Europeans and Africans, but none between Asians and Africans, which corroborates our findings here. Concerning the variant rs8175347 in the UGT1A1 gene, its allele frequency has been reported to differ among groups of European, African, and Asian (including Japanese) ancestries [37]. No studies were found on the rs28892005 variant in CYP19A1.
As for variants in the genomic stability and cell death group, some papers in the global literature discuss their distribution in different populations. For instance, a previous study by our research group compared the allele distribution of rs17878362 in the TP53 gene among populations of European, African, and Asian ancestry from the 1000 Genomes database [9] and a population from Northern Brazil, and observed statistically significant differences in all comparisons except between the Northern Brazilian and European populations, which could be expected given the high contribution of European ancestry in this region [38]. Overall, these frequencies vary significantly among genetic backgrounds.
Similarly, the variant rs3730485 (also known as del1518) in the MDM2 gene has been investigated in different populations, particularly in connection with cancer development. For example, two independent studies of different cancer types in Chinese cohorts reported a DEL allele frequency of 30% in their control groups [12,39]. The study by Gansmo et al. [39] also investigated this variant in other populations, reporting DEL allele frequencies of 38% and 42% in African American and Norwegian controls, respectively. Here, we found this allele in 33%, 38%, and 33% of the African, European, and Asian groups, respectively, which are close to the corresponding frequencies in these previous reports. To the best of our knowledge, this is the first study investigating this variant in NAM populations from the Brazilian Amazon.
The variant rs151264360 in the TYMS gene has also been widely studied regarding cancer treatment in different regions. It has been associated with response to chemotherapy for colorectal cancer in a Mexican cohort, highlighting the importance of TYMS to cancer treatment in Latin American populations [40]. In that study, the DEL allele was present in 33.0% of participants, similar to its frequency in a study carried out in a Slovak population (37.5%) [41]. A study by Summers et al. [42] reported a significant difference in the distribution of rs151264360 between African Americans (DEL 53.75%) and Europeans (DEL 33.3%). These frequencies are similar to those observed here for the African (DEL 58.0%) and European (DEL 36.0%) groups, which also showed significant differences.
Likewise, the polymorphism rs3834129 in the CASP8 gene has been broadly studied in relation to cancer, particularly cancer development. For instance, Pardini et al. [43] suggested that the DEL allele of this variant has a protective effect against colorectal cancer in the multiple populations investigated, mostly from European countries, in which the DEL allele frequency ranged from 45% to 52%. In addition, two independent studies of this polymorphism in British cohorts, in association with different diseases, reported a DEL allele frequency of 50% in controls [44,45]. Similarly, Chatterjee et al. [46] investigated the association of this marker with HPV infection and cervical cancer in South Africa and found the DEL allele in around 52% of controls. Here, we found this allele in 50% and 47% of the African and European groups, respectively, frequencies that are similar to and corroborate these studies.
In contrast, few studies in the specialized literature have examined the variant rs3213239 (XRCC1 gene). Two studies by our research group investigated this variant regarding cancer susceptibility in Northern Brazil, reporting an association with acute lymphoblastic leukemia (ALL) [47], but not with gastric or colorectal cancer [32]. Curiously, in the study by Carvalho et al. [47], not only the DEL/DEL genotype of this variant but also genetic ancestry was associated with ALL: NAM and EUR ancestries were associated with increased and decreased risk of developing ALL, respectively, highlighting the importance of investigating this variant in different populations.
Moreover, in the GTEx analysis, the variants in CASP8 and XRCC1 were present in most of the tissues showing more than one differentially expressed gene (nine tissues each), six of which were shared by both markers: (i) cells-cultured fibroblasts, (ii) esophagus-mucosa, (iii) muscle-skeletal, (iv) nerve-tibial, (v) pituitary gland, and (vi) thyroid. This suggests that gene expression in these tissues may be influenced by variants in these genes, which are related to genomic stability and cell death.
Although we found no studies involving both CASP8 and XRCC1 in these tissues, a few studies in the global literature report possible associations of these genes with the development of different types of cancer, such as lung adenocarcinoma, breast cancer, gallbladder cancer, and acute lymphoblastic leukemia, as well as gastric and colorectal cancers [32,47–52].
It is also notable that both skin tissues, sun-exposed (SE) and not sun-exposed (NSE), showed differential expression associated with the variants in CASP8, CYP19A1, and IL1A, and that NSE skin also showed this difference for NFKB1. The NFKB1 finding suggests a possible influence of this variant and gene on skin cancer development and reinforces previous studies that reported an association of the INS/INS genotype of this NFKB1 variant with an increased risk of developing melanoma in Swedish and Brazilian populations [53,54].
In addition to NSE skin, testis also showed differential expression associated with four variants (in the IL1A, NFKB1, TYMS, and XRCC1 genes), the highest number of variants per tissue in this analysis. We found no studies of these specific variants in testis or of these genes in testicular cancer, but roles of IL1A, NFKB1, and XRCC1 have been reported in Sertoli cells and other factors essential for spermatogenesis [55–60]. Hence, given their importance in testis function, these genes might also be involved in carcinogenesis in this tissue.
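The tissue-level tallies discussed above (genes with genotype-dependent expression per tissue, and tissues shared by the CASP8 and XRCC1 variants) amount to grouping (gene, tissue) pairs. The sketch below uses only the pairs explicitly named in this Discussion, so it is partial: it omits the remaining tissues of the full GTEx result (for example, the ninth tissue for each of CASP8 and XRCC1 is not named in the text), and tissue labels follow the text's wording rather than exact GTEx identifiers.

```python
from collections import defaultdict

# Partial: only (gene, tissue) pairs explicitly named in the Discussion text
shared = ["cells-cultured fibroblasts", "esophagus-mucosa", "muscle-skeletal",
          "nerve-tibial", "pituitary gland", "thyroid"]
pairs = [(gene, tissue) for tissue in shared for gene in ("CASP8", "XRCC1")]
pairs += [(g, "skin SE") for g in ("CASP8", "CYP19A1", "IL1A")]
pairs += [(g, "skin NSE") for g in ("CASP8", "CYP19A1", "IL1A", "NFKB1")]
pairs += [(g, "testis") for g in ("IL1A", "NFKB1", "TYMS", "XRCC1")]

# Index the pairs both ways: genes per tissue and tissues per gene
genes_by_tissue = defaultdict(set)
tissues_by_gene = defaultdict(set)
for gene, tissue in pairs:
    genes_by_tissue[tissue].add(gene)
    tissues_by_gene[gene].add(tissue)

# Tissues with the most genotype-dependent genes among the listed pairs
top = max(len(genes) for genes in genes_by_tissue.values())
busiest = sorted(t for t, genes in genes_by_tissue.items() if len(genes) == top)

# Tissues in which both the CASP8 and the XRCC1 variants appear
both = sorted(tissues_by_gene["CASP8"] & tissues_by_gene["XRCC1"])
print(top, busiest)
print(both)
```

With the full GTEx output, the same grouping would give the complete per-tissue and per-gene counts.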
In summary, we analyzed in detail the distribution of 12 polymorphisms in diverse populations (European, African, Native American, and Asian groups, as well as groups from the five admixed Brazilian geographic regions) and their association with tissue expression. All analyzed markers presented statistically significant differences in at least one population comparison, and we found 39 tissues in which gene expression differed according to genotype, suggesting that these markers might play a role in the distribution of cancer across populations. We therefore recommend future studies with larger cohorts to explore these novel observations, as this is the first study to investigate some of these markers in these populations. Based on our findings, we point out potential biomarkers for cancer risk and highlight the importance of this type of study in populations with different genetic backgrounds.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cimb44050154/s1, Table S1. Allelic and genotypic frequencies of markers in Biometabolism and Cell Energy group. Table S2. Allelic and genotypic frequencies of markers in Genomic Stability and Cell Death group. Table S3. Allelic and genotypic frequencies of markers in Immune Response and Inflammatory Processes group. Table S4. p-Values of the comparative analyses of frequencies among all populations for each marker.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of Instituto de Ciências da Saúde, Universidade Federal do Pará.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available within the article or supplementary material.