Identification of Genomic Variants Associated with the Risk of Acute Lymphoblastic Leukemia in Native Americans from Brazilian Amazonia

A number of genomic variants related to native American ancestry may be associated with an increased risk of developing Acute Lymphoblastic Leukemia (ALL), which suggests that Latin American and Hispanic populations from the New World may be relatively susceptible to this disease. However, there has not yet been any comprehensive investigation of the variants associated with susceptibility to ALL in traditional Amerindian populations from Brazilian Amazonia. We investigated the exomes of the 17 principal genes associated with susceptibility to ALL in samples of 64 Amerindians from this region, including cancer-free individuals and patients with ALL. We compared the findings with the data on populations representing five continents available in the 1000 Genomes database. The variation in the allele frequencies found between the different groups was evaluated using Fisher's exact test. The analyses of the exomes of the Brazilian Amerindians identified 125 variants, seven of which were new. The comparison of the allele frequencies between the two Amerindian groups analyzed in the present study (ALL patients vs. cancer-free individuals) identified six variants (rs11515, rs2765997, rs1053454, rs8068981, rs3764342, and rs2304465) that may be associated with susceptibility to ALL. These findings contribute to the identification of genetic variants that represent a potential risk for ALL in Amazonian Amerindian populations and might favor precision oncology measures.


Introduction
Acute Lymphoblastic Leukemia (ALL) is the most common cancer in children and is the principal cause of childhood mortality due to malignant disease [1,2]. The genetic etiology of ALL is driven by a wide diversity of alterations of the pathways responsible for the regulation of the cell cycle in the lymphoid precursors of the B and T cell lines, which include chromosomal translocations, mutations, and aneuploidy [3–5]. In recent years, Genome-Wide Association Studies (GWASs) have identified a number of loci associated with the risk of developing ALL, including the ARID5B, IKZF1, PIP4K2A, CEBPE, GATA3, BMI1, and CDKN2A genes [6–11].
Most GWASs addressing ALL susceptibility have focused on homogeneous populations in regions such as Europe or North America, whereas the highest incidence of childhood ALL is found in populations with a major component of native American ancestry, such as the Latin American and Hispanic populations of the New World [9,12–14]. These populations are highly admixed and, like the population of Brazil, are primarily descended from European, African, and native American ancestors [15,16]. The high incidence of ALL observed in Latin American and Hispanic populations has been attributed [9,12,13,17] to genetic risk factors related to native American ancestry. Despite the evidence that genetic variants related to native American ancestry may influence the incidence of childhood ALL, no data are available on the distribution of these variants in traditional Amerindian populations.
Given this, in the present study, we investigated the genetic variants potentially involved in the etiology of ALL in traditional Amerindian populations from Brazilian Amazonia. To this end, we used Next-Generation Sequencing (NGS) to sequence the exomes of 17 genes associated with susceptibility to ALL in samples obtained from indigenous groups that inhabit Brazilian Amazonia, in both ALL patients and cancer-free individuals. The variants encountered in this initial analysis were compared with the data available on populations representing five different continents, which were obtained from the 1000 Genomes database.

Ethics, Consent, and Permissions
The present study was approved by the Brazilian National Committee on Research Ethics (CONEP; approval Nos. 1062/2006 and 123/1998). All participants, as well as the tribe leaders, signed a free and informed consent form; when necessary, a translator explained the project and the importance of the research. The samples were collected in accordance with the Declaration of Helsinki.

Study Population
The study sample included 59 cancer-free Amerindians and five individuals diagnosed with ALL. All the participants were members of isolated ethnic groups located in Brazilian Amazonia (Table S1). The genomic Amerindian ancestry of all these individuals was quantified and found to be at least 64% in all cases. The Amerindians with ALL were diagnosed and treated in two public hospitals specialized in the treatment of childhood cancer, the Ophir Loyola Hospital and the Octavio Lobo Childhood Oncology Hospital, both located in the city of Belém, in Pará state, northern Brazil. The clinical and demographic data on these patients are presented in Table S2.
Data were obtained from the 1000 Genomes Project database (available at https://www.1000genomes.org, accessed on 20 February 2021) to provide comparisons with ethnic groups from other continents. This sample included 661 individuals from Africa (AFR), 503 from Europe (EUR), 347 from the Americas (AMR), 504 from East Asia (EAS), and 489 from South Asia (SAS).
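Because the comparisons in this study are made on allele frequencies, it can help to translate the sample sizes above into allele (chromosome) counts. The short sketch below does this arithmetic, assuming diploid autosomal loci with every individual fully genotyped; the variable names are illustrative and not taken from the study.

```python
# Sample sizes as reported in the text (individuals per group).
study_groups = {"NAT": 59, "ALL_NAT": 5}
continental = {"AFR": 661, "EUR": 503, "AMR": 347, "EAS": 504, "SAS": 489}

# Assumption: diploid autosomal loci, so each individual contributes 2 alleles.
ALLELES_PER_INDIVIDUAL = 2

def max_allele_counts(groups):
    """Return the maximum number of alleles observable in each group."""
    return {name: ALLELES_PER_INDIVIDUAL * n for name, n in groups.items()}

study_alleles = max_allele_counts(study_groups)
continental_alleles = max_allele_counts(continental)
total_reference_individuals = sum(continental.values())

if __name__ == "__main__":
    print(study_alleles)
    print(continental_alleles)
    print(total_reference_individuals)
```

Under these assumptions, the ALL_NAT group contributes at most 10 alleles per site, which is worth keeping in mind when interpreting frequency differences.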

Selection of the Genes
A total of 17 genes were selected for the present study (see Table S3). The genes were selected based on a search of the literature available in the NCBI and Ensembl databases, and on GWASs or studies of other genetic markers associated with the risk of ALL. More details are described in Table S4.

Extraction of the DNA and Preparation of the Exomes
Samples of 5 mL of peripheral blood were collected from each of the participants of the study. The genetic material was extracted from these blood samples using the Roche Applied Science DNA extraction kit (Roche, Penzberg, Germany) following the manufacturer's instructions, and it was quantified using a NanoDrop 1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). The exome libraries were prepared using the commercial Nextera Rapid Capture Exome kit (Illumina, San Diego, CA, USA) and the SureSelect Human All Exon V6 kit (Agilent, Santa Clara, CA, USA), with the manufacturer's protocol being followed in both cases. The sequencing reactions were run on the NextSeq 500® platform (Illumina®, San Diego, CA, USA) using the NextSeq 500 high-output v2 300 cycle kit (Illumina®, San Diego, CA, USA).

Bioinformatic Analysis
The bioinformatic analyses followed the approach described by Ribeiro-Dos-Santos et al. [18] and Rodrigues et al. [19]. The sequences were first filtered to eliminate low-quality reads, and then mapped and aligned to the reference genome (GRCh38) using BWA v.0.7. The alignment was then processed to remove duplicate sequences, recalibrate the mapping quality, and finalize local realignment. The results were processed in GATK v.3.2 to identify variants relative to the reference genome. The Viewer of Variants (ViVa®, Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil) software was used to analyze the annotations of the variants. The variants were annotated in three databases: SnpEff v.4.3.T, Ensembl Variant Effect Predictor (Ensembl version 99), and [...]. The [...] and FATHMM-MKL databases were used for the in silico prediction of pathogenicity.

Statistical Analysis
For the statistical analyses, the cancer-free Amerindians were assigned to the Native (NAT) group, while the ALL patients were in the ALL_NAT (Native with ALL) group. All the analyses were run in the R v.3.5.1 program. The differences in the allelic frequencies between the NAT and ALL_NAT were evaluated using Fisher's exact test, which was also applied to the comparisons with the continental populations. A p ≤ 0.05 significance level was considered for all the analyses.
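The comparison described above can be illustrated with a minimal, self-contained sketch of a two-sided Fisher's exact test on a 2 × 2 table of allele counts (alternate vs. reference alleles in two groups). This is not the authors' R code: the function name is hypothetical, the allele counts in the usage example are invented for illustration, and the two-sided p value is computed by summing the hypergeometric probabilities of all tables no more probable than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. ALL_NAT, NAT); columns are allele classes
    (e.g. alternate, reference). All margins are treated as fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is at most as probable as the observed one;
    # the small tolerance protects against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)

if __name__ == "__main__":
    # Hypothetical counts, for illustration only: 6 of 10 alleles in a
    # patient group vs. 20 of 118 alleles in a control group.
    p = fisher_exact_two_sided(6, 4, 20, 98)
    print(f"p = {p:.4g}; significant at 0.05: {p <= 0.05}")
```

An exact test is a natural choice here because one group contains only five individuals, so expected cell counts are too small for a chi-square approximation to be reliable.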

Results
The analysis of the exomes of the 64 Amerindians investigated in the present study revealed the presence of 125 variants, seven of which were new. The SnpEff software [20] was used to annotate and predict the effects of these variants. This procedure classifies the impact of the variants in four categories: (i) modifier (no evidence of impact), (ii) low (no apparent alteration of protein function), (iii) moderate (some alteration of protein function), and (iv) high (high level of impact on protein function). The remaining 118 variants identified in the 17 genes investigated in the present study are described in Table S3. Of the seven new variants, four were classified as modifiers (Table 1), two as low effect, and one as moderate. The majority of these new variants were found in the ALL_NAT group, at frequencies of less than 0.1%.
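To make the four impact categories concrete, the sketch below tallies SnpEff impact labels from the ANN entry of a VCF INFO field. It relies on the standard SnpEff ANN layout, in which each annotation is a pipe-separated record whose third subfield is the putative impact (HIGH, MODERATE, LOW, or MODIFIER). The function names and the example INFO strings are hypothetical, and taking only the first annotation per variant is a simplification, on the understanding that SnpEff lists the highest-impact effect first.

```python
from collections import Counter

IMPACT_LEVELS = ("HIGH", "MODERATE", "LOW", "MODIFIER")

def snpeff_impact(info_field):
    """Return the impact label of the first ANN annotation, or None."""
    for item in info_field.split(";"):
        if item.startswith("ANN="):
            first_annotation = item[len("ANN="):].split(",")[0]
            subfields = first_annotation.split("|")
            # Subfield order: Allele | Annotation | Annotation_Impact | ...
            if len(subfields) > 2 and subfields[2] in IMPACT_LEVELS:
                return subfields[2]
    return None

def tally_impacts(info_fields):
    """Count variants per impact category (None = no usable annotation)."""
    return Counter(snpeff_impact(field) for field in info_fields)

if __name__ == "__main__":
    # Made-up INFO strings for illustration; not variants from this study.
    example_info = [
        "DP=40;ANN=A|missense_variant|MODERATE|GENE1|ID1|transcript",
        "DP=35;ANN=T|synonymous_variant|LOW|GENE2|ID2|transcript",
        "DP=28;ANN=G|intron_variant|MODIFIER|GENE3|ID3|transcript",
        "DP=12",
    ]
    print(tally_impacts(example_info))
```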
As some of the variants were not covered adequately in one or other of the groups (NAT or ALL_NAT), they were excluded from the comparison of allele frequencies. This left 64 variants that were included in the analysis of association. The frequencies of these variants are compared between groups in Table 2. The analyses revealed significant differences between the two groups in six variants of five genes: PIP4K2A (variants rs2765997 and rs1053454), CDKN2A (rs11515), IGF2BP1 (rs8068981), USP7 (rs2304465), and WWOX (rs3764342). The allele frequencies of these six variants recorded in the ALL_NAT group were also compared with those recorded in the 1000 Genomes Project for the five continental populations (AFR, AMR, EAS, EUR, and SAS). The comparisons are shown in Table 3 and the p values in Table 4.
The analyses revealed significant differences between the Amerindian group (ALL_NAT) and all the continental populations (AFR, AMR, EAS, EUR, and SAS) in the frequency of the rs11515 variant of the CDKN2A gene. The frequency of the rs8068981 variant of the IGF2BP1 gene was also significantly different from that of the EAS population, while the frequency of the rs2765997 variant of the PIP4K2A gene was significantly different from those of the AFR and EAS populations (Table 3).
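The exclusion step described above can be sketched as a simple filter that keeps a variant only when both groups have enough called alleles, and then reports the alternate allele frequency per group. The text does not state the coverage criterion actually used, so the threshold parameter, the data layout, and the placeholder variant names below are all assumptions made for illustration.

```python
def allele_frequency(alt_count, called_alleles):
    """Alternate allele frequency, or None when no alleles were called."""
    return alt_count / called_alleles if called_alleles else None

def filter_covered(variants, min_called_alleles):
    """Keep variants with at least min_called_alleles in every group.

    variants maps a variant ID to {group: {"alt": int, "called": int}}.
    Returns {variant ID: {group: alternate allele frequency}}.
    """
    kept = {}
    for variant_id, groups in variants.items():
        if all(g["called"] >= min_called_alleles for g in groups.values()):
            kept[variant_id] = {
                name: allele_frequency(g["alt"], g["called"])
                for name, g in groups.items()
            }
    return kept

if __name__ == "__main__":
    # Invented counts; "var_A" and "var_B" are placeholders, not real IDs.
    example = {
        "var_A": {"NAT": {"alt": 20, "called": 118},
                  "ALL_NAT": {"alt": 6, "called": 10}},
        "var_B": {"NAT": {"alt": 3, "called": 110},
                  "ALL_NAT": {"alt": 1, "called": 2}},
    }
    # Hypothetical threshold: require at least 8 called alleles per group.
    print(filter_covered(example, min_called_alleles=8))
```

With these invented inputs, var_B is dropped because only two alleles were called in the patient group, which mirrors the reason some variants were left out of the association analysis.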

Discussion
The incidence of ALL is relatively high in populations with a high degree of native American ancestry, such as Latin American and Hispanic populations, which has been attributed, in part, to the contribution of molecular markers associated with a high risk of ALL in these populations [9,12–14,21]. Despite this known association, no genomic data are available on the susceptibility to ALL of traditional Amerindian populations from Brazilian Amazonia. In the present study, we investigated the exomes of 17 of the principal genes associated with susceptibility to ALL in samples of indigenous Amazonian populations, including ALL patients and cancer-free individuals. It would be valuable to determine whether the results observed for the markers investigated here in Amerindian populations would show the same profile in a larger cohort of leukemic patients from the same ethnic group. However, samples from indigenous patients with ALL are difficult to obtain, given the rarity of the disease and the fact that these populations inhabit remote and difficult-to-access rural regions, which hampers the care and clinical follow-up of these patients.
During the study, we identified seven new variants in four genes (IKZF1, IKZF3, WWOX, and ZPBP2), most of which were present in the ALL_NAT group, that is, Amerindian ALL patients. These genes are known to have alleles associated with some level of risk in the etiology of ALL in different populations [17,22–24]. Given this, we believe that the new variants identified in the present study should be investigated further as potential risk factors for the incidence of ALL in Amerindian populations.
We also identified six variants of five genes that we associated with susceptibility to ALL (CDKN2A_rs11515, PIP4K2A_rs2765997 and rs1053454, IGF2BP1_rs8068981, WWOX_rs3764342, and USP7_rs2304465) given the significant differences in the frequencies recorded in the two Amerindian groups (ALL patients and cancer-free individuals), and between the ALL group and populations from other parts of the world.
The CDKN2A gene plays an important role in leukemogenesis. The rs11515 variant is located in the 3′-untranslated region (UTR) of the CDKN2A gene, and is known to be associated with a number of different types of cancer [25], including breast cancer [26], glioblastoma [27], melanoma [28], and colorectal cancer [29]. However, no data are available on the possible influence of this variant on the risk of developing ALL. In the present study, we recorded a high frequency of this variant in the Amerindian population, which was significantly higher than that recorded in other populations around the world. This indicates that research on this variant should be prioritized for the identification of its potential role in the etiology of ALL in this population.
A number of previous studies of genetic polymorphisms in the PIP4K2A gene have identified an association with susceptibility to ALL [30,31]. We identified an association with ALL in two variants (rs2765997 and rs1053454) of the PIP4K2A gene, given that both were more frequent in the Amerindian population, in comparison with the other continental populations, which may reflect a potential correlation with the risk of ALL.
The IGF2BP1 protein is expressed in a number of different types of cancer, including leukemia. In a recent study based on in vivo and in vitro analyses, Elcheva et al. [32] found evidence of a significant correlation between IGF2BP1 and the aggressiveness of leukemia, through the persistence of tumorigenicity by increasing critical transcriptional and metabolic regulators. These authors also emphasized that the IGF2BP1 gene is often upregulated in many types of malignant disease, and is not expressed in most normal tissue, which means that it is a potentially important target for anticancer therapy. The results of the present study indicate that the rs8068981 variant of the IGF2BP1 gene is present at similar frequencies in the Amerindians with ALL and the European population, but at a significantly different frequency from that of the East Asian population.
We also identified in the Amerindian population a positive association with the rs3764342 variant of the WWOX gene, which has a suppressor role in solid cancers, with the loss of function resulting in alterations in the adhesion of the cancerous cells to the extra-cellular matrix, which affects cell migration and metastasis. A number of studies have contributed to the description of the role of this gene in leukemic malignancies. Luo et al. [33] found that the expression of the WWOX mRNA and protein is significantly reduced or absent in cases of leukemia and their cell lines in comparison with the controls, which is consistent with the findings of Cui et al. [34], who evaluated patients diagnosed with different types of leukemia.
In the present study, the rs2304465 variant of the USP7 gene was also more frequent in Amerindian patients. Jin et al. [35] found that USP7 is associated with the deubiquitination and stabilization of the NOTCH1 oncoprotein, which contributes to the control of T-cell growth in lymphoblastic leukemia.

Conclusions
The present study is the first to investigate the exomes of genes involved in the etiology of Acute Lymphoblastic Leukemia (ALL) in Amerindian populations from the Amazon region. The results of the study provide important genetic data related to the etiology of ALL in populations for which genetic investigations are scarce. The study also contributes to the identification of variants of potential risk involved in the etiology of ALL in Amerindian populations, as well as in other Brazilian populations formed through a high level of admixture with this indigenous group.
Data Availability Statement: Not applicable.