Genetic Background Studies of Eight Common Beta Thalassemia Mutations in Thailand Using β-Globin Gene Haplotype and Phylogenetic Analysis

Single nucleotide polymorphisms are informative for haplotype analysis associated with genetic background and clinical linkage studies of β-thalassemia mutations. Hence, the aim of this study was to investigate five polymorphisms (codon 2 (C/T), IVS II-16 (C/G), IVS II-74 (G/T), IVS II-81 (C/T) and the Hinf I (T/A) polymorphism) on the β-globin gene, related to eight common β-thalassemia mutations in Thailand, including NT-28 (A > G), codon 17 (A > T), codon 19 (A > G), Hb E (G > A), IVS I-1 (G > C), IVS I-5 (G > C), codon 41/42 (-TTCT) and IVS II-654 (C > T). The strongest LD (100%) between the β-thalassemia mutation allele and all five SNPs was found in NT-28 (A > G), codon 17 (A > T) and codon 19 (A > G). In the haplotype analysis, we found three haplotypes (H1, H2 and H7) related to Hb E, whereas we found only two haplotypes related to codon 41/42 (-TTCT) (H1, H3) and to IVS I-1 (G > C) (H3, H4). Notably, each of the remaining β-thalassemia mutations was related to a single haplotype. Furthermore, phylogenetic tree analysis revealed three clusters of these common β-thalassemia mutations in the Thai population and enabled us to determine the origin of these mutations. Our results indicate that the four intragenic polymorphisms and the Hinf I polymorphism could be informative in genetic background analysis, population studies and for predicting the severity of β-thalassemia in Thailand.


Introduction
β-Thalassemia is caused by defects in the β-globin gene on chromosome 11. More than 200 mutations have been reported worldwide (http://globin.cse.psu.edu; accessed on 20 June 2022). These genetic defects result from single nucleotide substitutions and from deletions or insertions of oligonucleotides that lead to a shift in the reading frame. In Southeast Asia and Thailand, severe β-thalassemia diseases are related to homozygous β-thalassemia, compound heterozygous β-thalassemia and Hb E [1,2]. In Thailand, more than 30 β-thalassemia mutations have been reported, and the frequency of β-thalassemia carriers is high in each region (between 3% and 9%). However, eight common β-thalassemia mutations (codon 41/42 (-TTCT), codon 17 (A > T), NT-28 (A > G), IVS II-654 (C > T), codon 19 (A > G), codon 26; Hb E (G > A), IVS I-1 (G > C) and IVS I-5 (G > C)) have been reported to represent more than 85% of the total β-thalassemia mutations in Thailand [3][4][5]. Several previous studies have used haplotype and phylogenetic analyses to examine the genetic relationship between populations, the chromosomal background of the gene and linkage, as well as differences in clinical phenotypic expression [6][7][8]. Single nucleotide polymorphisms (SNPs) within the β-globin gene cluster on chromosome 11 have been identified, with seven to eight polymorphic sites commonly studied [9,10]. However, because of the presence of recombination "hot spots" in the region, the β-globin gene haplotype can be divided into 5′ and 3′ sub-haplotypes, corresponding to the regions 5′ and 3′ of the "hot spot". The 5′ haplotypes are of limited use for studying the genetic background of β-thalassemia mutations. Only two DNA polymorphisms in the 3′ haplotype are useful for studying the genetic background of β-globin genes: Ava II (IVS II-16) (rs10768683; C > G), located within the β-globin gene, and Hinf I (rs10837631; T > A), located 3′ to the β-globin gene [11,12].
Newly described intragenic polymorphisms of the β-globin gene are informative in haplotype analysis associated with genetic background and clinical linkage studies of β-thalassemia mutations [13][14][15]. However, studies of these intragenic polymorphisms have yet to be comprehensively reported for the Thai population [16]. Hence, the aim of this study was to analyze four intragenic polymorphisms (codon 2 (C/T), IVS II-16 (C/G), IVS II-74 (G/T) and IVS II-81 (C/T)) and one 3′ haplotype polymorphism (Hinf I (T/A)) of the β-globin gene in relation to eight common β-thalassemia mutations in Thailand, using haplotype and phylogenetic analysis.

Genotyping of Intragenic β-Globin Gene Polymorphisms and the Hinf I Polymorphism
Four intragenic β-globin gene polymorphisms, including codon 2 (C/T), IVS II-16 (C/G), IVS II-74 (G/T) and IVS II-81 (C/T), were genotyped using direct DNA sequencing on an ABI PRISM™ 3130 XL analyzer (Applied Biosystems, Foster City, CA, USA), as described elsewhere [5]. Only the samples with the codon 41/42 (-TTCT) mutation were genotyped for the four polymorphisms by multiplexed targeted sequencing, using barcodes on a next-generation sequencing platform. Hinf I polymorphism genotyping was conducted using allele-specific PCR (AS-PCR), as described elsewhere [17]. The basic statistics for the five polymorphisms, including the allele frequency, genotype frequency and minor allele frequency (p-value < 0.05), were calculated using PEAS V1.0, a package for elementary analysis of SNP data. Fisher's exact test, performed with MINITAB release 14.12.0 statistical software, was used to compare the derived allele frequency (DAF) of each polymorphic marker between the wild type and the β-thalassemia genotypes. A p-value < 0.05 was considered to be statistically significant.
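The DAF comparison described above can be illustrated with a minimal sketch. It assumes a 2 × 2 table of allele counts (derived and ancestral alleles in one β-thalassemia genotype group and in the wild type) and computes a two-sided Fisher's exact test from the hypergeometric distribution. The function names and the allele counts are hypothetical and are not data from this study, which used MINITAB for these calculations.

```python
from math import comb


def derived_allele_frequency(derived, ancestral):
    """DAF = derived allele count / total allele count."""
    return derived / (derived + ancestral)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(low, high + 1))
               if p <= p_observed * (1 + 1e-9))


# Hypothetical allele counts as (derived, ancestral); not data from this study.
mutant_group = (8, 2)
wild_type = (1, 5)

p_value = fisher_exact_two_sided(*mutant_group, *wild_type)
print(f"DAF mutant group: {derived_allele_frequency(*mutant_group):.2f}")
print(f"DAF wild type:    {derived_allele_frequency(*wild_type):.2f}")
print(f"two-sided p = {p_value:.4f}; significant at 0.05: {p_value < 0.05}")
```

With small groups, as in several genotype categories here, an exact test of this kind is generally preferred over a chi-square approximation.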

Haplotype and Phylogenetic Analysis
Haplotype patterns (constructed using data from the four intragenic β-globin gene polymorphisms and the Hinf I polymorphism) and the pairwise linkage disequilibrium (LD) test were determined using HAPLOVIEW 4.2 software, considering haplotypes with a frequency >0.5%. The Hardy-Weinberg equilibrium (HWE) was tested in the β-thalassemia mutation subgroup analysis. The D′ value of the linkage disequilibrium test was calculated and shown in an LD pattern [18]. The phylogenetic tree was constructed as a dendrogram based on the haplotype pattern. DendroUPGMA software (http://genomes.urv.cat/UPGMA/, accessed on 10 May 2022) was used and the Jaccard (Tanimoto) coefficient was applied, with default settings; 100 bootstrap replications were performed. FigTree v.1.4.0 software was then used to visualize the phylogenetic tree.
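As an illustration of the clustering step, the following minimal sketch computes the Jaccard (Tanimoto) coefficient between binary profiles and builds a UPGMA tree from the resulting distances. It is not the DendroUPGMA implementation: the presence/absence encoding of haplotypes per mutation, the use of 1 minus the Jaccard similarity as the distance, and the function names are assumptions made for this example, and bootstrapping is omitted. The three profiles encode only the haplotype assignments reported in this study for Hb E, codon 41/42 and IVS I-1, so the output is not intended to reproduce the published tree.

```python
from itertools import combinations


def jaccard_similarity(x, y):
    """Shared presences divided by positions present in either profile."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 1.0


def upgma(profiles):
    """Return a nested-tuple tree built by average-linkage (UPGMA) clustering."""
    names = sorted(profiles)
    size = {name: 1 for name in names}
    # Assumed distance: 1 - Jaccard similarity.
    dist = {(a, b): 1.0 - jaccard_similarity(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}

    def d(u, v):
        return dist[(u, v)] if (u, v) in dist else dist[(v, u)]

    active = list(names)
    while len(active) > 1:
        # Merge the closest pair of clusters.
        u, v = min(combinations(active, 2), key=lambda pair: d(*pair))
        merged = (u, v)
        for w in active:
            if w != u and w != v:
                # Size-weighted average of the distances to the merged members.
                dist[(merged, w)] = ((size[u] * d(u, w) + size[v] * d(v, w))
                                     / (size[u] + size[v]))
        size[merged] = size[u] + size[v]
        active = [w for w in active if w != u and w != v] + [merged]
    return active[0]


# Columns: H1, H2, H3, H4, H7 (1 = mutation observed on that haplotype).
profiles = {
    "Hb E": [1, 1, 0, 0, 1],
    "codon 41/42": [1, 0, 1, 0, 0],
    "IVS I-1": [0, 0, 1, 1, 0],
}
tree = upgma(profiles)
print(tree)
```

The nested tuples can be read as a dendrogram: the innermost pair was merged first and is therefore the most similar under this encoding.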

Results
The hematological data for 163 subjects are shown in Table 1. The recruited samples were categorized into 14 groups according to β-thalassemia genotype. The most severe form of anemia was found in compound heterozygous β-thalassemia/Hb E cases, represented by low Hb and Hct values, whereas homozygous Hb E and β-thalassemia carriers showed minimal anemia with hypochromic (low MCH), microcytic (low MCV) red blood cells. Hb analyses were also included and related to the β-thalassemia genotypes.
Four intragenic β-globin gene polymorphisms (codon 2 (C/T), IVS II-16 (C/G), IVS II-74 (G/T) and IVS II-81 (C/T)) located within the β-globin gene and the Hinf I polymorphism located 3′ to the β-globin gene were genotyped using DNA sequencing and AS-PCR, respectively. Table 2 shows the allele frequency (ancestral allele and derived allele), categorized into 11 groups. The compound heterozygous β-thalassemia/Hb E genotypes were combined into one group because few such samples were available. The derived allele frequency (DAF) of each polymorphism (T of codon 2, G of IVS II-16, T of IVS II-74, T of IVS II-81 and A of the Hinf I polymorphism) was calculated and compared between the β-thalassemia genotypes and the wild type. Fisher's exact test revealed a significant DAF difference between the wild type and five β-thalassemia genotypes (βE/βthal, βE/βE, βN/β17, βN/β19 and βN/βIVS II-654) in codon 2. Furthermore, the DAF showed a significant difference between the wild type and four β-thalassemia genotypes (βE/βE, βN/β17, βN/β19 and βN/βIVS II-654) in IVS II-16, whereas the DAF in IVS II-74 showed a significant difference between the wild type and three β-thalassemia genotypes (βN/β41/42, βN/βIVS I-1 and βN/βIVS II-654). In the case of the Hinf I polymorphism, the DAF showed a significant difference between the wild type and five β-thalassemia genotypes (βE/βE, βN/β41/42, βN/β17, βN/β19 and βN/βIVS I-1). However, no significant difference was found between the wild type and any β-thalassemia genotype in the case of the IVS II-81 polymorphism.
Table 2. Allele frequencies of five single nucleotide polymorphisms within the β-globin gene and 3′ to the β-globin gene in 11 different β-thalassemia genotypes in the Thai population. The first and second allele columns indicate the ancestral allele and the derived allele, respectively. Bold text with superscript * indicates that Fisher's exact test revealed a significant DAF difference between the wild type and the β-thalassemia genotype.

Discussion
In Thailand, more than 30 β-thalassemia mutations have been reported, with a carrier frequency ranging from 3% to 9% depending on the region [2,4,5]. In this study, we focused on eight common β-thalassemia mutations and sought to demonstrate their genetic background in Thailand. Several studies have investigated Hb E and reported the results of haplotype and phylogenetic analysis [19][20][21]. However, haplotype and phylogenetic analyses have yet to be carried out for the intragenic polymorphisms within the β-globin gene in Thailand. Here, we report a constructed haplotype of common β-thalassemia mutations in Thailand, based on four intragenic β-globin gene polymorphisms and the Hinf I polymorphism. The novel haplotype constructed in this study may provide a better representation of the actual genetic background of these mutations than those previously constructed. Previous β-globin gene haplotype studies were limited by a recombination hot spot near the 5′ end of the β-globin gene, which may lead to mistakes in detecting mutated alleles [8,11,13,22]. To increase the accuracy of indirect genetic background analysis, new polymorphic sites and the molecular analysis of β-thalassemia have been evaluated [13]. Table 1 shows that some β-thalassemia genotypes are represented by only a few samples, which may influence the polymorphism analysis; examples are compound heterozygous Hb E/β-thalassemia (codon 19), Hb E/β-thalassemia (IVS I-5) and Hb E/β-thalassemia (codon 17). In Table 2, we pooled these genotypes into a single compound heterozygous β-thalassemia/Hb E group for the polymorphism analysis. Table 2 shows the allele frequency of the five polymorphisms within the β-globin gene and at the 3′ end of the β-globin gene.
In the Thai population, the wild type was found to have high levels of heterogeneity in codon 2 (C/T), IVS II-16 (C/G) and the Hinf I polymorphism, with a minor allele frequency (MAF) of more than 30%, whereas IVS II-74 (G/T) and IVS II-81 (C/T) were found to have the least heterogeneity (MAF < 5%). However, these two intragenic polymorphisms (IVS II-74 (G/T) and IVS II-81 (C/T)) have been found to have high levels of heterogeneity in other populations [13,14,23]. We compared the DAF of each polymorphism between each β-thalassemia genotype and the wild type using Fisher's exact test. Compound heterozygous β-thalassemia/Hb E was found to be significantly different for codon 2 (C/T). Homozygous Hb E was found to be significantly different for codon 2 (C/T), IVS II-16 (C/G) and the Hinf I polymorphism. Heterozygous β-thalassemia codon 41/42 (-TTCT) and IVS I-1 (G > C) were found to be significantly different for IVS II-74 (G/T) and the Hinf I polymorphism, whereas heterozygous β-thalassemia codon 17 (A > T) and codon 19 (A > G) were found to be significantly different for codon 2 (C/T), IVS II-16 (C/G) and the Hinf I polymorphism. Lastly, heterozygous β-thalassemia IVS II-654 (C > T) was significantly different for codon 2 (C/T), IVS II-16 (C/G) and IVS II-74 (G/T). However, compound heterozygous β-thalassemia (NT-28 (A > G))/Hb E, heterozygous β-thalassemia NT-28 (A > G) and heterozygous β-thalassemia IVS I-5 (G > C) were not found to be significantly different from the wild type for any of the five polymorphisms. Interestingly, several β-thalassemia genotypes were found to be associated with intragenic polymorphisms, which could be useful markers for linkage analysis and prenatal diagnosis. Furthermore, these intragenic polymorphisms could be routinely investigated in β-thalassemia mutation detection, based on direct DNA sequencing.
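For readers less familiar with the MAF thresholds used above, a minimal sketch of how allele frequencies and the MAF follow from diploid genotype counts is given here. The function names and the genotype counts are hypothetical and are not data from this study, which used PEAS V1.0 for these statistics.

```python
def allele_frequencies(n_hom_first, n_het, n_hom_second):
    """Return (first, second) allele frequencies from diploid genotype counts."""
    total_alleles = 2 * (n_hom_first + n_het + n_hom_second)
    first = (2 * n_hom_first + n_het) / total_alleles
    second = (2 * n_hom_second + n_het) / total_alleles
    return first, second


def minor_allele_frequency(n_hom_first, n_het, n_hom_second):
    """MAF is the frequency of the less common of the two alleles."""
    return min(allele_frequencies(n_hom_first, n_het, n_hom_second))


# Hypothetical genotype counts for two SNPs; not data from this study.
common_snp = (20, 20, 10)   # e.g. C/C, C/T, T/T
rare_snp = (46, 4, 0)       # e.g. G/G, G/T, T/T

print(f"MAF common SNP: {minor_allele_frequency(*common_snp):.2f}")
print(f"MAF rare SNP:   {minor_allele_frequency(*rare_snp):.2f}")
```

Under these assumed counts, the first SNP would fall in the high-heterogeneity range (MAF above 30%) and the second in the low range (MAF below 5%).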
The strong LD (D′ value = 1) between each of three β-thalassemia mutation alleles (NT-28 (A > G), codon 17 (A > T) and codon 19 (A > G)) and all five SNPs is useful for predicting the genetic background of these β-thalassemia mutations in the Thai population. Furthermore, we found that these β-thalassemia mutations have a single origin, based on their single haplotype pattern.
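To make the meaning of complete LD concrete, the following minimal sketch computes the normalized LD coefficient (Lewontin's D′, reported here as an absolute value) from the frequency of one two-locus haplotype and the two allele frequencies. The function name and the frequencies are hypothetical and are not data from this study; the study itself obtained its LD values from HAPLOVIEW.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute normalized LD coefficient |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return abs(d) / d_max


# Hypothetical case 1: a mutation allele (frequency 0.10) that always occurs
# together with the derived SNP allele (frequency 0.40).
complete_ld = d_prime(p_ab=0.10, p_a=0.10, p_b=0.40)

# Hypothetical case 2: the same alleles, but only 70% of the mutation-bearing
# chromosomes carry the derived SNP allele.
partial_ld = d_prime(p_ab=0.07, p_a=0.10, p_b=0.40)

print(f"complete LD: D' = {complete_ld:.2f}")
print(f"partial LD:  D' = {partial_ld:.2f}")
```

A D′ of 1 in the first case reflects that the mutation allele is found on only one SNP background, which is the pattern expected when a mutation sits on a single haplotype.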
Haplotypes provide information for a better understanding of the origin of mutations and the evolutionary relationships between populations, as well as for clinical studies [24,25]. Haplotype analysis revealed five different haplotypes of the wild type in the Thai population, based on intragenic polymorphisms within the β-globin gene and the Hinf I polymorphism. A previous study reported finding framework 1 (16.32%), framework 2 (44.63%) and framework 3 (39.05%) [19]. In this study, we found only frameworks 1, 2 and 3a, but not framework 3. This may be because framework 3 can be differentiated by the IVS II-81 (C/T) polymorphism, whereas the previous study analyzed only 3′ haplotypes, including the Ava II and Hinf I polymorphisms. In this study, we found that the haplotype pattern of Hb E was related to haplotypes H1 (framework 3a), H2 (framework 2) and H7 (framework 3a), which is in line with the results of the previous study [19]. The haplotype patterns of the other β-thalassemia mutations have yet to be reported. Interestingly, we found two haplotype patterns in codon 41/42 (-TTCT) (H1, H3) and in IVS I-1 (G > C) (H3, H4), whereas a single haplotype was found in NT-28 (A > G), codon 17 (A > T), codon 19 (A > G), IVS I-5 (G > C) and IVS II-654 (C > T), indicating that the former mutations have two points of origin and the latter a single origin in the Thai population.
We constructed a phylogenetic tree of common β-thalassemia mutations in the Thai population, as shown in Figure 2. With regard to the evolution of the β-thalassemia mutations, Cluster I was found to be associated with codon 17 (A > T), NT-28 (A > G), codon 41/42 (-TTCT) and IVS I-5 (G > C), which originate from the same branch, whereas the two Hb E origins differed from the others. This cluster represents the highest level of mutation change (from the root of the tree upwards). Furthermore, the short branches represent younger mutations in this region. Cluster II shows the least amount of evolution and is associated only with codon 19 (A > G) and Hb E, with codon 19 (A > G) also showing a single origin in the Thai population. Cluster III was found to be related to codon 41/42 (-TTCT) and closely related to IVS II-654 (C > T), whereas IVS I-1 (G > C) showed two origins that were separate from these two in Cluster III. This phylogenetic tree may provide a better representation than previously reported conceptualizations of the overall genetic background and evolution of common β-thalassemia mutations in Thailand.

Conclusions
Four intragenic polymorphisms on the β-globin gene and the Hinf I polymorphism may be useful markers for linkage analysis with β-thalassemia mutations in Thailand, and all five SNPs are useful for predicting the genetic background of these β-thalassemia mutations. Phylogenetic tree analysis shows demographic details of the evolution of eight common β-thalassemia mutations in Thailand. These polymorphisms should be selected when predicting clinical phenotypic correlations in the future.
Author Contributions: R.K. performed the experiments, analysis of data and wrote the initial manuscript. W.T. was involved in conceptualization and analysis of data. P.S., S.T., A.P., N.T. and R.R. performed the experiments and data analysis. W.J. was involved in conceptualization, acquisition of the grants, analysis of data, editing and guaranteeing the paper. All authors approved the final version. All authors have read and agreed to the published version of the manuscript.