Haplogroup Distribution of 309 Thais from Admixed Populations across the Country by HVI and HVII Sanger-Type Sequencing

The mitochondrial DNA (mtDNA) control region sequences for the hypervariable regions I (HVI) and II (HVII) of 309 Thai citizens were investigated using Sanger-type sequencing to generate an mtDNA reference dataset for forensic casework, and the haplogroup distribution within geographically proximal Asian populations was analyzed. The population sample set contained 264 distinct haplotypes and showed high haplotype diversity, low matching probability, and high powers of discrimination, at 0.9985, 0.4744%, and 0.9953, respectively, compared with previous reports. Subhaplogroup F1a showed the highest frequency in the Thai population, similar to Southeast Asian populations. The haplotype frequencies in the northern, northeastern, and southern populations of Thailand illustrate the relevance of social, religious, and historical factors in the biogeographical origin of the admixed Thai population as a whole. The HVI and HVII reference datasets will be useful for forensic casework applications, with improved genetic information content and discriminatory power compared to currently available techniques.


Introduction
The analysis of human mitochondrial DNA (mtDNA) has been conducted globally in recent years, following the standard forensic casework guidelines using Sanger-type sequencing (STS) [1][2][3][4][5]. Compared with the nuclear genome, mtDNA is present in high copy numbers within the cell, which increases the likelihood of recovering usable mtDNA data [3,6]. The mtDNA typing of hair, saliva, and bone samples collected at crime scenes is frequently used as the DNA-based testing option [1,3,[7][8][9][10]. Because mtDNA is maternally inherited and does not recombine, its power to discriminate between individuals is limited; nevertheless, mtDNA is frequently applied in routine forensic applications [7,11]. The noncoding control region is most suited for mtDNA typing and shows a high degree of genetic variability at the population level [8]. Historically, mtDNA typing has focused on the 342 bp (16,024-16,365) hypervariable region I (HVI), and the 268 bp (73-340) HVII, with reference to the human mtDNA genome sequence (GenBank accession number: NC_012920.1) [3,[12][13][14][15]. The probability of previously reported mtDNA (13 • 63) populations. The study focuses on single-source reference specimens for the Thai population. This research was approved by the Ethical Clearance Committee on Human Rights related to research involving human subjects, and the Faculty of Medicine, Ramathibodi Hospital, Mahidol University (COA. MURA2020/825). Blood samples were collected as a source of DNA, from which genomic DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany). DNA quantity was determined by spectrophotometry (NanoDrop, Thermo Fisher Scientific, Waltham, MA, USA).

Amplification and Sequencing of HVI and HVII by STS
The nucleotide sequences of HVI (16,024-16,365) and HVII (73-340) [13] of the mtDNA control region were amplified using primers (HV1F, 5′-CTCCACCATTAGCACCCAA-3′; HV1R, 5′-ATTTCACGGAGGATGGTG-3′; HV2F, 5′-CACCCTATTAACCACTCACG-3′; and HV2R, 5′-CTGTTAAAAGTGCATACCGC-3′) specifically designed for the study. PCR amplification was performed in a total volume of 25 µL Master Mix, consisting of 10X Taq buffer containing 1.5 mM MgCl2, 0.2 mM dNTPs, 5.0 µM primers, 0.5 U AmpliTaq Gold DNA Polymerase (Life Technologies, Foster City, CA, USA), and 25 ng genomic DNA. Each primer pair was amplified individually in a separate PCR reaction tube. The amplification conditions for both primer pairs were: initial denaturation at 95 °C for 10 min, followed by 35 cycles of 94 °C for 20 s, 55 °C for 20 s, and 72 °C for 30 s, with a final extension at 72 °C for 10 min. The PCR products were separated by electrophoresis in 1% agarose gel and treated with ExoSAP-IT (USB, Cleveland, OH, USA) to eliminate the primers and unused dNTPs. Sequencing reactions were performed with BigDye Terminator version 1.1 (Applied Biosystems, Foster City, CA, USA), in accordance with the manufacturer's instructions, and purified using the DyeEx 2.0 Spin Kit (Qiagen). All reactions were sequenced using a 3130 Genetic Analyzer (Applied Biosystems), and the results were analyzed using SeqScape version 2.5 software (Applied Biosystems). The consensus sequences were aligned and compared with the revised Cambridge Reference Sequence (rCRS, NC_012920.1) following the nomenclature guidelines for mtDNA typing [13,50,51].
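The comparison of a consensus sequence with the reference can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the SeqScape workflow used in the study: the function name `call_substitutions`, the toy sequences, and the starting coordinate are hypothetical, the two sequences are assumed to be already aligned and of equal length, and insertions and deletions are ignored. Differences are written as position plus observed base, which is only a shorthand for a difference from the reference.

```python
def call_substitutions(reference, sample, start_pos):
    """List substitutions of `sample` relative to `reference`.

    Both strings must be pre-aligned and of equal length (no indels).
    `start_pos` is the reference coordinate of the first base.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference, sample)):
        if ref_base != obs_base:
            # Report the reference coordinate followed by the observed base.
            variants.append(f"{start_pos + offset}{obs_base}")
    return variants


# Hypothetical toy segment; these are NOT real rCRS bases.
toy_reference = "ACCTGA"
toy_sample = "ACTTGG"
variants = call_substitutions(toy_reference, toy_sample, start_pos=73)
print(variants)
```

A production workflow would additionally need to handle length variants and follow the alignment and nomenclature rules cited above.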

Data Analysis of mtDNA HVI and HVII
The summary statistics (number of haplotypes, haplotype diversity, matching probability, and power of discrimination) for combinations of HVI and HVII were based on pairwise comparisons of Thai populations. The matching probability (P) was calculated using the equation $P = \sum_i x_i^2$. The power of discrimination (PD) was determined using the equation $PD = 1 - \sum_i x_i^2$. The haplotype diversity (HD) was determined using the equation $HD = \frac{n}{n-1}\left(1 - \sum_i x_i^2\right)$, where $x_i$ is the frequency of each mtDNA haplotype, and $n$ is the total number of samples [52][53][54].
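The three statistics can be computed directly from a list of haplotype labels, one label per individual. The following Python sketch is illustrative only: the function name `forensic_stats` and the toy labels are hypothetical, and the haplotype diversity uses the $n/(n-1)$ correction of the sum of squared frequencies.

```python
from collections import Counter


def forensic_stats(haplotypes):
    """Return (P, PD, HD) for a list of haplotype labels, one per individual."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies x_i = count_i / n.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    p_match = sum_sq                # matching probability P
    pd = 1.0 - sum_sq               # power of discrimination PD
    hd = n / (n - 1) * pd           # haplotype diversity HD
    return p_match, pd, hd


# Hypothetical toy sample of five individuals carrying four haplotypes.
toy = ["H1", "H1", "H2", "H3", "H4"]
p_match, pd, hd = forensic_stats(toy)
print(p_match, pd, hd)
```

As a consistency check on the reported values, a matching probability of 0.4744% with n = 309 gives PD = 1 - 0.004744 ≈ 0.9953 and HD = (309/308) × 0.995256 ≈ 0.9985.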
We analyzed the differentiation among Thai populations based on the administrative regions (northern, N; northeastern, NE; central, C; and southern, S), using pairwise $F_{ST}$ values computed with the "stamppFst" function in the StAMPP package in R [55], following the method proposed by Wright [56] and updated by Weir and Cockerham [57]. Empirical p-values were obtained by comparing the observed value of $F_{ST}$ with 1000 bootstrap pseudoreplicates. Principal component analysis (PCA) was also performed on the dataset, using the function "prcomp" in the "stats" package in R [58]. The results from the PCA were visualized using the function "fviz_pca_ind" in the package "factoextra" in R [59].
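The idea of comparing an observed differentiation value with resampled pseudoreplicates can be sketched in Python. This is a deliberately simplified stand-in and not the procedure used in the study: it computes a haplotype-frequency fixation index of the form $(H_T - H_S)/H_T$ rather than the Weir and Cockerham estimator implemented in StAMPP, and it obtains the empirical p-value by permuting region labels rather than by bootstrapping. All function names and the toy populations are hypothetical.

```python
import random
from collections import Counter


def gene_diversity(haps):
    """1 - sum of squared haplotype frequencies (no sample-size correction)."""
    n = len(haps)
    return 1.0 - sum((c / n) ** 2 for c in Counter(haps).values())


def simple_fst(pop_a, pop_b):
    """(H_T - H_S) / H_T for two populations; NOT the Weir-Cockerham estimator."""
    pooled = list(pop_a) + list(pop_b)
    # Size-weighted mean within-population diversity.
    h_s = (len(pop_a) * gene_diversity(pop_a)
           + len(pop_b) * gene_diversity(pop_b)) / len(pooled)
    h_t = gene_diversity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_p_value(pop_a, pop_b, n_perm=1000, seed=0):
    """Empirical p-value: share of label permutations with Fst >= observed."""
    rng = random.Random(seed)
    observed = simple_fst(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if simple_fst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy populations (haplogroup labels only, not study data).
north = ["F1a", "F1a", "B5a", "M7b"]
south = ["F1a", "R9b", "B5a", "M7b"]
observed, p_value = permutation_p_value(north, south, n_perm=200, seed=1)
print(observed, p_value)
```

A value near zero with a large p-value would be read, as in the analysis described above, as little evidence of differentiation between the two groups.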

Classification of mtDNA Haplogroup and Data Analysis
The mtDNA haplotypes were assigned to haplogroups based on the patterns of shared haplogroup-specific nucleotide polymorphisms, using HaploGrep2 with PhyloTree 17 [60], and the MITOMAP (https://www.mitomap.org, accessed on 15 March 2020) and EMPOP (https://www.empop.online, accessed on 15 March 2020) [61] online databases, based on HVI and HVII sequences. All generated data were deposited in the EMPOP database as rCRS variations, under reference number EMP00840 [62]. Haplogroup frequencies were visualized using the "ggplot" function [63] and analyzed for significant differentiation among the regions using the chi-square test in the "stats" package in R [58]. The workflow for sampling and data evaluation of the 309 samples is shown in Figure S1.
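The chi-square comparison of haplogroup counts across regions rests on the Pearson statistic for a contingency table. The Python sketch below computes only that statistic and its degrees of freedom; it does not reproduce the output of the R test, and the p-value step is omitted. The function name `chi_square_statistic` and the toy counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    `table` is a list of rows (e.g. regions) of counts (e.g. haplogroups).
    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical toy counts: 2 regions x 2 haplogroups (not study data).
toy_counts = [[10, 20],
              [20, 10]]
stat, df = chi_square_statistic(toy_counts)
print(stat, df)
```

Tables with many sparse cells, as can occur when 82 haplogroups are spread over four regions, may violate the usual expected-count assumptions of the test, so the statistic should be interpreted with care.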

Results
A total of 264 different haplotypes were observed, comprising 231 unique haplotypes, each found in a single individual (74.76% of individuals), and 33 haplotypes that were each shared by more than one individual. Each shared haplotype was carried by 2-5 individuals, accounting for 78 unrelated individuals in total (25.24%) (Table 1).
The number of haplotypes was estimated in relation to four regional Thai groups, namely, the northern, northeastern, central, and southern populations (Table S1). The haplotype diversity of the HVI/HVII combinations was 0.9985, and the matching probability and power of discrimination were 0.4744% and 0.9953, respectively (Table 1). We analyzed differentiation among the Thai population based on the administrative regions (N, NE, C and S). The pairwise $F_{ST}$ values among the populations were approximately zero and did not differ significantly from random expectation (p-value ≥ 0.715), suggesting little differentiation among the populations (Table 2).
All 264 haplotypes were subsequently classified into 82 haplogroups using HaploGrep2 with PhyloTree 17 and EMPOP [54]. The most frequent haplogroups were: F1a (13.92%, n = 43); B5a (8.41%, n = 26); M7b (6.47%, n = 20); R9b (5.83%, n = 18); and B4c (3.56%, n = 11) (Table 3, Figure 2). The haplogroup frequencies based on the HVI and HVII SNP data of the Thai population showed some degree of separation among the populations of different regions (Figure 3). However, the proportion of haplogroup frequencies did not differ significantly from random expectation (chi-square test, p = 1.00).
In this study, haplogroup frequencies were compared with those of the 12 neighboring countries in Southeast Asia, South Asia, and East Asia (Figure 4). The results show that the frequencies of haplogroups in Thailand are similar to the data from Southeast Asia, while the Thai population shows a slight differentiation from other Asian countries (Figure 4; chi-square test, p = 1.00). However, almost all of the Southeast Asian countries had higher frequencies of the F1a haplogroup compared to the other regions (Figure 4). The first two principal components (PCs) of the haplogroup frequencies accounted for 55.3% of the variation and show that Southeast Asian populations appear more similar to each other than to countries from the other regions (Figure 5).

Discussion
Forensic genetic analyses provide useful information for examining biological evidence found at crime scenes, for identifying missing people after mass disasters, and for inferring the cause and manner of death. The STS approach is accepted as the standard method for mtDNA typing using HVI and HVII [2,3]. However, this approach is labor-intensive, time-consuming, and costly as an analytical method. An alternative choice is required for practical mtDNA sequencing. Accurate human identification analysis is extremely important, and a reference database for an individual country can be used by forensic laboratories to report casework precisely. To establish a reference mtDNA database in forensic casework for Thailand, we compiled a high-quality HVI and HVII sequence dataset for 309 Thai individuals. A total of 264 haplotypes were observed, with a haplotype diversity of 0.9985. This diversity was higher than that recorded in the datasets of Thai populations (n = 190, 0.989; n = 124, 0.9958; n = 100, 0.9859) reported in previous studies [28,61,62,66], as well as in datasets for the Lao (n = 214, 0.790), Vietnamese (n = 187, 0.991), and Singaporean (n = 205, 0.9961) populations [15,23,[67][68][69]. In previous studies of a unique ethnic group from Chiang Mai, located in the north of Thailand (Tai, Mon-Khmer), the random matching probability and the power of discrimination were reported as 0.9980 and 0.9906, respectively [28,54]. For modern Thai individuals, comprising an admixed population, the random matching probability and the power of discrimination were 0.4744% and 0.9953, respectively. These findings suggest that the present mtDNA dataset exhibits high haplotype variation and is suitable as a reference dataset for human identification.
In this study, information on the single-nucleotide polymorphisms (SNPs) of HVI and HVII was classified into haplogroups. These groups may provide valuable information for determining the patterns of variation and population structures of maternal lineages to support human identification [36]. In Thailand, human identification is commonly performed using autosomal STR (short tandem repeat) markers, but this method is sometimes unsuccessful in skeletal-remains identification. Therefore, to improve the results, we applied mtDNA haplogroups for additional supporting evidence. The majority of haplotypes detected in Thai populations belong to the M and N macro haplogroups, and comprise three major haplogroups (M7, R9, and B), as also observed in Asian populations [15,21,36]. The subhaplogroup F1a showed the highest prevalence in Thai populations (13.92%), consistent with other Southeast Asian populations (Table S2) [16,36,[68][69][70] (Figures 2 and 5). This finding concurs with historical and geographical details, given that Thailand has a long tradition of granting political asylum to refugees from neighboring countries who have been persecuted as a result of their religion or ethnicity. Thailand is the only Southeast Asian country that has never been colonized by a European power [71,72]. For centuries, Vietnamese Christians, the Mon ethnic group from Myanmar, and political dissidents from Cambodia, have all sought and received shelter in Thailand [71]. Many ethnic groups are resident in Thailand, including Malays, Mon, Khmer, and various hill tribes, as a result of the historic migrations of those fleeing civil wars and economic crises [15,71]. By contrast, Chinese, and other East Asian populations, contain a high proportion of subhaplogroup D4, whereas the majority of populations in South Asian countries, such as Nepal and India, contain subhaplogroup G2a [73,74] (Table S2). 
These results collectively suggest that subhaplogroup proportions can be used to determine the geographic origin of human populations.
In order to shed more light on the genetic admixture of haplogroups in Thailand, we compared the current samples, organized into four distinct geographic groups, with various Asian populations (Figure 3). For the migration of various ethnic groups, originating from distinct geopolitical regions, into the same geographical area, dynamic gene flow was observed in India, Indonesia, Australia, Sweden, Macedonia, Brazil, and the USA [75][76][77][78][79]. Although F1a was the predominant subhaplogroup detected in Thailand, distinctively high proportions of F1a, and other subhaplogroups, were observed in the northern and northeastern populations compared with the central population. The Kingdom of Lan Na captures the cultural development of the northern Thai people, which originated more than 650 years ago, coincident with successive Kingdoms of Thailand [80]. The northern population of hill tribes was also impacted by sex-specific migration and cultural factors [46]. The predominant subhaplogroup of the northern population is F1a, which differs substantially from the other subhaplogroups in the population. Over 200 years ago, Burma expanded its influence to the Lan Na Kingdom, and Burmese populations contain a high proportion of F1a [36]. This suggests the possibility of genetic exchange between the two kingdoms, in accordance with the Lan Na territorial occupation dynamic. The Lan Na Kingdom merged with the Kingdom of Thailand 88 years ago [80], and the distinctively high proportion of F1a remains. These results concur with the general distribution of maternal lineages in the northern population of Thailand, observed by Kampuansai et al., and Kriengchutima et al. [47,54], whereas in the northeastern population, the frequencies of F1a and B5a were consistent with the proportion of subhaplogroups in Lao, in accordance with historic migration, as determined by Bodner et al. [16]. 
By contrast, the subhaplogroups in the southern population became isolated, as exemplified by the absence of B5a and B4 from the majority of the Thai population. This result agrees with the most frequent subhaplogroup in populations on the Malay Peninsula, as recorded by Tabbada et al. [39], and Maruyama et al. [40]. Religion possibly influenced marriage patterns because, in southern areas, the majority of residents are Muslims, similar to Malay populations [40]. It is likely that multidirectional cross-cultural marriages were influenced by social and historical conditions that varied significantly for each distinct geographical area. The modern Thai population reflects a high degree of genetic admixture as a result of recurrent historical intermarriage events, as also observed in the similar subhaplogroup proportions of the central population (Figure 1) [45,47].
During the twelfth and thirteenth centuries, Thai people were subjugated under the Khmers (in the area now known as Cambodia), leading to mtDNA exchange between the Tai-Kadai and Khmer groups throughout the central part of Thailand [29]. This might also reflect the proportion of haplogroups in the central and northeastern regions of Thailand, which show high frequencies of the subhaplogroups F1a and B5a, similar to the subhaplogroup distribution in Cambodia (Figure 4) [41]. One subhaplogroup M7b lineage was also detected several times in the northern and northeastern populations of the dataset and is also observed in southern China [21]. The Chinese colonization of Thailand occurred during the nineteenth and twentieth centuries, with the migration from South China into Southeast Asia of those fleeing national economic and political unrest [16,81]. This suggests that the emergence of subhaplogroup M7b was caused by Chinese migration to Southeast Asia ( Figure 4). However, a high proportion of the population retains haplogroup F1a. This result is consistent with a significant gender bias and contribution imbalance reported for parental lineages in the Chinese-Thai population caused by traditional ethnicity, together with maternal contribution [16]. Alternatively, Chinese colonization had a notable gender bias toward masculine contributions. Such migrant men interbred with local Thai residents, especially during the initial colonization phases [46]. mtDNA markers display a more asymmetrical distribution of ancestral population contributions to the current genetic pool [12]. The historical and cultural factors that motivated the admixture mechanisms in Thailand have resulted in the current admixed population. Such diverse biogeographical groups shaped the current Thai population, forging a genetically and culturally diverse, highly admixed country [70].
This study evaluated the utility of mtDNA HVI and HVII in the Thai population to develop a national reference database for real forensic casework using 309 specimens collected across the country. Diversity indices and other relevant forensic statistical parameters indicated a high degree of polymorphism suitable for human identification purposes. Workflow automation should also be considered when casework volumes are high or a backlog exists. The construction of a representative forensic national database is also essential, with the addition of a substantial number of specimens to evaluate possible genetic regional or ethnic stratification in the Thai population. However, in particular situations, HVI and HVII data do not provide sufficient discriminatory information to resolve distinct maternal lineages. A recent study examining individuals from across the USA determined HVI and HVII information, with a discrimination of 0.9942, and a matching probability of 0.96% for specimen haplotypes [79]. Complete mtGenome (mitogenome) data allowed for identification at almost full resolution, with higher discrimination at 0.9999, and matching probability at 0.39%, for the US population [79]. Therefore, mitogenomic data are required to improve human identification by mtDNA testing in Thailand. A reference mtDNA database is also needed to develop the entire mitogenome reference population data suitable for forensic comparison. Such a database would supply additional information, with special consideration for the high ethnic diversity and admixed populations in Thailand. The investigatory unit might seek to resolve missing person cases, in which more than one reference family share the same HVI and HVII haplotype, using the mitogenome, thereby increasing statistical support when exclusionary references are unavailable [81][82][83]. This approach can lead to the correction of haplotype classification and the identification of undescribed subclades. 
STS processing is labor-intensive, costly, and time-consuming; however, routine work still follows this technique in local units across Thailand. Parallel sequencing technology is now an efficient high-throughput tool, and further mitogenomic sequencing studies should be pursued to extend Thai mtDNA datasets.

Conclusions
The HVI and HVII sequences analyzed in this study were generated in accordance with high-quality laboratory standards. The frequencies of the identified haplogroups were characterized for the admixed Thai population. The F1a, B5a, M7b, and R9b subhaplogroups showed high frequencies in the population database. The establishment of regional databases is necessary, especially for forensic database searches, in order to obtain reliable frequency estimates. Additional samples carrying such haplotypes must be collected and sequenced at the mitogenome level in order to expand the Thai population database. The haplotypes reported in this study will be available from the EMPOP database (https://www.empop.org, accessed on 21 December 2020) upon publication.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/d13100496/s1. Figure S1: Workflow graphic from 309 sampling processes for data evaluation; Table S1: Haplogroup diversity in the Thai population (based on classification of HVI and HVII sequence data); Table S2
Informed Consent Statement: Patient consent was waived because the study used non-re-identifiable data from laboratory results.

Data Availability Statement:
The data can be found in the supplementary material.