The Mitochondrial DNA Landscape of Modern Mexico

Mexico is a rich source for anthropological and population genetic studies, with high diversity in ethnic and linguistic groups. The country witnessed the rise and fall of major civilizations, including the Maya and Aztec, but as a result of European colonization, the population landscape has changed dramatically. Today, the majority of Mexicans do not identify themselves as Indigenous but as admixed, and appear to have very little in common with their pre-Columbian predecessors. However, when the maternally inherited mitochondrial (mt)DNA is investigated in the modern Mexican population, this is not the case. Control region sequences of 2021 samples from all over the country revealed an overwhelming Indigenous American legacy, with almost 90% of mtDNAs belonging to the four major pan-American haplogroups A2, B2, C1, and D1. This finding supports a very low European contribution to the Mexican gene pool by female colonizers and confirms the effectiveness of uniparental markers as a tool to reconstruct a country's history. In addition, the distinct frequency and dispersal patterns of Indigenous American and West Eurasian clades highlight the benefit that such large, country-wide databases provide for studying population stratification and the impact of colonialism from a female perspective. The importance of geographical database subsets, not only for forensic application, is clearly demonstrated.


Introduction
Mexico, officially the United Mexican States, is the northernmost Central and southernmost North American country. It covers more than 1.9 million km² and is bordered by the USA in the North, the Atlantic Ocean in the East, Belize and Guatemala in the South, and the Pacific Ocean in the West. In cultural terms, Mexico is part of Mesoamerica in its center and South and of the Greater Southwest and Aridoamerica in its North [1-7]. It is generally agreed that the first modern human settlers reached America from Asia via Beringia and colonized the double continent in a rapid southward movement after an incubation period [8,9]. Mesoamerica had a pivotal role in this process, providing a bridge and geographic bottleneck into South America and a continuous interethnic space. Archaeological evidence has shown human presence in today's Mexico for 12-15 thousand years (ky) [5,6,10-12]. The extensive and arid North and the rainy and narrow central and Southern valleys and mountains required different subsistence strategies: while Northern …

… reconstruct the history of human settlement and their interaction [20], as well as the forensic application of mtDNA as a vital niche marker in human identification and the assessment of biogeographic origin [37]. Furthermore, the forensic aspect has increasingly become relevant in light of the many unidentified victims attempting to cross the Mexican-US border [43]. To address these limitations, this study presents the first comprehensive overview of the mitogenetic landscape of modern Mexico, reporting novel complete mtDNA CR profiles of 2021 individuals collected from the general population across the country.

Sample Collection
The study was performed on 2021 mouthwash samples collected from the general Mexican population as part of a worldwide campaign by the Sorenson Molecular Genealogy Foundation (SMGF). All experimental procedures and the individual written informed consent obtained from all donors were reviewed and approved by the Western Institutional Review Board, Olympia, Washington (USA). Ethno-linguistic affiliation of donors was not recorded. Only unrelated individuals for whom genealogical investigation revealed a terminal maternal ancestor (TMA) from Mexico were included. The sample set covers all 32 administrative units of Mexico (31 states and Mexico City, the former Federal District) except the easternmost state of Quintana Roo. Sample numbers per unit range from 3 to 511, with a mean and median of 61 and 36, respectively (Table S1). For interpretational purposes, the individuals were combined into macrogeographic subsets of roughly similar size: North (seven states, n = 567), Center (15 states and Mexico City, n = 622), and South (eight states, n = 709) (Figure 1). The macro-regions were further divided into Western (Pacific) and Eastern (Atlantic) subsets, and sample sizes were again reasonably similar (Northwest, NW: n = 388, Northeast, NE: n = 179; Center-West, CW: n = 360, Center-East, CE: n = 262), except for the South (Southwest, SW: n = 629, Southeast, SE: n = 80). The country-wide Western and Eastern datasets contained 1377 and 521 individuals, respectively. A further 123 Mexican individuals whose TMA could not be assigned to a specific state, owing to a lack of information or recurrent toponyms, were also included in the study (Table S1, Figure 1).
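The subset sizes reported in this section can be cross-checked with simple arithmetic. The following minimal sketch uses only counts stated in the text; the variable names are illustrative and not part of the study.

```python
# Counts as stated in the Sample Collection section; names are illustrative.
macro_regions = {"North": 567, "Center": 622, "South": 709}
unassigned = 123          # TMA could not be assigned to a specific state
west, east = 1377, 521    # country-wide Western and Eastern datasets

assigned = sum(macro_regions.values())   # individuals with a macro-region
total = assigned + unassigned            # all individuals in the study

print(assigned, total)                   # 1898 2021
print(west + east == assigned)           # True
```

The West and East totals add up to the same 1898 geographically assigned individuals, so both partitions describe the same sample.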

MtDNA Sequencing, Haplogroup Estimation, and Quality Control
Total DNA was extracted using the QIAamp DNA Blood Maxi Kit (Qiagen, Hilden, Germany) according to the manufacturer's recommendations and stored at −20 °C. The entire control region (CR, nps 16024-16569, 1-576) was amplified and sequenced as described in [44]. Sequence data were assembled and aligned to the revised Cambridge reference sequence (rCRS) [45] using Sequencher v5.4.6 (GeneCodes, Ann Arbor, MI), resulting in haplotypes covering nps 16000-16569 1-580. All mtDNA haplotypes were subjected to forensic quality control, including plausibility checks and phylogenetic inspection on EMPOP (https://empop.online (accessed on 16 August 2021)) [46]. The SAM2 engine [47] implemented in EMPOP was used for phylogenetic haplotype alignment and haplogroup estimation [37] according to the clades cataloged in PhyloTreemt, build 17 [48]. SAM2 does not follow a strict classification along a tree but compares the input haplotypes to the large EMPOP dataset of verified complete mitogenomes. The haplogroup of the queried haplotype is estimated according to the variation and fluctuation in this reference set. Partial mitogenomes, as in this study, might yield several candidate matches within the same cost range of the weighted differences. If so, rather than picking one, the most recent common ancestor (MRCA) haplogroup containing all candidates is provided. This conservative output can form the basis for a further educated haplogroup estimation that takes non-genetic information into consideration, such as geographic dispersal and metapopulation [37]. The exact haplogroup can only be ascertained from the complete mitogenome. In this study, SAM2 estimates were used (candidates not shown), and the few additional interpretations are explained.
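The idea of reporting the MRCA haplogroup when several candidates tie can be illustrated with a small sketch. This is not the SAM2 implementation: the parent map below is a hypothetical, heavily simplified excerpt of the haplogroup tree (intermediate nodes are omitted), and the function names are assumptions made for illustration.

```python
# Hypothetical, simplified child -> parent map (NOT the full PhyloTree topology).
PARENT = {
    "C1": "C", "C1b": "C1", "C1c": "C1",
    "D4": "D", "D1": "D4", "D4h3": "D4", "D4h3a": "D4h3",
}

def lineage(haplogroup):
    """Return the path from the root of the toy tree down to `haplogroup`."""
    path = [haplogroup]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

def mrca(candidates):
    """Deepest node shared by all candidate lineages, or None if there is none."""
    common = None
    for nodes in zip(*(lineage(c) for c in candidates)):
        if len(set(nodes)) != 1:
            break
        common = nodes[0]
    return common

print(mrca(["C1b", "C1c"]))    # C1
print(mrca(["D1", "D4h3a"]))   # D4
```

Returning the shared ancestor instead of one arbitrary candidate is what makes the estimate conservative: it never claims more phylogenetic resolution than the partial sequence supports.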

Forensic, Population Genetic, and Phylogeographic Calculations
Forensic and population genetic molecular statistics and diversity indices were calculated from the complete haplotypes using Arlequin v3.5.1.2 [49] disregarding length variation in polycytosine stretches, i.e., using nps 16000-16193 16194-309 310-573 574-580. Random match probability (RMP), power of discrimination (PD, haplotype diversity), and discrimination capacity (DC) were calculated as in [26,50]. A principal component analysis (PCA) to compare the haplogroup frequencies of the Mexican states was performed as previously described [51]. Haplogroup frequency maps were obtained using Surfer v.6.04 (Golden Software, Golden, CO), with the Kriging procedure. Estimates at each grid node were inferred by consideration of the entire data set, as in [52].
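As a rough illustration of the forensic parameters named above, the sketch below computes them from a list of haplotype labels using common textbook definitions: RMP as the sum of squared haplotype frequencies, PD as 1 − RMP, haplotype diversity as PD scaled by n/(n − 1), and DC as the number of distinct haplotypes divided by the sample size. Whether these match the exact formulas in [26,50] is an assumption, and the input data are toy labels, not study data.

```python
from collections import Counter

def forensic_stats(haplotypes):
    """Toy calculation of RMP, PD, haplotype diversity, and DC (assumed definitions)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    rmp = sum((c / n) ** 2 for c in counts.values())  # random match probability
    pd = 1.0 - rmp                                    # power of discrimination
    hd = pd * n / (n - 1)                             # unbiased haplotype diversity
    dc = len(counts) / n                              # discrimination capacity
    return {"RMP": rmp, "PD": pd, "HD": hd, "DC": dc}

# Hypothetical sample of four individuals carrying three distinct haplotypes.
stats = forensic_stats(["h1", "h1", "h2", "h3"])
print(stats)
```

For large samples the factor n/(n − 1) is close to 1, so PD and haplotype diversity become nearly identical.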

Quality Control
All 2021 haplotypes passed quality control and will be uploaded to the forensic online mtDNA population database EMPOP with reference numbers EMP000849-852 [46]. They constitute the first mtDNA dataset for the general population of contemporary Mexico, enabling haplotype and haplogroup frequency and dispersal estimates (Table S2). In 90 individuals (4.4%), one nucleotide, and in four individuals (0.2%), two nucleotides could not be determined unambiguously, at altogether 68 different nps. These haplotypes were retained for haplogroup-based analyses, since the generally high data quality indicated otherwise inconspicuous results. Only the 1927 complete sequences were considered for haplotype-based parameters in order to use their full sequence information.

Forensic and Population Genetic Characterization of the Mexican mtDNA Dataset
The dataset contained 799 different haplotypes in the range nps 16000-16569 1-580, 502 of which (62.8%) were unique, comprising 26.1% of the included individuals. An RMP of 0.5% was calculated, PD reached 99.6%, and DC was 41.5%. The mean number of pairwise differences between two randomly chosen haplotypes (MNPD, k) was 14.995 ± 6.709, resulting in a nucleotide diversity (π) of 0.013 ± 0.006 over the 1168 nps. The most frequent haplotype, 16223T 16290T 16319A 16362C 64T 73G 146C 153G 182T 235G 263G 315.1C 523del 524del relative to the rCRS [45], belonged to haplogroup A2 and was found … (Tables S2-S4). The most and third most frequent haplotypes had each been observed only once among the 38,361 worldwide samples in EMPOP v4/R13 that cover the CR or a larger range, of which 17,062 are American and 6497 Native American. This yielded haplotype frequency point estimates of 2.6|5.9|15.4 in 100,000 individuals (worldwide|America|Native Americans), respectively. The haplotype ranking second in Mexico had been reported 11 times, i.e., at respective frequencies of 2.9|6.4|16.9 in 10,000 (worldwide|America|Native Americans). All 13 hits were made in Native American US populations [46].
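The frequency point estimates quoted above can be reproduced as simple counting estimates (hits divided by database size). The sketch below assumes exactly that estimator; the database sizes and hit counts are taken from the text, while the function and variable names are illustrative.

```python
# EMPOP v4/R13 database sizes as stated in the text.
db_sizes = {"worldwide": 38361, "America": 17062, "Native Americans": 6497}

def point_estimate(hits, n, per):
    """Counting estimate of a haplotype frequency, scaled to `per` individuals."""
    return hits / n * per

# Haplotypes observed once: expressed per 100,000 individuals.
once = {k: round(point_estimate(1, n, 100_000), 1) for k, n in db_sizes.items()}
# Haplotype observed 11 times: expressed per 10,000 individuals.
eleven = {k: round(point_estimate(11, n, 10_000), 1) for k, n in db_sizes.items()}

print(once)    # {'worldwide': 2.6, 'America': 5.9, 'Native Americans': 15.4}
print(eleven)  # {'worldwide': 2.9, 'America': 6.4, 'Native Americans': 16.9}
```

The same haplotype appears rarer the larger and less specific the reference database is, which is why the choice of database subset matters for forensic rarity estimates.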

The Phylogeography of mtDNA Haplogroups in Mexico
The spatial distribution of both Indigenous American and West Eurasian haplogroup clusters within Mexico was found to be strikingly heterogeneous. The Indigenous lineages showed peculiar and specific patterns across the country. Haplogroup A2 (41.8%) was most frequent in the South, where it comprised 57.0% of individuals, and showed a gradient via the Center (43.6%) to the North (22.2%). In each macro-region, the Eastern part had a higher percentage than the Western part. Haplogroup C1 (23.7%) presented the opposite pattern: it was particularly frequent in the Northwest (49.5%) and revealed a decline from the North (38.8%) to the Center (16.1%) and South (17.8%) (Table 1, Figure 2). Both clines were also observed in Indigenous and admixed populations when analyzed at a country-wide scale [2,3,12,14,17,18,62]. These findings illustrate the heterogeneity of lineage dispersal within America, since they contrast with the overall gradients in the double continent [8,18]. Haplogroup B2 (17.6%) was uniformly distributed in the North, Center, and South of Mexico in this sample (16.8-18.5%), with a slightly higher proportion in the Northwest (20.4%). A generally higher prevalence in admixed and Indigenous Northern Mexican/Greater Southwest populations has been reported [2,6,14,17,63].
Haplogroup D1 (5.5%) was found to be relatively similarly distributed between the North (3.7%), Center (7.2%), and South (5.4%). Lower frequencies in the North were observed previously [14]. The rare D4h3 (mostly D4h3a) (0.6%) reached 1.6% in the North, particularly the Northwest, while elsewhere it did not exceed 0.6%. D4h3a spread into the Americas along the Pacific coast and is generally found there [4,10,54]. X2 (likely X2a; 0.1%) is a low-frequency (Northeastern) North American lineage [54], and the X2 haplotype with geographic information in our set was found in the North. The West Eurasian lineages (combined, 8.0%) reached 12.0% in the North, presenting a peak in the Northeast (22.3%) and a southward decline via 10.6% (Center) to 1.6% (South), where the proportion of Indigenous populations is highest. This dispersal has been explained by the fact that the main settlement in the North occurred only after European conquest [18,64]. The African lineages (combined, 2.0%) were evenly distributed (1.0-3.7%) in all areas except the Southeast, as reported [18] (Table 1, Figure 2).

Regional mtDNA Databases for Mexico?
It has been postulated for Indigenous populations of Mexico and beyond that geography rather than linguistic classification predicts mtDNA structure and diversity, suggesting that genetic divergence predates linguistic diversification [2,10,36,63]. It was also described that the general population throughout Mexico mirrors the surrounding Indigenous groups in terms of the mtDNA gene pool composition, particularly in the South [18], supporting geography-based classification. To further evaluate the particular mitophylogeographic structure of Mexico, a PCA comparing the haplogroup frequencies of the different states was performed (Figure 3). After the haplogroup complexity had been reduced to PCs, the first PC was found to primarily mirror the geographic spread, with the Northwestern state group Sonora-Baja California-Baja California Sur at one pole and the Southeast (Chiapas-Tabasco-Campeche-Yucatan) at the other. Analysis of the contribution of variables to the first and second PC revealed clear haplogroup structuring. In particular, the second PC separated the Indigenous American from the West Eurasian clades, causing the outgroup position of the Northeastern state group of Nuevo León-Tamaulipas in the PCA (Figure 3).
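To illustrate how a PCA on haplogroup frequencies can recover a geographic gradient, the sketch below extracts the first principal component from the macro-regional frequencies reported in the previous section (A2, C1, D1, and the combined West Eurasian lineages). It is only a coarse stand-in for the state-level analysis in Figure 3: it uses three regions instead of states, computes the first PC by power iteration on the covariance matrix, and all names are illustrative.

```python
# Regional haplogroup frequencies (%) as reported in the text; rows = regions.
regions = ["North", "Center", "South"]
haplogroups = ["A2", "C1", "D1", "West Eurasian"]
freq = [
    [22.2, 38.8, 3.7, 12.0],
    [43.6, 16.1, 7.2, 10.6],
    [57.0, 17.8, 5.4, 1.6],
]

def first_pc(rows, iters=100):
    """First principal component via power iteration on the covariance matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0 + j for j in range(m)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                     for a in range(m))
    explained = eigenvalue / sum(cov[a][a] for a in range(m))
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores, explained

loadings, scores, explained = first_pc(freq)
for region, score in zip(regions, scores):
    print(region, round(score, 2))
```

In this toy setting, the North and South fall on opposite sides of the first component and A2 and C1 load with opposite signs, mirroring the opposing clines described in the text. The overall sign of a principal component is arbitrary.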
The findings of this study, combined with the literature on the geographically nonuniform dispersal of both Amerindian and West Eurasian haplogroups in Mexico discussed above, thus clearly suggest that regional subsets within the general population should also be evaluated in order to yield accurate dispersal and forensic rarity estimates. Striking regional differences in mtDNA distribution have been described for Argentina, which also has a history of European conquest of Indigenous American societies [65]. To investigate this potential for Mexico, database subsets containing the complete Northern (n = 545), Central (n = 589), and Southern (n = 679) haplotypes were analyzed, all surpassing the minimum size considered necessary for a forensic mtDNA database [66] (Tables S4 and S6). The forensic and population genetic parameters calculated from the Northern and Southern subsets were similar to those obtained from the general Mexican database, while the Center was mostly more diverse. The proportion of unique haplotypes among haplotypes|individuals was 60.1-63.9|24.2-27.9% in the general set, North, and South, but 71.6|41.1% in the Center; DC was 57.4% in the Center and 40.2-43.7% elsewhere. RMP was calculated as 1.4-1.5% in the North and South, while it was 0.5-0.6% in the Center and general dataset, and PD lay between 98.6% and 99.6%. The MNPD within sets was highest in the North (15.5) and declined via the Center (14.8) to the South (13.3); in the general dataset it was 15.00 (Table S4). The corrected MNPD and population pairwise FST revealed the highest differentiation between North and South (1.48|0.094) and the lowest between Center and South (0.19|0.014), while the results for the comparison Center-North were intermediate (0.79|0.049) (Tables S4 and S6).
Analysis of molecular variance (AMOVA) revealed that the vast majority of the observed variation in the mtDNA structure (94.73%) represented differences within the three subsets, while 5.27% was attributable to differences among them (Table S6). The relevance of regional databases was clearly demonstrated by the fundamental differences in the dispersal of haplotypes and singletons, which are of high relevance for forensic statistics [67] (Tables S4 and S7). Around 80% (77.7-83.2%) of each subset's haplotypes were not shared at all with the other subsets, and only 15 of the haplotypes (around 0.5% for each subset) were observed in all three subsets. In total, 85.0% (81.6-87.8%) of the unique haplotypes per region were restricted to their region. The most frequent haplotypes over the entire dataset were not homogeneously dispersed across Mexico: the top-ranking haplotype was only found in the South, the second most frequent only in the Center and North, and the third most frequent only in the North. The most common haplotypes within each regional database were rare or unobserved in the others. In particular, none of the regions' most frequent haplotypes was observed elsewhere, and the top three Southern haplotypes were only found in that dataset (Tables S4 and S7). A similar pattern was reported for Uzbekistan, although there it was based on ethnic groups living in close geographic proximity [68].
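The haplotype-sharing comparison between regional subsets comes down to set operations. The following sketch shows one way to count haplotypes present in all subsets and those private to a single subset. The haplotype labels are hypothetical placeholders rather than study data, and the function name is an assumption.

```python
def sharing_summary(subsets):
    """Return haplotypes found in every subset and those private to each subset."""
    in_all = set.intersection(*subsets.values())
    private = {}
    for region, haps in subsets.items():
        others = set().union(*(h for r, h in subsets.items() if r != region))
        private[region] = haps - others
    return in_all, private

# Hypothetical haplotype labels per regional database subset.
toy = {
    "North": {"h1", "h2", "h3", "h7"},
    "Center": {"h1", "h3", "h4", "h5"},
    "South": {"h1", "h5", "h6"},
}
in_all, private = sharing_summary(toy)
print(sorted(in_all))                                 # ['h1']
print({r: sorted(p) for r, p in private.items()})
```

Dividing the size of each private set by the number of haplotypes in that subset gives the kind of "not shared" proportion reported in Table S7.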

The Sex-Biased Genetic History of Mexico
This study demonstrates an overwhelming Indigenous American maternal legacy in the extant admixed Mexican population, with almost 90% of mtDNAs belonging to Indigenous lineages. A different picture is conveyed by the nuclear genome. Studies on classical blood markers found a ubiquitous European contribution that was, in the North and Center, sometimes larger than the usually predominant Indigenous proportion, while the African proportion was constantly small (references in [13,16,69]). Autosomal microsatellite-based studies revealed an average European ancestry of around 60% in the North, 40% in the Center, and 30% in the South, and a 4-8% African contribution [64,70,71]. Investigations of nuclear single nucleotide polymorphisms confirmed the reduced Indigenous ancestry proportion: in admixed populations, the average was 50% in a country-wide sample [72], and more detailed analyses found it lowest in the North (36-51%), 57-66% in the Center, and 59-73% in the South, with a constant African proportion of 2-6% [18,73-75]. This striking discrepancy between the two human genomes illustrates the difficulties of categorizing individuals and populations into (self-identified) ancestry categories. Yet, it is coherently explainable by the mode of European conquest, which included "directional mating" of immigrant men with Indigenous women, causing asymmetric genetic admixture in Indigenous and urban populations in and beyond Mexico [8,13,18,20,29,72,76]. The possibly 250,000 European settlers in Mexico's colonial period were mostly males, as were the African slaves [16,17]. The European ancestry was determined as mainly Southern European [74]. Consistently, Mexico-wide analyses revealed the paternally inherited Y-chromosomal lineages in admixed populations to be predominantly European (48-89%, more prevalent in the North and West), followed by Indigenous American (13-48%, increased in the Center and Southeast).
European paternal lineages were also found in all Indigenous populations studied (1-32%). African ancestry was the lowest (2-15%) [14,18,69].

Conclusions
This study provides, for the first time, a comprehensive overview of the mtDNA variation in the modern general population of Mexico using a large sample covering the entire country. The findings confirm that the genetic impact of European conquest was small in terms of maternal lineage introgression, and the changes in population structure that followed have likely not substantially changed the ample pre-Columbian pattern of mtDNA variation in Mexico. The mitogenetic structure of the general population is still mainly Indigenous. The proportion of West Eurasian mtDNA lineages in the population was found to be low, with some exceptions mainly restricted to the Northeast, where the West Eurasian component reached high proportions. More detailed analyses argued in favor of regional databases, at least for forensic genetic investigations.
MtDNA is just one genetic marker and, obviously, only a synopsis of all genomic data will fully open the window into the past. This study has also provided insights into a possible sex-differentiated mobility and mixture that impacted cultural as well as biological survival in Mexico, as well as in other countries. The specific insights from the maternal contribution have proven their importance for genetic and anthropological studies aiming to reconstruct and date human settlement. No mitogenome is inherited independently from the nuclear genome. Therefore, this dataset can also help to provide general ancestry estimates for the cosmopolitan population to avoid bias. Understanding population stratification, genetic contributions, and their association with medical traits in admixed populations and individuals is highly relevant to biomedical research and personalized medicine [13,74,77,78].
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes12091453/s1. Table S1: TMA geographic origin of the 2021 Mexican individuals included in this study by administrative unit, Table S2: The 2021 Mexican individuals included in this study including TMA origin and mtDNA haplogroup, Table S3: Haplotype frequencies in the Mexican dataset, Table S4: Intra-population forensic and population genetic parameters calculated from the Mexican dataset and subsets, Table S5: Haplogroup frequencies in the Mexican dataset, Table S6: Inter-population forensic and population genetic parameters calculated from the Mexican dataset subsets, Table S7: Haplotype frequencies in the Mexican database subsets and shared haplotypes among them.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Western Institutional Review Board, Olympia, Washington (USA) (protocol #20031734, approved 22 August 2007) and followed the guidelines on ethical publication of forensic research on genetics and genomics [79].
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.