Updating the Phylogeography and Temporal Evolution of Mitochondrial DNA Haplogroup U8 with Special Mention to the Basques

Abstract: Mitochondrial DNA phylogenetic and phylogeographic studies have been very useful in reconstructing the history of modern humans. In addition, recent advances in ancient DNA techniques have enabled direct glimpses of the human past. Taking advantage of these possibilities, I carried out a spatiotemporal study of the rare and little-studied mtDNA haplogroup U8. Today, U8, represented by its main branches U8a and U8b, has a wide western Eurasian range, but both branches have average frequencies below 1%. It is known that, in Paleolithic times, U8 reached high frequencies in European hunter-gatherers. However, it is pertinent to specify that only lineages belonging to U8a and U8c, a sister branch of U8b, were detected at that time. In spite of its wide geographic distribution, U8c became extinct after the Last Glacial Maximum, whereas U8a has persisted until the present day, although it never regained its high Paleolithic frequencies. U8a is detected mainly in northern and western Europe, including the Basques, testifying to a minor maternal Paleolithic continuity. In this respect, it is worth mentioning that Basques show more U8-based affinities with continental European than with Mediterranean populations. In contrast, coalescent ages of the most ancient U8b clades point to a Paleolithic diversification in the Caucasus and the Middle East. U8b-derived branches have been present in eastern Europe since the Mesolithic. Subsequent Neolithic and post-Neolithic expansions widened its range in continental Europe and the Mediterranean basin, including northern Africa, albeit always as a minor clade that accompanied other, more representative, mitochondrial lineages.


Introduction
Non-recombining mitochondrial DNA (mtDNA) lineages have been successfully used as molecular tracers to follow the evolution of human populations over time [1]. Under the hypothesis that one of the earliest mtDNA modern human radiations outside of Africa occurred in Southeast Asia [2], the mtDNA haplogroup U* indicates an early westward radiation that reached South Asia and Eurasia in Paleolithic times. One of its western branches is haplogroup U8. Currently, this haplogroup is very rare but, paradoxically, geographically widespread. On average, the frequency of haplogroup U8 is below 1%. However, its subclade U8a reaches higher frequencies (3%) in northern European populations from Finland [3] and Novgorod, Russia [4]. On the other hand, its sister subclade U8b shows higher frequencies in Mediterranean isolates such as Corsica (4.3%) [5] and the non-Berber population of Zriba (16%), in Tunisia [6]. It is worth noting that, in this U8 description, I excluded the K sub-branch of U8b because it has already been well studied in different population genetic contexts [7-10].
Based on ancient DNA studies, haplogroup U8 seems to have been much more frequent in hunter-gatherer populations throughout Europe during the Paleolithic and Mesolithic periods [11,12]. Although an early study on the Basque population based on the phylogeny and phylogeography of haplogroup U8 has been published [13], no wide-scale studies for this haplogroup exist.
Taking advantage of the impressive increase in mitochondrial sequences available today, I carried out a wide spatiotemporal analysis of haplogroup U8 using past and present-day population samples throughout its whole geographic range. I found that its main branches, U8a and U8b (x K), tell different but complementary demographic histories and that, with regard to haplogroup U8, Basques show higher genetic affinities with continental European populations than with Mediterranean ones, including those of the Iberian Peninsula.

Match-Based Distances between Populations
To diminish the strong influence of the common haplotypes in frequency-based distances, I used an additional measure of distance based on matches, counting as matches those pairs of sequences that are identical or that differ by one or, at most, two mutations. I implemented a simple algorithm defining $I_{XY}$, the identity between populations X and Y, as twice the number of matches between them ($2M_{XY}$) divided by the product of the number of different lineages in X and Y ($N_X \times N_Y$). The distance between populations ($D_{XY}$) simply equals $1 - I_{XY}$.
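The match-based distance described above can be sketched as follows. This is only an illustrative reading of the definition, not the code used in the study: it assumes that each haplotype is represented as a set of variant labels relative to a reference sequence, that the number of mutations separating two haplotypes is the size of their symmetric difference, and that matches are counted between distinct lineages. The function names and the example variant labels are hypothetical.

```python
from itertools import product


def n_differences(hap_a, hap_b):
    """Count the variants present in one haplotype but not in the other."""
    return len(set(hap_a) ^ set(hap_b))


def match_distance(pop_x, pop_y, max_diff=2):
    """Return (I_XY, D_XY) for two populations given as iterables of haplotypes.

    Assumption: matches are counted between the distinct lineages of each
    population, and a pair is a match when it differs by at most `max_diff`
    mutations (0, 1, or 2 in the text).
    """
    lineages_x = {frozenset(h) for h in pop_x}
    lineages_y = {frozenset(h) for h in pop_y}
    if not lineages_x or not lineages_y:
        raise ValueError("both populations need at least one haplotype")
    matches = sum(
        1
        for hx, hy in product(lineages_x, lineages_y)
        if n_differences(hx, hy) <= max_diff
    )
    # I_XY = 2 * M_XY / (N_X * N_Y), as stated in the text.
    identity = 2 * matches / (len(lineages_x) * len(lineages_y))
    return identity, 1 - identity


# Hypothetical haplotypes, written as sets of made-up variant labels.
population_x = [{"v1"}, {"v1", "v2", "v3", "v4", "v5"}]
population_y = [{"v1", "v2"}, {"v6", "v7", "v8", "v9"}]
identity_xy, distance_xy = match_distance(population_x, population_y)
```

Raising or lowering `max_diff` makes the match criterion more or less permissive, which is the only tunable choice in this sketch.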

Other Statistical Analyses
Fisher's exact and chi-square tests were used to analyze 2 × 2 contingency tables. Multidimensional scaling (MDS) was performed with R software. To assess the correlation between frequencies of U8 subclades and geographic coordinates, I used multiple regression analysis as implemented in the free statistical software available at http://www.statskingdom.com (accessed on 9 November 2021).
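As an illustration of the kind of 2 × 2 test mentioned above, the following is a minimal sketch of a two-sided Fisher's exact test written with the Python standard library only. It is not the software used in the study; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table at least as unlikely as the observed
    # one; the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: carriers and non-carriers of a clade in two regions.
p = fisher_exact_2x2(3, 1, 1, 3)
```

The two-sided rule used here (summing all tables no more probable than the observed one) is one common convention; other software may define the two-sided p-value differently.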

Coalescence Age Estimations
For estimating coalescence ages, I used the following four approaches: (1) the rho statistic [16] using a substitution rate for the complete mtDNA sequence (16,500 bp) of one substitution every 3624 years [17]; (2) a modified rho statistic that corrects for the time dependency effect [18], using for the most recent period a mutation rate of one mutation every 1408 years [19]; (3) the rho statistic using only those sequences with the largest number of mutations within the clade analyzed [19] and a mutation rate of one mutation every 3205 years [20]; (4) the rho statistic using ancient mtDNA sequences and a substitution rate of one substitution every 2273 years, deduced by calibration with ancient mtDNA sequences extracted from securely radiocarbon-dated fossil samples and applying the branch shortening concept [21].
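The basic rho calculation behind these approaches can be sketched as below, assuming that rho is the mean number of mutations separating the sampled sequences from the root of the clade and that the age is rho multiplied by the number of years per substitution. The sketch covers only this plain conversion; it does not implement the time dependency correction of approach (2) or the sequence selection of approach (3). The function name and the mutation counts are hypothetical, while the rates are the ones quoted in the text.

```python
def rho_age(mutation_counts, years_per_substitution):
    """Return (rho, age in years) for one clade.

    mutation_counts: number of mutations from the clade root to each sequence.
    years_per_substitution: calibration, e.g. 3624 years per substitution.
    """
    if not mutation_counts:
        raise ValueError("at least one sequence is required")
    rho = sum(mutation_counts) / len(mutation_counts)
    return rho, rho * years_per_substitution


# Years per substitution quoted in the text for approaches (1), (3), and (4).
RATES = {"approach_1": 3624, "approach_3": 3205, "approach_4": 2273}

# Hypothetical mutation counts for the sequences of one clade.
counts = [5, 7, 6, 8, 4]
ages = {name: rho_age(counts, rate)[1] for name, rate in RATES.items()}
```

Running the same counts through several calibrations, as in the last line, shows directly how much the choice of rate alone moves the estimate.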

Results
I screened 429,051 mtDNA sequences obtained from present-day samples from the Iberian Peninsula, including the Basque Country (Table S1), Italy (Table S2), the Balkans (Table S3), central Europe (Table S4), western Europe (Table S5), eastern Europe (Table S6), northern Europe (Table S7), the Caucasus (Table S8), Turkey (Table S9), the Middle East (Table S10), central Asia (Table S11), South Asia (Table S12), and northern Africa (Table S13). A summary of the frequencies of haplogroup U8 and its subclades U8a and U8b in each of the regions analyzed is provided in Table 1. In addition, I screened a total of 7739 ancient mtDNA sequences spanning archaeological periods from the Paleolithic to Historic times (Table S14), summarized in Table 2. The frequency of haplogroup U8 was highest (21.8%) in the European Paleolithic, represented mainly by its branches U8a (14.5%) and U8c (7.3%), whereas U8b was not detected. After the Last Glacial Maximum (LGM), in Mesolithic times, the U8c branch seems to be extinct, while U8a barely persisted at very low frequencies in all the subsequent periods. On the other hand, U8b appeared in the Mesolithic (1%) and reached its highest frequency (1.3%) in the Neolithic.

U8 Phylogeography of Present-Day Populations
Genetic drift and founder effects seem to be mainly responsible for the anomalously high frequencies of U8 in some isolates. This seems to be the case for the Portuguese and Spanish Roma groups, with frequencies of 3.6% and 2.5%, respectively (Table S1), comparable only to those found in northern European samples (Table S6). The frequency of U8b (2.9%) in Bulgarian Jews (Table S6) is also noteworthy, as are the high frequencies found for this haplogroup in isolated groups of Tunisia (16%) and Algeria (2.4%) (Table S13). Comparing average differences between large geographic areas, U8a is significantly more abundant in the continental European area comprising western, central, and northern regions than in eastern Europe, the Mediterranean basin, and the Middle East (0.45% vs. 0.15%; p < 0.00001). Conversely, frequencies of U8b in the latter area are significantly higher than in the former (0.21% vs. 0.12%; p = 0.0031). This geographic structure is also observed graphically. A PCA plot (Figure 1), based on a U8a pairwise genetic distance matrix between regions (Table S15) obtained from haplotype matches (Table S16), shows that the Middle East, the Caucasus, and northern Africa form a geographic continuum. On the other hand, all the European regions make up another geographic cluster, to which central Asia approximates. For its part, Italy is in an intermediate position between these areas. I attribute the anomalous position of the Balkans to the small number of U8a sequences obtained from that region.
The same type of analysis performed for the U8b branch (Figure 2 and Table S17) presents a cluster grouping northern and western Europe, including Basques and the Iberian Peninsula. Italy seems to have received influences from central Europe and northern Africa, while eastern Europe seems to have received these influences from the Middle East and the Caucasus. Again, the Balkans show an anomalous position.
On the other hand, the results of testing geographic correlation for U8a and U8b frequencies reveal that a strong positive cline with latitude (R = 0.716; p < 0.00001) and a small negative one with longitude (R = −0.248; p = ns) exist for the U8a frequencies. In the case of the U8b frequencies, there is a weak, partial interaction with both coordinates, with a negative sign (R = −0.311) but without statistical significance. The predominantly northern (U8a) and southern (U8b) expansions across Europe from an ancestral U8* lineage that probably originated in the Caucasus are depicted in Figure 3.
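To make the notion of a latitudinal cline concrete, the sketch below computes a plain Pearson correlation between subclade frequencies and one coordinate. This is a simpler stand-in for the multiple regression reported above, not a reproduction of it, and the latitude and frequency values are invented for illustration only; they are not taken from the tables of this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical regional data: latitude (degrees N) and clade frequency (%).
latitudes = [36.0, 41.0, 45.0, 50.0, 55.0, 60.0]
frequencies = [0.10, 0.15, 0.20, 0.40, 0.45, 0.70]
r_latitude = pearson_r(latitudes, frequencies)
```

A positive value of `r_latitude` corresponds to frequencies that rise toward the north; a full analysis would fit latitude and longitude jointly, as the text does.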

U8 Phylogeography in the Past
In addition to the changes observed over time, the prehistoric U8 samples also show interesting geographic differentiation. During the Paleolithic (Table S18), there was a clear geographic partition, with U8a concentrated in central and western Europe and U8c in eastern Europe and the Mediterranean (p = 0.0003). In the Mesolithic (Table S19), the U8b cluster appeared for the first time but was limited to the Middle East and eastern Europe. Lineages U8a and U8c were not detected in this sample. The large sample size gathered for the Neolithic period (Table S20) confirms that U8a did not disappear in the Mesolithic, as it is detected in the Neolithic at low frequencies in western, central, and eastern Europe. However, the absence of U8c lineages confirms that this branch became extinct during the LGM. For its part, in the Neolithic, U8b extended through central and western Europe but at significantly lower frequencies than in eastern Europe and the Mediterranean basin (0.89% vs. 2.24%; p = 0.04). The most striking result of the Chalcolithic period (Table S21) is the high frequency of U8b in Italy and its notable presence in the Caucasus and central Asia, compared with other regions (p = 0.0009). During the Bronze Age (Table S22), while the U8a branch is still barely detectable, the U8b branch is consolidated in eastern Europe, the Mediterranean basin, the Middle East, and the Caucasus. Finally, in the Iron Age and Historic times (Table S23), U8b reached northern Africa at frequencies (3.2%) comparable to those in the Middle East.

Haplogroup U8 Coalescent Age Estimates
I built a phylogenetic tree (Figure S1) using 212 complete mitogenomes, 63 from ancient DNA remains and 149 from present-day samples. The uncertainty of substitution and mutation rates, differences in the analyses, and the large confidence intervals make coalescent estimates rather imprecise (Table 3).
Thus, I opted for using the average of the different estimates as a provisional approach. The mean coalescence ages for the whole U8 clade (52,936 ya; 95% CI: 29,916-75,955 ya) and its main subclades U8a (29,741 ya; 95% CI: 20,620-38,862 ya) and U8b (50,405 ya; 95% CI: 11,785-89,024 ya) are within the upper range of previous estimates [9,22,23]. The observation that, although U8a and U8b are sister branches, the mean coalescence age of the first is significantly more recent than that of the second (t = 5.7966; df = 8; p = 0.0004) deserves special attention. In fact, the U8b estimate is close to that for the whole haplogroup U8. This result could be explained by invoking different demographic histories for each clade. For example, the U8a basal branch that separated it from the main node U2′3′4′7′8′9 accumulated six mutations before its next bifurcation. However, its sister branch subdivided into the U8b and U8c clades after only one mutation, but after this, five additional mutations accumulated on the U8b trunk before it bifurcated into branches U8b1a and U8b1b (Figure S1). There are also striking differences between clades regarding the geographic location and temporal expansion of the clusters harboring the most evolved sequences. Whereas for U8a, these are eastern European (Poland) sequences within the U8a1a1b clade with a Neolithic coalescence (6741 ya; 95% CI: 1450-12,032 ya), for U8b, they are of Caucasian and Middle Eastern ancestry and are found within the Paleolithic lineages U8b1a2 (28,738 ya; 95% CI: 18,155-39,321 ya) and U8b1b2 (20,658 ya; 95% CI: 11,604-29,492 ya).
According to coalescent theory, trees can be subdivided by internode intervals ($\gamma_i$) with a decreasing number of lineages going backward in time [24]. I think that the ratio of the number of lineages ($i$) between adjacent intervals ($i/i-1$) could be a simple measure of population size growth in the ($i$) period relative to that in the ($i-1$) period. Applying this ratio for U8a, the greatest values are between $i_2/i_1$ (3.17) and $i_2/i_3$ (2.53), which correspond to time periods of 21.6 and 19.2 kya, pointing to the beginning of the end of the LGM. For U8b, they are between $i_2/i_1$ (4.0) and $i_2/i_3$ (6.13), in Paleolithic (22.4 kya) and Neolithic (6.5 kya) times, respectively.
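The ratio between adjacent intervals can be illustrated with a short sketch. It assumes that the tree has already been cut into internode intervals ordered from the oldest to the most recent and that the number of lineages in each interval is known; each ratio then divides the count in one interval by the count in the preceding, older one. The function name and the lineage counts are hypothetical and are not taken from Figure S1.

```python
def lineage_ratios(lineage_counts):
    """Ratios of lineage numbers between adjacent intervals.

    lineage_counts[k] is the number of lineages in interval k, with intervals
    ordered from oldest to most recent (an assumption of this sketch).
    Element k of the result is lineage_counts[k + 1] / lineage_counts[k].
    """
    if any(count <= 0 for count in lineage_counts):
        raise ValueError("lineage counts must be positive")
    return [
        later / earlier
        for earlier, later in zip(lineage_counts, lineage_counts[1:])
    ]


# Hypothetical lineage counts for four consecutive intervals of one clade.
counts_by_interval = [2, 6, 15, 18]
ratios = lineage_ratios(counts_by_interval)
# The interval with the largest ratio marks the strongest relative growth.
strongest_growth = max(range(len(ratios)), key=ratios.__getitem__)
```

Under this reading, a ratio well above 1 suggests that the lineage count grew sharply relative to the previous interval, which is how the values quoted in the text are interpreted.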
Finally, I detected a certain degree of geographic structure in some clades that diverged late in time. Examples of this for U8a are U8a1a1a1, limited to central Europe; U8a1a1a2 and U8a1a1b, comprising only eastern European sequences; and U8a1a3, represented only by western European lineages. As for U8b, it harbors subclades within U8b1a1 that connect the Caucasus, Anatolia, Armenia, and the Middle East with eastern Europe, and U8b1b2, which groups the Caucasus, the Middle East, and the Balkans (Figure S1).

The Extinction of mtDNA Lineages
Ancient DNA studies reveal that prehistoric populations harbored genetic variation that has not been transmitted to modern populations [25,26]. This extinction process is evident at the level of mtDNA lineages. For example, the Mal'ta 1 [27] and Cioclovina 1 [11] specimens had basal haplogroup U mtDNA lineages (Figure S1) with 9 and 1 private mutations, respectively, which are not diagnostic of any of the present-day U haplogroups. In the same way, all the Paleolithic lineages that arose from the basal clade U2′3′4′7′8′9 (Figure S1) should be considered extinct because none of their private or shared mutations are diagnostic of any of the six lineages that persist today. Haplogroup U8, studied here, is one of these lineages, and the mtDNA of the Bacho Kiro 1653 Paleolithic specimen belonged to this clade [26], but it is now also an extinct lineage (Figure S1). The case of the entire haplogroup U8c, which was present in the Paleolithic from northeastern Europe [28] to southern Italy [23], is perhaps the most striking example of a widespread extinction. In the same way, if we look at the prehistoric specimens, indicated in red along the phylogenetic tree (Figure S1), we find lineages that belonged to branches U8a and U8b but that have left no descendants today. In some cases, this occurs for entire subclades, such as the U8b1b subclade characterized by transitions at positions 6465 and 8572, which groups only Neolithic samples from eastern Europe (Figure S1). Because mtDNA is a haploid, non-recombining marker with a high mutation rate, its genealogies that include prehistoric samples are especially well suited for visualizing the extinction of many of these past lineages.

The Peopling of Europe from an U8a Perspective
My phylogenetic analysis supports a U8 radiation in western Eurasia around 50 kya, shortly after the earlier radiation of macrohaplogroup U* in central Asia [2]. The most probable geographic origin for the U8 branching phenomenon seems to be the Caucasus because that is where the deepest lineages of U8b had their roots. This would explain the generalized expansion of the now-extinct U8c clade throughout Europe [11,12,23,28], the high presence of U8a lineages in the Paleolithic of central and western Europe [11,12], and the old age of some U8b lineages in the Middle East. After the LGM, U8c was completely extinct; however, some lineages of the U8a branch survived this period but never regained the high frequencies of earlier times, remaining to this day mainly in areas of western and northern Europe where the Neolithic demic wave had less influence. At this point, it is pertinent to introduce the Basque people into this discussion. It has been previously proposed that U8a reveals a Paleolithic settlement in the Basque Country and that its primitive founders most probably came from western Asia and did not follow a north African route [13]. My results closely agree with these hypotheses. Basques differ significantly from the rest of the Iberian Peninsula in their comparatively higher frequency of U8a and lower frequency of U8b lineages (p = 0.0126). In this respect, Basques are more similar to populations from southern France [29]. This could be attributed to a comparatively minor influence of the Neolithic maternal gene flow on the Basques. The fact that the Basque U8a lineages are spread across both of the oldest clusters (U8a1 and U8a2) is proof of their ancient establishment in the Basque Country. The close relationship of northern African U8a haplotypes with those of the Near East, and their large distance from those of the Basques (Figure 1), also confirm that the affinities of the Basque lineages are with those of European populations.
Basques also actively participated in the European regional interchanges that have occurred since the Mesolithic (Table 4). Britain was singularly affected by these migrations, potentially receiving consecutive gene flows from Iberia, including the Basques, and from western, northern, and central Europe. For its part, northern Europe seems to have received migrating groups from both western and eastern regions. Finally, the younger U8a clades bear witness to local expansions from the Neolithic to the Bronze Age that were particularly important in eastern Europe.

The Peopling of Europe from a U8b Perspective
Unlike its phylogenetic counterparts, U8a and U8c, haplogroup U8b is first detected in the Mesolithic period, already as derived lineages: in Jordan as U8b1a [30], in Serbia as U8b1b [31], and in Anatolia as U8b1b1 [32]. Furthermore, its sister branch, haplogroup K, already appears in the Paleolithic of the Caucasus as the Georgian Satsurblia K3 lineage [33]. K3 is a rare clade that, nevertheless, has had continuity until the present day. It was later detected in Armenia during the Bronze Age [34], and is found today in the Caucasus [35] but also as far away as China [36]. Other K lineages were coeval with those of U8b in the Mesolithic. Examples are the presence of K2b in Mesolithic Anatolia [37] and K1c in Mesolithic Greece [38]. These data point to a double radiation of haplogroup U8 before the LGM. One occurred in Europe (U8a), and the other in the Caucasus (U8b). The data also point to a genetic continuity of the surviving Paleolithic lineages through the Mesolithic. Other K lineages, mainly those belonging to the K1a subclade, are considered a dominant signal of a demic Neolithic expansion through continental and Mediterranean Europe [10,39,40]. It seems that, at minor frequencies, U8b1a and, mainly, U8b1b branches also participated in these Neolithic expansions. In fact, the number of Neolithic/Chalcolithic samples (Figure S1) belonging to U8b is significantly greater than the number belonging to its sister branch, U8a (p < 0.0001), and the same holds for the Bronze/Iron Ages (p = 0.0029). However, the majority of the more derived subclades of U8b (Table 5) have Paleolithic and Mesolithic coalescences and, regionally, are still connected to the Caucasus and the Middle East. The U8b data seem to indicate that a Mesolithic wave from these areas preceded the Neolithic expansion.
Haplogroup U8 is a rare clade of a small genome with only maternal inheritance. However, its phylogeny and its past and present phylogeography outline a history of the settlement of Europe by modern humans that does not differ in its main traits from those proposed using larger mtDNA lineages [12] or even complete genomes [11]. Thus, the U8a branch complements the contractions and expansions across western and northern Europe described by haplogroup U5 from the Paleolithic onwards [41], and those of haplogroup U4 in eastern and northern Europe since the Mesolithic [42,43]. In the case of the U8b branch, it seems to indicate a primitive Paleolithic diversification in the Caucasus or central Asia, perhaps similar in time and location to the U4′9 bifurcation [44], Mesolithic migrations to the Balkans and northern Africa (Table 5), as well as later westward expansions to continental and Mediterranean Europe from the Neolithic onwards, again affecting northern Africa. These later movements have been previously visualized by the phylogeographies of haplogroup K, the sister branch of U8b1 [8,45,46], and haplogroup U3 [2,45,46].

Conclusions
Haplogroup U8 (x K) had three main Paleolithic branches. One of them, U8c, although widely distributed across Europe in that period, did not survive the LGM. The two extant branches have had very different demographic histories. U8a survived the LGM and recovered after it, albeit at low frequencies, mainly in northern and western Europe, including the Basques. U8b had deep Paleolithic roots in the Caucasus and the Middle East. From there, it accompanied other female lineages in the Mesolithic, Neolithic, and subsequent periods, reaching continental Europe and the Mediterranean basin, including northern Africa, from the east, but always at low frequencies (≈1%). Only subhaplogroup K, the sister clade of U8b1, reached average frequencies of around 7%.
Funding: This study has not had any funding.

Institutional Review Board Statement: This study underwent formal review and was approved by the Ethics Committee for Human Research at the University of La Laguna as proposal NR157.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available in the article and Supplementary Materials.