Novel Post-Glacial Haplotype Evolution in Birch—A Case for Conserving Local Adaptation

Despite constituting the westernmost edge of the population distributions for several native European plants, Ireland has largely been left out of key Europe-wide phylogeographic studies. This is true for birch (Betula pubescens Ehrh. and Betula pendula Roth), for which the genetic diversity has yet to be mapped for Ireland. Here we used eight cpDNA markers (two Restriction Fragment Length Polymorphism (RFLP) and six Simple Sequence Repeat (SSR)) to map the genetic diversity of B. pubescens, B. pendula, and putative hybrid individuals sampled from 19 populations spread across most of the island of Ireland. Within Ireland, 11 distinct haplotypes were detected, the most common of which (H1) was also detected in England, Scotland, France, and Norway. A moderate level of population structuring (GST = 0.282) was found across Ireland and the genetic diversity of its northern populations was twice that of its southern populations. This indicates that, unlike other native Irish trees, such as oak and alder, post-glacial recolonization by birch did not begin in the south (i.e., from Iberia). Rather, and in agreement with palynological data, birch most likely migrated in from eastern populations in Britain. Finally, we highlight Irish populations with comparatively unique genetic structure which may be included as part of European genetic conservation networks.


Introduction
Two species of birch are native to Ireland, Betula pubescens Ehrh. and Betula pendula Roth, whereas in Britain, the more cold-tolerant Betula nana L. can also be found in Scotland and some upland parts of England. Birch tends to be a pioneer species, either in forest gaps or forest edges or in wetlands and areas of acidic soils [1]. The distinction between B. pubescens and B. pendula is not clear-cut, but B. pubescens is the predominant species in Ireland, with B. pendula occurring less frequently [2]. Hybrids are also evident, but a detailed study has yet to be undertaken [3]. Indeed, the occurrence of shared haplotypes in all three species indicates a species complex of hybrids and introgressed individuals rather than distinct taxa [4].
Palynological evidence shows that birch was present in Ireland in localized populations from c. 12,000 years before present (BP) and had completely colonized the island by 9500 BP [5,6]. For temperate tree species, recolonization of northern Europe following the last glacial maximum (LGM) generally involved individuals moving north from refugial populations in the south. The consequence of this is a "southern richness and northern purity" model of genetic diversity, as new colonisation typically involves only a few individuals [7]. For oak, this model holds true [8], and in Ireland oak haplotype diversity is even lower than in Britain [9]. However, for cold tolerant species such as birch, which could survive nearer to the edge of the glacial fronts during the LGM, this model tends not to fit.
First of all, palaeoecological evidence on European birch points to a scenario in which it persisted in refugia at mid latitudes [10,11], unlike oak, which persisted mainly in Iberia, Italy, and in the Balkans [8]. Second, initial phylogeographic analyses of birch haplotypes in Europe show a general northwest-southeast divide, with haplotype diversity being particularly high in eastern Europe and Russia [12,13], an observation which is typically taken to be indicative of the presence of refugial populations [14]. A more recent genetic analysis of Betula spp. at the nuclear level confirmed that the main refugia during the LGM for B. pubescens and B. pendula were most likely in Russia and western Siberia [15], a finding which is supported in the fossil record [16].
In Ireland, the direction of pollen influx shows a westward migration from Britain [5], which would make Ireland the westernmost point of a recolonization progression, originating as far east as Russia. If this is the case, a founder effect may be observable in the cpDNA diversity, which may mean that Ireland contains only a subset of haplotypes which are observable elsewhere in Europe. Such a scenario would not be surprising given that Ireland has a limited flora and therefore a limited gene pool.
In this paper we aim to provide genetic data to inform conservation initiatives and pre-breeding efforts for Betula spp. in Ireland [17]. In the current study, the following questions were posed: (i) What is the origin of Irish birch populations? (ii) What level of genetic diversity is there in Irish populations? (iii) Is there genetic structure in the populations or between the species?

Sampling
Leaves were sampled from putatively native Irish populations (6-14 per location), as identified in the 2008 National Survey of Native Woodlands [2]. These were primarily located on state-owned land and were managed by local authorities, the National Parks and Wildlife Service, or the commercial forestry company, Coillte. Samples were also taken from national breeding programmes (2-20 per collection). These were maintained as Coillte nursery collections, the provenances of which were known (Table 1). Finally, a small number of samples (≤2 per location) from wild populations in France, Spain, and Norway were analyzed as references but were not included in statistical analyses due to their small sample sizes. Individual trees were designated as either B. pubescens or B. pendula according to morphological characteristics. B. pubescens tends to have downy young twigs and more triangular-oval leaves compared to more sharply pointed triangular leaves and prominent raised glands on the twigs in B. pendula (Figure 1). If an individual had an intermediate morphology it was designated as a 'putative hybrid'. For natural populations in Ireland, mature trees separated by approximately 15 metres were sampled. Leaf material (typically three to four young leaves per individual) was immediately placed in silica gel following removal.

DNA Extraction
For each sample, approximately 200 mg of dried leaf tissue was disrupted for 2 min using a bead mill (30 Hz) and a single 3 mm tungsten carbide bead. Extraction of DNA was performed using a DNeasy Plant mini kit (QIAGEN, cat. no. 69204) according to the manufacturer's instructions. DNA was quantified using a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, cat. no. ND-2000) and DNA quality was determined by agarose (1.5%) gel electrophoresis and staining with SYBR™ Safe DNA Gel Stain (Invitrogen™).

Chloroplast DNA Sequencing and Polymorphism Discovery
To allow for more rapid haplotype identification, the detection of sequence polymorphisms defining PCR-RFLPs was performed using high resolution melting (HRM) for medium-throughput genotyping [18], as done by Cubry et al. [19]. To identify candidate polymorphic cpDNA regions around which to design HRM primers, a preliminary PCR-RFLP screen was performed on a discovery set of samples from seven geographically distant Irish sites. Regions targeted were trnC-D (CD), psaA-trnS (AS), and trnT-F (TF) (as per Palmé et al. [4] and Maliouchenko et al. [12]). PCRs were performed in 10 × NH4 reaction buffer (BIOLINE), 5 units of BIOTAQ DNA polymerase (BIOLINE), 0.3 mM dNTP mix, 3 mM MgCl2, 0.2 µM primer mix, and ~5 ng gDNA. Targets were amplified using "subcycling" PCR conditions for targets with relatively low GC content according to Guido et al. [20]. This included an initial incubation at 95 °C for 5 min, 30 cycles of 98 °C for 20 s followed by 4 subcycles (i.e., 30 × 4) of 60 °C and 65 °C for 15 s each. This was ended by a final extension for 5 min at 65 °C. Amplicons were digested directly using TaqI and HinfI (separately), except for AS, which was only digested with TaqI [4]. All amplicons were then analyzed on an 8% TBE non-denaturing polyacrylamide gel (Novex™, Thermo Fisher Scientific), which was stained as before. Samples which captured the different RFLPs were sent for sequencing at Macrogen Europe (Macrogen Corporation, Amsterdam, The Netherlands). To ensure good quality contigs for each region, multiple internal primers were used (Table 2). Sequences were trimmed and mapped to in silico PCR amplicons from the Betula pubescens chloroplast reference sequence (NC_039996) using Geneious Prime® 2021.1.1 (Biomatters Ltd., Auckland, New Zealand). In silico digests were then performed before aligning variable fragments to identify sequence polymorphisms around which to design suitable HRM primers (Table 2).

HRM Experiments
HRM analysis was performed on the whole sample set using a QIAGEN Rotor-Gene Q 2-plex HRM platform (QIAGEN GmbH, Germany). Each PCR was performed in a final volume of 15 µL comprising 2 × Type-it HRM mix (QIAGEN, cat. no. 206546), 0.8 µM primer mix, and ~5 ng gDNA. PCR conditions were as follows: 95 °C for 5 min, 40 cycles of 95 °C for 10 s, 55 °C for 30 s, and 72 °C for 10 s. For HRM, fluorescence was continually monitored at a ramp rate of 0.1 °C for 2 s between 65 °C and 80 °C. Haplotypes were assigned manually using the Rotor-Gene Q-Pure Detection software (v2.3.5, QIAGEN GmbH, Germany). Grouping consistency was verified by comparison with PCR-RFLP gels and sequence alignments from the discovery sample set.

Data Analysis
Raw HRM and microsatellite data were combined prior to data analysis, as performed by others [12,26]. Species were analyzed together unless specified otherwise. Haplotype calling and frequency estimates per sampling site were calculated using a custom R script. These were mapped using QGIS (v3.18, Open-source software, Switzerland). General data handling, visualisation, and statistical analyses were performed using the R package adegenet (v2.1.3) [27,28]. The packages ade4 (v.1.7-16) [29], hierfstat (v.0.5-7) [30], mmod (v.1.3.3) [31], and poppr (v.2.9.1) [32,33] were also used to estimate population differentiation and diversity statistics as well as to bootstrap samples for tests of statistical significance. Poppr was also used to construct a minimum spanning network (MSN), using Euclidean squared distances between haplotypes, as done by Maliouchenko et al. [12] through the Arlequin software function for MSN construction. Before statistical analysis, sites with fewer than five samples were removed; this resulted in the removal of the Norwegian, Spanish, French, and "Scottish (Coillte)" samples. In addition, individuals with more than two null alleles (alleles that did not PCR-amplify) were removed prior to analyses.
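The filtering and per-site frequency steps described above can be illustrated with a short Python sketch. This is not the custom R script used in the study. The function names, the input format (a list of site and allele-tuple pairs with None marking a null allele), and the order in which the two filters are applied are all assumptions made for illustration. Haplotypes are called only for individuals with a complete allele profile.

```python
from collections import Counter, defaultdict

def filter_samples(samples, max_nulls=2, min_site_size=5):
    """Drop individuals with too many null alleles, then sites that are too small.

    samples: list of (site, alleles) pairs; a null allele is represented as None.
    The filter order is an assumption, not taken from the paper.
    """
    kept = [(site, tuple(alleles)) for site, alleles in samples
            if sum(a is None for a in alleles) <= max_nulls]
    sizes = Counter(site for site, _ in kept)
    return [(site, alleles) for site, alleles in kept
            if sizes[site] >= min_site_size]

def haplotype_frequencies(samples):
    """Return {site: {haplotype: relative frequency}} for complete profiles only."""
    counts = defaultdict(Counter)
    for site, alleles in samples:
        if None in alleles:
            continue  # incomplete profiles cannot be assigned a haplotype
        counts[site][alleles] += 1
    return {site: {hap: n / sum(c.values()) for hap, n in c.items()}
            for site, c in counts.items()}

# Toy data with three loci per individual (the study used eight).
toy = ([("A", (1, 1, 105))] * 3 + [("A", (2, 2, 106))] * 2
       + [("A", (None, None, None))] + [("B", (1, 1, 105))] * 2)
freqs = haplotype_frequencies(filter_samples(toy))
```

In this toy input, site B is dropped because it has only two individuals, and the individual with three null alleles is excluded before frequencies are computed.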
Typological differences between sampling sites were investigated by submitting an allele contingency table to a factorial correspondence analysis (FCA) using adegenet. Results were plotted using the R package plot3D (v.1.3). Global population differentiation statistics were estimated using mmod, with estimates of significance being provided by a 95% confidence interval (CI) computed on 1000 bootstrap permutations of the dataset. For pairwise comparisons, Hedrick's G'ST was calculated, again using mmod. G'ST is a standardised version of Nei's GST, which is itself an estimate of the fixation index, FST, for multiallelic markers [34]. G'ST considers the maximum theoretical GST based on observed heterozygosity for a given marker, thereby dealing with biases towards low estimates for highly variable loci [35]. For easier interpretation of the pairwise estimates, each 95% CI was converted to a p value according to the method outlined by Altman and Bland [36] for deriving a p value when a CI is given for an estimate of difference in effect.
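To make the relationship between these statistics concrete, the following Python sketch computes hS, hT, Nei's GST, and Hedrick's G'ST from haplotype labels, and converts a 95% CI to an approximate p value using the Altman and Bland formula. It is a simplified illustration, not the mmod implementation: it uses plain frequency-based diversities without sampling-bias corrections, so its values will generally differ from those reported here. The function names and input format are assumptions.

```python
import math
from collections import Counter

def gene_diversity(freqs):
    """Gene diversity h = 1 - sum(p_i^2) for a dict of haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def gst_statistics(populations):
    """populations: dict mapping population name -> list of haplotype labels."""
    k = len(populations)
    pop_freqs = []
    for haps in populations.values():
        n = len(haps)
        pop_freqs.append({h: c / n for h, c in Counter(haps).items()})
    # hS: mean within-population diversity; hT: diversity of the mean frequencies.
    h_s = sum(gene_diversity(f) for f in pop_freqs) / k
    all_haps = set().union(*pop_freqs)
    mean_freqs = {h: sum(f.get(h, 0.0) for f in pop_freqs) / k for h in all_haps}
    h_t = gene_diversity(mean_freqs)
    if h_t == 0:
        return {"hS": h_s, "hT": h_t, "GST": 0.0, "G'ST": 0.0}
    gst = (h_t - h_s) / h_t
    # Hedrick's standardisation: divide by the maximum GST attainable given hS.
    gst_max = (k - 1) * (1 - h_s) / (k - 1 + h_s)
    return {"hS": h_s, "hT": h_t, "GST": gst, "G'ST": gst / gst_max}

def p_from_ci(estimate, lower, upper):
    """Approximate two-sided p value from a 95% CI (Altman and Bland formula)."""
    se = (upper - lower) / (2 * 1.96)
    z = abs(estimate / se)
    return math.exp(-0.717 * z - 0.416 * z * z)

stats = gst_statistics({"popA": ["H1", "H1", "H2", "H2"],
                        "popB": ["H1", "H1", "H1", "H1"]})
p = p_from_ci(1.96, 0.0, 3.92)
```

For the toy populations, hS = 0.25 and hT = 0.375, giving GST = 1/3 and G'ST = 5/9. The standardised value is larger because the within-population diversity caps the maximum attainable GST at 0.6.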
Analysis of molecular variance (AMOVA) tests were performed using the adegenet, poppr and ade4 packages for R. Specifically, the data in adegenet format (i.e., a "genind" object) was passed through the poppr wrapper of the ade4 AMOVA function. AMOVA significance was calculated on 1000 permutations of the data. To test for isolation by distance (IBD), samples containing null alleles were removed before using the ade4 Mantel test function to test for significant correlations between Slatkin's linearised pairwise FST [37] and geographic distances, as suggested by Rousset [38], following 1000 permutations of the data.
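The logic of the IBD test can be sketched in Python as follows. This is a minimal, standard-library illustration of a permutation Mantel test on Slatkin's linearised FST, FST / (1 − FST), and is not the ade4 function used in the study. The function names, the one-sided ("greater") alternative, and the toy distance matrices are assumptions.

```python
import math
import random

def slatkin_linearise(fst):
    """Slatkin's linearised FST: FST / (1 - FST)."""
    return fst / (1.0 - fst)

def _upper(matrix, order):
    """Upper-triangle entries of a symmetric matrix after reordering rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either matrix is constant

def mantel_test(genetic, geographic, n_perm=1000, seed=0):
    """Return (r, p): Pearson correlation and one-sided permutation p value."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo = _upper(geographic, identity)
    observed = _pearson(_upper(genetic, identity), geo)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute populations in the genetic matrix only
        if _pearson(_upper(genetic, order), geo) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy example: five sites on a line, with FST increasing with distance.
coords = [0, 1, 3, 6, 10]
geographic = [[abs(a - b) for b in coords] for a in coords]
genetic = [[slatkin_linearise(d / 20) for d in row] for row in geographic]
r, p = mantel_test(genetic, geographic)
```

Rows and columns are permuted together, which preserves the dependence among pairwise distances that involve the same population. That dependence is why a Mantel test is used instead of an ordinary correlation test.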
To estimate the extent to which all haplotypes were captured by our sampling, 10,000 permutations of our haplotype frequency distribution were used to calculate haplotype accumulation curves using the iterative extrapolation simulation algorithm, HACSim, available through the R package HACSim (v.1.0.5) [39]. HACSim calculates the number of samples needed to recover 95% of all haplotypes.
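The idea behind a haplotype accumulation curve can be illustrated with a simple resampling sketch in Python. This is not the HACSim algorithm, which iteratively extrapolates beyond the observed sample size. Here, individuals are drawn at random from a fixed haplotype frequency distribution and the mean number of distinct haplotypes is tracked as the sample grows. The function names, parameters, and toy frequencies are assumptions.

```python
import random

def accumulation_curve(freqs, max_n, n_perm=1000, seed=0):
    """Mean number of distinct haplotypes seen after 1..max_n sampled individuals.

    freqs: dict mapping haplotype label -> relative frequency.
    """
    rng = random.Random(seed)
    labels = list(freqs)
    weights = [freqs[h] for h in labels]
    totals = [0] * max_n
    for _ in range(n_perm):
        draw = rng.choices(labels, weights=weights, k=max_n)
        seen = set()
        for i, hap in enumerate(draw):
            seen.add(hap)
            totals[i] += len(seen)
    return [t / n_perm for t in totals]

def samples_needed(curve, n_haplotypes, target=0.95):
    """Smallest sample size whose mean recovery reaches the target fraction."""
    for n, mean_seen in enumerate(curve, start=1):
        if mean_seen >= target * n_haplotypes:
            return n
    return None  # target not reached within the simulated range

toy_freqs = {"H1": 0.5, "H2": 0.5}
curve = accumulation_curve(toy_freqs, max_n=20, n_perm=200)
needed = samples_needed(curve, n_haplotypes=2)
```

With skewed frequency distributions such as the one observed in this study, the curve flattens slowly because rare haplotypes are seldom drawn, so the sample size needed to reach 95% recovery grows quickly.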

Chloroplast DNA Variation
A total of 240 birch individuals were sampled across 26 sites, 19 of which were in Ireland (Table 1). Morphologically, 201 of these were identified as Betula pubescens, 31 as Betula pendula, and eight as putative hybrids. Of the 26 sampling populations, 21 were wild and occurred in putative native woodland. The remainder were sourced from Coillte national breeding programmes, although their provenances were known. Haplotypes were identified based on variation at two PCR-RFLP (HRM) (Figure 2) and six microsatellite loci, as used in previous phylogeographic works on European birch [4,12,13]. The former was selected from a preliminary PCR-RFLP screen of a discovery set of Irish samples across three variable loci; trnC-D, psaA-trnS, and trnT-F (Table 2). Sequencing these samples revealed that the RFLP variation for all three lies primarily in indels of 19, 24, and 10 bp in length, respectively. For faster throughput haplotyping, primers flanking each region were designed and successfully tested in HRM experiments (Table 2). The exception, however, was the trnT-F indel, for which we were unable to design suitable primers due to the AT-richness of the flanking sequences. Even without screening at this region, 16 distinct haplotypes could nonetheless be identified, six of which occurred five times or more in the whole sample set (Table 3).
The third (H3) and fourth (H4) most abundant haplotypes only occurred in Ireland, roughly spanning from Cronybyrne (Co. Wicklow) in the east to Lough Gill, Slishwood (Co. Sligo) in the northwest (Figure 3). The fifth most abundant haplotype (H5) was only identified in the English and Scottish sample set. An MSN based on Euclidean squared distances between haplotypes indicates that the haplotypes are relatively closely related and that the most abundant, H1, is also the most geographically widespread (Figure 4). Based on the distribution pattern, H1 is most likely to be equivalent to Haplotype A from Palmé et al. [13], which is the dominant haplotype in northern European populations.

Table 3 (fragment; only the rows below are recoverable). Each row gives the haplotype, two PCR-RFLP (HRM) alleles, six microsatellite allele sizes, the region, and the number of individuals (total followed by three subgroup counts).

[label lost]  2  2  106  118  100  205  118  147  Ireland   2  2  0  0
H9            1  1  105  118  100  205  119  147  Ireland   2  2  0  0
H10           1  1  104  118  100  205  117  147  Ireland   1  0  1  0
H11           2  2  106  118  100  205  119  147  Scotland  1  0  1  0
H12           2  2  106  118  100  205  116  147  England   1  0  1  0
H13           2  2  105  118  100  205  117  154  Spain     1  0  0  1
H14           1  1  105  118  100  205  119  148  Ireland   1  1  0  0
H15           1  1  105  118  98   205  118  147  Ireland   1  1  0  0
H16           1  2  105  118  100  205  117  147  Scotland  1  1  0  0

a Of the 240 individuals analyzed, 23 could not be haplotyped due to a lack of PCR amplification at one or more loci.
Statistical analysis of population differentiation and genetic diversity was only performed on populations containing five or more sampled individuals. Additionally, individuals with more than two null alleles were removed prior to analysis; in effect, this meant that only populations from Ireland, England, and Scotland were analyzed (n = 228). Unlike total diversity, intra-population diversity was low, whereas a global G'ST estimate of 0.336 indicated a moderate level of population structure (Table 4). This estimate was negligibly different when only the Irish populations were considered (GST = 0.282, G'ST = 0.353). In agreement with this, results of a nested AMOVA revealed that 19.16% of variation was from differences between sampling populations (Table 5).

Figure 3. Geographic distribution of the six main haplotypes (those which occurred five or more times) identified in this study. Pie chart size corresponds to the number of each haplotype which could be identified in a particular population. Colours correspond to the different haplotypes (H1 to H6). Names highlighted in red are sites which appeared more genetically differentiated relative to each other and to the other populations (Figure 5 and Figure 6).
Effectively all population structure was attributable to differentiation between B. pubescens populations (Table 4). This is likely because there were considerably fewer B. pendula individuals, which meant that only two populations of B. pendula could be compared after removing those which possessed fewer than five individuals. In agreement with this, the nested AMOVA results showed that only 1.58% of variation could be explained at the species level, which was not statistically significant (Table 5).

Table 4. Population differentiation statistics estimated across all haplotypes in the Irish, English, and Scottish sampling populations as well as separately across B. pubescens, B. pendula, and putative hybrids. Statistics include estimates of intrapopulation diversity (hS), total diversity (hT), diversity which apportions between populations (GST), and GST adjusted for the theoretical maximum based on mean heterozygosity (G'ST). Bootstrapped 95% confidence intervals (n = 1000 permutations) are provided in brackets. Bold values indicate statistical significance (p ≤ 0.05).

To investigate which sites, if any, possessed unique allelic variation, population typology was investigated using an FCA (Figure 5). Most of the allelic variation (77.05%) was explained by the first three FCA axes. While most populations were not clearly distinct from one another, there were exceptions, such as Rostrevor Forest (Co. Down), Annamarron (Co. Monaghan), Stormanstown Bog (Co. Louth), and Scragh Bog (Co. Westmeath), all of which appeared distinct from both each other and from the other populations. These sites also stood out in a pairwise G'ST comparison of populations (Figure 6), with the highest G'ST values being for Scragh Bog, a site which had the highest frequency of the Irish-specific H3 haplotype (Figure 3).
The apparently lower levels of haplotype diversity among the more southern sampling populations in Ireland prompted us to test whether there was any statistical backing for either a north-south or an east-west effect. For this, populations south of Moods (a site nearest to the mid latitude point in Ireland) were deemed to be southern populations, whereas populations west of (and including) Carnpark (near the longitudinal midpoint) were deemed to be western populations. When populations were nested within either a northern or a southern location, a significant level (7.03%) of the variation could be explained, whereas no variation could be explained by an east-west division (Table 5). Indeed, genetic diversity was more than twice as high for northern (0.2806 ± 0.0133) compared to southern (0.1121 ± 0.0154) populations. To test for evidence that this geographic variation could be caused by IBD, a Mantel test was performed to test for a correlation between genetic and geographic distances. Neither before (r² = 0.069, p value = 0.347) nor after (r² = 0.004, p value = 0.539) removing the English and Scottish populations could a significant effect of IBD be observed. This latter result suggests that the north-south difference in Ireland is not attributable to IBD.

Sufficiency of Haplotype Capture in a European Context
Using the HACSim algorithm developed by Phillips et al. [39] for estimating the sufficiency of haplotype sample sizes, 10,000 permutations of the frequency distribution of all 16 haplotypes were used to extrapolate a haplotype accumulation curve (Figure A1, Appendix A). From this, it was inferred that a total of 452 (95% CI: 449.74-454.26) individuals would need to have been sampled in order to have sufficiently captured 95% of the actually occurring haplotypes. Instead, it was estimated that 84% were captured, which suggests that up to three additional rare haplotypes may not have been identified at n = 217 (i.e., individuals with no null alleles).
According to Maliouchenko et al. [12], who used the same markers employed here (although including trnT-F) to identify 66 haplotypes in B. pubescens and B. pendula sampled across Western Europe (excluding Ireland) and Russia, at least 50 haplotypes ought to be identifiable across the regions represented in our sampling data. When we entered the adjusted number of expected haplotypes into the HACSim algorithm and assigned sampling probability frequencies of zero to the haplotypes which we did not identify (i.e., 50 − 16 = 34), only 27% of all possible haplotypes were identified. Realistically, however, if only the samples from Ireland and Britain are considered, then at n = 214 we can assume that our sampling reflects 67% of the potential haplotypes in these regions. This estimation is based on an assumption by Maliouchenko et al. [12] that there are 20 observable haplotypes in Britain, that they can also be found on the island of Ireland, and that they should include those which we identify here as Irish-specific haplotypes (H3 and H4). This estimate still suggests that most haplotypes for Ireland were captured in our study.

Discussion
Molecular studies are key elements in characterising populations for conservation and for monitoring sustainable forest management [40]. Here, we used a selection of cpDNA markers to characterise the genetic diversity and population structure of birch in Ireland. This selection was informed by two considerations: First, the selected markers have been widely used in previous studies to effectively map out the phylogeographies of several important European tree species, including birch [8,12,19,26,41]. However, in the case of birch, Irish populations have not been studied. Therefore, there exists a gap in the knowledge of the diversity of Irish birch at the cpDNA level. Moreover, there is an added urgency that the diversity of Irish birch be analyzed in the context of European populations given that Ireland has one of the lowest levels of native forestry cover in Europe, which stands currently at only approximately 2% [42]. With added studies, greater resolution of Europe-wide phylogeographic patterns can be achieved and the data can be used to select populations for use in conservation networks [43]. Second, as Irish birch populations are highly fragmented, we presumed that they would be poorly connected at the organellar level. By contrast, given that Ireland has a relatively small geographic area, gene flow via pollen between its isolated populations may nonetheless still be occurring. Therefore, we suspected that we would have a higher chance of observing any population structure at the cpDNA level given that the chloroplast genome is maternally inherited (i.e., via seed) in most angiosperms (discussed in Maliouchenko et al. [12]).
Native birch has been designated as "high priority" for conservation in Ireland [44]. In this study, 11 Irish haplotypes could be identified (Table 3). Based on abundance, the two most frequent (H1 and H2) may correspond to haplotypes "1" and "26", respectively, which were identified in Britain by Maliouchenko et al. [12]. Overall, of the most abundant haplotypes (≥5; Figure 3), two (H3 and H4) were not identified outside of Ireland. However, as the size and number of the non-Irish sampling populations was considerably smaller, it cannot be ruled out that these haplotypes were simply missed. Nonetheless, it is clear that the genetic diversity of Irish birch is relatively high compared to what has been reported for other native Irish trees such as oak (Quercus petraea (Matt.) Liebl. and Quercus robur L.), alder (Alnus glutinosa (L.) Gaertn.), and ash (Fraxinus excelsior L.), for which fewer haplotypes appear to be present [9,19,41]. For Irish oak, however, diversity has been estimated based on variation at only two (trnD-T and trnT-F) cpDNA regions [9], meaning that greater diversity may be revealed with the addition of more markers.
We are confident that our estimate of the genetic diversity of Irish birch is, if anything, a slight underestimate, as suggested by the extrapolation of haplotype accumulation curves. With all sampling populations included, it was estimated that 84% of all haplotypes were captured. Intuitively, this seems like a major overestimation given that regions other than Ireland (England, Scotland, France, Norway, and Spain) were represented in our data. Undoubtedly, this is attributable to the very small number of samples from these regions (n = 33, 2, 2, 2, and 1, respectively). When we factored in the number of haplotypes observed in these regions by Maliouchenko et al. [12], haplotype capture dropped to only 27%. When we input only the number of haplotypes which Maliouchenko et al. [12] observed in Britain, this figure increased to 67% for the British and Irish populations alone. However, it is worth mentioning that based on our markers alone, 15 haplotypes were identified across Ireland and Britain. The figure of 20 from Maliouchenko et al. [12], however, did not come directly from extensive sampling across Britain (n = 36), but rather from haplotypes identified in other regions which were presumed to be present because they were shown to be closely related to their actually observed British haplotypes in an MSN. This might suggest that our figure of 15 is a more accurate approximation. If so, then our sampling at n = 214 for Ireland and Britain ought to have captured 85% of all haplotypes. Focusing only on the Irish samples (n = 181) and the associated 11 haplotypes which we detected, 87% of all possible Irish haplotypes were identified. Therefore, at the depth of the eight cpDNA markers used here, we are reasonably confident that a good representation of the genetic diversity in Ireland in the context of Europe-wide diversity has been revealed.
Owing to its lighter, wind-dispersed seeds, the genetic structuring and differentiation in birch compared to oak was expected to be lower [26]. Indeed, even with the English and Scottish populations included, cpDNA differentiation between sampling populations was lower for birch (GST = 0.268) than for Irish oak (GST = 0.730) [9]. This is more in line with other wind-dispersed species such as alder (GST = 0.283) [19] and goat willow (Salix caprea L.; GST = 0.38) [45]. At the species level, significant population structuring was only detected for B. pubescens (Table 4). We attributed this to an insufficient number of B. pendula populations from which diversity estimates could be calculated, which itself reflected the fact that this species is significantly less common than B. pubescens in Ireland [2].
Both species are well known to display high levels of hybridisation and haplotype sharing, so much so that interspecific cpDNA variation tends to be lower within the same forest compared to intraspecific variation between different forests [4,12]. Here, of the most common haplotypes, H1, H2, and H6 could be detected in both species. The occurrence of H3 and H4 in B. pubescens only is likely explained by their increased frequency in more northern populations where B. pendula, by contrast, becomes increasingly less frequent. As is the case for oak at the European level [8], the rarer haplotypes tended to be restricted to a single species.
It was not expected that the selected cpDNA markers would differentiate between species, as cpDNA variation in birch had already been extensively demonstrated to show no clear species delimitation [12,46,47]. For the most common shared haplotypes, this has been explained by incomplete lineage sorting, whereas for rarer haplotypes it has been argued that interspecific backcrossing and sympatric introgression are responsible [46,47]. This is distinct from convergent evolution, which is not thought to play a role given the slow mutation rate of cpDNA and the asymmetric sharing of chloroplast alleles between the species [46]. Conversely, using nuclear DNA markers, strong species delimitation has been observed [15,46,48]. This has been explained by a model which states that higher gene flow (i.e., through pollen) within species will lead to better differentiation between species [49,50]. For species delimitation in Ireland, the well-validated nSSR markers for birch originally developed by Kulju et al. [51] ought to be tested.
The results of this work help to answer questions relating to the origins and phylogeographic patterns of birch in Ireland in the context of Europe and post-glacial recolonization following the LGM. Studies to date have shown a mixture of origins for tree populations in Ireland. Oak and ash populations have been shown to have originated in the Iberian Peninsula [9,41], whereas palynological and genetic data for alder indicate a two-pronged re-colonisation from the Iberian Peninsula and the Carpathians [19]. For birch in Ireland, a significant level of variation could be explained by a north-south divide. However, diversity was significantly greater in the more northern populations. This is congruent with earlier Europe-wide works which have demonstrated that birch did not recolonise from the south; had it done so, we would instead expect diversity to decline at higher latitudes. Therefore, it is probably more realistic to account for the northern haplotype richness in Ireland as being a result of migration from Britain, in which case Ireland may be part of a western leading edge for birch in Europe. Potentially, the absence of the UK-specific haplotype (H5) in Ireland could mark a westward decline in genetic diversity, as might be expected as part of a founder effect. Within Ireland, however, this effect was not observed, as there was no significant east-west difference.
Another possible explanation for the higher northern diversity could be remnant B. nana haplotypes from ancient introgression events. B. nana is absent in Ireland today, but the macrofossil record reveals that it was present early following the LGM [6]. Moreover, pollen records show that B. nana and tree birch (such as B. pubescens) likely co-occurred during this period [52]. In Scotland, extant B. nana have been shown to share more nuclear alleles with B. pubescens than with B. pendula [48]. This, in conjunction with fossil evidence, led Wang et al. [48] to conclude that as B. nana moved northwards post-LGM with climate warming, "a footprint of introgressed genes in the genome of [advancing] B. pubescens" was left behind [48]. Indeed, triploid hybrids are readily observed where the species continue to co-occur [53]. Gene flow from B. nana into B. pubescens (but not the other way around) increases with latitude. Interestingly, this has led to a scenario being suggested in which pollen swamping of B. nana by B. pubescens creates hybrids which then backcross with the latter, resulting in haplotype capture from B. nana [46]. This agrees with findings from Currat et al. [49], in which it was demonstrated that introgression is almost always unidirectional from the local into the invading species. Therefore, the novel haplotype variation in the more northern Irish populations may be a genetic legacy of now-extinct Irish B. nana. Investigating whether these haplotypes occur in Scottish populations of B. nana could be useful in testing the validity of this hypothesis.
A main aim of this work was to select conservation units for birch in Ireland. Indeed, selection at the population level (rather than species) may be more sensible and practical for conserving FGR, given that the haplotypes were distributed geographically and not interspecifically. Towards this end, we suggest that populations sampled in the more northern and north-eastern areas (Figure 3), for example Scragh Bog, be prioritised, as both FCA and pairwise G'ST analysis suggest these to be genetically the most differentiated (Figures 5 and 6). These sites merit conservation as they may represent possible sites of local adaptation and potentially contain unique allele combinations.

Conclusions
By mapping the genetic diversity of birch in Ireland, this work fills a gap in the phylogeographic structure of birch in Europe. Contrary to expectations based on other native Irish trees, haplotype richness in Irish birch is comparatively high. Building on previous work by Maliouchenko et al. [12], which estimated that up to 20 haplotypes may be observable in Britain, we empirically showed that at least 11 of these can be found in Ireland. In addition, the strikingly lower genetic diversity of southern populations supports the hypothesis that post-glacial recolonization did not involve migration from the Iberian Peninsula. Instead, and in agreement with pollen data, an eastern migration route from Britain is most likely. Moreover, based on findings from more recent works, we formulate a hypothesis which suggests that the greater northern diversity may be, in part, attributable to historic sympatric introgression events between B. pubescens and now-extinct Irish B. nana. Finally, we suggest populations which may be particularly worthy of selection as part of European conservation networks for birch in Ireland.