1. Introduction
The lodgepole pine (Pinus contorta Dougl.) complex illustrates a fundamental paradox in evolutionary biology: how do widespread species maintain substantial morphological variation across environmental gradients while still experiencing sufficient gene flow to prevent complete reproductive isolation? This species’ broad distribution, from the Yukon Territory through British Columbia and western Alberta, south to California, and east along the Rocky Mountains to Colorado [1], encompasses dramatic climatic and ecological differences that have led to notable phenotypic divergence. Understanding the evolutionary mechanisms underlying this variation requires integrating insights from population genetics, biogeography, and adaptation theory.
Two competing theoretical perspectives offer contrasting predictions about the evolutionary origins of morphological variation in lodgepole pine. The classical allopatric divergence theory posits that geographic isolation, particularly during Pleistocene glaciations, should produce phylogeographic patterns reflecting historical population fragmentation [2,3,4]. Under this model, morphologically distinct subspecies should exhibit corresponding genetic differentiation across the genome, with the magnitude of divergence proportional to the duration of isolation and the effective population size. Critchfield’s influential morphological analysis [2] identified four subspecies, P. contorta contorta (coastal), P. contorta murrayana (Sierra Nevada), P. contorta latifolia (Rocky Mountains and interior), and P. contorta bolanderi (Mendocino pygmy forest), based primarily on cone architecture and foliar characteristics. This taxonomic framework assumes that morphological differences indicate separate evolutionary trajectories established through prolonged isolation during the Wisconsin glaciation (~100,000–12,000 years ago) [4,5].
An alternative theoretical perspective emphasizes migration-selection balance in maintaining local adaptation across species ranges [6,7,8]. This framework predicts that morphological differentiation can evolve and persist despite ongoing gene flow if divergent selection is sufficiently strong relative to migration’s homogenizing effects. Critically, this scenario predicts genetic homogeneity at neutral markers alongside substantial differentiation at loci under selection, a genomic landscape fundamentally different from that expected under allopatric divergence. The conditions under which selection can overcome gene flow have been extensively modeled [6,7,8], demonstrating that even moderate selection coefficients can maintain adaptive differentiation when gene flow is moderate, particularly for polygenic traits where selection acts simultaneously across multiple loci [9,10].
These contrasting frameworks generate testable predictions about lodgepole pine’s evolutionary history. Allopatric divergence predicts: (1) phylogenetic structure in neutral markers that aligns with subspecies boundaries; (2) genetic distances proportional to presumed periods of isolation; (3) reciprocal monophyly or distinct haplotype clusters for different subspecies; and (4) concordance between nuclear and organellar phylogeographic patterns reflecting genome-wide divergence. In contrast, the migration-selection balance model predicts: (1) genetic homogeneity at neutral loci despite morphological differentiation; (2) weak or absent phylogeographic structure in organellar genomes; (3) morphological clines aligned with environmental rather than geographic gradients; and (4) potential discordance between adaptive and neutral genetic variation.
Our earlier research using nuclear markers revealed limited population genetic structure in lodgepole pine, inconsistent with long-term subspecies isolation [11] and contradicting expectations under traditional vicariance scenarios. Subsequently, mitochondrial DNA minisatellite analysis identified unexpected refugial zones in Haida Gwaii (Queen Charlotte Islands) and the Alexander Archipelago [12], suggesting a more complex biogeographic history than previously recognized. These findings raised critical questions: Does the apparent genetic homogeneity at nuclear markers extend to organellar genomes? If so, how can substantial morphological variation be maintained? What role did glacial refugia play in shaping current diversity patterns? And fundamentally, do recognized subspecies represent phylogenetically distinct lineages or ecotypic variants maintained by divergent selection?
Organellar genomes offer distinct advantages for reconstructing evolutionary history because of their uniparental inheritance, lack of recombination, and generally slower mutation rates than nuclear DNA [13]. In conifers, chloroplast DNA (cpDNA) is paternally inherited and dispersed through pollen, facilitating long-distance gene flow and rapid homogenization across landscapes. Conversely, mitochondrial DNA (mtDNA) is maternally inherited and transmitted via seeds, which typically disperse over shorter distances, thereby preserving stronger signatures of historical population structure and colonization routes [14,15,16]. This fundamental difference in dispersal biology creates contrasting spatial genetic structures: cpDNA typically exhibits less geographic structure because pollen disperses extensively, whereas mtDNA maintains stronger phylogeographic signals reflecting seed-mediated demographic processes.
These contrasting inheritance patterns enable powerful inferences about evolutionary history. Concordant phylogeographic structure across both genomes strongly suggests historical isolation of entire populations, whereas discordance may indicate sex-biased dispersal or recent gene flow primarily through pollen. Complete genetic homogeneity in cpDNA, coupled with some mtDNA structure, would suggest that extensive pollen flow has homogenized paternal lineages while maternal lineages retain traces of historical fragmentation. Conversely, homogeneity in both genomes, despite morphological variation, would provide compelling evidence that adaptive differentiation has occurred without prolonged geographic isolation.
This study addresses three interconnected objectives to discriminate among competing evolutionary hypotheses:
First, we test whether organellar genome variation supports existing subspecies classifications, directly evaluating predictions from allopatric divergence theory. If subspecies represent phylogenetically distinct lineages established through prolonged isolation, organellar genomes should exhibit: (a) significant genetic distances between subspecies exceeding typical intraspecific variation; (b) reciprocally monophyletic or at least distinct haplotype groups; and (c) phylogeographic structure aligned with taxonomic boundaries. Alternatively, if subspecies represent ecotypic variants, organellar genomes should show minimal differentiation, regardless of morphological distinctiveness.
Second, we evaluate the biogeographic hypothesis that Wisconsin glaciation caused prolonged population isolation by comparing coalescent expectations for genetic structure with empirical data. Classical vicariance models predict that ~100,000 years of isolation in separate refugia should generate detectable phylogenetic structure even at slowly evolving organellar loci. We assess whether the observed genetic diversity and demographic signatures are consistent with prolonged isolation or suggest more recent common ancestry and rapid post-glacial expansion.
Third, we identify potential glacial refugia and post-glacial colonization pathways by integrating molecular data with paleoecological evidence. Unique haplotypes concentrated in specific regions may indicate refugia, and their distribution patterns can illuminate colonization routes and the relative importance of seed versus pollen dispersal in range expansion.
By combining chloroplast and mitochondrial data within clear theoretical frameworks from coalescent theory, migration-selection models, and phylogeography, we seek to clarify the roles of historical demography and current selection in the evolution of lodgepole pine. These findings have wider implications for understanding conifer diversification processes and guiding conservation efforts that balance taxonomic identity with adaptive potential.
2. Materials and Methods
2.1. Study Populations and Hierarchical Sampling Design
We employed a hierarchical sampling strategy to capture both broad-scale phylogeographic patterns and fine-scale population genetic variation while minimizing environmental effects on genetic analyses. Needle tissue was collected from 139 individuals across 31 populations representing all four recognized subspecies. Sampling was conducted in established provenance plantations maintained by the British Columbia Ministry of Forests in Prince George and Lake Cowichan. These ex situ collections, derived from natural stands throughout the species’ range, provided standardized growing conditions that eliminated contemporary environmental variation while preserving the geographic and genetic signal from source populations—a critical advantage for disentangling genetic from plastic phenotypic variation.
Our sampling design included 16 populations of P. c. latifolia, 11 of P. c. contorta, 3 of P. c. murrayana, and 1 of P. c. bolanderi (Figure 1). Four or five trees were sampled per population, except in four populations (121, 126, 135, and 141) where only 1–3 trees were available because of low survival in the provenance trial. Although modest per-population sample sizes limit the detection of rare haplotypes, the total sample of 139 individuals across 31 populations provides enough statistical power to detect biologically meaningful subspecies-level differences in AMOVA [17] (see Section 2.5 for a formal power analysis). Fresh tissue was immediately frozen in liquid nitrogen and stored at −20 °C to prevent DNA degradation. For P. c. bolanderi, which did not survive in provenance trials due to its extreme adaptation to nutrient-poor soils, we germinated archived seeds and grew seedlings in controlled greenhouse conditions for six weeks before tissue collection.
2.2. DNA Extraction and PCR Amplification
Total genomic DNA was extracted using a modified CTAB protocol optimized for coniferous tissues [18], which effectively removes polyphenolic compounds and polysaccharides that can inhibit downstream enzymatic reactions. The chloroplast trnL intron and trnL/trnF intergenic spacer regions were amplified as a single fragment of approximately 1100 bp using universal primers c and f [19]. These regions were chosen for their proven usefulness in resolving phylogenetic relationships among closely related taxa, due to their moderate substitution rates that balance phylogenetic signal with alignability [20,21,22,23]. The mitochondrial nad1 b/c intron was amplified with primers NAD1B1F and NAD1C1R [24], producing an approximately 1537 bp fragment from which a 339 bp variable region was later sequenced using nested internal primers NAD1B3F and NAD1C3R. This region contains tandemly repeated 34 and 32 bp elements that differ among pine species [24,25], potentially offering valuable phylogeographic markers. Sequences of the amplification and sequencing primers used in this study are detailed in Table 1.
PCR reactions were conducted in 25 μL volumes containing 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2.5 mM MgCl2, 200 μM of each dNTP, 0.5 μM of each primer, 0.5 units of Taq polymerase, and 20–40 ng of template DNA. Thermal cycling involved an initial denaturation at 95 °C for 2 min, followed by 35 cycles of 95 °C for 1 min (denaturation), 55 °C for 1 min (annealing), and 72 °C for 2 min (extension), with a final extension at 72 °C for 5 min to ensure complete synthesis of all products. PCR products were verified by agarose gel electrophoresis, and amplicons displaying a single band of the expected size were purified using standard protocols before sequencing.
2.3. DNA Sequencing and Quality Control
Direct bidirectional sequencing was carried out using fluorescently labeled primers on a Li-Cor 4200 automated sequencer with SequiTherm EXCEL II DNA sequencing chemistry (Epicentre Technologies, Madison, WI, USA). This method allowed for high-quality sequence determination without cloning, an approach suitable for organellar genomes, where intracellular homogeneity reduces concerns about heteroplasmy. Both forward and reverse strands were sequenced to enhance accuracy and resolve ambiguous base calls. Raw chromatograms were inspected and edited with Chromas version 2.6.6 [26], paying close attention to trace quality scores and possible sources of sequencing artifacts.
2.4. Sequence Alignment and Phylogenetic Analysis
Sequences were aligned using ClustalW version 2 within MEGA11 [27] with the following parameters: gap opening penalty = 15.0 and gap extension penalty = 6.66 for both the pairwise and multiple alignment phases; IUB weight matrix; transition weight = 0.5. Alignments were then inspected manually. The trnL intron and trnL/trnF spacer datasets were combined for analysis because these regions are physically linked and inherited as a single unit, maximizing phylogenetic signal while avoiding pseudo-replication. Indel events were coded as binary characters using simple indel coding, which treats multi-base insertions or deletions as single evolutionary events, thereby retaining phylogenetic information while avoiding inappropriate weighting of indel length variation. We chose simple indel coding [28] over alternative schemes (e.g., the modified complex indel coding or complete deletion of indel-containing sites) because of its demonstrated applicability to datasets with few, non-overlapping indels, as observed here. The 5 bp and 26 bp deletions occur at non-overlapping alignment positions; thus, the ambiguities inherent in more complex coding schemes do not apply. Nonetheless, to verify that the indel coding scheme did not affect phylogenetic conclusions, we re-ran maximum parsimony analyses with (a) indels treated as missing data and (b) indels excluded entirely; both alternatives recovered the same unresolved polytomy, confirming that conclusions are robust to indel treatment.
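The binary coding of non-overlapping indels described above can be illustrated with a short sketch. It is not the pipeline used in this study: the function names (`gap_runs`, `simple_indel_coding`) and the toy alignment are hypothetical, and the sketch assumes gaps never overlap between sequences, so the "inapplicable" state that the full scheme assigns to nested gaps is not handled.

```python
def gap_runs(seq):
    """Return (start, end) index pairs for each run of '-' in an aligned sequence."""
    runs, start = [], None
    for i, ch in enumerate(seq):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(seq) - 1))
    return runs


def simple_indel_coding(alignment):
    """Code every distinct gap (same start and end) as one binary character.

    Assumption: gaps do not overlap between sequences, so each gap is
    simply present (1) or absent (0) in every sequence.
    """
    per_seq = [set(gap_runs(s)) for s in alignment]
    indels = sorted(set().union(*per_seq))
    matrix = [[1 if indel in runs else 0 for indel in indels] for runs in per_seq]
    return indels, matrix


# Hypothetical toy alignment: one 5-column gap and one 1-column gap.
toy = [
    "ACGTTAAATGCA",
    "ACGT-----GCA",
    "ACGTTAAATG-A",
]
indels, matrix = simple_indel_coding(toy)
print(indels)
print(matrix)
```

Each multi-base gap contributes a single column to the binary matrix regardless of its length, which is the property that prevents a 26 bp deletion from being weighted more heavily than a 1 bp deletion.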
Nucleotide diversity (π) was estimated following Nei [29], providing a measure of average pairwise sequence divergence within and among populations. Genetic distances reported represent mean pairwise distances computed between individual sequences drawn from different subspecies (inter-subspecific distances) or among individuals within each subspecies (intra-subspecific distances, shown on the diagonal). Genetic distances between subspecies were calculated using Tajima-Nei corrections [30], which account for differences in rates of transition and transversion substitutions and saturation effects. These analyses were implemented in MEGA11 [27]. Summary statistics describing sequence variation and neutrality tests were computed using DnaSP version 3.53 [31]. Specifically, we calculated Tajima’s D [32] and Fu and Li’s D* [33], which test for departures from neutral evolution expectations by comparing different estimators of the population mutation parameter θ. Significantly negative values of these statistics indicate an excess of rare alleles relative to neutral expectations, potentially reflecting either purifying selection removing deleterious mutations or demographic expansion that increases population size and thereby generates many recent, rare mutations.
2.5. Statistical Power Analysis
To assess whether the sampling design had enough statistical power to detect biologically meaningful structure at each level of the AMOVA, namely among subspecies (ΦCT), among populations within subspecies (ΦSC), and within populations, we performed a formal power analysis for the three-level nested design. Power was estimated by partitioning the total variance across three levels with the following degrees of freedom: $df_{\text{groups}} = k - 1 = 3$ (among four subspecies), $df_{\text{pops(groups)}} = P - k = 27$ (among 31 populations within four subspecies), and $df_{\text{within}} = N - P = 108$ (within 31 populations). The non-centrality parameter at each level is given by $\lambda = df \times n_c \times \Phi/(1 - \Phi)$, where $n_c$ is the adjusted average sample size appropriate for each comparison.
For the among-subspecies contrast, ΦCT is tested in the hierarchical design against $MS_{\text{pops(groups)}}$ rather than $MS_{\text{within}}$; the appropriate denominator degrees of freedom are therefore $df = 27$. The adjusted group size is $n_c = 27.00$ (from $N = 139$, with per-subspecies totals $n_{\text{latifolia}} = 76$, $n_{\text{contorta}} = 46$, $n_{\text{murrayana}} = 12$, $n_{\text{bolanderi}} = 5$). Power to detect ΦCT = 0.10 is 0.6974; at ΦCT = 0.15, power is 0.893; and at ΦCT = 0.20, power reaches 0.95 (Supplementary Table S1). The minimum detectable ΦCT at 80% power is 0.13.
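As a check on the bookkeeping above, the following sketch recomputes the degrees of freedom, the adjusted group size, and the non-centrality parameter from the reported sample sizes. It rests on two assumptions not stated in the text: that $n_c$ follows the usual unbalanced-design formula $n_c = (N - \sum n_i^2/N)/(k - 1)$, which does reproduce the reported 27.00, and that the $df$ entering $\lambda$ for the among-subspecies contrast is the numerator value of 3. The function names are hypothetical, and the power values themselves are not recomputed here.

```python
def adjusted_group_size(group_sizes):
    """Adjusted mean group size for unequal groups (assumed formula):
    n_c = (N - sum(n_i^2) / N) / (k - 1)."""
    total = sum(group_sizes)
    k = len(group_sizes)
    return (total - sum(n * n for n in group_sizes) / total) / (k - 1)


def noncentrality(df, n_c, phi):
    """lambda = df * n_c * phi / (1 - phi), as given in the text."""
    return df * n_c * phi / (1.0 - phi)


# Per-subspecies sample totals and population count reported in the text.
sizes = {"latifolia": 76, "contorta": 46, "murrayana": 12, "bolanderi": 5}
n_populations = 31

k = len(sizes)
n_total = sum(sizes.values())
df_groups = k - 1                    # among subspecies
df_pops = n_populations - k          # among populations within subspecies
df_within = n_total - n_populations  # within populations

n_c = adjusted_group_size(list(sizes.values()))
# Assumption: numerator df (3) is the df used in lambda for this contrast.
lam_010 = noncentrality(df_groups, n_c, 0.10)

print(df_groups, df_pops, df_within, round(n_c, 2), round(lam_010, 2))
```

Under these assumptions the script returns degrees of freedom of 3, 27, and 108, an adjusted group size of about 27.00, and a non-centrality parameter of about 9.0 at ΦCT = 0.10.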
For the among-populations-within-subspecies contrast (ΦSC), the relevant per-population sample size is approximately 4.48 (mean individuals per population: N/P = 139/31), with degrees of freedom df = (27, 108). Power to detect ΦSC = 0.15 is 0.647; at ΦSC = 0.20, power increases to 0.841; and the smallest detectable ΦSC at 80% power is 0.187 (Supplementary Table S2). This indicates that the per-population sample sizes of 3–5 are sufficient to identify moderate-to-large population-level differentiation (ΦSC ≥ 0.19) but may miss weaker among-population structure within subspecies (ΦSC < 0.15). We acknowledge this limitation: our design was optimized for broad coverage across 31 populations rather than deep sampling within individual populations. Detecting fine-scale within-subspecies population structure would require targeted, intensive resampling.
For rare haplotype detection, the probability of observing at least one copy of a haplotype at a population frequency p is $1 - (1 - p)^n$. With n = 3, haplotypes at p = 0.10, 0.20, and 0.30 are detected with probabilities of 0.27, 0.49, and 0.66, respectively; with n = 5, these increase to 0.41, 0.67, and 0.83. Per-population sample sizes of 3–5 are therefore insufficient for cataloging the full range of within-population haplotype diversity, especially for alleles at p < 0.20. However, the implications for inference are asymmetric: undetected rare haplotypes are private variants limited to individual populations, and their existence would increase within-population variance and further reduce any detectable among-subspecies signal. Their non-detection, therefore, cannot lead to inflated ΦCT values and does not undermine the main conclusion of negligible subspecies-level differentiation. Instead, it emphasizes a limitation in our ability to resolve fine-scale haplotype diversity within populations, which we acknowledge as an area for future detailed resampling studies.
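The detection probabilities quoted above follow directly from the expression $1 - (1 - p)^n$ and can be reproduced with a few lines of code. This is a minimal sketch assuming independent draws from a population at frequency p; the function name `detection_probability` is hypothetical.

```python
def detection_probability(p, n):
    """Probability of sampling at least one copy of a haplotype at
    population frequency p in n independent draws: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


frequencies = (0.10, 0.20, 0.30)
table = {
    n: [round(detection_probability(p, n), 2) for p in frequencies]
    for n in (3, 5)
}
for n, row in table.items():
    print(f"n = {n}: {row}")
```

The same function can be evaluated at n = 1 or n = 2 to gauge detection in the four populations where fewer than three trees were available.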
To empirically evaluate whether small per-population sample sizes of 3–5 individuals could bias AMOVA results, we conducted complementary simulation-based and rarefaction analyses using the observed alignment (π = 0.00018, L = 1400 bp, N = 139). For the simulation, a five-haplotype model representing the observed cpDNA diversity (14 variable sites; dominant haplotype frequency ≈ 0.65) was used to generate 500 structured datasets per target ΦCT value following a four-group island model that matched the subspecific sampling design ($n_{\text{latifolia}} = 76$, $n_{\text{contorta}} = 46$, $n_{\text{murrayana}} = 12$, $n_{\text{bolanderi}} = 5$; Supplementary Figure S1A). Power was 8.2% at ΦCT = 0.10, 40.0% at ΦCT = 0.20, and only reached 80% at ΦCT ≈ 0.30, confirming that the low power results from the scarcity of segregating sites caused by the exceptionally low nucleotide diversity of this locus, not the small per-population sample size. In the rarefaction analysis, individuals were subsampled (r = 2–5 per population; 500 replicates per level) from null datasets simulated with the same haplotype model and no among-group structure. The estimated ΦCT remained near zero across all rarefaction levels (mean = 0.018 at r = 2; 0.007 at r = 5; 95% CI including zero throughout; false-positive rate < 1.5% at all levels; Supplementary Figure S1B), indicating that the near-zero ΦCT observed in the empirical data cannot be attributed to an artifact of small per-population sample size. These simulation-based estimates (8.2% at ΦCT = 0.10) are considerably lower than the analytical F-test power (64% at the same ΦCT) because they measure fundamentally different quantities: the analytical F-test assumes that the variance components are freely estimable from any allele frequency contrast and represents the maximum theoretical power of the AMOVA framework given the sample structure, whereas the simulation is bounded by the actual haplotype pool available in this dataset (five haplotypes, 14 variable sites, π = 0.00018). With so few polymorphic sites, most individual pairs across populations carry identical sequences regardless of the true ΦCT, severely limiting the realized discriminatory power of the test. The simulation-based power curve is therefore the more relevant guide to the detectability of structure in this specific dataset, and it confirms that the near-zero empirical ΦCT is not an artifact of insufficient sampling but a genuine reflection of organellar genetic uniformity across subspecies.
3. Results
3.1. Organellar Genome Variation and Geographic Structure
Both organellar genomes exhibited remarkably low genetic variation across the species’ entire range, a pattern inconsistent with predictions from allopatric divergence theory. The chloroplast trnL intron (487 bp sequenced) contained only five polymorphic sites, yielding a nucleotide diversity of π = 0.000178. The adjacent trnL/trnF spacer (385 bp) revealed four polymorphic sites, with π = 0.000186. These diversity values are among the lowest reported for widespread conifer species and suggest either recent common ancestry, severe historical bottlenecks, or both.
The most striking chloroplast variant was a 5 bp indel (deletion of TAAAT) at positions 404–408 of the trnL intron (Table 2). This deletion occurred in four individuals from three geographically disjunct populations: two coastal populations (49 and 95, both P. c. contorta from the Queen Charlotte Islands region) and one interior population (36, P. c. latifolia from the Rocky Mountains). The predominance of this marker in coastal populations, combined with its rarity in interior locations, is consistent with a northern Pacific coastal refugium and subsequent eastward pollen-mediated gene flow following deglaciation, although this inference rests on only four individuals. Additional structural variation in the trnL/trnF spacer included a single-bp deletion in population 135 (P. c. murrayana) and a 26 bp deletion in population 31 (P. c. latifolia). However, the phylogeographic significance of these variants remains unclear given their single occurrences (Table 3).
Most notably, the mitochondrial nad1 b/c intron showed complete sequence homogeneity across all 139 sampled individuals, despite containing tandemly repeated 34- and 32 bp elements known to vary in other pine species [24,25]. This total lack of mtDNA variation sharply contrasts with Pinus ponderosa, where variation in repeat number clearly delineates distinct eastern and western lineages established during Pleistocene glaciations [34]. The monomorphic mitochondrial genome offers no phylogeographic signal, making it impossible to infer seed-mediated colonization routes or maternal lineage structure and suggesting either recent bottlenecks or extremely slow mtDNA evolution in this species. However, it is important to recognize that the absence of variation may partly reflect the choice of locus rather than true species-wide maternal uniformity. The nad1 b/c intron is among the more conserved regions in pine mitochondrial genomes [35], and several other introns (e.g., nad5 intron 1 [36], nad7 introns [37], cox1 introns [38]) or mitochondrial simple sequence repeats (mtSSRs) are known to show substantially higher polymorphism in Pinus [12]. Future research incorporating additional mtDNA targets or mtSSR markers would give a more complete picture of maternal lineage structure in P. contorta and should be considered before making firm conclusions about mtDNA uniformity across the species. Benchmarks of locus-specific variability in conifers suggest that the nad1 b/c intron may be too conserved to resolve intraspecific structure even if such structure exists [35,36,37,38].
Spatial analysis revealed no geographic clustering of chloroplast haplotypes that corresponded to subspecies boundaries or major geographic regions. The most common haplotype (consensus sequence) was ubiquitous across all four subspecies and often the only variant within populations, indicating recent common ancestry and extensive gene flow. Single-nucleotide polymorphisms showed no apparent geographic structure, appearing randomly distributed across the range without the clinal patterns expected under isolation-by-distance or the discrete clusters expected under refugial isolation. Notably, rare haplotypes were disproportionately found in peripheral rather than core populations, contrary to theoretical predictions that range-margin populations should exhibit reduced diversity due to founder effects and genetic drift during range expansion (“leading edge” effects). This unexpected pattern may reflect (1) sampling artifacts given modest within-population sample sizes, (2) persistence of ancestral variation in historically stable peripheral refugia, or (3) mutation accumulation in long-isolated marginal populations. Distinguishing among these alternatives would require more intensive sampling of both peripheral and core populations.
3.2. Subspecies Differentiation and Phylogenetic Relationships
Genetic distances among recognized subspecies were minimal, ranging from $1.06 \times 10^{-4}$ to $3.96 \times 10^{-4}$ (Table 4), well within the range typically observed for intraspecific variation in conifers. Within-subspecies variation was comparable to between-subspecies variation, failing to support recognition of subspecies as genetically distinct evolutionary units. The largest genetic distance separated P. c. latifolia and P. c. contorta ($3.96 \times 10^{-4}$), whereas the smallest occurred between P. c. contorta and P. c. bolanderi ($2.56 \times 10^{-5}$). These small distances provide no evidence for deep evolutionary divergence among subspecies and suggest that morphological differences have evolved without substantial neutral genetic differentiation.
Maximum parsimony analysis of the combined chloroplast dataset produced a poorly resolved tree dominated by a large polytomy encompassing most individuals, with no recovery of morphologically defined subspecies as monophyletic groups; individuals from different subspecies clustered together throughout the topology. Bootstrap support values were uniformly low, indicating insufficient phylogenetic signal. The few resolved nodes separated individual haplotype variants rather than subspecies groups, suggesting either recent divergence or extensive homogenization of organellar lineages.
3.3. Demographic Signatures: Evidence for Recent Expansion
Neutrality tests revealed significant departures from equilibrium expectations, providing insights into demographic history. Tajima’s D was significantly negative (−2.26, p < 0.02), as was Fu and Li’s D* (−4.52, p < 0.02), both indicating an excess of rare alleles relative to neutral equilibrium expectations. In non-coding organellar regions where purifying selection is unlikely to operate strongly, a recent demographic expansion following glacial retreat is the most parsimonious explanation for this signature. Rapid range expansion from restricted southern and possibly northern refugia would amplify common ancestral alleles throughout the expanded range while rare variants remain localized near refugial areas, generating the observed site frequency spectrum characteristic of population growth.
Under the standard neutral model, Tajima’s D compares the average number of pairwise differences (π) with the number of segregating sites (S), with negative values indicating excess rare variants. Fu and Li’s D* similarly compares the number of external branch mutations (those appearing in only one sequence) with the total number of mutations, with negative values indicating an excess of singletons. The concordance between these two tests strengthens the inference that the observed site frequency spectrum departs from neutral equilibrium. However, a fundamental caveat must be acknowledged: negative values of both Tajima’s D and Fu and Li’s D* are consistent with both recent demographic expansion and purifying or positive selection at linked sites, and standard frequency-spectrum statistics cannot formally separate these processes [34,35]. The two explanations can, in principle, be distinguished using full or approximate Bayesian coalescent inference [39,40], which fits competing demographic models to the full data and returns posterior probabilities for each scenario. In the present study, demographic expansion is nonetheless the most parsimonious interpretation for three convergent reasons. First, the analyzed loci are non-coding intergenic spacers and intron sequences for which purifying selection is unlikely to be substantial. Second, the demographic-expansion signal is consistent with extensive paleoecological evidence (pollen records documenting rapid northward range expansion from glacial refugia following deglaciation [41,42]), providing independent corroboration from a completely different data class. Third, selection on organellar genomes would require either very strong purifying selection uniformly affecting all surveyed non-coding loci, or a selective sweep of recent origin, both of which are difficult to reconcile with the moderate and geographically homogeneous nucleotide diversity observed. We therefore retain demographic expansion as the working hypothesis, while explicitly flagging that formal Bayesian model comparison remains outstanding and is recommended as a priority for future work (see Section 4.6).
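To make the comparison between pairwise differences and segregating sites concrete, the sketch below computes Tajima’s D from a set of aligned sequences using the standard variance constants. It is an illustration only, not the DnaSP implementation used in this study: the function name `tajimas_d` and the toy alignments are hypothetical, every alignment column is treated as an ordinary site (no gap or missing-data handling), and no significance test is attempted.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (gaps not handled).
    Returns None when there are no segregating sites."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        return None
    # k: mean number of pairwise differences (pi summed over sites).
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants depending only on the sample size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D contrasts k with S / a1, scaled by the estimated standard deviation.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy data: every variant is a singleton (excess of rare alleles).
singletons = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
# Hypothetical toy data: two haplotypes at equal frequency.
balanced = ["AA", "AA", "TT", "TT"]

d_rare = tajimas_d(singletons)
d_balanced = tajimas_d(balanced)
print(d_rare, d_balanced)
```

The singleton-only alignment yields a negative D and the balanced alignment a positive D, mirroring the interpretation given above: when most variants are rare, S/a1 exceeds the mean pairwise difference.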
3.4. Estimation of Coalescence Time and Demographic History
Nucleotide diversity (π ≈ 0.00018) can be used to estimate the time to the most recent common ancestor (TMRCA) under the standard neutral coalescent relationship $\theta \approx \pi = 4N_e\mu$ [43,44]. However, this calculation depends critically on two uncertain quantities: the per-site per-year substitution rate (μ) for the trnL/trnF region, and the long-term effective population size (Ne). Both quantities carry substantial uncertainty in non-equilibrium populations such as those that experienced glacial contraction and post-glacial expansion. We therefore present a sensitivity analysis rather than a single-point estimate, systematically varying μ and Ne over plausible ranges reported in the conifer literature.
Published fossil-calibrated substitution rates for non-coding cpDNA in pines range from approximately 0.5 to $5.0 \times 10^{-9}$ substitutions per site per year [13,45,46]. The lower end of this spectrum is typical for slowly evolving intron regions, while the upper end reflects faster-evolving intergenic spacers. For Ne, lodgepole pine is a widespread, wind-pollinated conifer with large census population sizes [47]. However, chloroplast Ne is expected to be considerably lower than census size due to paternal inheritance [48], selective sweeps [49], and repeated glacial bottlenecks [50]. Empirical estimates of chloroplast Ne in related pines range from roughly 10,000 to about 150,000 [51,52]. Since $T_{MRCA} = \pi/(2\mu)$ under a diploid model or $\pi/\mu$ for haploid organellar genomes under the infinite-sites model (Table 5), the estimated coalescence times cover nearly two orders of magnitude across the plausible parameter space.
Table 5 shows estimated TMRCA for five values of μ across the published range, evaluated at five effective population sizes: low (Ne = 10,000), lower-intermediate (Ne = 25,000), intermediate (Ne = 50,000), upper-intermediate (Ne = 100,000), and high (Ne = 200,000). Throughout the full parameter space, TMRCA varies from approximately 4000 years (fast rate, low Ne) to roughly 720,000 years (slow rate, high Ne). The central scenario ($\mu = 1.5 \times 10^{-9}$, Ne = 50,000) produces a TMRCA of approximately 60,000 years, which falls squarely within the timeframe of the Wisconsin glaciation (~100,000–12,000 years ago). In scenarios with lower rates or larger Ne, coalescence occurs before the Wisconsin glaciation, suggesting that the observed low diversity might also reflect ancestral population structure prior to the most recent glacial cycle. Scenarios with faster rates and smaller Ne are consistent with post-glacial coalescence, suggesting a significant bottleneck during the Holocene or late-glacial period.
The standard equilibrium model used in this calculation also assumes a stable population size, but there is strong independent evidence that lodgepole pine experienced significant post-glacial range expansion, both from the significantly negative Tajima’s D and Fu and Li’s D* statistics (Section 3.3) and from paleoecological pollen records [41,42]. Population growth after a bottleneck shortens coalescent branch lengths compared to the equilibrium expectation, so that π-based estimates tend to underestimate the actual TMRCA, and it skews the site-frequency spectrum toward rare variants (exactly the pattern observed). Therefore, the estimates at the lower end of Table 5 are probably artificially young, and the actual coalescence time may be much older than any single-rate estimate suggests. Given this non-equilibrium demographic context, the TMRCA estimates in Table 5 should be seen as approximate bounds rather than precise point estimates. Formal Bayesian inference with BEAST or similar software, using an appropriate demographic model and multiple fossil calibration points, would greatly reduce this uncertainty and is recommended for future research [40,53]. The key finding for this study’s conclusions is not a specific coalescence time, but rather that all scenarios within the plausible parameter space are incompatible with the deep, long-term isolation (millions of years) required to produce the level of organellar structure observed in well-differentiated conifer lineages.
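For readers less familiar with the neutrality statistic invoked above, the following minimal Python sketch computes Tajima’s D from a sample size, a count of segregating sites, and the mean number of pairwise differences, using the standard constants from Tajima’s original formulation. The function name and the example inputs are illustrative assumptions and are not this study’s data; an excess of rare variants, as expected after expansion, pushes D below zero.

```python
import math


def tajima_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D for n sequences.

    seg_sites: number of segregating sites (S)
    mean_pairwise_diff: average number of pairwise differences (k)
    """
    if n < 2 or seg_sites < 1:
        raise ValueError("need at least two sequences and one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: difference between the two estimators of theta
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical inputs: 10 sequences, 16 segregating sites, k = 3.888889
    print(round(tajima_d(10, 16, 3.888889), 3))  # -> -1.446
```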
Under neutral coalescent theory, lineage sorting over 100,000+ years should produce detectable phylogeographic structure even at slowly evolving organellar loci, especially given the large effective population sizes of widespread conifers [
54,
55,
56]. The absence of such structure in lodgepole pine—consistent across the full sensitivity range in
Table 5—suggests either (1) severe bottlenecks during glaciation eliminated most ancestral variation, leaving insufficient polymorphism for lineage differentiation; (2) rapid post-glacial expansion from a single or only a few closely related source populations homogenized variation before lineage sorting could occur; (3) cryptic northern refugia maintained connectivity among populations throughout glaciation, preventing complete isolation; or (4) post-glacial gene flow has been extensive enough to erase refugial signatures. Differentiating among these scenarios requires integrating paleoecological data and additional genetic information.
4. Discussion
4.1. Discordance Between Genetic and Morphological Variation: Theoretical Implications
Our results reveal a striking paradox: lodgepole pine exhibits pronounced morphological variation, warranting recognition of four distinct subspecies [
2], yet its organellar genomes show virtually no phylogenetic structure that aligns with these taxonomic boundaries. This pattern matches Avise’s Category V phylogeographic structure—defined by widespread haplotypes, low genetic diversity, and shallow genealogical roots that point to recent expansion from a common ancestor [
57]. This discordance between phenotypic and neutral genetic divergence challenges traditional assumptions underlying subspecies classifications in widespread conifers and requires careful evaluation within contemporary evolutionary theory.
From a theoretical perspective, the maintenance of morphological differentiation despite genetic homogeneity can arise through two non-mutually exclusive mechanisms: (1) rapid adaptive evolution driven by strong divergent selection, with gene flow-selection balance maintaining local adaptation across the species range [
6,
7,
8]; or (2) phenotypic plasticity producing environmentally induced morphological variation without underlying genetic differentiation [
58]. The former scenario predicts strong differentiation at loci underlying adaptive traits, whereas the latter predicts uniformly low differentiation across genomic regions. Distinguishing these alternatives requires genomic data targeting adaptive loci, though ecological observations provide initial insights. A critical limitation of the present study is that organellar loci, being neutrally evolving and uniparentally inherited, are blind to cryptic adaptive structure in the nuclear genome. Populations that appear homogeneous at chloroplast and mitochondrial markers may harbour substantial differentiation at nuclear loci under selection—a pattern documented in several conifers where organellar homogeneity coexists with pronounced nuclear adaptive divergence [
59]. Consequently, conclusions about the relative roles of selection and gene flow should be regarded as hypotheses to be tested with nuclear genomic evidence rather than firm empirical findings.
Several lines of evidence favor adaptive evolution over pure phenotypic plasticity as the sole explanation for morphological divergence, indicating a substantial genetic basis for adaptive differentiation. First, common-garden experiments in which diverse provenances were grown under uniform conditions showed that growth (height, diameter, and volume), morphology (branch length, branch width, and branch angle), and specific gravity varied considerably across both traits and geographic regions, reflecting the complex genetic basis of these characteristics [
60]. If variation in growth, morphology, and specific gravity were attributable solely to phenotypic plasticity, common-garden studies should eliminate these differences because all genotypes experienced identical environments.
Second, populations show the highest survival and growth in environments matching their origin, and performance declines when transplanted to foreign environments [
61]. This pattern of local adaptation—where each population performs best “at home”—requires genetic differentiation in fitness-related traits and cannot arise from phenotypic plasticity alone, which should optimize performance across all environments if plasticity itself is adaptive. In addition, morphological clines align with environmental gradients in predictable ways. For example, coastal forms exhibit adaptations to fog-belt environments, interior forms show fire-adapted cone characteristics, and the pygmy forest subspecies displays extreme dwarfism on nutrient-poor soils [
62]. Such environment-matched clines are difficult to explain as random phenotypic variation.
Third, specific morphological traits show genetic control and rapid evolutionary responses that are inconsistent with phenotypic plasticity. Cone serotiny—the retention of closed cones that require fire-generated heat for seed release—correlates strongly with fire frequency [
63]. Populations experiencing frequent stand-replacing fires show mean serotiny levels of 70–90%, whereas populations in fire-infrequent environments show < 20% serotiny. Reciprocal seed-sowing experiments demonstrate rapid selection against non-serotinous phenotypes in high-fire environments and against serotinous phenotypes in low-fire environments [
64]. Similarly, cone morphology responds to differential seed predation by red squirrels (
Tamiasciurus hudsonicus) and crossbills (
Loxia spp.), creating a geographic mosaic of predator-driven selection [
65]. Where squirrels are abundant, selection favors cones with thick scales and strong attachment to branches (squirrel-resistant morphology). In contrast, crossbill predation favors elongated cones with weak scale closure (crossbill-resistant morphology). These opposing selection pressures maintain variation in cone morphology across the range despite potential gene flow. Taken together, these patterns suggest local adaptation rather than random phenotypic variation.
Contemporary evolutionary theory increasingly recognizes that substantial phenotypic divergence can occur rapidly—on ecological rather than evolutionary timescales—when selection is sufficiently strong [
66,
67]. The gene flow-selection balance framework offers a coherent explanation for our observations. Under this model, pollen-mediated gene flow (as reflected in chloroplast DNA) homogenizes neutral genetic variation across the landscape. In contrast, divergent selection maintains adaptive differentiation at functional loci despite ongoing migration [
6,
7,
8,
9]. The requisite selection coefficients need not be prohibitively large: theoretical models show that even moderate selection can maintain local adaptation across moderate levels of gene flow, particularly for traits with polygenic architectures, where selection can act on multiple loci simultaneously [
6,
8,
9,
10]. In lodgepole pine, wind pollination facilitates extensive pollen dispersal (as evidenced by near-complete chloroplast DNA homogeneity). In contrast, shorter seed dispersal distances (as reflected in slightly more structured mitochondrial patterns in previous studies [
12]) create opportunities for seedling establishment in locally adaptive microhabitats.
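The logic of gene flow-selection balance can be made concrete with a deliberately simple toy model. The Python sketch below iterates a haploid, single-locus continent-island recursion in which selection favours a local allele and every migrant carries the alternative allele. This is a textbook simplification introduced here for illustration only: the function names, parameter values, and one-way migration scheme are assumptions and do not represent the polygenic, pollen-mediated dynamics of lodgepole pine.

```python
def next_freq(p, s, m):
    """One generation of a haploid continent-island model.

    p: frequency of the locally favoured allele in the focal population
    s: selective advantage of that allele in the local environment
    m: fraction of the population replaced each generation by migrants,
       all of which carry the alternative allele
    """
    p_after_selection = p * (1 + s) / (1 + p * s)
    return (1 - m) * p_after_selection


def equilibrium(s, m, p0=0.5, generations=5000):
    """Iterate the recursion long enough to approach its stable point."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, m)
    return p


if __name__ == "__main__":
    print(equilibrium(s=0.10, m=0.01))  # selection stronger than migration
    print(equilibrium(s=0.01, m=0.10))  # migration swamps selection
    print(equilibrium(s=0.00, m=0.01))  # neutral locus is homogenized
```

In this toy model the selected allele persists at frequency 1 − m(1 + s)/s whenever that quantity is positive, whereas a neutral allele subject to the same migration is lost. That contrast mirrors the distinction drawn above between homogenized neutral markers and differentiated adaptive loci.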
4.2. Biogeographic History and Glacial Refugia: Reconciling Molecular and Paleoecological Evidence
The genetic homogeneity seen across the range of lodgepole pine offers limited support for traditional biogeographic hypotheses that suggest long-term isolation in multiple glacial refugia [4,5]. Classical vicariance models predict that the Wisconsin glaciation (which covered most of the species’ current range for tens of thousands of years) would fragment populations into isolated refugia, leading to distinct genetic lineages tied to specific refugial areas. According to coalescent expectations [43,44], the low observed nucleotide diversity (π ≈ 0.00018) indicates a relatively recent time to the most recent common ancestor (TMRCA) for the sampled lineages. Chloroplast substitution rates of roughly 1.0–3.0 × 10⁻⁹ substitutions per site per year have been reported for pine trnL/trnF regions [13]; across the wider range of rates and effective population sizes examined in the sensitivity analysis (Section 3.4 and Table 5), plausible TMRCA estimates range from approximately 4000 to 720,000 years. The central scenario broadly aligns with the Wisconsin glaciation timeframe, but the broad uncertainty means we cannot confidently determine a specific coalescence date. Regardless of the assumed rate, lineage sorting over these timescales under neutral coalescent theory should produce detectable phylogeographic structure, given the effective population sizes typical of widespread conifers [54,55,56].
However, the absence of such structure in lodgepole pine, which contrasts sharply with the clear east–west phylogeographic division in Pinus ponderosa [34,68], suggests either more recent common ancestry or exceptionally effective post-glacial homogenization. Several scenarios could explain this pattern: (1) severe bottlenecks during glaciation eliminated ancestral variation, leaving insufficient polymorphism for lineage differentiation; (2) rapid post-glacial expansion from a single or a few closely related source populations homogenized variation before lineage sorting could complete; (3) cryptic northern refugia maintained connected populations throughout glaciation, preventing complete isolation; or (4) post-glacial gene flow has been sufficiently extensive to erase any signature of refugial isolation.
The identification of a unique 5 bp chloroplast indel, predominantly in coastal populations (Queen Charlotte Islands and adjacent mainland), provides direct molecular evidence for a northern Pacific refugium, consistent with paleoecological data indicating ice-free coastal zones during the Wisconsin glaciation [
69,
70,
71]. The presence of this marker in interior populations (population 36,
P. c. latifolia) supports post-glacial pollen-mediated gene flow from coastal to interior regions following deglaciation. Wu and Ying [
61] reported that population 36 exhibits phenotypic characteristics intermediate between coastal and interior forms, displaying coastal-type morphology (extensive browsing by snowshoe hare,
Lepus americanus) at the test site while showing interior-type growth patterns at sites without wildlife damage. The geographic position of this population—situated in a river valley that descends toward the coast—facilitates pollen deposition from coastal air masses. The high frequency of non-serotinous cones observed in this population, a diagnostic trait distinguishing coastal from interior subspecies, provides additional evidence of coastal genetic influence. These observations are consistent with asymmetric gene flow from coastal to interior populations. However, we note that this interpretation is inferential rather than directly demonstrated, as organellar markers alone cannot establish directionality of gene flow. Direct confirmation would require nuclear genomic data, such as genome-wide SNP panels, to distinguish between coastal and interior allele pools. We propose that directional gene flow, bringing coastal alleles into interior populations, in combination with selection gradients associated with climatic variation, has been instrumental in establishing the observed geographic cline of local adaptation. This east–west connectivity in
P. contorta [
3,
12,
69] contrasts with the more isolated north–south refugial pattern proposed for
P. ponderosa [
34,
68]. It may reflect differences in refugial geography or post-glacial dispersal dynamics.
Recent syntheses of glacial refugia across western North American tree species [
69,
70] reveal considerable heterogeneity in refugial patterns, with some species exhibiting strong phylogeographic structure (e.g.,
Pinus ponderosa [
34,
68],
Pseudotsuga menziesii [
71]) and others showing relative genetic homogeneity (e.g.,
Picea glauca [
72]). These differences likely reflect species-specific combinations of refugial distribution, effective population sizes, dispersal capability, and time since range expansion. Lodgepole pine’s pattern suggests either rapid expansion from limited refugia or maintenance of population connectivity through stepping-stone colonization during deglaciation [
73], with subsequent gene flow effectively homogenizing organellar variation [
74].
4.3. Mechanisms of Rapid Morphological Evolution: Life History and Ecological Context
The maintenance of substantial morphological variation despite neutral genetic homogeneity raises important questions about the mechanisms and tempo of adaptive evolution in lodgepole pine. Several ecological and life-history factors may facilitate rapid morphological evolution in this species, providing insights into broader patterns of conifer adaptation.
First, lodgepole pine’s demographic characteristics create conditions conducive to rapid local adaptation. The species exhibits early reproductive maturity, often producing cones by age 5–10 years [
75], high fecundity [
76], and large effective population sizes [
56,
77], collectively providing substantial standing genetic variation on which selection can act. Indeed, rates of contemporary evolution in natural populations can be remarkably rapid, with measurable phenotypic shifts occurring over tens to hundreds of generations when selection is strong [
66,
67,
78]. In environments with strong selective pressures—particularly variable fire regimes—rapid evolutionary change can occur over relatively few generations [
79,
80,
81], potentially producing substantial phenotypic divergence on timescales shorter than those required for neutral lineage sorting at organellar loci.
Second, specific ecological interactions generate strong divergent selection on morphological traits across the species’ range. Cone serotiny provides perhaps the most compelling example: populations experiencing frequent stand-replacing fires evolve high serotiny levels that maximize post-fire regeneration, whereas populations in fire-infrequent environments evolve non-serotinous cones that facilitate annual seed dispersal [
63,
75]. This trait shows rapid evolutionary responses in reciprocal transplant experiments [
64]. Similarly, cone morphology responds to differential seed predation by red squirrels and crossbills, creating a geographic mosaic of predator-driven selection that maintains variation in cone characteristics [
65].
Third, the polygenic architecture of morphological traits may facilitate rapid evolution despite gene flow. Unlike simple Mendelian traits, in which migration can readily override selection, polygenic traits with many loci of small effect can evolve and maintain local adaptation even with moderate levels of gene flow [
9,
82]. Selection acting simultaneously across multiple loci generates stronger overall differentiation than would be predicted from single-locus models, potentially explaining how morphological variation persists across lodgepole pine’s range despite extensive pollen flow. This polygenic scenario remains a working hypothesis, however: empirical support requires identification of adaptive loci through genome-wide association studies or selective-sweep analyses using dense nuclear SNP data. Future work should explicitly contrast
FST at putatively neutral loci against
QST for quantitative morphological traits across subspecies to provide direct, rather than inferred, evidence for polygenic local adaptation.
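A minimal Python sketch of the proposed comparison follows. It uses the standard definition QST = V_B / (V_B + 2V_W) for diploid outcrossers, where V_B and V_W are the additive genetic variances among and within populations, together with Wright’s FST = (H_T − H_S)/H_T for a single biallelic locus. The function names, the equal weighting of populations, and the example numbers are illustrative assumptions; a real analysis would estimate variance components from common-garden data and compare QST with the sampling distribution of neutral FST rather than with a point value.

```python
def qst(var_between, var_within):
    """QST = V_B / (V_B + 2 * V_W), from additive genetic variance components."""
    return var_between / (var_between + 2.0 * var_within)


def fst_biallelic(freqs):
    """Wright's FST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: allele frequency in each population (populations weighted equally).
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_sub = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_sub) / h_total if h_total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical values for illustration only
    trait_qst = qst(var_between=1.0, var_within=1.0)
    neutral_fst = fst_biallelic([0.4, 0.6])
    print(f"QST = {trait_qst:.3f}, neutral FST = {neutral_fst:.3f}")
    print("QST exceeds FST" if trait_qst > neutral_fst else "QST does not exceed FST")
```

A trait QST well above neutral FST is the pattern expected under divergent selection, whereas similar values are compatible with drift alone.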
4.4. Taxonomic Implications: Rethinking Subspecies as Adaptive Ecotypes
Our findings, based on neutral organellar markers with limited polymorphism, raise questions about the degree to which current subspecies classifications reflect deep phylogenetic divisions in lodgepole pine. The genetic homogeneity observed across recognized subspecies boundaries, combined with continuous morphological variation between forms [
2], is more consistent with an ecotypic interpretation than with deep phylogenetic divergence. However, we stress that this conclusion is necessarily tentative given the limited number of polymorphic sites resolved and the absence of data from adaptive genomic regions. Neutral markers are inherently insensitive to divergent selection acting on ecologically important traits; thus, the lack of neutral genetic structure does not preclude substantial adaptive differentiation. An ecotypic framework—in which subspecies represent adaptive solutions to local environmental challenges rather than evolutionarily independent lineages—is consistent with the data, but a definitive taxonomic reinterpretation would require corroborating evidence from genome-wide scans, quantitative genetic analyses, and assessments of reproductive isolation.
The ecotypic interpretation better accommodates several empirical observations. First, morphological transitions between subspecies occur gradually along environmental gradients rather than at discrete boundaries, particularly between
P. c. murrayana and
P. c. latifolia, where elevational gradients produce continuous clinal variation [
2,
3]. Second, diagnostic characters (cone serotiny, needle length, growth form) correlate more strongly with environmental variables than with geographic distance, suggesting adaptive responses to local conditions [
2,
60,
61,
62]. Third, provenance performance depends critically on environmental matching, with local adaptation evident across relatively fine spatial scales [
61,
83]. These patterns collectively indicate that morphological variation reflects ongoing adaptation to heterogeneous environments rather than historical isolation.
However, this ecotypic interpretation does not diminish the biological significance of recognized forms. Rather, it reframes our understanding of their evolutionary origin and maintenance: subspecies represent dynamic adaptive responses maintained by spatially varying selection rather than static entities isolated by reproductive barriers. This perspective has important implications for nomenclature and taxonomy. While retaining subspecies designations for communication and management may be pragmatic, we should recognize that these categories mark points along adaptive continua rather than discrete evolutionary units. This interpretation aligns with emerging conceptual frameworks that emphasize the evolutionary process over pattern in defining biological diversity [
84].
4.5. Conservation and Management Implications
The ecotypic interpretation of lodgepole pine diversity has significant implications for conservation prioritization and management strategies. Traditional approaches that emphasize protecting taxonomic units (subspecies, varieties) may inadequately capture the adaptive processes that maintain morphological variation along environmental gradients. Instead, conservation efforts should focus on preserving the ecological contexts and selective regimes that promote local adaptation.
For assisted migration and reforestation programs, our results indicate that genotype-environment matching should take precedence over subspecies identity in seed source selection. Provenance trials showing strong local adaptation [
60,
61,
83] indicate that transferring populations to climatically mismatched sites—even within the same subspecies—can reduce fitness and compromise regeneration success. Conversely, populations from different subspecies may perform similarly when matched to appropriate environmental conditions. This finding supports the development of seed transfer guidelines based on climate models and provenance performance data rather than taxonomic boundaries [
85,
86].
Climate change adds urgency to these considerations. As environmental conditions shift, populations must either adapt in situ, migrate to track suitable habitats, or face extirpation [
87,
88]. The genetic homogeneity we observe suggests substantial connectivity across the range, potentially facilitating adaptive allele flow to populations experiencing novel climates. However, the strong local adaptation evident in morphological traits suggests that adaptation may need to be rapid to keep pace with environmental change. Conservation strategies should therefore maximize both within-population genetic diversity (providing raw material for adaptation) and landscape connectivity (facilitating gene flow), while recognizing that historical taxonomic units may not effectively capture functionally important variation.
The genetic basis of adaptation will be critical to determining evolutionary responses to climate change. While neutral genetic diversity provides evolutionary potential, the architecture of adaptive traits—their heritability, genetic correlations, and pleiotropic effects—will determine the rate and direction of evolutionary change. Understanding these genetic architectures through genomic approaches will enable more informed predictions of population responses and more effective conservation interventions.
4.6. Broader Implications for Conifer Phylogeography and Evolution, and Priorities for Future Bayesian Demographic Inference
Lodgepole pine’s pattern of morphological divergence without corresponding genetic structure may characterize conifers more broadly, particularly among species with large ranges and high potential for gene flow. The growing body of phylogeographic studies in western North American conifers reveals considerable heterogeneity: some species show strong phylogeographic structure (e.g.,
P. ponderosa [
34,
68],
Pseudotsuga menziesii [
71]), whereas others exhibit relative homogeneity (e.g.,
P. contorta [this study],
Picea glauca [
72]). Understanding the factors that underlie these contrasting patterns remains a significant challenge.
Several characteristics may predispose lodgepole pine to rapid morphological evolution and genetic homogenization. First, the species’ ecological generalism—occupying sites ranging from coastal fog belts to interior montane forests, from nutrient-poor soils to productive sites—exposes populations to diverse selective pressures that drive adaptive differentiation. Second, wind pollination, combined with early reproductive maturity, facilitates extensive gene flow that homogenizes neutral variation while allowing adaptive loci to differentiate under selection. Third, the species’ boom-and-bust demography associated with fire disturbance creates periodic selective episodes that can drive rapid evolutionary change. These factors collectively create conditions that favor the evolution and maintenance of ecotypic variation in the absence of deep phylogenetic structure.
More broadly, our results suggest that, at least for lodgepole pine, inferences of evolutionary independence based solely on morphological differences should be approached cautiously. Whether this conclusion applies widely to other common conifers remains an open empirical question: patterns of disagreement between morphological traits and neutral genetic divergence are not universal (as the contrasting example of Pinus ponderosa illustrates), and each species needs individual assessment. The traditional method of recognizing subspecies, primarily based on morphology, may overstate phylogenetic divergence in taxa where morphological traits are strongly selected and can evolve rapidly, but this caveat applies only when organellar and nuclear evidence align. Combining genomic data that target both neutral and adaptive variation offers a more detailed view of evolutionary processes, distinguishing between historical demographic changes (as reflected in neutral markers) and current adaptation (as reflected in functional traits and their genetic basis).
A particular priority is to formally separate demographic expansion from selection when explaining the negative neutrality-test statistics discussed here. As noted in the Introduction and
Section 3.3, Tajima’s
D and Fu and Li’s
D* are inherently ambiguous regarding this distinction. Two complementary Bayesian approaches could clarify this ambiguity in future work. First, approximate Bayesian computation (ABC; [
39]) provides a flexible likelihood-free method where observed summary statistics are compared to distributions simulated under different demographic models (e.g., constant size, exponential expansion, bottleneck followed by expansion). This approach allows for efficient estimation of posterior model probabilities, with nuisance parameters—including the substitution rate and effective population size—integrated over, rather than fixed at specific values, thereby carrying the uncertainty from the sensitivity analysis in
Section 3.4. Second, full Bayesian coalescent inference using software such as BEAST [
40] can directly fit flexible demographic priors (e.g., the Bayesian skyline plot) to sequence data, estimating posterior distributions of effective population size over time. Applying these methods to the current dataset, along with additional nuclear loci to distinguish organellar from nuclear demographic histories, would significantly bolster the conclusion that the site-frequency-spectrum signature reflects post-glacial range expansion rather than selection on linked sites. It would also produce posterior-supported population size trajectories consistent with paleoecological reconstructions.
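As a rough illustration of the first approach, the Python sketch below implements rejection-based ABC model choice between a constant-size model and a sudden-expansion model, using Tajima’s D as the only summary statistic. It is a toy under stated assumptions and is not the analysis proposed for this dataset: the sample size of 20, the prior ranges, the tolerance, the hundred-fold size change, and all function names are hypothetical. Only the observed value D = −2.26 is taken from the text. A real study would use a dedicated coalescent simulator, several summary statistics, and the actual sample configuration.

```python
import math
import random


def tajima_d(n, seg_sites, pi):
    """Tajima's D; returns 0.0 when there is no polymorphism (a sketch convention)."""
    if seg_sites == 0:
        return 0.0
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    c1 = (n + 1) / (3.0 * (n - 1)) - 1.0 / a1
    c2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1)) - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - seg_sites / a1) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


def poisson_count(rate, length, rng):
    """Number of events of a Poisson process with the given rate on an interval."""
    count, t = 0, rng.expovariate(rate)
    while t < length:
        count += 1
        t += rng.expovariate(rate)
    return count


def simulate_d(n, theta, rng, t_exp=None, frac=1.0):
    """Simulate one neutral genealogy with infinite-sites mutations; return Tajima's D.

    Time runs backward in coalescent units of the present-day population.
    If t_exp is given, the population older than t_exp is `frac` times as large.
    """
    sizes = [1] * n              # sampled descendants below each lineage
    t, seg_sites, pair_diffs = 0.0, 0, 0.0
    while len(sizes) > 1:
        k = len(sizes)
        rate = k * (k - 1) / 2.0
        if t_exp is None:
            wait = rng.expovariate(rate)
        elif t >= t_exp:
            wait = rng.expovariate(rate / frac)
        else:
            wait = rng.expovariate(rate)
            if t + wait > t_exp:  # crossed into the smaller ancestral population
                wait = (t_exp - t) + rng.expovariate(rate / frac)
        for i in sizes:           # drop mutations on every branch of this interval
            m = poisson_count(theta / 2.0, wait, rng)
            seg_sites += m
            pair_diffs += m * i * (n - i)
        a, b = rng.sample(range(k), 2)
        merged = sizes[a] + sizes[b]
        sizes = [s for j, s in enumerate(sizes) if j not in (a, b)] + [merged]
        t += wait
    return tajima_d(n, seg_sites, pair_diffs / (n * (n - 1) / 2.0))


def abc_model_choice(d_obs, n, n_sims, tol, rng):
    """Rejection ABC: share of accepted simulations contributed by each model."""
    accepted = {"constant": 0, "expansion": 0}
    for _ in range(n_sims):
        model = rng.choice(["constant", "expansion"])  # equal prior model weights
        theta = rng.uniform(2.0, 40.0)                  # hypothetical nuisance prior
        if model == "constant":
            d_sim = simulate_d(n, theta, rng)
        else:
            t_exp = rng.uniform(0.01, 0.1)              # hypothetical expansion-time prior
            d_sim = simulate_d(n, theta, rng, t_exp=t_exp, frac=0.01)
        if abs(d_sim - d_obs) <= tol:
            accepted[model] += 1
    total = sum(accepted.values())
    return {m: c / total for m, c in accepted.items()} if total else None


if __name__ == "__main__":
    print(abc_model_choice(d_obs=-2.26, n=20, n_sims=2000, tol=0.3, rng=random.Random(42)))
```

Because the mutation parameter and the expansion time are drawn from priors in every simulation, the resulting model probabilities average over those nuisance parameters instead of conditioning on single values, which is the property highlighted in the text.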
4.7. Limitations, Inferential Status of Key Conclusions, and Future Directions
Several conclusions in this manuscript warrant explicit clarification of their evidential basis. The following distinguishes conclusions directly supported by our data from those that are inferential or hypothesis-generating. (1) Directly supported: low organellar nucleotide diversity (π = 0.000178–0.000186) and absence of subspecies-level phylogenetic structure in cpDNA and mtDNA; significantly negative neutrality test values (Tajima’s D = −2.26; Fu and Li’s D* = −4.52), which are consistent with demographic expansion; and a 5 bp chloroplast indel concentrated in coastal populations. (2) Inferential or hypothesis-generating: asymmetric coastal-to-interior pollen-mediated gene flow; rapid polygenic adaptation as the primary driver of morphological divergence; and the claim that selection is strong enough to overcome gene flow across the genome. These latter claims are logically consistent with the organellar data but require nuclear genomic evidence for direct confirmation.
Another limitation is that, because organellar loci evolve neutrally, our study cannot detect any hidden adaptive nuclear structures that might exist. It is plausible—and, based on evidence from common-garden and provenance studies [
60,
61,
83], likely—that populations show significant divergence at adaptive nuclear loci despite organellar uniformity. Future research should therefore focus on: (i) genome-wide SNP genotyping (such as RADseq or whole-genome resequencing) to measure nuclear
FST and find outlier loci under selection; (ii) environmental association analyses that connect genomic variants to climate and disturbance gradients; (iii)
QST–
FST comparisons to differentiate between adaptive and neutral divergence; and (iv) coalescent demographic modeling with nuclear loci to formally test divergence scenarios and gene flow directionality. Until such data are available, our ecotypic reinterpretation should be seen as a well-supported working hypothesis rather than a definitive conclusion.
Figure 2 presents a conceptual model showing how gene flow (pollen versus seed dispersal), divergent selection, and morphological differentiation interact in lodgepole pine. It illustrates how cpDNA homogenization through long-distance pollen flow can happen alongside adaptive morphological divergence driven by spatially varying selection on fire- and climate-related traits.
5. Conclusions
This detailed phylogeographic analysis uncovers a key paradox in lodgepole pine evolution: clear morphological differences that underpin subspecies recognition occur alongside almost complete genetic uniformity at organellar loci. This pattern conflicts with expectations based on morphological taxonomy, which imply deep phylogenetic splits among subspecies, and underscores the complex interactions among gene flow, natural selection, and demographic history in shaping current biodiversity patterns.
Our organellar data do not support the existence of deep phylogenetic divisions among the recognized subspecies. Instead, the data align with the idea that morphological differences result from rapid adaptive evolution in spatially diverse environments rather than from long-term geographic separation. This view reinterprets subspecies as potential ecotypes maintained by ongoing divergent selection despite extensive gene flow. We emphasize, however, that this interpretation remains hypothesis-generating: testing it directly requires nuclear genomic data, including adaptive-locus scans and quantitative-genetic studies, which we identify as a critical priority for future research.
The discovery of a unique chloroplast variant in coastal populations offers molecular evidence for a northern Pacific refugium during the Wisconsin glaciation. Its presence in interior populations also indicates post-glacial pollen-mediated gene flow. However, the overall pattern of genetic uniformity suggests either severe bottlenecks during glaciation or highly effective post-glacial homogenization, in contrast to more structured phylogeographic patterns seen in related species. These findings add to the growing understanding that glacial refugia and post-glacial colonization patterns differ significantly among co-distributed species, influenced by species-specific factors like population size, dispersal ability, and demographic history.
The maintenance of morphological variation despite genetic homogeneity has important theoretical implications, illustrating that strong divergent selection can overcome the homogenizing effects of gene flow on ecologically important traits. This pattern supports models of local adaptation despite migration and highlights that evolutionary independence cannot be determined solely from phenotypic differences. The capacity for rapid morphological change, as seen in lodgepole pine, is common among conifers. Extensive phylogeographic and adaptive genetic research across Pinus species, including P. sylvestris [89], P. pinea L. [90], and P. mugo [91], as well as Picea species such as Picea abies [92] and Picea mariana [93], consistently reveals marked phenotypic differences across environmental gradients despite limited neutral genetic divergence. These studies collectively indicate that locally adapted ecotypes can persist even with significant gene flow, a pattern that seems to be a hallmark of widespread, wind-pollinated conifers. Its recurrence across diverse conifer lineages suggests that traditional taxonomy based on morphology may not accurately represent genetic relationships or evolutionary history in this group.
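The contrast between selected and neutral loci described above can be illustrated with a minimal numerical sketch. This is not an analysis from this study: it assumes a textbook-style deterministic model with two demes, one haploid biallelic locus, and symmetric migration, and the function name and the parameter values for the selection coefficient `s` and migration rate `m` are hypothetical choices made only for illustration.

```python
def migration_selection_equilibrium(s, m, p1=0.5, p2=0.5, generations=5000):
    """Iterate a deterministic two-deme, haploid, one-locus model.

    Illustrative assumptions (not taken from the study):
      - allele A has relative fitness 1 + s in deme 1 and 1 - s in deme 2
        (the alternative allele has fitness 1 in both demes);
      - after selection, a fraction m of each deme is replaced by migrants
        from the other deme (symmetric migration).
    Returns the frequencies of allele A in deme 1 and deme 2 after the
    requested number of generations.
    """
    for _ in range(generations):
        # Selection within each deme (divide by the deme's mean fitness).
        p1_sel = p1 * (1 + s) / (1 + s * p1)
        p2_sel = p2 * (1 - s) / (1 - s * p2)
        # Symmetric migration between the two demes.
        p1 = (1 - m) * p1_sel + m * p2_sel
        p2 = (1 - m) * p2_sel + m * p1_sel
    return p1, p2


if __name__ == "__main__":
    # Hypothetical scenarios: (label, selection coefficient s, migration rate m)
    scenarios = [
        ("selection much stronger than migration", 0.1, 0.01),
        ("selection much weaker than migration", 0.001, 0.1),
        ("neutral locus (s = 0)", 0.0, 0.05),
    ]
    for label, s, m in scenarios:
        # Start the neutral case from divergent frequencies to show homogenization.
        start = (0.9, 0.1) if s == 0.0 else (0.5, 0.5)
        p1, p2 = migration_selection_equilibrium(s, m, *start)
        print(f"{label}: p1 = {p1:.3f}, p2 = {p2:.3f}, difference = {p1 - p2:.3f}")
```

In this sketch, allele frequencies stay strongly differentiated between demes only when `s` is large relative to `m`, while a neutral locus converges to the same frequency in both demes. That is the qualitative pattern proposed here for morphological traits versus organellar markers; the model omits drift, diploidy, and the different dispersal routes of pollen and seed.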
These results have direct implications for conservation and management strategies. As detailed in Section 4.5, a process-oriented approach that prioritizes preservation of ecological gradients, selective regimes, and landscape connectivity, rather than taxonomic units per se, is recommended; genotype–environment matching should guide seed source selection in reforestation and assisted migration programs.
Future research that combines genomic methods with ecological niche modeling should clarify the genetic basis of adaptation and the environmental factors that drive morphological differences. Such studies can test whether putatively adaptive loci exhibit higher levels of differentiation than neutral genomic regions, which would directly demonstrate the role of local adaptation in maintaining differentiation at those loci. By integrating genetic, morphological, environmental, and demographic data within clear theoretical frameworks, we can develop a fuller understanding of evolutionary processes in widespread species and create more effective strategies for conserving biodiversity amid rapid environmental change.