Genetic, Morphological, and Environmental Differentiation of an Arid-Adapted Oak with a Disjunct Distribution

: The patterns of genetic and morphological diversity of a widespread species can be inﬂuenced by environmental heterogeneity and the degree of connectivity across its geographic distribution. Here, we studied Quercus havardii Rydb., a uniquely adapted desert oak endemic to the Southwest region of the United States, using genetic, morphometric, and environmental datasets over various geographic scales to quantify differentiation and understand forces inﬂuencing population divergence. First, we quantiﬁed variation by analyzing 10 eastern and 13 western populations from the disjunct distribution of Q. havardii using 11 microsatellite loci, 17 morphological variables, and 19 bioclimatic variables. We then used regressions to examine local and regional correlations of climate with genetic variation. We found strong genetic, morphological and environmental differences corresponding with the large-scale disjunction of populations. Additionally, western populations had higher genetic diversity and lower relatedness than eastern populations. Levels of genetic variation in the eastern populations were found to be primarily associated with precipitation seasonality, while levels of genetic variation in western populations were associated with lower daily temperature ﬂuctuations and higher winter precipitation. Finally, we found little to no observed environmental niche overlap between regions. Our results suggest that eastern and western populations likely represent two distinct taxonomic entities, each associated with a unique set of climatic variables potentially inﬂuencing local patterns of diversity. expected heterozygosity; Ar : allelic richness by locus; Rel : relatedness within a population.


Introduction
Genetic and phenotypic trait diversity of populations and species may be strongly influenced by environmental conditions at regional and local scales [1,2]. For example, favorable environmental conditions may promote higher genetic diversity by influencing the number of populations and migrants they produce, and thus increase their genetic connectivity [3]. Alternatively, unfavorable conditions could reduce genetic diversity and increase genetic drift by influencing population bottlenecks, inbreeding among close relatives, or local extinction-colonization dynamics [4]. Furthermore, environmental conditions can impose selective pressures and influence the rate at which populations diverge and speciation may occur [5][6][7].
An additional factor influencing genetic and trait divergence is large-scale spatial disjunctions, or discontinuities, in the range across which little or no migration occurs. Species with a once continuous range, but now consisting of disjunct populations, have long been of biogeographical interest [8][9][10], especially when those disjunctions have been

Sample Collection
Leaf tissue was collected in the summer of 2016 from a total of 489 samples from 23 populations for both morphological and genetic studies ( Figure 1). From each individual, two leaves were collected and pressed for morphometric analysis and two to three leaves were placed on ice for DNA extraction. Heteroblastic leaf shape has been noted to occur in Fagaceae [33,34], in which leaf morphology may differ from various shoot types. Using a collection method implemented in similar morphometric studies of Quercus [32,35,36], we collected fully expanded and undamaged leaves from sun-exposed terminal shoots. A total of 23 populations were included in genetic analysis, while only 21 populations were included in morphological analysis. The two populations that were not included in the morphological analysis (E13 and E14) were sampled by collaborators and only included enough tissue for use in genetic analysis. Voucher specimens were produced for each included in morphological analysis. The two populations that were not included in the morphological analysis (E13 and E14) were sampled by collaborators and only included enough tissue for use in genetic analysis. Voucher specimens were produced for each population and deposited at MOR and FLD with vouchers collected from the Navajo Nation additionally deposited in NAVA (herbarium acronyms follow Holmgren et al. [37]). Populations were chosen by considering timing and logistical issues of sampling while also attempting to represent the geographic range of the taxon. A list of possible sites was generated by contacting land managers of private and public land in the region, consulting GBIF (www.gbif.org; accessed 14 March 2015) and SEINet (https://swbiodiversity.org/seinet; accessed on 14 March 2015), and using suggestions from members of the International Oak Society. Additionally, populations were selected that were subject to little or no observed hybridization, although two sites located within the major disjunction (W11 and W12) documented occurrences of Q. gambelii and Q. turbinella in the vicinity.

DNA Extraction and Microsatellite Protocol
Genetic diversity was evaluated based on 11 nuclear microsatellite loci developed for other Quercus species (Appendix A Table A1). Genomic DNA was extracted from frozen leaf tissue using the E.Z.N.A. Plant DNA DS Kit (Omega Bio-tek, Norcross, GA, USA) and diluted to an approximate concentration of 5 ng/µL for PCR amplification. Microsatellite primers were organized using the software Multiplex Manager v.1.2 [38]. Multiplexed PCRs were carried out in a total volume of 10 µL, containing 2 µL of DNA template (5 ng/µL) and 8 µL of master mix consisting of 1× reaction buffer, 0.5 mM total dNTPs, 1.5 mM MgCl 2 , fluorescently tagged primer concentrations ranging from 0.02 to 0.28 µM, 0.03 µM of each reverse primer, 0.5 µg/µL BSA, 0.025 U of GoTaq G2 DNA polymerase (Promega, Madison, WI, USA) and HPLC-grade water. Initial PCR conditions and reagents, including the use of BSA, were based upon similar microsatellite studies of congeneric species [39][40][41][42]. Forward primers were labeled with one of the 6-FAM, NED, VIC or PET fluorescent dyes (Applied Biosystems, Waltham, MA, USA). Cycling conditions were: an initial denaturation step at 94 • C for 5 min, then 30 cycles consisting of 30 s at 94 • C, 30 s at 50-56 • C (multiplex 1 at 50 • C, multiplex 2 at 54 • C, and multiplex 3 at 56 • C) and 30 s at 72 • C, followed by a final extension of 5 min at 72 • C. PCRs were ran on Eppendorf Mastercycler pro (Eppendorf, Hamburg, Germany) and C-1000 Touch (Bio-Rad, Hercules, CA, USA) machines. Amplification products were visualized using a 1.5% agarose, 1× TAE gel with ethidium bromide stain and sized with a 1 kb + ladder (Thermofisher Scientific, Waltham, MA, USA). PCR products were run on an ABI 3730 Genetic Capillary Electrophoresis Sequencer (Applied Biosystems) and fragment sizes of alleles were scored using the software Geneious v. 10 [44] were both used to check for null alleles across loci.

Statistical Analysis of Genetic Data
Initial analyses were performed using a minimum of 10 individuals per population, which means that some initially sampled locations were not analyzed due to having too few individuals (E11, E13, and W2). We used R v.3.6.3 [45] for all R-based analyses. After the removal of clones from the dataset using the R package Poppr v.2.9.0 [46], we decided to decrease the minimum number of individuals per population to seven in order to maximize the number of populations included over the range of the taxon. The presence of null alleles was detected in several loci. Thus, analyses were performed both including and discarding these loci. The final dataset reported in this study used genetic data from eleven microsatellite loci for a total of 489 individuals from 23 populations, requiring a minimum of seven individuals per population (Table 1). Alternative analyses discarding loci with null alleles and adjusting minimum population requirements were performed and did not alter results; these can be run using the code on GitHub (doi:10.5281/zenodo.4453098). Genetic diversity statistics including allelic richness, heterozygosity, and F ST were calculated using the R packages adegenet v.2.1.3 [47], hierfstat v.0.5-7 [48], and demerlate v.0.9-3 [49]. Tests for normality were performed using the Shapiro-Wilk test for normality and Q-Q plots, which showed no significant departures from normality for the three diversity and relatedness measures, but did show a significant non-normal distribution for F ST . This is true for all subsets of the data tested. To compare populations from each region (East vs. West) for these basic summary statistics, we performed t-tests, as well as Wilcoxon tests, for these four statistics, for all subsets of the data. To determine patterns of isolation by distance and the distribution of genetic variation between and among populations, a Mantel Test and an Analysis of Molecular Variance (AMOVA; F ST , 999 permutations) were performed using GenAlEX v.6.503 [50]. Genetic data were also visualized using principal coordinate analysis (PCoA) of pairwise distances between populations using GenAlEx. Genetic structure was visualized using STRUCTURE v.2.3.4 software [26,27]. STRUCTURE plots were examined for both range-wide and within regions [51]. K values were set from  [52]. CLUMPP version 1.1.2 was used to resolve multimodality and DISTRUCT version 1.1 [53] was used to visualize results from STRUCTURE outputs. STRUCTURE clustering results were cross-checked with the model-free method discriminant analysis of principal components (DAPC) using the R package adegenet [47,54]. The number of K groups that best fit the data was chosen using the Evanno method [55].

Morphometric Methods
We used morphometric analysis, utilizing the characters defined in McCauley et al. [32], to assess patterns of morphological differentiation at local and regional scales ( Table 2) While specifically adapted for southwestern oaks, these measurements represent typical landmarks used in morphometric analysis in Quercus [56,57]. Normality of data was tested both across and within populations for individual characters using a Shapiro-Wilk test. As the data for many characters were not normally distributed, we used non-parametric Spearman's correlation to identify potential correlation (r > 0.70) [58,59] among variables. As no correlation was observed, all variables were used in the subsequent analyses. Principal component analysis (PCA) was used to visualize differentiation of all individual samples and means from each population across the two disjunct regions with the first and second principal components plotted using the R package ggplot2 v.3.3.3 [60]. To explore the partitioning of morphological variance in each of the measured variables across multiple levels, we performed a linear mixed effects model using the R package nlme v.3.1-152 [61]. Region was considered a fixed effect in the model, while population within each region and trees within each population were considered random effects. Residual plots were used to check for normality. The proportion of the total variance contributed by the fixed and combined random effects was determined using the marginal and conditional R 2 procedure of Nakagawa and Schielzeth [62] and calculated using the R package MuMIn v. 1.43.4 [63]. Table 2. Proportion of morphological variance in 12 directly measured and five composite variables in Q. havardii partitioned by East and West (regional) and by combined population and individual (local).

Character
Abbreviation Regional Local

Environmental Differentiation and Influence on Genetic Variation
We used PCA and calculated means of bioclimatic variables to investigate and visualize environmental relationships among our sampled populations. We utilized the 19 bioclimatic variables from the WorldClim version 2.1 database [64] and variables were downloaded at raster grids of 2.5 arc minutes (ca. 5 km) to compensate for the total area of the largest populations sampled. The full names and abbreviations for bioclimatic variables used for each environmental analysis are listed in Table A6. To avoid multicollinearity, one of each pair of highly correlated variables (r > 0.7) was removed [58], leaving 6 bioclimatic variables for analysis. Of these remaining variables, 4 were associated with temperature and 2 with precipitation. PCA was used to visualize climatic similarities and dissimilarities of the 23 sampled populations using the R package ggplot2.
In an additional analysis examining environmental differences between the regions, we assessed environmental niche overlap. To do so, we used a dataset that consisted of 303 occurrence points (243 from the East and 60 from the West) after accounting for spatial autocorrelation. Occurrence points were provided by collaborators who produced the IUCN Red List assessment for oaks in the United States [30]. The 19 bioclimatic variables were downloaded for 2.5 arc minutes resolution. Highly correlated variables (r > 0.7) were removed, leaving 7 bioclimatic variables for analysis, of which 4 were associated with temperature and 2 were associated with precipitation (Table A6). Niche differentiation in environmental space and also niche equivalency, similarity, and overlap were calculated using the R package ecospat v3.2 [65]. Niche overlap was measured by Schoener's D index, which ranges from 0 (no overlap) to 1 (full overlap; [66]).
To visualize specific environmental and genetic (i.e., heterozygosity, allelic richness, and relatedness) relationships within and among regions, we built models by multiple linear regressions for 3 genetic statistics (heterozygosity, allelic richness, and relatedness) using uncorrelated environmental variables at several geographic scales. Given the disjunct and patchy distribution of this species, and confirmation that different variables were found to be correlated within each region, we used three different sets of uncorrelated bioclimatic variables corresponding to: (1) the range-wide distribution including both eastern and western sampled populations; (2) eastern populations only; (3) western populations only. Spearman correlation analyses were performed for every subgroup and correlated variables were removed (r > 0.7). A final list of uncorrelated variables used as inputs for each model can be found in Table A6. The variables selected across all subgroups were subjected to a final check for collinearity using variance inflation factor (VIF) ≤ 10 [58]. We compared regression models using the all-subsets variable reduction approach (R function regsubsets, package leaps v.3.1 [67]) and selected the "best supported" model using Bayes' BIC. Model performance was assessed using adjusted coefficients of determination (adjusted R 2 ) and the Benjimini and Hochberg [68] multiple comparisons correction for p-values.

Estimation of Divergence between East and West Regions Using ABC and Genetic Data
Modeling-based simulation analyses, such as approximate Bayesian computation (ABC), have been useful in elucidating the evolutionary histories of species [69][70][71]. We performed an ABC analysis to estimate the timing of divergence between the East and West regions. We compared two simple hypotheses: (1) splitting and isolation (the I model) or (2) splitting and continued migration (the IM model). We estimated the following parameters: splitting time (TS), mutation rate-scaled population sizes of each region, theta 1 and theta 2 (representing the East and West regions, respectively), and migration (M). We used the R code developed by Navascués [72], which runs simulations in the coalescent simulator ms [73] and uses random forest [74,75] for parameter estimation. This software simulates parameters in the form of theta, M and TS, which are scaled parameters rather than the parameters of biological interest here (i.e., population size and migration). An estimation of mutation rate is needed to transform these to population size, migration rate and splitting time on scales amendable to biological interpretation. We searched the literature for plant microsatellite mutation rates per site per generation and found four: wheat (2.4 × 10 −4 ), maize (7.7 × 10 −4 ), Arabidopsis (8.87 × 10 −4 ), and red cedar (6.3 × 10 −4 ) [76][77][78][79]. We then used the mean of the four values, 6.3 × 10 −4 .

Genetic Variation and Differentiation
Genetic analyses showed an overall pattern of differentiation between the East and West regions of Q. havardii. Summary statistics for all populations as well as means and standard deviation across populations are shown in Table 1. Genetic diversity was higher in western populations (He = 0.655 ± 0.04) than eastern populations (He = 0.598 ± 0.061). Using a Mantel Test ( Figure A1), isolation by distance was significant when comparing among regions, as well as in the West (R 2 = 0.256, p < 0.001), but not in the East. The results of the AMOVA show that most of the molecular variance (81%) was found within and among individuals; however, more genetic variance occurred among regions (15%) than among populations (4%). When comparing between regions (Table A2), there is significantly higher allelic richness (p = 0.0003), higher heterozygosity (p = 0.015), and lower relatedness in West populations (p = 0.009; Figure 2) than in East populations. With the exception of population W1, which had significantly higher clones than other western populations, we detected a higher frequency of clones within eastern populations. Patterns from the first three axes of separate PCoA analysis among individuals ( Figure A2) and populations ( Figure 3A) revealed genetic distinction between the two regions and subdivisions of populations within the West, with W1 being the most separated from other populations. Bayesian clustering analyses from STRUCTURE ( Figure 3D-F) and a discriminant analysis of principal components using the R package adegenet showed highly similar relationships among populations. For all three analyses (i.e., range-wide, eastern region, and western region), K = 2 was the best-supported assumption for each dataset. The next best supported number of clusters (K = 3) for the range-wide dataset is also shown ( Figure 3D). latedness in West populations (p = 0.009; Figure 2) than in East populations. With the exception of population W1, which had significantly higher clones than other western populations, we detected a higher frequency of clones within eastern populations. Patterns from the first three axes of separate PCoA analysis among individuals ( Figure A2) and populations ( Figure 3A) revealed genetic distinction between the two regions and subdivisions of populations within the West, with W1 being the most separated from other populations. Bayesian clustering analyses from STRUCTURE ( Figure 3D-F) and a discriminant analysis of principal components using the R package adegenet showed highly similar relationships among populations. For all three analyses (i.e., range-wide, eastern region, and western region), K = 2 was the best-supported assumption for each dataset. The next best supported number of clusters (K = 3) for the range-wide dataset is also shown ( Figure 3D).
P opulation N am es A verag e P airw ise R elated n ess

Morphological Variation
Similarly, the morphological analysis showed the presence of two groupings corresponding to the two regions. This was most pronounced when using population means, which indicated a level of differentiation similar to that seen in the genetic analysis ( Figure 3B). The first two principal components represented 62.5% of the variation (Table A3; PCA plot loading scores are given in Table A4). Examination of population means showed most individual populations to cluster relatively close together in the West but with more dispersion of populations in PCA space than in the East (Figures A3 and A4). One western population, W4, was found to be highly divergent from the remaining populations. However, analysis at the individual-level ( Figures A5 and A6) showed a high degree of morphological variation occurring across both regions and interdigitation between the East and West regions with a high degree of variation seen within most populations. The partitioning of morphological variance indicated that factors associated with leaf length and lower lobe angle (Table A5) contributed the most to separation between the two regions while high levels of variation at population-and tree-levels, strongest for factors associated with leaf width, indicated high levels of within-population variation (Table 2). Generally, leaves in the East were characterized by having longer leaves with slightly fewer lobes and more acute lobes, and leaves in the West were characterized by having shorter leaves that are widest at the middle lobe of the leaf blade.

Environmental Differentiation, Variation, and Influence on Genetic Diversity
Principal component analysis of the six uncorrelated bioclimatic variables showed significant separation between populations from the East and West regions, with the first two components representing 56.6% of the variation ( Figure 3C). More specifically, PC1 represented 30.9% of variation while PC2 represented 25.7% (Table A9). Maximum temperature of warmest month (BIO5), mean temperature of driest quarter (BIO9), and precipitation of coldest quarter (BIO19) showed almost equal contributions to the separation of populations along PC1, while annual temperature range (BIO7) and precipitation of the driest month (BIO14) showed along PC2 (Table A8). Mean temperature of the driest quarter was most important for the separation of West populations, while most East populations were primarily separated by mean diurnal range and precipitation of the driest and coldest months. We found higher average temperatures and precipitation in the East when compared to western populations (Table A7). Additionally, there is less seasonal temperature variability in the eastern region. Notably, one western population, W1, was found to be more climatically similar to populations in the East.
We found little evidence of environmental niche overlap (D = 0.002; Figure A7) between the eastern and western regions of Q. havardii. Additionally, western populations were observed to occupy a larger climatic niche space than eastern populations. Bioclimatic variables representing environmental extremes (e.g., precipitation of the driest month, mean temperature of the warmest quarter, and maximum temperature of the warmest month) contributed most to the separation of the East and West regions in environmental space. We found no significant differences for either niche equivalency or similarity between the two regions.
For each of the regression models with the lowest BIC, using a Benjamini-Hochberg procedure was found to be significant (p < 0.05) between genetic statistics and environmental variables, with the exception of the model of relatedness within the eastern region (Table 3). When examining the significant regressions across the overall geographic space (i.e., analyzing the East and West populations together), we found that the selected models showed a strong relationship mostly between temperature-related variables and all genetic diversity metrics especially with bioclimatic variables representing extreme or limiting environmental factors. For the range-wide model, allelic richness and relatedness were mostly found to be associated with seasonality-based variables, while heterozygosity was associated with variables pertaining to environmental extremes. Within the East, we found that heterozygosity and relatedness were primarily correlated with temperature extremes (mean temperature of the driest quarter; BIO9), while allelic richness was primarily correlated with precipitation extremes (precipitation of the warmest quarter; BIO18). Within the West, we found very strong correlations between both heterozygosity and allelic richness with environmental variables (both with adjusted R 2 > 0.93). More specifically, heterozygosity and relatedness were associated with extreme or annual trends of precipitation (precipitation of the driest month and annual precipitation; BIO14 and BIO12, respectively), while allelic richness was associated with temperature extremes (mean temperature of wettest quarter; BIO8).

Approximate Bayesian Computation (ABC) Estimate of Divergence Time
The support for the IM model over the I model is 0.758/0.232, equaling a Bayes Factor of 3.26, which is "moderate" but not "very strong" evidence [80,81]. The out-of-bag prior error rate is 42%, suggesting that it is difficult to distinguish between the two models. After untransformed values (Table A10) were converted to usable parameters (Table A11), the mean divergence time using the I model is 12,380 years, with confidence intervals ranging from 567 to 59,500 years. This suggests that divergence possibly occurred on a timescale of tens of thousands of years. We also present posterior parameter estimates based on all simulations run over both models (Table A11). When both models are included, the median estimate of the divergence time is 31,000 years ago, though the confidence intervals run from a few thousand to more than ten million. This wider confidence range seems logical because the divergence time estimate will vary with the migration rate. Additionally, the estimated population size in the West was consistently larger than in the East. Table 3. Genetic-environmental correlations for each of the best-supported multiple linear regression (MLR) models for Quercus havardii at regional and local geographic levels. The p-values for the variable coefficients are shown, in addition to the adjusted R 2 and p-values for each reported model.

General Patterns of Genetic, Morphological, and Environmental Variation
We observed strong differentiation of genetic, environmental, and morphological datasets coinciding with contemporary geographic separation of populations. Across its entire geographic range, we found that Q. havardii has relatively moderate levels of genetic diversity and high differentiation when compared to other oaks [41,42,82,83]. While morphological differentiation, particularly population means, showed separation between the two regions principally driven by differences in leaf length, the total variance in the dataset was shown to be very high within populations. This suggests that morphological diversity and plasticity can be quite great at the individual and population level, which is a pattern typically seen and documented in oaks of different groups attributed to patterns of interspecific hybridization [84] or canopy position and light exposure [85].
Climate is likely playing a role in shaping genetic diversity within and among populations and regions of Q. havardii. We found stark differences in regional climate between the East and West regions (e.g., mean annual precipitation in eastern populations was nearly double that of western populations). Overall, in western populations, allelic richness is 25% higher and heterozygosity is 11% higher, while relatedness is <50%, compared to eastern populations. We also observed several correlations within regions of population genetic diversity statistics and environmental variables that have potential biological relevance to population size and persistence. Specifically, measures of genetic diversity (i.e., heterozygosity, allelic richness, and relatedness) were correlated with climatic variables that represent seasonality or extremes (e.g., heterozygosity and allelic richness were correlated with mostly temperature seasonality or extremes in the East; while heterozygosity, allelic richness, and relatedness were correlated with both precipitation and temperature extremes in the West). This suggests that these factors may be influencing long-term population size and stability and thus genetic diversity via genetic drift. Studies in other desert plants [86] and oak species [87,88] correlating neutral microsatellite markers and climatic variables showed a similar pattern of differences at a regional level. Additional studies within oaks similarly noted seasonality-related variables as possible key factors shaping genetic diversity [89,90] and the increased rates of evolution for leaf habit [22]. Of course, these are correlations; further research on the effects of climate on oak diversity will be needed to elucidate the importance of these mechanisms. Possible adaptive trait differentiation corresponding to morphological, genetic and environmental differentiation patterns should be explored further.
An alternative influence on genetic diversity could be the relatively higher rates of human disturbance in the East. The Llano Estacado region of Texas and New Mexico has high rates of ranching and oil and gas development [30]. In such disturbed areas, population extinction and colonization may be higher, which could cause lower genetic diversity, higher relatedness, and higher genetic differentiation due to bottleneck effects. Additionally, we observed high rates of clonality, but much lower rates than were suggested by earlier allozyme data for Q. havardii [91]. Future work should investigate in more detail the spatial distribution of clones within populations, which might have profound effects on genetic differentiation, genotypic diversity and perhaps the long-term viability of the species. Thus, we cannot fully untangle the influences of genetic diversity in this system, but our results do present specific hypotheses to be investigated in more detail.
We found that the divergence of eastern and western populations most likely occurred on the scale of tens of thousands of years ago, according to our results from the ABC analysis. Although confidence intervals are broad, this divergence estimate implies that these populations have been isolated for hundreds of generations and are likely sister species or subspecies; however, this assumes an evolutionary relationship between these populations that cannot be confirmed by our work. Our estimates align with a broader study of the American oak clade, suggesting that Q. havardii likely arose in the early-to mid-Miocene, and also suggested a significant adaptive transition in ecological space, perhaps due to niche specialization [22]. The history of these populations was likely influenced by glacial cycles and consequent effects such as drying of the deserts of the Southwest and loss of suitable habitat. To resolve questions of evolutionary history, possible hybrid origins, and phylogeography, we are currently working to incorporate samples from both geographic regions into the phylogeny of North American oaks [92].

Possible Implications for Conservation and Taxonomy
Quercus havardii is an important keystone species that defines the sand shinnery ecosystem and provides a habitat to vulnerable species such as the lesser prairie chicken and dunes sagebrush lizard. The conservation status is currently listed as "Endangered" by the IUCN Red List, which notes that populations are facing increased threats of habitat loss and fragmentation arising from development for oil and gas industries and agriculture [30]. We observed relatively moderate genetic diversity and a low degree of clonality in this study suggesting that the species, and most of its populations, are not in imminent danger of genetic problems like inbreeding. However, in our study, the population W1, a putative lagging-edge population [93], was identified as extremely vulnerable due to high clonality, high inbreeding and low genetic diversity, and thus should be considered as a possible priority for conservation. Other isolated or lagging-edge populations should be investigated as well.
Additionally, the taxonomic uncertainty of Q. havardii remains as it is currently treated as a singular species over the full extent of its range. However, this has been debated over the years as some authors have called the western region its own taxon, Q. havardii var. tuckerii Welsh or Q. welshii R.A. Denham, and has also been speculated as a putative introgressed hybrid with Q. gambelli [31,94,95]. Our work provided evidence that both eastern and western entities are genetically and morphologically distinct, and occupy unique environmental niche space with little to no overlap. However, future phylogenetic work may still be needed to elucidate the evolutionary relationships of these taxa and to support any taxonomic changes. If taxonomic changes are deemed valid, a reassessment of the conservation status of Q. havardii and Q. welshii will be needed to incorporate regionalspecific information such as population genetic data and modified area of occurrences.

Caveats
There are several important caveats that should be mentioned in this work. It must be noted that sampling was not systematic within and across each region, nor designed a priori to evenly cover different environmental variables but rather focused on the known locations of populations. It is possible that the relationships observed in this study could be artifacts of sampling or reflect a relationship with another unknown underlying variable. Moreover, we do not incorporate data on some variables that may be most important in determining a species demographic and genetic stability, (e.g., soil attributes). Additionally, the ABC analysis has extremely wide confidence intervals and assumes very simple models. An alternative hypothesis that these data do not support, but should not yet be ruled out yet, is that the divergence is deeper in time and that these are not sister taxa but rather have evolved similar morphology, habitat and habitat preference via convergent evolution.
Of additional importance, hybridization is possible among many oak taxa, and hybrids have been observed in many studies. Although oaks have been deemed taxonomically difficult due to high rates of intraspecific gene exchange, genetic investigations show that contemporary hybridization is usually observed at relatively low levels, such that species may maintain coherence [96]. We do not examine hybridization in this study, and only focus on individuals morphologically and ecologically consistent with Q. havardii; however, morphologies ascribed to Q. eastwoodiae or the Q. x undulata complex were observed (FLD and MOR herbarium specimens McCauley 700, 701, and 702). Additionally, several collected populations (W1, W4, W11, and W12) did not occur in deep, shifting sand and were noted as having other Quercus species located only a few kilometers away. It is likely that the highly specific and unique sand habitat of most populations of Q. havardii could be playing a role in observed diversity patterns by significantly limiting the opportunity for interspecific gene flow. Nonetheless, hybridization and its possible influence on diversity and population persistence is an important future area to investigate for this species.

Conclusions
We have provided the first range-wide genetic study of one of the arid-adapted oaks of the Southwest USA and one of the relatively few integrated genetic, ecological, and morphological characterization of oaks in general. Q. havardii has somewhat lower genetic diversity and higher genetic differentiation when compared with other widespread and important oaks [97], as might be expected with this species' highly specialized and fragmented habitat. We find strong differentiation between eastern and western regions, potentially at the level of evolutionary significant units (ESUs) or higher, suggesting important conservation designation. Within this larger regional context, correlations with environmental variables suggest that temperature seasonality, precipitation extremes, and other key factors may be influencing local levels of genetic diversity. Future studies of arid-adapted oaks are needed as climates continue to rapidly change and make habitats increasingly vulnerable, thus emphasizing the importance of these species as possible sources of genetic resources for breeding or restoration.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.   Table A3. Eigenvalues and percent variability for each principal component for the PCA using 17 uncorrelated morphological variables presented in Figure 3B.   5 Steinkellner et al. 1997 [101] PET (GT)5(GA)9 QM69.2M1 Isagi and Suhandono 1997 [102] PET (TGG)6(CGG)(TGG)2      Table A4. Plot loading scores of principle components axes with eigenvalues greater than 1 for the PCA using 17 uncorrelated morphological variables presented in Figure 3B. Important loadings that contribute more than one variable's worth of information are in bold.   Table A6. Uncorrelated bioclimatic variables used as inputs for the principle component analysis (Figure 1), building multiple linear regressions (MLR) models for each of the three regional and local geographic level datasets (Table 3), and variables used for the environmental niche overlap analysis ( Figure A7).   Table A8. Plot loading scores of principle components axes with eigenvalues greater than 1 for the PCA using bioclimatic data from 6 uncorrelated variables presented in Figure 3C. Important loadings that contribute more than one variable's worth of information are in bold.      Table 1.   Table 1.  Table 1.