Comparing Genetic Diversity in Three Threatened Oaks

Genetic diversity is a critical resource for species' survival during times of environmental change. Conserving and sustainably managing genetic diversity requires understanding the distribution and amount of genetic diversity (in situ and ex situ) across multiple species. This paper focuses on three emblematic, IUCN Red List threatened oaks (Quercus, Fagaceae), a highly speciose tree genus that contains numerous rare species and poses challenges for ex situ conservation. We compare the genetic diversity of three rare oak species (Quercus georgiana, Q. oglethorpensis, and Q. boyntonii) to that of common oaks; investigate the correlation of range size, population size, and the abiotic environment with genetic diversity within and among populations in situ; and test how well the genetic diversity preserved in botanic gardens correlates with geographic range size. Our main findings are: (1) these three rare species generally have lower genetic diversity than more abundant oaks; (2) in some cases, small population size and geographic range correlate with genetic diversity and differentiation; and (3) the genetic diversity currently protected in botanic gardens is poorly predicted by geographic range size and number of samples preserved, suggesting non-random sampling of populations for conservation collections. Our results indicate that most populations of these three rare oaks have avoided severe genetic erosion, but their small size will likely necessitate genetic management going forward.


Introduction
Genetic diversity is a critical resource that allows species to adapt to future challenges, including pests and diseases, climate change, and other environmental changes. To conserve and sustainably manage genetic diversity, it is important to understand the distribution and amount of genetic diversity present in situ, and to identify the key factors shaping it [1][2][3]. While genetic diversity has been assessed in hundreds of rare […]
In these three oak species, we ask whether range size, population size, and environmental variables correlate with genetic diversity within and among populations in situ. We then ask to what degree genetic diversity is preserved in botanic gardens and whether this correlates with geographic range size. Specifically, we aim to:
• Assess levels of genetic diversity and differentiation in our focal species and compare them to other rare and common oak species (hypothesis: common oak species have more genetic diversity);
• Determine whether genetic diversity and differentiation correlate with the relative range size of the three oak species (hypothesis: larger range size correlates with higher genetic diversity and stronger genetic structure);
• Determine whether genetic diversity and differentiation correlate with demographic (e.g., population size) and/or environmental variables (hypothesis: smaller populations have lower genetic diversity, and genetic diversity correlates with environmental variables);
• Quantify the amount of each species' genetic diversity that is conserved ex situ and determine whether this correlates with features of those species (hypothesis: conserved genetic diversity should be predicted by both the number of plants ex situ and the commonness of the species).
With 31% of the United States' oak species now considered of conservation concern, results from this study should help in assessing the genetic diversity of the many other threatened oak species and in designing management strategies for them, especially those for which genetic diversity data are not available [18].

Quercus georgiana … outcrops and flat-rocks in the Piedmont Plateau of the Southeastern United States.

Study Species, Sampling, and Genotyping
These data were previously collected and used in a study addressing different questions [13]. In that study, the data, along with genetic data from eight additional woody plant taxa, were used to compare genetic diversity ex situ with that in situ. We briefly describe sampling and genotyping methods for each focal species here; complete collection and genotyping methods can be found in Hoban et al. 2020 [13]. For each species, we collected in situ samples from as many known populations as possible, and selected only trees with leaf morphologies typical of each focal species to avoid possible hybrids. However, some hybrids may have been sampled because gene flow among oak species can occur (see Discussion). To locate ex situ samples, we used Botanic Gardens Conservation International PlantSearch (https://members.bgci.org/data_tools/plantsearch, accessed in 2017), a global database of more than 1000 botanic gardens and their collections.
Quercus boyntonii (Alabama sandstone post oak) is endemic to Alabama (USA), although historical records indicate that it formerly grew in Texas [34]. It is a shrub or small tree, sometimes reaching 6 m in height, but usually smaller. Q. boyntonii was sampled ex situ from 16 botanic gardens and arboreta that hold the species in their collections, totaling 87 individuals. In situ individuals were sampled in natural preserves, on private property, and in suburban parks. In situ population sizes ranged from fewer than 10 to more than 100 trees. Occurrences of the species are patchy, coinciding with suitable remnant habitat: sandstone outcrops, ridges, and slopes. We collected 246 in situ samples (227 were included in the final analysis after clones were removed). In situ samples were collected in May 2017, and ex situ samples between April and September 2017. Because habitat and occurrences are patchy and the species is wind pollinated, it is challenging to delimit strict "populations". For these analyses, we used a distance of 8 km to delineate populations in instances of continuous distributions. We genotyped all individuals using 11 neutral microsatellites from previous studies in oaks. DNA extraction, testing of the larger marker panel from which our microsatellites were drawn, and genotyping are described in detail in Hoban et al. 2020 [13].
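The 8 km rule can be read as single-linkage grouping: two trees closer than 8 km belong to the same population, and membership chains through intermediate trees. The exact procedure is not specified here, so the following Python sketch is an assumption, not the method in the analysis scripts; the haversine distance, the function names, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance (km) between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def delineate_populations(coords, threshold_km=8.0):
    """Single-linkage grouping (hypothetical helper): samples closer than
    threshold_km share a population label, transitively.
    Returns labels 0, 1, ... in input order."""
    parent = list(range(len(coords)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if haversine_km(coords[i], coords[j]) < threshold_km:
                parent[find(i)] = find(j)  # merge the two groups

    labels, out = {}, []
    for i in range(len(coords)):
        out.append(labels.setdefault(find(i), len(labels)))
    return out

# Hypothetical coordinates: trees 1 and 2 are ~5 km apart; tree 3 is ~50 km away.
samples = [(33.50, -86.80), (33.545, -86.80), (34.00, -86.80)]
print(delineate_populations(samples))  # [0, 0, 1]
```

Because linkage is transitive, a chain of trees each less than 8 km from its neighbor forms one population even if its ends are farther apart.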
Quercus georgiana (Georgia oak) is native to the Southeastern United States, mainly northern Georgia, with additional populations in Alabama, North Carolina, and South Carolina. It grows on dry granite and sandstone outcrops on hill slopes at 50-500 m elevation [35]. Quercus georgiana is a small tree, often shrubby in the wild, growing to 8-15 m tall. It was sampled from nine populations across its known range, located using herbarium records, botanic garden collection records, and the USDA PLANTS Database (USDA 2012). All sampled populations were separated by at least 15 km. A total of 226 samples were collected in June 2011 (223 were retained with sufficient genetic data). At least 24 trees were randomly sampled from each site, with sampled plants at least 5 m apart. Seventeen botanical institutions in the United States, France, and Belgium shared samples, totaling 36 individuals. Eight nuclear and 11 expressed sequence tag (EST) microsatellite markers were used for genotyping, following the extraction and genotyping methods detailed in Hoban et al. 2020 [13]. We expect EST microsatellites to have lower polymorphism information content because they are associated with transcribed regions of DNA [36]. Nuclear and EST markers were analyzed separately.
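To illustrate why fewer, more unevenly distributed alleles lower the polymorphism information content (PIC), the following sketch implements the standard PIC formula of Botstein et al. (1980) for a single locus. The allele frequencies are hypothetical, and PIC is not among the statistics reported in this study.

```python
from itertools import combinations

def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980) for one locus:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term

# Hypothetical loci: four equally common alleles vs. two very uneven alleles.
print(round(pic([0.25, 0.25, 0.25, 0.25]), 4))  # 0.7031
print(round(pic([0.9, 0.1]), 4))                # 0.1638
```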
Quercus oglethorpensis (Oglethorpe oak) is a long-lived woody plant endemic to the Southeastern United States. Extant, largely fragmented wild populations are documented in South Carolina, Georgia, Alabama, Mississippi, and Louisiana. Q. oglethorpensis has a disjunct distribution, with smaller clusters of localities in Northeast Louisiana, Southeast Mississippi, and Southwest Alabama, and a more extensive and well-known distribution from Northeast Georgia across the border into South Carolina. It grows up to 25 m tall and has flat, narrowly elliptical leaves that are usually unlobed. We prioritized sites with the most up-to-date occurrence data, gathered in July 2015 during a germplasm collection effort [37]. We also included sites not visited during that effort in order to sample the widest possible geographic distribution. Sampled populations were separated by at least 9 km. Eight in situ populations were visited, for a total of 191 samples (187 were retained with sufficient genetic data). Ex situ samples were collected from 145 trees in 16 botanic gardens around the world. All samples were genotyped with 10 nuclear microsatellite markers following the extraction and genotyping methods in Hoban et al. 2020 [13].

Analysis: Basic Statistics
We used the R v3.6.3 (R Core Team, Vienna, Austria) package adegenet version 2.1.2 to convert genepop files to genind and genpop formats. We used the R package poppr version 2.8.3 [38] to identify potential clones (and to remove clones and duplicate genotypes before any other calculations) and to calculate expected heterozygosity, number of alleles, and allelic richness; hierfstat version 0.4.22 [39] to calculate pairwise population FST values; diveRsity version 1.9.9 [40] to calculate observed heterozygosity and the inbreeding coefficient (FIS); and Demerelate version 0.9.3 to calculate measures of relatedness [39][40][41][42][43][44]. We also tested for signatures of recent bottlenecks using the heterozygote excess test in the BOTTLENECK software [45] under both the infinite allele model and the two-phase model, as well as the mode-shift test. We performed an ANOVA with species as the factor and population as the unit of analysis to test for differences among species in the main summary statistics. For this and subsequent tests, we used only the nuclear SSRs, because EST-SSRs have much lower heterozygosity and allelic richness and were available for only one species. We also tested for isolation by distance using linear regression of genetic distance (FST) on geographic distance among populations. All analysis scripts are available at https://github.com/smhoban/SE_oaks_genetics (accessed on 15 February 2021).
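As an illustration of one of these statistics, the sketch below computes expected heterozygosity with Nei's (1978) unbiased estimator for diploid data and averages it across loci. It is a simplified stand-in, not poppr's code (poppr's options and defaults may differ); the genotypes and allele sizes are hypothetical.

```python
from collections import Counter

def expected_heterozygosity(genotypes):
    """Nei's (1978) unbiased expected heterozygosity for one diploid locus.

    genotypes: list of (allele1, allele2) tuples; None marks a missing genotype.
    """
    alleles = [a for g in genotypes if g is not None for a in g]
    n = len(alleles)  # sampled gene copies = 2 x genotyped individuals
    if n < 2:
        return float("nan")
    sum_p2 = sum((c / n) ** 2 for c in Counter(alleles).values())
    return n / (n - 1) * (1.0 - sum_p2)

def mean_he(loci):
    """Average He over loci; each locus is a list of genotypes."""
    values = [expected_heterozygosity(g) for g in loci]
    return sum(values) / len(values)

# Hypothetical population of four trees at two loci (allele sizes in bp).
locus1 = [(150, 152), (150, 150), (152, 154), (150, 154)]
locus2 = [(200, 200), (200, 204), None, (204, 204)]
print(round(expected_heterozygosity(locus1), 4))  # 0.7143
print(round(mean_he([locus1, locus2]), 4))        # 0.6571
```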

Influence of Environment on Genetic Diversity and Differentiation
We obtained the 19 standard bioclimatic variables from WorldClim 2.0 at a resolution of 2.5 arc-minutes [46]. To determine whether there is a relationship between local climate and population-level genetics, for each species we used ordinary least squares linear regression [47] to regress each of four basic population genetic summary statistics that might be expected to respond to local climate (expected heterozygosity, allelic richness, FST, and relatedness) on each climatic variable. All analysis scripts are available at https://github.com/smhoban/SE_oaks_genetics (accessed on 15 February 2021).
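A minimal sketch of one such regression, assuming the genetic statistic is the response and the climate variable the predictor. To stay dependency-free, it swaps the usual parametric slope test for a permutation test, and it applies a Bonferroni correction over 19 variables × 4 statistics; which multiple-testing correction was actually used is not stated, so Bonferroni is an assumption. All data values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Ordinary least squares slope and intercept for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx

def permutation_p(x, y, n_perm=9999, seed=1):
    """Two-sided permutation p-value for the OLS slope, shuffling y."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y)[0])
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(ols_slope(x, y_perm)[0]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# 19 bioclimatic variables x 4 statistics = 76 tests per species (Bonferroni).
N_TESTS = 19 * 4
BONFERRONI_ALPHA = 0.05 / N_TESTS

# Hypothetical values for 8 populations: one climate variable and He.
climate = [15.2, 15.8, 16.1, 16.4, 16.9, 17.3, 17.5, 18.0]
he = [0.61, 0.64, 0.60, 0.66, 0.62, 0.65, 0.63, 0.64]
slope, intercept = ols_slope(climate, he)
p = permutation_p(climate, he)
print(f"slope={slope:.4f} p={p:.4f} significant={p < BONFERRONI_ALPHA}")
```

With 76 tests, the corrected threshold is about 0.00066, so the permutation count must be large enough (here 9999) for any test to reach it.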

Influence of Local Population Size
Following previous work [48][49][50][51], we calculated the percentage of genetic diversity conserved as the proportion of extant in situ alleles preserved in ex situ collections. We focused on alleles existing in the wild; we did not count alleles existing only in botanic garden collections. These data were previously presented in Hoban et al. 2020 [13] but were analyzed in a different context: comparing genetic diversity in ex situ collections among different genera and without regard to species range sizes.
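The capture metric is a simple set comparison. The Python sketch below assumes alleles are recorded as hypothetical (locus, allele size) pairs and, as described above, ignores alleles found only ex situ.

```python
def percent_captured(wild_alleles, ex_situ_alleles):
    """Percent of distinct in situ alleles that also occur ex situ.

    Alleles are hashable (locus, allele) pairs; alleles found only in
    garden collections are ignored, so the result cannot exceed 100%.
    """
    wild = set(wild_alleles)
    if not wild:
        raise ValueError("no in situ alleles supplied")
    captured = wild & set(ex_situ_alleles)
    return 100.0 * len(captured) / len(wild)

# Hypothetical alleles at two loci; ("L2", 208) occurs only ex situ.
wild = [("L1", 150), ("L1", 152), ("L1", 154), ("L2", 200), ("L2", 204)]
garden = [("L1", 150), ("L1", 152), ("L2", 200), ("L2", 208)]
print(percent_captured(wild, garden))  # 60.0
```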

Basic Results
Table 2 reports genetic summary statistics for all three species: the number of samples genotyped; genetic diversity, measured as expected heterozygosity (He) and allelic richness (Ar); genetic differentiation (pairwise FST); relatedness (R); and estimated population size. We present only one relatedness estimator [43], but the patterns were similar for all three measures tested. No population showed a significant bottleneck signature using the heterozygote excess test under the two-phase model, although one population of each species did show a bottleneck signature using the heterozygote excess test under the infinite allele model. The smallest population of Q. oglethorpensis showed a "mode shift", though no heterozygote excess. For Q. georgiana, a bottleneck signature was observed in five populations (about half of the populations), but only for the EST-SSRs. No bottleneck was detected for Q. boyntonii or Q. georgiana with neutral microsatellites. Q. oglethorpensis, the species with the largest range, was the only species that showed significant isolation by distance.
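The mode-shift test is a qualitative check (Luikart et al. 1998): alleles pooled across loci are binned into ten frequency classes of width 0.1, and in a population near mutation-drift equilibrium the lowest class is expected to be the most populated, giving an L-shaped distribution. The sketch below is a simplified reading of that rule, not the BOTTLENECK implementation; the frequencies are hypothetical.

```python
def frequency_classes(allele_freqs_by_locus, n_classes=10):
    """Count alleles, pooled over loci, in equal-width frequency classes
    (0-0.1, 0.1-0.2, ...). A frequency of exactly 1.0 goes in the top class."""
    counts = [0] * n_classes
    for freqs in allele_freqs_by_locus:
        for p in freqs:
            counts[min(int(p * n_classes), n_classes - 1)] += 1
    return counts

def mode_shifted(allele_freqs_by_locus):
    """True if the lowest-frequency class is not the most populated one."""
    counts = frequency_classes(allele_freqs_by_locus)
    return counts[0] < max(counts)

# Hypothetical allele frequencies (one list per locus).
equilibrium = [[0.05, 0.05, 0.08, 0.02, 0.80], [0.03, 0.07, 0.90]]
bottlenecked = [[0.35, 0.32, 0.33], [0.30, 0.30, 0.40], [0.05, 0.95]]
print(mode_shifted(equilibrium), mode_shifted(bottlenecked))  # False True
```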
Comparing these three rare species to a set of other Quercus studies, we found that the rare oaks in this study had among the lowest heterozygosity, and that Q. oglethorpensis had an exceptionally high inbreeding coefficient (FIS, Table A1).

Table 2. Summary statistics for each population and the average across populations. Reported are the population name (Pop name); the state in which the population is located (State; specific locality data are not provided given the rarity of the species); the number of samples (N samples genotyped) and number of unique multilocus genotypes (unique MLG); the expected heterozygosity (Hexp); allelic richness (Ar); mean pairwise FST; relatedness (Rel); and the estimated number of trees based on direct observations in the field (Pop size est.).


Genetic Diversity and Range Size for Our Three Rare Oaks
Range size shows some relationship to heterozygosity and allelic richness, in that Q. boyntonii (the most geographically restricted species) had the lowest heterozygosity and allelic richness (Figure 2). However, Q. georgiana had the highest heterozygosity even though its range size was moderate. Range size was strongly related to genetic differentiation as measured by FST. All pairwise species comparisons from the ANOVA were significant except Q. georgiana vs. Q. oglethorpensis for allelic richness, and Q. boyntonii vs. Q. oglethorpensis for allelic richness and expected heterozygosity.
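For reference, the one-way ANOVA F statistic with species as the factor and population as the unit can be computed as below. The heterozygosity values are hypothetical, the post hoc pairwise comparisons are not shown (the procedure is not specified here), and a p-value requires the F distribution, for example via scipy.stats.f_oneway, which returns the same statistic together with its p-value.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic; groups holds one list of population-level
    values per species. Returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical per-population He values grouped by species.
he_by_species = {
    "Q. boyntonii": [0.60, 0.62, 0.61, 0.63],
    "Q. georgiana": [0.71, 0.73, 0.72],
    "Q. oglethorpensis": [0.63, 0.65, 0.64],
}
f, df1, df2 = one_way_anova_f(list(he_by_species.values()))
print(f"F({df1}, {df2}) = {f:.1f}")
```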

Genetic Diversity and Population Size within Each Species
Genetic diversity and differentiation statistics are presented in Appendix B for populations above and below a census size (Nc) of 50 individuals. For two species, Q. boyntonii and Q. georgiana, the trends were as predicted: allelic richness and heterozygosity were generally higher, and FST generally lower, in larger populations (Figure A2). However, these differences were statistically significant in only a few comparisons (Figure A2). In these two species, relatedness generally did not differ between larger and smaller populations. For the third species, Q. oglethorpensis, the opposite pattern was observed, with lower genetic diversity, higher differentiation, and higher relatedness in larger populations.

Genetic Diversity and Environment
Genetic diversity and differentiation were not related to climate variables for Q. georgiana and Q. oglethorpensis when testing all 19 bioclimatic variables at 2.5 arc-minute resolution; no relationship was significant after correcting for multiple testing.

Genetic Diversity in Ex Situ Collections
The percentage of genetic diversity currently preserved in ex situ collections is shown in Table 3. The percentage was not clearly related to species range size or to the number of ex situ samples: although Q. boyntonii had the smallest range and a moderate number of ex situ samples, it had the lowest percentage of genetic diversity conserved ex situ. Note that EST diversity was not conserved (Figure A3).

Table 3. Percent allele capture in ex situ collections: the percentage of alleles conserved in ex situ collections, for different allele frequency categories, for each species. Allele frequency categories are: all (all alleles); very common alleles (>10%); common alleles (>5%); low (<10% and >1%); and rare (<1%). For rare alleles and all alleles, two results are presented: the percentage captured when all alleles, including those with only one or two occurrences, are included (complete data), and when alleles with one or two occurrences are excluded (reduced data, shown in parentheses).
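The frequency categories in Table 3 can be applied as below. This is a sketch under stated assumptions: each wild gene copy is recorded as a hypothetical (locus, allele) pair, frequencies are computed within each locus, category bounds follow the caption literally (so a frequency of exactly 1% falls in neither low nor rare), and the "reduced data" variant is mimicked by dropping alleles seen fewer than three times.

```python
from collections import Counter

# Frequency categories from the Table 3 caption (bounds exactly as written there).
CATEGORIES = {
    "all": lambda p: True,
    "very common (>10%)": lambda p: p > 0.10,
    "common (>5%)": lambda p: p > 0.05,
    "low (1-10%)": lambda p: 0.01 < p < 0.10,
    "rare (<1%)": lambda p: p < 0.01,
}

def capture_by_category(wild_copies, ex_situ_alleles, min_count=1):
    """Percent of wild alleles also present ex situ, per frequency category.

    wild_copies: one (locus, allele) entry per gene copy sampled in situ.
    min_count: drop alleles seen fewer than min_count times in the wild
    (min_count=3 mimics the 'reduced data' excluding 1-2 occurrences).
    Categories with no qualifying alleles are omitted from the result.
    """
    counts = Counter(wild_copies)
    copies_per_locus = Counter(locus for locus, _ in wild_copies)
    garden = set(ex_situ_alleles)
    result = {}
    for name, in_category in CATEGORIES.items():
        alleles = [a for a, c in counts.items()
                   if c >= min_count and in_category(c / copies_per_locus[a[0]])]
        if alleles:
            result[name] = 100.0 * sum(a in garden for a in alleles) / len(alleles)
    return result

# Hypothetical locus with 200 sampled gene copies.
wild = ([("L1", 150)] * 150 + [("L1", 152)] * 40 + [("L1", 154)] * 8
        + [("L1", 156), ("L1", 158)])
garden = [("L1", 150), ("L1", 152), ("L1", 156)]
print(capture_by_category(wild, garden))
print(capture_by_category(wild, garden, min_count=3))
```

Dropping singletons and doubletons raises the "all" figure, which is why the reduced-data percentages in parentheses are higher.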

Other Observations
We identified clones only in Quercus boyntonii. For this species, we often observed small "rings" or clusters of stems, sometimes 5 or more meters across. We found 12 pairs of clones, which were always adjacent individuals: either stems sampled immediately next to each other or within a few meters.
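Clone identification of this kind can be sketched as grouping samples with identical multilocus genotypes (MLGs). The Python example below is a hypothetical simplification: it requires exact matches, so, unlike distance-threshold approaches, it would miss clones separated by a genotyping error or missing data. The sample IDs and genotypes are invented.

```python
from collections import defaultdict

def mlg_key(loci):
    """Canonical multilocus genotype: each allele pair sorted, so that
    (152, 150) and (150, 152) count as the same genotype."""
    return tuple(tuple(sorted(g)) for g in loci)

def group_clones(genotypes):
    """Return sorted lists of sample IDs that share an identical MLG
    (putative clones); singletons are not reported."""
    by_mlg = defaultdict(list)
    for sample, loci in genotypes.items():
        by_mlg[mlg_key(loci)].append(sample)
    return sorted(sorted(ids) for ids in by_mlg.values() if len(ids) > 1)

def drop_clones(genotypes):
    """Keep one representative (the first ID in sorted order) per MLG."""
    seen, kept = set(), {}
    for sample in sorted(genotypes):
        key = mlg_key(genotypes[sample])
        if key not in seen:
            seen.add(key)
            kept[sample] = genotypes[sample]
    return kept

# Hypothetical stems genotyped at two loci.
stems = {
    "B01": ((150, 152), (200, 204)),
    "B02": ((152, 150), (200, 204)),  # same MLG as B01, alleles in another order
    "B03": ((150, 150), (200, 200)),
}
print(group_clones(stems))         # [['B01', 'B02']]
print(sorted(drop_clones(stems)))  # ['B01', 'B03']
```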
As expected, we found that EST-SSR markers had lower diversity than nuclear SSR markers, with heterozygosity and the number of alleles being 19% and 14% lower on average, respectively.

Discussion
Our study tested the influence of range size and of environmental and demographic variables on genetic diversity and differentiation in three rare oak species. Our main findings are as follows. (1) These three rare species generally have lower genetic diversity than more common oaks previously studied, and range size relates strongly to genetic differentiation but less strongly to genetic diversity. (2) Despite the relatively small number of populations available, owing to the rarity of these species, we found that in some cases small population size and geographic range may correlate with some metrics of genetic diversity and differentiation. (3) The genetic diversity currently conserved ex situ varies among species of comparable geographic range size and number of samples preserved. Thus, our study supports the idea that "rarity" and collection history are not sufficient to explain genetic diversity in ex situ collections: the amount of genetic diversity preserved is also a function of intrinsic biology, demography, or life histories that vary independently of rarity.
We first place our observations in the context of rare and common species in the genus. Many population genetic studies have been performed in Quercus [52,53]. Genetic diversity is often summarized using allelic richness and heterozygosity. Expected heterozygosity was lower in our study (less than 0.65 for most populations, with means of 0.641 for Q. oglethorpensis, 0.615 for Q. boyntonii, and 0.72 for Q. georgiana) than in other oaks, which typically had heterozygosities between 0.7 and 0.9 (Table A2). However, some common oaks had lower genetic diversity (e.g., Q. phillyreoides, He = 0.535), and some rare oaks had higher genetic diversity (e.g., Q. pacifica, He = 0.851, and Q. hinckleyi, He = 0.853), as estimated using microsatellites. Some oaks have been postulated to be naturally rare (e.g., Q. boyntonii), others are more likely rare due to human disturbance (e.g., Q. arkansana), and others have been declining in abundance for a long time (e.g., Q. hinckleyi) [27]. Because most oak species are relatively long-lived (100+ years), recently rare oaks may take a long time to show the genetic impacts of the population declines and range contractions associated with their increasing rarity. This form of "extinction debt" has been shown in simulations [12,54], whereas more naturally rare oaks would not be expected to show such genetic impacts. The relatively low genetic diversity in the species we studied may be due to relatively low population sizes over multiple generations.
Comparing genetic diversity statistics across these three species with different range sizes, we see that Q. boyntonii has the lowest heterozygosity and allelic richness, as expected given its small range and highest endangerment status. However, Q. oglethorpensis and Q. georgiana had similar allelic richness, and Q. georgiana had the highest heterozygosity even though its range size was moderate. It is not surprising that overall range size was only a moderate predictor of genetic diversity, as it is the local effective population size that governs retention of genetic diversity within populations (see next section). The paucity of bottleneck signatures may suggest that the species have not suffered bottlenecks, or that bottleneck signatures have not had time to develop (as in other species with known, recent population collapses, e.g., Juglans cinerea [55]); bottleneck tests are unreliable for recent, moderate, or gradual bottlenecks [54].
On the other hand, FST is related to species range size for these three species: the smallest-range species (Q. boyntonii) had the lowest FST, and the species with the largest range and most general habitat preference (Q. oglethorpensis) had the highest FST (Figure 2). This conforms to population genetic theory regarding isolation by distance: populations of a wide-ranging species are, on average, farther apart, and genetic distance is known to increase with geographic distance. Thus, in our study, the influence of range size was much more apparent on genetic differentiation among populations than on genetic diversity within populations. Range size is, of course, not the only predictor of FST; factors such as connectivity also matter. For example, wide-ranging oak species with many populations, and thus high gene flow, can show low FST (e.g., Q. macrocarpa [56]).
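A sketch of the isolation-by-distance test: regress pairwise FST on geographic distance. Because pairwise values share populations and are not independent, this version assesses the slope with a Mantel-style permutation of population labels; that is a substituted technique, and the analysis scripts may assess significance differently. The distances and FST values are hypothetical.

```python
import random

def regression_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def ibd_test(fst, dist_km, n_perm=999, seed=1):
    """Slope of pairwise FST on distance, with a one-sided Mantel-style
    p-value from permuting population labels of the FST matrix.
    fst, dist_km: symmetric k x k matrices (lists of lists)."""
    k = len(fst)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    x = [dist_km[i][j] for i, j in pairs]

    def slope_for(order):
        return regression_slope(x, [fst[order[i]][order[j]] for i, j in pairs])

    observed = slope_for(list(range(k)))
    rng = random.Random(seed)
    order = list(range(k))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if slope_for(order) >= observed:  # FST expected to rise with distance
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical: six populations along a line, 10 km apart, FST rising with distance.
pos = [0, 10, 20, 30, 40, 50]
dist = [[abs(a - b) for b in pos] for a in pos]
fst = [[0.0 if a == b else 0.01 + 0.0002 * abs(a - b) for b in pos] for a in pos]
slope, p = ibd_test(fst, dist)
print(f"slope={slope:.5f} per km, p={p:.3f}")
```

With few populations the permutation distribution is coarse: four populations allow only 24 label orders, so even a perfect IBD pattern cannot reach p < 0.05.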
According to conservation genetic theory, we would expect populations near or below a size of 50 individuals to be subject to strong genetic drift. The exact threshold at which a population rapidly suffers detrimental genetic consequences has been hotly debated [7,57,58], but here we focused on 50 individuals. We would therefore predict lower allelic richness and heterozygosity, and higher FST and relatedness, in such populations. We see this predicted pattern in Q. georgiana and Q. boyntonii, though comparisons were significant or nearly so only for FST in Q. georgiana (all loci: t-test p = 0.055, Wilcoxon p = 0.063; ESTs: t-test p = 0.04, Wilcoxon p = 0.063) and heterozygosity in Q. boyntonii (t-test p = 0.003, Wilcoxon p = 0.057). The relatively small number of significant results likely reflects the small number of replicate populations (inherent in rare species) and the fact that, in very recently reduced populations, genetic diversity impacts may not yet have accumulated [55].
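With only a few populations on each side of the Nc = 50 threshold, rank-based tests have little power. The sketch below computes an exact two-sided Wilcoxon rank-sum p-value by enumerating every possible split of the pooled ranks; the function names and heterozygosity values are hypothetical, and the exact test variant used in the study is not stated here.

```python
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def exact_rank_sum_p(group_a, group_b):
    """Exact two-sided Wilcoxon rank-sum p-value: the share of all ways of
    assigning len(group_a) pooled ranks to group A whose rank sum lies at
    least as far from its null expectation as the observed sum."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n_a, n = len(group_a), len(pooled)
    expected = n_a * (n + 1) / 2
    observed = abs(sum(ranks[:n_a]) - expected)
    extreme = total = 0
    for idx in combinations(range(n), n_a):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-9:
            extreme += 1
    return extreme / total

# Hypothetical He for populations below vs. above Nc = 50.
small_pops = [0.60, 0.61, 0.62]
large_pops = [0.70, 0.71, 0.72]
print(exact_rank_sum_p(small_pops, large_pops))  # 0.1
```

With three populations per group, even complete separation yields p = 0.1, which illustrates why few comparisons reached significance.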
Interestingly, for Q. oglethorpensis all statistics went in the opposite direction to what would be predicted for larger populations (higher FST, higher relatedness, lower heterozygosity, and lower allelic richness). It is not clear why Q. oglethorpensis shows this pattern. It could result from fragmentation coupled with the fact that Q. oglethorpensis grows predominantly as a subcanopy tree [59]; although this is not well studied in wind-pollinated trees, a subcanopy habit could limit pollen dispersal [60]. Alternatively, this pattern would be consistent with recent expansions or founder events, which would result in moderate population sizes but reduced genetic diversity and increased FST.
We found no relationship between environmental variables and genetic statistics. It is possible that, for these species, neutral genetic diversity is more influenced by current population sizes, which may be shaped by processes other than environment, such as land development and habitat loss. It is also possible that neutral genetic diversity and demography are influenced by environment, but at fine spatial scales and/or along unmeasured environmental axes. A useful next step would be to build ecological niche models for each species to test how habitat suitability (probability of occurrence) relates to genetic diversity [61][62][63]. All three species are habitat specialists with typically very restricted populations.
Although very common and common alleles are well preserved in ex situ collections, low-frequency and rare alleles are not, and overall only a moderate amount of genetic diversity is preserved ex situ: between 60 and 78% for these species when all alleles are considered (68-94% if the rarest alleles are dropped). Previous modeling work suggests that the species with the largest range and highest FST would require the most samples [11,12]. Yet Quercus oglethorpensis is the best preserved, at 78%, even though it has the largest range; it does, however, have the most individuals ex situ. Less of the genetic variation of Q. boyntonii is conserved than of Q. georgiana, even though Q. boyntonii has the smaller range and about twice as many individuals ex situ. Other studies of the genetic diversity conserved ex situ have primarily been species-specific, and we are aware of only a few attempts to determine whether genetic diversity ex situ correlates with range size. In the genus Zelkova, Christe et al. [64] found that a small-range endemic was less well conserved than a larger-range species. Several factors may explain their findings: for the rare species, collectors may have repeatedly visited a single accessible site for seed collection, even though the species occurs across high topographic and ecological diversity, whereas for the common species, collectors in multiple countries had visited numerous populations. In other words, accessibility and availability of sampling are important to consider.
Although more than 3000 botanical institutions maintain more than 100,000 globally threatened species ex situ [65], the conservation value of these collections is unclear. Most taxa are held in a small number of collections, usually as a small number of inadequately documented accessions [66,67]. While some collections maintain relatively high levels of genetic diversity [68,69], research on the genetic representativeness of species in living collections is sparse. Our results emphasize that the genetic diversity conserved in collections is not simply a function of the number of samples conserved, nor of a species' inherent characteristics such as range size. Rather, the amount of genetic diversity conserved is likely a function of the interaction among number of samples, range size, and collection strategy (such as which populations are visited, the spatial sampling within populations, and the number of maternal plants collected from) [51,70]. While Q. oglethorpensis is conserved quite well, Q. boyntonii and Q. georgiana may need more individuals sampled to better represent in situ diversity.

Caveats
We used microsatellites because they are an affordable method for characterizing genetic diversity and structure. We recognize that greater resolution could be obtained with next-generation sequencing techniques [71,72]. Microsatellites developed in one species and applied to another can show reduced genetic diversity due to PCR amplification failure caused by mutations in primer binding sites. The markers applied to our species were all developed in other oak species, but within the same sections of Quercus subgenus Quercus to which they were applied (red oak markers from section Lobatae for the one red oak species, and white oak markers from section Quercus for the two white oak species). However, this is also the case for nearly all microsatellite studies of oaks: most microsatellites were developed in European white oaks and then applied to diverse species (Table A2). Microsatellites are also susceptible to ascertainment bias: investigators select markers that are polymorphic in a small panel of test individuals, so less polymorphic markers tend to be excluded. We had no a priori reason to expect that the differences we observed among species were due to this bias, as it should apply to all oak species genotyped with these markers.
Other caveats involve the populations we studied. There are likely some populations of these species that we are unaware of, and we were sometimes unable to collect all of the individuals within a population. Moreover, genetic diversity in some populations may reflect gene flow from other species. Gene flow among related oak species is known to occur, often at low levels, and hybridization may be even more frequent in species with small population sizes due to pollen swamping, in which heterospecific pollen may far outnumber conspecific pollen [73,74]. For instance, in the extremely rare Q. hinckleyi, hybrids have been identified with genetic markers [52]. We attempted to sample only individuals consistent with the phenotype of the target species. Any of these caveats would tend to obscure the patterns we were testing for, and if they could be taken into account (for example, by identifying and removing all hybrids), the patterns we found here might be stronger.

Conclusions and Conservation Implications
We found that genetic diversity and differentiation were influenced by both population size and range size, but that the patterns did not perfectly accord with predictions. This emphasizes the role of stochastic processes and of multiple factors (time, human influence, and population recovery) in shaping the genetic diversity we see today. We also found that genetic diversity conserved ex situ was not well predicted by species geographic range size or number of samples, in contrast to theoretical predictions, and that two species need more samples ex situ. The overall low genetic diversity in these three rare oaks relative to more common oaks suggests that genetic diversity may also be low in other threatened oak species, a supposition to be tested by analyzing more threatened oaks. We note that Q. oglethorpensis, in spite of its wide geographic range, had lower allelic richness and heterozygosity than might be expected (nearly as low as the critically endangered, small-range Q. boyntonii) and thus might already be suffering genetic erosion in its isolated populations.

Acknowledgments: This was truly a group effort, with many wonderful collaborators to thank. The authors would like to thank all permit-granting agencies, all institutions that voluntarily sent samples, and all the individuals and institutions who aided in the collection of wild material. We would like to thank Cindy Johnson, Kevin Feldheim, and Isabel Distefano for volunteering their time and expertise. Additionally, we thank the Field Museum for providing access to equipment for microsatellite analysis.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Table A1. Rabinowitz's matrix of rarity, mapped onto three rare oaks (Q. boyntonii, Q. georgiana, and Q. oglethorpensis) and three oaks that are common by two measures (geographic range and habitat) but rare in abundance (Q. hemisphaerica, Q. incana, and Q. laevis). The three rare species group together in the same rarity category, while the three common oaks group together in a different category. Bold font indicates that a species is in the subgenus Erythrobalanus (red oak); regular font indicates that a species is in the subgenus Leucobalanus (white oak).