Perceptions of Similarity Can Mislead Provenancing Strategies—An Example from Five Co-Distributed Acacia Species

Ecological restoration requires balancing levels of genetic diversity to achieve present-day establishment as well as long-term sustainability. Assumptions based on distributional, taxonomic or functional generalizations are often made when deciding how to source plant material for restoration. We investigate these assumptions and ask whether species-specific data are required to optimize provenancing strategies. We use population genetic and environmental data from five congeneric and largely co-distributed species of Acacia to ask specifically how much genetic provenancing strategies based on empirical data differ among species, and how well a simple, standardized collection strategy would work when applied to the same species. We find substantial variability in patterns of genetic diversity and differentiation across the landscape among these five co-distributed Acacia species. This variation translates into substantial differences in genetic provenancing recommendations among species (ranging from 100% to less than 1% of observed genetic variation across species) that could not have been accurately predicted a priori from simple observation or overall distributional patterns. Furthermore, when a common provenancing strategy was applied to each species, the recommended collection areas and the evolutionary representativeness of such artificially standardized areas differed substantially from (and were smaller than) those identified from environmental and genetic data. We recommend the implementation of the increasingly accessible array of evolutionary-based methodologies and information to optimize restoration efforts.


Introduction
Ecological restoration is carried out with a multitude of specific goals [1] but in general should always contribute positively to biodiversity, human health and wellbeing and the delivery of ecosystem services [2]. As global interest in restoring ecosystems is growing (e.g., the United Nations' Decade on Ecosystem Restoration; https://www.decadeonrestoration.org/), it is imperative that targets and actions rely on the best available evidence. The science in support of restoration activities is advancing rapidly [3,4] but to have useful impact, we need to understand what information is generalizable and broadly applicable, and when data are needed. For instance, increasing evidence indicates that promoting genetic diversity will improve the long-term sustainability of restored populations [5,6]. A range of carefully crafted germplasm-sourcing strategies has been proposed to facilitate such practices, but the majority of these (26 at last count [3]) are hampered by the paucity of information needed to identify the genetic and climatic boundaries on which they are based [7]. Therefore, in practice, restoration actions often rely on generalizations about evolutionary boundaries and/or climatic suitability [8]. Yet the evidence and methodological approaches necessary for appraising the suitability of taxonomic, distributional or other types of generalization are scarce at best [9]. We draw on novel standardized datasets and methodological approaches to test whether generalized provenancing strategies are reliable.
Evolutionary resilience is critical to restoration success [10], and full recovery is only achieved when all key ecosystem attributes closely resemble those of a reference ecosystem, including the capacity of species and communities to adapt and evolve [2]. The restoration of evolutionarily sustainable ecosystems and biodiversity is therefore reliant on the establishment of a strong link between contemporary fitness and longer-term evolutionary potential [11,12]. As the necessary evolutionary, ecological and environmental information is rarely available, surrogate approaches are often relied upon to provide arbitrary guidance to seed sourcing strategies [13].
Assumptions about the distribution of genetic diversity can potentially be misleading, and while "local is best" [14] is increasingly viewed as a testable experimental hypothesis rather than a universal rule [3], prioritizing areas of high diversity for one species is unlikely to capture the same degree of diversity for another. Provenancing strategies invoking the inclusion of broad genetic representativeness [5,15], climate-adjusted strategies [16] or the use of climatological data [17] generally assume that a replicated approach is applicable across multiple species or, at a minimum, across related and/or co-distributed ones. Generalized expectations on the distribution of genetic diversity can originate from multi-species meta-analyses (e.g., [18] using a wide range of sampling and analytical strategies; [19] relying on a more constrained but replicated approach) but it remains challenging to define replicable provenancing strategies from such studies alone.
Here we present a multispecies study aimed at testing the validity of provenancing generalizations based on taxonomic, distributional and functional similarity. We selected five Acacia species with overlapping distributions across the Sydney Basin Bioregion and along the east coast of Australia ( Figure 1). The genus Acacia (Mimosaceae) is used extensively for restoration practices in Australia and globally (e.g., http://acaciatreeproject.com.au/acacia-tree-project/). In Australia acacias also have an important role in ecosystem function and dynamics, as they are prolific post-disturbance colonizers recruiting from soil seed banks, contributing to nitrogen fixation and providing food and shelter for a vast array of insects and vertebrates [20].
Comparative investigations involving species representing functional or taxonomic groups commonly used in restoration practices can provide a better understanding of the relative strength of simplified provenancing strategies. This is particularly relevant when extensive overlaps in distribution, ecology and phylogeny are likely to lead local practitioner communities to expect evolutionary similarities. To test the validity of provenancing generalizations, we apply novel standardized methodologies and datasets across five co-distributed acacias frequently used in restoration practices and ask whether overlaps in distributional patterns mirror: (1) habitat preferences; (2) landscape-genomic patterns; and (3) empirically or arbitrarily defined genetic provenancing boundaries.
Diversity 2020, 12, x FOR PEER REVIEW 3 of 17
Figure 1. Distribution of the five study species across the study area of eastern Australia.

Study Species: Acacia linifolia, A. longifolia, A. suaveolens, A. terminalis and A. ulicifolia
The five study species were selected because their distributions overlap within the Sydney Basin Bioregion, with Acacia linifolia largely confined to this region and the other four species extending further along the east coast of Australia often in sympatry ( Figure 1). All five species are used in restoration practices across eastern Australia and seed can be purchased commercially, generally with little or no indication of seed origin or collecting strategy. Provenancing and seed sourcing guidelines for acacias are mostly "all-purpose" [21,22], despite the genus being highly speciose (>800 species in Australia alone) and occurring across diverse habitats.
The five study species are broadly similar in ecology but vary in some relevant functional traits (Supplementary Table S1). Insect pollination is assumed for most Acacia species [23], but among the study species information was only available for A. longifolia (insect and wind) and A. terminalis (insect and bird [24]). Only a small number of Acacia species have been studied in detail for pollination syndrome, and their mating systems are considered predominantly outcrossing [24]. Amongst the study species, A. longifolia and A. terminalis are recorded as self-incompatible, A. suaveolens has a mixed mating system identified through progeny analysis (van der Merwe, pers. comm.), and A. ulicifolia has been recorded as self-compatible [24]. Acacia species can be either fire-sensitive or fire responders (resprouting post fire), and often rely on soil-stored seedbanks [25]. Seed viability generally increases with short-term soil storage, while dormancy-breaking temperatures and seed predation can vary between individuals and populations [26].

Species Distributions and Environmental Niche Models: Comparing Niche Overlap and Future Expectations
We used environmental niche models (ENMs) to define and compare habitat preferences among the five study species. For each species, occurrence points were extracted from the Atlas of Living Australia (http://www.ala.org.au) and filtered to remove records without an attached herbarium voucher. We used spThin [27] to randomly remove records so that all retained occurrence points were at least 2 km apart. The remaining records were used to create occurrence lists for mainland eastern Australia (latitude −39.1° to −20.9°, longitude 140.5° to 154.0°).
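The 2 km thinning step can be illustrated with a minimal Python sketch in the spirit of spThin's randomized approach: records are visited in random order, a record is kept only if it lies at least 2 km from every record already kept, and the largest set over several repetitions is retained. This is not spThin's code; the function names, the haversine distance on (latitude, longitude) pairs and the repetition count are assumptions for illustration.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_once(points, min_km, rng):
    """One randomized pass: keep a record only if it is at least min_km
    from every record kept so far."""
    order = list(points)
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


def thin(points, min_km=2.0, reps=100, seed=1):
    """Repeat the randomized pass and return the largest thinned set found."""
    rng = random.Random(seed)
    best = []
    for _ in range(reps):
        kept = thin_once(points, min_km, rng)
        if len(kept) > len(best):
            best = kept
    return best
```

For example, of two vouchered records about 150 m apart, only one survives, while a third record some 25 km away is always retained.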
For the environmental background data, we obtained 19 bioclimatic and seven geomorphological variables at a spatial resolution of 0.01 degrees (approximately 1.1 × 1.1 km at the equator) from eMAST [28]. To minimize the influence of collinearity on our models, we selected sets of variables with a variance inflation factor (VIF) <12 [29], iteratively calculated using the R package usdm [30]. We excluded variables Bio08, Bio09, Bio18 and Bio19 because they showed spatial inconsistencies across the study region. This resulted in 32 environmental backgrounds composed of different combinations of bioclimatic variables along with slope, aspect, topographic position index and clay percentage.
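Stepwise VIF screening can be sketched as follows, assuming a matrix of environmental values sampled at grid cells (rows) for candidate variables (columns). Each variable's VIF is 1/(1 − R²) from regressing it on the others, and the variable with the largest VIF is dropped until all fall below the threshold of 12 used here. The loop follows the general idea of a stepwise VIF procedure such as the one in usdm, but it is an illustrative reimplementation, not that package's code; the function names are hypothetical.

```python
import numpy as np


def vif(X):
    """VIF of each column: 1 / (1 - R^2) from an OLS regression of that
    column on all other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - (y - A @ coef).var() / y.var()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def vif_step(X, names, threshold=12.0):
    """Repeatedly drop the variable with the largest VIF until every
    remaining VIF is below the threshold; return the retained names."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names
```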
We modelled habitat suitability for each species with the maximum entropy algorithm implemented in MaxEnt 3.3.3 [31,32]. We used the R package ENMeval [33] to evaluate the AICc scores of different combinations of parameter settings and environmental backgrounds. The parameter settings we varied included three regularization multipliers (0.5, 1.0, 1.5) and each possible combination of three feature classes (Linear, Quadratic, Product). After a trial on a subset of the data, we selected the "checkerboard 1" method to partition the data in all models. We trained all models on environmental backgrounds constrained to a 200 km radius around occurrence points, but projected them across the extent of eastern Australia. We took the mean of the top five performing models (those with the lowest AICc scores).
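The size of the tuning grid follows from the settings listed above. Reading "each possible combination" as all seven non-empty subsets of the three feature classes, three regularization multipliers would give 21 settings per environmental background (672 candidate models across the 32 backgrounds). The sketch below enumerates that grid and ranks candidates by AICc, the small-sample corrected AIC, where k is the number of model parameters (in ENMeval, typically the count of non-zero MaxEnt coefficients) and n is the number of occurrence records. The function names are hypothetical, and the AICc values passed to `top_models` would come from fitted models.

```python
from itertools import combinations

REG_MULTIPLIERS = (0.5, 1.0, 1.5)
FEATURE_CLASSES = ("L", "Q", "P")  # Linear, Quadratic, Product


def feature_combinations(classes=FEATURE_CLASSES):
    """Every non-empty combination of feature classes: L, Q, P, LQ, LP, QP, LQP."""
    return ["".join(c) for r in range(1, len(classes) + 1)
            for c in combinations(classes, r)]


def settings_grid():
    """All (regularization multiplier, feature classes) pairs."""
    return [(rm, fc) for rm in REG_MULTIPLIERS for fc in feature_combinations()]


def aicc(log_lik, k, n):
    """AICc = 2k - 2 ln L + 2k(k + 1) / (n - k - 1); undefined (inf) when n <= k + 1."""
    if n - k - 1 <= 0:
        return float("inf")
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def top_models(scores, n_best=5):
    """scores maps a model key (e.g. (rm, fc, background)) to its AICc;
    return the n_best keys with the lowest AICc."""
    return sorted(scores, key=scores.get)[:n_best]
```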
To forecast ENMs for the year 2070, we projected contemporary species-environment relationships onto general circulation models (GCMs) obtained from the Coupled Model Intercomparison Project Phase 5 (CMIP5) data repository, accessed via the Lawrence Livermore National Laboratory node (https://esgf-node.llnl.gov/projects/esgf-llnl). Because the climate scenarios simulated by different GCMs vary considerably, we used the four best-performing GCMs for eastern Australia (ACCESS1-0, GFDL-CM3, MPI-ESM-LR, MPI-ESM-MR), as previously evaluated by [34]. Separate models were constructed for two representative concentration pathways (RCPs) of greenhouse gas concentrations, representing a moderate (RCP4.5) and a high (RCP8.5) emissions scenario. In total, we produced 40 MaxEnt models for each species (5 model settings × 4 GCMs × 2 emissions scenarios).
We calculated the mean and standard deviation of the logistic output of the MaxEnt models for the current climate and the two 2070 emissions scenarios. To visualize areas of high predicted habitat suitability in 2070, we plotted the mean logistic habitat suitability of the top five performing models minus twice the standard error. In addition, we calculated and plotted the change in habitat suitability between the current and 2070 climate models with the R package Binned Environmental Change Index (BRECI; [35]).
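The summary surfaces described above reduce to per-cell statistics over a stack of projected suitability rasters. A minimal numpy sketch, assuming the projections for one scenario (for example, the 20 layers from 5 model settings × 4 GCMs) are held as an array of shape (models, rows, cols); how the projections were actually pooled for each summary is an assumption here. The standard error is taken as the standard deviation divided by the square root of the number of models.

```python
import numpy as np


def summarize_projections(stack):
    """stack: array of shape (n_models, rows, cols) holding MaxEnt logistic
    suitability. Returns the per-cell mean, sample standard deviation and
    a conservative surface equal to mean minus twice the standard error."""
    stack = np.asarray(stack, dtype=float)
    n_models = stack.shape[0]
    mean = stack.mean(axis=0)
    sd = stack.std(axis=0, ddof=1)      # sample SD across models
    se = sd / np.sqrt(n_models)          # standard error of the mean
    lower = mean - 2.0 * se              # "mean minus 2x SE" surface
    return mean, sd, lower
```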
Using the same environmental variables of the top five MaxEnt models, we estimated niche overlap between species across eastern Australia with the R package dynRB [36]. Values are bounded between 0 and 1, with values near 0 indicating a small overlap and near 1 a large overlap [36]. We calculated overlap of n-dimensional hypervolumes for each set of environmental variables, then took an average of the results and plotted directional overlap between species-pairs with a heatmap.
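dynRB's overlap is directional: the proportion of species A's niche hypervolume covered by species B generally differs from the reverse, which is why the heatmap is asymmetric. The sketch below shows that idea with a deliberately simplified, single-box analogue (per-variable minimum-to-maximum ranges, averaged across variables). It is not dynRB's algorithm, which integrates over range boxes defined at a sequence of quantile levels; the function names and the averaging rule are illustrative assumptions.

```python
def interval_overlap_fraction(a, b):
    """Fraction of interval a = (lo, hi) that is covered by interval b."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    width = a[1] - a[0]
    if width <= 0:  # degenerate interval: covered or not
        return 1.0 if b[0] <= a[0] <= b[1] else 0.0
    return max(0.0, hi - lo) / width


def directional_overlap(env_a, env_b):
    """Mean, across variables, of the fraction of species A's range that
    falls inside species B's range. env_a and env_b are lists of
    occurrence records, each a tuple of environmental values."""
    n_vars = len(env_a[0])
    fractions = []
    for j in range(n_vars):
        range_a = (min(r[j] for r in env_a), max(r[j] for r in env_a))
        range_b = (min(r[j] for r in env_b), max(r[j] for r in env_b))
        fractions.append(interval_overlap_fraction(range_a, range_b))
    return sum(fractions) / n_vars
```

With one variable spanning 0 to 10 for species A and 5 to 20 for species B, A is half covered by B, whereas only a third of B is covered by A.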

A New Site Matching Tool
We developed a companion tool to our Restore and Renew webtool decision-support system (https://www.restore-and-renew.org.au; [7]): the R&R SiteMatch tool, which allows users to explore the distribution of environmental conditions broadly similar to those at a chosen location, both now and under future climate change. The tool provides a basic form of site matching, in which a few selected "key" environmental drivers are used without reference to any particular species; that is, we consider only physical site similarity, independently of species responses. GIS layers are interrogated at a user-selected location to establish values for each of the key factors. Tolerances are then applied to the extracted values, providing an indicative envelope of matching environments. The user is then shown a map of the GIS grid cells falling within the combined tolerance region for all factors. Users can choose which environmental factors to use from mean annual temperature (MAT), mean annual precipitation (MAP), temperature seasonality (TS), precipitation seasonality (PS), aspect and topographic wetness index (TWI). They can also select future climate conditions, including moderate versus severe impact scenarios and forward time steps of 2050 or 2070.
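The tolerance-envelope logic described above can be sketched with boolean raster masks. The sketch assumes each factor is a 2-D array on a common grid and that tolerances are absolute differences; the tolerance values, the circular handling of aspect and the function names are assumptions, not the tool's actual settings. Passing future layers compares the target's predicted future values against present-day conditions elsewhere, which is the sense in which regions "currently match" a future climate.

```python
import numpy as np

CIRCULAR = {"aspect"}  # assumed in degrees; differences wrap around at 360


def _difference(grid, target, factor):
    d = np.abs(grid - target)
    if factor in CIRCULAR:
        d = np.minimum(d, 360.0 - d)
    return d


def site_match(current, target_rc, tolerances, future=None):
    """Boolean grid of cells whose current values lie within tolerance of the
    target for every selected factor.

    current:    dict factor -> 2-D array of present-day values
    target_rc:  (row, col) of the user-selected location
    tolerances: dict factor -> allowed absolute difference (hypothetical values)
    future:     optional dict factor -> 2-D array of projected values; when a
                factor is present here, the target value is read from it
    """
    r, c = target_rc
    mask = None
    for factor, tol in tolerances.items():
        grid = np.asarray(current[factor], dtype=float)
        source = future if future is not None and factor in future else current
        target = float(np.asarray(source[factor], dtype=float)[r, c])
        within = _difference(grid, target, factor) <= tol
        mask = within if mask is None else mask & within
    return mask
```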

Sampling Strategy and SNP Datasets
We followed the sampling strategy developed for Restore and Renew, a large project that aims to equip restoration practitioners and land managers with a summary of pertinent evolutionary, environmental and ecological information across multiple species [7]. The sampling strategy ensures even representation across the environmental and geographical distribution of each species while maximizing (when possible) between-species overlaps. To achieve maximum informative power while maintaining resource-effectiveness for wild species [7,37,38], the focus is on increasing the number of sites sampled across the distribution of each species, targeting six individuals per site (Table 1; Supplementary Table S2).
Table 1. Sampling (Supplementary Table S2) and genomic data used for each Acacia species. N samples: number of samples analysed (samples with more than 50% of loci missing were removed); N sites: number of sites sampled; N raw loci: number of loci generated by DArTseq; N loci: number of loci remaining for the analyses after standard filtering (removal of loci with more than 20% missing data or with reproducibility below 0.96).
All samples were genotyped (Table 1) using the DArTseq method developed by Diversity Arrays Technology Pty Ltd. (DArT), following previously developed protocols [7,39]. Reduced representation sequencing approaches such as DArTseq enable the cost-effective investigation of evolutionary processes at a genomic scale and the fine-scale examination of genetic variation across landscapes [40]. DArTseq is a high-throughput, cost-effective, restriction-based, reduced representation genome sequencing method that provides genotype data for thousands of single nucleotide polymorphisms (SNPs) across the genome [41,42]. The genome reduction and library construction method of DArTseq principally follows the methods described by [42].
Recent studies using DArTseq have found these data to be highly informative for understanding relationships among populations and species in multispecies studies, particularly among closely related lineages [43,44].

Comparing Landscape Genomics among Five Co-Distributed Acacia Species
The genotype data for each species were analysed using the three-step workflow (implemented in R) developed as part of the Restore and Renew project [7]. First, SNP loci of poor quality (average reproducibility <0.96, genotypes missing in >20% of samples) and poor-quality samples (samples missing data at a large proportion of loci) were removed, and SNPs were filtered to include only one SNP per locus to reduce the potential influence of linkage. Second, a distance-based network analysis [45] was used to identify outlier samples, whether due to taxonomic misidentification or biological processes (such as recent hybridization). Third, after removal of these outlier samples, a dataset was prepared for each species to obtain landscape genetic measures and to derive the statistical models applied in the Restore and Renew webtool to define provenance boundaries. General population genetic measures were also obtained to provide comparative parameters rather than in-depth interpretations of biology and dynamics (which will be the focus of dedicated studies).
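The quality filters in the first step can be sketched as follows, assuming a samples × SNPs matrix of allele counts (0/1/2, NaN for missing), a per-SNP reproducibility vector and the DArT sequence-tag identifier of each SNP. Thresholds follow the text (reproducibility ≥0.96, ≤20% missing per SNP, ≤50% missing per sample, as in the Table 1 notes); the filtering order and keeping the first SNP on each tag are assumptions, and the Restore and Renew workflow may differ in both. The function name is hypothetical.

```python
import numpy as np


def filter_genotypes(geno, reproducibility, tag_ids,
                     min_rep=0.96, max_locus_missing=0.20, max_sample_missing=0.50):
    """Return (filtered genotype matrix, boolean mask of retained samples)."""
    geno = np.asarray(geno, dtype=float)
    rep = np.asarray(reproducibility, dtype=float)
    tags = np.asarray(tag_ids)

    # 1. Drop SNPs with low reproducibility or too much missing data.
    locus_missing = np.isnan(geno).mean(axis=0)
    keep_snp = (rep >= min_rep) & (locus_missing <= max_locus_missing)
    geno, tags = geno[:, keep_snp], tags[keep_snp]

    # 2. Drop samples missing more than half of the remaining SNPs.
    sample_missing = np.isnan(geno).mean(axis=1)
    keep_sample = sample_missing <= max_sample_missing
    geno = geno[keep_sample]

    # 3. Keep one SNP per sequence tag (first occurrence) to limit linkage.
    _, first = np.unique(tags, return_index=True)
    return geno[:, np.sort(first)], keep_sample
```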
Comparative measures of genetic diversity at population and whole-species levels included observed heterozygosity (obs_het), expected heterozygosity (exp_het), allelic richness (ar) and the inbreeding coefficient (Fis), estimated using the R package diveRsity [46], and the number of private alleles (n_pa) within each site, calculated using the private_alleles function from poppr [47]. Additionally, as preliminary results for A. ulicifolia suggested that some sites included highly similar individuals, we conducted further investigation to determine the presence of clonality (Supplementary Figure S1). Because clonality can bias diversity estimates, individuals identified as clonal were removed from the dataset.
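For orientation, the simplest per-site versions of these statistics can be computed from a genotype matrix as follows. This sketch assumes biallelic SNPs coded as 0/1/2 allele counts with NaN for missing calls and uses uncorrected estimators (Ho as the proportion of heterozygous calls, He = 2p(1 − p), multi-locus Fis = 1 − Ho/He), so values will differ somewhat from those reported by diveRsity, which applies its own estimators. The function name is hypothetical.

```python
import numpy as np


def site_diversity(geno):
    """geno: (individuals, snps) array of 0/1/2 allele counts, NaN = missing.
    Returns (mean Ho, mean He, Fis) for one site."""
    geno = np.asarray(geno, dtype=float)
    n = (~np.isnan(geno)).sum(axis=0)        # individuals genotyped per SNP
    ok = n > 0                               # ignore SNPs with no calls
    geno, n = geno[:, ok], n[ok]
    ho = (geno == 1).sum(axis=0) / n         # observed heterozygosity
    p = np.nansum(geno, axis=0) / (2 * n)    # frequency of the counted allele
    he = 2 * p * (1 - p)                     # expected heterozygosity under HWE
    mean_ho, mean_he = ho.mean(), he.mean()
    fis = 1 - mean_ho / mean_he if mean_he > 0 else float("nan")
    return mean_ho, mean_he, fis
```

A negative Fis, as in a site where heterozygotes are over-represented, is consistent with the pattern expected from widespread clonal heterozygous genotypes.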
To quantify genetic distance between sample sites, we calculated pairwise Fst values with the Weir and Hill estimator in the R package SNPRelate [48,49]. The correlation between pairwise Fst and straight-line geographic distances between sites (isolation by distance, IBD) was assessed using a Mantel test [50], implemented in the R package vegan [51]. Finally, we used sparse non-negative matrix factorization (sNMF), as implemented in the R package LEA [52], to investigate patterns of population structure across the landscape and to assign samples to genetic clusters. sNMF was run with 10 repetitions for each value of K tested.
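A Mantel test asks whether two distance matrices are more correlated than expected when the site labels of one are shuffled. The permutation logic can be sketched in numpy as below, assuming square, symmetric matrices (for example, pairwise Fst and geographic distance) and a one-sided test for a positive association. vegan's mantel offers further options (such as rank correlations), and this is an illustrative reimplementation rather than its code; the function name is hypothetical.

```python
import numpy as np


def mantel(d1, d2, permutations=999, seed=0):
    """Pearson Mantel r between the upper triangles of two distance matrices,
    with a one-sided permutation p-value (rows and columns of d2 permuted jointly)."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)            # each site pair counted once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)
        y = d2[np.ix_(perm, perm)][iu]      # relabel sites in d2
        if np.corrcoef(x, y)[0, 1] >= r_obs:
            hits += 1
    p = (hits + 1) / (permutations + 1)     # observed value counts as one permutation
    return r_obs, p
```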

Comparing Provenance Delineation among Five Co-Distributed Acacia Species
Once the preliminary steps were completed, we estimated the models used to generate information for practitioners [7]. Briefly, these are generalized dissimilarity models (GDMs, implemented with the R package gdm [53]), which describe the level of genetic dissimilarity (differentiation) between sites as a function of predictor variables [54,55]. For each species, we first evaluated how well the spatial distance between pairs of sites explained genetic dissimilarity. For species where spatial distance explained a large fraction of the observed variation in pairwise genetic dissimilarity, we did not use additional covariates. For species that exhibited discrete stratification into ancestral population groups (or lineages), where this resulted in substantial unexplained variation ("deviance") in the GDM, we used ancestral population membership coefficients (as inferred using sNMF) as covariates. For A. suaveolens, we observed high levels of differentiation between highland and lowland sites [7], and consequently we used elevation as a covariate in the GDM, because it provided a better fit than spatial distance and ancestral population coefficients. Having estimated a GDM, we used it to predict Fst across the landscape relative to a nominated site, and defined the genetically local area surrounding that site by choosing a threshold value of differentiation (here Fst = 0.2, except for A. suaveolens, where we used Fst = 0.3 to be consistent with its steep gradient of intra-specific variation [7]). The prediction of the genetically local area for a nominated location can be generated and displayed rapidly. Note that we avoid excessive extrapolation of model predictions by restricting predicted genetically local areas to fall within a buffered hull around the sampled sites (implemented using the R packages inla [56] and rgeos [57]).
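How a fitted GDM turns into a genetically local area can be sketched as follows. In a GDM, the predicted dissimilarity between two sites is 1 − exp(−η), where η is an intercept plus the sum, over predictors, of absolute differences between monotone (I-spline) transforms of each predictor. Here, simple linear functions stand in for fitted I-splines, coordinates are assumed to be projected in km, the buffered hull is represented as a precomputed set of admissible cells, and all names and coefficients are hypothetical; the 0.2 and 0.3 thresholds are those given in the text.

```python
import math


def gdm_dissimilarity(eta):
    """GDM link: predicted dissimilarity = 1 - exp(-eta), with eta >= 0."""
    return 1.0 - math.exp(-eta)


def genetically_local_cells(cells, site, transforms, intercept, geo_transform,
                            threshold=0.2, inside_hull=None):
    """Return ids of cells whose predicted dissimilarity to the nominated site
    is at or below the threshold.

    cells:         dict cell_id -> {"xy": (x_km, y_km), covariate: value, ...}
    site:          dict in the same format for the nominated location
    transforms:    dict covariate -> monotone function (stand-in for I-splines)
    geo_transform: monotone function of geographic distance (stand-in)
    inside_hull:   optional set of cell ids inside the buffered sampling hull
    """
    local = []
    for cid, cell in cells.items():
        if inside_hull is not None and cid not in inside_hull:
            continue  # do not extrapolate beyond the sampled range
        eta = intercept + geo_transform(math.dist(cell["xy"], site["xy"]))
        for name, f in transforms.items():
            eta += abs(f(cell[name]) - f(site[name]))
        if gdm_dissimilarity(eta) <= threshold:
            local.append(cid)
    return local
```

With elevation as the covariate, as for A. suaveolens, a nearby cell at a very different elevation can fall outside the local area while a more distant cell at a similar elevation falls inside it.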
For models that use ancestral population membership coefficients, spatial surfaces for these covariates were generated by using a kriging procedure that is specifically designed for compositional data (implemented in R package compositions [58] and using the compOKriging function).
We compared the coverage of observed genetic diversity within geographically defined regions surrounding a test location by fitting convex hulls to the sequenced samples of each taxon, plotted on axes PC1 and PC2 of a genotype-derived principal component analysis (PCA). The two geographic regions were: (a) the provenance computed using the GDM for the taxon, and (b) a 20 km radius around the test location (representing a general-purpose provenance for seed sourcing [13]). The area of the convex hull around the samples within each geographically defined region was interpreted as an index of the coverage of observed genetic diversity captured within that geographic space (provenance), and was compared to the area of the convex hull formed using all available samples for a species (i.e., its overall evolutionary potential).
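The coverage index can be sketched in plain Python, assuming the PC1/PC2 scores of every sample have already been computed from a genotype PCA (the PCA itself is not shown) and that the samples inside each region have been identified beforehand. Andrew's monotone chain and the shoelace formula are standard choices for 2-D hulls; the function names are illustrative.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; fewer than three vertices enclose no area."""
    n = len(vertices)
    if n < 3:
        return 0.0
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def coverage(all_pcs, region_pcs):
    """Hull area of samples inside a region divided by hull area of all samples."""
    total = polygon_area(convex_hull(all_pcs))
    return polygon_area(convex_hull(region_pcs)) / total if total > 0 else 0.0
```

A region containing fewer than three samples (or only collinear ones) scores zero, which is one reason small fixed-radius regions can capture little of the PCA space.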
Concave hulls might also be considered as alternatives to convex hulls and could be used to compute a more constrained or conservative estimate of coverage. However, trials of three concave hull methods described in supplementary material (Supplementary Figures S2-S5) suggest that while producing the same rank ordering of coverage estimates, these alternative methods are computationally less efficient than convex hulls and require the setting of one or two arbitrary parameters, therefore reducing the degree of objectivity in their application. We therefore based all further assessment of coverage on convex hull results.

Species Distributions and Environmental Niche Models: Comparing Niche Overlap and Future Expectations
The comparison of habitat preferences suggests that environmental niche overlap is generally high among the five Acacia species studied (Figure 2). As could be expected, divergence is more evident between species whose distribution areas differ markedly in size, for example between A. linifolia, the species with the smallest distributional range, and all other species. The habitat of A. terminalis also appears to overlap less with the northern ranges of the other acacias. Future climate models suggest that the scenarios predicted for 2070 produce similar patterns of suitable habitat loss among the five acacias, with the species with the smallest distribution being relatively less affected (Figure 3; Supplementary Figure S6).
Restoration actions that seek to account for local adaptation or "future-proofing" need to characterize the specific environmental conditions at a proposed restoration site and to identify regions in the surrounding landscape with broadly similar conditions from which pre-adapted propagation material may be sourced [59]. That is, it is necessary to consider a restricted portion of the species-wide indications of environmental suitability provided by ENMs. Using the site-matching approach, sampling locations whose current climate matches the predicted future climate at the restoration site can be targeted to climate-proof restored vegetation across all species (Figure 4). Although less refined than ENMs, relatively simple site-matching models such as the one developed here make it possible to visualize and interpret overall shifts in local conditions and potential sources of future-adapted material for all the species occurring at a target site.
Figure 4. Empirically defined provenance boundaries generated using generalized dissimilarity models (GDMs) based on distribution-wide genomic datasets. All GDMs were evaluated at a common test location (blue teardrop; 33.65° S, 150.5° E). GDM-derived provenance is shaded grey. Regions shaded orange have a climate that currently matches the predicted 2070 climate at the test location. Matching future climate was assessed under a moderate change scenario (RCP4.5). The location of a major biogeographic barrier, the Hunter River Corridor (HRC), is also shown.

Comparing Knowledge-Based Provenance Delineation and Landscape-Genomics
The genetic provenance boundaries obtained from GDM predictions vary in size and shape among the five species studied (Figure 4). The boundaries defined for Acacia linifolia include the whole range of the species, as expected for a species with little differentiation and a small range. A. longifolia and A. terminalis also displayed large provenance boundaries, although these did not include the whole species' distribution, and both identified a northern boundary at the Hunter River Corridor, a recognized biogeographic barrier [60]. Finally, A. suaveolens and A. ulicifolia displayed much narrower boundaries, despite the former relying on a higher threshold setting [7].
We used the DArTseq dataset to provide a range of comparative diversity measures at population and whole-species levels for the five study species (Figure 5). The objective here was not to present and interpret a detailed landscape genomics study for each species (to be presented elsewhere), but to provide comparative statistics illustrating the context of this study. Acacia suaveolens displayed lower levels of diversity (allelic richness and heterozygosity) and higher levels of inbreeding (Fis) than the other species, as well as high levels of between-population differentiation (Fst). As previously suggested [7], these high levels of landscape differentiation are reflected in the narrow provenancing boundaries defined for this species. A. ulicifolia showed greater overall variance, higher observed relative to expected heterozygosity and higher between-population differentiation (Fst), as expected from a relatively high incidence of population-level clonality (Supplementary Figure S1). Acacia linifolia, A. longifolia and A. terminalis displayed low variance in Fis, while for the other two species levels of inbreeding were more variable across populations. Differences in population dynamics among species are apparent in the relationship between genetic and geographic variation (Figure 6), where varying degrees of isolation by distance (IBD) were identified. Differences in the spatial distribution of genetic variation are confirmed by the visualization of genetic structure groups obtained from sNMF analysis (Supplementary Figure S7). A. longifolia, A. suaveolens and A. terminalis appear to have some level of latitudinal structuring, although for each species there are inland populations that are grouped inconsistently with respect to latitude.
Coverage of observed genetic variation varied among the five species irrespective of the method used to define sourcing boundaries (knowledge-based or arbitrary). The amount of observed genetic variation captured by a 20 km radius around the test location was a small fraction of the coverage provided by GDM-derived provenance regions (with the exception of A. suaveolens; Table 2). For example, for Acacia longifolia, a 20 km radius would capture only 5.73% of the total observed genetic variation, compared with 90.16% for the GDM-derived provenance, and this is clearly seen in the plot of convex hulls for this taxon (Table 2; Supplementary Tables S3 and S4).
Table 2. Coverage of observed genetic variation across the five Acacia species captured by geographic regions defined using a GDM and by a generic 20 km radius around the same target site. Coverage was computed as the ratio between the area of the convex hull around samples falling within the region and the area of the convex hull around all samples, on a plot of PC1 and PC2.

Discussion
Provenance boundaries empirically estimated from patterns of genetic differentiation were noticeably different among five species of Acacia that are closely tied to an analogous vegetation type and often found in sympatry. We show that broad similarities in habitat requirements among congeneric taxa do not correspond to similar landscape-level dynamics nor to similar distributions of genetic variation. Our data confirm that matching provenancing strategies based on a perception of taxonomic, distributional, environmental or ecological similarities could lead to suboptimal choices. Based on the results presented here, we recommend the implementation of evolutionary-based methodologies to optimize restoration efforts.
Commonalities in the delineation of genetic provenances are unlikely to be the norm, as species respond differently to selective filtering processes and stochastic events through time. The purpose of this study was not to investigate and identify these biological and historical drivers. It is likely, however, that variation in breeding systems, dispersal mechanisms and responses to disturbance gave rise to contrasting patterns of landscape-level connectivity and to the genetic provenance boundaries derived from them. For instance, among the five Acacia species studied, self-compatibility was reported only for A. ulicifolia [24] and A. suaveolens, with current experimentation suggesting that selfing rates and related measures of fitness and viability can vary across a species' distribution range (van der Merwe, pers. comm.).
Differential responses to fire and recruitment from soil seed banks and/or resprouting capacity (Supplementary Table S1) are also likely to affect the distribution of diversity at population and species levels. While most Acacia species respond to fire through recovery from the seedbank, previous studies suggest that A. ulicifolia and A. terminalis may have mixed responses to post-fire recovery. For example, our genomic data suggest that A. ulicifolia is alone in displaying unexpectedly high levels of clonality (Supplementary Figure S1). Clonality and the capacity to resprout after disturbance events that might otherwise have killed above-ground ramets can decrease localised vulnerability [61], but can also decrease within-population diversity and increase between-population divergence (Figure 5). For clonal species, a rapid shift in climate may cause local extinction due to a lack of evolvability; interestingly, A. ulicifolia was the most difficult species to collect in consistent numbers, with historical occurrence records suggesting localised extinction patterns. Agamospermy could also explain the extensive geographic distribution of clones observed in this study, and while seed production through self-fertilization and agamospermy can both have short-term evolutionary benefits (such as range expansion), the levels of genetic diversity captured through seed sourcing will vary greatly depending on the collecting strategy. Consequently, undetected clonality can significantly influence restoration outcomes.
The lack of similarity in provenancing boundaries and the difficulty in developing generalized guidelines [9] are particularly important within highly localized contexts, where practitioners might rely on personal interpretations derived from locally replicated conditions and assemblages (Figures 1 and 2). It has been previously suggested that as anthropogenic influences continue to affect natural systems, the simple protection of standing biodiversity is unlikely to suffice [2,62]. Consequently, restoring vegetation needs to strike a balance between considering natural historical boundaries (as revealed by genetic structure), responding to contemporary conditions (for example, loss of available habitat) and anticipating climatic shifts ("future-proofing"). The implementation of sourcing strategies that account for both climatic requirements (current or future) and the distribution of genetic diversity is therefore critical to the success of ecological restoration.
Better access to relevant genomic and environmental data has ushered in a new era for evolutionarily informed restoration activities [63] and has enabled the implementation of novel methodologies that are cost-effective and easy to apply and interpret. Our replicable analytical and interpretational approach suggests that genetic provenancing areas, as defined by natural levels of gene flow, are often large and genetically diverse (Figure 4). A. suaveolens was a notable exception, possibly owing to high levels of selfing or biparental inbreeding in this species (M. van der Merwe, in prep.), leading to high levels of drift and population genetic differentiation, including over relatively short distances. The relatively simple site-matching model developed here provides an additional mechanism to consider climate-related options within these comprehensive, genetically defined sourcing areas (Table 2, Figure 4). Simplifying the logistical requirements of germplasm-sourcing strategies, while preventing localized over-harvesting, circumvents some of the limitations of current restoration practices [64].
Finally, while we show that generalized provenancing guidelines and approaches need to be considered with caution, informative replicated patterns are still likely to emerge from large-scale, standardized, multispecies datasets. However, these will not necessarily be based on simple phylogenetic relationships or distributional similarities but will more likely denote shared functional and evolutionary histories [38].