Origins, Diversity and Naturalization of Eucalyptus globulus (Myrtaceae) in California

Abstract: Eucalyptus globulus is native to southeastern Australia, including the island of Tasmania, but is one of the most widely grown hardwood forestry species in the world and is naturalized on several continents. We studied its naturalization in California, where the species has been planted for over 150 years. We sampled 70 E. globulus trees from 53 locations spanning the entire range of the species in California to quantify the genetic variation present and test whether particular genotypes or native origin affect variation in naturalization among locations. Diversity and native affinities were determined based on six nuclear microsatellite markers and sequences from a highly variable chloroplast DNA region (JLA+). The likely native origin was determined by DNA-based comparison with a range-wide native stand collection. Most of California's E. globulus originated from eastern Tasmania. Genetic diversity in California is greatly reduced compared with that of the native Australian population, with a single chloroplast haplotype occurring in 66% of the Californian samples. Throughout California, the degree of E. globulus naturalization varies widely but was not associated with genotype or native origin of the trees, arguing that factors such as local climate and disturbance are more important than pre-introduction evolutionary history.

Figure caption (fragment): [...] or eastern Tasmania and King Island (western Bass Strait) (Cc18). The native distributions of haplotypes Cc41, Cc56 and S43 are mapped in Freeman et al. (2007). The positioning of the haplotypes Cc56, Cc41, Cc06 and S43 within a broader network of E. globulus haplotypes is shown in McKinnon et al. (2004).


Introduction
The genus Eucalyptus L'Hér. (Myrtaceae) consists of over 700 species, the great majority of which are endemic to the continent of Australia [1][2][3]. The genus contains some of the most widely planted forestry species in the world, some of which have been cultivated for over 150 years [4,5]. These trees have become controversial in some places where they are grown, simultaneously being recognized as economically important and as problematic non-native weeds [6,7].
Eucalyptus globulus Labill. (Tasmanian blue gum) is the most widely grown temperate eucalypt, with extensive plantations in countries such as Australia, Chile, China, Ecuador, Ethiopia, Portugal, Spain and Uruguay [8]. Several studies have attempted to quantify the risk that E. globulus and other forestry species will become invasive outside their native ranges [9,10]. While E. globulus has become naturalized and formed landraces in many regions of the world, its spread from the original planting areas is considered 'limited' for plantations in Australia [11], Portugal [12][13][14][15] and South Africa [16]. Although E. globulus may spread outside planted areas into adjacent habitat, the vast majority of non-planted seedlings and saplings (often termed wildlings) occur in close proximity to the source (<50 m) [15] (reviewed in [7]). Studies have implicated factors such as climate, disturbance (particularly fire) and management in determining the likelihood of E. globulus wildling establishment in Australia and Portugal [7,11-13,15,17-19]. However, there is little understanding as to whether genetic factors, including those encompassed by the native origin, affect the propensity for naturalization. In this context, 'naturalization' means that a species has overcome environmental barriers to survival and regular reproduction and has become established at localities where it has been introduced. This differs conceptually from 'invasion', which involves a species spreading away from areas of introduction and outcompeting native species [20].
Eucalyptus was introduced to California in the 1850s [21]. In 1853, a clipper ship captain named Robert E. Waterman reportedly commissioned his first mate to bring Eucalyptus seeds back from Australia [22], which Waterman subsequently used to establish several eucalypt species, including E. globulus, throughout the Suisun Valley, near San Francisco. Eucalyptus globulus was planted extensively throughout the state to counter malaria by drying up wetlands (the species was known as the 'fever tree' [23]), but also for windbreaks, fuel wood and timber [9,21,22,24-27]. Approximately 16,000 ha of E. globulus were planted in California [27], with planting booms in the late 19th and early 20th centuries [5,21,22]. The remnants of these plantings persist (Figure 1).
The extent of naturalization and spread of E. globulus from planted sources in California is highly variable [9,28-31]. The California Invasive Plant Council had previously ranked the invasion potential of E. globulus as 'moderate' but subsequently downgraded the threat to 'limited' due to "evaluating E. globulus across the entire state, rather than focusing on coastal areas where it is most prone to spreading" [30,31]. Naturalization is clearly evident for some plantations, windbreaks and planted groves, but in other situations only the original planted trees remain with little or no sapling establishment. Most predicted current (2010) and future (2050) suitable habitats for E. globulus in California are in coastal regions [27,31] and, as in other countries [11,17,18,32], climatic factors no doubt play a significant role in the observed variation in naturalization. Nevertheless, given the notable adaptive variation within the E. globulus native gene pool, including marked provenance differences in flowering traits [33], drought tolerance [34] and disease susceptibility [35], there is a need to also test whether genetic factors, particularly those related to the native provenance, influence the propensity for naturalization. Such 'pre-introduction evolutionary history' is being increasingly considered in invasion biology [36,37], with the interaction between climate and provenance of origin shown to significantly influence the invasive performance of Pinus taeda [38]. While the introduction history of E. globulus into California is fairly well-documented (e.g., [9,21,23,24,26]), little is known about the native source(s) of the Californian landrace of E. globulus, the number of introductions or the amount of genetic variation present relative to the native source populations.
The genetic variation in native E. globulus is strongly spatially structured across its geographic range, as evidenced by quantitative genetic [39], nuclear microsatellite DNA [40][41][42][43] and chloroplast DNA (cpDNA) studies [44,45]. For example, one common lineage of cpDNA haplotypes is found only in southern Tasmania and many specific (within lineage) cpDNA haplotypes have extremely localized spatial distributions [44][45][46]. As in most angiosperms, cpDNA is maternally inherited in E. globulus and does not recombine [47]. Thus, it is dispersed only by seed (not by pollen), most of which falls within twice the height of the mother tree [7,11,48,49]. As a consequence, the geographic structuring of cpDNA diversity in most eucalypt species tends to be more marked than that of nuclear DNA markers which are subject to genetic recombination and are spread across the landscape by both pollen and seed [50]. Given the knowledge of the geographic distribution of cpDNA haplotypes within the native E. globulus gene pool, it is possible to use cpDNA to determine the likely native origin of germplasm contributing to landraces, sometimes with quite fine geographic resolution [44][45][46].
Nuclear microsatellites (nSSRs) also reveal spatial structuring that differentiates native E. globulus populations (but at a coarser level than cpDNA), allowing mainland and Tasmanian populations to be differentiated, as well as western and eastern populations within Tasmania, Bass Strait Islands and Victoria [40][41][42][43]. This differentiation in nuclear microsatellite markers is sufficient to correctly identify the native origin of selected germplasm in the Australian National E. globulus breeding program [40]. Combined nuclear and cpDNA variation has been used to determine the genetic diversity and origins of the Portuguese population of E. globulus [45]. These marker systems are therefore suitable for studying genetic diversity in the Californian landrace of E. globulus.
In the present study, we combine DNA sequence data from the highly variable JLA+ cpDNA region [44] with nuclear microsatellite marker data to: (i) determine the native Australian origins of California's landrace of E. globulus; (ii) compare the amount of genetic diversity in the Californian landrace to that of native populations in Australia; and (iii) determine whether variation in naturalization of E. globulus in California can be explained by particular haplotypes or genotypes.

Collections and Naturalization
In April 2008, leaf tissue from 70 E. globulus trees was collected from 51 locations throughout California (Figure 2, Table S1). A planting of E. globulus was considered 'unique' if it was at least 10 km from another grove or planting or if the age (size) of the trees indicated a different planting date. Sampling locations were chosen non-randomly to span the entire range of E. globulus in California, both inland and coastal, from Humboldt County (40.52914° N, 124.03646° W) to San Diego County (32.84748° N, 117.27230° W). Leaves were collected from one adult and one juvenile tree (if present) at each location for genetic analyses. Additionally, in some large groves tissue was sampled from more than one adult. In some cases, adult tissue could not be reached so only juvenile tissue was collected.

Figure 2. [Start of caption missing; it refers to Table S1 for further details.] The JLA+ haplotypes have been grouped into Central (Cc) and Southern (S) types, and codes follow Tables A1 and S1. The distribution of the S112 haplotype, which has only been found in ornamental plantings of E. globulus in Australia, is indicated in red. Only locations used in the naturalization study with JLA+ haplotypes available are plotted (detailed in Table S1).

Naturalization Analysis
The level of naturalization was assessed in 39 of the 51 locations sampled. Locations that were not assessed for naturalization included E. globulus street trees, park trees or where human management might interfere with our ability to detect or quantify naturalization. In plantations where naturalization was scored, non-planted trees were identified by both age and location. A seedling, sapling or young tree occurring near a plantation, grove or windbreak was considered an offspring of the originally planted trees (i.e., a wildling). In most cases, reproduction was easy to determine because saplings occurred outside regularly planted rows or on plantation margins. Naturalization was quantified by counting the number of saplings or non-planted trees in a defined area at the edge of a grove. We counted saplings 10 m into the grove and 30 m outside the grove edge, for 10 m along the grove edge. This resulted in a 400 m² (40 m × 10 m) sampling area. The location of quantification was purposely chosen to capture the highest possible density of juveniles, ensuring that our quantification would be an overestimate of overall naturalization. For single trees or windbreaks, we surveyed a 400 m² area around the tree(s). Naturalization was coded on a six-point scale where 0 = no evidence of any naturalization, 1 = limited naturalization with fewer than 10 wildlings present, 2 = some naturalization with 10-20 wildlings present, 3 = moderate naturalization with 20-30 wildlings present, 4 = abundant naturalization with 30-40 wildlings present, and 5 = extensive naturalization with over 40 wildlings present.
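The six-point scale can be written as a small lookup, shown here as a minimal Python sketch. The function name is hypothetical. The class boundaries overlap as stated (a count of 20 falls in both "10-20" and "20-30"), so the sketch assumes that a shared boundary count belongs to the lower class; this assumption is ours, not a rule given in the text.

```python
def naturalization_score(wildlings: int) -> int:
    """Map a wildling count from one 400 m^2 plot to the 0-5 scale.

    Assumption (not stated in the text): counts on a shared class
    boundary (20, 30, 40) are placed in the lower class.
    """
    if wildlings < 0:
        raise ValueError("wildling count cannot be negative")
    if wildlings == 0:
        return 0  # no evidence of naturalization
    if wildlings < 10:
        return 1  # limited
    if wildlings <= 20:
        return 2  # some
    if wildlings <= 30:
        return 3  # moderate
    if wildlings <= 40:
        return 4  # abundant
    return 5      # extensive


# Illustrative counts only, not field data from the study.
for count in (0, 7, 20, 21, 45):
    print(count, naturalization_score(count))
```

Under this reading, scores of 3 or more correspond to more than 20 wildlings per plot.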

Chloroplast DNA
Genomic DNA was extracted using the Qiagen DNeasy Plant Mini Kit (Qiagen Corp., Valencia, CA, USA). The JLA+ region of the chloroplast genome (near the junction of the large single-copy region and inverted repeat A, including the trnH-psbA intergenic spacer) was amplified and sequenced in both the forward and reverse directions for 67 Californian collections, using the same primers as Freeman et al. [44]. Each PCR (25 µL final volume) consisted of 2.5 µL of 10× Taq buffer (67 mM Tris-HCl, pH 8.8; 16.6 mM (NH4)2SO4; 0.45% Triton X-100; 0.2 mg/mL gelatin), 1 U of Taq (Bioline, Trento, Italy), 200 µM of each dNTP, 2.5 mM MgCl2, 100 µg/mL BSA (bovine serum albumin), 7.5 pmol of each primer, 2.5 µL of 50% glycerol (weight to volume) and 20 ng of genomic DNA. PCR amplification was conducted using a BioRad C1000 Thermal Cycler with the following program: 95 °C for 5 min, followed by thirty cycles of 94 °C for 1 min, 61 °C for 1 min and 72 °C for 1 min, and a final extension at 72 °C for 5 min. Sequencing was performed on an AB3730xl automated sequencer. Sequences were aligned manually using Sequencher 4.8 (Gene Codes Corporation, Ann Arbor, MI, USA) and CLC Free Workbench 4.0.2 (CLC bio, Aarhus, Denmark).
Haplotypes were classified by comparing the JLA+ sequences of the 67 Californian collections with JLA+ sequences from native trees of known location. The native Australian tree cpDNA haplotypes of E. globulus were defined on the basis of 133 variable characters scored from 579 trees and incorporated JLA+ sequences from 225 trees genotyped by Freeman et al. [44], 106 by McKinnon et al. [51], 30 by Freeman et al. [45] as well as 218 other native trees. The characters defining the JLA+ haplotypes found in the Californian samples and their GenBank accession numbers are listed in Table A1. The relationships among the Californian haplotypes were determined following McKinnon et al. [46]. This involved using 121 characters in 616 bp of JLA+ sequence to generate a distance matrix among haplotypes using PAUP version 4.0a [52]. This matrix was used to generate an intraspecific haplotype network with the program TCS 1.21 [53]. Several haplotypes were collapsed in this process due to having only minor character differences. Rarefied mean haplotype diversity (d) was calculated in R [54] using the 'vegan' package [55], rarefying the diversity across 15 individuals, which was the minimum sample size among the native regions (specifically, King Island/Western Tasmania) included in the JLA+ database at the time of this study.
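To illustrate what rarefying to 15 individuals means, the following Python sketch computes the expected number of distinct haplotypes in a random subsample using Hurlbert's rarefaction formula, which is the quantity returned by the `rarefy` function in vegan. Whether the authors used exactly this function is an assumption on our part, and the haplotype counts below are invented for illustration.

```python
from math import comb


def rarefied_richness(counts, n):
    """Expected number of distinct haplotypes in a random subsample of
    n individuals drawn without replacement (Hurlbert's rarefaction).

    counts: number of individuals carrying each haplotype.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("subsample size must be between 1 and the sample size")
    denom = comb(total, n)
    # For each haplotype: 1 - P(haplotype is absent from the subsample).
    return sum(1 - comb(total - c, n) / denom for c in counts)


# Hypothetical counts (one common haplotype, several rare ones).
example_counts = [12, 5, 2, 1]
print(round(rarefied_richness(example_counts, 15), 2))
```

Rarefying every region to the same subsample size makes richness comparable between a small sample, such as King Island/Western Tasmania, and much larger ones.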

Nuclear DNA
Six nuclear microsatellite loci were amplified from all Californian samples and an additional 13 control samples from wild populations of known provenance from Australia (six from across Victoria; two from the Furneaux group of islands in eastern Bass Strait, two from northeastern Tasmania, one from southeastern Tasmania, one from western Tasmania and one from King Island in western Bass Strait; see Figure 3). We used two sets of microsatellite primers developed in E. grandis and E. urophylla, EMBRA19 [56] and EMBRA30 [57], both of which had been tested previously and optimized for E. globulus ([58] and [59], respectively), and four developed for E. globulus: EMCRC2, EMCRC7, EMCRC10 and EMCRC11 [60]. Forward primers were tagged with an M13 fluorescent tag according to Schuelke [61]. We used the following PCR recipe: 12.5 µL Promega GoTaq Hotstart Colorless Mastermix, 0.65 µL of 10 pmol/µL 5′ M13-tailed forward primer, 2.5 µL of 10 pmol/µL reverse primer, 2.5 µL of 10 pmol/µL 5′ M13 HEX- or FAM-labeled primer, 1 µL of DNA (10-200 ng/µL) and 5.9 µL sterile deionized water for a final volume of 25 µL. PCR amplification was conducted with the following conditions: 94 °C for 5 min; 30 cycles of 94 °C for 30 s, touchdown annealing starting at either 60, 62 or 64 °C (see references above for specific temperatures) for 45 s and decreasing by 0.5 °C each cycle, 72 °C for 45 s; followed by 8 cycles of 94 °C for 30 s, 53 °C for 45 s, 72 °C for 45 s; and a final extension at 72 °C for 10 min. We visualized amplified products on 0.8% agarose gels using 1× sodium borate buffer with Biotium GelRed™ Nucleic Acid Gel Stain. We estimated the size of each fragment using an ABI sequencing machine at the UC Berkeley DNA Sequencing Facility using ROX 500 as a size standard. We scored fragment sizes using GeneMapper (v. 4.0; Applied Biosystems, Waltham, MA, USA).
The 13 Australian samples allowed alignment of the nuclear SSR genotypes from the Californian landrace samples with microsatellite data from a database of range-wide native collections of E. globulus trees (n = 590). Based on previous analyses of nine SSR loci [42,62], including the six used in this study, the native Australian samples were grouped into four major geographic regions: Victoria (n = 222), Furneaux (n = 50), King Island/western Tasmania (n = 107), and eastern Tasmania (n = 213) (Figure 3; Table S2). POPGENE (v. 1.31 [63]) was used to obtain population genetic parameters: total observed (A) and effective (Ae) numbers of alleles across all loci, the observed (Na) and effective (Ne) number of alleles per locus, as well as the observed and expected heterozygosity (Ho and He) and the inbreeding coefficient (FIS). GENEPOP (ver 4.7.0; Rousset 2008) was used to calculate the frequency of null alleles for each locus and population. Using the region by SSR locus values for Ho, He and FIS, the differences between regions (d.f. = 4; Californian landrace and four native Australian E. globulus regions) were tested using PROC GLIMMIX of SAS (Version 9.4), fitting region (d.f. = 4) and SSR locus (d.f. = 5) as fixed effects. The model assumed a Gaussian distribution of errors (d.f. = 20) and used the chi-squared statistic to test the significance of the region effect. Where the region effect was significant (p < 0.05), pairwise contrasts between the Californian landrace sample and each of the native regions were tested using the t-statistic. Allelic richness (AR; a measure of the number of alleles, independent of variation in sample size) was calculated using the software package FSTAT [64] and rarefied across 42 diploid individuals (EMBRA19 was scored in 42 of the Californian samples, representing the smallest sample size of any locus).
These parameters were estimated for the Californian landrace as a whole, each of the four native regions in southeastern Australia, and across the total native range sample of E. globulus. Pairwise FST values (and associated levels of significance) between the Californian landrace and each of the four native regions were obtained using FSTAT. We conducted an AMOVA using GenAlEx (v 6.4 [65]) to partition the molecular variance between and within populations (Californian landrace plus native regions), using ΦST with 999 permutations of the data.
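For readers unfamiliar with the per-locus parameters named above, the following Python sketch shows their textbook definitions for a single locus. It is not a reimplementation of POPGENE: the function name and the toy genotypes are hypothetical, missing data are not handled, and the expected heterozygosity here is the uncorrected form (1 minus the sum of squared allele frequencies), whereas population-genetics packages may apply a small-sample correction.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (Na, Ne, Ho, He, Fis) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid tree.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    n_gene_copies = 2 * len(genotypes)
    freqs = [count / n_gene_copies for count in allele_counts.values()]
    homozygosity = sum(p * p for p in freqs)

    na = len(allele_counts)          # observed number of alleles
    ne = 1 / homozygosity            # effective number of alleles
    he = 1 - homozygosity            # expected heterozygosity (uncorrected)
    ho = sum(a != b for a, b in genotypes) / len(genotypes)  # observed
    fis = 1 - ho / he if he > 0 else float("nan")  # inbreeding coefficient
    return na, ne, ho, he, fis


# Toy genotypes (fragment sizes in bp), for illustration only.
toy = [(152, 152), (152, 160), (160, 160), (152, 160)]
print(locus_diversity(toy))
```

A positive FIS indicates fewer heterozygotes than expected under random mating, which is the pattern discussed for the Californian landrace.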
Based on the allelic profiles from six nuclear SSR loci and the database of 590 native samples (see above), STRUCTURE (v. 2.3.4 [66]) was used to obtain secondary evidence for the native region affinities of each of the Californian collections, based on the probabilities of their assignment to four pre-defined native regions (Figure 3). We used a burn-in of 200,000 iterations (by which time stationarity had been reached), followed by 200,000 MCMC iterations for twenty independent runs, assuming no prior population groupings and using the admixture model.

Naturalization Analyses
We tested for an effect of both cpDNA haplotype and SSR regional assignment on naturalization with an analysis of variance (ANOVA) using the R base statistics package [54]. Our analyses focused on the 39 sites in which naturalization was assessed. We excluded any cpDNA haplotypes that were represented by only a single tree (seven haplotypes in total were removed). We conducted these analyses using cpDNA haplotypes and SSR assignments from the adult from each site. Three sites were excluded because adult tissue was not sampled. If there were multiple individuals sampled from the same site and they were assigned the same haplotype (only one case) or same regional assignment (only one case), the duplicate was removed from the ANOVA since it likely does not represent an independent sample. If multiple adults from the same site had different haplotypes or regional assignments, all unique samples were included and the naturalization score for the site was applied to each. We also divided naturalization into two categories, high naturalization (all groves with a score of 3 and above) and low naturalization (all groves with a score of 2 and below), and used a Fisher's exact test to test whether cpDNA haplotype or SSR assignment could predict the naturalization category.
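The one-way ANOVA described above compares mean naturalization scores among haplotype (or region) groups. The Python sketch below computes the F statistic and its degrees of freedom from first principles. It is an illustration, not the authors' R code; the function name and the scores are hypothetical, and the p-value (which requires the F distribution) is left to a statistics package.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, each holding the naturalization scores of the
    trees that share one cpDNA haplotype (or one SSR-assigned region).
    """
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores (0-5) grouped by haplotype, for illustration only.
scores_by_haplotype = {
    "hapA": [0, 2, 5, 3, 1],
    "hapB": [4, 1, 2],
    "hapC": [3, 0],
}
print(one_way_anova(list(scores_by_haplotype.values())))
```

With k groups and n trees, the degrees of freedom are k - 1 and n - k, so five haplotype groups and 30 trees give 4 and 25.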


Native Origin of Californian E. globulus
Fifteen JLA+ cpDNA haplotypes were found in the 67 Californian samples sequenced successfully (defining characters and GenBank accession numbers are given in Table A1). Three samples failed to amplify for cpDNA. Haplotypes corresponded to two major JLA+ types found in native E. globulus from Australia [44], the Southern (S) and the Central (Cc) types (Figure A1). Eleven of the sixty-seven Californian samples had five haplotypes belonging to the Central type, whereas the other fifty-six samples had ten haplotypes belonging to the Southern type (Figure A1). The high proportion of Southern haplotypes indicates that most of California's trees likely originated from southern Tasmania (Figures 2 and 4, Table S1). Indeed, of nine haplotypes found in both the Californian landrace and the Australian native populations of E. globulus, five have been found only in southeastern Tasmania, and two of these have been found only in the vicinity of the D'Entrecasteaux Channel, south of Hobart (Figure 4). In addition, another of the Southern haplotypes found in the Californian landrace (S87) has, to date, only been found in E. morrisbyi, a Tasmanian endemic confined to southeastern Tasmania.
Five of the fifteen haplotypes (S112, S143-S146) found in California have not yet been found in native populations of E. globulus. However, these haplotypes were most similar to the Southern Tasmanian cpDNA type (Figure A1), which is confined naturally to eastern Tasmania. Just over half the Californian samples (37 out of 67, or 55%) shared a single cpDNA haplotype, S112. Collections of the S112 haplotype span California from San Diego in the south to Mendocino in the north (Figure 2). In Australia, haplotype S112 has only been found in ornamental ('compacta' form) or other planted E. globulus trees, not in natural stands.
Seven of the 11 (64%) Californian localities with multiple samples were homogeneous for a single JLA+ cpDNA haplotype, consistent with localized naturalization. In these cases, the SSR assignment was generally consistent between the adult and seedling [...] (Table S1), which could signal pollen flow from outside the stand. The four heterogeneous localities involved mixed cpDNA haplotypes of eastern Tasmanian or Californian origin (S112 or S143) (Figure 2; Table S1) and, while this may signal heterogeneity in the cpDNA haplotype of the adult population, it generally had little effect on the SSR assignment at the regional level.
An analysis of molecular variance (AMOVA) of the nuclear SSR data showed low but significant genetic differentiation (ΦST = 0.064, p < 0.001) among E. globulus populations (California, Victoria, Furneaux, King Island/western Tasmania, and eastern Tasmania), with 94% of the variation occurring within populations. STRUCTURE analysis (Table 1) and FST comparisons (Table 2) clearly revealed (i) significant differentiation among the native Australian populations and (ii) that the Californian collections had closest overall affinities to native samples from eastern Tasmania. At the individual level, forty percent of the Californian collections were assigned to eastern Tasmania based on nuclear markers (Table S1). However, samples also showed affinities to Furneaux, King Island/Western Tasmania and Victoria. For example, of the thirty-seven collections that had the common S112 chloroplast haplotype, STRUCTURE analysis using the nuclear SSR data revealed that twenty-three had their highest affinities to eastern Tasmania, seven to Furneaux, four to Victoria and two to western Tasmania/King Island (Table S1), suggesting an eastern Tasmanian origin for S112. Despite strong molecular evidence from the shared cpDNA haplotypes (landrace/native) for a predominantly southeastern Tasmanian origin for the Californian samples, there was still a statistically significant difference between the SSR genotypes of the Californian samples and the two races of E. globulus (as defined by Dutkowski and Potts [39]) that occur in this region of the native distribution (AMOVA: Southeastern Tasmania n = 50, FST = 0.033, p < 0.001; Southern Tasmania n = 26, FST = 0.048, p < 0.001).
Table 1. Average probability of assignment of samples from the Californian landrace to the four major regions in the native range of Eucalyptus globulus, based on six nuclear SSRs, and the number of samples in each pre-defined group (N). Seventy Californian samples were assigned to four predefined regions of the native distribution in Australia (Figure 3), using STRUCTURE. The assignment of the 590 native E. globulus samples used to define a priori the regional groups in the STRUCTURE analysis is also shown (W Tas = western Tasmania).

Genetic Diversity in California
Taken as a whole, the Californian landrace was less genetically diverse than the native populations of E. globulus in Australia. The mean cpDNA haplotype diversity, d, for the Californian landrace was 6.20 ± 1.38, which was low compared with the overall haplotype diversity of native E. globulus (d = 13.28 ± 1.20). It was also low compared with eastern Tasmanian E. globulus (d = 12.82 ± 1.32, Table 3), the region where most of the Californian trees are likely to have originated (see below). For the six nuclear SSR loci, diversity within the Californian landrace was low compared with the native Australian samples, both overall and within regions (Table 3). In the Californian landrace, the six loci had 3-18 alleles per locus (average of 10.33 alleles per locus). The average effective number of alleles per locus was 4.54. The mean rarefied allelic richness (± standard deviation) was 9.38 ± 3.95, which was lower than even the least diverse native region (Table 3) and markedly lower than the eastern Tasmanian region from where most of California's E. globulus likely originated (allelic richness = 12.92 ± 3.16). The observed heterozygosity across all loci (Ho = 0.58) and the expected heterozygosity (He = 0.73) in the Californian landrace were low compared with those across the whole native range (Table 3), although the differences among groups in He were not significant (Ho, χ²(4 d.f.) = 9.52, p = 0.049; He, χ²(4) = 8.15, p = 0.086). Ho was significantly lower in the Californian landrace than in the eastern Tasmanian region (pairwise contrast, t(20) = 2.65, p = 0.015) and the Furneaux region (t(20) = 2.22, p = 0.038), but was not significantly lower than in the King Island or Victorian regions. He was significantly lower in the Californian landrace than in the eastern Tasmanian region (t(20) = 2.20, p = 0.040), but not significantly lower than in Furneaux, western Tasmania or Victoria.
There was a slightly greater deficiency of heterozygotes in California, which could be due to the Wahlund effect or the Californian landrace being more inbred compared with the Australian native regions (FIS = 0.19 cf. 0.14-0.17), although the differences among groups were not statistically significant (χ²(4) = 1.03, p = 0.906). While the influence of null alleles in the SSR data was apparent for some loci in some populations, their occurrence is unlikely to affect comparisons between the Californian and eastern Tasmanian E. globulus as they were similarly important in both samples. For example, the estimated null allele frequency in the Californian landrace averaged across the six loci was 0.078 compared with 0.075 in eastern Tasmania, and the locus with the highest frequency of null alleles (EMCRC10) had the same frequency in both samples (Table S2).

Naturalization in California
There was a complete continuum in degree of naturalization among the 39 sites assessed in California (Figure S1), with 18 having naturalization scores of three or greater (abundant or extensive naturalization, >20 saplings per 400 m²) and these were widely distributed in coastal areas (Figure 2, Table S1). In contrast, we observed 21 sites with low naturalization (score of two or lower). We observed groves with no evidence of naturalization and groves with varying degrees of naturalization in close proximity. Across the 39 sites, we found a total of 12 cpDNA haplotypes in adults. Seven of these were represented only once in the sites assessed so were removed from the ANOVA regarding naturalization (removed haplotypes: Cc05, Cc06, Cc18, S129, S145, S64, and S87). Included in these analyses were the five remaining cpDNA haplotypes found in 30 adult individual trees (S112, n = 18; S43, n = 5; Cc41, n = 3; S05, n = 2; Cc56, n = 2) from 29 unique sites (sites with only juveniles and sites with unique haplotypes were removed). We found no effect of cpDNA haplotype on naturalization (ANOVA, p = 0.42, F4,25 = 1.01). The most prevalent cpDNA haplotype (S112) was found in groves exhibiting all levels of naturalization, including zero naturalization (Figure 5 and Figure S1). For the SSR assignment, there were four assigned regions represented in 38 adult trees sampled at 36 unique sites: eastern Tasmania, n = 25; Furneaux, n = 6; western Tasmania, n = 4; and Victoria, n = 3. We found no effect of the SSR-assigned region on naturalization (ANOVA, p = 0.658, F3,34 = 0.54). We also grouped sites into low naturalization (scores 0-2) and high naturalization (scores 3-5) (Figure 5). We found that no cpDNA haplotype (Fisher's exact test p = 0.1392) (Figure 5a) nor any SSR-assigned region (Fisher's exact test p = 0.9442) (Figure 5b) was more likely to show high naturalization.
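The degrees of freedom quoted for the two ANOVAs follow directly from the group sizes listed above. The short sketch below recomputes them as a consistency check; it uses only the sample sizes given in the text, the function name is hypothetical, and it assumes a standard one-way design (number of groups minus one, and number of trees minus number of groups).

```python
def anova_df(group_sizes):
    """Return (between-group df, within-group df) for a one-way ANOVA."""
    k = len(group_sizes)   # number of groups
    n = sum(group_sizes)   # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    return k - 1, n - k


# Adult trees per cpDNA haplotype and per SSR-assigned region, as reported.
haplotype_counts = {"S112": 18, "S43": 5, "Cc41": 3, "S05": 2, "Cc56": 2}
region_counts = {"eastern Tasmania": 25, "Furneaux": 6,
                 "western Tasmania": 4, "Victoria": 3}

print(anova_df(list(haplotype_counts.values())))  # (4, 25)
print(anova_df(list(region_counts.values())))     # (3, 34)
```

Both results match the reported F statistics (4 and 25 degrees of freedom for haplotype, 3 and 34 for assigned region), and the group sizes sum to the 30 and 38 adult trees stated in the text.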

Native Origin of Californian E. globulus
Primary evidence from chloroplast DNA haplotypes, backed by secondary evidence from the nuclear SSRs, suggests that the Californian landrace of E. globulus is derived predominately from introductions from eastern Tasmania, particularly southeastern Tasmania. Of the 10 (out of 15) Californian cpDNA haplotypes that have been recorded previously in native populations, most (70%) have been found only in southeastern Tasmania, and several only in the region bordering the D'Entrecasteaux Channel (30% Channel in Table S1). The presence of five haplotypes (S112, S143-S146) in the Californian landrace that have not been found in native stands in Australia suggests either that genetic changes have occurred since the initial introduction, or these haplotypes are now exceedingly rare in the contemporary native range. The predominantly southeastern Tasmanian origin of the Californian landrace of E. globulus is similar to the native origin reported for the trees in Portugal [45], Spain [68], China [69] and Chile [70]. However, several chloroplast haplotypes in California have only been found in more northerly areas of eastern Tasmania, indicating that seed has probably been introduced to California from a number of native sources in Tasmania. This is consistent with the multiple imports of seed into California directly from Australia in the 19th Century documented by Groenendaal [22]. However, the widespread distribution of trees with the S112 haplotype in the Californian landrace suggests that a large component of the Californian plantation estate was established from local seed sources within California. In California, 67% of the trees sampled had the same chloroplast haplotype, S112, which, while not reported from native stands, is of the 'S' J LA+ type found only in southern Tasmania. We sampled the largest and likely oldest trees in the Suisun Valley where Eucalyptus globulus was introduced early on and they have the S112 haplotype. 
Given the widespread distribution of this haplotype, it is possible that Waterman's early seed introduction(s) was a key source of seed for California's later planting boom in the late 19th Century [22,23], consistent with the reduced nuclear SSR diversity in the Californian landrace. S112 is also found in E. globulus trees elsewhere, including in Europe (B. Potts unpubl. data), but an introduction to California via a second country is unlikely given the historic records of the importation of the seed from Australia at the time (see Introduction).
While S112 has not been found in the native populations of E. globulus in Australia, it has been found in ornamental plantings of the 'compacta' form of E. globulus in southeastern Tasmania, in the Hobart and D'Entrecasteaux Channel areas. These ornamentals are a relatively short, multi-stemmed form of E. globulus (E. globulus var. compacta L.H. Bailey, see [71]) markedly different from the typical single-stemmed form of E. globulus in California and Australia. 'Compacta' is propagated as an ornamental in both Australia and California (where it is referred to as 'Dwarf Blue Gum'), and the Californian form, at least, is recorded to have originated from a tree discovered in Fremont CA by John Rock of the California Nursery Company in the late 1800s [72]. The predominance of S112 in California and its rarity in E. globulus in Australia (found only in ornamental or otherwise planted trees) provides molecular evidence that supports suggestions in Australian [73] and Californian [25] horticultural publications that the 'compacta' form of E. globulus was introduced to Australia from California for use as an ornamental.

Genetic Diversity in California
Genetic diversity of the Californian landrace is lower than that of native populations in Australia, as would be expected of an introduced species. There is lower diversity of both cpDNA haplotypes and nuclear SSRs, and lower observed heterozygosity in the Californian landrace than in the native population in eastern Tasmania. The Californian landrace has values of these parameters comparable with the isolated King Island native population, which has among the lowest values reported in native E. globulus [42,62]. The likelihood of widespread distribution of locally sourced seed within California, as discussed above, clearly opens the possibility for reduced genetic diversity due to founder effects, plus reduced heterozygosity arising from inbreeding in the open-pollinated seed distributed from the original plantings. Such inbreeding would be expected due to (i) the mixed mating system of E. globulus which results in variable levels of selfing, particularly in seed collected from low in the canopy [74] or low density plantings [75], and (ii) biparental inbreeding if relatives are planted together [76]. Allelic richness [77] and chloroplast haplotype diversity [78] are particularly sensitive to population bottlenecks, suggesting that founder effects and drift may explain why the Californian E. globulus landrace is significantly different from the eastern Tasmanian populations from where most of the germplasm is likely to have originated. The strong possibility of relatedness and drift in the Californian landrace as a result of local seed sourcing is exemplified by Santos' [21] historic note that two trees in Alameda County, just south of San Francisco, produced seed for the establishment of 150,000 trees, which in turn provided seed for at least 50,000 more. The low putatively neutral SSR diversity of the Californian landrace accords with the suggestion by Eldridge et al. [4] that landraces of Eucalyptus are probably less diverse and more inbred than the native populations from which they were derived.
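The effect of such a narrow seed source can be illustrated with the textbook expectation for an idealized, randomly mating population, in which a fraction 1 − 1/(2N) of expected heterozygosity is retained per generation when only N individuals contribute to the next generation. The sketch below applies that formula under strong assumptions (discrete generations, random mating, equal contributions from all parents) that E. globulus plantings do not strictly meet, particularly because open-pollinated seed trees receive pollen from other trees. The function name is hypothetical and the calculation was not part of the original analysis.

```python
def heterozygosity_retained(n_parents: int, generations: int = 1) -> float:
    """Expected fraction of heterozygosity retained after a bottleneck.

    Idealized model: each generation retains 1 - 1/(2N) of the expected
    heterozygosity, where N is the number of breeding individuals.
    """
    if n_parents < 1:
        raise ValueError("n_parents must be at least 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - 1.0 / (2.0 * n_parents)) ** generations


# Illustrative worst case: only two parents contribute for one generation.
print(heterozygosity_retained(2))  # 0.75
```

Under these assumptions a single generation descended from two parents would retain about three quarters of the original expected heterozygosity, which shows how quickly diversity can erode when seed is collected from very few trees.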

Naturalization in California
Our results confirm the observations that some E. globulus groves in California, which flower and set seed, do not show sapling recruitment, while others show extensive naturalization [28,31]. Indeed, clear cases of naturalization have been reported where groves have doubled or tripled in size over six decades, relative to the original planting [79]. The goal of this study, however, was not to quantify total naturalization across California, but rather to explore whether variation in naturalization could be explained by genetic factors, including those associated with the native origin. We tested the hypothesis that differences in naturalization are associated with differences in genotype. However, our results show that this is not the case. We found no evidence to suggest that populations bearing particular chloroplast haplotypes were more prone to naturalization than others, and populations bearing the most common haplotype in California showed all possible levels of naturalization. The propensity for naturalization was also not associated with differences in nuclear SSR markers, nor the inferred native region of origin.
If variation in naturalization is not based on genotype or source location, the most likely explanation is variation in environmental conditions or opportunities for establishment. In its native range, E. globulus experiences more rain in winter but, on average, rainfall occurs nearly year-round [67]. In California, six months or more can go by with no rainfall. Therefore, establishment and extensive naturalization might only occur in areas with sufficient year-round water resources. Groves found in the coastal fog belt, irrigation ditches, and riparian areas show the most extensive naturalization ( [27,28,31], J. Yost unpubl. data). In Portugal, E. globulus recruitment is higher in areas with lower temperature seasonality and higher rainfall [17,18,32]. From a systematic survey of E. globulus naturalization along plantation edges in Australia, Larcombe et al. [11] found that sapling abundance was higher on sites that received regular, relatively high rainfall and had lower mean annual temperatures (i.e., climate conditions more similar to the native range). At a local scale, studies in Portugal have shown saplings were more abundant on moist aspects [14], and spread more along natural drainage lines and in the direction of the prevailing wind [19]. At this local scale, the likelihood of sapling establishment has also been shown to depend upon the reproductive output of the E. globulus source population [11,15,19], adjacent plant community type [18,49], fire [11,15,17,80,81] and, potentially, levels of seed predation [80,82].

Conclusions
We found no evidence to support the hypothesis that 'pre-introduction evolutionary history' reflecting differences in adaptation of the founder planted trees could contribute to variation in reproduction and sapling establishment of the Californian landrace of E. globulus. This absence of a genetic signal may be due to the relatively low genetic diversity and predominance of introductions from a single region within the native range. Most of California's E. globulus originated from eastern Tasmania and the trees underwent a significant bottleneck when introduced to the state. Genetic diversity in California is greatly reduced compared with that of the native Australian population, with a single chloroplast haplotype occurring in 66% of the Californian samples. Therefore, the variability in naturalization observed in E. globulus groves in California is likely to be driven by site-specific ecological conditions.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/f12081129/s1, Figure S1: The number of sites in each naturalization class, Table S1: Collections of Californian Eucalyptus globulus samples, their chloroplast and nuclear genotypes and geographical assignments, Table S2: Genetic diversity statistics per population and per SSR locus for the Californian and Australian native stand samples of Eucalyptus globulus.

Table A1 for accession numbers). The SSR data are available at: https://dx.doi.org/10.25959/sd4n-9v54. Locality naturalization scores are provided in Table S1.

Acknowledgments:
The authors thank D. Meister, A. Leman, Y. Serge-Groba, K. Kay for laboratory assistance, Thais Pfeilsticker for producing haplotype maps, Paul Tilyard for assistance with haplotype data management, as well as J. Freeman and R. Vaillancourt for discussion and access to data from the native populations.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A

Table A1. J LA+ character variation in cpDNA haplotypes found in California 1. Light shading indicates haplotypes not found in the native range of Eucalyptus globulus and the dark shading indicates character states not found in the native range.

GTTGAAAAGAATGTAGATAAATTG, GenBank accession numbers: Cc05-JQ029998, Cc06-JQ029999, Q030009, Cc41-JQ713792, Cc56-AY620896, S05-JQ713796, S112-JQ030108, S129-JQ030068, S143-JQ713808, Q713809, S145-JQ713810, S146-JQ713811, S43-AY620869, S64-JQ030107, S87-AY640050.

Figure A1. Haplotype network for chloroplast haplotypes found in the Californian landrace of Eucalyptus globulus. Circle sizes represent number of individual trees found to contain each haplotype. Haplotypes that have been found in the native population in Australia are shaded. Unshaded haplotypes have so far not been found in Australia, except for S112 which has been found only in ornamental plantings. Within the native range of E. globulus in Australia, the S haplotypes are found only in eastern Tasmania. Although Cc haplotypes are found in Victoria, the Furneaux Islands and Tasmania (Freeman et al., 2007), the specific haplotypes shown here are confined to either eastern Tasmania (Cc05, Cc06, Cc41 and Cc56) or eastern Tasmania and King Island (western Bass Strait) (Cc18). The native distributions of haplotypes Cc41, Cc56 and S43 are mapped in