Phylogeographical Structure of Liquidambar formosana Hance Revealed by Chloroplast Phylogeography and Species Distribution Models

: To understand the origin and evolutionary history, and the geographical and historical causes for the formation of the current distribution pattern of Lquidambar formosana Hance, we investigated the phylogeography by using chloroplasts DNA (cpDNA) non-coding sequences and species distribution models (SDM). Four cpDNA intergenic spacer regions were ampliﬁed and sequenced for 251 individuals from 25 populations covering most of its geographical range in China. A total of 20 haplotypes were recovered. The species had a high level of chloroplast genetic variation (Ht = 0.909 ± 0.0192) and a signiﬁcant phylogeographical structure (genetic di ﬀ erentiation takes into account distances among haplotypes (Nst) = 0.730 > population di ﬀ erentiation that does not consider distances among haplotypes (Gst) = 0.645; p < 0.05), whereas the genetic variation within populations (Hs = 0.323 ± 0.0553) was low. The variation of haplotype mainly occurred among populations (genetic di ﬀ erentiation coe ﬃ cient (Fst) = 0.73012). The low genetic diversity within populations may be attributed to the restricted gene ﬂow (Nm = 0.18). The time of the most recent common ancestor for clade V mostly distributed in Southwestern China, Central China, Qinling and Dabieshan mountains was 10.30 Ma (95% Highest posterior density (HPD): 9.74–15.28) dating back to the middle Miocene, which revealed the genetic structure of L. formosana was of ancient origin. These results indicated that dramatic changes since the Miocene may have driven the ancestors of L. formosana to retreat from the high latitudes of the Northern Hemisphere to subtropical China in which the establishment and initial intensiﬁcation of the Asian monsoon provided conditions for their ecological requirements. This scenario was conﬁrmed by the fossil record. SDM results indicated there were no contraction–expansion dynamics, and there was a stable range since the last interglacial period (LIG, 130 kya). Compared with the population expansion detected by Fu’s Fs value and the mismatch distribution, we speculated the expansion time may happen before the interglacial period. Evidence supporting L. formosana was the ancient origin and table range since the last interglacial period.


Introduction
Asia's climate has experienced dramatic changes since the Miocene and the global climate cooling  after the Miocene [1,2]. The dramatic changes included intensification of East Asian

Plant Materials and DNA Extraction
A total of 251 individual trees of L. formosana were collected from 25 populations covering most of its geographical range in China ( Figure 1, Table 1). Fresh young leaves were collected from 10 to 11 sample trees, with a minimum distance of 100 m apart from each other. One sample of Liquidambar styraciflua was used in the phylogenetic analysis as outgroup. The leaf samples were then dried using silica gel for DNA extraction. Total genomic DNA was extracted using the Plant Genomic DNA Kit (TIANGEN, Beijing, China). The DNA quality and concentration were measured by electrophoresis on 0.8% agarose gel and Microplate Spectrophotometer (Molecular Device, Sunnyvale, CA, USA) respectively. The DNA samples were diluted to 50 ng/µL for a later experiment.
The program PERMUT [29] was used to test the phylogeographic structure by comparing whether Nst was significantly greater than Gst. A significantly higher Nst than Gst indicates the presence of phylogeographical structure with relatively closer haplotypes frequently found in the same area. Nst is a parameter of genetic differentiation that takes into account distances among haplotypes, whereas Gst is a parameter of population differentiation that does not consider distances among haplotypes. Gst and Nst were estimated with 1000 permutations. Meanwhile, the total gene diversity (Ht) and average gene diversity within populations (Hs) was calculated. Relationships among chloroplast haplotypes were inferred by constructing a median-joining network with the default settings in NETWORK 5.0 [30]. The distributions of all haplotypes were plotted on a relief map of China. Several clades identified in the chloroplast network were also mapped.

Phylogenetic Analyses and Divergence Time Estimation
The phylogenetic tree was reconstructed by maximum likelihood (ML) and maximum parsimony (MP) using PAUP Version 4.0*b10 [31], and Bayesian analysis using MrBayes Version 3.1.2 [32], respectively. The sequences of L. styraciflua was treated as outgroup. The ML and MP analyses were performed by heuristic search methods with tree bisection reconnection (TBR) branch swapping, and a bootstrap test was conducted 1000 times. The Bayesian analysis was performed using the best-fit GTR+I model of nucleotide substitution, which was chosen from MrModeltest 2.3 [33] under the Akaike information criterion (AIC). Markov Chain Monte Carlo (MCMC) was run for 10,000,000 generations, sampling every 1000 generation. A total of 10,000 trees were sampled. The first 2500 trees were burned in, and the rest were used to reconstruct 50% consensus tree and computed posterior probabilities. The result of the tree was edited and visualized using FIGTREE v1.4.0 [34].
The divergence time of all haplotypes was estimated by BEAST 1.7.4 [35], with the uncorrelated lognormal (UCLN) relaxed-clock model. The estimated divergence time of L. styraciflua and L. formosana was 22.16 Ma (95% HPD (Highest posterior density): 9.74, 44.59) according to Morris et al. [36] which was used to calibrate the root node. The constant size coalescent was adopted as the tree prior, with L. styraciflua as outgroup. Five independent runs of 80,000,000 MCMC generations, were conducted, sampling at every 8000 generations, following a burn in of the first 10% cycles. The log files were checked in TRACER v.1.6 [37] to confirm effective sample size (ESS) for all parameters reached adequacy (ESS > 200) and convergence of the chains to a stationary distribution. Divergence time and 95% highest posterior density (HPD) were visualized using FIGTREE v1.4.0 [34].

Population Genetic Structure
To study genetic structure, an analysis of molecular variance (AMOVA) was employed using the software ARLEQUIN V3.5 [38] by performing 1000 random permutations to ensure the accuracy of the estimation of variance components.

Tests of Expansion
To test the likelihood of departures from neutrality and evidence of range expansion, Tajima's D and Fu's Fs statistics were calculated using DnaSPv5 [28]. Significant negative values of Tajima's D and Fu's Fs may be the result of population expansion. Because the value of Fu's, Fs was more sensitive than Tajima's D, it is often used as the test parameter at present [39]. In addition, the mismatch distribution was calculated to further investigate demographic history. A multimodal distribution usually suggests that populations are at equilibrium, whereas a unimodal distribution shows that populations have experienced recent expansion. Coded indels were excluded from all tests of expansion.

Ecological Niche Modeling
A Biomod2 species distribution model was used to reconstruct geographic distribution patterns of L. formosana in different historical periods, including the potential distribution areas of the present  and the past (the last glacial maximum period (LGM: 21,000 years before present), and the last interglacial period (LIG: 130,000 years before present). Current distribution records were obtained from the Chinese Virtual Herbarium (http://www.cvh.org.cn/), Global Biodiversity Information Facility (https://www.gbif.org/), and the 25 records in this study. A total of 215 presence records of L. formosana were obtained from throughout the species distribution range ( Figure S1). Nineteen bioclimatic variables and altitude were downloaded from the WorldClim website with a resolution of 2.5 arcmin resolution (http://www.worldclim.org/). To avoid model over-fitting and inaccurate distribution prediction, pairwise Pearson correlation coefficients (r > 0.8) were eliminated [40]. Only eight environmental variables with r < 0.7 were retained for SDMs (BIO2: Mean diurnal range, BIO5: Max temperature of warmest month, BIO6: Min temperature of coldest month, BIO8: Mean temperature of wettest quarter, BIO12: Annual precipitation, BIO15: Precipitation seasonality, BIO17: Precipitation of driest quarter and BIO18: Precipitation of warmest quarter) were used to reconstruct the geographic distribution of L. formosana. We evaluated model performance using the true skill statistics (TSS), the area under the receiver operating characteristic curve (AUC) and Kappa value calculated by Biomod2.

Chloroplast Haplotype Variation, Haplotype Network, and Distribution
The total aligned length of the four cpDNA regions was 2732 bp (807 bp for psbD-trnT, 714 bp for psbJ-petA, 747 bp for trnL-trnF, and 464 bp for ndhF-rpl32). Based on the concatenated sequences, twenty variable sites were detected, including ten nucleotide substitutions and ten indels (Supplementary Table S1). Sequences have been deposited in GenBank (accession numbers provided in Table S2). Twenty haplotypes (H1-H20) were recovered from the 25 populations, with 18 of these populations exhibiting more than one haplotype, including one population (PX) with five haplotypes, four populations (XY, HSAH, JO, and NJ) with four haplotypes each, seven populations (TR, KX, TG, WYJX, KH, FN, and CX) with three haplotypes each, and six populations (BWL, LY, CB, ZS, SZHN, and GY) with two haplotypes each, while the remaining seven populations (WYGD, FD, SZHB, HuoS, SC, TB, and HA) were fixed for a signal haplotype. Haplotypes in each population of L. formosana are listed in Table 2. According to the phylogenetic consensus tree, five clades could be identified by 10 nucleotide substitutions and 10 indels mutations in the haplotype network ( Figure 2). The solid line indicates a nucleotide mutation step interlinking two haplotypes, and the dotted line indicates an indel mutation step. Each line indicates one mutation step. Clade I including H1, H12, H20, H9, and H10 occurred mainly in Eastern China, with the exception of BWLpopulation, each showed one or more indels mutation. Clade II contained H11 and H13, six individuals of JO and one individual of TG. Clade III containing H8, H16, and H17 detected in 13 individuals, nine individuals of CB, and the rest, four individuals, distributed in four populations. Clade IV contained only H14 and H15. The remaining haplotypes were mostly distributed in Southwestern China, Central China, Qinling and Dabieshan mountains regions, and formed Clade V (Figure 2). H4 was the most widespread haplotype, found in 19.9% of the individuals and 40% of the populations. The second (H1) and third (H5) most frequent haplotypes were detected in 16.7% and 12.7% of the individuals, respectively. H4 and H1 were internal to most of the other haplotypes. Obviously, H5 was an ancestral haplotype because it was scattered in geographically remote populations (XY, PX, and FN in Southwestern China, and CX, WYGD, and TG in Central China, HSAH and NJ in Eastern China) and located at a center position in the network. In addition, 11 haplotypes (H6, H10, H12, H13, H14, H16, H17, H18, H19, and H20) were private to a single population.

Genetic Diversity and Genetic Structure
Total cpDNA haplotype diversity was considerably high in L. formosana (Ht = 0.909 ± 0.0192). Nevertheless, diversity within populations was relatively low (Hs = 0.323 ± 0.0553). The haplotype diversity (Hd) and nucleotide diversity (π) were 0.88762 and 0.00144, respectively. The population haplotype diversity ranged from 0 to 0.75556, and nucleotide diversity (π) ranged from 0 to 0.00120 (Table 2). Permutation test showed that Nst (0.730) was significantly higher than Gst (0.645; p < 0.05), indicating significant phylogeographical structure across the whole natural range of L. formosana in China. Analysis of molecular variance (AMOVA) ( Table 3) indicated that a large proportion (75.34%) of the chloroplast variation in L. formosana was resided between populations, whereas the component of genetic variation within populations was only 24.66%. The variation of chloroplast variation between populations had reached a significant level (Fst = 0.75, p < 0.01).

Population History Dynamic Analysis and Divergence Time Estimation
The value of Tajima's D (−0.87277, p > 0.10) was not significantly negative. However, the value of Fu's Fs was significantly negative (−2.668, p < 0.05) for these four concatenated sequences. Significant negative values of Fu's Fs may be the result of population expansion. To further investigate the demographic history, mismatch distribution analysis was also plotted (Figure 3). Deviation of observed values from those expected and a strongly unimodal mismatch distribution suggested the sudden spatial or demographic expansion of L. formosana. Neutrality test and the mismatch distribution analysis showed L. formosana experienced an expansion event in history.  (Figures 2 and 4). The divergence time of the latter three groups belongs to the Quaternary period. Five clades could be clearly identified to the phylogenetic relationships of haplotype detected in the BEAST analysis, which had high probability values and correspond closely to the haplotype network ( Figure 2).

Ecological Niche Model
The Biomod2 species distribution model (SDM) performance was assessed using three different statistics: the true skill statistics (TSS), the area under the receiver operating characteristic curve (AUC), and Kappa value. All models performed well (TSS = 0.940, ranged from 0.772 to 0.992; AUC = 0.975, ranged from 0.886 to 0.996; Kappa = 0.937, ranged from 0.772 to 0.968), which indicated good predictive performance ( Figure S2). Based on the potential distributions of the present (1950-2000) of L. formosana, it is widely distributed across three climatic zones: tropical, subtropical, and south temperate regions. North form Qinling and Dabieshan Mountains, south to Hainan, west to Sichuan and Guizhou, and east to Taiwan (18 • -34 • N, 102.1 • -122.7 • E). The predicted distribution under the present was consistent with their existing distribution, indicating that the distribution had reached its maximum range. The predicted distributions are actually almost identical in the present, LGM (21 kya), and LIG (130 kya), except for some peripheral areas where the concentration is concentrated on the edge of the Tibetan area and southern Yunnan and Taiwan regions. Along the distribution of the north rim, especially, the probability of occurrence is very low, and we probably should not draw many conclusions from these areas ( Figure 5). The model results suggest no contraction-expansion dynamics. Instead, this was the overwhelming signal of range stability through time.

Genetic Diversity and Genetic Differentiation of cpDNA Haplotype in L. Formosana
The total cpDNA haplotype diversity (Ht = 0.909) detected in our study was considerably higher than the average value (Ht = 0.67) of 170 species using cpDNA markers reported by Petit [41]. These may be attributed to a long evolutionary history of the tertiary relict plant, which could accumulate more nucleotide mutations [42]. Relatively high chloroplast haplotype diversity was also found in some relict plants, such as Alsophila spinulosa (Ht = 0.93) [43], Dipentodon sinicus (Ht = 0.90) [44], and Tetracentron sinense (Ht = 0.93) [45]. Widespread species or/and hotspots species may be possible explanations for the high chloroplast haplotype diversity. In general, widespread species had higher genetic variation than those of narrowly distributed species, due to wider geographical distribution providing more opportunities for gene isolation, genetic drift, and gene mutation [46].
The higher cpDNA chloroplast genetic variation was inconsistent with the moderate SSR genetic variation reported by a previous study [18]. This may have been caused by pollen and seed mediating gene flow. Nuclear genes, such as SSR markers belonging to the biparental inheritance, reflect the gene flow mediating both pollen and seed. However, cpDNA is maternal inheritance in most plants which reflect part of gene flow only, including seed mediating. The limited seed mediating gene flow of cpDNA markers weakens the homogeneity among populations which might retain the high chloroplast genetic variation. The pattern of seed dispersion could have a significant impact on the genetic differentiation of chloroplasts in the population [47]. Generally speaking, the degree of genetic differentiation of a population is negatively correlated with the transmission distance of seeds. The population differentiation due to seed transmitted by gravity is greater than by wind. Sweetgum is a wind-pollinated species, and seed-dispersing mainly by gravity. In the previous study of L. styraciflua, instead of individually dispersing seeds, many whole sweetgum fruits (each containing up to 50 seeds) were directly located around the seed tree (50 m-100 m) [48].
Total cpDNA haplotype diversity was high (Ht = 0.909) in L. formosana. Nevertheless, diversity within populations was relatively low (Hs = 0.323), probably caused by restricted gene flow. This was also consistent with the lower gene flow (Nm = 0.18) detected in our study. High haplotype diversity may reflect the wide distribution of L. formosana in subtropical China, which had a large and stable population size. Compared with other species of Liquidambar distributed in North America and Turkey (e.g., L. styraciflua [36] and L. orientalis [49]), which experienced a dramatic reduction in the distribution area, L. formosana had no contraction-expansion dynamics and a stable range.

Phylogeographic Structure and Inference of Demographic History
The populations of L. formosana in China had a significant phylogeographical structure (Nst = 0.730 > Gst = 0.645; p < 0.05). However, the mantel test (not listed) showed there was no geographic isolation among populations, indicating that geographical distance was not the main factor causing genetic variation of L. formosana. High genetic differentiation was not completely caused by distance isolation but by natural barriers, such as mountains and rivers [50]. Therefore, another cause for the phylogeographical structure of L. formosana may be the complex topography. For example, extending from east to west are the Qinling and Dabashan mountain and north to south the Dabie mountains. Due to the blocking of the mountains, only limited genes can flow among populations. Fu's Fs value and the mismatch distribution analyses indicated there was population expansion detected in L. formosana ( Figure 3). However, based on the result of SDM, no contraction-expansion dynamics were detected since LIG (130 kya). Therefore, we speculated the expansion time may happen before the period of LIG.
Using the uncorrelated lognormal relaxed-clock model, the divergence time (10.30 Ma; 95% HPD, 9.74-15.28 Ma; Figure 4) of clade V haplotypes and 7.65 Ma (95% HPD: 2.49-12.15) of clade I for L. formosana was an ancient origin of genetic structure, belonging to the middle and late Miocene period. The result indicated that the earliest lead to genetic differentiation was due to the Neogene geology and climate events, as confirmed by the fossil pollen records from Liquidambar. Fossils of Liquidambar fruit, pollen, and leaves have been commonly found in Mid-Cretaceous and Neogene of the Northern Hemisphere [51], and L. formosana has a long fossil record of pollen (about 70 Ma) in the northeast of China [52]. In addition, it maintains a stable range and even increases effective population size through time which is suggested by history dynamic analysis and SDM results (Figures 3 and 5) and has a wide geographical distribution and ecological amplitude which allows it to retain ancient origin [53]. The significant uplift of the Qinghai-Tibet Plateau during the middle and late Miocene (approximately 8-10 Ma) and the intensified Asian monsoons (approximately 8-9 Ma) were regarded as the two main factors driving diversification of plants during Miocene in China [54,55]. Evidence supporting middle and late Miocene diversification of other subtropical woody species has been found in phylogeographic studies, e.g., Ulmus lamellosa 9.27 Ma (5.17-13.33) [56], Tetracentron sinense 9.6 Ma (2.2-27.0) [45], Quercus glauca 9.07 Ma (5.16-13.32) [57]. The divergence time of haplotypes between H11 and H13; H8, H16, and H17; H14 and H15 were 1.13 Ma (95%HPD: 0.02-4.29), 2.10 Ma (95% HPD: 0.10-6.34), and 1.50 Ma (95% HPD: 0.07-4.91) (Figure 4), respectively. Thus, we speculated that Quaternary climatic oscillations and associated series of environmental changes had also caused the diversification within L. formosana, and they are relatively complicated.

Conclusions
In summary, dramatic changes since the Miocene and associated climate cooling changes may have driven the ancestors of L. formosana to retreat from the high latitudes of the Northern Hemisphere to subtropical China in which the establishment and initial intensification of the Asian monsoon provided conditions for their ecological requirements. This scenario was confirmed by the fossil record. We revealed the ancient origin dating back to the middle Miocene (10.30 Ma; 95% HPD, 9.74-15.28 Ma) of L. formosana. SDM results indicated there was no contraction-expansion dynamics and a stable range since the LIG. Compared with the population expansion detected by Fu's Fs value and the mismatch distribution, we speculated the expansion time might happen before LIG.