Genetic Diversity and Structure of Japanese Endemic Genus Thujopsis (Cupressaceae) Using EST-SSR Markers

: The genus Thujopsis (Cupressaceae) comprises monoecious coniferous trees endemic to Japan. This genus includes two varieties: Thujopsis dolabrata (L.f.) Siebold et Zucc. var. dolabrata (southern variety, Td) and Thujopsis dolabrata (L.f.) Siebold et Zucc. var. hondae Makino (northern variety, Th). The aim of this study is to understand the phylogeographic and genetic population relationships of the genus Thujopsis for the conservation of genetic resources and future breeding. A total of 609 trees from 22 populations were sampled, including six populations from the Td distribution range and 16 populations from the Th distribution range. The genotyping results for 19 expressed sequence tag (EST)-based simple sequence repeat (SSR) markers, followed by a structure analysis, neighbor-joining tree creation, an analysis of molecular variance (AMOVA), and hierarchical F statistics, supported the existence of two genetic clusters related to the distribution regions of the Td and Th varieties. The two variants, Td and Th, could be deﬁned by their provenance, in spite of the ambiguous morphological di ﬀ erences between the varieties. The distribution ranges of both variants, which have been deﬁned from their morphology, was conﬁrmed by genetic analysis. The Th populations exhibited relatively uniform genetic diversity, most likely because Th refugia in the glacial period were scattered throughout their current distribution area. On the other hand, there was a tendency for Td’s genetic diversity to decrease from central to southern Honshu island. Notably, the structure analysis and neighbor-joining tree suggest the hybridization of the two varieties in the contact zone. More detailed studies of the genetic structure of Td are required in future analyses.


Introduction
Conifers are a dominant plant type found in the vast boreal forests of the North American and Eurasian continents. They represent an important forest resource in many countries because of their superior wood properties, including straighter trunks and stronger yet lighter wood compared to those of most angiosperms [1]. Generally, wild populations of trees are an important genetic resource and are required for breeding programs that aid in the selection of new plant varieties suitable for withstanding a range of environmental conditions, diseases, and future climate change [2][3][4]. A principal requirement for conserving forest genetic resources is maintaining the genetic diversity within and among the populations of a species [5]. Phylogeographic and population genetic studies using neutral molecular  Table  1.   Table 1.
Higuchi et al. [11] suggested that seven natural populations from the Th distribution region, as well as other Japanese conifers distributed over a wide area, showed a relatively low F ST (0.046). The two populations that were located in the marginal part of the Th distribution region and the five populations in the northern part of the distribution region were genetically different in the neighbor-joining tree and structure analysis [11]. Ikeda et al. [12] also analyzed the genetic structure of natural forests using populations distributed over almost the same geographical area as studied by Higuchi et al. [11]. They suggested that the distribution area of Th included four regional groups (Hokkaido and Aomori prefectures, Iwate and Yamagata prefectures, Niigata, and Ishikawa prefectures) of natural populations [12]. These findings are important for illustrating the genetic structure of Th; Forests 2020, 11, 935 3 of 16 however, they did not include Td populations. Therefore, a study examining both Td and Th is needed to determine the extent to which these two varieties are genetically distinct. The aim of this study was to examine the genetic structure of the genus Thujopsis and to determine the relative contributions of the genetic structure between varieties. In the present study, simple sequence repeat (SSR) markers, which were representative neutral and co-dominant genetic markers, were used in order to provide basic information that would be useful in the conservation of Thujopsis genetic resources in natural forests.

Sampling and Study Sites
A total of 609 trees from 22 populations, including 6 populations from the Td distribution range and 16 populations from the Th distribution range, were sampled (Table 1; Figure 1b). Fresh needles were collected and stored at −30 • C. The sampled trees were each separated by more than 30 m. Sample sizes varied among the populations (Table 1). This variation reflects the sampling of large individuals (older trees) from some populations. Moreover, sampling was not attempted in dangerous areas where the topography was too steep.

DNA Extraction and Genotyping
Total genomic DNA was extracted from 100-mg needles using the hexadecyltrimethylammonium bromide (CTAB) method [13] with minor modifications. The genotypes of the sample trees were determined for 19 markers that were developed from expressed sequence tag (EST)-based simple sequence repeat (SSR) markers for the genus Thujopsis [14] (Table 2). In addition, universal primers were attached to each forward primer to efficiently incorporate fluorescent dyes during PCR for multiplexing [15]. The PCR was performed in a volume of 10.0 µL containing 10-120 ng of template DNA, 0.15 µM of forward primer, 0.5 µM reverse primer, 0.2 µM of either one of the tail primers fluorescently labeled by FAM, VIC, or NED, and 5.0 µL of Go Taq Master mix (Promega Corporation, Madison, MI, USA). A GeneAmp PCR System 9700 thermal cycler (Applied Biosystems, Foster City, CA, USA) was used (Applied Biosystems, Foster City, CA, USA) with the following thermal profile: initial denaturation at 94 • C for 2 min, followed by 30 cycles of denaturation at 94 • C for 30 s; an annealing temperature of 60 • C for 30 s, and an extension at 72 • C for 30 s; then a final extension at 72 • C for 5 min. The amplified PCR products of each individual were classified into five groups according to the fragment size and the type of fluorescent marker (Table A1). Each group of PCR products was separated by capillary electrophoresis using a 3100 Genetic Analyzer (Applied Biosystems), and genotypes were scored with Geneious 7.0.4 software (Biomatters Ltd., Auckland, New Zealand).

Genetic Diversity within Populations
The total number of detected alleles (TA), the observed heterozygosity (H o ), the total gene diversity (H T ), and Wright's inbreeding coefficient (F IS ) were calculated at each locus using FSTAT software version 2.9.4 [16].
The total number of detected alleles (allele number), the observed (H o ) and expected heterozygosity (H e ), and the number of private alleles were calculated for each population using GenAlEx 6.51b2 [17].
The allelic richness was standardized based on 12 individuals (24 gene copies), which was the smallest sample size among the populations. Fixation index values estimating inbreeding within individuals in a population (F IS ) were calculated. The significance of positive or negative values of F IS was tested based on 8360 randomizations with a Bonferroni correction. These calculations were performed using FSTAT [16]. The statistical independence of loci (linkage disequilibrium for all pairs of loci across populations) was also evaluated using FSTAT.
To compare genetic differentiation among populations and between loci, the relative genetic differentiation among populations defined under the infinite allele model (IAM; F ST ) [18] and the stepwise mutation model (SMM; R ST ) [19] were calculated using SPAGeDi version 1.5 [20]. The genetic differentiation coefficients (G ST ; analogous to F ST ) and a standardized measure, which had a range of 0-1 for all levels of genetic diversity (G' ST ), were calculated based on the allele frequencies for each locus using GenAlEx [21,22]. The significance of the deviations of F ST , R ST , G ST and G' ST (from zero) was evaluated by permutation tests.
The likelihood of a bottleneck within each population was examined using the two-phase model with 95% single-step mutations and 5% multi-step mutations with 1000 iterations, implemented in the program BOTTLENECK version 1.2.02 [23]. The one-tailed Wilcoxon test was used to detect an excess of expected heterozygosity (H e ) compared to that expected under mutation-drift equilibrium (H EQ ).

Genetic Structures among Populations and Distribution Regions
The presence of isolation-by-distance patterns in population differentiation was investigated by applying the Mantel test to the pairwise relationship between the geographic distances (transformed to natural logarithms) and genetic distances, F ST /(1 − F ST ), between the populations according to Rousset's method [24]. Comparison analyses for Td population vs. Td population, Th population vs. Th population, and the species as a whole were performed, in order to separately examine the effect of isolation-by-distance within each distribution region and within the species. These calculations were performed using SPAGeDi version 1.5 [20].
The genetic relationships among the populations were evaluated by constructing a neighbor-joining tree [29] using Poptree2 web [30,31]. Nei's chord distance (D a ) [32] was used to estimate the degree of genetic divergence of the populations. The node significances of the trees were evaluated using bootstrap probabilities based on 1000 replicates.
The Bayesian approach, which infers population structure and assigns individuals into clusters, was used, implemented in STRUCTURE version 2.3.4 software [33]. We performed 10 runs for each value of K (number of putative populations) from 1 to 10, and employed the Markov chain method with 100,000 iterations (burn-in) and 100,000 Markov chain Monte Carlo repetitions. The simulation was performed under the admixture model with correlated allele frequencies (default parameters). Then the most appropriate cluster number (K) was selected using the criterion from Evanno et al. [34], which is based on ∆K. To choose K and identify sets of highly similar runs in multiple independent runs at a single K value, we used CLUMPAK software [35].

Genetic Diversity across All Populations
The total number of detected alleles for all populations at each locus ranged from five to 32, with an average value of 14.3 (Table A2). On average, over all loci, the observed heterozygosity (H o ) and gene diversity in the total population (H T ) were 0.621 and 0.691, respectively. The inbreeding coefficient (F IS ) ranged from −0.056 to 0.096. At all loci, the F IS values deviated significantly from zero.
The ranges of genetic diversity within the populations were 5.5 to 8.7 for allele number, 4.20 to 6.55 for allelic richness, 0.496 to 0.746 for H o , and 0.500 to 0.682 for H e ( Table 2). The average genetic diversities for Td and Th were 6.42 and 7.17 for allele number, 4.93 and 5.93 for allelic richness, 0.545 and 0.650 for H o , and 0.551 and 0.642 for H e , respectively. Private alleles were found in 18 populations, with a maximum value of five. There were four populations without private alleles. In all populations, observed heterozygosity was not significantly different from that expected for Hardy-Weinberg equilibrium. No evidence of significant linkage disequilibrium was detected in any of the total of 3420 tests for linkage disequilibrium between loci in the populations. The genetic differentiation among population parameters F ST , R ST , G ST and G' ST varied among loci, ranging from 0.037 to 0.238, from 0.021 to 0.289, from 0.031 to 0.204, and from 0.098 to 0.502, respectively (Table A2). The genetic differentiation between populations over all loci (F ST , R ST , G ST and G' ST ) was 0.105, 0.096, 0.088, and 0.246, respectively. All these measures were significantly different from zero at all loci.
No evidence of a recent bottleneck was found (i.e., no H e excess compared to H EQ ) in the genotypes of all the populations.

Genetic Structure among Populations
All of the F ST /(1 − F ST ) values between pairs of populations calculated for the three categories were significantly related to the natural logarithms of the geographic distances between them ( Figure A2). The AMOVA suggested that the majority of the variation existed within population (Within Pop, 5.889, p < 0.001, Table 3). The results of the AMOVA and hierarchical F statistics revealed a highly significant genetic divergence between the two distribution regions (Among Region, 0.606; F Region/Total = 0.088; p < 0.001). The genetic differentiation was highly significant among populations, whereas the contribution of variety was relatively low between the two distribution regions (Among Pop, 0.393; F Pop/Region = 0.062; p < 0.001). The neighbor-joining tree based on D a distance reflected the geographical locations of the populations for the most part ( Figure 2). However, genetic relationships between two distribution regions (Td and Th) and two intermediate populations (Minakami (MK) and Nikko (NK)) were not reliably supported because of low bootstrap values.
According to clustering analysis in STRUCTURE, the cluster number (K) was 2 ( Figure A3). The distribution of populations for the two clusters was found to be geographically structured between the populations along the distribution regions ( Figure 3). The proportion of cluster 1 was much higher in the Td distribution region, while that of cluster 2 was much higher in the Th region. Although the MK and NK populations were located in the Td region, individuals of these two populations belonged to both clusters 1 and 2.
reliably supported because of low bootstrap values.
According to clustering analysis in STRUCTURE, the cluster number (K) was 2 ( Figure A3). The distribution of populations for the two clusters was found to be geographically structured between the populations along the distribution regions ( Figure 3). The proportion of cluster 1 was much higher in the Td distribution region, while that of cluster 2 was much higher in the Th region. Although the MK and NK populations were located in the Td region, individuals of these two populations belonged to both clusters 1 and 2.

Genetic Diversity at EST-SSR in Thujopsis
In the present study, 19 EST-SSR loci were used to estimate population genetic diversity and to investigate the genetic structure in 22 natural populations of the genus Thujopsis. The EST-SSR polymorphisms of the genus Thujopsis retained a nearly equal or slightly higher diversity compared MK and NK populations were located in the Td region, individuals of these two populations belonged to both clusters 1 and 2.

Genetic Diversity at EST-SSR in Thujopsis
In the present study, 19 EST-SSR loci were used to estimate population genetic diversity and to investigate the genetic structure in 22 natural populations of the genus Thujopsis. The EST-SSR polymorphisms of the genus Thujopsis retained a nearly equal or slightly higher diversity compared

Genetic Diversity at EST-SSR in Thujopsis
In the present study, 19 EST-SSR loci were used to estimate population genetic diversity and to investigate the genetic structure in 22 natural populations of the genus Thujopsis. The EST-SSR polymorphisms of the genus Thujopsis retained a nearly equal or slightly higher diversity compared to other Cupressaceae species (Table 2) [36][37][38]. In general, EST-SSR markers have a lower polymorphism than nuclear SSR markers [39,40]. Therefore, the EST-SSR marker loci used in the present study showed lower levels of polymorphism than the previous research on Th [11] and other conifers that are widely distributed in Japan [4,41,42].
A significant and relatively high value of overall population differentiation was found among the populations we examined (Table A2;   between populations because of habitat discontinuity [50]. Therefore, it is likely the case that the two populations were not completely genetically isolated, although we identified relatively high values for the population differentiation measures in the genus Thujopsis.

Comparison of Genetic Structure between Td and Th
The population relationships and genetic structures for the genus Thujopsis were analyzed, focusing on the relationship between the two variants. Structure analysis supported the existence of two genetic clusters related to the distribution regions (Figure 1a), i.e., the Td and Th varieties (Figure 3). These clusters were significantly differentiated based on AMOVA and hierarchical F statistics ( Table 3). The neighbor-joining tree also supported these results, according to the high (100%) bootstrap probability of branches between the KM population (No. 14, Th) and the MK population (No. 17, Td) ( Figure 2). Therefore, the two variants, Td and Th, could be defined by their provenance, in spite of the ambiguous morphological differences between these varieties.
The average values of allelic richness, H o and H e , were relatively lower in Td than Th. Two factors may have contributed to the decline of genetic diversity in Td. First, demographic factors, such as postglacial colonization and a history of human overexploitation, could have played a role. If the refugia of the species were restricted to the southern region, a postglacial rapid expansion to northern regions would be expected to cause a series of founding events that would lead to a loss of alleles and homozygosity [51]. Similarly, tree species that have experienced population declines due to human overexploitation may show low genetic diversity and genetic bottlenecks [52][53][54]. However, no significant bottlenecks were detected in the population in the present study. This suggests that the low genetic diversity exhibited by each of the Td populations was probably caused by the natural characteristics of this variety, or other factors.
Since the Japanese Archipelago extends in a narrow arc from northeast to southwest, with the various mountain ranges probably acting as physical barriers, temperate plant species have generally migrated along the Pacific side, the Sea of Japan side, or the mountain slopes of the Archipelago. Thus, plants species migrated either southwards along the coasts or to lower altitudes into refugia during glacial periods, and expanded either northwards or to higher altitudes during interglacial periods [55]. In fact, many plant species distributed in Japan exhibit genetic divergence between the Pacific side and the Sea of Japan side (e.g., Fagus crenata, [56]; Kalopanax septemlobus, [36]). Additionally, as many tree species exhibit a long generation time, it is likely that not many generations have elapsed since the initial postglacial colonization. As a result, there has been less opportunity for genetic drift, and the large size of many plant populations could fossilize the genetic structure established at the time of colonization [57]. In the case of conifers, Kimura et al. [58] identified clear genetic divergence between two and four gene pools in Cryptomeria japonica. Two gene pools were distributed along the Sea of Japan side and along the Pacific Ocean side, while four gene pools suggested the potential of northern cryptic refugia and/or the potential of admixture events from several refugia between populations in the northern Tohoku district and an isolated gene pool on Yakushima Island (south of Kyushu district). As an example of fossilized genetic structures, Tsuda and Ide [59] suggested that the populations of Betula maximowicziana could be divided into a southern group (Central Honshu island) and a northern group (Hokkaido and Tohoku district) that originated from different refugia. They detected significant bottlenecks that may have been caused by processes of postglacial colonization and the species' characteristics and/or life history as a long-lived pioneer tree species. However, the present study indicates that the relatively high genetic differentiation of the genus Thujopsis did not fit these frequent patterns of genetic differentiation. The Td and Th varieties were distributed in the southern and northern parts of the Japanese Archipelago. No evidence of a genetic structure between the Sea of Japan side and Pacific Ocean side was found in this study. Th has similar diversity among populations, with relatively uniform values of allelic richness, H o and H e . On the other hand, there was a tendency for allelic richness, H o and H e , to decrease from the Minakami and Nikko groups to the Obitani group in Td. Structure analysis, and the locations of the Minakami and Nikko populations in the neighbor-joining tree with low node significances, suggested that these two populations may contain Td and Th hybridization. In this case, the reason for the high genetic diversity in the Minakami and Nikko populations could be due to hybridization. The remaining four populations (Kiso, Kuraiyama, Toyo-oka, and Obitani) most likely represented pure Td populations, but we found no evidence of refugia in any of these.
Comparing the current distribution of Th with that of Cr. japonica, Aoyama [60] indicated that Th is more drought tolerant and can survive in colder climates. These physiological characteristics may have allowed Th to form refugia in southern Hokkaido and the areas below 500 m elevation in the Tohoku district during the last glacial period [60]. In the case of Th, refugia may have been scattered throughout the Tohoku district, which is the approximate current distribution range of Th. The relatively uniform values of allelic richness, H o and H e , among the Th populations could be explained by this hypothesis. However, additional populations must be examined to fully understand the characteristics of Td.

Contributions of the Breeding Program for the Genus Thujopsis
Both varieties, Td and Th, are essential elements of natural forests in Japan, and logs from natural forests have historically been used for timber and other wooden materials. However, at present, Th is more important than Td in forestry. Plantations of Th are active mainly in Aomori, Niigata, and Ishikawa Prefectures (see Table 1) [10]. In Aomori, the demand for Th has recently increased. As a result, seed orchards were established in the beginning of 2003 [61]. In Niigata, plus trees on Sado Island were selected in 1989, and a seed orchard was established on Honshu island in 2009 [62]. Plantations in Ishikawa were established using rooted cuttings or saplings created by layering, and the majority of trees in these plantations are clones of 14 Th cutting/saplings [10]. In the Niigata and Ishikawa plantations, particular attention to interactions between natural populations through pollen flow and hybridization with Td is required. Th is found on Sado Island, and both Td and Th are distributed in the closest area on Honshu island (Figure 1). Similarly, the Suzu (SZ) population in Ishikawa, the present study site, is classified as Th; however, Td is distributed in neighboring areas. This study showed that these varieties are genetically different, and there is a risk that the seed orchard in Niigata may unintentionally produce seeds derived from hybridization between the variants. Furthermore, Th plantations in Ishikawa might risk introducing pollen of different variants into the surrounding natural Td forests. Since the breeding program of Th in Niigata and Ishikawa started relatively recently, the results of the present study could be useful for ongoing and future breeding programs, especially for the proper development and maintenance of seed orchards.
The results of the present study suggest that Td and Th can be distinguished by EST-SSR. As mentioned previously, distinguishing among varieties may be an important task in the future for breeding within the genus, Thujopsis. Several loci containing alleles with characteristic frequencies for Td and Th, respectively, were identified (Table A3). A combination of four loci, Tdest24, 39, 42, and 56, maintain a 100% bootstrap probability between varieties when constructing a neighbor-joining tree. Therefore, an analysis of these four loci is sufficient for the identification of the varieties.

Conclusions
Evidence from EST-SSR markers suggested that the two variants of the genus Thujopsis, Thujopsis dolabrata (L.f.) Siebold et Zucc. var. dolabrata (Td) and Thujopsis dolabrata (L.f.) Siebold et Zucc. var. hondae Makino (Th), were clearly distinct, as assessed using structure analysis and a neighbor-joining tree. Using these techniques, the two varieties could be defined by their provenance, in spite of the ambiguous morphological differences between them. The relatively uniform values of genetic diversity among the Th populations suggest that refugia of Th may have been scattered throughout the Tohoku district. On the other hand, there was a tendency for genetic diversity to decrease from central to southern Honshu island in Td. Structure analysis and the neighbor-joining