Assessment of the Genetic Diversity in Forest Tree Populations Using Molecular Markers

Molecular markers have proven to be invaluable tools for assessing plant genetic resources by improving our understanding of the distribution and extent of genetic variation within and among species. Recently developed marker technologies uncover the extent of genetic variation in an unprecedented way through increased coverage of the genome. Markers have diverse applications in plant sciences, but certain marker types, due to their inherent characteristics, have also shown their limitations. A combination of diverse marker types is usually recommended to provide an accurate assessment of the extent of intra- and inter-population genetic diversity of naturally distributed plant species, on the basis of which proper conservation directives for species at risk of decline can be issued. Here, we review natural populations of forest trees specifically, summarizing published reports on the status of genetic variation in the pure species. In general, for outbred forest tree species, the genetic diversity within populations is larger than among populations of the same species, indicative of a negligible local spatial structure. Additionally, as is the case for plants in general, the diversity at the phenotypic level is also much larger than at the marker level, as selectively neutral markers are commonly used to capture the extent of genetic variation. Increasingly, however, nucleotide diversity within candidate genes underlying adaptive traits is studied for signatures of selection at single sites. This adaptive genetic diversity constitutes important potential for future forest management and conservation purposes.


Introduction
Forests constitute an integral part of the world's ecosystems, and approximately 80,000-100,000 different tree species are estimated to cover 31% of land area globally (Food and Agriculture Organization (FAO), United Nations). Especially in the developing world, people rely heavily on forest tree species for their livelihood, including the supply of wood-based fuels, as well as non-timber products for nutrition, health and income. Forest trees are largely undomesticated and highly heterozygous, due to their outcrossing breeding systems [1] and, therefore, have large effective population sizes [2]. Despite the high number of known species, only approximately 450 different forest tree species are actively part of a deliberate domestication process through tree improvement programs (FAO) [3]. The majority of the world's forests are natural forests (93%), with 12% designated as conservation forests. A major concern regarding forest health and resilience is the decline in forest genetic diversity, documented as early as 1967 (FAO conference). Genetic diversity serves several important purposes: (a) as a resource for tree breeding and improvement programs to develop well-adapted tree varieties and to enhance the genetic gain for a multitude of useful traits; (b) to ensure the vitality of forests as a whole through their capacity to withstand diverse biotic and abiotic stressors under changing and unpredictable environmental conditions; and (c) to sustain the livelihoods of indigenous and local communities that use traditional knowledge. Rich genetic diversity within and among forest tree species thus provides an important basis for maintaining food security and enabling sustainable development (FAO) [3].
Historically, for plant improvement, three major areas have been important for molecular marker applications: (a) the determination of genetic diversity within and among populations; (b) verification and characterization of genotypes; and (c) marker-assisted selection (MAS) [4]. In particular, for forest trees, which are outcrossing and largely undomesticated plant species, molecular markers have proven to be invaluable tools with applications in: (1) genetic conservation efforts through the identification of genetic diversity hotspots; (2) the assembly of breeding populations in newly developed and advanced breeding programs; (3) the monitoring and characterization of population dynamics and gene flow; (4) the proper delineation of species taxonomy for management issues associated with conservation; (5) the assessment of gene flow (pollen contamination) in seed orchards, the authentication of "controlled crossings", the assessment of inbreeding in breeding programs and studies of mating systems in non-industrial tree species; and (6) genetic fingerprinting in advanced breeding programs for quality control, to detect misidentified ramets in production and breeding populations [4]. Although tree breeding programs would benefit significantly from an early selection of clones with advantageous trait characteristics (particularly important for late-expressing wood quality traits [5]), MAS was deemed not feasible for forest trees with limited genetic marker coverage [6]. The main reasons for the infeasibility of MAS as a tool for forest tree improvement are characteristics specific to forest trees as compared to inbred agricultural crop plants: the polygenic nature of most of the economically important traits in forestry, the inconsistency in quantitative trait locus (QTL) marker linkages among families originating from large outcrossed breeding populations and the instability of QTLs from the same genetic material planted across different sites, due to strong genotype-by-environment (G × E) interactions. As highly efficient next generation SNP (single nucleotide polymorphism) genotyping platforms have become available, genome-wide selection approaches have become feasible for accelerating forest tree breeding [7,8]. Here, we review the status of genetic variability in forest trees as assessed by molecular markers.

Marker Types and Their Applications
We begin our review of the genetic diversity in forest tree species with a brief historical retrospective on the development of marker types that have been widely employed for studying genetic variability in plants in general. The first, and most easily accessible, types of plant characteristics are morphological markers, which can easily be monitored based on simple inheritance [9]. However, due to serious drawbacks with respect to dominance, the difficulty of distinguishing between multiple alleles or even between different loci [10,11] and variable trait expression due to environmental and developmental variation (G × E interaction), their use was substantially reduced with the advent of DNA marker technologies. Another marker type that played an important role in assessing genetic diversity in plants was isozymes [12,13]. Isozymes have a long history in genetic variability studies in forestry, where they were used to assess the genetic diversity present within natural forest stands [14,15] or to determine whether domestication practices had led to a reduction in diversity [16][17][18]. However, these biochemical marker assays are affected by plant phenological stage and are limited in availability, and therefore, they would never allow for a genome-wide scan of variability (only 0.1% of the total variation is detectable by this technique [19]). DNA-based markers, such as restriction fragment length polymorphisms (RFLPs) [20][21][22], offered an invaluable alternative. Finally, the possibility to rapidly amplify specific DNA fragments in vitro via polymerase chain reaction (PCR) [23] revolutionized the generation of molecular markers, leading to diverse sets of diagnostic DNA-marker systems with or without a priori sequence knowledge, such as random amplified polymorphic DNA (RAPD) [24], amplified fragment length polymorphism (AFLP) [25], simple sequence repeats (SSRs or microsatellites) [26], single nucleotide polymorphisms (SNPs) [27,28] and variations thereof [29,30]. Important issues relate to the reproducibility of the RAPD marker system [31]; to other limitations, such as the presence of null alleles in SSR assays, which may lead to underestimated heterozygosity [32], or the dominant nature of the RAPD and AFLP marker systems, where heterozygous individuals cannot be distinguished from dominant homozygotes; and, lastly, to the inexpensive generation of a vast abundance of highly polymorphic DNA markers to tackle genome-wide genetic diversity studies. Depending on the study focus, genetic markers were derived from nuclear or organelle sequences; for example, chloroplast- or mitochondrial-derived diagnostic markers [33][34][35], depending on the evidence of their maternal inheritance in the species, were used to trace back the colonization history of angiosperm forest tree species and conifers, respectively [36,37]. Although it has been known that variability within protein-coding regions is far lower than within non-coding genomic regions, due to lower mutation rates and purifying selection to maintain proper protein functions, the study of polymorphic sites within coding sequences has been deemed more relevant because of their putative functional associations and, in addition, the ease of their interspecific transferability for comparative genetic studies based on sequence conservation. Thus, a major focus in plant studies has been the development of genetic markers located within such coding regions for high-throughput analysis of many samples using the inexpensive detection method of PCR fragment length polymorphisms (e.g., eco-tilling to circumvent expensive Sanger resequencing of PCR products, as in the case of SNP detection and genotyping [38]); these approaches, however, still relied on laborious PCR optimizations (e.g., [39][40][41][42]). The substantial and almost exponential drop in whole genome sequencing costs, thanks to the 1000 Genomes Project, which stimulated the development of highly cost-efficient high-throughput technologies, has also provided the plant research community with unprecedented opportunities for affordable in-depth characterization of plant genomes, involving the genome-wide discovery of SSRs and SNPs and the detection of common, as well as rare, functional variants by next generation sequencing [43][44][45][46][47][48][49].

Assessment of Genetic Diversity
A number of evolutionary processes can impact the genetic diversity of natural populations. These are: (a) spontaneously arising mutations; (b) gene flow via migration; (c) inbreeding; (d) natural selection; (e) the Wahlund effect; and (f) random genetic drift [50]. Genetic drift introduces random changes in allele frequencies over generations and becomes important in finite populations and/or over a large number of generations. These random allele frequency changes can, over time, lead to allele fixation or extinction. Genetic drift thus represents a source of differences in genetic diversity among different populations. Gene flow, on the other hand, evens out among-population genetic differences, but increases genetic variation within populations, due to the introduction of new alleles. Selection influences within-population diversity, but the effects depend on the nature of the selection process (e.g., balancing selection). Furthermore, the effects of natural selection are interwoven with stochastic effects, such as genetic drift. Mutations can counterbalance the loss of allelic diversity; however, natural mutations are rare, and mutations that turn out to be harmful allelic variants are removed again by purifying selection. A population bottleneck causes a significant reduction in the effective population size and represents a major reason for the loss of allelic diversity, first through the loss of rare alleles, then through the successive loss of heterozygosity in the population [50]. Inbreeding and the presence of a subpopulation structure, where gene flow is prevented by habitat fragmentation (the Wahlund effect), both cause a loss of heterozygosity [50]. This, in turn, results in increased genetic diversity among populations.
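The effect of random genetic drift on allele frequencies can be illustrated with a minimal simulation sketch. It assumes an idealized Wright-Fisher population (constant size, random mating, non-overlapping generations, no selection, mutation or migration), which real forest tree populations will only approximate; the function names and parameter values are illustrative and not taken from the reviewed studies.

```python
import random

def simulate_drift(p0, n_diploid, generations, seed=0):
    """Track one allele's frequency under pure drift (Wright-Fisher sampling)."""
    rng = random.Random(seed)      # seeded so that a run can be repeated
    n_copies = 2 * n_diploid       # gene copies in a diploid population
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Every gene copy of the next generation is drawn at random from the
        # current allele pool (binomial sampling of 2N copies).
        count = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
        if p in (0.0, 1.0):        # allele lost or fixed: no further change
            break
    return trajectory

def expected_heterozygosity_after(h0, n_diploid, generations):
    """Expected decay of heterozygosity under drift: H_t = H_0 * (1 - 1/(2N))^t."""
    return h0 * (1.0 - 1.0 / (2 * n_diploid)) ** generations

if __name__ == "__main__":
    small = simulate_drift(0.5, n_diploid=10, generations=1000, seed=1)
    print("generations until loss or fixation:", len(small) - 1)
    print("final allele frequency:", small[-1])
    print("expected H after 20 generations, N = 10:",
          expected_heterozygosity_after(0.5, 10, 20))
    print("expected H after 20 generations, N = 1000:",
          expected_heterozygosity_after(0.5, 1000, 20))
```

In small populations the simulated allele is lost or fixed within a few dozen generations, whereas the expected heterozygosity of a large population is almost unchanged over the same period, which is consistent with the role of bottlenecks described above.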

Within-Population Genetic Variation Using Genotype Data
A gene is defined as polymorphic in the population when the frequency of its most common allele is below 95% [50]. Genetic diversity can be assessed by estimating the following parameters: the total number of different alleles in the population; the percentage of polymorphic loci; the mean number of alleles per locus; the allelic richness; the within-population genetic diversity, θ; the effective population size, N_e (derived from θ and the per-generation mutation rate, μ; for diploid nuclear loci, θ = 4N_eμ); the minor allele frequency (in the case of biallelic loci); the expected heterozygosity, H_E (the proportion of heterozygous individuals expected in the population for a given locus under Hardy-Weinberg expectations, which assume the random mating of genotypes); the observed heterozygosity, H_O; and the fixation index, F [50]. Genomic diversity is estimated by a genome-wide assessment of genetic diversity using a larger sample of randomly chosen loci. An estimate of the genome-wide genetic diversity in a population is then derived by averaging heterozygosity over the studied loci.
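Several of these parameters follow directly from genotype counts at a single locus. The sketch below shows one way to compute them for diploid genotypes; the function name, the input layout and the example data are hypothetical, the expected heterozygosity is computed as one minus the sum of squared allele frequencies without a small-sample correction, and F is taken as 1 - H_O/H_E.

```python
from collections import Counter

def locus_summary(genotypes, threshold=0.95):
    """Summarize one locus from diploid genotypes given as (allele, allele) tuples."""
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {allele: c / (2 * n) for allele, c in counts.items()}
    # Expected heterozygosity under Hardy-Weinberg proportions.
    h_exp = 1.0 - sum(p * p for p in freqs.values())
    # Observed heterozygosity: share of individuals carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Fixation index; undefined when the locus is monomorphic.
    f = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return {
        "n_alleles": len(freqs),
        "frequencies": freqs,
        "polymorphic": max(freqs.values()) < threshold,
        "H_E": h_exp,
        "H_O": h_obs,
        "F": f,
    }

if __name__ == "__main__":
    # Hypothetical sample: 4 AA, 2 Aa and 4 aa individuals.
    sample = [("A", "A")] * 4 + [("A", "a")] * 2 + [("a", "a")] * 4
    summary = locus_summary(sample)
    for key in ("n_alleles", "polymorphic", "H_E", "H_O", "F"):
        print(key, summary[key])
```

In this hypothetical sample, both alleles have a frequency of 0.5, so H_E is 0.5, while only two of ten individuals are heterozygous; the resulting positive F indicates a heterozygote deficit, as would be expected under inbreeding or hidden subpopulation structure. Averaging H_E over many such loci gives the genome-wide estimate mentioned above.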

Between-/Among-Population Genetic Variation Using Genotype Data
Differences in the genetic diversity between/among (sub-)populations are assessed based on the presence of significant allele frequency differences; widely applied metrics to estimate such "genetic differentiation" include, for example, F_ST [51,52], θ [53], R_ST [54], Φ_ST (Φ′_ST) [55,56], G_ST (G′_ST) [57,58], D_ST [57], H_ST [59] or D [60]. Some measures are marker-dependent: they assume either an infinite-allele or a stepwise mutation model and differ according to whether biallelic markers, multi-allelic markers or haplotype data are used in the analysis (F_ST; R_ST; Φ_ST). Moreover, the use of fixation measures to interpret genetic differentiation has been found to be problematic when the populations under study exhibit high genetic diversity/heterozygosity (cf. G_ST; [58,60]). For such cases, "standardized" genetic differentiation metrics have been suggested [56,58,60]; but see also the recent publication on the topic by Whitlock [61], who recommended the continued use of F_ST for intra-specific differentiation estimation when the mutation rate is small (relative to gene flow) and the use of Φ_ST and R_ST when the mutation rate is high (as in the case of SSRs). In any case, for the estimation of population divergence from genotypic data, freely available software packages within the R environment [62] implement these statistics (cf. "mmod"). Furthermore, genetic loci with allelic frequencies significantly different among populations and potentially under selection ("F_ST outlier loci") can be efficiently detected using multilocus scans that compare the patterns of nucleotide diversity and genetic differentiation (based on the distribution of empirical F_ST estimates conditioned on H_E) to the simulated genome-wide selectively-neutral genetic background [63,64].
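As a worked illustration of how such differentiation metrics relate to heterozygosity, the following sketch computes G_ST as (H_T - H_S)/H_T from per-population allele frequencies, together with a D statistic in the form commonly attributed to Jost. It is a simplified sketch: populations are weighted equally, no sampling-bias corrections are applied, the function names and example frequencies are hypothetical, and whether this D corresponds exactly to the D cited above is an assumption.

```python
def expected_het(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def differentiation(pop_freqs):
    """pop_freqs: list of {allele: frequency} dicts, one per population, at one locus."""
    k = len(pop_freqs)
    if k < 2:
        raise ValueError("at least two populations are required")
    alleles = set().union(*pop_freqs)
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(expected_het(f) for f in pop_freqs) / k
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    h_t = expected_het(mean_freqs)
    g_st = (h_t - h_s) / h_t if h_t > 0 else 0.0
    # D in the Jost form (assumption): ((H_T - H_S) / (1 - H_S)) * (k / (k - 1)).
    d = ((h_t - h_s) / (1.0 - h_s)) * (k / (k - 1)) if h_s < 1 else float("nan")
    return {"H_S": h_s, "H_T": h_t, "G_ST": g_st, "D": d}

if __name__ == "__main__":
    # Hypothetical biallelic locus in two strongly diverged populations.
    diverged = [{"A": 0.9, "a": 0.1}, {"A": 0.1, "a": 0.9}]
    print(differentiation(diverged))
    # The same locus in two populations with identical frequencies.
    similar = [{"A": 0.5, "a": 0.5}, {"A": 0.5, "a": 0.5}]
    print(differentiation(similar))
```

Because G_ST is bounded above by a quantity that shrinks as H_S grows, highly polymorphic markers can yield low G_ST values even for strongly diverged populations, which is the motivation for the standardized metrics discussed above. For real analyses, an established implementation such as the cited R package is preferable to this sketch.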

Sequence Divergence Using Sequence Alignment Data
Additional ways to look at genetic diversity and to study mutation and selection events within populations and between different populations involve the characterization of DNA sequences of genes, with nucleotide diversity as the specific study entity [65][66][67][68]. Widely used statistics and tests include nucleotide diversity, π [50]; Tajima's D, Fu's D, Fay and Wu's H and Zeng et al.'s E [69][70][71][72]; and the McDonald-Kreitman and HKA (Hudson-Kreitman-Aguadé) tests [73,74], respectively. Such tests are implemented in the freely available software package DnaSP [75]. The combination of results from such analyses has particular value for identifying past population size changes (population expansion or population bottleneck).
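To make two of these statistics concrete, the sketch below computes nucleotide diversity π, Watterson's estimator of θ and Tajima's D from a small set of aligned sequences, using the standard textbook formulas. It assumes equal-length sequences without gaps or missing data and treats every site as comparable; the function name and the toy alignment are hypothetical, and dedicated software such as DnaSP should be used for real data.

```python
from itertools import combinations
from math import sqrt

def sequence_diversity(seqs):
    """Return pi and Watterson's theta (both per site), S and Tajima's D."""
    n, length = len(seqs), len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    # k: mean number of pairwise differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # S: number of segregating (polymorphic) sites in the alignment.
    seg = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Constants of the variance term in Tajima's D.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg + e2 * seg * (seg - 1)
    # D is undefined without segregating sites or with a zero variance term.
    d = (k - seg / a1) / sqrt(variance) if variance > 0 else float("nan")
    return {"pi": k / length, "theta_w": seg / a1 / length, "S": seg, "tajima_d": d}

if __name__ == "__main__":
    # Toy alignment: four sequences, two singleton variants.
    alignment = ["ATGCATGC", "ATGCATGA", "ATGTATGC", "ATGCATGC"]
    print(sequence_diversity(alignment))
```

In the toy alignment both variants are singletons, so π falls below Watterson's θ and D is negative; an excess of rare variants of this kind is the pattern typically associated with population expansion or purifying selection, whereas positive values point towards a bottleneck or balancing selection.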

Forest Tree Population Diversity
One of the first comprehensive reviews on genetic diversity in forest tree populations was published by Hamrick and co-workers [76]. This early work summarized results based on isozymes and is especially valuable, as it compares long-lived forest trees with other life forms of plant species, in total comprising 662 different species with representatively high sample sizes for the analysis of the genetic diversity parameters. Long-lived, woody species showed the highest genetic diversity (including a significantly higher percentage of polymorphic loci and more alleles per locus) among all plant species. Specifically, the genetic diversity within populations was significantly higher (H_E = 0.15) than in all other plant life forms (H_E < 0.10). However, heterogeneity in genetic diversity exists among woody species taxa, and this is due to the different evolutionary histories of species. For example, species from smaller founder populations, small disjunct populations or those with past population bottlenecks generally show less genetic diversity. Alseis blackiana, Picea glauca, Robinia pseudoacacia and Pinus sylvestris showed high diversity. On the other side of the spectrum were Acacia mangium, Pinus resinosa, P. torreyana and Populus balsamea with very low diversity [76]. Other studies [77,78] identified additional species with low intra-population diversity: Ficus carica and Thuja plicata.
While most studies identified high intra-population variation, the diversity among populations of long-lived, woody tree species based on the G_ST estimate was, by contrast, significantly lower (G_ST = 0.08) than in the herbaceous and annual life forms (G_ST > 0.25) [76]. When woody angiosperms were compared to gymnosperms in terms of their intra-population genetic diversity, differences were not significant, yet the latter exhibited a significantly higher percentage of polymorphic allozyme loci, suggestive of a higher proportion of low frequency alleles in gymnosperm species [76]. Angiosperm species showed higher among-population genetic diversity (G_ST). Recent research on conifer genome evolution, which involved alignments of thousands of orthologous coding sequences for gymnosperms and angiosperms, respectively, showed, more specifically, an overrepresentation of non-synonymous substitutions in protein-coding genes for conifers compared to angiosperms [79], while the average synonymous mutation rate in angiosperms is significantly higher, suggestive of a higher number of fixed adaptive mutations in conifers. As expected, the extent of the geographical range had a significant impact on genetic diversity within species and among populations [76]. Geographically widespread species showed a significantly higher intra-population genetic diversity estimate compared to locally confined species, but the latter showed higher genetic diversity among populations [76]. However, the "non-significant" inter-population differentiation sometimes reported in these isozyme studies (see above) can misdirect conservation efforts. Other marker types that are able to cover a higher portion of the overall genetic variation (such as restriction fragment length polymorphisms of DNA) succeeded in uncovering significant among-population diversity in Pinus and Quercus, specifically with the application of organellar DNA markers (cf. [80,81]). Differing outcomes for isozyme and organellar DNA studies on population divergence are frequent and were even reported within the same sample, as for Argania spinosa (L.) Skeels, an important multi-purpose tree for local communities in Morocco [82]. It is also clear that variation at selectively neutral molecular markers commonly used to assess genetic diversity within or among populations may not covary with the phenotypic expression of a particular qualitative or quantitative trait of interest [29], such that population differentiation for adaptive traits (growth, morphology or fitness) is much higher than for isozymes, for example. In any case, the total allelic richness was identified as a more adequate directive than the H_E estimate for conservation purposes, and highly polymorphic marker types, such as SSRs or DNA sequence-based data, are required for an accurate estimate [82]. A recent study integrating molecular genetic analysis based on four SSR and five sequence loci along with climate modeling [83] forecasted the long-term decline of the late-successional Australian rainforest conifer, Podocarpus elatus, in its southern populations, due to habitat fragmentation (and the decline in N_e), for which conservation strategies are now called for. Isozyme markers (15 loci) were used to characterize the genetic diversity of Carapa procera, which occurs at low density within a tropical rain forest [15]. The species was characterized by high within-population diversity (comparable to temperate gymnosperms), high heterozygosity and a lack of spatial structure, consistent with its highly outcrossing nature, which leads to extensive pollen-mediated gene flow that prevents local genetic differentiation. When 63 SNPs (surveyed by eco-tilling) in nine different genes with broad functional properties were targeted to understand DNA variation in a small sample panel of western black cottonwood (Populus trichocarpa) from 41 wild populations [40], heterozygosity was found to be high (H_O = 0.47), while overall nucleotide diversity at the gene level (π = 0.0018) among populations was low. Similarly low average π values of the segregating sites were obtained for other forest tree species, such as P. nigra (π = 0.0024; [84]) and Pinus sylvestris (π = 0.0025; [85]). Much higher overall nucleotide diversity levels in a conifer were uncovered for P. taeda (π = 0.00398; [86]). Among the studied poplars, interestingly, the European species, P. tremula, showed the highest nucleotide diversity (π = 0.007 [87] or even π = 0.0111 [88], depending on the surveyed genes), but differences in diversity were also consistent with its different and complex demographic history. However, nucleotide diversity is best interpreted on a gene-by-gene basis, as population history and selection affect individual genes differently [40,89]. In a similar context, assessing the adaptive genetic diversity in forest trees is important to harness this adaptive potential for future forest management and conservation purposes [90]. Candidate genes underlying a specific trait of interest are typically selected (cf. nine candidate genes for bud burst in Quercus petraea: π = 0.00615 [91]; 121 candidate genes for cold hardiness in Pseudotsuga menziesii var. menziesii: π = 0.004 [92]; 13 candidate genes for drought stress in Pinus pinaster: π = 0.00548 [64]). While most of this detected variation was largely attributed to purifying selection (an excess of nucleotide diversity at synonymous vs. non-synonymous sites), as commonly observed in forest trees, patterns of strong diversifying selection in candidate genes were also uncovered [64].

Conclusions
This review summarizes the major molecular marker types that have been developed to replace the more problematic phenotypic markers used in plants at the infancy of genetic diversity studies. While we touched on their most important applications and showed how broad such applications have been, we focused specifically on forest genetic diversity studies. We emphasize that integrative approaches using future climate modeling have been very successful in uncovering potential threats of declines in the genetic diversity and the distribution of forest tree species, so that timely precautions to preserve the species can be undertaken. With the substantial drop in whole genome sequencing costs making the sequencing of genetically complex organisms more affordable, inventorying the complete portfolio of genetic resources has become feasible. This will also open new avenues for the conservation of previously marginalized and undervalued forest tree species that are considered of less economic value, but nevertheless represent value to the local ecosystems. While the present review focused primarily on the genetic diversity assessed for pure species, we also stress the importance of investigating natural species hybrid zones as important sources of population genetic diversity in forest tree management [93,94]. Finally, we stress the value of integrating knowledge on adaptive complex traits as a companion to molecular markers for making informed management and conservation decisions.