Use of Genomic Resources to Assess Adaptive Divergence and Introgression in Oaks

: Adaptive divergence is widely accepted as a contributor to speciation and the maintenance of species integrity. However, the mechanisms leading to reproductive isolation, the genes involved in adaptive divergence, and the traits that shape the adaptation of wild species to changes in climate are still largely unknown. In studying the role of ecological interactions and environment-driven selection, trees have emerged as potential model organisms because of their longevity and large genetic diversity, especially in natural habitats. Due to recurrent gene ﬂow among species with different ecological preferences, oaks arose as early as the 1970s as a model for understanding how speciation can occur in the face of interspeciﬁc gene ﬂow, and what we mean by “species” when geographically and genomically heterogeneous introgression seems to undermine species’ genetic coherence. In this review, we provide an overview of recent research into the genomic underpinnings of adaptive divergence and maintenance of species integrity in oaks in the face of gene ﬂow. We review genomic and analytical tools instrumental to better understanding mechanisms leading to reproductive isolation and environment-driven adaptive introgression in oaks. We review evidence that oak species are genomically coherent entities, focusing on sympatric populations with ongoing gene ﬂow, and discuss evidence for and hypotheses regarding genetic mechanisms linking adaptive divergence and reproductive isolation. As the evolution of drought- and freezing-tolerance have been key to the parallel diversiﬁcation of oaks, we investigate the question of whether the same or a similar set of genes are involved in adaptive divergence for drought and stress tolerance across different taxa and sections. Finally, we propose potential future research directions on the role of hybridization and adaptive introgression in adaptation to climate change.


Introduction
Darwin's [1] account of the role of natural selection in speciation is arguably the most unifying principle in evolutionary biology. Yet remarkably, the mechanisms by which it may lead to reproductive isolation in the face of ongoing gene flow and the affected genes are still largely unknown [2][3][4][5]. Ecological speciation-the evolution of reproductive isolation between populations by adapting to different environments [2]-has in the past two decades become widely recognized as an important source of species diversity [6][7][8][9][10]. This perspective was accepted by many evolutionary biologists, building off the biological species concept even before evidence mounted in support of ecological speciation; but increasing evidence for environmental selection as separable from other divergence mechanisms [10] has rendered the concept of ecological speciation a unifier for speciation drivers as diverse as climate, resource competition, and predation [10]. It has become increasingly clear that the early focus on allopatric speciation, in which post-zygotic incompatibilities evolved in separated populations with no or limited gene flow [11,12], accounts for only a portion of the diversity of life [3,13]. To know the tree of life fully requires studying barriers to gene flow in populations that are not completely diverged and not yet reproductively isolated, before genetic changes contributing to reproductive isolation become confounded with other differences that accumulate after speciation [3].
One difficulty in studying genetic signatures of sympatric speciation has been technical: studying the maintenance of species integrity in the face of gene flow relies strongly on identifying enough molecular markers to tease apart the effects of genes that are driving divergence from those that are homogenized by gene flow [2,[14][15][16]. Leaning on this concept of speciation in the face of gene flow, Via & West [17] coined the term "genetic mosaic of speciation", based on evidence that divergent selection affects the genes of adaptive key traits while the rest of the genome can remain similar between species [18,19]. The related metaphor of "genomic islands of divergence" describes genomic heterogeneity in differentiation resulting from divergent selection [8,20]. Genomic regions under direct selection-the "islands"-or loci linked to them are expected to show signatures of relatively high interspecific differentiation in comparison to the genome as a whole. These "islands" form and then grow in size due to "divergence hitchhiking" [3,17], a reduction in gene flow in regions adjacent to selected genes that allows alleles involved in reproductive isolation to accumulate in the face of interspecific gene flow [3,8,21]. Due to divergence hitchhiking and a genomically localized reduction in interspecific gene flow, linkage disequilibrium is expected to be higher in those regions that are under divergent selection when compared to control regions [3,14,15,17].
While the role of environmental selection in speciation is widely recognized, the genetic mechanisms linking adaptive divergence and reproductive isolation are still unclear [2,[22][23][24]. The majority of adaptive divergence studies have been performed in model organisms [25,26]. Only a few "speciation" genes, especially causing hybrid inviability and sterility, have been identified, and mainly in model species [8,13,[27][28][29]. As a consequence, little is known about the locations and distribution of regions/loci involved in adaptive divergence and reproductive isolation [30], especially in plants [24,29,31].
The high genetic diversity in tree species, especially in their natural habitats, makes them particularly well suited to research into adaptive divergence in response to changing environments [32]. There are other advantages to choosing trees as models for the study of genome-wide adaptive variation, including their limited history of domestication; their large, open-pollinated native populations [33]; their predominantly random mating systems; and their large effective population sizes [34]. Advances in next-generation sequencing technologies and bioinformatics in the last 15 years have greatly impacted our understanding of forest tree diversity and biology [35]. Tree genomics has benefited from the sequencing of several whole genomes [36][37][38][39][40][41], which have enabled studies of evolutionary history [32] and the potential roles of tens of thousands of genes on diversification, adaptation, and tree biology generally [35,38].
Oaks have long been recognized as important models for understanding ecological controls on speciation and gene flow, as edaphic and climatic factors shape patterns of distribution and introgression [42][43][44][45][46]. This may make oaks particularly well suited to understanding the genomic mosaic of introgression. A recent population genomic study using whole-genome sequencing in oaks has confirmed that divergence between hybridizing species is limited to a subset of the genome, maintaining species integrity in the divergent portions while the rest of the genome is permeable to gene flow: high levels of differentiation (F ST > 0.8) were found at narrow regions distributed across the genome with an average width of 10 kb, indicating that selection in these regions counteracts the homogenizing effect of gene flow [47]. In this context, a question of interest to our understanding of the longevity and persistence of species is whether species evolved in allopatry or sympatry, and whether and in which period there was potential gene flow between them, including whether introgression has been ongoing through the lifespan of the species or initiated after secondary contact [48]. Because divergent selection in strictly allopatric species proceeds unaffected by homogenizing effects of gene flow, the genetic mosaic is less pronounced in species that have diverged in allopatry [8]. In this respect, too, oaks present themselves as an ideal study subject, as the high diversity of oaks growing in sympatry across most of North America and much of western Europe and east Asia serve as replicated natural experiments in oak diversification. At an estimated 435 species, Quercus is one of the most diverse and most widespread woody plant genera in the northern hemisphere [44,[49][50][51][52], and whole genomes of Quercus robur [53] and Q. lobata [54], as well as draft genome of Q. suber [41] position oaks as a model genus for understanding the evolution of reproductive isolation and divergence in the face of gene flow.

Oaks as a Model to Study Adaptive Divergence between Species
Oaks are important both economically and ecologically, not just for human use, but also as a shelter and food source for wild animals [55]. As far back as the colonization of early Homo sapiens of the Middle East and Europe [56], oaks are entangled in human history, symbolism, tradition, economics, and livelihoods [57][58][59][60][61]. Acorn findings in caves suggest that oaks fed humans in their early settlements and have long served as both a "famine food" and a chosen food, even when other options are available [59,62]. Oaks have also been valuable in cultural and religious life, from association with important gods (e.g., [58]) to becoming a national symbol for such modern countries as Germany, the UK, Poland, Portugal, and, in 2004, the USA [31]. In a review by Leroy et al. [31], the symbolism of oak trees and their significance for humans was connected with attributes of the oak genome. For example, longevity, cohesiveness, and robustness of oaks are three pillars on which the symbolism of oaks stands. As a genetic consequence of longevity, accumulation of heritable mutations is expected. However, Leroy et al. [31] summarized recent findings [38,63] that independently reported only small numbers of somatic mutations (17 and 46 singlenucleotide polymorphisms) at the whole-genome level. Their robustness was supported by Plomion et al. [38], revealing patterns of immune system diversification with an expansion of resistance (R) genes accounting for 9% of the gene catalogue [31]. Thus, the importance of oaks to humans is deeply rooted in their diversification history and biology.
Oaks have been described as a thorny problem [31] and a "worst case scenario" for the biological species concept [13], and they have been included among "botanical horror" taxa [64]. Oaks are problematic for classical biological or morphological species concepts because of weak interspecific boundaries and high intraspecific variation relative to interspecific differentiation [65][66][67][68][69][70]. Due to these observations, it has been suggested that speciation in oaks may be driven by ecological factors, as reflected in their occurrence in different micro-environments, rather than by strong post-zygotic reproductive isolation [71,72]. Oaks, however, are not unique in the difficulties they pose for delimiting species in light of hybridization. They do exhibit introgression, but many other taxa do as well. Perhaps more important is the fact that oaks are particularly well studied and extremely rich in species, with many occurring in sympatry and thus subject to multispecies introgression [42,73]. Oaks are, for these reasons, among the most famous syngameons (groups of species that maintain their distinctions despite sympatry and regular introgressive hybridization [73][74][75][76]).

The Reality of Oak Species and Oak Introgression
Species boundaries in oaks are often genetically and morphologically ambiguous [55,66,77,78]. Still, different species typically form genetically disjunct clusters [66,[78][79][80][81][82][83] and maintain ecological distinctions, with ecologically divergent oaks commonly growing together in the same forest but in different micro-environments [84][85][86][87]. Oaks hybridize readily within sections, but hybrids between different sections are exceptionally rare in botanical gardens and have not been recorded in nature [55,88,89], though ancient introgression has been detected between sections Quercus and Ponticae [90] and sections Quercus and Protobal-anus [91,92]. Good examples of species pairs that occur in sympatry but differ in local adaptations (for example to drought) include Q. rubra and Q. ellipsoidalis in North American red oaks (section Lobatae), Q. robur and Q. petraea in European white oaks (section Quercus) [93][94][95], and Q. virginiana and Q. geminata in the North American live oaks (section Virentes) [96]. These and many other species pairs are known to hybridize where their ranges overlap [70,77,97,98], with introgression shifting over time in many cases as climate and habitats change [45,46,99]. Assessing levels of hybridization is necessary in studying the effects of interspecific gene flow and divergent selection of species [100].
Until the 1980s, hybrids were characterized purely based on morphological characteristics such as leaf morphology [67,[101][102][103][104][105][106], and later based on genetic markers that helped distinguishing species of the same taxonomic section [81,100,107,108]. Some studies have combined morphometric and molecular approaches to identify hybrids [68,[109][110][111]. While understanding morphological and functional variation in combination with molecular variation is key to understanding how introgression shapes species evolution and ecology, molecular genetic markers are more useful for identifying hybrids and quantifying introgression, because molecular markers generally show higher discriminatory power and accuracy in species assignment than morphological characteristics and directly sample the genomic background that is most often the target of studies.
Hybridization rate and introgression can be estimated based on identification of hybrids and introgressive forms across species ranges or in sympatric stands using genetic assignment analyses in adult trees, seeds or seedlings; or using paternity analysis, which is limited to sympatric stands [89]. In either case, analyses are based on species-discriminating markers (fixed or nearly fixed within species; e.g., [75,112] or markers polymorphic in all species (e.g., [113])). Several studies of contemporary interspecific gene flow using paternity analysis have been conducted in multi-species stands of European white oaks [98,114,115]. A comparatively high level of interspecific gene flow and introgression was found in the contact zone of species in each example. Using a set of six polymorphic microsatellite loci and paternity analysis in 320 acorns from four oak species, Curtu et al. [98] found an overall hybridization rate of 35.9%, with first generation hybrids present at a rate of 8.4%. Similarly, by means of paternity analysis, a comparable rate of F 1 hybrids (9.0%) was found in 623 acorns by Lepais and Gerber [114] using 10 microsatellite markers in four oak species. Comparable overall proportions of hybrids between Q. petraea and Q. pubescens (26%) were revealed by paternity analysis of acorns by Salvini et al. [115]. Microsatellite markers were used also in contemporary gene flow studies in North American oaks [70,77]. Paternity analysis of Q. coccinea, Q. rubra, Q. velutina and Q. falcata demonstrated more than 20% of seedlings in two mixed-species stands to be potential hybrids [70]. Interspecific gene flow was also confirmed by Khodwekar and Gailing [77] based on parentage analysis of 466 seeds collected from 15 genetically assigned seed parents in sympatric stands of Q. rubra and Q. ellipsoidalis, which revealed that 3.9% of the seedlings (0-25% for individual seed parents) were derived from interspecific matings.
Genetic assignment analyses are more commonly used to estimate introgression rates. Analyzing five nuclear microsatellite markers with both multivariate discriminant and Bayesian clustering methods implemented in STRUCTURE [116], Valbuena-Carabana et al. [87] quantified genetic differentiation and introgression in 176 adult trees of Q. petraea and Q. pyrenaica. They found reliable minimum estimates of introgression between species to be relatively low, identifying only 8.5% of adult trees as putative first and later generation hybrids. Similar results were found in a Romanian oak stand comprising the four interfertile species Q. petraea, Q. robur, Q. pubescens, and Q. frainetto [80]. Using genetic assignment analysis in adult trees, introgression rates between pairs of species varied, with the highest value between Q. pubescens and Q. frainetto (16.2%) and the lowest value between Q. robur and Q. frainetto (1.7%). Comparatively low percentages of adult trees (2.1%-4.6%) and acorns (1.5%) with hybrid ancestry were found in mixed stands of Q. lobata and Q. douglasii [108,117]. While hybrids are often comparatively frequent in direct contact zones of interfertile species, they rarely occur in pure stands (e.g., [81]). Despite this preponderance of gene-conduits in the contact zones of species, we still observe long-term maintenance of species. A range-wide study of four interfertile oak species-Q. ellipsoidalis, Q. coccinea, Q. rubra, and Q. velutina-demonstrated that genetically intermediate individuals or putative F 1 hybrids were present in contact zones between species, where they were inferred to act as a bridge for recurrent gene flow [83]. The same four species were studied by Zhang et al. [118] using maternally inherited chloroplast DNA markers for the first time in red oaks. At three chloroplast microsatellite markers, a total of 23 haplotypes were observed. In all cases, neighboring interspecific population pairs shared haplotypes, indicating contemporary interspecific gene flow. Using a set of 58 unlinked SNPs from coding regions, Reutimann et al. [65] assessed the extent of admixture in three presumably mixed populations of the white oak species Q. robur, Q. petraea, and Q. pubescens across Switzerland. The Bayesian STRUCTURE method and a machine learning approach (support vector machine) recovered admixture levels of 35-36%. Levels of admixture varied across the species pairs, but the highest degree was detected in mixed stands of Q. petraea and Q. pubescens, indicating high levels of gene flow between these two species. Since no fixed alleles were recovered in any of the taxa, levels of interspecific heterozygosity could not be quantified. This may be a general issue in oaks, in which pure stands (lacking admixture) and fixed alleles are nearly impossible to find (though cf. [112], in which apparently fixed SNPs were identified using RAD-seq and then used to genotype range-wide samples in [75]).
The weight of evidence thus demonstrates two important facts. First, oak species cohere genetically. When DNA data are directed at oaks, for the most part they are found to cluster in genetic space (going back to the foundational study of [119]). The published exceptions we are aware of (e.g., [70,120,121]) involve relatively small numbers of microsatellites, and studies using larger numbers of markers on the same taxa (e.g., [66,75]) have generally found the species to separate in molecular genotypic space. Second, numerous studies (cited in paragraphs above) demonstrate that introgression is ongoing at a fairly high rate, and consequently that conduits for gene flow between species may be present on the landscape at rates of 9-20% or higher, depending on the species pair. Assuming that these rates are representative of the long-term histories of gene flow between oak species, these two findings in combination strongly suggest that oaks cohere genetically in spite of ongoing gene flow. Thus, the data support the existence of a syngameon of interconnected oak species, evolving both separately and together, as hypothesized in the mid-1970s [71,72].

Evidence for Selection against Hybrids
While the number of hybrids and introgressive forms seems to be low in adult trees [80,81,87,89], gene flow analyses suggest frequent interspecific gene flow [70,78,98,115], which may suggest selection against hybrids from seedling to adult stage. Thus, Curtu et al. [98] found a much higher proportion of hybrids in the seed generation as determined by paternity analysis (35.9%) when compared to the adult trees (20.1%) in a multi-species oak stand, suggesting selection against many or perhaps most hybrid genotypes may be a mechanism for the maintenance of species identity. On the other hand, Abraham et al. [117] found low numbers of hybrids in both adults (2.1%) and acorns (1.5%) in Q. lobata and Q. douglasii stands. It is important to note that these observational studies cannot distinguish between selection and stand-level historical changes in species composition and structure, climate, or management that might result in inter-generational differences in introgression rate. While they are strongly suggestive, direct evidence of selection against hybrids would require the assessment of hybrids and parental species over time, ideally from seed to juvenile and adult stage. To our knowledge, such long-term experiments have not been performed yet. In their absence, the sum of observational studies in independent mixed-oak stands showing consistently higher introgression rates in the acorn/seedling generation relative to the adults nonetheless gives credence to the hypothesis that selection against most but not all hybrid genotypes contributes to the observed genomic heterogeneity of introgression observed in adult trees.
Experimental evidence for selection against hybrids was provided by the comparison of non-germinated but viable seeds and germinated seedlings from a hybrid zone between Q. rubra and Q. ellipsoidalis in a greenhouse common garden experiment [122]. Using genetic assignment analyses at microsatellite markers, Gailing and Zhang [122] characterized and compared the number of hybrids and "pure" species between non-germinated acorns and seedlings of two interfertile sympatric species, Q. rubra and Q. ellipsoidalis. Hybrids and introgressive forms in non-germinated acorns showed a frequency of 43.6% while they had a much lower frequency of 9.3% in the seedlings, indicating early selection against hybrids possibly as result of intrinsic incompatibilities between species [122]. However, direct evidence for environment dependent selection against hybrids is still missing.

Distribution of Hybrids and Species Indicates Environmental Selection
Curtu et al. [80,98] found in a multi-species stand in central Romania that the white oak species Q. robur, Q. petraea, Q. pubescens, and Q. frainetto were distributed according to their soil preferences and environmental requirements, and that interspecific hybrids occurred in environmentally intermediate contact zones between species. This result is in accordance with several other findings in oaks [46,70,77,86] and a long history of expectations about the effect of "hybridization of the habitat" on distribution of plant hybrids [123,124]. For example, Khodwekar & Gailing [77] found that the distribution of species and hybrids was significantly associated with soil properties (i.e., soil water holding capacity, nutrient availability). Additionally, closely related oaks have been found to co-occur less frequently at the plot scale than distantly related oaks due to convergence on traits under strong environmental filtering [84,125]. The rapid evolution of fine-scale habitat differentiation seems, in fact, to be a hallmark of oak diversification [126] that contributes to the separation of potentially introgressing species both among and within forest stands.
These results suggest that pre-zygotic [97,100,114] and post-zygotic [98,122] isolation mechanisms play complementary roles in the maintenance of species identity in oaks. Findings of Gailing and Zhang [122] together with species distribution according to environmental requirements and observed lower hybrid frequencies in ancient hybrid zones [127,128] serve as indirect evidence for selection as an important post-zygotic isolation mechanism in oaks. Experimentally disentangling pre-zygotic and post-zygotic isolation mechanisms might include (1) observations of the number of produced seeds and seedling survival derived from intra-and interspecific crosses, accompanied by (2)

Adaptive Introgression
The introgression of potentially adaptive alleles can have a large impact on adaptation to different environmental conditions. Evidence for adaptive introgression of outlier alleles was provided by the comparison of hybrid numbers and outlier allele frequencies in sympatric and parapatric populations of Q. rubra and Q. ellipsoidalis [77,129]. In neighbouring parapatric populations of the two species, genetic assignment analysis indicated symmetric interspecific gene flow, but introgression of Q. ellipsoidalis-specific alleles at a species-discriminating outlier locus (CONSTANS-like 1) into Q. rubra occurred at a lower rate than the introgression of Q. rubra-specific alleles into Q. ellipsoidalis [129]. Sympatric populations in dry outwash plains exhibited a similar discrepancy between symmetrical genome-scale introgression and asymmetrical outlier-locus introgression. However, outlier introgression from the drought adapted species Q. ellipsoidalis into Q. rubra was more prevalent in these dry outwash plains [77], concordant with the predictions of adaptive introgression [130]. Different directions of outlier alleles introgression in sympatric and parapatric populations suggests shifting selection regimes on these outliers in different spatial and environmental contexts. As these two species have different environmental preferences and outlier allele introgression seems to be related to environmental contrasts, the maintenance of genetic differences between these two species is likely dependent at least in part on environmental selection [77].
Clinal variation consistent with the geographic variation of introgression from Q. robur into Q. petraea displayed by some genes associated with phenological divergence in Q. petraea indicated adaptive introgression in outlier regions [44]. Quercus robur-like alleles, for example SNPs located in two important genes controlling stomatal responses, RopGEF1 and PBL10 (=APK1b) [131,132], were found with higher frequency in Q. petraea populations growing at higher altitudes in cooler and/or wetter environments, indicating that introgressed alleles from Q. robur serve as a source of phenotypic and potentially adaptive variation in Q. petraea [44]. These findings mirror and move beyond the findings of an earlier study

Adaptive Introgression
The introgression of potentially adaptive alleles can have a large impact on adaptation to different environmental conditions. Evidence for adaptive introgression of outlier alleles was provided by the comparison of hybrid numbers and outlier allele frequencies in sympatric and parapatric populations of Q. rubra and Q. ellipsoidalis [77,129]. In neighbouring parapatric populations of the two species, genetic assignment analysis indicated symmetric interspecific gene flow, but introgression of Q. ellipsoidalis-specific alleles at a species-discriminating outlier locus (CONSTANS-like 1) into Q. rubra occurred at a lower rate than the introgression of Q. rubra-specific alleles into Q. ellipsoidalis [129]. Sympatric populations in dry outwash plains exhibited a similar discrepancy between symmetrical genome-scale introgression and asymmetrical outlier-locus introgression. However, outlier introgression from the drought adapted species Q. ellipsoidalis into Q. rubra was more prevalent in these dry outwash plains [77], concordant with the predictions of adaptive introgression [130]. Different directions of outlier alleles introgression in sympatric and parapatric populations suggests shifting selection regimes on these outliers in different spatial and environmental contexts. As these two species have different environmental preferences and outlier allele introgression seems to be related to environmental contrasts, the maintenance of genetic differences between these two species is likely dependent at least in part on environmental selection [77].
Clinal variation consistent with the geographic variation of introgression from Q. robur into Q. petraea displayed by some genes associated with phenological divergence in Q. petraea indicated adaptive introgression in outlier regions [44]. Quercus robur-like alleles, for example SNPs located in two important genes controlling stomatal responses, RopGEF1 and PBL10 (=APK1b) [131,132], were found with higher frequency in Q. petraea populations growing at higher altitudes in cooler and/or wetter environments, indicating that introgressed alleles from Q. robur serve as a source of phenotypic and potentially adaptive variation in Q. petraea [44]. These findings mirror and move beyond the find-ings of an earlier study that used range-wide genome scans of AFLPs in Californian red oaks [42] to show that the movement of introgressed loci beyond the natural range of their source species was predicted by climate, but without identifying the genes involved. With ongoing climate change, we may expect to find additional opportunities to study adaptive introgression. It is anticipated, for example, that Mediterranean oaks (Q. pyrenaica and Q. pubescens) will migrate north into the distribution area of temperate oak species (Q. petraea and Q. robur), providing a chance to study evolutionary processes in action at the present time [24]. Beginning these studies now may give us the opportunity to understand the roles of hybridization and adaptive introgression in climate change adaptation [47].

Oak Reference Genome Sequences
A full appreciation of all of the genetic components of variation underlying the capacity for adaptive divergence, speciation, reproductive isolation, and individual quantitative adaptive traits ultimately requires high-quality genome sequence assembly references [133]. Reference chromosome-scale genomes support investigations into variations in the coding sequence, chromosome structure at the macro (whole-genome) and micro (indel) levels, regulatory elements, non-coding RNAs, and positional effects (such as repetitive DNA environs). The approach initially available for assembling draft genome references was the de novo assembly of contigs and scaffolds, which for oak species included pedunculate oak, Q. robur (17,910 scaffolds) [53], and valley oak, Q. lobata (18,512 scaffolds) [54], both in the Quercus section of the genus and Q. suber (cork oak, (23,347 scaffolds) [41]) in the section Cerris. The Q. robur de novo assembly scaffolds were then anchored (ordered) by reference to a high-density genetic linkage map [134], providing chromosome-scale sequences for Q. robur covering 96% (716.6 Mb) of the haploid genome size [38]. The contiguous chromosome-scale genome for Q. robur contains 90% of the consensus set of 25,808 annotated protein-coding genes predicted in the de novo genome. Certain features of oak genomes, discovered from the chromosome-scale genome assembly of Q. robur, may impact studies on hybridization, introgression, and adaptation, including a high level (35.6%) of proximal tandem duplications of genes and inheritance of somatic mutations arising during the long lifespan of oak trees [38]. The extent to which tandem duplication of genes is consistent across the genus, or specific to lineages within oaks, will require chromosome-scale genome assemblies for representatives of the other sections of the oak genus. Recent technology advancements for enabling chromosome-scale genome assemblies, such as Hi-C and nanopore, are now widely in use. A contiguous chromosomescale genome assembly for northern red oak (Q. rubra), in the Quercus section Lobatae, has recently been completed and will soon be released on the Joint Genome Institute's Phytozome 13 (https://phytozome-next.jgi.doe.gov/, accessed on 28 April 2021). The northern red oak genome, and anticipated additional chromosome-scale oak genomes for oaks, will enable detailed comparative studies at the whole-genome level on pan-genome organization, phylogeny, maintenance of species integrity, hybridization, and adaptations among oaks genus-wide.

Genomic Distribution and Architecture of Differentiation
As already pointed out, selection may direct interspecific gene flow between sympatric oak species. However, outliers exist because effective gene flow is in at least some loci reduced, potentially accelerating the accumulation of alleles involved in reproductive isolation [21]. Genome scans in both European white oaks (section Quercus in part) and North American red oaks (section Lobatae), each of which is composed of ecologically divergent but interfertile oak species, have revealed strong divergent selection on only a few so-called outlier loci [14,44,79,135,136]. Outlier loci exhibit high allele-frequency divergence and are presumed in many cases to be affected by selection in the genomic region in which they occur, due either to direct selection or selection on closely linked loci. They are referred to as "outliers" because they exhibit differentiation that deviates significantly from neutral expectations [136]. Methods that are widely used to detect these loci are F ST -based neutrality tests, where the F ST values of individual loci are tested against a simulated null distribution that is derived from a neutral island model of migration [137]. Another method widely used is the Bayesian method as implemented in BAYESCAN [138]. Alternative methods use environmental variables for outlier detection, implemented in software such as BAYENV2 [139] and BAYESCENV [140]. There are some constraints of F ST outlier detection methods that should be considered, such as number of sampled demes [141] and population structure [142], and it does not display absolute differentiation, but differentiation relative to the total variation (see [143]). The menagerie of genomic tests for selection is the subject of a specialized literature (e.g., [144][145][146]) that is well beyond the scope of our review. In concert, these methods comprise a suite of tools, in some cases complementary, in others redundant. Overall, however, these methods are crucial to answering a question central to understanding the genomic architecture of differentiation and adaptive introgression in oaks and other species: which loci are exchanged between species, which are not, and under what spatial and environmental conditions does differentiation give way to introgression? Once identified, divergent loci can be localized in the genome using high resolution genetic linkage maps anchored to scaffolds of sequenced genomes [33,89,134].
Outlier loci have been identified for different oak species with contrasting ecological preferences, few of them showing very high interspecific differentiation as evidence for strong divergent selection. For example, one F ST outlier (FIR013, an EST-SSR [147]) was identified when screening four Q. rubra/Q. ellipsoidalis species pairs from three different geographic regions using nuclear microsatellite markers and expressed sequence tag EST-SSR markers [82,136]. FIR013 was detected as an outlier in all geographic regions and pairwise comparisons, and was almost fixed for alternative alleles in the two species with values of interspecific F ST ranging from 55% to 80%. The EST from which FIR013 was developed was annotated as CONSTANS-like (COL) gene, a candidate gene for flowering time and growth cessation in late summer [148]. CONSTANS-like genes are also involved in growth/development and phenology in other species [149,150], and it is thus possible that this gene may play a role in adaptive divergence between species through both ecological and phenological divergence [136]. Another outlier identified between Q. rubra and Q. ellipsoidalis included a histidine kinase 4-like gene (marker GOT021) [81], which was also detected as an outlier between Q. pyrenaica and Q. faginea [14]. Histidine kinase is shown to be an important part of a signaling cascade to effect stomatal closure in response to environmental and endogenous stimuli in Arabidopsis [151]. Additionally, GOT021 underlay a QTL for leaf shape variation in a Q. robur full-sib family [152]. The fact that it is an F ST outlier in different sections suggests that this marker may be involved in the maintenance of species integrity across a broad phylogenetic range of oaks as a result of parallel environmental selection. The genomic architecture of such loci associated with species differentiation may be non-random and under parallel divergence pressure across the genus. A study of genomic coherence in eight species of the eastern North American white oak syngameon, for example, found SNPs that are fixed or nearly fixed across species' ranges [75]. SNPs divergent between species map back to all 12 Quercus linkage groups (chromosomes) and are separated from each other by an average of 7.47 million bp. The proportion of divergent SNPs separated by <10,000 bp is significantly higher than the null distribution, suggesting that genome-wide patterns of divergence may be concentrated on chromosomes or in regions of the genome that reflect a higher-than-average history of among-species divergence.
Leroy et al. [44] performed genome-wide scans in Q. petraea populations sampled along latitudinal and elevational gradients to study local adaptation. Additionally, reference populations of other European white oak species were included (Q. robur, Q. pubescens and Q. pyrenaica) [47] to detect outliers potentially discovering footprints of divergence between these populations. To identify SNPs potentially subjected to selection within Q. petraea, XtX statistics (analogous to F ST but specifically corrected for covariance matrix of allele frequency between populations, accounting for neutral correlations of allelic frequencies [139]) were calculated. 761,554 SNPs deviating from neutral expectations distributed over all chromosomes were detected, which is 2.05% of all investigated SNPs. Outliers were strongly enriched in SNPs which differentiated Q. robur from Q. petraea: strong correlations between intraspecific XtX values in 18 Q. petraea populations and interspecific F ST between Q. robur and Q. petraea suggested that these SNPs are potentially under diversifying selection within and among species. Among the SNPs detected in Q. petraea association studies between genotype and environment/phenotype, 1331 SNPs were associated with temperature, 2932 with precipitation, and 1572 with leaf unfolding. The largest proportion of SNPs was significantly associated with both temperature and leaf unfolding, and those were highly enriched in SNPs strongly differentiated between Q. robur and Q. petraea. Subsequently, 167 unique candidate genes involved in different growth and developmental processes such as stomatal responses to water stress (e.g., ALY4, ALMT9, PBL10, PP2AA, RopGEF1, SUS3, SCD1) and some temperature associated genes acting as regulators of production of gibberellins were identified (e.g., GAMT2, ASPG1 or GA2ox) [44].
In a companion paper, Leroy et al. [47] searched for candidate genes in regions enriched in outliers in the same four European white oak species based on a large data set of 31 million SNPs. In total, 227 genes were identified, from which at least 32 genes were assigned to three major functional categories (genes underlying ecological preferences of the four species, genes involved in biotic interactions, and those involved in intrinsic barriers (for the list of genes see Table 2 in [47])). Among these 32 genes, 6 candidate genes were detected as outliers for all species pairs. Four of these genes are putatively involved in intrinsic barriers, two are involved in pollen development (Cycloartenol synthase 2), and two in photoreceptor and UV-B tolerance. One of these-UVR8-was also detected as a gene under positive selection in Juglans siligata [153]. The other two outliers have a putative role in ecological barriers (regulator of root growth and role in soil-water deficit stress). Four outlier candidate genes related to drought tolerance were present in all observed species pairs except Q. robur-Q. petraea [47], which is consistent with the higher drought tolerance of Q. pubescens and Q. pyrenaica than in these two species [154]. One of the genes-VRN1, detected in all species pairs except in Q. robur-Q. petraea-plays a role in acclimation to low temperatures, and has been identified as a candidate gene for cold tolerance in many other plant species [155]. In general, most outlier SNPs contributing to reproductive barriers differentiate southern from northern species, suggesting that these barriers are driven by climate preferences [47]. Another association study on Q. robur, Q. petraea, and Q. pubescens was conducted by Rellstab et al. [156], who identified common patterns of SNP-environment associations across species for seven genes among 95 targeted genes, suggesting a role in local adaptation [156]. Genes gigantea and galactinol synthase (GolS1) from this study were also detected by Alberto et al. [157] as candidate genes for bud burst in Q. petraea. Additional genes showing patterns of local adaptation across different oak and related species of the beech family can be found in a recent review [158].
Whole genome outlier screens across species pairs can be used in combination with association analysis within species sampled across environmental gradients ( Figure 2) to detect genes involved in both local adaptation within species and adaptive divergence between species. Such outliers, present in both analyses, are candidate genes for adaptive divergence and reproductive isolation between species [44,159]. Association analyses of genetic variation (single nucleotide polymorphisms, SNPs) with adaptive trait variation can be performed in segregating populations using QTL mapping approaches to detect associated genomic regions [79,134,160]. To narrow down these regions to individual genes, association mapping in unstructured populations such as provenance trials is required [133]. Additionally, environmental association analyses between SNP variation and environmental variables can be used to identify adaptive genetic variation [44,65,157,161]. Moreover, populations in extreme environments of the environmental cline can be selected for outlier analyses [33]. populations in extreme environments of the environmental cline can be selected for outlier analyses [33]. A final approach that has been illuminating in understanding the genomic architecture of differentiation in oaks is mapping phylogenetically informative loci back to the genome to determine whether (1) loci that recover introgression history are genomically clustered, following population-level studies of species divergence (e.g., [79]); or (2) loci that track divergence are replicated in different clades. Several studies have utilized phylogenomic datasets to explicitly model divergent vs. introgressive histories in oaks [69,90,99,[162][163][164][165][166][167]. Two studies have then mapped loci implicated in ancient introgression vs. divergence in the white oaks (Quercus section Quercus) back to the genome to characterize the genomic distribution of divergence history [49,163]. Both studies failed to find that introgressive loci were clustered on the genome, presumably because the long history of recombination has eroded linkage disequilibrium connecting loci in ancient introgression events. Moreover, one of the studies [49] demonstrated that the loci recovering the divergence history in lieu of the dominant introgression history for alternative regions of the white phylogeny are also not related to one another. Thus, they reject the hypothesis that there are particular genes or regions of the genome that define the oak phylogeny globally. The evolutionary history of oaks is a mosaic history.

Conclusions and Future Perspectives
As summarized above, many studies have dealt with hybridization in oaks, but the decreasing costs and flexibility of whole genome sequencing and reduced representation A final approach that has been illuminating in understanding the genomic architecture of differentiation in oaks is mapping phylogenetically informative loci back to the genome to determine whether (1) loci that recover introgression history are genomically clustered, following population-level studies of species divergence (e.g., [79]); or (2) loci that track divergence are replicated in different clades. Several studies have utilized phylogenomic datasets to explicitly model divergent vs. introgressive histories in oaks [69,90,99,[162][163][164][165][166][167]. Two studies have then mapped loci implicated in ancient introgression vs. divergence in the white oaks (Quercus section Quercus) back to the genome to characterize the genomic distribution of divergence history [49,163]. Both studies failed to find that introgressive loci were clustered on the genome, presumably because the long history of recombination has eroded linkage disequilibrium connecting loci in ancient introgression events. Moreover, one of the studies [49] demonstrated that the loci recovering the divergence history in lieu of the dominant introgression history for alternative regions of the white phylogeny are also not related to one another. Thus, they reject the hypothesis that there are particular genes or regions of the genome that define the oak phylogeny globally. The evolutionary history of oaks is a mosaic history.

Conclusions and Future Perspectives
As summarized above, many studies have dealt with hybridization in oaks, but the decreasing costs and flexibility of whole genome sequencing and reduced representation genomic sequencing make it possible to study the evolutionary implications of hybridization and introgression at much higher genomic and spatial resolutions today [24]. To better understand the genetic basis for the maintenance of species integrity in oaks in the face of interspecific gene flow and the roles of genes that distinguish species, additional studies observing outlier regions between sympatric oak species should be performed. Analyzing oaks with different adaptations, for example to drought, might reveal whether there is a higher number of genes associated with adaptive divergence for drought tolerance and reproductive isolation between the ecologically divergent oak species in outlier genomic regions than in shared regions. Comparison of outlier genes identified between species with different environmental requirements across different sections can reveal genomic divergence unique to oak sections, or parallel genomic divergence driven by natural selection. Such outlier genes or regions detected across sections are prime candidates for adaptive species divergence by environmental selection. Whole genome resequencing can be used for the detection of such outliers and in combination with environmental association analysis, genome wide association studies (GWAS), and QTL mapping, candidate genes for speciation processes involved in adaptive divergence within and between species can be identified.
The focus of this review is solely on oaks and their maintenance of species integrity despite ongoing gene flow, hybridization, and introgression. Oaks, however, have the potential to illuminate speciation patterns in complement with immensely different speciation models such as sunflowers [168] and irises [169], where hybridization is a major driver of speciation. By contrast with the other models, in which hybridization triggers speciation [170], oaks exhibit a facilitation of adaptation via hybridization [44,48,69]. Equally important, hybridization appears not to threaten species integrity in oaks, even though a limited part of the genome-potentially a very limited portion, composed of only small regions distributed across the genome-is responsible for maintaining species barriers [24]. Thus, oaks are an alternative model of species differentiation, maintenance, and conservation that are equally important to understanding the tangled ways of speciation.