Comparative Genome and Transcriptome Analysis Reveals Gene Selection Patterns Along with the PaleoClimate Change in the Populus Phylogeny

Poplars are widely distributed in the northern hemisphere and have good adaptability to different living environments. The accumulation of genome and transcriptome data provides a chance to conduct comparative genomics and transcriptomics analyses to elucidate the evolutionary patterns of Populus phylogeny. Transcript sequences of eight Salicaceae species were downloaded from public databases. All of the pairwise orthologues were identified by comparative transcriptome analysis in these species, from which we constructed a phylogenetic tree and estimated the rate of divergence. The divergence times of the phylogenetic clades were mainly estimated during the Middle Miocene Climate Transition (MMCT) to Quaternary Ice Age. We also identified all of the fast-evolving sequences of positive selection and found some resistance genes that were related to environmental factors. Our results suggest that drought-, H2O2and cold-stress genes are involved in positive selection along with the paleoclimate change. These data are useful in elucidating the evolutionary patterns and causes of speciation in the Populus lineage.


Introduction
Poplars (Genus Populus) are widely distributed in the northern hemisphere, ranging from subtropical to northern forests and are the most important source of wood in forests.Populus has become an excellent model plant along with the completion of genome projects Populus trichocarpa Torr.& A.Gray ex.Hook.[1], Populus euphratica Oliv.[2] and Populus pruinosa Schrenk [3].Several studies have focused on this genus, particularly with regard to its phylogenetic relationships [4,5], origin and speciation [6][7][8]; the timing of diversification events [9][10][11]; and environmental stress tolerance [12,13].However, no study explains how poplars have adapted to various climates during its long evolutionary history.
Transcriptome sequencing can rapidly and economically obtain all of the RNA information of organisms at one time, thus playing an important role in identifying molecular markers and functional genes for biology research [14,15].As the transcriptomes of more species have been generated, comparative transcriptomics has received more attention from researchers [16][17][18][19][20]. Comparative transcriptomics can explain the phylogenetic relationships based on multiple species, as well as determine the functional differences between orthologous genes after species divergence in different living environments.
In this study, the transcript sequences of seven Populus and one Salix species were downloaded from public databases (Table 1).Comparative genomics and transcriptomics were subsequently analyzed in eight Salicaceae species.A number of positive selection genes were found to be related to environmental factors in the Populus lineage.

Data Sources
To elucidate the evolutionary pattern of orthologues, the transcript sequences of seven Populus and one Salix (out-group) species were downloaded from the public databases (Table 1).Sequences of P. nigra [21], P. deltoides, P. tremula and P. tremuloides were obtained from PlantGDB [22] and sequence read archive (SRA) of national center for biotechnology information (NCBI) database.The cDNA sequences of P. trichocarpa (v3.1) [1], P. euphratica [2], P. pruinosa [3], and S. purpurea (v1.0) were downloaded from genome projects of GigaDB [23], NCBI and the United States department of energy joint genome institute (JGI) [24].SRA datasets with FASTQ format were filtered to remove raw reads of low quality.Transcriptome assembly was achieved using the short-read assembly program Trinity (version 2.3) [25].The assembled transcripts (≥300 bp) and expressed sequence tag (EST) sequences of PlantGDB were combined; Coding sequences (CDS) with amino acid sequences (≥50 bp) were extracted by OrfPredictor [26] and clustered with CD-HIT-EST (version 4.0) [27,28].Sequences with a clustering threshold of0.95 were divided into one class, and the longest sequence of each class was treated as a unigene during later processing.

Identification of Orthologues among Eight Salicaceae Species
OrthoFinder software (version 2.3.1)[29] was used to identify the putative orthologues among the eight species.Single copy sequences between every pair-wise species were then used to estimate the substitution rates in the subsequent analyses.Single copy sequences shared among all eight species were used to construct the phylogenic tree.The annotations obtained from the NCBI Non-redundant protein database (Nr) were processed through the BLAST2GO program (version 4.0) [30] to obtain relevant Gene Ontology (GO) terms and these were then analyzed by Web Gene Ontology Annotation Plot (WEGO) software (version 2.0) [31] to assign a GO functional classification and illustrate the distribution of the gene functions.A heatmap of orthologues was drawn using the R language (version 3.0).

Estimation of Synonymous and Non-Synonymous Substitution Rates
To remove the unigenes without open reading frames, pairwise orthologues were searched against plant protein sequences of GenBank using basic local alignment search tool (BLASTX) as previously described [32].Clustalw software (version 2.1) [33] was used to align the filtered pairwise orthologues, and the output files were formatted to the NUC format for subsequent analysis.The rates of synonymous substitutions (Ka) and non-synonymous substitutions (Ks) were estimated using phylogenetic analysis by maximum likelihood (PAML) software (version 4.7) [34].Using the fossil calibrations 45 million years ago (Mya) of genera Salix and Populus [10,11], the rate of substitution (r) was calculated based on the formula (T = K/2r, T is the time of divergence and K is the rate of non-synonymous substitutions Ks) and the average Ks value 0.121 between Salix and Populus.

Phylogenetic Analysis
There were still some inconsistencies in the phylogenetic relationship in previous studies [4,5].Single copy genes by OrthoFinder were aligned by multiple sequence comparison by log-expectation (Muscle, version 3.8) [35] and formated by Gblock [36]; the maximum-likelihood method was used to build the phylogenetic tree by molecular evolutionary genetics analysis (MEGA, version 6) [37] (bootstrap is 1000 and Kimura 2-parameter model).S. purpurea was used as an outgroup to root trees.
Forests 2019, 10, x FOR PEER REVIEW 3 of 11 substitution (r) was calculated based on the formula (T = K/2r, T is the time of divergence and K is the rate of non-synonymous substitutions Ks) and the average Ks value 0.121 between Salix and Populus.

Phylogenetic Analysis
There were still some inconsistencies in the phylogenetic relationship in previous studies [4,5].
Single copy genes by OrthoFinder were aligned by multiple sequence comparison by log-expectation (Muscle, version 3.8) [35] and formated by Gblock [36]; the maximum-likelihood method was used to build the phylogenetic tree by molecular evolutionary genetics analysis (MEGA, version 6) [37] (bootstrap is 1000 and Kimura 2-parameter model).S. purpurea was used as an outgroup to root trees.

Orthologue Identification and Functional Characterization among Eight Salicaceae Species
All of the pairwise orthologues were identified by the comparative analysis of eight Salicaceae species (Table 2).The results showed that P. trichocarpa has the highest average number (8198) of orthologous genes, whereas P. tremuloides has the lowest average number (5148).The highest number of orthologous genes (9687) was observed between P. trichocarpa and P. pruinosa, whereas the lowest (4339) was detected between P. tremuloide and S. purpurea.One-thousand-eight-hundred-and-thirtyfive shared orthologues were found among the eight Salicaceae species (Figure 2).The orthologues were functionally annotated using GO terms (Table S1 and Figure S1), and 339 orthologues were involved in biological processes (218), cellular components (113) and molecular functions (273).

Orthologue Identification and Functional Characterization among Eight Salicaceae Species
All of the pairwise orthologues were identified by the comparative analysis of eight Salicaceae species (Table 2).The results showed that P. trichocarpa has the highest average number (8198) of orthologous genes, whereas P. tremuloides has the lowest average number (5148).The highest number of orthologous genes (9687) was observed between P. trichocarpa and P. pruinosa, whereas the lowest (4339) was detected between P. tremuloide and S. purpurea.One-thousand-eight-hundred-and-thirty-five shared orthologues were found among the eight Salicaceae species (Figure 2).The orthologues were functionally annotated using GO terms (Table S1 and Figure S1), and 339 orthologues were involved in biological processes (218), cellular components (113) and molecular functions (273).S1).Sequence similarity was indicated with different colors from red (highly similar) to blue (weakly similar).S1).Sequence similarity was indicated with different colors from red (highly similar) to blue (weakly similar).

Phylogenetic Analysis and Divergence Time
Using S. purpurea as the outgroup, phylogenetic reconstruction of Populus was conducted based on 1835 orthologous transcripts using the maximum likelihood (ML) method (Figure 3).The observed phylogenetic relationship is highly consistent with the phylogenetic tree obtained from single-copy DNA sequences of a previous study [4].

Phylogenetic Analysis and Divergence Time
Using S. purpurea as the outgroup, phylogenetic reconstruction of Populus was conducted based on 1835 orthologous transcripts using the maximum likelihood (ML) method (Figure 3).The observed phylogenetic relationship is highly consistent with the phylogenetic tree obtained from single-copy DNA sequences of a previous study [4].The genetic distance of species was related to the synonymous mutation rate, which was calculated using the orthologous genes, so the synonymous mutation rates of all of the pairs of orthologues were estimated in the eight Salicaceae species (Table 2).The minimum Ks peak is about 0.01 between P. tremula and P. tremuloides and between P. euphratica and P. pruinasa.In addition, P. tricocarpa has a Ks peak (0.02) with P. nigra and P. deltoides; 0.04 Ks peak with P. tremula, P. tremuloides, P. euphratica and P. pruinasa; and the highest Ks peak 0.11 with the outgroup S. purpurea (Figure 4).In the phylogenetic tree, the average Ks value is 0.121 between the genera Populus and Salix (calculated by Table 2), which is highly consistent with the value of 0.12 in previous studies [9].Based on existing fossil evidence, the divergence time of the genera Salix and Populus was about 45 million years old (Mya) in the middle Eocene sediments [10,11].With this time as the separation of the two The genetic distance of species was related to the synonymous mutation rate, which was calculated using the orthologous genes, so the synonymous mutation rates of all of the pairs of orthologues were estimated in the eight Salicaceae species (Table 2).The minimum Ks peak is about 0.01 between P. tremula and P. tremuloides and between P. euphratica and P. pruinasa.In addition, P. tricocarpa has a Ks peak (0.02) with P. nigra and P. deltoides; 0.04 Ks peak with P. tremula, P. tremuloides, P. euphratica and P. pruinasa; and the highest Ks peak 0.11 with the outgroup S. purpurea (Figure 4).
Using S. purpurea as the outgroup, phylogenetic reconstruction of Populus was conducted based on 1835 orthologous transcripts using the maximum likelihood (ML) method (Figure 3).The observed phylogenetic relationship is highly consistent with the phylogenetic tree obtained from single-copy DNA sequences of a previous study [4].The genetic distance of species was related to the synonymous mutation rate, which was calculated using the orthologous genes, so the synonymous mutation rates of all of the pairs of orthologues were estimated in the eight Salicaceae species (Table 2).The minimum Ks peak is about 0.01 between P. tremula and P. tremuloides and between P. euphratica and P. pruinasa.In addition, P. tricocarpa has a Ks peak (0.02) with P. nigra and P. deltoides; 0.04 Ks peak with P. tremula, P. tremuloides, P. euphratica and P. pruinasa; and the highest Ks peak 0.11 with the outgroup S. purpurea (Figure 4).In the phylogenetic tree, the average Ks value is 0.121 between the genera Populus and Salix (calculated by Table 2), which is highly consistent with the value of 0.12 in previous studies [9].Based on existing fossil evidence, the divergence time of the genera Salix and Populus was about 45 million years old (Mya) in the middle Eocene sediments [10,11].With this time as the separation of the two In the phylogenetic tree, the average Ks value is 0.121 between the genera Populus and Salix (calculated by Table 2), which is highly consistent with the value of 0.12 in previous studies [9].Based on existing fossil evidence, the divergence time of the genera Salix and Populus was about 45 million years old (Mya) in the middle Eocene sediments [10,11].With this time as the separation of the two lineages and K = 0.121, the rate of substitution (r) was calculated to about 1.29 × 10 −9 per site and year, which is very close to the previous value of 1.28 × 10 −9 [9].
Phylogenetic reconstruction of genus Populus, mainly consists of four sections, namely, Turanga, Populus, Aigeiros, and Tacmahaca.Using the fossil calibrations (45 Mya) of genera Salix and Populus [10,11], the divergence times were estimated to be about 19.9 Mya, 18.9 Mya, and 10.0 Mya, respectively, corresponding to the three separated nodes in the Populus phylogeny.The minimum divergence time was about 9.4 Mya between P. nigra and P. deltoides; P. tremula and P. tremuloides; and P. euphratica and P. pruinasa.

Evolutionary Pattern of Populus spp. Genes
The Ka/Ks rate of orthologous genes could reflect the evolutionary pattern of species.Ka/Ks > 1 indicates that the gene underwent positive selection during evolution.The fast-evolving sequences of positive selection were identified in the genus Populus (Table 3), and some resistance genes were found to be related to environmental factors (Table 4).In section Populus, 301 positively selected genes were identified between P. tremula and P. tremuloides, which included one salt stress gene (Nr annotation: P93209.1)by producing the 14-3-3 protein [38][39][40], one drought stress gene (Nr annotation: BAB68268.1)by producing the drought inducible 22 kD protein, and one light stress gene (Nr annotation: NP_565524.1)by producing the SEP protein.

Paleoclimate Changes during the Divergence of Populus Phylogeny
The divergence time of the genera Populus and Salix was about 45 Mya in the middle of the Paleogene period (66-23 Mya) [53][54][55].During the Paleogene period, the global climate went against the hot and humid conditions of the late Mesozoic era and began a cooling and drying trend [56].As the Earth cooled, tropical plants were restricted to equatorial regions and decreased in number.Deciduous plants became more common as these could survive seasonal climates, during which Populus and Salix diverged.
Miocene  was the main period of section divergence in the Populus phylogeny (Figure 3).The divergence times of the four sections Turanga, Populus, Aigeiros and Tacmahaca were respectively 19.9 Mya, 18.9 Mya and 10.0 Mya, corresponding to the beginning and end of the Miocene period.During this period, the climate slowly cooled towards a series of the ice age.Although a long-term cooling trend was well underway, there is evidence of a warm period from 21 to 14 Mya, which was named the Middle Miocene Climate Transition (MMCT) [56], during which sections Turanga and Populus diverged (19.9 and 18.9 Mya).Then, global temperatures decreased and some species became extinct by 14 Mya [57][58][59]; thus plants and animals had to migrate or adapt to survive.In particular, the climate sharply cooled by 8 Mya and this formed the Quaternary Ice Age (2.6-0.1 Mya) [60].The climate change during the MMCT to Quaternary Ice Age may play an important role in the divergence of the Populus phylogeny.

H 2 O 2 -, Drought-Stress Genes and Speciation of Sections Turanga and Populus
The divergence times of sections Turanga and Populus were about 19.9 Mya and 18.9 Mya during the MMCT, during which the climate was warming.P. euphratica and P. pruinasa of section Turanga are mainly distributed in the deserts of Northern Africa and western China (Table S2).In previous studies, comparisons of the transcriptomes of P. euphratica and P. pruinasa suggest that these may have caused enough genetic divergence and helped them to adapt to these different desert habitats [12,13].Our results show that the H 2 O 2 stress gene was generally identified to be involved in positive selection between section Turanga and the other three sections.The H 2 O 2 stress gene can help plants develop abiotic resistance to adapt to the complex environment.It is suggested that selective evolution of H 2 O 2 stress genes should play an import role in the speciation of section Turanga.Meanwhile, P. tremula and P. tremuloides of section Populus are mainly distributed in the cooler and drought region of Northern America, Europe and Asia (Table S2).Our results show that drought stress genes are involved in positive selection between P. tricocarpa (section Tacmahaca) and P. tremula (section Populus), and P. deltoides (section Aigeiros) and P. tremula (section Populus).Speciation of the section Populus might be related to selective evolution of the drought stress genes during the MMCT.

Cold-, Salt-Stress Genes and Speciation of Sections Tacmahaca, Aigeiros
Previous research had suggested that Pleistocene (1.8-0.1 Mya) glacial cycles acted as drivers of speciation of Populus balsamifera and P. tricocarpa [8].The same results were found in Amborella trichopoda, that two main genetic groups of Amborella were shaped by the divergence of two ancestral populations during the last glacial maximum [61].In our work, cold stress genes were found to be involved in positive selection between P. nigra (section Aigeiros) and P. tricocarpa (section Tacmahaca), and P. deltoides (section Aigeiros) and P. tricocarpa (section Tacmahaca).In addition, their divergence time was about 10.0 Mya-9.4Mya at the beginning of cooling period.Our results show that the start time affected by the cooling climate may be earlier than the Quaternary Ice Age (2.6-0.1 Mya) during the divergence of the Populus phylogeny.P. tricocarpa of section Tacmahaca was mainly distributed in coastal western North America.Our results show that salt stress genes were involved in positive selection between P. tricocarpa and P. nigra, between P. tricocarpa and P. deltoides.This may be related to different geographical distribution of these Populus species.

Conclusions
In this study, we completed the comparative analysis based on transcript sequences of eight Salicaceae species from public databases.All the pairwise orthologues were identified in these species, from which we constructed a phylogenetic tree and estimated the rate of substitutions.The divergence times were estimated by the comparative transcriptomic analysis, and this suggested the speciation of Populus was involved in the period from the MMCT to Quaternary Ice Age.Furthermore, a number of positive selection genes were found to be related to environmental factors.In particular, cold-, salt-, drought-and H 2 O 2 -stress genes may be the driving force of species formation in the Populus phylogeny.The study shows that the paleoclimate change and selective evolution played an important role in the divergence of Populus phylogeny.

Figure 1 .
Figure 1.Distribution of coding sequences (CDS) length of eight Salicaceae species.

Figure 1 .
Figure 1.Distribution of coding sequences (CDS) length of eight Salicaceae species.

Figure 2 .
Figure 2. Divergence between orthologues of seven Populus and one Salix species.The heat map is drawn based on the combined sequences of 1835 putative orthologues in eight species.The orthologues were annotated to a different function with Gene Ontology (GO) (TableS1).Sequence

Figure 2 .
Figure 2. Divergence between orthologues of seven Populus and one Salix species.The heat map is drawn based on the combined sequences of 1835 putative orthologues in eight species.The orthologues were annotated to a different function with Gene Ontology (GO) (TableS1).Sequence similarity was indicated with different colors from red (highly similar) to blue (weakly similar).

Figure 3 .
Figure 3. Phylogenetic tree of seven Populus and one Salix species.Phylogram derived from multiple sequence alignments based on the combined 1835 orthologous transcripts using maximum likelihood (ML) method (Sequences in File S1).Paleoclimate change is presented by different colors.

Figure 4 .
Figure 4. Distribution of the rate of non-synonymous substitutions (Ks) values of orthologous pairs between P. trichocarpa and others.

Figure 3 .
Figure 3. Phylogenetic tree of seven Populus and one Salix species.Phylogram derived from multiple sequence alignments based on the combined 1835 orthologous transcripts using maximum likelihood (ML) method (Sequences in File S1).Paleoclimate change is presented by different colors.

Figure 3 .
Figure 3. Phylogenetic tree of seven Populus and one Salix species.Phylogram derived from multiple sequence alignments based on the combined 1835 orthologous transcripts using maximum likelihood (ML) method (Sequences in File S1).Paleoclimate change is presented by different colors.

Figure 4 .
Figure 4. Distribution of the rate of non-synonymous substitutions (Ks) values of orthologous pairs between P. trichocarpa and others.

Figure 4 .
Figure 4. Distribution of the rate of non-synonymous substitutions (Ks) values of orthologous pairs between P. trichocarpa and others.
Main geograhoical distribution of seven Populus, Figure S1.GO classes of shared orthologues in eight Salicaceae species, File S1.Sequences of 1835 shared orthologues in eight Salicaceae species, File S2.Sequences of resistance genes in genus Populus.Author Contributions: Conceptualization, Y.-j.Z.; analysis, Y.-j.Z. and C.-z.H.; validation, Y.C. and H.Z.; writing-original draft preparation, Y.-j.Z.; writing-review and editing, Y.C. and H.Z.; funding acquisition, C.-z.H. Funding: This research was funded by Project of National Natural Science Foundation (31560211).

Table 2 .
Number and the rate of non-synonymous substitutions (Ks) peaks of orthologous genes in eight Salicaceae species.

Table 2 .
Number and the rate of non-synonymous substitutions (Ks) peaks of orthologous genes in eight Salicaceae species.

Table 3 .
Positive selection orthologues in genus Populus.

Table 4 .
Information of resistance genes involved in positive selection in genus Populus.Ppr respectively stand for P. tricocarpa, P. pruinosa.Sequences show in File S2.Nr: Non-redundant protein database; Ka: the rate of synonymous substitutions; Ks: the rate of non-synonymous substitutions.