Article
Peer-Review Record

Comparative Genome and Transcriptome Analysis Reveals Gene Selection Patterns Along with the Paleo-Climate Change in the Populus Phylogeny

Forests 2019, 10(2), 163; https://doi.org/10.3390/f10020163
by You-jie Zhao 1, Chang-zhi Han 2, Yong Cao 1 and Hua Zhou 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 26 December 2018 / Revised: 12 February 2019 / Accepted: 12 February 2019 / Published: 15 February 2019
(This article belongs to the Special Issue Effects of Climate Change and Air Pollutants on Forest Tree Species)

Round 1

Reviewer 1 Report

In this study, the authors performed a comparative genomics approach based on transcriptomic data publicly available, to study the Populus lineage evolution. They highlighted more than 20 genes related to abiotic tolerance under positive selection hypothesized as involved in the Populus evolutionary patterns.

The study addresses a topic of interest for the evolution of the Populus lineage and of plants overall. The MS is well presented and structured.

General remarks:

I think that the discussion section was treated shallowly and could be developed further. The phylogenetic analysis showed three Populus sections, so the authors could dig deeper here. For example, is there a link between section, climate and geographical origin (as briefly described in section 4.2)? In other words, how have these gene selection patterns and climate changes shaped the current Populus distribution and the structuring of intra-specific genetic diversity?

Or is there any indication that some sections were present in known paleo-refugia? If so, what is the selection pattern within the species concerned? I encourage the authors to compare their results with other species, for example Amborella (Tournebize et al., 2017. Two disjunct Pleistocene populations and anisotropic postglacial expansion shaped the current genetic structure of the relict plant Amborella trichopoda.)

The conclusion section lacks perspectives; I expected more emphasis on Populus evolutionary dynamics. As with the discussion, I encourage the authors to develop it further.

 Specific remarks

Line 45: It would be better to put Table 1 in the Materials and Methods section (2.1 Data sources).

Line 58: For intelligibility, a brief explanation of Nr would help.

Line 59: “..to obtain relevant GO terms.” No supplementary information is given on the metrics used to validate the GO terms (this must be added). Why not also include a figure highlighting the enriched GO terms, so that the choice of genes does not seem biased?

Line 67: Section 2.4 (Phylogenetic analysis) needs supplementary information on the chosen mutation model and how it was chosen (BIC/AIC?).

Line 99, Figure 2: The color heat distribution may seem counterintuitive, as one expects similar sequences to have hotter colors.

Line 120: The “formula” (T = K/2r) should be presented in the Materials and Methods section.

Line 161, Table 4: I have some difficulty understanding the difference between >1 and 1.7721 (for example), which to me is also > 1.

Author Response

Dear Editor and reviewers,

Thank you for your useful comments and suggestions on our manuscript. We have modified the manuscript accordingly, and detailed corrections are listed below point by point:

 For Reviewer 1

(1) Reviewer 1: The discussion section could be further developed. How have these gene selection patterns and climate changes shaped the current Populus distribution and the structuring of intra-specific genetic diversity? The authors should compare their results with other species, for example Amborella (Tournebize et al., 2017).

 Answer:

We added Table S2, “Main geographical distribution of seven Populus”.

We also revised the discussion in lines 192-193:

“P. euphratica and P. pruinosa of section Turanga are mainly distributed in the deserts of Northern Africa and western China (Table S2).”

We revise the discussion in lines 196-198:

“Our results show that the H2O2 stress gene was generally identified to be involved in positive selection between section Turanga and the other three sections. H2O2 stress gene can help plants to develop abiotic resistance to adapt to the complex environment.”

 We revise the discussion in lines 200-204:

“P. tremula and P. tremuloides of section Populus are mainly distributed in the cooler, drought-prone regions of North America, Europe and Asia (Table S2). Our results show that drought stress genes are involved in positive selection between P. trichocarpa (section Tacamahaca) and P. tremula (section Populus), and between P. deltoides (section Aigeiros) and P. tremula (section Populus). Speciation of section Populus might be related to selective evolution of drought stress genes during the MMCT.”

 We revise the discussion in lines 206-211:

“Previous research had suggested that Pleistocene (1.8–0.1 Mya) glacial cycles acted as drivers of speciation of Populus balsamifera and P. trichocarpa [7]. Similar results were found in Amborella trichopoda, in which two main genetic groups were shaped by the divergence of two ancestral populations during the last glacial maximum [58]. In our work, cold stress genes were found to be involved in positive selection between P. nigra (section Aigeiros) and P. trichocarpa (section Tacamahaca), and between P. deltoides (section Aigeiros) and P. trichocarpa (section Tacamahaca).”

 We revise the discussion in lines 214-217:

“P. trichocarpa of section Tacamahaca was mainly distributed in coastal western North America. Our results show that salt stress genes were involved in positive selection between P. trichocarpa and P. nigra, and between P. trichocarpa and P. deltoides. This may be related to the different geographical distributions of these Populus species.”

(2) Reviewer 1: The conclusion section lacks perspectives. As with the discussion, the authors are encouraged to develop it further.

We revised the conclusion in lines 222-227:

“The divergence times were estimated by comparative transcriptomic analysis, which suggested that the speciation of Populus took place in the period from the MMCT to the Quaternary Ice Age. Furthermore, a number of positive selection genes were found to be related to environmental factors. In particular, cold-, salt-, drought- and H2O2-stress genes may be the driving force of species formation in the Populus phylogeny. The study shows that paleoclimate change and selective evolution played an important role in the divergence of the Populus phylogeny.”

(3) Reviewer 1: Table 1 should be put in the Materials and Methods section (2.1 Data sources).

We revised Table 1 and moved it to the Materials and Methods section at line 60 (2.1 Data sources).

(4) Reviewer 1: For intelligibility, it would be better to have a brief explanation of Nr.

We revised the Materials and Methods in lines 65-66:

“The annotations obtained from NCBI Non-redundant protein database (Nr) were processed through the BLAST2GO program [27] to obtain relevant GO terms”

(5) Reviewer 1: No supplementary information is given on the metrics used to validate the GO terms (this must be added). Why not include a figure highlighting the enriched GO terms?

We added Table S1 (“GO annotation of shared orthologues in eight Salicaceae species”) and Figure S1 (“GO classes of shared orthologues in eight Salicaceae species”), and revised the results in lines 102-105:

“1835 shared orthologues were found among the eight Salicaceae species (Figure 2). The orthologues were functionally annotated using GO terms (Table S1 and Figure S1), and 339 orthologues were involved in biological processes (218), cellular components (113), and molecular functions (273)”

(6) Reviewer 1: The phylogenetic analysis section needs supplementary information on the chosen mutation model and how it was chosen (BIC/AIC?).

We revised the methods in lines 79-82:

“There were still some inconsistencies in the phylogenetic relationships reported in previous studies [3, 4]. Single-copy genes identified by OrthoFinder were aligned with MUSCLE [32] and formatted with Gblocks [33]. The maximum-likelihood method was used to build the phylogenetic tree in MEGA6 [34] (1000 bootstrap replicates, Kimura 2-parameter model). S. purpurea was used as an outgroup to root trees.”
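For readers unfamiliar with the Kimura 2-parameter model named in the revised methods, the following is a minimal sketch of the textbook pairwise distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transition and transversion differences. It is an illustration only: the function name and the toy sequences are hypothetical, and it does not reproduce the maximum-likelihood tree search that MEGA6 performed for the manuscript.

```python
import math

PURINES = {"A", "G"}
VALID = set("ACGT")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example (hypothetical sequences): one transition in ten sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```

The model distinguishes transitions from transversions but assumes equal base frequencies, which is why reviewers often ask how it was chosen over alternatives.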

(7) Reviewer 1: Figure 2: The color heat distribution may seem counterintuitive, as one expects similar sequences to have hotter colors.

We revised the color scheme of Figure 2 (Results, line 110).

(8) Reviewer 1: The “formula” (T = K/2r) should be presented in the Materials and Methods section.

We revised the formula (T = K/2r) and moved it to the Materials and Methods section, lines 75-77:

“Using the fossil calibrations (45 Mya) of genera Salix and Populus [9, 10], the rate of substitution (r) was calculated based on the formula (T = K/2r) and the average Ks value 0.121 between Salix and Populus.”
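As a worked illustration of the quoted calibration, the formula T = K/2r can be rearranged to r = K/(2T). The sketch below plugs in the two values given in the response (Ks = 0.121 and T = 45 Mya); the function names are hypothetical, and the resulting rate is simply what the formula yields from those inputs, not a figure quoted from the manuscript.

```python
def substitution_rate(k: float, t_years: float) -> float:
    """Rearrange T = K / (2r) to r = K / (2T)."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2.0 * t_years)


def divergence_time(k: float, r: float) -> float:
    """Apply T = K / (2r) to a pairwise Ks value and a calibrated rate."""
    if r <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * r)


# Values from the response: mean Ks of 0.121 between Salix and Populus,
# fossil calibration of 45 million years.
rate = substitution_rate(0.121, 45e6)  # about 1.34e-9 substitutions per site per year
```

Once r is fixed by the calibration, the same formula converts the Ks of any species pair into an estimated divergence time via `divergence_time`.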

(9) Reviewer 1: Table 4: I have some difficulty understanding the difference between >1 and 1.7721 (for example), which is also > 1.

The value was calculated as Ka/Ks. When Ks was 0 and Ka > 0, Ka/Ks is shown as >1; when neither Ka nor Ks was 0, Ka/Ks is shown as its numerical value.
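The reporting rule in this answer can be sketched as a small helper. This assumes the intended reading is that the ratio is undefined when Ks = 0 with Ka > 0 and is therefore shown as ">1"; the function name, the "NA" label for the case where both values are zero, and the four-decimal formatting are assumptions for illustration, not taken from the manuscript.

```python
def format_ka_ks(ka: float, ks: float) -> str:
    """Return the Table 4 style label for a Ka/Ks pair."""
    if ka < 0 or ks < 0:
        raise ValueError("Ka and Ks must be non-negative")
    if ks == 0:
        # Division by zero: report only that the ratio exceeds 1 when
        # nonsynonymous changes exist (assumed reading of the response).
        return ">1" if ka > 0 else "NA"
    return f"{ka / ks:.4f}"


labels = [format_ka_ks(0.01, 0.0), format_ka_ks(0.17721, 0.1)]
```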

I would like to resubmit this manuscript and hope it is acceptable for publication in the journal. If there are any problems or questions about our paper, please do not hesitate to let us know.

Thank you very much for your attention to our paper.

Sincerely yours,

Youjie Zhao

 Reviewer 2 Report

Manuscript Forests-425189 by Zhao et al. conducted comparative genome and transcriptome analyses for eight Populus species using publicly available data, in which the authors identified orthologs between species, constructed a phylogenetic tree and estimated the rate of divergence. Based on the analyses, the authors found a number of fast-evolving genes and positive selection genes, and explained the evolutionary patterns and cause of speciation in the Populus lineage.

My primary concern is that the data used do not appear to represent the species well. Data for some species seem to be under-represented (the number of transcript sequences is low). Is there any possibility that the transcript data from JGI, GigaDB and PlantGDB were generated and processed in different ways (e.g., assembly approaches, filtering)? The sequences from PlantGDB seem to be shorter than those from the other two sources. The total number of transcripts for some species is low, e.g., 8,186 (P. deltoides) and 5,730 (P. tremuloides). Is it true that the genomes/transcriptomes of those species are smaller and less complex than others?

To me, it looks like the data do not represent well the species used in this study. As a result, I think this has affected the results that this manuscript presents. For example, in the paragraph at line 89, it is clear that comparisons between species with fewer transcripts yielded fewer detected orthologs (for P. tremuloides, P. tremuloide and P. euphratica).

Some other minor points to consider:

1. Line 37: it is probably safer to say “generated” rather than “completely sequenced”, since a transcriptome is unlike a genome.

2. For Table 1: the authors could also consider providing N50 for all datasets.

3. Figure 1: The authors could consider presenting all the data in one single plot, to allow better comparison.

Author Response

Dear Editor and reviewers,

Thank you for your useful comments and suggestions on our manuscript. We have modified the manuscript accordingly, and detailed corrections are listed below point by point:

 For Reviewer 2

Reviewer 2: Data for some species seem to be under-represented (the number of transcript sequences is low). Is there any possibility that the transcript data from JGI, GigaDB and PlantGDB were generated and processed in different ways (e.g., assembly approaches, filtering)?

Answer:

We revised the data sources in Table 1 of the methods, and revised lines 49-58:

“Unigene sequences of P. nigra, P. deltoides, P. tremula and P. tremuloides were obtained from PlantGDB (http://www.plantgdb.org/prj/ESTCluster/progress.php) [20] and NCBI SRA (https://www.ncbi.nlm.nih.gov/sra/). The cDNA sequences of P. pruinosa, P. trichocarpa (v3.1), P. euphratica and S. purpurea (v1.0) were downloaded from genome projects of GigaDB (http://www.gigadb.org) [21], NCBI (https://www.ncbi.nlm.nih.gov/genome/) and JGI (https://genome.jgi.doe.gov/portal/) [22]. SRA datasets in FASTQ format were filtered to remove raw reads of low quality. Transcriptome assembly was achieved using the short-read assembly program Trinity [23]. The assembled sequences (>=300 bp) were combined and clustered with CD-HIT (version 4.0) [24, 25].”

Reviewer 2: Is it true that the genomes/transcriptomes of those species are smaller and less complex than others? This has affected the results that this manuscript presents.

We revised the data sources of Table 1 and obtained new results in lines 85-94:

“There were 35,447, 35,395, 54,527 and 37,290 annotated genes in the genomes of P. trichocarpa, P. pruinosa, P. euphratica and S. purpurea, respectively. These genes respectively made up a total of 37 Mb, 41 Mb, 74 Mb and 44 Mb cDNA sequences with Contig N50 of 1,500 bp, 1,668 bp, 1,785 bp and 1,581 bp. More than 8,153 (23%), 8,849 (25%), 16,903 (31%) and 8,950 (24%) cDNAs had the length of >1,500 bp in P. trichocarpa, P. pruinosa, P. euphratica and S. purpurea (Figure 1). In contrast, there were 62,740, 42,207, 50,902, and 45,263 unigenes in the transcriptomes of P. nigra, P. deltoides, P. tremula and P. tremuloides, which respectively made up a total of 55 Mb, 36 Mb, 40 Mb, and 38 Mb sequences, with Contig N50 of 1,344 bp, 1,285 bp, 1,167 bp, and 1,251 bp. In addition, more than 8,156 (13%), 5,064 (12%), 5,090 (10%), and 5,431 (12%) unigenes had the length of >1,500 bp in the transcriptomes of P. nigra, P. deltoides, P. tremula, and P. tremuloides.”

We revised all figures and tables based on the new data sources, and more orthologues were found in the results of lines 98-105:

“All of the pairwise orthologues were identified by comparative analysis of eight Salicaceae species (Table 2). The results showed that P. trichocarpa has the highest average number (8,198) of orthologous genes, whereas P. tremuloides has the lowest average number (5,148). The highest number of orthologous genes (9,687) was observed between P. trichocarpa and P. pruinosa, whereas the lowest (4,339) was detected between P. tremuloides and S. purpurea. 1835 shared orthologues were found among the eight Salicaceae species (Figure 2). The orthologues were functionally annotated using GO terms (Table S1 and Figure S1), and 339 orthologues were involved in biological processes (218), cellular components (113), and molecular functions (273).”

We rebuilt the phylogenetic tree based on 1835 orthologues in the results of lines 115-118:

“Using S. purpurea as outgroup, phylogenetic reconstruction of Populus was conducted based on 1835 orthologous transcripts using the ML method (Figure 3). The observed phylogenetic relationship is highly consistent with the phylogenetic tree obtained from single-copy DNA sequences of a previous study [3].”

More positive selection genes were also found in Table 3 and Table 4 (lines 145-147):

“The fast-evolving sequences of positive selection were identified in genus Populus (Table 3), and some resistance genes were found to be related to environmental factors (Table 4).”

 Reviewer 2: it is probably safer to say “generated” rather than “completely sequenced”, since transcriptome is unlike genome.

We revised the introduction in lines 37-38:

“As the transcriptomes of more species are generated, comparative transcriptomics has received more attention from researchers [15–19].”

Reviewer 2: For Table 1: the authors could also consider providing N50 for all datasets.

We revised Table 1 and added the Contig N50 values.
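For context on the statistic requested here, N50 is the length L such that sequences of length L or longer contain at least half of the total assembled bases. The sketch below uses that common definition; the function name and the toy lengths are hypothetical, and the authors' own tooling for computing Contig N50 is not stated in the record.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum
        # to half of the total bases or more.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy lengths: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because N50 weights long sequences heavily, it complements the minimum and mean lengths when comparing datasets assembled in different ways.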

Reviewer 2: Figure 1: The authors could consider presenting all the data in one single plot, to allow better comparison.

We revised Figure 1 to present all the data in one single plot.

I would like to resubmit this manuscript and hope it is acceptable for publication in the journal. If there are any problems or questions about our paper, please do not hesitate to let us know.

Thank you very much for your attention to our paper.

Sincerely yours,

Youjie Zhao

Round 2

Reviewer 2 Report

1. In the section “Data sources”, the authors collected data from two sources, PlantGDB and NCBI SRA, for four species (P. nigra, P. deltoides, P. tremula and P. tremuloides) and combined them for downstream analysis. What do the authors think about the very short sequences (<100 bp) from PlantGDB? I think the database also contains EST sequences, which are only parts of transcripts (very short, minimum 50 bp). Can the authors confirm this?

2. The authors downloaded FASTQ data from SRA and assembled them into contigs using Trinity (which I think is reasonable). However, it is not clear to me whether CD-HIT, used to reduce sequence redundancy, was run on the total combined data (Trinity contigs and PlantGDB sequences) or only on the Trinity contigs. If only the Trinity contigs were clustered, would it be better to combine them with the EST data first and then cluster using CD-HIT-EST? Do the authors think this could affect the results?

3. Should it be CD-HIT-EST? If the parameter “-c 0.95” was used, then in line 58, should it be “similarity ≥95%”?

4. For P. euphratica, the data in the previous version were derived from PlantGDB, whereas in this updated version the authors used only contigs assembled from the NCBI dataset. It is likely that the sequences from PlantGDB were derived from several experiments, while the data from NCBI could come from a single experiment/condition. I am not sure how this could affect the results. One possibility is that it would affect the number of common orthologs used in the phylogenetic analysis (compare the number of orthologs in the first version and in this version). Could the authors elaborate on this?

5. I think the authors should cite, in the data description, the publications from which the NCBI data accessions were derived.

6. Additionally, the data description should provide more details, e.g., the statistics for both datasets (Trinity and PlantGDB sequences) before combining and clustering. For Table 1, the authors should retain the column “min length (bp)” as in version 1 of this manuscript.

7. Figure 1: I think a line plot would be better than a bar plot, similar to what the authors have done in Figure 4. It is difficult to follow the bars for each species.

Author Response

Dear reviewers,

Thank you for your useful comments and suggestions on our manuscript. We have modified the manuscript accordingly, and detailed corrections are listed below point by point:

 For Reviewer 2

(1) Question: In the section “Data sources”, the authors collected data from two sources, PlantGDB and NCBI SRA, for four species (P. nigra, P. deltoides, P. tremula and P. tremuloides) and combined them for downstream analysis. What do the authors think about the very short sequences (<100 bp) from PlantGDB? The database also contains EST sequences, which are only parts of transcripts (very short, minimum 50 bp). Can the authors confirm this?

Answer:

Transcripts encoding amino acid sequences shorter than 50 residues were filtered out, so the minimum CDS length in Table 1 is 150 bp (50 codons of 3 bp each). This was revised and is shown in lines 57-59 of the methods.

“The assembled transcripts (>=300 bp) and EST sequences of PlantGDB were combined, CDS with amino acid sequences (>=50 bp) were extracted by OrfPredictor [26] and clustered with CD-HIT-EST (version 4.0) [27, 28].”

(2) Question: The authors downloaded FASTQ data from SRA and assembled them into contigs using Trinity (which I think is reasonable). However, it is not clear whether CD-HIT, used to reduce sequence redundancy, was run on the total combined data (Trinity contigs and PlantGDB sequences) or only on the Trinity contigs. If only the Trinity contigs were clustered, would it be better to combine them with the EST data first and then cluster using CD-HIT-EST? Do the authors think this could affect the results?

Answer:

Trinity contigs and PlantGDB sequences were combined together and clustered by CD-HIT-EST. It was revised and shown in the lines 57-60 of the method.

“The assembled transcripts (>=300 bp) and EST sequences of PlantGDB were combined, CDS with amino acid sequences (>=50 bp) were extracted by OrfPredictor [26] and clustered with CD-HIT-EST (version 4.0) [27, 28]. Sequences with clustering threshold 0.95 were divided into one class, and the longest sequence of each class was treated as a unigene during later processing.”

(3) Question: Should it be CD-HIT-EST? If the parameter “-c 0.95” was used, then in line 58, should it be “similarity ≥95%”?

Answer:

Yes, the sequences were clustered by CD-HIT-EST, and the option -c (clustering threshold) was set to 0.95. This was revised and is shown in lines 58-60 of the methods.

“CDS with amino acid sequences (>=50 bp) were extracted by OrfPredictor [26] and clustered with CD-HIT-EST (version 4.0) [27, 28]. Sequences with clustering threshold 0.95 were divided into one class, and the longest sequence of each class was treated as a unigene during later processing.”
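The post-processing described in this answer (keep sequences above a length cutoff, cluster at a 0.95 threshold, treat the longest member of each cluster as the unigene) can be sketched as follows. This is an illustration under stated assumptions: the helper names, file names and toy records are hypothetical, the clustering itself is delegated to CD-HIT-EST rather than reimplemented, and only the `-i`, `-o` and `-c` options of `cd-hit-est` are used. The command is built but not executed here.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (sequence id, nucleotide sequence)


def filter_by_length(records: List[Record], min_len: int = 300) -> List[Record]:
    """Keep sequences of at least min_len bp (300 bp in the response)."""
    return [rec for rec in records if len(rec[1]) >= min_len]


def cd_hit_est_command(fasta_in: str, fasta_out: str, identity: float = 0.95) -> List[str]:
    """Build (but do not run) a cd-hit-est call with identity threshold -c."""
    return ["cd-hit-est", "-i", fasta_in, "-o", fasta_out, "-c", str(identity)]


def pick_unigenes(clusters: Dict[str, List[Record]]) -> Dict[str, Record]:
    """Choose the longest member of each cluster as its representative."""
    return {
        cluster_id: max(members, key=lambda rec: len(rec[1]))
        for cluster_id, members in clusters.items()
        if members
    }


# Toy data (hypothetical ids and sequences).
reads = [("t1", "A" * 350), ("t2", "C" * 120), ("t3", "G" * 900)]
kept = filter_by_length(reads)
cmd = cd_hit_est_command("combined.fasta", "unigenes.fasta")
reps = pick_unigenes({"c0": [reads[0], reads[2]], "c1": [reads[1]]})
```

Selecting the longest member mirrors the rule stated in the quoted methods text; whether it is applied before or after ORF extraction is a detail to take from the manuscript itself.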

(4) Question: For P. euphratica, the data in the previous version were derived from PlantGDB, whereas in this updated version the authors used only contigs assembled from the NCBI dataset. It is likely that the sequences from PlantGDB were derived from several experiments, while the data from NCBI could come from a single experiment/condition. I am not sure how this could affect the results. One possibility is that it would affect the number of common orthologs used in the phylogenetic analysis (compare the number of orthologs in the first version and in this version). Could the authors elaborate on this?

Answer:

In this version, contigs of the P. euphratica genome were downloaded from NCBI genome projects. The annotated GFF of the assembled genome already contained most of the gene sequences, so the P. euphratica EST sequences from PlantGDB were not used. This affected the number of orthologs used in the phylogenetic analysis, which is why the numbers of orthologs (Table 2) differ between the first version and this version.

(5) Question: the authors should cite the publications where the NCBI data accessions were derived from, in the data description.

Answer:

We added the publications for P. nigra and P. euphratica based on the NCBI data accessions in lines 49-55. No publications were found in NCBI for the P. deltoides, P. tremula and P. tremuloides accessions.

“Sequences of P. nigra [21], P. deltoides, P. tremula and P. tremuloides were obtained from PlantGDB (http://www.plantgdb.org/prj/ESTCluster/progress.php) [22] and NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra/). The cDNA sequences of P. trichocarpa (v3.1) [1], P. euphratica [2], P. pruinosa [3] and S. purpurea (v1.0) were downloaded from genome projects of GigaDB (http://www.gigadb.org) [23], NCBI (https://www.ncbi.nlm.nih.gov/genome/) and JGI (https://genome.jgi.doe.gov/portal/) [24].”

 (6) Question:  Additionally, the data description should provide more details, e.g., the statistics for both datasets (Trinity and PlantGDB sequences) before combining and clustering. For Table 1, the authors should retain column “min length (bp)” as in version 1 of this manuscript.

Answer:

We revised Table 1 and added the column “min length (bp)” at line 61.

 (7) Question:  Figure 1: I think it would be better to use the line plot rather than bar plot, similar to what the authors have done in figure 4. It is difficult to follow the bar for each species.

Answer:

We revised Figure 1 to use a line plot.

I would like to resubmit this manuscript and hope it is acceptable for publication in the journal. If there are any problems or questions about our paper, please do not hesitate to let us know.

Thank you very much for your attention to our paper.

 Sincerely yours,

Youjie Zhao

Author Response File: Author Response.pdf
