Transcriptomic Analysis of Phenotypic Changes in Birch (Betula platyphylla) Autotetraploids

Plant breeders have focused much attention on polyploid trees because of their importance to forestry. To evaluate the impact of intraspecies genome duplication on the transcriptome, a series of Betula platyphylla autotetraploids and diploids were generated from four full-sib families. The phenotypes and transcriptomes of these autotetraploid individuals were compared with those of diploid trees. Autotetraploids were generally superior in breast-height diameter, volume, leaf, fruit and stoma and were generally inferior in height compared to diploids. Transcriptome data revealed numerous changes in gene expression attributable to autotetraploidization, which resulted in the upregulation of 7052 unigenes and the downregulation of 3658 unigenes. Pathway analysis revealed that the biosynthesis and signal transduction of indoleacetate (IAA) and ethylene were altered after genome duplication, which may have contributed to phenotypic changes. These results shed light on variations in birch autotetraploidization and help identify important genes for the genetic engineering of birch trees.


Introduction
Polyploidization has played an important role in plant evolution. Many of the phenotypes commonly observed in polyploid derivatives of flowering plants can be grouped together into the "giga" category, which includes increased sizes of floral organs and fruits, larger leaves and often an overall fleshy appearance, making polyploids quite interesting to both agriculture and horticulture [1][2][3]. In forestry, a greater understanding of the formation and characterization of tree polyploidy has led to well-developed polyploid cultivars. Polyploids in Populus, Robinia, Broussonetia and Edgeworthia have been shown to exhibit such characteristics as greater yield, longer fibers and higher resistance compared to their diploid counterparts [4][5][6][7][8][9][10].
The members of the genus Betula form a particularly significant group of broad-leaved trees in Eurasia and North America. Certain birch species, such as Betula platyphylla, Betula pendula, Betula pubescens and Betula papyrifera, are valuable sources of wood, and great importance is attached to breeding work aimed at their economic improvement [11]. A natural European birch (B. verrucosa) triploid, discovered by Löve, displayed "gigantism" in its morphological organs [12]. Since then, tree-breeding scientists have performed a great deal of work with polyploids in Betula [13][14][15][16][17][18][19][20]. Pieninkeroinen reported that tetraploids in B. pendula and B. pubescens bear long fibers and vessels [17]. No other studies about artificial polyploid breeding in Betula have been reported, however, and polyploid cultivars in Betula have had not been employed in production. In 2004, 99 B. platyphylla autotetraploids were obtained using colchicine treatment. These autotetraploids exhibited phenotypic changes compared to diploids, but information concerning transcriptomic changes after genome duplication in Betula has been scant. In a previous study, researchers often obtained transcriptome profiles of polyploid plants using microarray analysis, which requires whole-genome data [21][22][23]. Nevertheless, birch is an outbreeding species with a high degree of heterozygosity, the prohibitive costs associated with sequencing and assembling such a complex genome make it unfeasible to consider whole-genome sequencing in the near future. Fortunately, newly developed high-throughput sequencing technology, i.e., next generation sequencing (NGS), including sequencing employing the Roche/454 Genome Sequencer FLX, the ABI SOLiD System and the Illumina Genome Analyzer, is a powerful and cost-efficient tool for advanced research in many areas, especially de novo transcriptome sequencing for polyploid plants containing unknown genome information [24,25]. In the present study, we compared differences between transcriptomes of diploid and autotetraploid B. platyphylla plants using Illumina paired-end sequencing technology. By investigating changes in the expression of genes attributable to genome duplication, our study deepens our understanding of variations in birch autotetraploidization and identifies important genes for the genetic engineering of birch trees.

Detection of Birch Autotetraploids
Ploidy analysis of novel saplings and control diploid saplings was performed using flow cytometry analysis and chromosome counts. Results from flow cytometry analysis are shown in Figure 1a,b. Figure 1a represents the DNA content of the control diploids, with a main peak at channel 100.

Phenotypic Changes versus Ploidy in Birch
The phenotypes of autotetraploid trees differed significantly from diploids in four full-sib families ( Figure 2 and Table 1). The breast-height diameter, volume, leaf, fruit and stoma of autotetraploid individuals were significantly larger than those observed in the diploids of each family. The mean breast-height diameter, volume, leaf area, fruit length, fruit diameter and stoma lengths of autotetraploids were 25.09%, 37.31%, 81.43%, 12.37%, 36.97% and 89.72% larger than those of the diploids, respectively. Nevertheless, the heights of the autotetraploid individuals were significantly lower than those observed for the diploids in each family. The mean height of the autotetraploids was only 5.58 m, which was 18.61% lower than that of diploids. The maximal height of the autotetraploid a b c d in four families was only 6.51 m, which was 4.96% lower than the mean height observed for the diploids. These results indicate that the vertical growth of the autotetraploids was inferior to that of diploids, but the increase in breast-height diameter and volume and the leaf, fruit and stoma sizes of autotetraploids were superior to those of diploids. Consequently, the autotetraploid trees were stouter, and the diploid trees were more slender.

RNA-Seq, de Novo Assembly and Functional Annotation
The apical meristems in plant shoot tips contain a pool of pluripotent stem cells that can self-maintain and produce cells that can differentiate into different cell and tissue types [26], making this tissue a key target for the investigation of plant development and organ formation. Consequently, we constructed a transcriptome from the tender shoot tips of an individual autotetraploid tree and one from its diploid sister from the 6 × 5 family using Illumina paired-end sequencing technology. The transcriptomes of the diploid and autotetraploid produced 4,612,611,600 nt and 4,944,254,580 nt raw data, respectively, and the Q20 percentages were both over 96% ( Table 2). Short reads from two transcriptomic libraries were assembled into unigenes, taking the distance of pair-end reads into account. A library produced from the diploid contained 78,213 unigenes, while a library produced from the autotetraploid contained 88,607 unigenes. The mean lengths of the diploid and autotetraploid unigenes were 625 nt and 622 nt, respectively. All of the sequences were assembled, yielding 84,788 non-redundant unigenes with a mean length of 762 nt, and the N50 was 1294 nt (Table 3). Table 3. Quality of de novo assembly of diploid and autotetraploid.  Figure 3a). Based on Nr annotation and E-value distribution, 74.67% of the mapped sequences showed strong homology (E-value ≤ 10 −20 ), and 51.36% showed very strong homology (E-value ≤ 10 −50 ) to available plant sequences ( Figure 3b).

Gene Ontology Classfication of DEUs
The differentially expressed unigenes (DEUs) were selected based on expression profiles. A total of 7052 unigenes were found to have been upregulated in the autotetraploids compared to the diploids. Moreover, the expression of 3658 unigenes was downregulated in the autotetraploids ( Figure 4).  Gene ontology analysis was performed by mapping each DEU to the GO database. In total, 2651 DEUs matched genes in Blast2GO were separated into gene ontology classes according to biological process, cellular component and molecular function ( Figure 5). With regard to biological process, metabolic processes (929 unigenes, 35.04%) and cellular processes (866 unigenes, 32.67%) were prominently represented. For cellular component, cell (1422 unigenes, 53.64%), cell part (1255 unigenes, 47.34%) and organelle components (960 unigenes, 36.21%) represented the majorities of this category. Catalytic activity (1195 unigenes, 45.08%) and binding (1125 unigenes, 42.44%) represented a high percentage of the molecular function category. Moreover, cell death, the aminoglycan catabolic process, response to stimulus, the polysaccharide catabolic process and stress differed significantly (corrected p-value < 0.05) between autotetraploids and diploids.

Pathway Analysis of DEUs
To further investigate biological behavior, KEGG pathway analysis was used to identify the biological pathways of DEUs. There were 3437 DEUs mapped into 122 KEGG pathways, some of which were consistent with biological processes already revealed by GO analyses. The pathway that differed most significantly between the autotetraploids and diploids was plant-pathogen interactions (Ko04626), followed by stilbenoid, diarylheptanoid and gingerol biosynthesis (Ko00945), limonene and pinene degradation (Ko00903), phenylpropanoid biosynthesis (Ko00940), amino sugar and nucleotide sugar metabolism (Ko00520), flavonoid biosynthesis (Ko00941), biosynthesis of secondary metabolites (Ko01110), plant hormone signal transduction (Ko04075) and phenylalanine metabolism (Ko00360). The DEUs could be divided into two groups related to phenotype and transcriptome differences between the autotetraploids and diploids based on previous knowledge about plant growth. The first group of DEUs were involved in the biosynthesis and signal transduction of indoleacetate (IAA), including eleven genes involved in aldehyde dehydrogenase (EC: 1.2.1.3), indole-3-acetaldehyde oxidase (EC: 1.2.3.7), AUX1, AUX/IAA and GH3 ( Figure 6 and Table 4). The second group were genes involved in the biosynthesis and signal transduction of ethylene, including six genes involved in aminocyclopropanecarboxylate oxidase (EC: 1.14.17.4), MAPK, ERF1 and ERF2 (Figure 7 and Table 5). All of these genes were upregulated in the autotetraploid.

Verification of RNA-Seq by q-PCR
To test the reliability of RNA-Seq further, q-PCR was performed with specific primers for the 17 differentially expressed genes involved in the biosynthesis and signal transduction of IAA and ethylene.
Expression patterns revealed by q-PCR analysis were similar to those revealed by RNA-Seq for the same genes (Figure 8). In addition, a significantly positive correlation (r = 0.833; p < 0.01; n = 17) was found between RNA-Seq and q-PCR results from these genes.

Plant Material
The experiment was initiated in April 2004. Seeds from four full-sib families (5 × 3, 5 × 9, 5 × 11 and 6 × 5; all parent plants were located in an intensive seed orchard in Harbin, China) were soaked in 0.1% (w/v) colchicine for 48 h in the dark. A number of seeds from these four full-sib families were soaked in distilled water for the controls. The seeds were sown in a greenhouse after colchicine treatment, and a total of approximately 7000 saplings were obtained. In 2005, 128 novel saplings and 120 control diploid saplings (30 saplings chosen from each family at random) were transplanted into plastic pots in an intensive seed orchard.

Ploidy Measurement
The DNA contents of leaves of novel saplings were evaluated by flow cytometry (PA, Partec, Germany) using the methods of Zhang et al. [27]. Briefly, approximately 2 cm 2 leaf material was chopped with a razor blade in a plastic petri dish containing 1.5 mL nuclei extraction buffer (10 mmol/L MgSO 4 ·7H 2 O, 50 mmol/L KCl, 5 mmol/L HEPES, 1% (w/v) PVP-40, 0.25% (v/v) Triton X-100, pH 8.0). The homogenate was filtered through a 30 μm Partec Celltrics™ nylon filter to remove cell debris, and 1 mL staining solution (Partec) was added. The analysis was performed automatically. The total DNA content of each sample was measured, and the percentages of the cells that showed varied DNA content were processed using DPAC software. Meanwhile, the DNA content of diploid leaves was measured in the control group.
Mitotic chromosomes were isolated from young leaf buds using the protoplast dropping method of Anamthawat-Jónsson [28]. Briefly, buds were collected in ice water and treated for 24 h to arrest the cells at metaphase before fixing the buds in a mixture of three parts absolute ethanol to one part glacial acetic acid. The fixed buds were digested in an enzyme mixture of 2.5% (w/v) cellulase Onozuka R10 (Merck, Darmstadt, Germany) and 2.5% (v/v) pectinase (Sigma-Aldrich, St. Louis, MO, USA) for 5 h at room temperature. After hypotonic treatment with 75 mmol/L KCl for 5-10 min and repeated fixation to clear the cytoplasm, the protoplasts were placed onto microscope slides using a dropper. The chromosomes were stained with carmine and visualized under a microscope (Axioimanger A1, ZEISS, Germany) using 1000× magnification. Chromosomes were counted from 10 to 20 cells in metaphase from each individual tree.

Phenotype Measurement
Height, breast-height diameter, volume, leaf, fruit and stoma of diploid and autotetraploid trees were measured in 2009. The areas of mature leaves were measured using a portable area meter (LI-3100C, LI-COR, USA). At least ten healthy, undamaged and fully expanded leaves from the top of each individual tree were measured at random. Stoma lengths were evaluated using a scanning electron microscope (S-4800, Hitachi, Japan). Briefly, approximately 1 cm 2 from the center of each leaf was fixed in 2.5% (v/v) glutaral for 24 h and then transferred to 70% (v/v) ethanol for storage. Prior to evaluation, the material was dehydrated through a graded series of alcohol-isoamyl acetate, critical point dried in carbon dioxide for 15 min with a critical point dryer (HCP-2, Hitachi) and coated with gold palladium at 15 nm with an ion coater (EIKO IB-3, Hitachi). At least 30 stomata from each individual tree were viewed and measured at 15 kV using 800× magnification.
The lengths and diameters of the fruits were measured using a vernier caliper. At least ten healthy, undamaged fruits from each tree that had flowered were selected at random and measured. Height and breast-height diameter measurements were taken after the vegetation period ended. Individual tree volumes were calculated following the methods of Meng [29].

RNA Extraction, Library Construction and RNA-Seq
For transcriptomic analysis, a single autotetraploid tree and its diploid sister from the 6 × 5 family, both of which were healthy, had flowered and possessed erect stems, were chosen. In June 2011, shoot tips from diploid and autotetraploid trees were collected, immediately frozen in liquid nitrogen and stored at −80 °C until use. Total RNA from each sample was isolated using a modified CTAB method and digested with DNaseI (RNase free) to remove contaminating DNA. Enrichment of mRNA, fragment interruption, addition of adapters, size selection and PCR amplification and RNA-Seq were performed by staff at the Beijing Genome Institute (BGI) (Shenzhen, China). Total mRNA was isolated with oligo (dT) cellulose and broken into short fragments. Using these short fragments as templates, the first-strand cDNA and second-strand cDNA were synthesized. Sequencing adapters were ligated to short fragments after purification using a QiaQuick PCR extraction kit, which were used to distinguish different sequencing samples. Fragments that were 200 ± 25 bp in length were then separated by agarose gel electrophoresis and selected as sequencing templates for PCR amplification. Finally, the two transcriptomic libraries were sequenced using Illumina HiSeq™ 2000. The remaining RNA was used for real-time quantitative RT-PCR verification.

De Novo Assembly and Functional Annotation
The raw reads were first filtered by removing the adapter sequences and low quality sequences, which included reads with N percentages (i.e., the percentage of nucleotides in a read that could not be sequenced) of over 5% and sequences containing more than 20% nucleotides with a Q-value ≤ 10. The Q-value represents the sequencing quality of related nucleotides. Clean reads were used in de novo assembly and read-mapping to the transcriptome. RNA-Seq data was de novo assembled using the Trinity assembling program [30]. For each library, short reads were first assembled into longer but gapless contigs. Then the reads were mapped back to contigs, taking the distance of paired-end reads as frame. The contigs were connected to access the sequence that could not be extended on either end, and the sequence of the unigene was then produced. Next, unigenes from two libraries were further spliced and assembled to obtain maximum length nonredundant unigenes using TGICL clustering software [31] with a minimum overlap length of 100 bp. After clustering, the unigenes were divided to two classes: clusters, with the prefix CL and singletons, with the prefix Unigene. Finally, unigenes were aligned with the NCBI Nr, Swiss-Prot, KEGG and COG protein databases using Blastx with an E-value ≤ 10 −5 . The best results were used to further determine the sequence orientations of the unigenes. If results from different databases conflicted with each other, a priority order of Nr, Swiss-Prot, KEGG and COG was followed to determine the sequence orientation. When a unigene did not align with any of the above databases, ESTScan software [32] was used to predict the coding regions and orientation of the sequence. Blast2GO [33] was used to obtain GO annotation of the unigenes based on Blastx hits against the NCBI Nr database. Blastn was used to align the unigenes to the NCBI Nt nucleotide database, retrieving proteins with the highest sequence similarity with the given unigenes along with their protein functional annotations.

Differential Expression of Unigenes
An alignment package, SOAPaligner [34], was used to map reads back to the transcriptome. The number of mapped clean reads for each unigene was counted and normalized into an RPKM value (reads per kb per million reads), which is commonly used to calculate unigene expression [35]. The fold change for each transcript was then calculated using the log 2 formula of autotetraploid RPKM/diploid RPKM. If the value of either autotetraploid RPKM or diploid RPKM was zero, 0.001, rather than 0, was used to calculate the fold change. The method of Audic et al. [36] was used to analyze differential gene expression. The formula used to calculate the probability of a specific gene being expressed equally between the two samples was as follows: (1) where N 1 and N 2 indicate the total number of clean reads in a diploid and autotetraploid, respectively, and x and y indicate the mapped clean read counts of the transcript in each sample, respectively. The FDR (false discovery rate) method [37] was used to determine the threshold of p-values in multiple tests. In this study, FDR ≤ 0.001 and |log 2 ratio| ≥ 1 were the thresholds employed to judge the significance of differentiated gene expression.

Gene Ontology Functional Enrichment Analysis of Differentially Expressed Unigenes (DEUs)
The analysis mapped all DEUs to GO terms in the database to calculate the numbers of genes for every term. A hypergeometric test was then performed to find significantly enriched GO terms in DEUs compared to the transcriptome background. The formula used for this calculation was as follows: where N is the number of all unigenes with GO annotation, n is the number of DEUs in N, M is the number of all unigenes annotated to the specific GO terms and m is the number of DEUs in M. The calculated p-value was subjected to Bonferroni Correction using the corrected p-value of 0.05 as the threshold. GO terms fulfilling this condition were defined as significantly enriched GO terms in DEUs. This analysis was able to recognize the main biological functions that DEUs exercised.

Pathway Analysis of DEUs
The Blastall program was used to annotate the pathways of DEUs against the KEGG database. The formula used to calculate p-value was similar with that used in the GO analysis. In this formula, N represents the total number of unigenes with KEGG annotation, n is the number of DEUs in N, M is the total number of unigenes annotated to specific pathways and m is the number of DEUs in M.

Real-Time Quantitative RT-PCR (q-PCR) Verification
The expression of candidate genes was confirmed by q-PCR. Primers for these genes were designed manually (Table 6). RNA extraction and DNase treatment were performed as described above, and the first-strand cDNA was synthesized using ReverTra qPCR RT Master Mix with gDNA Remover (Toyobo, Osaka, Japan) according to the manufacturer protocol. The cDNA was diluted tenfold and used as the template for q-PCR.  The q-PCR mixture comprised a 3 μL template of the RT reaction mixture, 10 μL of 2× SYBR Green Master Mix (Toyobo, Osaka, Japan) and 0.5 μL of forward and reverse primer (10 μmol/L) brought to a final volume of 20 μL with water. The reactions were performed in an MJ Opticon™ -2 machine (Bio-Rad, Hercules, CA, USA) using the two-step method that was initiated for 30 s at 94 °C followed by 44 cycles of 94 °C for 12 s, 55 °C for 30 s, 72 °C for 30 s and 78.5 °C for 1 s for plate reading. A melting curve was performed from 55 °C to 95 °C to check the specificity of the amplified product. All experiments were conducted with three biological replicates for each sample. The expression was calculated by 2 −ΔΔCt and normalized to values obtained from the 18S rRNA and α-tubulin controls.

Discussion
Polyploidization events occur frequently during plant evolution and most natural autopolyploids probably did not originate through "simple" genome doubling [38]. For the past few years, analyses of whole-genome sequences, extensive expressed sequence tag (EST) sets and duplicated genomic regions have led to the realization that genome doubling has occurred repeatedly during plant evolution [39]. Variations in plant morphology and physiology resulting from genome duplication have occurred in many plants, which led to the production of fast-growing, high-quality plants due to changes in genes expression that were shown to have occurred [3]. In the present study, we analyzed transcriptomic variation associated with birch (B. platyphylla) autotetraploidization that resulted in the differential expression of 10,710 unigenes, some of which are involved in cell death, metabolic processes, plant responses to stimulus and plant hormone signal transduction. These results indicate that phenotypic changes may bear some relation to transcriptomic alteration after genome duplication.
According to transcriptomic analysis, eleven genes were upregulated in the biosynthesis and signal transduction of IAA, including genes involved in aldehyde dehydrogenase, indole-3-acetaldehyde oxidase, AUX1, AUX/IAA and GH3. It has been known for decades that the plant hormone IAA plays a critical role in regulating plant growth and cell enlargement [40]. Although several pathways have been proposed, the indole-3-pyruvic acid (IPA) pathway is the main IAA biosynthetic pathway [41], and IAA is biosynthesized from indole-3-acetaldehyde (IAAld), which is catalyzed by aldehyde dehydrogenase and indole-3-acetaldehyde oxidase [42,43]. During transport, IAA can enter a cell by diffusion or by carrier-mediated uptake and leave a cell through the action of efflux carriers; AUX1 is a major influx carrier [44]. As two related families of proteins, Aux/IAA and ARF are key regulators of IAA-modulated gene expression (AUX/IAA, GH3 and SAUR), and AUX/IAA, GH3 and SAUR play a critical role in regulating plant growth and cell enlargement [40,45]. In the present study, eleven upregulated genes were involved not only in IAA biosynthesis but also in signal transduction. This result provides an explanation for the "gigantism" of breast-height diameter, volume, leaves, fruit and stoma found in autotetraploid trees.
In the present study, it was also shown that the mean breast-height diameter of autotetraploids was 25.09% larger than that of diploids, whereas the mean height of autotetraploids was 18.61% lower than that of diploids. Furthermore, six genes involved in the biosynthesis and signal transduction of ethylene were shown to be upregulated in autotetraploids, including genes involved in the aminocyclopropanecarboxylate oxidase, MAPK, ERF1 and ERF2. Ethylene is a gaseous plant hormone with an effect on seedlings known as the triple response. In the model plant species Arabidopsis thaliana, the triple response is characterized by the inhibition of hypocotyl and root elongation, a thickened hypocotyl and an exaggerated apical hook [46]. Ethylene is biosynthesized from 1-aminocyclopropane-1-carboxylate (ACC), which is catalyzed by aminocyclopropanecarboxylate oxidase [47]. MAPK participates in ethylene signal transduction [48]. The transcription factors ERF1 and ERF2 activate the additional downstream ethylene-responsive genes by binding to the GCC-box [49,50]. The six upregulated genes involved in aminocyclopropanecarboxylate oxidase, including MAPK, ERF1 and ERF2, may be related to the stoutness of autotetraploid trees.
Previous research showed that polyploid birch has a higher resistance to biotic and abiotic stress than diploid birch. Triploid birch (B. verrucosa× B. pubescens) was more resistant to birch rust (Melampsoridium betulinum) [14]. Pentaploid and hexaploid birch (B. papyrifera) were more tolerant of water deficit than their diploid relatives [20]. This study revealed that responses to stimulus, responses to stress and plant-pathogen interactions differed significantly between autotetraploids and diploids. Therefore, autotetraploid birch (B. platyphylla) may have a higher resistance to stress than diploid birch. Consequently, this should be investigated in the future.
Polyploid plants contain new gene regulatory networks after genome duplication, leading to phenotypic and physiological changes [3,39,51]. Complete RNA-Seq data are essential to understanding the new regulatory networks [52]. RNA-Seq analysis of different autotetraploid birch organs and plants in different stages of development should be performed to obtain a deeper understanding of molecular variation after genome duplication.