De Novo Transcriptome Sequencing of Low Temperature-Treated Phlox subulata and Analysis of the Genes Involved in Cold Stress

Phlox subulata, a perennial herbaceous flower, can survive during the winter of northeast China, where the temperature can drop to −30 °C, suggesting that P. subulata is an ideal model for studying the molecular mechanisms of cold acclimation in plants. However, little is known about the gene expression profile of P. subulata under cold stress. Here, we examined changes in cold stress-related genes in P. subulata. We sequenced three cold-treated (CT) and control (CK) samples of P. subulata. After de novo assembly and quantitative assessment of the obtained reads, 99,174 unigenes were generated. Based on similarity searches with known proteins in public protein databases, 59,994 unigenes were functionally annotated. Among all differentially expressed genes (DEGs), 8302, 10,638 and 11,021 up-regulated genes and 9898, 17,876, and 12,358 down-regulated genes were identified after treatment at 4, 0, and −10 °C, respectively. Furthermore, 3417 up-regulated unigenes were expressed only in CT samples. Twenty major cold-related genes, including transcription factors, antioxidant enzymes, osmoregulation proteins, and Ca2+ and ABA signaling components, were identified, and their expression levels were estimated. Overall, this is the first transcriptome sequencing of this plant species under cold stress. Studies of DEGs involved in cold-related metabolic pathways may facilitate the discovery of cold-resistance genes.

numerous specific cold-related genes. Our results provided a foundation for understanding the cold response mechanism of P. subulata and provided a valuable resource for the development of cold-tolerant plants through genetic manipulation.

Results and Discussion
2.1. Phlox subulata (P. subulata) Transcriptome Sequencing and de Novo Assembly P. subulata plants grew and blossomed in the spring and autumn ( Figure 1A), producing a 10-15 cm high plant with old stem half-lignification ( Figure 1B), needle-like and leathery leaves, and 2 cm pink flowers ( Figure 1C-F). Sequence analysis and assembly were performed to investigate the transcriptome and gene expression profiles of P. subulata under normal and cold stress. Four cDNA samples from seedlings of CT (4, 0 and −10 °C, subsequently referred to as CT1, CT2 and CT3, respectively) and CK (20 °C) plants were sequenced using an Illumina HiSeq 2000 platform. In total, we obtained approximately 55-59 million raw reads for CT and CK samples. After removing the low-quality reads and reads containing adaptors, 21.3 × 10 7 clean reads consisting of 19.2 × 10 9 nucleotides (nt) were obtained with a Q20 percentage (an error probability of 0.01) of more than 97% for four samples ( Table 1). All clean reads were deposited in the NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra) database with accession number SRP055942.  Transcriptome de novo assembly was performed using Trinity program [15]. All high-quality clean reads of each sample were assembled into 125,583 (CT1), 120,123 (CT2), 140,329 (CT3) and 126,166 (CK) contigs, respectively (Table 1). In four samples, the average contig length exceeded 340 nt (length distributions of these contigs are shown in Figure S1). The contigs of each sample were then joined into unigenes, generating 81,059 (CT1), 75,181 (CT2), 85,491 (CT3) and 82,948 (CK) unigenes, respectively. After long-sequence clustering of four samples, a total of 99,174 unigenes were obtained for all samples. The total length was 98,892,318 nt, with a mean length of 997 nt and an N50 of 1622 nt ( Table 1). The length distributions of unigenes of each sample are given in Figure 2.

Functional Annotation and Classification of the Assembled Unigenes
To validate and annotate the assembled unigenes, sequence similarity searches were conducted using sequence-and domain-based alignments. In total, 99,174 unigenes from all groups significantly matched a sequence in at least one of the public databases, including NCBI non-redundant protein (NR), Swiss-Prot protein, Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG), and Gene Ontology (GO) databases. There were 59,994 unigenes (60.49% of all unigenes) with homologous sequences in at least one of the aforementioned databases ( Table 2). Among them, 55,996 (56.5%), 39,519 (39.8%), 36,150 (36.5%), 24,872 (25.1%) and 41,688 (42.0%) unigenes were found in NR, SwissProt, KEGG, COG, and GO databases, respectively. The remaining 39,180 (39.5%) unmapped unigenes were not identified ( Table 2). After searching all unigene sequences against four protein databases (NR, SwissProt, KEGG, and COG) using BLASTx (E-value < 0.00001), we identified 57,044 coding sequences (CDSs) and predicted proteins. For unigenes with no BLAST hits, we used ESTScan to predict 6834 CDSs and predicted proteins. The distributions of the CDSs and predicted proteins are shown in Figure 3.   For the NR annotations, 55,996 of the unigenes were found to be matched in the database. For the E-value distribution, 44.4% of moderate homolog sequences ranged from 1.0 × 10 −5 to 1.0 × 10 −45 , while 55.6% of sequences had a threshold E-value less than 1.0 × 10 −45 , showing strong homology ( Figure 4A). The identity distribution pattern showed that 59% of the sequences had a similarity higher than 60%, while 41% showed similarity between 17% and 60% ( Figure 4B). The majority of the annotated sequences corresponded to the known protein sequences of plant species. However, we found that 32.8% of the unigenes had similar sequences to proteins from Vitis vinifera, followed by Lycopersicon esculentum (11.1%), Amygdalus persica (9.5%), Ricinus communis (7.2%) and Populus balsamifera subsp. Trichocarpa (6.4%; Figure 4C). In addition, the other 24.2% unigenes had matches with other plant species, such as Cucumis sativus and Medicago truncatula. This result suggested that the sequences of the P. subulata transcripts generated in the present study were assembled and annotated properly. Based on the NR annotation, we used the GO annotation to classify the possible functions of the unigenes. GO annotation is an international classification system that provides controlled vocabulary for descriptions of gene functions [16]. A total of 41,688 (42.0% of all unigenes) unigenes were successfully assigned to at least one GO term. All the GO terms were classified into 56 functional groups, including three main categories: biological processes, cellular components, and molecular functions ( Figure 5). Among biological processes, transcript sequences assigned to cellular (26,638), metabolic (25,871) and single-organism processes (18,247) were the most abundant. For cellular components, proteins were mostly assigned to cell (31,028), cell part (31,024) and organelle (24,793) categories. Within the molecular function category, the majority of the GO terms were predominantly assigned to binding (19,903), catalytic activity (21,442) and transporter activity (3303) ( Figure 5). Similar results were found in many plant species [11][12][13], suggesting that the DEGs were responsible for fundamental biological metabolism and regulation. To further predict gene function and to evaluate the integrality of the P. Subulata transcriptome, all unigenes were searched against the COG database. Overall, 24,872 (25.1% of all unigenes) unigenes were assigned COG classifications ( Figure 6). By classifying the possible functions of these unigenes, we grouped the unigenes into 25 functional categories. The largest category was "General function prediction only" (8804 of 24,872 unigenes, about 35.4%), followed by "Transcription" (4186 unigenes, 16.8%), "Replication, recombination and repair" (3820, 15.4%), "Translation, ribosomal structure, and biogenesis", (3708, 14.9%), and "Signal transduction mechanisms" (3307, 13.3%). The categories of "Nuclear structure" (19, 0.07%), "Extracellular structures" (21, 0.08%) and "Cell motility" (320, 1.29%) had the fewest responding unigenes. Additionally, 2287 (9.10%) unigenes were annotated as "Function unknown" (Figure 6).

Analysis of Potential Differentially Expressed Genes (DEGs)
To identify different expression levels of unigenes, we used the reads per kb per million reads (RPKM) method to calculate the expression levels of the unigenes from CT and CK samples. Screening of DEGs (FDR ≤ 0.001 and ratios larger than 2), the results showed that the numbers of both up-and down-regulated genes changed with reduction of the temperature. The distribution of transcript changes of unigenes is shown in Figure S2. Among all DEGs, 8302 were induced by cold, while 9898 were down-regulated after incubation at 4 °C (CT1; Figure 7 and Table S1). After incubation at 0 °C (CT2), 10,638 genes were up-regulated and 17,876 were down-regulated (Figure 7 and Table S2). After incubation at −10 °C (CT3), 11,021 genes were up-regulated, and 12,358 were down-regulated ( Figure 7 and Table S3). We also identified a total of 3417 up-regulated unigenes expressed only in the CT samples (Table S4). Among them, 533, 1580 and 1922 unigenes were found in CT, CT2 and CT3, respectively ( Figure 8). Some of these unigenes had no homologs in the NCBI database, suggesting that they may represent new cold-related transcripts that have not been reported in model plants.

Analysis of DEGs Related to Metabolism
To understand the biological function of DEGs, we performed metabolic pathway enrichment analysis using the KEGG database. A total of 21,155 up-regulated unigenes and 22,737 down-regulated unigenes were identified after cold treatment. All DEGs were related to 128 metabolic pathways (Table S5). Among the mapped pathways, 43, 33 and 61 pathways were significantly enriched (p ≤ 0.05) after cold treatment. We listed the 10 most significantly enriched pathways (Table 3). After incubation at 4 °C, the four most significantly enriched pathways were plant-pathogen interactions ( Figure S3), ether lipid metabolism ( Figure S4), biosynthesis of secondary metabolites and glycerophospholipid metabolism pathways ( Figure S5). After incubation at 0 °C, the ribosome and plant-pathogen interaction pathways were significantly identified as the top two pathways. After incubation at −10 °C, we found that biosynthesis of secondary metabolites, metabolic pathways, phenylpropanoid biosynthesis, ether lipid metabolism and glycerophospholipid metabolism were significantly enriched (Table 3).

Identification of Major Genes Involved in Cold Stress
Next, we concentrated on analysis of genes up-regulated in response to cold stress in P. subulata. Functional annotations of up-regulated unigenes showed that some unigenes were closely related to cold stress, including cold-inducible transcription factors, cold-induced proteins, antioxidant enzymes, osmoregulation proteins, and proteins related to Ca 2+ and ABA signaling [12,13,17,18]. We selected 20 cold-related unigenes and estimated their expression levels by the RPKM method (Table 4). Among these genes, dehydration-responsive element binding protein (DREB), ethylene-responsive element binding factor (AP2/ERF), and the transcription factors MYB (v-myb avian myeloblastosis viral oncogene homolog) and ICE1 (inducer of CBF expression 1) were found to be significantly induced by cold stress in P. subulata. These transcription factors have been described as major factors involved in plant cold-stress responses [5,19,20]. Small groups of cold-related proteins, including low temperature-regulated (LT), cold-regulated (COR), and shock proteins were involved in responses to cold stimuli. Genes within the LT and COR groups have been previously reported to be involved in the cold stress response process [21,22]. Additionally, some antioxidant enzymes and osmoregulation proteins were identified, including superoxide dismutase (SOD), peroxidase (POD), catalase (CAT), glutathione peroxidase (GPX), water stress-induced protein, and sugar transporter. Under cold stress, reactive oxygen species (ROS) accumulate, and water becomes imbalanced in plants. These phenotypes are harmful to both the membrane and related biochemical reactions [23]. The accumulation of antioxidant enzymes and osmoregulation proteins contributes to the stabilization of membranes as these molecules protect membranes against cold injury [18,24]. Moreover, we also found that ABA-hydroxylase and calcium-dependent protein kinase (CDPK) were up-regulated (Table 4). These genes may be involved in the modification and processing of ABA and Ca 2+ signaling molecules, which play important roles in signal transduction under cold stress [18,25]. Finally, many unigenes with no known homologs in the NR were also found. These unigenes may represent novel genes specifically expressed in P. subulata or may participate in specific regulatory mechanisms associated with cold adaptation in P. subulata. However, further studies are needed to verify and characterize these unigenes.

Plant Materials and Cold Treatments
P. subulata L. plants were grown in the nursery for three months at the Heilongjiang Academy of the Sciences (China; 128.4°E, 45.0°N). Subsequently, control plants were grown in a growth chamber under a 12/12-h light/dark photoperiod (20-40 μm·s −1 ·m −2 light intensity) at 20 °C. For cold treatments, plants were transferred to 4, 0 and −10 °C under the same light source. Roots, stems, and leaves of three CT and CK plants were collected simultaneously after treatment for three days and immediately stored at −80 °C until required for RNA extraction.

RNA Extraction, cDNA Library Construction, and Transcriptome Sequencing
Total RNA from each sample was isolated using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. The RNA samples were examined using a NanoDrop ND-8000 Spectrophotometer (NanoDrop, Wilmington, DE, USA); the A260/A280 ratios of four samples ranged from 1.8 to 2.0. The integrity of the RNA samples was assessed with an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA, USA). The RNA sample (mixed RNA of root, stem, leaf according to 1:1 proportion) was sent to the Beijing Genomic Institute (BGI, Shenzhen, China) for RNA sequencing. Accordingly, cDNA library construction and Illumina transcriptome sequencing were performed following the methods described by Shu et al. [14].

De Novo Assembly and Functional Annotation
Raw reads produced from sequencing machines contained dirty reads with adapters or low-quality bases. These data negatively affect bioinformatics analysis. Therefore, dirty raw reads were discarded. De novo assembly of the transcriptome was achieved using the short-reads assembly program, Trinity [15]. Trinity combined reads with a certain length of overlap to form longer fragments, known as contigs. These contigs were subjected to sequence clustering to form longer sequences. Such sequences were defined as unigenes.
All of the assembled unigenes were compared with the publicly protein databases, including NR, Swiss-Prot protein, KEGG, COG and GO databases, using BLASTx analysis with a cut-off E-value of 10 −5 . The best alignments were used to identify the sequence direction of the assembled unigenes. Unigenes were first aligned to protein databases in the priority order of NR, Swiss-Prot, KEGG and COG. When unigenes could not be aligned using these databases, ESTScan software was introduced to determine the sequence direction [26]. For the NR annotations, the BLAST2GO program was used to obtain GO annotations of assembled unigenes for describing biological processes, molecular functions, and cellular components [27]. After obtaining GO annotations for assembled unigenes, WEGO software [28] was used to determine GO functional classifications for understanding the distribution of unigene functions. Unigenes were aligned to the COG database to predict and classify possible functions of the unigenes. In the final step, KEGG pathway [29] annotations were performed to analyze the metabolic pathways and functions of unigenes.

Differential Expression Analysis of Unigenes
RPKM [30] was used to calculate unigene expression levels, which eliminated the influence of gene length and sequencing level. The RPKM method formula was: where C is the number of reads that uniquely aligned to one unigene; N is the total number of reads that uniquely aligned to all unigenes; and L is the base number in the CDS of one unigene.
The FDR control method [31] was used in multiple hypothesis testing to correct for p values. After the FDR was obtained, the ratio of RPKMs was used to calculate the fold-change in the expression of unigenes in two samples simultaneously. The smaller the FDR and the larger the ratio, the larger was the difference in the expression level between the two samples. In our analysis, DEGs were screened with an FDR threshold of 0.001 or less and an absolute value of the log2 ratio of 1 or more [32]. For pathway enrichment analysis, we looked for significantly enriched metabolic pathways or signal transduction pathways containing DEGs by comparison with the whole genome. Finally, some cold stress-related genes were selected and listed.

Conclusions
In this study, we first reported the transcriptome data of P. subulata under cold stress using Illumina/Solexa. The total length of the reads was ~20 Gb. A total of 99,174 unigenes were assembled. The large number of unigenes and their functional annotations provide valuable resource for genetic and genomic studies in P. subulata plants. A large number of DEGs involved in cold stress were identified. Further, we identified a total of 3417 up-regulated unigenes expressed only in the cold-treated P. subulata plants. Studies of these up-regulated unigenes involved in cold-related metabolic pathways will facilitate the discovery of other cold-resistance genes.