Transcriptomic Proﬁling of Cryptomeria fortunei Hooibrenk Vascular Cambium Identiﬁes Candidate Genes Involved in Phenylpropanoid Metabolism

: Cryptomeria fortunei Hooibrenk (Chinese cedar) is a coniferous tree from southern China that has an important function in landscaping and timber production. Lignin is one of the key components of secondary cell walls, which have a crucial role in conducting water and providing mechanical support for the upward growth of plants. It is mainly biosynthesized via the phenylpropanoid metabolic pathway, of which the molecular mechanism remains so far unresolved in C. fortunei . In order to obtain further insight into this pathway, we performed transcriptome sequencing of the C. fortunei cambial zone at 5 successive growth stages. We generated 78,673 unigenes from transcriptome data, of which 45,214 (57.47%) were successfully annotated in the non-redundant protein database (NR). A total of 8975 unigenes were identiﬁed to be signiﬁcantly di ﬀ erentially expressed between Sample_B and Sample_A after analyzing their expression proﬁles. Of the di ﬀ erentially expressed genes (DEGs), 6817 (75.96%) and 2158 (24.04%) were up- and down-regulated, respectively. 83 DEGs were involved in phenylpropanoid metabolism, 37 DEGs that encoded v-Myb avian myeloblastosis viral oncogene homolog (MYB) transcription factor (TF), and many candidates that encoded lignin synthesizing enzymes. These ﬁndings contribute to understanding the expression pattern of C. fortunei cambial zone transcriptome. Furthermore, our results provide additional insight towards understanding the molecular mechanisms of wood formation in C. fortunei . levels using the FPKM method conducted annotation and enrichment analysis of DEGs. Firstly, we performed sample clustering to obtain gene expression of C. fortunei vascular cambium. The gene expression patterns of samples November clustered together, while from July making samples collected in May their own cluster These results indicate a higher sample similarity between these samples. In addition, the three biological replicates were clustered together, indicating the reliability of our transcriptome data.


Introduction
The Cryptomeria genus consists of the species Cryptomeria fortunei Hooibrenk and Cryptomeria japonica (L.f.) D.Don (Japanese cedar). Cryptomeria fortunei Hooibrenk is an important coniferous timber species native to China. This species is a monoecious coniferous species which is widely planted in southern China due to its strong adaptability. C. fortunei has excellent properties that allow for efficient timber production, including a straight bole, soft texture, rapid growth, and ease of processing. Thus, its wood is widely used to construct wooden houses, barrels, and a large number of industrial materials. Additionally, as a photosynthesizing plant, C. fortunei is an important plant

RNA Extraction and Transcriptome Sequencing
Total RNA was isolated from the samples of C. fortunei vascular cambium by a RNeasy Plant Mini Kit (Qiagen, Hilden, Germany). The integrity and concentration of total RNA were assessed by an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and a Thermo Scientific NanoDrop 2000 (Thermo Fisher Scientific, Wilmington, DE, USA). Library preparation and sequencing experiments were performed in accordance with the standard procedure provided by Illumina. Sequencing was then performed using an Illumina HiSeq™ 2500 system by OE biotech Co., Ltd. (Shanghai, China), generating 150 bp paired-end reads. The accession number of this project is PRJNA 644276.

Assembly and Functional Annotation
We obtained millions of clean reads after removing adaptor and low-quality sequences. Clean reads were then assembled into expressed sequence tag clusters (contigs), which were then de novo assembled into transcripts using Trinity and the paired-end method [13]. Subsequently, we described the longest transcript as a unigene using CD-HIT [14]. Unigene was chosen for subsequent analysis. Unigenes were aligned by diamond [15] and HMMER [16] to public databases NR, KOG, GO, Swiss-Prot, eggNOG, KEGG, and Pfam with the highest sequence similarity for protein functional annotation and classification.

Differential Expression Analysis
Unigene expression was quantified according to the fragments per kb per million reads (FPKM) method [17], using bowtie2 [18] and eXpress [19]. Through pairwise comparisons, DEGs of different stages were identified by DESeq [20]. A threshold of p < 0.05 and a greater than two-fold change were set [21]. To explore expression patterns, we performed a sample to sample distances cluster analysis [22]. GO and KEGG enrichment analysis of DEGs were performed using R based on the hypergeometric distribution [21].

Verification of Gene Expression Using qRT-PCR
8 unigenes were chosen for validation through qRT-PCR. RNA was extracted from the cambium region then reverse-transcribed using a HiScript III RT SuperMix (Vazyme Biotech Co., Ltd., Nanjing, China). We designed the primers using Primer Premier 5.0 software (Premier Biosoft International, Palo Alto, CA, USA) (Supplementary Table S1). Three biological replicates were run at a final volume of 20 µL, which consisted of 6 µL of ddH 2 O, 1 µL of primers, 2 µL of cDNA, and 10 µL of 2× ChamQ SYBR qPCR Master Mix (Vazyme Biotech Co., Ltd., Nanjing, China). The C. fortunei β-actin gene was used as reference [11]. The primers used were F: GCCATCTTTGATTGGAATGG and R: GGTGCCACAACCTTGACTT. The qRT-PCR reaction was performed on an ABI 7500 Step One Plus Real-time PCR System (Applied Biosystems, Foster City, CA, USA). Reactions were performed at 95 • C for 30 s, followed by 40 cycles of 95 • C for 10 s, and 60 • C for 30 s. The delta-delta-Ct method was used to assess the amplification results [23].

Determination of Lignin Content
The lignin content of stems of the same year was determined according to Reference [24]. We acquired the data by a GeneQuant pro ultraviolet spectrophotometer (Biochrom Ltd., England, UK).

Statistics of Transcriptome Sequencing Results and De Novo Assembly
In order to obtain candidate genes involved in phenylpropanoid metabolism, in this study, we performed transcriptome sequencing of the C. fortunei cambial zone at 5 successive growth stages. As a result, we obtained a total of 724,452,816 raw reads from C. fortunei cambium total RNA across all of our samples. From these, we assembled five complete transcriptomes, one for each growth stage ( Table 1). The percentage of raw reads with a Q-value > 30 ranged from 92.33% to 94.89% for all samples, and the average GC content (the percentage of the total number of G's and C's in clean bases) was 44.01%. After removing low-quality reads and adaptor sequences, a total of 706,935,392 clean reads were obtained, which were used for de novo assembly. We obtained 78,673 unigenes using Trinity software, and found the average unigene length to be 957 bp, with an N50 length of 1576 bp. The sequence length distribution is shown in Supplementary Figure S1. 1 Q30 represents the percentage of bases whose phred number is greater than 30 in raw bases. 2 GC represents the percentage of the total number of G's and C's in clean bases.
Following this functional categorization, we continued analyzing transcription factor (TF) categorization across the different known plant TF families. As a result, we identified a total of 1401 TFs, which could be further classified into 66 TF families, such as WRKY, NAC, bZIP, and others (Supplementary Figure S2). We found the C2H2 family to be most abundant, with 233 unigenes, followed by AP2/ERF-ERF (115) and bHLH (100). We also found 71 and 34 unigenes encoding MYB and NAC TFs in this study.

Identification of DEGs and the GO and KEGG Enrichment Analysis
In our study, we calculated unigene expression levels using the FPKM method and conducted annotation and enrichment analysis of DEGs. Firstly, we performed sample clustering analysis to obtain gene expression patterns of C. fortunei vascular cambium. The gene expression patterns of samples collected in September and November clustered together, while those from April and July did as well, making the samples collected in May their own cluster ( Figure 4). These results indicate a higher sample similarity between these samples. In addition, the three biological replicates were clustered together, indicating the reliability of our transcriptome data.

Identification of DEGs and the GO and KEGG Enrichment Analysis
In our study, we calculated unigene expression levels using the FPKM method and conducted annotation and enrichment analysis of DEGs. Firstly, we performed sample clustering analysis to obtain gene expression patterns of C. fortunei vascular cambium. The gene expression patterns of samples collected in September and November clustered together, while those from April and July did as well, making the samples collected in May their own cluster ( Figure 4). These results indicate a higher sample similarity between these samples. In addition, the three biological replicates were clustered together, indicating the reliability of our transcriptome data.

Identification of DEGs and the GO and KEGG Enrichment Analysis
In our study, we calculated unigene expression levels using the FPKM method and conducted annotation and enrichment analysis of DEGs. Firstly, we performed sample clustering analysis to obtain gene expression patterns of C. fortunei vascular cambium. The gene expression patterns of samples collected in September and November clustered together, while those from April and July did as well, making the samples collected in May their own cluster ( Figure 4). These results indicate a higher sample similarity between these samples. In addition, the three biological replicates were clustered together, indicating the reliability of our transcriptome data.  We then identified all C. fortunei vascular cambium DEGs through pairwise comparisons. The amount of DEGs for each pairwise comparison is shown in Supplementary Figure S3. With A as a reference, we compared B vs. A, C vs. A, D vs. A, and E vs. A: 8975, 4432, 11,683, and 17,774 unigenes were differentially expressed in all four pairwise comparisons, respectively.
In a previous study, we observed the development of C. fortunei vascular cambium by studying their morphology using paraffin sections [25]. We found that cellular growth and development were most vigorous in May. Here, we chose B vs. A (May vs. April) as an example to explain the DEGs' functionality. A total of 4165 DEGs were found through GO enrichment analysis in B vs. A, of which 3230 and 935 were up-and down-regulated, respectively ( Figure 5A). To describe our GO annotation results, we constructed a directed acyclic graph (DAG) using topGO [26] ( Figure 5B). The most significant enrichment in the 'biological process' category is 'secondary metabolic process' (GO: 0019748) and 'phenylpropanoid metabolic process' (GO: 0009698).  We then identified all C. fortunei vascular cambium DEGs through pairwise comparisons. The amount of DEGs for each pairwise comparison is shown in Supplementary Figure S3. With A as a reference, we compared B vs. A, C vs. A, D vs. A, and E vs. A: 8975, 4432, 11,683, and 17,774 unigenes were differentially expressed in all four pairwise comparisons, respectively.
In a previous study, we observed the development of C. fortunei vascular cambium by studying their morphology using paraffin sections [25]. We found that cellular growth and development were most vigorous in May. Here, we chose B vs. A (May vs. April) as an example to explain the DEGs' functionality. A total of 4165 DEGs were found through GO enrichment analysis in B vs. A, of which 3230 and 935 were up-and down-regulated, respectively ( Figure 5A). To describe our GO annotation results, we constructed a directed acyclic graph (DAG) using topGO [26] ( Figure 5B). The most significant enrichment in the 'biological process' category is 'secondary metabolic process' (GO: 0019748) and 'phenylpropanoid metabolic process' (GO: 0009698).   To further explore DEGs biological function, we performed a KEGG enrichment analysis: 2628 DEGs were successfully annotated into 24 pathways in B vs. A ( Figure 6). We found 146 DEGs, of which 134 and 12 were up-and down-regulated, annotated to the secondary metabolism pathway, which indicates that these DEGs are involved in secondary metabolism. indicates a high confidence level. The GO terms are presented at the horizontal node position. The red arrows represent 'secondary metabolic process' (GO: 0019748) and 'phenylpropanoid metabolic process' (GO: 0009698), respectively.
To further explore DEGs biological function, we performed a KEGG enrichment analysis: 2628 DEGs were successfully annotated into 24 pathways in B vs. A ( Figure 6). We found 146 DEGs, of which 134 and 12 were up-and down-regulated, annotated to the secondary metabolism pathway, which indicates that these DEGs are involved in secondary metabolism.

Identification of Candidate Genes Involved in Phenylpropanoid Metabolism
In order to understand how gene activity changes between growth stages, we continued by comparing KEGG pathway enrichment of DEGs (Figure 7). One of the main KEGG pathways undergoing enrichment dynamics was phenylpropanoid metabolism (ko00940). We identified 83 DEGs involved in this pathway in B vs. A, of which 76 DEGs were upregulated.

Identification of Candidate Genes Involved in Phenylpropanoid Metabolism
In order to understand how gene activity changes between growth stages, we continued by comparing KEGG pathway enrichment of DEGs (Figure 7). One of the main KEGG pathways undergoing enrichment dynamics was phenylpropanoid metabolism (ko00940). We identified 83 DEGs involved in this pathway in B vs. A, of which 76 DEGs were upregulated.
Lignin is mainly synthesized through the phenylpropanoid metabolic pathway (Figure 8), phenylalanines are converted to monolignols by the enzymes phenylalanine ammonia-lyase (PAL)  Lignin is mainly synthesized through the phenylpropanoid metabolic pathway (Figure 8), phenylalanines are converted to monolignols by the enzymes phenylalanine ammonia-lyase (PAL)    We then analyzed the expression of key enzymes involved in phenylpropanoid biosynthesis ( Figure 9). All of them had different expression patterns at different stages. Five enzymes, including C3H, CCR, 4CL, PAL, and CCoAOMT, were all present at higher expression levels at stage_May. Two enzymes, including PAL and HCT, showed higher expression levels at stage_November, which indicates that these enzymes could play roles in response to cold stress. Most enzymes displayed lower expression levels at stage_April and November than at other stages, which could be caused by the seasonally cyclical pattern of dormancy and activity. Furthermore, the expression of these We then analyzed the expression of key enzymes involved in phenylpropanoid biosynthesis ( Figure 9). All of them had different expression patterns at different stages. Five enzymes, including C3H, CCR, 4CL, PAL, and CCoAOMT, were all present at higher expression levels at stage_May. Two enzymes, including PAL and HCT, showed higher expression levels at stage_November, which indicates that these enzymes could play roles in response to cold stress. Most enzymes displayed lower expression levels at stage_April and November than at other stages, which could be caused by the seasonally cyclical pattern of dormancy and activity. Furthermore, the expression of these enzymes increased from July to September and decreased again from September to November. This finding is consistent with the general trend of lignin content. In the present study, the lignin content increased gradually from April (10.88%) to September (34.56%) and stabilized from September to November (34.75%) (Supplementary Figure S4). These phenomena revealed that key enzymes involved in phenylpropanoid biosynthesis might be responsible for the seasonal change in wood formation activity.

Quantitative Real-Time PCR Validation of Candidate Genes Involved in Phenylpropanoid Biosynthesis
In this study, we randomly selected eight unigenes involved in phenylpropanoid biosynthesis and examined their expression levels using qRT-PCR. We firstly performed the melt curve analysis. We found that all samples (including 3 replicates) have a single peak and the temperature is between 80 and 90 °C (Supplementary Figure S5), which indicates that the data is reliable. The expression profiles of these candidates are shown in Supplementary Figure S6. Although the exact fold changes between stages of each unigene varied somewhat between RNA-seq and qRT-PCR data, the trends

Quantitative Real-Time PCR Validation of Candidate Genes Involved in Phenylpropanoid Biosynthesis
In this study, we randomly selected eight unigenes involved in phenylpropanoid biosynthesis and examined their expression levels using qRT-PCR. We firstly performed the melt curve analysis. We found that all samples (including 3 replicates) have a single peak and the temperature is between 80 and 90 • C (Supplementary Figure S5), which indicates that the data is reliable. The expression profiles of these candidates are shown in Supplementary Figure S6. Although the exact fold changes between stages of each unigene varied somewhat between RNA-seq and qRT-PCR data, the trends between the different stages were overall similar. We could find just one candidate gene (TRINITY_DN57952_c0_g1_i1_3) of which the expression values were inconsistent with our RNA-seq data. Therefore, these results confirm the accurate assembly of the transcript sequences and reliability of our RNA-seq data.

Discussion
Wood is an important raw material with a rapidly increasing worldwide demand. As a result, more research is being devoted to analyzing the genetic regulation of wood formation. An important tool for such research is transcriptome sequencing, which can be used to discover genes that control economic traits. In a previous study, we identified the different expression pattern of reproductive genes in two conifer species through transcriptome sequencing [27]. In this study, approximately 72.44 million paired-end reads were obtained. After assembly, we obtained 78,673 unigenes, with an average length of 957 bp, significantly longer than has been reported previously for Cunninghamia lanceolata (Lamb.) Hook (449 bp) [28], Camellia sinensis (L.) O. Ktze. (355 bp) [29], and Porphyra yezoensis (Rhodophyta) (419 bp) [30], and slightly shorter than C.japonica (1069 bp) [31], and thereby providing more abundant genetic information to understand the mechanism of lignin biosynthesis.
Lignin is mainly synthesized through the phenylpropanoid metabolic pathway. We aimed to find DEGs involved in lignin biosynthesis. As a result, we obtained 83, 29, 49, and 74 DEGs in four pairwise comparisons, of which 76, 22, 43, and 46 DEGs were upregulated, respectively. We found most DEGs when comparing the growth stages May vs. April and November vs. April, which is most likely because of the seasonally cyclical pattern of dormancy and activity in wood formation. In this study, we found 8, 3, 4, and 8, and 9, 2, 1, and 1 DEGs encoding PAL and 4CL respectively, in four pairwise comparisons, most of which were upregulated. It is consistent with the expression patterns in Figure 9. Similarly, Mishima et al. [31] found the homologues, PAL4 and 4CL3, and showed an increasing expression pattern during cessation of growth. This finding indicates that PAL and 4CL might be regulated to the cold stress. We also found most DEGs encoding CAD, CCoAOMT, and CCR to be upregulated. Most enzymes were induced in April, and expression level gradually decreased from August to October in C. japonica [31]. The expression pattern corresponded to our previous study about anatomical observation of cambium cells [25]. Previous reports have found that CAD [32], CCoAOMT [11], and CCR [33] promote lignin synthesis, which is consistent with the expression patterns of these enzymes ( Figure 9) and the associated corresponding changes in lignin content.
Previous studies have found that temperature plays an important role in dormancy development [34]. During our study, the daily maximum and minimum temperatures increased from 4 April to 18 May and peaked at 10 July (Supplementary Table S3). Subsequently, temperatures steadily declined from July to November. Similarly, from our previous work, we found cambium cells undergoing peri-planar divisions, with the largest number of layers, full cells, cytoplasm, and vigorous divisions in May, while the number of cell layers decreased in July and September compared to May, and with the number of cells being the least and division stopping in November [25]. Furthermore, we found that the expression of most lignin-synthesizing enzymes in C. fortunei vascular cambium changed in tandem with rising and falling temperatures, consistent with the growth season and previous studies [31]. Analogously, the growing season of Japanese cedar, another gymnosperm, runs from March until October, with lignin synthesis activity sharply increasing from March to June, then declining until dormancy in October [31]. Interestingly, Sato et al. [35] analyzed the diurnal periodicity of expression of lignin synthesizing genes in C. japonica and found that most enzymes showed different expression abundance at different times on the same day. This paper provides a good research direction for our future study.
Previous reports have found TFs involved in the regulation of lignin biosynthesis [36][37][38], response to the abiotic stress [39][40][41][42][43], and regulation of growth and developmental processes [44][45][46][47][48]. MYB TFs could regulate lignin synthesis by binding to the corresponding regulatory elements of lignin-synthesizing enzyme genes, whereas NAC TFs could regulate the corresponding MYB TFs to regulate lignin synthesis. During this study, we obtained 37, 16, 33, and 37 DEGs encoding MYB TFs in all four pairwise comparisons, of which 26, 10, 26, and 20 were upregulated. Similarly, Mishima et al. [31] found that 34 MYB were upregulated during the peak activity of xylem formation. These results are consistent with our findings in this paper. In addition, we found that these TFs regulated some metabolic pathways, including the phenylpropanoid metabolic pathway. We also analyzed the expression patterns of NAC TFs and obtained 15, 4, 12, and 12 NAC TFs showing differential expression in all four pairwise comparisons. Similarly, most TFs were upregulated. Mishima et al. [31] found a VND6 homolog and its expression was moderately decreased during peak xylem formation. According to the expression patterns of lignin-synthesizing enzymes in this study, these results imply a potential role of TFs in the regulation of lignin biosynthesis.

Conclusions
C. fortunei is a plant tree species that has a large number of excellent qualities, such as rapid growth, a straight bole, and ease of processing for wood production. In this study, we performed transcriptome sequencing on C. fortunei vascular cambium for 5 successive growth stages. We identified candidate genes involved in phenylpropanoid metabolism and analyzed expression patterns of lignin-synthesizing enzymes. Finally, we found the correlation of enzyme expression with different growth stages. Thus, our findings contribute to a better understanding of the molecular mechanisms underlying the phenylpropanoid biosynthesis pathway. Importantly, these results may be useful for molecular breeding of C. fortunei to improve wood characteristics.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/7/766/s1: Figure S1: De novo transcriptome assembly, Figure S2: Distribution of TFs according to their TF family, Figure S3: Identification of DEGs. The x-axis represents the comparison groups, and the y-axis represents the number of DEGs, Figure S4: The lignin content of 5 growth stages, error bars represent standard deviation of three biological replicates, Figure S5: The melt curve of qRT-PCR, Figure S6: qPCR validation of RNA-seq data. The x-axis represents the growth stages, and the y-axis represents the fold change (log2). Table S1: Primers used for qRT-PCR, Table S2: Functional annotation of unigenes, Table S3: The daily maximum and minimum temperatures at the sampling date.