Transcriptome Characterization of the Chinese Fir ( Cunninghamia lanceolata ( Lamb . ) Hook . ) and Expression Analysis of Candidate Phosphate Transporter Genes

Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) is the most important afforestation tree species in China because of its excellent timber quality and high yield. However, the limited availability of phosphorus in forest soils is widespread and has become an important factor in the declining productivity of Chinese fir plantations. Here we used the Illumina HiSeqTM 2000 DNA sequencing platform to sequence root, stem, and leaf transcriptomes of one-year old Chinese fir clones with phosphorus treatment. Approximately 236,529,278 clean reads were obtained and generated 35.47 G of sequencing data. These reads were assembled into 413,806 unigenes with a mean length of 520 bp. In total, 109,596 unigenes were annotated in the NR (NCBI non-redundant) database, 727,287 genes were assigned for GO (Gene Ontology) terms, information for 92,001 classified unigenes was assigned to 26 KOG (Karyotic Orthologous Groups) categories, and 57,042 unigenes were significantly matched with 132 KEGG (Kyoto Encyclopedia of Genes and Genomes) predicted pathways. In total, 49 unigenes were identified as exhibiting inorganic phosphate transporter activity, and 14 positive genes’ expression patterns in different phosphorus deficiency treatments were analyzed by qRT-PCR to explore their putative functions. This study provides a basic foundation for functional genomic studies of the phosphate transporter in Chinese fir, and also presents an extensive annotated sequence resource for molecular research.


Introduction
Chinese fir (Cunninghamia lanceolata (Lambert) Hooker) is an evergreen coniferous tree species that is primarily distributed in southern China and northern Vietnam [1,2].It is the most commercially important afforestation tree species in China and is known for its rapid growth rate, high yield, and excellent timber quality [3][4][5].The Chinese fir has been widely planted across southern China with a planting history spanning more than 1000 years, and it accounts for 24% of all forested land in China [6].Because of its fast-growing characteristics, the Chinese fir has a high nutrient requirement for its optimal growth, especially of phosphorus (P) [7,8].However, most of the Chinese fir plantations that have been established are deficient in available P because this macro nutrient is easily bound by Ca, Al, and Fe in tropical and subtropical soils [9].The current P utilization rate in Chinese fir plantations is only 10-25%, which is inadequate for the timber industry's requirement for the fast growth of this species [8,10].Generally, suboptimal levels of P cause stunted tree growth and can result in 5-15% yield losses [10,11].Furthermore, the successive rotation patterns characteristic of Chinese fir plantations amplify the lack of available P in forest soil [6].Thus, the mass application of phosphate fertilizer has been commonly used in Chinese fir plantations across the country, which is now causing serious issues of soil and water pollution [12].
The limited availability of P in forest soil is widespread and has become one of the most important factors causing the declining productivity of Chinese fir plantations over successive rotations in south China [13].As a result, extensive research has been conducted on more efficient methods of P fertilization, cultivation, and breeding of Chinese fir over the past 20 years [1,12,[14][15][16].One effective strategy for reducing the use of P fertilizers is the selecting and breeding of Chinese fir genotypes with high P acquisition and utilization efficiencies in areas where P availability is limited [17].Wu et al. have been successful in identifying several Chinese fir clones (e.g., M1, M4, & M32), with high P-use efficiency through repeated generations of selection [10,13].Those clones show a proliferation of root mass, modified root morphology, increase in proton secretion, and enhanced activity of acid phosphatase in leaves and roots in response to P deficiency.These changes in phenotype result in a better growth capacity than common genotypes grown in P deficient soils [10].The high P-use efficiency genotype selection research has also been applied in some additional vital economic crops, such as Populus L. and rice [18,19].As model plant systems, the functional genomic research focused on Populus and rice has revealed that the uptake and translocation of phosphorus from the rhizosphere to different plant tissues is achieved by phosphate transporters (PHTs) [20,21].The efficient acquisition and utilization of soil P by plants are achieved through a diverse group of transporters including PHT1, PHT2, PHT3, and PHT4 sub-families [22].There are at least 42 PHT genes that have been identified, and the presence of 25 genes that are highly expressed in the roots suggests that these might be involved in P uptake [21].Thus, for Populus and rice, PHT gene identification and functional genomics analysis have been very important for the molecular identification and breeding of high P-use efficiency economic crops.Unfortunately, for Chinese fir, there is a significant lack of genomic information and genetic tools, which severely encumbers molecular breeding and functional genomic research in this important timber species [2,23].To date, there is no report available on studies of the PHT genes identification or functional analysis on Chinese fir, even though considerable research has been published on the physiology and ecology of trees grown under conditions of serious P deficiency [1,12].
As a non-model plant species with a very large genome size that is typical of conifers (20 to 30 Gb), the limited molecular research of Chinese fir has focused primarily on transcript profile investigations and patterns of gene expression [24,25].However, the development of next-generation DNA sequencing technologies provides a revolutionary tool for transcriptomics to detect information about a given sample's RNA content [23].In particular, the Illumina HiSeq 2000 platform can provide high throughput, accurate, and cost-effective approaches to generate large unigenes and expression datasets that have proven to be powerful tools to profile plant tissue transcriptomes both qualitatively and quantitatively [26].Huang et al. used transcriptome sequencing to characterize the candidate genes related to cellulose and lignin biosynthesis in Chinese fir; in total, they found 49 unigenes [23].Wang et al. characterized the cambial tissues transcriptome of Chinese fir and identified six alternatively expressed genes correlated with cambial activities [25].These recently acquired transcriptome data for Chinese fir have been highly informative, but until now, have been limited and only focused on vascular cambium activity or dark-grown cotyledons, and no root or leaf transcriptome data has been released in NCBI (National Center for Biotechnology Information) to date (as of August 2017).Thus, transcriptome sequencing is a potentiality new tool that is increasingly being used to identify the genes that control traits of economic value within non-model organisms with undetermined genomes, such as phosphate transporter genes in Chinese fir [27][28][29].
In the present study, the Chinese fir clone M32 selected for its high P-use efficiency was chosen for root, stem, and leaf tissue transcriptome characterization using the Illumina DNA sequencing platform.Based on the bioinformatics analysis of assembled transcriptome data, the candidate phosphate transporter genes were identified.The 14 positive gene expression patterns in root tissues were analyzed by qRT-PCR in different phosphorus deficiency treatments.This study provides a basic foundation for functional genomic studies of the phosphate transporter, and serves as an important public information platform for functional genomics research.

Plant Material and Experiment
The Chinese fir clone M32, known for its high efficiency of P use, was selected for transcriptome characterization and candidate inorganic phosphate transporter genes expression analysis [10].The one-year-old healthy seedlings were raised by reproductive cloning and were chosen for the hydroponic culture experiment in the greenhouse at the College of Forestry, Fujian Agriculture and Forestry University, China.Each seedling was cultivated in a pot, and the nutrients were added to each pot according to a modified Hoagland solution [10,13].This solution contained 5.
The pH of the nutrient solution was regulated to 5.5.The following concentration gradient was established: 0, 0.5, and 1.0 mmol•L −1 KH 2 PO 4 , representing P-starved, low, and normal levels of P supply, respectively, and the deficient K + in the stress is replaced by an equal amount of KCl [13].After normal levels of P were supplied to these hydroponic cultures for seven days, the root, stem, and leaf tissues were harvested from three M32 Chinese fir seedlings for future transcriptome analysis.The remaining seedlings continued in a treatment with 0 and 0.5 mmol•L −1 P supplied for 12 h, 24 h, and 48 h, and then the recovery of P supply occurred for 12 h, 24 h, and 48 h.The root tissue of seedlings was collected for the candidate inorganic phosphate transporter genes expression analysis.All the plant materials were quick frozen in liquid nitrogen upon harvest and stored at −80 • C in a refrigerator until they were ready for RNA extraction.

RNA Extraction, Library Preparation, and Transcriptome Sequencing
Total RNA was extracted using the Plant RNA Kit (OMEGA Bio-Tek, Norcross, GA, USA) according to the manufacturer's instructions.RNA degradation and contamination were monitored with 1% agarose gels.RNA purity was checked using the Nano Photometer ® spectrophotometer (IMPLEN, Westlake Village, CA, USA) and RNA concentration was assayed using the Qubit ® RNA Assay Kit in Qubit ® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA).RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, San Francisco, CA, USA).
Total RNA from the root, stem, and leaves of the three plants of clone M32 was pooled prior to RNA-seq libraries preparation.Equimolar quantities of total RNA from the same tissues were combined into one RNA pool.Prior to cDNA library construction, poly-T oligo-attached magnetic beads were used to purify the mRNA, which was then broken into short fragments of approximately 200 bp by fragmentation buffer.The fragments were used to synthesize the first-strand cDNA using random hexamer (N6) primers (Illumina).Second-strand cDNA was then synthesized using DNA polymerase I, dNTPs, and RNase H. Short fragments were purified with a QiaQuick PCR extraction kit and resolved with EB buffer for end reparation and the addition of poly(A).The double-stranded cDNA fragments were connected with sequencing adapters, and a final cDNA library was selectively enriched by PCR and purified using the AMPure XP system (Beckman Coulter, Beverly, MA, USA).The library preparations were paired-end sequenced by Novogene Bioinformatics Technology Co., Ltd.(Beijing, China) via an Illumina HiSeq 2000 platform.Then, 100-bp paired-end reads were generated, and all raw sequence read data were deposited in the NCBI Sequence Read Archive (SRA) database with the accession number of SRX2586188 (Stem), SRX2586189 (Root), and SRX2586190 (Leaf).

Data Processing, Assembly, and Annotation
Prior to assembly, the raw reads were filtered using stringent requirements to remove the reads containing the adapters, reads containing over 10% ambiguous 'N' nucleotides, and reads with over 50% of bases with a quality score lower than 5.The downstream analyses were based on the remaining clean reads and then assembled using a combined de novo transcriptome assembly strategy.The clean reads were assembled into contigs using Trinity software as previously described for de novo transcriptome assembly without a reference genome [27].
These assembled sequences were defined as unigenes, and the unigenes function was annotated using protein similarity analysis.NCBI BLAST 2.2.28 + alignment with an E-value threshold of 10 −5 was performed between the unigenes set and the following protein databases of NR [28], Swiss-Prot [29], GO [30], KOG [31], and KEGG [32,33].Blast2GO v2.5 [34] was used to obtain Gene Ontology (GO) annotation of unigenes based on the GO and NR database.The amino acid sequence of the unigenes was analyzed by HMMER 3.0 [35] software and the outputs were searched for in the protein family (Pfam) [36] database to obtain annotated information for the unigenes.By using the DESeq R package (1.10.1)[37], the annotated unigenes were normalized in FPKM (Fragments Per Kilobase Million) [38] to calculate their relative expression levels.The metabolic pathways were predicted using the GO and KEGG enrichment analysis, respectively, based on the GOseq (1.10.0)[39] and KOBAS [40] software.

qRT-PCR Analysis for Gene Expression
Seven unigenes were chosen for gene expression validation using qRT-PCR.Specific primer pairs for these selected genes were designed by Primer Express Software V3.0 (Applied Biosystems, Foster City, CA, USA), as shown in Table S1.The first-strand cDNA was transcribed from 1 µg of total RNA using the Thermo Scientific Revertaid First Strand cDNA Synthesis Kit (Thermo, Waltham, MA, USA) in 25 µL of reaction mixture.The qRT-PCR was performed with an ABI 7500 Real-Time PCR system (Applied Biosystems) with the Power SYBRH Green PCR Master Mix (Applied Biosystems) according to the manufacturer's protocol.The thermal profile for SYBR Green I RT-PCR was 95 • C for 10 min, followed by 40 cycles of 95 • C for 15 s and 60 • C for 1 min.Three biological replicates were carried out for selected genes, and the mean value was used for graphics plotting.The reference gene (β-Actin) was used for normalization.The comparative CT method (2 −∆∆CT method) [41] was used to analyze the expression levels of the different genes.Statistical analyses were performed using variance (ANOVA) followed by Duncan's new multiple range tests with SPSS version 13.0 (SPSS, Chicago, IL, USA).

Sequencing and De Novo Assembly
To comprehensively cover the transcriptome of Chinese fir, three sequencing libraries were prepared, respectively, from the root, stem, and leaf tissues and sequenced with the Illumina HiSeqTM 2000 sequencing technology paired-end technique.There were 80,914,328 raw reads generated from the root, 86,439,146 from the stem, and 81,365,604 from the leaves (Table 1).A total of 76,829,210 clean reads were obtained from root sequencing, and 96.64% (11.52 Gbp) of the bases had Phred-like quality scores at the Q20 level (an error probability of 0.02%).Of the clean reads data from stem sequencing, 82,165,990 clean reads were obtained and 96.7% (12.32 Gbp) of the bases had a Q value ≥ 20 (an error probability of 0.01%).The clean reads for leaves were 77,534,078 and 96.79% (11.63 Gbp) of the bases had a Q value ≥ 20 (an error probability of 0.01%).The GC-contents were 47.94%, 44.22%, and 44.22% for the root, stem and leaves, respectively.The Trinity software was used to break all high-quality reads and assembled the small fragments into 479,035 transcripts (Table 2) with a mean length of 591 bp.The length of the transcripts mainly ranged from 201 to 17,835 bp, and 141,005 transcripts with a length larger than 500 bp.Size distribution of the transcripts is shown in Figure S1.We identified 413,806 unigenes with a mean length of 520 bp.The lengths of 102,702 and 44,407 unigenes were larger than 500 bp and 1000 bp, respectively, while 75.18% of the unigenes had lengths between 200 and 500 bp (Figure 1).To assess the quality and coverage of assembled unigenes, we analyzed the sequencing depth range.The sequencing depth ranged from 0.27 to 2319 folds, in which 71.45% of the unigenes were less than 10 reads, 18.6% of the unigenes ranged from 11 to 100 reads, 6.69% of the unigenes varied from 100 to 1000 reads, and approximately 3.26% were supported by more than 1000 reads (Figure 2).
The quality of transcriptome data is a decisive factor for various subsequent analyses [42].However, the quality of the data may be affected by several factors, including the contamination of adapter sequences, poor quality reads, and assembly errors [23].In this study, a total of 236,529,278 clean reads and 35.47 Gbs of databases were obtained from Chinese fir transcriptomes, which is much more than any previously published transcriptome research on Chinese fir [23][24][25]43].Due to a lack of homologous unigenes, the average length of unigenes for non-model plants is usually less than 500 bp [26,44,45].Surprisingly, the unigenes we assembled revealed a mean length of 520 bp, which is longer than those achieved in previous studies of Chinese fir as reported by Huang et al. (449 bp) [23], Wang et al. (505 bp) [24], and Qiu et al. (497 bp) [43].However, the length distribution of the unigenes was similar to that of the transcripts, and 75.18% unigenes were less than 500 bp, indicating that the length distribution of the transcripts and unigenes was mainly represented by short sequences with relatively little redundancy.These findings are roughly consistent with the results of previous studies [23,24], suggesting that the quality of our data is comparable to similar data of Chinese fir transcriptomic research.The Trinity software we used is a popular assembly tool, but different assembly strategies have an effect on the accuracy of data, and the comparative study of different assembly software such as AbySS and SOAPdenovo can help to improve assembly accuracy [46].For further research, a large number of accuracy assembled sequential data could provide more in-depth transcriptome information, and lead to rapid characterization for most of the transcripts and a reference for the potentially valuable genes [47].

Annotation of Predicted Proteins
To functionally categorize all the assembled unigenes of Chinese fir, we identified the unigenes coding sequences in the following protein databases: NR, Swiss-Prot, GO, KOG, Pfam, and KEGG.Of the 413,806 unigenes, we recovered 203,404 of these (49.15%) that were represented in at least one database (Table S2).The rest of the unigene sequences (50.85%) had no significant matches in the

Annotation of Predicted Proteins
To functionally categorize all the assembled unigenes of Chinese fir, we identified the unigenes coding sequences in the following protein databases: NR, Swiss-Prot, GO, KOG, Pfam, and KEGG.Of the 413,806 unigenes, we recovered 203,404 of these (49.15%) that were represented in at least one database (Table S2).The rest of the unigene sequences (50.85%) had no significant matches in the

Annotation of Predicted Proteins
To functionally categorize all the assembled unigenes of Chinese fir, we identified the unigenes coding sequences in the following protein databases: NR, Swiss-Prot, GO, KOG, Pfam, and KEGG.
Of the 413,806 unigenes, we recovered 203,404 of these (49.15%) that were represented in at least one database (Table S2).The rest of the unigene sequences (50.85%) had no significant matches in the existing databases.More specifically, 109,596 unigenes were annotated in the NR database with the E value and similarity of the annotated results, showing that 75.0% of the mapped sequences have a strong homology (ranging from 1.0 × 10 −5 to 1.0 × 10 −45 ).An additional 17.8% of the unigenes showed more than 80% similarity (Figure 3).Based on the species distribution analysis, the Chinese fir unigenes have 7.5% and 2.9% matches with sequences from the conifer Picea sitchensis (Bong.)Carr.and the flowering plant Amborella trichopoda Baill., which is sister to all living angiosperms.(Figure 3).These results indicate that the Illumina paired-end sequencing produced a substantial fraction of the Chinese fir genes, but since there is no reference genome available, there are still limits in our ability to identify the assembled unigenes.The unigenes which were not annotated in the existing databases might lack a characterized protein domain to match in the database, but some of those unigenes could also be potential specific genes for Chinese fir.In total, 203,404 unigenes were annotated in our study, which was far more than the previous study of the number of unigenes in the Chinese fir [23,25].The species similarity analysis revealed possible incompletely defined genetic backgrounds for Chinese fir [48].Large-capacity databases and whole genome sequence data of Chinese fir will be needed to more accurately complete future transcriptome analyses [49].).An additional 17.8% of the unigenes showed more than 80% similarity (Figure 3).Based on the species distribution analysis, the Chinese fir unigenes have 7.5% and 2.9% matches with sequences from the conifer Picea sitchensis (Bong.)Carr.and the flowering plant Amborella trichopoda Baill., which is sister to all living angiosperms.(Figure 3).These results indicate that the Illumina paired-end sequencing produced a substantial fraction of the Chinese fir genes, but since there is no reference genome available, there are still limits in our ability to identify the assembled unigenes.The unigenes which were not annotated in the existing databases might lack a characterized protein domain to match in the database, but some of those unigenes could also be potential specific genes for Chinese fir.In total, 203,404 unigenes were annotated in our study, which was far more than the previous study of the number of unigenes in the Chinese fir [23,25].The species similarity analysis revealed possible incompletely defined genetic backgrounds for Chinese fir [48].Large-capacity databases and whole genome sequence data of Chinese fir will be needed to more accurately complete future transcriptome analyses [49].

GO Assignments
Gene Ontology (GO) is an internationally standardized gene function classification system that provides a dynamic standard vocabulary and a strictly defined concept to describe the functional attributes of genes and their products [50].In order to classify the putative functions of the predicted Chinese fir genes, the assembled unigene sequences were uploaded into the GO database.The sequence homology assessment revealed that a total of 148,466 unigenes (35.87% of all our annotated unigenes) were present in the GO database (Table S3), and 727,287 predicted genes were assigned to at least one GO term (Table S4).Furthermore, the assigned GO terms were statistically classified into three main GO categories: biological processes, molecular functions, and cellular components, and then into 56 subcategories (Figure 4).Biological processes encompassed 348,743 (47.95%)GO annotations and was the largest cluster, followed by cellular components (212,057, 29.16%) and molecular functions (166,489, 22.89%).Biological processes are the vital processes occurring in organisms to live and contain various chemical reactions or other events that result in growth and development [51].The sub-categories of metabolic process (79,020 unigenes, 22.66%), cellular process (77,109 unigenes, 22.11%), and single-organism process (63,493 unigenes, 18.21%) were prominent in the biological processes category, indicating that the analyzed tissue has a high degree of metabolic activity and these unigenes are involved in metabolic activities of Chinese fir.In the cellular component category, 41,377 (19.51%) and 41,331 (19.49%) unigenes were assigned to cell and cell parts, respectively; with the following sub categories being well represented: organelle (28,494 unigenes, 13.44%), macromolecular complex (25,652 unigenes, 12.10%), membrane (23,894 unigenes, 11.27%), membrane parts (22144 unigenes, 10.44%), organelle parts (14,508 unigenes, 6.84%), and other subcategories with fewer unigenes.According to the classification of the molecular function category, binding (73,118 unigenes, 43.92%), catalytic activity (66,050 unigenes, 39.67%), and transporter activity (11,086 unigenes, 6.66%) represent the most abundant sub-categories.These GO

GO Assignments
Gene Ontology (GO) is an internationally standardized gene function classification system that provides a dynamic standard vocabulary and a strictly defined concept to describe the functional attributes of genes and their products [50].In order to classify the putative functions of the predicted Chinese fir genes, the assembled unigene sequences were uploaded into the GO database.The sequence homology assessment revealed that a total of 148,466 unigenes (35.87% of all our annotated unigenes) were present in the GO database (Table S3), and 727,287 predicted genes were assigned to at least one GO term (Table S4).Furthermore, the assigned GO terms were statistically classified into three main GO categories: biological processes, molecular functions, and cellular components, and then into 56 subcategories (Figure 4).Biological processes encompassed 348,743 (47.95%)GO annotations and was the largest cluster, followed by cellular components (212,057, 29.16%) and molecular functions (166,489, 22.89%).Biological processes are the vital processes occurring in organisms to live and contain various chemical reactions or other events that result in growth and development [51].The sub-categories of metabolic process (79,020 unigenes, 22.66%), cellular process (77,109 unigenes, 22.11%), and single-organism process (63,493 unigenes, 18.21%) were prominent in the biological processes category, indicating that the analyzed tissue has a high degree of metabolic activity and these unigenes are involved in metabolic activities of Chinese fir.In the cellular component category, 41,377 (19.51%) and 41,331 (19.49%) unigenes were assigned to cell and cell parts, respectively; with the following sub categories being well represented: organelle (28,494 unigenes, 13.44%), macromolecular complex (25,652 unigenes, 12.10%), membrane (23,894 unigenes, 11.27%), membrane parts (22144 unigenes, 10.44%), organelle parts (14,508 unigenes, 6.84%), and other subcategories with fewer unigenes.According to the classification of the molecular function category, binding (73,118 unigenes, 43.92%), catalytic activity (66,050 unigenes, 39.67%), and transporter activity (11,086 unigenes, 6.66%) represent the most abundant sub-categories.These GO classification results are similar to those revealed in previous transcriptomic studies of Chinese fir [23,43].GO annotations describe the contour features of the overall gene expression of Chinese fir among root, stem, and leaf tissues, and revealed expressed genes encoding inorganic phosphate transporter activity functions like transporter activity-the primary focus of this study.classification results are similar to those revealed in previous transcriptomic studies of Chinese fir [23,43].GO annotations describe the contour features of the overall gene expression of Chinese fir among root, stem, and leaf tissues, and revealed expressed genes encoding inorganic phosphate transporter activity functions like transporter activity-the primary focus of this study.

KOG Classification
An objective classification of proteins encoded within a transcriptome is required for making the unigenes useful for functional and evolutionary studies [52].To further evaluate the completeness of our transcriptome library and to predict the unigenes' functions, all the assembled unigenes with high levels of homology to sequences in the generic databases were searched against the KOG database containing only orthologous genes of eukaryotes [53].In total, information for 92,001 classified unigenes was assigned to 26 KOG categories (Figure 5, Table S5).Among the 26 KOG categories, the cluster for general functional prediction (14,054 unigenes, 13.65%) represented the largest group, followed by post-translational modification, protein turnover and chaperones (11,200 unigenes, 10.88%), translation, ribosomal structure and biogenesis (8791 unigenes, 8.54%), and energy production and conversion (6406 unigenes, 6.22%).Only a few unigenes were assigned to cell motility (57 unigenes, 0.06%).Secondary metabolites biosynthesis, transport, and catabolism represented 4136 unigenes (4.02%), presumably because of the relative importance of secondary metabolic activity for the transmembrane transport of inorganic phosphate and other ions.To some extent, KOG classifications further reveal the potential specific reactions and the functional participation in molecular processes for genes expressed in Chinese fir.

KOG Classification
An objective classification of proteins encoded within a transcriptome is required for making the unigenes useful for functional and evolutionary studies [52].To further evaluate the completeness of our transcriptome library and to predict the unigenes' functions, all the assembled unigenes with high levels of homology to sequences in the generic databases were searched against the KOG database containing only orthologous genes of eukaryotes [53].In total, information for 92,001 classified unigenes was assigned to 26 KOG categories (Figure 5, Table S5).Among the 26 KOG categories, the cluster for general functional prediction (14,054 unigenes, 13.65%) represented the largest group, followed by post-translational modification, protein turnover and chaperones (11,200 unigenes, 10.88%), translation, ribosomal structure and biogenesis (8791 unigenes, 8.54%), and energy production and conversion (6406 unigenes, 6.22%).Only a few unigenes were assigned to cell motility (57 unigenes, 0.06%).Secondary metabolites biosynthesis, transport, and catabolism represented 4136 unigenes (4.02%), presumably because of the relative importance of secondary metabolic activity for the transmembrane transport of inorganic phosphate and other ions.To some extent, KOG classifications further reveal the potential specific reactions and the functional participation in molecular processes for genes expressed in Chinese fir.

KEGG Pathway Analysis
The KEGG pathway database allows for a systematic analysis of inner-cell metabolic pathways and functions of gene products within a cell [54].To identify the biological pathways activated in Chinese fir, we mapped the annotated sequences to the canonical reference pathways in the KEGG database [33].The results indicated that 57,042 unigenes had significant matches in the database and were assigned to 132 KEGG pathways (Table S6).These pathways are summarized and mapped in five categories with 12 sub categories (Figure S2).The top 10 pathways include unigenes involved in carbon metabolism (3457), ribosomes (3343), biosynthesis of amino acids (2696), spliceosome (2138), protein processing in endoplasmic reticulum (1962), purine metabolism (1686), RNA transport (1567), oxidative phosphorylation (1434), glycolysis/gluconeogenesis (1381), and pyrimidine metabolism (1326).These results partially explain why a large number of secondary metabolites exist in Chinese fir.The inorganic phosphate transporters that we are most interested in studying belong to the ABC (ATP-binding Cassette) transporter family [22].Thus, we concentrated our efforts further on the membrane transport category involving 620 unigenes of ABC transporters.These annotations provide a valuable resource for investigating the processes, functions, and pathways involved in inorganic phosphate transport.The GO, KOG, and KEGG annotated more than half a percent of unigenes and basically represents the various aspects of metabolism, which also indicated that our transcriptomic yielded a substantial fraction of genes from Chinese fir.

Candidate Genes Involved in Inorganic Phosphate Transport
The expression of all unigenes was denoted by FPKM values [55].Table S6 lists the top 10 most frequently expressed unigenes in the root, stem, and leaf transcriptomes, respectively.Unigenes c219691, c213460, and c193149, whose gene functional annotations were identified as the pathogenesis-related protein Bet v I family, thaumatin-like protein, and translationally-controlled tumor protein, respectively, had a relatively high expression level across the root, stem, and leaf transcriptions.Research over the past 20 years has provided clear evidence that the limited availability of phosphorus in forest soil is widespread and plays an important factor in the declining productivity of Chinese fir plantations over successive rotations [1,12,[14][15][16].Therefore, we have selected the unigenes with the functional annotations of inorganic phosphate transporter activity for further analysis.Approximately 49 unigenes were identified as potentially exhibiting inorganic phosphate transporter activity in the Chinese fir transcriptome (Table S7).There are 25 unigenes that were annotated as exhibiting inorganic phosphate transmembrane transporter activity, and 14

KEGG Pathway Analysis
The KEGG pathway database allows for a systematic analysis of inner-cell metabolic pathways and functions of gene products within a cell [54].To identify the biological pathways activated in Chinese fir, we mapped the annotated sequences to the canonical reference pathways in the KEGG database [33].The results indicated that 57,042 unigenes had significant matches in the database and were assigned to 132 KEGG pathways (Table S6).These pathways are summarized and mapped in five categories with 12 sub categories (Figure S2).The top 10 pathways include unigenes involved in carbon metabolism (3457), ribosomes (3343), biosynthesis of amino acids (2696), spliceosome (2138), protein processing in endoplasmic reticulum (1962), purine metabolism (1686), RNA transport (1567), oxidative phosphorylation (1434), glycolysis/gluconeogenesis (1381), and pyrimidine metabolism (1326).These results partially explain why a large number of secondary metabolites exist in Chinese fir.The inorganic phosphate transporters that we are most interested in studying belong to the ABC (ATP-binding Cassette) transporter family [22].Thus, we concentrated our efforts further on the membrane transport category involving 620 unigenes of ABC transporters.These annotations provide a valuable resource for investigating the processes, functions, and pathways involved in inorganic phosphate transport.The GO, KOG, and KEGG annotated more than half a percent of unigenes and basically represents the various aspects of metabolism, which also indicated that our transcriptomic yielded a substantial fraction of genes from Chinese fir.

Candidate Genes Involved in Inorganic Phosphate Transport
The expression of all unigenes was denoted by FPKM values [55].Table S6 lists the top 10 most frequently expressed unigenes in the root, stem, and leaf transcriptomes, respectively.Unigenes c219691, c213460, and c193149, whose gene functional annotations were identified as the pathogenesis-related protein Bet v I family, thaumatin-like protein, and translationally-controlled tumor protein, respectively, had a relatively high expression level across the root, stem, and leaf transcriptions.Research over the past 20 years has provided clear evidence that the limited availability of phosphorus in forest soil is widespread and plays an important factor in the declining productivity of Chinese fir plantations over successive rotations [1,12,[14][15][16].Therefore, we have selected the unigenes with the functional annotations of inorganic phosphate transporter activity for further analysis.Approximately 49 unigenes were identified as potentially exhibiting inorganic phosphate transporter activity in the Chinese fir transcriptome (Table S7).There are 25 unigenes that were annotated as exhibiting inorganic phosphate transmembrane transporter activity, and 14 unigenes that may be homologs of the characterized PHT genes in plants or yeast (Table 3).Unfortunately, the other 11 unigenes were not authentic homology annotated to any characterized PHT genes, and their functional analysis remains to be further studied.
Ten of the candidate PHT unigenes were annotated to the mitochondrial phosphate carrier protein and had a high homology with the solute carrier family 25 members 3 genes (SLC25A3) (Table S7, number [26][27][28][29][30][31][32][33][34][35].The SLC25A3 gene is a phosphate transport protein and catalyzes the transport of phosphate into the mitochondrial matrix, either by proton cotransport or in exchange for hydroxyl ions [56].The protein contains three related segments arranged in tandem which are related to those found in other characterized members of the mitochondrial carrier family [57].The homologous genes of SLC25A3 identified in Arabidopsis are called mitochondrial phosphate transporters (MPT) and belong to the PHT3 sub-family, which plays crucial roles in the uptake of orthophosphate into the mitochondrial matrix, and is essential for the oxidative phosphorylation of ADP to ATP [57,58].Therefore, the ten unigenes we discovered may belong to PHT3 and play a similar function in mitochondrial phosphate transport in Chinese fir.In comparison, 13 unigenes in the list of Table S7 exhibited functional annotations of ATP binding and ATP: ADP antiporter (AAA) activity.The AAA gene family has been reported as an obligate exchange translocase specific for ATP and ADP, and can also transport inorganic phosphate [59].Unigene c258053-g1 was annotated to the transmembrane 9 superfamily member 2 (TM9SF2), which encodes a member of the transmembrane 9 superfamily and plays a role in small molecule transport or acts as an ion channel [60].In total, there are 45 candidate PHT unigenes expressed in root tissue, and 12 unigenes expressed in leaves and stem tissue.Baker et al. reported that in Arabidopsis the PHT1 gene exhibits strong expression in roots and is responsible for absorbed phosphorus from soil and for the distribution and remobilization within the plant [20].Thus, the PHT1s were the most important transporters for plants and widely studied in many plants like tomato, rice, and poplar [19,22].The other PHT family members are mostly responsible for intracellular P distribution [61].The PHT2s play the function of H + /Pi cotransporters in the plastids of plants, the PHT3s were recognized as mitochondrial Pi transporter genes to catalyze the exchange of Pi between the matrix and cytosol, and the PHT4s were reported to transport Pi in plastids and the Golgiapparatus [21,61].

Gene Expression Analysis by qRT-PCR
We used qRT-PCR analysis to compare the relative transcript levels of the 14 candidate phosphate transporter unigenes in Chinese fir root tissue after treatment with low P or starved P stress for 0, 12 h, 24 h, and 48 h, and then the recovery to a normal level of P supply for 12 h, 24 h, and 48 h, as described above (Table 3).The results showed that all 14 unigenes exhibited positive expression in root tissue, and most of them revealed a significant expression change when faced with a P deficient or recovery supply (Figure 6).The unigene c290880_g1 performed at the highest level of expression in all unigenes, and expression levels increased significantly after P was deficient for 12 h.After being P deficient for 24 h and 48 h, the gene expression level was increased to more than double.When recovery P was supplied, the level of gene expression decreased but still maintained a high level.The unigene c169602_g1 and c222312_g2 also performed at a high level of expression in root tissue, and the expression pattern was obviously induced by P deficient and recovery supply.Under starved P stress for 48 h, the gene expression level of unigene c169602_g1 was increased to more than triple, and the unigene c169602_g1 was increased to more than six.The expression level of Unigene c222463_g1 was relatively low in root tissue but also revealed the same expression pattern as a significant increase in P deficiency and decrease in recovery P supply.The expression level of Unigene c217564_g1, c140065_g1, and c144269_g1 did not show a significant increase in P deficiency and a decrease in recovery P supply.In contrast, the unigenes c303734_g1 and c206932_g1 showed a decreased expression in P deficiency and an increased expression in recovery P supply.Additionally, the expression levels of Unigene c305983_g1, c7218_g1, c245205_g1, c423777_g1, and c123360_g1 exhibited no significant change according to the environment P concentration changes.The gene expression results from qRT-PCR were partially consistent with the FPKM value by RNA-Seq, such as the unigene c305983_g1 and c290880_g1, which both have a relatively high level of gene expression and FPKM value.On the whole, the expression results from the qRT-PCR analysis matched closely with the putative functions assigned to these unigenes.
Previous studies on Arabidopsis and rice revealed the presence of two different P uptake systems, low P inducible high-affinity systems, and constitutive low-affinity systems [20,22].The high-affinity transporters functional expression induced by low P concentrations and low-affinity transporters are operational at high P availabilities [62].The unigenes c169602_g1 and c222312_g2 displayed a high homology with the PHO48 gene, which was identified in yeast and encoded a H + -Pi cotransporter [63].The first P transporters identified in Arabidopsis showed a high homology to PHO84 and were named as PHT1 for plant H + -Pi cotransporters [19,63].The PHO48 gene belongs to the high-affinity transporters, and the unigenes c169602_g1 and c222312_g2 we characterized exhibited a similar expression pattern when induced by P deficiency.Thus, we believe these two unigenes belong to the PHT1 subfamily and show high affinity with P transporters of Chinese fir.The unigene c290880_g1 expression pattern also suggests the high-affinity transporters and a high homology with the PHT gene in the marine diatom Thalassiosira pseudonana.We believe that these three unigenes belong to the PHT1s and are functional as they absorbed phosphorus from soil in Chinese fir, and the qRT-PCR results also supported this as the expression of unigenes was induced by low Pi stress.The unigene c222463_g1 annotated high homology with PHT2 gene in Medicago truncatula Gaertn.Usually, the PHT2s belong to the low-affinity transporters and are functional as transport Pi in plastids, but the exhibited expression pattern of unigene c222463_g1 was induced by P deficiency and more work needs to be conducted to research the gene function.The unigene c206932_g1 also annotated to the PHT2 gene but showed the opposite expression pattern with Unigene c222463_g1 in P deficient experiments, and it should be the Pi transporter in plastids.Overall, those unigenes exhibiting no low P deficient induced high expression change could belong to low-affinity transporters.Further investigations are required such as a comparative transcriptomic analysis with more samples and different levels of P supply, which could give us more clear results.Nevertheless, the PHT unigenes we discovered in this study will greatly enhance our understanding of the mechanisms of P uptake and translocation in Chinese fir.Ultimately, this work should prove to be helpful for the molecular-assisted selection and breeding of high P-use efficiency genotypes in this economically important timber species.

Conclusions
Phosphorus (P) is the most important limiting element for the successful growth of Chinese fir, which is a major timber crop in south China.Unfortunately, there is little research on phosphate transporters for this species because of the lack of a reference genome [21].In this study, we generated and analyzed root, stem, and leaf transcriptome sequences to identify the unigenes functionally annotated with inorganic phosphate transporter activity.In total, 413,806 unigenes were obtained and functionally classified based on their matches in the GO, KOG, and KEGG databases.The 14 positive gene expression patterns in different phosphorus deficiency treatments were analyzed by qRT-PCR to explore their putative functions.This study demonstrated that the Illumina transcriptome sequence technology is a fast and cost-effective method for gene discovery and expression profiling in non-model plants, including conifers with notoriously large genomes.The transcriptome data and expression information for Chinese fir unigenes reported herein provide a basic foundation for functional genomic studies of phosphate transporters, and serve as an important public information platform for ongoing functional genomics research.

Figure 1 .
Figure 1.Unigenes length distribution.The y-axis number has been converted into a logarithmic scale.

Figure 2 .
Figure 2. Distribution of uniquely mapped reads of assembled unigenes.The y-axis indicates the number of reads.The x-axis indicates the number of assembled unigenes.

Figure 1 .
Figure 1.Unigenes length distribution.The y-axis number has been converted into a logarithmic scale.

Figure 1 .
Figure 1.Unigenes length distribution.The y-axis number has been converted into a logarithmic scale.

Figure 2 .
Figure 2. Distribution of uniquely mapped reads of assembled unigenes.The y-axis indicates the number of reads.The x-axis indicates the number of assembled unigenes.

Figure 2 .
Figure 2. Distribution of uniquely mapped reads of assembled unigenes.The y-axis indicates the number of reads.The x-axis indicates the number of assembled unigenes.

Forests
More specifically, 109,596 unigenes were annotated in the NR database with the E value and similarity of the annotated results, showing that 75.0% of the mapped sequences have a strong homology (ranging from 1.0 × 10 −5 to 1.0 × 10 −45

Figure 3 .
Figure 3. Characteristics of the homology search of Illumina sequences against the NR database.

Figure 3 .
Figure 3. Characteristics of the homology search of Illumina sequences against the NR database.

Figure 4 .
Figure 4. Distribution of GO categories assigned to the Chinese fir transcriptome.

Figure 4 .
Figure 4. Distribution of GO categories assigned to the Chinese fir transcriptome.

Figure 5 .
Figure 5. KOG function classification of the Chinese fir transcriptome.

Figure 5 .
Figure 5. KOG function classification of the Chinese fir transcriptome.

Figure 6 .
Figure 6.Expression analysis of candidate phosphate transporter unigenes in phosphorus deficient and recovery supply by qRT-PCR.The x-axis represents low P or starved P treatment from 0 h to 48 h and then recovery P supply for 12 h, 24 h, and 48 h; the y-axis represents the unigene relative expression level.

Figure 6 .
Figure 6.Expression analysis of candidate phosphate transporter unigenes in phosphorus deficient and recovery supply by qRT-PCR.The x-axis represents low P or starved P treatment from 0 h to 48 h and then recovery P supply for 12 h, 24 h, and 48 h; the y-axis represents the unigene relative expression level.

Table 1 .
Data quality and statistics.
1 Q20: The percentage of bases with a Phred value >20; 2 Q30: The percentage of bases with a Phred value >30.

Table 2 .
Length distribution of assembled transcripts and Unigenes.

Table 3 .
Putative identification and function of annotated PHT genes for qRT-PCR analysis.