De Novo Sequencing and Assembly Analysis of Transcriptome in Pinus bungeana Zucc. ex Endl.

: To enrich the molecular data of Pinus bungeana Zucc. ex Endl. and study the regulating factors of different morphology controled by apical dominance. In this study, de novo assembly of transcriptome annotation was performed for two varieties of Pinus bungeana Zucc. ex Endl. that are obviously different in morphology. More than 147 million reads were produced, which were assembled into 88,092 unigenes. Based on a similarity search, 11,692 unigenes showed significant similarity to proteins from Picea sitchensis (Bong.) Carr. From this collection of unigenes, a large number of molecular markers were identified, including 2829 simple sequence repeats (SSRs). A total of 158 unigenes expressed differently between two varieties, including 98 up-regulated and 60 down-regulated unigenes. Furthermore, among the differently expressed genes (DEGs), five genes which may impact the plant morphology were further validated by reverse transcription quantitative polymerase chain reaction (RT-qPCR). The five genes related to cytokinin oxidase/dehydrogenase (CKX), two-component response regulator ARR-A family (ARR-A), plant hormone signal transduction (AHP), and MADS-box transcription factors have a close relationship with apical dominance. This new dataset will be a useful resource for future genetic and genomic studies in Pinus bungeana Zucc. ex Endl.


Introduction
Pinus bungeana Zucc. ex Endl. is an endemic species to China, with high economic and ecological values for landscaping and afforestation. Owing to long-term over-exploitation, natural forest of Pinus bungeana Zucc. ex Endl. has been severely damaged and natural populations are mainly found on the top of hills with poor conditions usually isolated and fragmented. Genetic diversity in populations is losing. The plant morphology, which is the most visible difference, has been commonly used for genetic diversity studies. Additionally, its regulation and secondary metabolites should also be focused on.
Sizable numbers of molecular markers and genes data from the genome are playing an increasingly important role in population genomic studies of fine-scale genetic variation and the genetic basis of traits [1]. Nevertheless, very limited genomic data is available on Pinus bungeana Zucc. ex Endl. Transcriptome sequencing is an efficient means to generate functional genomic level data for the plants [2,3]. Transcriptome sequencing is a developed approach to transcriptome profiling that uses deep-sequencing technologies [4]. Studies using this method have already altered our view of the extent and complexity of the eukaryotic transcriptome [5,6].
Apical dominance occurs when the shoot apex inhibits the growth of lateral buds, leading to the vertical growth of the plant [7]. Growing upward vertically is important for the tree. In this way, it can get more sun light to maintain photosynthesis [8]. It is best demonstrated via decapitation, which leads to apical dominance. Branching control is regulated by divergent mechanisms among different species [7,9,10]. The functions of hormones in apical dominance have been included in investigations for nearly five decades. Evidence from hormonal studies suggests that apically produced auxin indirectly suppresses axillary bud outgrowth that is promoted by cytokine originating from roots/shoots [11][12][13]. The mechanism of branching control is still the perplexing problem which is explored by many hypotheses including the classical hypothesis, the auxin transport hypothesis, and the bud transition hypothesis [11]. Exhibiting significant involvement with other hormones, cytokinins (CK) are signaling hormonal molecules that may have an essential impact in regulating cytokinesis, growth, and development in plants [14,15]. In addition, the CKX, a flavoenzyme irresistibly degrading CK into adenine/adenosine moiety, is critical for maintaining CK homeostasis in plants. Therefore, the CKX plays a very important role by means of controlling the balance of CK. The functions of the CKX gene, including pygmyism, high yield, and anti-aging, are reported in many plants, such as maize, rice, and Arabidopsis thaliana (Linn.) Heynh. in Holl & Heynh. Even though the precision of hormone content analyses in tissue has greatly improved in recent years, there have been no reports on the understanding of Pinus spp.
Next generation transcriptome sequencing was employed to characterize the transcriptome of Pinus bungeana Zucc. ex Endl. and to develop genomic resources to support further studies of the species A new variety of Pinus bungeana Zucc. ex Endl. named "Penxian", which loses apical dominance, has also been sequenced and analyzed to characterize the architecture regulation mechanism of this species at molecular level.

Materials
Bud samples of two varieties of Pinus bungeana Zucc. ex Endl. namely 'Tree No. 9' and 'Penxian' were collected from six-year-old seedlings grown in a glasshouse in Changping, Beijing, China. The variety 'Penxian' has no main stem due to loss of apical dominance ( Figure 1). Three replicate experiments were made for each sample. After sampling, the buds were immediately frozen in liquid nitrogen and homogenized with a pestle and mortar.

Sample Preparation for Illumina Sequencing
Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol. The RNA samples were digested using DNase I at 37 • C for 30 min to remove potential genomic DNA contamination. RNA concentration and quality were assessed using a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA) and Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Poly (A) mRNA was purified from 3 µg of total RNA using Oligo (dT) magnetic beads. The mRNA was fragmented using divalent cations at an elevated temperature. Double-stranded complementary to RNA (cDNA) was synthesized using a SuperScript double-stranded cDNA synthesis kit (Invitrogen, Carlsbad, CA, USA) with random hexamer primers (Illumina). These cDNA fragments were subjected to purification, end repair, and ligation to sequencing adapters. Ligation products were purified with magnetic beads and separated by agarose gel electrophoresis. A range of cDNA fragments (200 ± 25 bp) were excised from the gel and selected for PCR amplification as templates. The cDNA library was sequenced on a paired-end flow cell using IlluminaHiSeq 2500 platform (2 × 100 bp read length) by Biomarker Co. (Beijing, China).

Differential Expression Analysis and Functional Enrichment
To identify DEGs between different samples, the expression level of each unigene was calculated according to the fragments per kilobase of exon per million mapped reads (FPKM) method [6]. The R statistical package software DESeq was used for differential expression analysis with default parameters. The DEGs between different samples were restricted with FDR (False Discovery Rate) ≤ 0.001 and the absolute value of log2 Ratio ≥ 1. In addition, functional-enrichment analyses including GO and KEGG were performed to identify which DEGs were significantly enriched in GO terms and metabolic pathways at a Bonferroni-corrected p-value ≤ 0.05 compared with the whole-transcriptome background. KEGG pathway functional enrichment analysis was carried out by KOBAS (http://kobas.cbi.pku.edu.cn/) [25,26].

Differential Expression Analysis and Functional Enrichment
To identify DEGs between different samples, the expression level of each unigene was calculated according to the fragments per kilobase of exon per million mapped reads (FPKM) method [6]. The R statistical package software DESeq was used for differential expression analysis with default parameters. The DEGs between different samples were restricted with FDR (False Discovery Rate) ≤ 0.001 and the absolute value of log2 Ratio ≥ 1. In addition, functional-enrichment analyses including GO and KEGG were performed to identify which DEGs were significantly enriched in GO terms and metabolic pathways at a Bonferroni-corrected p-value ≤ 0.05 compared with the whole-transcriptome background. KEGG pathway functional enrichment analysis was carried out by KOBAS (http://kobas.cbi.pku.edu.cn/) [25,26].

Prediction of Unigene Coding Regions
The coding sequence (CDS) for unigene was predicted by BlastX [21] and getorf [27]. The unigene sequences were searched against Nr [17], COG [19], KEGG [20], and Swiss-Prot [18] protein databases using BLASTX (e-value < 10 −5 ) [21]. Unigenes aligned to a higher priority database will not be aligned to a lower priority database. The best alignment results were used to determine the sequence direction of unigenes. When a unigene could not be aligned to any database, the getorf program was introduced to decide its sequence orientation and protein coding region prediction.

Detection of SSR
The SSR detection of unigenes was performed with MicroSAtellite (MISA) software (http://pgrc.ipk-gatersleben.de/misa/misa.html). We searched for SSRs with motifs ranging from mono-to hexa-nucleotides in size. The minimum of repeat units was set as follows: ten repeat units for mono-nucleotides, six for di-nucleotides, and five for tri-, tetra-, penta-, and hexa-nucleotides.

Differential Expression Level Analysis by RT-qPCR
To verify the reliability and accuracy of RNA-Seq-based expression level analysis, five transcripts (Table S1.) which may impact the plant morphology were selected from the contigs database and evaluated the gene expression level in the two Pinus bungeana Zucc. ex Endl. varieties which were described above. RT-qPCR experiments were performed in triplicate. The methods of RNA isolation and cDNA synthesis were the same as described for RNA-Seq above. The specific primer (Table S2.) set for RT-qPCR was designed with the Oligo 7.57 software (MBI, Colorado Springs, CO, USA). The cDNA was diluted and amplified using a 7500 Fast Real-Time PCR Systems (Thermo Fisher, Waltham, MA, USA) and SYBR Premix Ex Taq kit (Thermo Fisher, Waltham, MA, USA) according to the manufacturer's instructions. Standard curves for individual unigenes were calculated using a dilution series of the pMD19-T vector containing the target gene. The relative expressions of the genes in the cold-treated seedlings compared with the control ones were determined based on comparisons with the reference gene ubiquitin-conjugating enzyme. Therefore, the amplification of the target gene was normalized with the amplification of the reference gene to correct amplification variations. The relative expression data were calculated according to the 2 −∆∆Ct method and are presented as fold change [28].

Sequencing Data Quality and Assembly of Reads
In total, approximately 148 million paired end reads were generated by high throughput sequencing. After removing adapters, low-quality sequences, and ambiguous reads, we obtained approximately 25 million, 25 million, 25 million, 24 million, 25 million, and 23 million clean reads from the ordinary variety samples (Tree No. 9-1, Tree No. 9-2, and Tree No. 9-3), and new variety samples (Penxian-1, Penxian-2, and Penxian-3), respectively. Approximately 37.21 Gb bases were obtained with 91.56% Q30 bases with a GC content of 45.85% (Table 1). Trinity was employed for transcriptome de novo assembly with next-generation short-read sequences. A total of 124,692 transcripts with a mean length of 922 were assembled. Additionally, we further clustered the transcripts into 88,092 unigenes with a mean length of 758 bp and an N50 value of 1357 bp. Among the unigenes, 51,803 (58.81%) were between 200-500 bp, 16,917(19.20%) were between 500-1000 bp, and 19,372 (21.99%) unigenes were greater than 1 kb. The shorter sequences, which were less than 200 bp in length, were excluded, as they may lack a characterized protein domain or be too short to show a sequence that matches the contigs ( Table 2). The length distributions of the unigenes are shown in Figure 2.

Functional Annotation
Sequence annotations of all unigenes were predicted with BLAST [21] with an E-value threshold of 10 −5 in the NCBI database of Nr [17], the Swiss-Prot [18] protein database, the KEGG [22] database, the COG [19] database, and the GO [24] database. Finally, after sequence annotation, 48.43% unigenes were predicted and 51.57% unigenes were still unknown. Among these annotated unigenes, 41,504 had significant matches in the Nr database, 29,287 had significant matches to the Pfam database , 25,679 had effective matches to the GO database, while 27,426 matched the Swiss-Prot database

Functional Annotation
Sequence annotations of all unigenes were predicted with BLAST [21] with an E-value threshold of 10 −5 in the NCBI database of Nr [17], the Swiss-Prot [18] protein database, the KEGG [22] database, the COG [19] database, and the GO [24] database. Finally, after sequence annotation, 48.43% unigenes were predicted and 51.57% unigenes were still unknown. Among these annotated unigenes, 41,504 had significant matches in the Nr database, 29,287 had significant matches to the Pfam database , 25,679 had effective matches to the GO database, while 27,426 matched the Swiss-Prot database (Figure 3 and Table S3). In total, 42,667 unigenes were BLAST for homology searches, resulting in 48.43% unigenes showing similarity to known protein databases. For unigene sequences in the Nr annotations, BLAST search analysis further revealed that a total 11,692 had the most similar sequences to proteins from Picea sitchensis, followed by Elaeis guineensis Jacq., with a value of 2674, and Amborella trichopoda Baill., with a value of 2292 ( Figure 4 and Table S4).
Forests 2018, 9, x FOR PEER REVIEW 6 of 14 ( Figure 3 and Table S3). In total, 42,667 unigenes were BLAST for homology searches, resulting in 48.43% unigenes showing similarity to known protein databases. For unigene sequences in the Nr annotations, BLAST search analysis further revealed that a total 11,692 had the most similar sequences to proteins from Picea sitchensis, followed by Elaeis guineensis Jacq., with a value of 2674, and Amborella trichopoda Baill., with a value of 2292 ( Figure 4 and Table S4).    ( Figure 3 and Table S3). In total, 42,667 unigenes were BLAST for homology searches, resulting in 48.43% unigenes showing similarity to known protein databases. For unigene sequences in the Nr annotations, BLAST search analysis further revealed that a total 11,692 had the most similar sequences to proteins from Picea sitchensis, followed by Elaeis guineensis Jacq., with a value of 2674, and Amborella trichopoda Baill., with a value of 2292 ( Figure 4 and Table S4).

Functional Classification
Based on the GO annotation, in total, 25,679 unigenes were assigned to 53 functional groups which were categorized into three main divisions ( Figure 5 and Table S5). The Cellular components were classified into 17 groups, within which the unigenes were mostly enriched in the cell (12,836), the cell part (12,836), and the organelle (10,185). Additionally, the unique sequences were grouped into 16 groups which were included in the molecular function. The predominant groups were the catalytic activity (13,329) and binding (12,710). These were followed by transporter activity (1594) and structural molecule activity (879). In the biological process group, the matched unique sequences were categorized into 20 classes. In this group, the unigenes were highly matched to the metabolic process (17,168), followed closely by the cellular process (14,445), the single-organism process (11,988), the response to stimulus (5320), the biological regulation (4461), and the localization (3673) ( Table S5).
All the assembled unigenes were searched in the COG database. A total of 18,516 matched unique sequences were categorized into 25 functional groups ( Figure 6). The maximum group was replication, recombination, and repair (1773), followed by the transcription (1576), posttranslational modification (1447), translation, ribosomal structure, and biogenesis (1372), signal transduction mechanisms (1351), carbohydrate transport and metabolism (1049), and amino acid transport and metabolism (1015) ( Figure 6). However, extracellular structures and nuclear structure respectively matched one and two unigenes ( Figure 6). Groups with no concrete assignment, such as function unknown (536) and general function prediction only (3299), accounted for a part of unigenes ( Figure 6).

Functional Classification
Based on the GO annotation, in total, 25,679 unigenes were assigned to 53 functional groups which were categorized into three main divisions ( Figure 5 and Table S5). The Cellular components were classified into 17 groups, within which the unigenes were mostly enriched in the cell (12,836), the cell part (12,836), and the organelle (10,185). Additionally, the unique sequences were grouped into 16 groups which were included in the molecular function. The predominant groups were the catalytic activity (13,329) and binding (12,710). These were followed by transporter activity (1594) and structural molecule activity (879). In the biological process group, the matched unique sequences were categorized into 20 classes. In this group, the unigenes were highly matched to the metabolic process (17,168), followed closely by the cellular process (14,445), the single-organism process (11,988), the response to stimulus (5320), the biological regulation (4461), and the localization (3673) ( Table S5).
All the assembled unigenes were searched in the COG database. A total of 18,516 matched unique sequences were categorized into 25 functional groups ( Figure 6). The maximum group was replication, recombination, and repair (1773), followed by the transcription (1576), posttranslational modification (1447), translation, ribosomal structure, and biogenesis (1372), signal transduction mechanisms (1351), carbohydrate transport and metabolism (1049), and amino acid transport and metabolism (1015) ( Figure 6). However, extracellular structures and nuclear structure respectively matched one and two unigenes ( Figure 6). Groups with no concrete assignment, such as function unknown (536) and general function prediction only (3299), accounted for a part of unigenes ( Figure 6).

Simple Sequence Repeats (SSR) Loci Discovery
We used the generated 19,372 unigenes (≥1 kb) to exploit potential microsatellites by using MISA software. In total, this resulted in an authentication of 3323 putative microsatellites (including 1717 mononucleotides), covering 2829 nice SSR loci, and 163 loci in a compound formation. The most abundant repeat motif (excluding mononucleotides) was trinucleotide, 862, followed by dinucleotide, 646, tetra nucleotide, 76, penta nucleotide, 14, and hex nucleotide, 8 (Tables 3 and S6). Further, we designed 2352 SSR primer pairs successfully using software Primer 3 (Table S7). These SSR loci will significantly contribute to developing polymorphic SSR markers in Pinus Zucc. ex Endl.

DEGs between Two Samples
The analysis showed that a few of the unigenes were significantly affected between the two varieties. The Volcano plot showed the number of DEGs between the two samples ( Figure 7). We identified 158 unigenes as DEGs, including 98 up-regulated and 60 down-regulated unigenes (Tables  4 and S8).

Simple Sequence Repeats (SSR) Loci Discovery
We used the generated 19,372 unigenes (≥1 kb) to exploit potential microsatellites by using MISA software. In total, this resulted in an authentication of 3323 putative microsatellites (including 1717 mononucleotides), covering 2829 nice SSR loci, and 163 loci in a compound formation. The most abundant repeat motif (excluding mononucleotides) was trinucleotide, 862, followed by dinucleotide, 646, tetra nucleotide, 76, penta nucleotide, 14, and hex nucleotide, 8 (Table 3 and Table S6). Further, we designed 2352 SSR primer pairs successfully using software Primer 3 (Table S7). These SSR loci will significantly contribute to developing polymorphic SSR markers in Pinus Zucc. ex Endl.

DEGs between Two Samples
The analysis showed that a few of the unigenes were significantly affected between the two varieties. The Volcano plot showed the number of DEGs between the two samples ( Figure 7). We identified 158 unigenes as DEGs, including 98 up-regulated and 60 down-regulated unigenes (Table 4 and Table S8).

Confirmation of Solexa Expression Patterns by RT-qPCR
There are different expression genes between the two varieties. As there is a difference between the two varieties in terms of morphology, we suggest that the auxin in them is different. Therefore, the genes for plant hormone, apical dominance, and genes transcriptional factor were identified in the KEGG pathways (Tables 5 and S1) from the DEGs. The genes concerning the CKX were downregulated. The unigenes relating to ARR-A were up-regulated. Moreover, the unigenes for AHP related up. To take the basics of DEGs of high-throughput sequencing further and confirm it, we selected five genes to validate the expression patterns of them by RT-qPCR. As it was shown in Figure  8, the expression patterns were highly consistent with the RNA-Seq analysis.

Confirmation of Solexa Expression Patterns by RT-qPCR
There are different expression genes between the two varieties. As there is a difference between the two varieties in terms of morphology, we suggest that the auxin in them is different. Therefore, the genes for plant hormone, apical dominance, and genes transcriptional factor were identified in the KEGG pathways (Table 5 and Table S1) from the DEGs. The genes concerning the CKX were down-regulated. The unigenes relating to ARR-A were up-regulated. Moreover, the unigenes for AHP related up. To take the basics of DEGs of high-throughput sequencing further and confirm it, we selected five genes to validate the expression patterns of them by RT-qPCR. As it was shown in Figure 8, the expression patterns were highly consistent with the RNA-Seq analysis.

The Description of Pinus Bungeana Zucc. ex Endl. Transcriptome
Due to its low cost and high throughput, transcriptome sequencing is becoming a necessary method to perform gene discovery. The plants without a reference genome de novo transcriptome have been generated by illumine high throughput sequencing in recent years such as Liriodendron chinense (Hemsl.) Sarg. [29], Corchorus capsularis L. [30], Camelina sativa L. [31], and Chlorophytum borivilianum Santapau & R.R. Fern. [32], etc. The Pinus spp. are ancient species and important in modern landscape planting. In addition, the Pinus bungeana Zucc. ex Endl. is an endangered species [33]. Therefore, research on the new varieties of Pinus bungeana Zucc. ex Endl. is necessary. The new variety "Penxian" loses apical dominance completely, which is regulated by the factors related to plant hormones, including signal transduction and genes transcriptional factor. So, the transcriptome study not only enriched the lack of molecular data on Pinus bungeana Zucc. ex Endl., but also supplied materials for the research of apical dominance.

Transcriptome Assembly and Gene Annotation
In this paper, 42,667 (48.43%) unigenes out of 88,092 identified were annotated using BLAST searches of the public Nr [17], Swiss-Prot [18], GO [24], COG [19], and KEGG [22] databases. According to existing databases, more than 50% of unigenes were not annotated. Several reasons, such as assembly mistakes, short sequences not containing a protein domain, the deficiency of genomic information on the Pinus app., the unigenes without hits probably belonging to untranslated regions, non-coding RNA, etc., may lead to this result [34]. The unigenes without hits may be considered putative novel transcribed sequences because of the lack of Pinus app. transcriptomic information. So, this means a large collection of unigenes of Pinus and expression patterns is necessary.
The annotation for each hit unigene met the GO annotation in terms of biological process, molecular function, and cellular components groups ( Figure 5). The categories such as the nucleotide binding (GO: 0000166), DNA binding (GO: 0003677), zinc ion binding (GO: 0008270), protein binding (GO: 0005515), ATP binding (GO: 0005524), metal ion binding (GO: 0046872), and the structural

The Description of Pinus Bungeana Zucc. ex Endl. Transcriptome
Due to its low cost and high throughput, transcriptome sequencing is becoming a necessary method to perform gene discovery. The plants without a reference genome de novo transcriptome have been generated by illumine high throughput sequencing in recent years such as Liriodendron chinense (Hemsl.) Sarg. [29], Corchorus capsularis L. [30], Camelina sativa L. [31], and Chlorophytum borivilianum Santapau & R.R. Fern. [32], etc. The Pinus spp. are ancient species and important in modern landscape planting. In addition, the Pinus bungeana Zucc. ex Endl. is an endangered species [33]. Therefore, research on the new varieties of Pinus bungeana Zucc. ex Endl. is necessary. The new variety "Penxian" loses apical dominance completely, which is regulated by the factors related to plant hormones, including signal transduction and genes transcriptional factor. So, the transcriptome study not only enriched the lack of molecular data on Pinus bungeana Zucc. ex Endl., but also supplied materials for the research of apical dominance.

Transcriptome Assembly and Gene Annotation
In this paper, 42,667 (48.43%) unigenes out of 88,092 identified were annotated using BLAST searches of the public Nr [17], Swiss-Prot [18], GO [24], COG [19], and KEGG [22] databases. According to existing databases, more than 50% of unigenes were not annotated. Several reasons, such as assembly mistakes, short sequences not containing a protein domain, the deficiency of genomic information on the Pinus app., the unigenes without hits probably belonging to untranslated regions, non-coding RNA, etc., may lead to this result [34]. The unigenes without hits may be considered putative novel transcribed sequences because of the lack of Pinus app. transcriptomic information. So, this means a large collection of unigenes of Pinus and expression patterns is necessary.
The annotation for each hit unigene met the GO annotation in terms of biological process, molecular function, and cellular components groups ( Figure 5). The categories such as the nucleotide binding (GO: 0000166), DNA binding (GO: 0003677), zinc ion binding (GO: 0008270), protein binding (GO: 0005515), ATP binding (GO: 0005524), metal ion binding (GO: 0046872), and the structural constituent of ribosome (GO: 0003735) were well identified, indicating the need for a large number of transcripts related to various biochemical processes. These gene annotations will make contributions to the discovery of other sibling species in Pinus app.

Identification of Transcription Factor Involved in Pinus Bungeana Zucc. ex Endl. Architecture Regulation
Transcription factors usually relate to plant architecture regulation. Transcript profiling can be a valid tool for the characterization of apical dominance-related transcriptional factors genes. To gather information of transcription factors genes family in Pinus bungeana Zucc. ex Endl., we retrieved all members of transcription factors gene family from Pfam and searched against databases of Pinus bungeana Zucc. ex Endl. DEGs as mentioned above. In total, we found two MADS-box transcription factors in Pinus bungeana Zucc. ex Endl. (Table 5). The two transcription factors were down-regulated in "Penxian", but the opposite in "Tree No. 9". Among the transcription factors identified from our data, MADS-box is one of the most important transcription factors due to its roles in Pinus bungeana Zucc. ex Endl. architecture regulation.

DEGs between the Two Varieties
The differences between the two varieties are apparent. Also, 158 unigenes expressed differently. Actually, the previous study including the two lacebark pine varieties, using SSR markers, showed that genetic difference exists between the samples of "Penxian" and "Tree No. 9" [35]. So, this research also goes a step further on the basis of the previous study, to develop a more unambiguous understanding of the morphology of Pinus bungeana. Additionally, among the DEGs, CKX (K00279, EC1.5.99.12, Figure S1), ARR-A (K14492, Figure S2), and AHP (K14490, Figure S2) were related to apical dominance. The function of CKX in the regulation of plant form was reported [36]. In rice, reduced expression of CKX caused CK accumulation in inflorescence meristems and increased the number of reproductive organs [37]. In transgenic Arabidopsis plants, the over expression of CKX creased CK breakdown and resulted in diminished activity of the vegetative and floral apical meristems and leaf primordia, as well as the root meristematic cell activity related to plant form [38]. ARR-A and AHP are in the same pathway, plant hormone signal transduction. ARR-A increased cytokinin sensitivity and resulted in weak morphological phenotypes [39]. Furthermore, AHP is the previous process in the pathway to the result of Cell division and Shoot initiation (Ko04075, Figure S1). In this paper, the CKX was down-regulated, whilst ARR-A and AHP were up-regulated, which may be the main reason for the difference between the two varieties. However, apical dominance is very complex, which is connected with not only the auxin and CK and their interactions in controlling the buds' outgrowth, but also the buds' status of dormancy and sustained that would be easily influenced by the environmental factors, such as light, photoperiod, carbon acquisition, and nutrients [11,40]. Moreover, this paper may promote a research approach on the function of CK and ARR-A in Pinus.

Conclusions
In this study, 88,092 unigenes with an average length of 758 bp were obtained. In addition, 42,667 unigenes acquired were BLAST, resulting in 48.43% unigenes showing similarity to known protein databases. A total of 3323 putative microsatellites (including 1717 mononucleotides), covering 2829 perfect SSR loci, and 163 loci in a compound form. The DEGs related to CKX, ARR-A, AHP, and the MADS-box transcription factors between the two varieties are validated by RT-qPCR. This will provide support to for the classical hypothesis of apical dominance. The transcriptome data in this study will contribute to further gene discovery and functional exploration for Pinus significative characterization in deeper genetic breeding.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/9/3/156/s1, Table S1: The detail sequence of the selected five transcripts, Table S2: The specific primer for RT-qPCR, Table S3: The Unigene Number Annotated in the Public Database Searches, Table S4: Nr Homologous Species Distribution, Table S5: GO Classification of all unigenes, Table S6: Frequency of identified SSR motifs, Table S7: The details of the 2352 SSR primer pairs, Table S8: The details of the 158 DEGs, Figure S1: The pathway map of ko00908, Figure S2: The pathway map of ko04075. The dataset is available from the NCBI Short Read Archive (SRA) with an accession number SRX2024410 (https://www.ncbi.nlm.nih.gov/sra/SRX2024410).