RNA-Seq Analysis of Plant Maturity in Crested Wheatgrass (Agropyron cristatum L.)

Crested wheatgrass (Agropyron cristatum L.) breeding programs aim to develop later maturing cultivars for extending early spring grazing in Western Canada. Plant maturity is a complex genetic trait, and little is known about genes associated with late maturity in this species. An attempt was made using RNA-Seq to profile the transcriptome of crested wheatgrass maturity and to analyze differentially expressed genes (DEGs) between early and late maturing lines. Three cDNA libraries for each line were generated by sampling leaves at the stem elongation stage, spikes at the boot and anthesis stages. A total of 75,218,230 and 74,015,092 clean sequence reads were obtained for early and late maturing lines, respectively. De novo assembly of all sequence reads generated 401,587 transcripts with a mean length of 546 bp and N50 length of 691 bp. Out of 13,133 DEGs detected, 22, 17, and eight flowering related DEGs were identified for the three stages, respectively. Twelve DEGs, including nine flowering related DEGs at the stem elongation stage were further confirmed by qRT-PCR. The analysis of homologous genes of the photoperiod pathway revealed their lower expression in the late maturing line at the stem elongation stage, suggesting that their differential expression contributed to late maturity in crested wheatgrass.


Introduction
Crested wheatgrass (Agropyron cristatum L.) is widely used for early spring grazing. This forage crop undergoes its initial growth in Western Canada from May to July and when it reaches the anthesis stage, its forage nutritive value and palatability declines rapidly [1]. Therefore, development of late maturing cultivars to maintain forage quality is highly desirable for this species. However, developing such cultivars is a challenging task, as plant maturity is a complex genetic trait, and little is known about genes associated with late maturity in this species [2].
Manipulating the timing of floral transition requires a better understanding of the molecular mechanisms of plant maturity and informative analyses of genes expressed over different stages of floral initiation and development. Flowering is one of the most important events in a plant's life cycle [3]. In many C 3 grasses, flowering in response to seasonal changes is controlled by pathways such as vernalization and photoperiod [4].
Vernalization is the prolonged exposure to the cold of a typical winter, or by an artificial equivalent to promote flowering in plants. Vernalization requirement prevents plants from flowering in the late fall before the onset of winter. In Arabidopsis thaliana, the activity of central flowering repressor which is encoded by the FLOWERING LOCUS C (FLC) gene was reduced by vernalization [5]. In the cereals of the grass subfamily Pooideae, particularly wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), three genes (VERNALIZATION1 (VRN1), VRN2, and VRN3) have been identified and are thought to form a regulatory loop to control the timing of flowering [6]. VRN1 encodes an AP1-related MADS-box protein [7]. VRN2, functioning as floral repressor FLC in A. thaliana, has a CCT zinc finger domain [8][9][10][11]. VRN3 corresponds to an ortholog of A. thaliana FT [12]. During growth of vernalization-requiring cereals in the fall, VRN2 represses VRN3 to prevent flowering and VRN1 is transcribed at very low levels [12][13][14][15]. Constitutive expression of VRN3 from a promoter not subject to VRN2-mediated repression bypasses the vernalization requirement [12], just as the constitutive expression of FT bypasses FLC repression of flowering and the vernalization requirement in A. thaliana [16].
Photoperiod regulation is another major flowering time mechanism and has been studied extensively in the model plant species A. thaliana. The key component of the photoperiod pathway in A. thaliana is CONSTANS protein (CO) [16]. CO can activate the transcription of the Flowering Locus T (FT) gene. This FT gene is the florigen which is synthesized in leaves in long-day and transported to the shoot apical meristem to start the expression of multiple floral identity genes [17]. CO gene expression is regulated by light through light receptors, phytochromes (PHYs) and cryptochromes (CRYs), and by the circadian rhythm through GIGANTEA (GI) [18]. Both cryptochromes 1 and 2 (CRY1 and CRY2), which are activated by blue light, and phytochrome A (PHYA), which is activated by far-red light, stabilize the CO protein through a mechanism that disrupts the function of the CONSTITUTIVELY PHOTOMORPHOGENIC 1/SUPPRESSOR OF PHYTOCHROME A-105 complex (COP1-SPA) [19]. The circadian rhythm is an important part of the photoperiod pathway for plant flowering including LATE ELONGATED HYPOCOTYL (LHY), CIRCADIAN CLOCK ASSOCIATED 1 (CCA1), GI, EARLY FLOWERING 3 (ELF3), and TIMING OF CAB EXPRESSION 1 (TOC1) [18,[20][21][22][23][24][25]. The expression of CO is active in the same flowering-time pathway as the circadian-clock-related genes LHY, GI and ELF3 [23,26]. Overexpression of CO can cause early flowering in the late flowering lhy and gi-3 mutants, whereas early flowering in elf-1 mutant correlates with elevated CO expression [27].
We initiated a research project in 2014 using the RNA-Seq technique to profile the transcriptome of flowering, analyze gene expressions for floral initiation and development with the aim to identify differentially expressed genes (DEGs) associated with late maturity in crested wheatgrass. Our research so far has generated a set of genomic resources for studying genes and pathways involved in floral transition and development, and has revealed the conserved photoperiod-circadian clock-CO-FT regulatory circuit for flowering initiation in crested wheatgrass [28]. In this communication, we will specifically report the research findings on differentially expressed genes associated with late maturity from an RNA-Seq analysis of plant maturity in early and late maturing lines.

Plant Materials, RNA Extraction, and Illumina Sequencing
This RNA-Seq analysis considered two accessions: Plant Introduction (PI) No. W625134, an early maturity line, and a breeding line S9516 selected for late maturity ( Table 1). The plants of these two accessions were grown in the field plots at Agriculture and Agri-Food Canada Saskatoon Research Farm, Saskatoon, SK, Canada. The field plots were established in July of 2014 with individual plants on 1 m centers. These two crested wheatgrass lines have a similar vernalization requirement to initiate flowering. The samples of the two randomly selected plants were collected at the same time, as follows: leaf tissues at stem elongation stage (vegetative stage, VS) (approximately E0), spikes at boot stage (BS) (approximately R0, 7 d after the first sampling), and spikes at the anthesis (AS) stage (R4, 30-d after the second sampling), respectively, according to Moore et al. [29]. The collected tissues were immediately frozen and stored in liquid nitrogen for RNA extraction. Total RNA of the early and late maturing plants was extracted from approximately 100 mg raw material at the three developmental stages, respectively, using the Qiagen RNeasy Plant mini kit (Qiagen Inc, Mississauga, ON, Canada) according to the manufacturer's protocol. An additional DNase treatment was preformed using the Ambion DNA-free DNase treatment and removal kit (Life Technologies, Carlsbad, CA, USA) to remove additional residual genomic DNA. RNA quantification was performed using Nanodrop 8000 (Thermo Fisher Scientific, Wilmington, DE, USA). RNA 6000 Nano labchip on 2100 Agilent Bioanalyzer (Agilent Technologies, Waldbronn, Germany) was used to assess RNA integrity. The RNA samples were subsequently used in cDNA library construction. Three cDNA libraries for each line were prepared using a TruSeq ® RNA sample preparation Kit from Illumina (San Diego, CA, USA). Paired-end libraries were sequenced using the Illumina HiSeq ® 2500 system at National Research Council, Saskatoon, SK, Canada. The Sequence data were deposited in the National Center for Biotechnology Information (NCBI) Short Read Archive under the accession number SRP092070 for the early maturing line and SRP103607 for the late maturing line.

De Novo Assembly, Differential Gene Expression Analysis and Sequence Annotation
De novo assembly of crested wheatgrass flowering transcriptome using six libraries of the early and late flowering plants was accomplished using Trinity with the online instructions [30,31]. Relevant transcripts were clustered into genes by Corset [32]. Bowtie2 was used to do the multi-mapping of the reads to the transcriptome and to produce the bam files for Corset analysis [33]. The de novo assembly was used as a reference for DEGs analysis. The alignment-based qualification method RSEM was used to estimate transcripts abundance [34]. Each RNA-Seq library was separately aligned to the reference, using Bowtie [35]. The DEGs were analyzed using R Bioconductor package, edgeR [36]. As there were no biological replications, the distribution of the edgeR dispersion values was empirically explored between 0.1 and 0.4. More than 10,000 DEGs were identified using edgeR dispersion value at 0.1, while 200 or fewer DEGs were found using edgeR dispersion value at 0.3. Therefore, a dispersion value of 0.2 was adopted to detect DEGs. The threshold to judge the significance of gene expression differences was "false discovery rate (FDR) ≤ 0.001 and the absolute value of Log2 fold change (Log2FC) ≥ 5." Blast2go (http://www.blast2go.com/b2ghome) was used to align the assembled transcripts against the NCBI nr protein database for function annotation. The e-value cut-off was set at 1 × 10 −5 . Gene name was assigned to each gene based on top Blastx hit with the highest score. The genes related to flowering time and flowering development were explored based on the gene name.

The Validation of Differentially Expressed Genes
As the original RNA-Seq analysis, done in 2014, had no biological replication, an additional effort was made here to validate a subset of the assembled gene sequences and the detected differentially expressed genes. To check the quality of assembled gene sequences, nine flowering related DEGs at the VS stage between the early and late maturing lines were selected for validation. Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/primer3/) was employed for primer design, and the primers are listed in Supplementary File S1. Two early maturing plants, each from PI 439914 and PI W625134, and two late maturing plants, each from later maturing breeding lines S9580 and S9516, respectively, were randomly selected for the validation. These four plants were not genetically related to the two plants used for the RNA-Seq analysis. Total RNA at the VS stage of these four plants was extracted separately, and the extracted RNA was treated with DNAse I (Ambion Technologies, Santa Clara, CA, USA), and cDNA was synthesized using the SuperScript ® First-Strand Synthesis System for RT-PCR (Invitrogen, Life Technologies, CA, USA) according to the manufacturer's instruction. The cDNA of these plants at the VS stage was employed as templates for Sanger sequencing to validate the sequences of de novo assembly. Each PCR (20 µL) contained 1× standard PCR buffer (New England Biolabs (NEB), Whitby, ON, Canada), 1 U of Taq polymerase (NEB, ON, Canada), 0.25 µM forward primer, 0.25 µM reverse primer, 100 µM each dNTP and 50 ng of genomic DNA in a total volume of 20 µL. The PCR amplification consisted of an initial denaturation at 94 • C for 3 min, 30 cycles consisting of 94 • C (30 s), 60 • C (30 s), 72 • C (2 min) terminating with 72 • C for 5 min. All PCR products were analyzed by electrophoresis in 2% agarose gels in 1× Tris-acetate-EDTA (TAE) buffer. Gels were visualized by staining in ethidium bromide and photographed on a digital gel documentation system. The DNA fragment of the expected PCR band was cloned with a pGEM-T kit (Promega, Madison, WI, USA), and the ligation product was transformed into competent Escherichia coli DH5α cells followed by culturing overnight at 37 • C. Positive clones containing the expected DNA fragment of DEG were identified by PCR analysis with M13 and T7 specific primers and were sequenced at the National Research Council, Saskatoon, SK, Canada. Five independent positive clones containing the DEG DNA fragment from each of the four tested plants were sequenced to ensure the right sequence obtained. The DNA sequence analysis was performed using the software Mega 6 (http://www.megasoftware.net/).
The gene expression levels of nine flowering related DEGs and three randomly selected DEGs at the VS stage between the early and late maturing lines were chosen for validation using qRT-PCR. Four plants, used for Sanger sequencing were employed for qRT-PCR validation. The sequences of the specific primer sets for each tested gene were designed using the PrimerQuest Tool (https://www.idtdna.com/Primerquest/Home/Index) and listed in Supplementary File S2. The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene of crested wheatgrass (DN67262-c0-g1) was used as an internal control for normalization. Three separate first-strand cDNA reactions were analyzed in duplicate for each sample. The qRT-PCR analysis was performed with SsoFast EvaGreen supermix (Bio-Rad, ON, Canada) according to the manufacturer's instructions using a Bio-Rad CFX96 TM system. KEGG Orthology Based Annotation System (KOBAS) program [37] was used for pathway enrichment analysis. Significantly enriched pathways were defined by taking a corrected p-value ≤ 0.05 as the threshold.

De Novo Assembly
The flowering date of breeding line S9516 selected for late maturity was on average 6 d later than that of the PI W625134 in the 2015 and 2016 growing seasons (Table 1). Two plants, each selected from the breeding line S9516 and PI W625134, respectively, were used for RNA-Seq analysis. Illumina sequencing generated 22,064,163; 28,810,091; and 24,343,976 clean reads for stem elongation, boot and anthesis stages of the early maturing line and 24,294,860; 27,763,034; and 21,957,198 clean reads for the late maturing line, respectively. A total of 149,233,322 clean reads for these six cDNA libraries were generated. De novo assembly of these clean reads resulted in 401,587 transcripts (Supplementary File S3) with an average length of 546 bp and N50 length of 691 bp. The minimum and maximum lengths of the assembled transcripts were 201 bp and 15,230 bp, respectively. These transcripts were further clustered into 217,365 "genes" using Corset, with a mean length of 737 bp. Above 90% of the RNA-Seq reads of each sample mapped back to the assembly, among which more than 94% mapping was considered as properly paired reads for the assembly (Supplementary File S4). ExN50 value measures limit to the top most highly expressed transcripts that represent x% of the total normalized expression data and is considered to be more appropriate to evaluate the quality of the transcriptome assembly data [30,31]. In this study, about 80% of the total transcription is represented by "E80 transcription set" of 38716 transcripts (E80N50 value of 1242 bp) (Supplementary File S5A). Based on the metric of transcripts per million (TPM), there were 94,461 "genes" with TPM > 5 and 45,689 "genes" with TPM > 10 (Supplementary File S5B). A summary of the sequencing reads, assembled transcripts and 'genes' for each library are presented in Table 2.

Differentially Expressed Genes Detection and Annotation
The de novo assembly of six libraries of the early and late maturing lines representing three flower developmental stages was used as reference. The expression level of each gene was calculated based on the number of reads mapping onto the reference. Comparative transcriptome profile indicated that in both genotypes, a total of 13,133 DEGs were detected at all three stages. The sequences and annotation information for these DEGs was shown in Supplementary File S6 and Supplementary File S7, respectively. Of these DEGs, 2141, 2415, and 963 were up-regulated while 3105, 3364 and 1145 were down-regulated in the late maturing line at the VS, BS and AS stage, respectively ( Figure 1A; Supplementary File S8). The Venn diagram of the DEGs at three sampling stages is shown in Figure 1B. Of these DEGs, 3587, 3973, and 1169 showed specific expression at the VS, BS, and AS, respectively ( Figure 1B). For each stage, the annotations and expression level fold changes of top 20 DEGs between the early and late maturing lines were listed in Supplementary File S9. These DEGs were involved in different biological pathways from disease resistance to seed maturing. To further identify biosynthetic pathways and to explore the functions of the DEGs, KOBAS program was used in the study. The DEGs at all three stages were significantly enriched to the pathways such as metabolic pathways, starch, and sucrose metabolism and photosynthesis pathways by KOBAS program (Supplementary File S10). It is worthy to note that circadian rhythm pathway was also identified by KOBAS for these up-regulated DEGs identified in the early maturing line at the VS stage (Supplementary File S10). Among these 13,133 DEGs, flowering related genes were also identified based on the function annotation. In total, 22, 17, and 8 flowering related DEGs were identified in the VS, BS, and AS stages, respectively ( Figure 1C). The Venn diagram of these flowering related DEGs is shown in Figure 1D. DN73410-c0-g1 showed down-regulated expression in the late maturing line at all three stages. DN60639-c0-g1 and DN76849-c0-g1 had low expression in the late maturing line at both VS and BS stages. While DN70198-c0-g1 and DN54022-c0-g3 showed high expression in the late maturing line at both BS and AS stages (Figure 2).

Flowering Related Differentially Expressed Genes
Twenty-two flowering related DEGs were identified at the VS stage and the functional annotation for these genes is listed in Table 3. DN75888-c2-g3, homologous gene to the photoreceptor CRY1 gene which detects blue light, had around 10 times lower expression in the late maturing line. DN59102-c0-g1, homologous gene to the circadian clock gene TIMING OF CAB (TOC1), downregulated 11 times in the late maturing line. DN73561-c0-g2, which is the homologous gene of GI, was expressed about 10 times lower in the late flowering line. DN66104-c0-g1, DN48214-c0-g5 and DN69810-c0-g1, COL homologous gene, were nine times down-regulated in the late maturing line. DN67303-c0-g12, DN74350-c0-g11, and DN66158-c0-g2, homologous genes to FT showed around 10 times lower expression in the late maturing line. Photoreceptors, a circadian clock and an output pathway from the clock specific to flowering are composed of the photoperiod pathway [38]. The photoreceptors process physical signals and produce a circadian clock. Interactions between photoreceptors and the circadian clock are thought to allow plants to distinguish different day lengths. It has been reported in A. thaliana that the output from circadian clock via the GI protein activates the expression of CO [27,39]. CO then directly activates its prime target the FT gene which promoting flowering [35]. Overexpression of a wheat FT homologous gene TaFT in transgenic wheat caused early flowering [12]. The homologous genes of CRY1, TOC1, GI, CO, and FT had down-regulated expression in the late maturing line at the VS stage which indicated that differential expression of the photoperiod pathway genes could be the reason for later maturity in crested wheatgrass. Furthermore, genes involved in flower organ formation and floral development also had higher expression at the VS stage in the early maturing line. DN70348-c0-g2 and DN64729-c0-g4, the homologous genes of MADS-box transcription factors that were correlated with floral development, were nine times up-regulated in the early maturing line. Additionally, 17 and 8 flowering related DEGs were identified at the BS and AS stages between the early and late maturing lines, respectively. The function annotation of these genes is listed in Tables 4 and 5 for BS and AS stage, respectively. DN65855-c0-g1, a CO homolog had about 10 times up-regulation at the BS stage and DN54022-c0-g3, homologous gene for MADS-box transcription factor 7, showed up-regulated expression in both the BS and AS stages in the late maturing line.  Furthermore, the expression level changes of these 40 flowering related DEGs at all three stages between the early and late maturing lines were shown in Figure 2. The CO homologous gene DN69810-c0-g1 and FT homologous genes DN67303-c0-g12 and DN74350-c1-g11 showed specific expression at the VS stage. This is consistent with our previous report that CO and FT homologs showed specific expression in the leaves of crested wheatgrass for promoting flowering [28]. CO is expressed in the vasculature of A. thaliana leaves which can activate FT expression in flowering [40]. FT is a strong flowering promoter that is translocated from the vasculature of leaves to the shoot apical meristem [17,41]. These identified genes will be valuable for understanding the molecular basis of late maturity in crested wheatgrass.

Differentially Expressed Genes Validation
To check the quality of de novo assembly, nine flowering related DEGs at the VS stage were selected for validation. Seven of these selected DEGs had good amplification products at the expected size ( Figure 3). Further Sanger sequencing indicated that they had the same sequences as that indicated by RNA-Seq de novo assembly. However, DN48214-c0-g5 and DN69810-c0-g1 had no amplification products, suggesting the presence of the poor assembled gene sequences. Twelve DEGs, nine flowering related, and three randomly selected DEGs from the up-regulated DEGs in the early maturing line at the VS stage were used to validate their expression levels. Eleven selected DEGs displayed consistent RNA-Seq results at the VS stage ( Figure 4). A regression analysis of log 2 FC data from RNA-Seq and qRT-PCR revealed a trend of positive linear relationship between two measurements (Supplementary File S11). The statistical non-significance mainly reflected the test of a small number of DEGs. Together, these validation results provided valuable information on the reliability of our RNA-Seq analysis.

Implications for RNA-Seq Analysis and Breeding
The research outputs presented here helped to confirm the effectiveness of RNA-Seq and differential gene expression analysis in investigation of genes related to specific traits in species without a reference genome [28]. These technologies have made the informative genomic investigation of a complex trait such as plant maturity in a non-model, polyploidy plant practically possible. However, the analysis would be more informative with proper experimental design of multiple plant pairs with extreme trait values, including biological replication [42].
The genomic resources generated from this RNA-Seq can be used to characterize genes associated with late maturity further in breeding materials and to develop informative markers specifically for late maturity to guide the selection of parental lines. PCR primers can be designed for the 40 differentially expressed genes associated with late maturity (Tables 3-5) and alleles associated with late maturity can be identified either using qRT-PCR or other PCR means. Informative alleles can be screened to inform parental selection for crossing. We are currently developing these single nucleotide polymorphism markers and exploring their use in our breeding program.

Conclusions
The combination of RNA-Seq application with DEG analysis based on Illumina sequencing technology has provided a comprehensive set of genomic information on gene expression level changes between the early and late maturing lines at three flowering stages. It appeared that the down-regulated expressions of the photoperiod pathway genes contributed to late maturity in crested wheatgrass. These findings not only are useful for investigating the molecular mechanism for late maturity but also can be used to develop molecular markers for selecting later maturing germplasm of crested wheatgrass.