De Novo Assembly and Discovery of Genes That Involved in Drought Tolerance in the Common Vetch

The common vetch (Vicia sativa) is often used as feed for livestock because of its high nutritional value. However, drought stress reduces forage production through plant damage. Here, we studied the transcriptional profiles of common vetch exposed to drought in order to understand the molecular mechanisms of drought tolerance in this species. The genome of the common vetch has not been sequenced, therefore we used Illumina sequencing to generate de novo transcriptomes. Nearly 500 million clean reads were used to generate 174,636 transcripts, including 122,299 unigenes. In addition, 5313 transcription factors were identified and these transcription factors were classified into 79 different gene families. We also identified 11,181 SSR loci from di- to hexa-nucleotides whose repeat number was greater than five. On the basis of differentially expressed genes, Gene Ontology analysis identified many drought-relevant categories, including “oxidation-reduction process”, “lipid metabolic process” and “oxidoreductase activity”. In addition to these, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified pathways, such as “Plant hormone signal transduction”, “Glycolysis/Gluconeogenesis” and “Phenylpropanoid biosynthesis”, as differentially expressed in the plants exposed to drought. The expression results in this study will be useful for further extending our knowledge on the drought tolerance of common vetch.


Introduction
The common vetch (Vicia sativa) is an annual, self-pollinated and diploid leguminous forage [1][2][3][4]. It is adaptable to different soil and climate and can fix nitrogen to improve soil structure. In addition to these qualities, the common vetch is nutritious to animals and is used in agriculture as feed, green manure and silage [5][6][7]. It is widely planted and used for agriculture in Turkey, Australia, New Zealand, China and other regions of the world [5,7].
Drought stress is one of the most common abiotic stresses that plants experience and cause cellular damage and secondary stresses, such as osmotic and oxidative stress and reduced membrane stability, which eventually leads to cell death by complex reaction [8]. Plants have evolved a variety of defense mechanisms and physiological responses to withstand drought, such as changes in signal transduction, metabolism and gene expression [9]. Although various breeding methods have been used to mitigate damages caused by drought stress in plants, genetic engineering is more effective than traditional breeding. However, genetic engineering requires identifying genes that are important for drought tolerance and understanding genes that are differentially expressed under drought stress is important for identifying these genes. Previous studies have identified genes that show a transcriptional response to drought stress in other higher plants [10].
Next generation sequencing allows for the generation of large-scale transcriptome data in both model and non-model species. Since Hegedus et al. [11] first used Solexa/Illumina s Digital Gene Expression (DGE) system to study the transcriptome of zebrafish infected with Mycobacterium marinum, high-throughput RNA sequencing (RNA-Seq) and DGE technology have been widely used to identify plant genes, including those expressed in stress condition [12,13] and for other important agronomic traits.
Here, we aimed to identify genes involved in drought tolerance using Ilumina tag-sequencing and screening for differentially expressed genes (DEGs). We further validated DEGs using qPCR. To our knowledge, this is the first transcriptome resource for the common vetch. DEGs identified in this study will help to elucidate the common vetch s molecular response mechanism to drought stress and serve as a reference for improving drought resistance in the common vetch through future genetic modifications.

Transcriptome Sequencing and De Novo Assembly
A total of 10 cDNA libraries from the control (0 days, 3 days, 5 days) and drought treated (3 days, 5 days) were generated and referred to as C0, C1, C2, D1 and D2. Each condition had two biological replicates and those were referred to as C0a, C0b and so forth. Overview of the sequencing and assembly results are listed in Table 1 and they have been deposited in the NCBI (National Center for Biotechnology Information) Short Read Archive (SRA: SRR8186820). More than 95.56% of the bases from the raw reads had Q value ≥ 20 (an error probability of 0.02%) and approximately 90% of the bases from the raw reads had Q value ≥ 30 (an error probability of 0.02%). The GC-content was between 42.46% and 42.98%. These reads were used for de novo assembly of the transcriptome.
To explore the potential function of the unigenes in the common vetch, the biochemical pathways and functions associated with the unigenes were assigned by KEGG. A total of 37,056 were assigned to five KEGG biochemical pathways (Table S3): Cellular processes (1808), environmental information processing (1345), genetic information processing (7966), metabolism (16,705) and organismal systems (1202). The largest group was metabolic pathways and many of the unigenes within this group were associated with carbohydrate metabolism (3349), overview (2254), amino acid metabolism (1991) and lipid metabolism (1799). Genetic information processing was the second largest group, including genes involved in translation (3377), folding, sorting and degradation (2517), transcription (1342) and replication and repair (730). Pathways related to cellular processes, environmental information processing and organismal systems were also well represented. These results provide a valuable resource for investigating metabolic pathways in the common vetch.

Transcription Factors
Transcription factors (TFs) are important upstream regulatory proteins that regulate the plant's responses to abiotic and biotic stress and were overexpressed to enhanced the plant resistance [14][15][16][17][18][19]. In the common vetch transcriptome, we identified 5313 TFs that were classified into 79 different common families (Table S4). The largest group of TFs was the MYB family (415, 7.81%), followed by bHLH (315, 5.93%), Orphans (245, 4.61%), AP2-EREBP (245, 4.61%), C3H (244, 4.59%) and WRKY (235, 4.42%). These results are similar to the Chrysanthemum morifolium transcriptome, where largest TF group is MYB, followed by Zinc finger, AP2/EREBP and HB families [20], as well as the ramie transcriptome where the largest TF groups belonged to the bZIP, MYB, AP2/ERF and WRKY families [21]. These results imply that bZIP, MYB, AP2/ERF and WRKY are TF superfamilies in plants. At the same time, on the one hand, we found that the MYB, bHLH, C3H, WRKY and bZIP families that are well-known in stress tolerance in plants were identified. Members of MYB (Cluster-8152.4887 and Cluster-8152.51806), bHLH (Cluster-8152.52494 and Cluster-8152.29098), C3H (Cluster-8152.48221 and Cluster-8152.68994), WRKY (Cluster-8152.31165), bZIP (Cluster-8152.63327) family are always up-regulation under drought stress, suggesting were positive regulation mechanism in the common vetch. On the other hand, there are a few members of the transcription factor family that have been down-regulated, such as member of bHLH (Cluster-8152.61311). The results indicate that TFs respond to drought stress in a variety of mechanisms and it is similar to what was found in Zea mays ssp. mexicana L. [22].

SSR Identification
Simple sequence repeats (SSRs) have much higher levels of polymorphisms than most other marker systems due to their codominance, hypervariability, high reproducibility and abundance in eukaryotic genomes. and we identified expression sequence tags-simple sequence repeats (EST-SSRs) in the transcriptome of the common vetch by analyzing the assembled contig templates. We identified a total of 24,914 distant SSR loci were identified (Table S5) and among these loci, SSR loci repeat number greater than five accounted for 11,181. Most of these satellites were mono-nucleotide motifs with more than 10 repeats, accounting for 13,733 (55.12%). AG/CT was the most frequent dinucleotide SSR repeat and accounted for 2896 and AAG/TCC was the most frequent tri-nucleotide

Transcription Factors
Transcription factors (TFs) are important upstream regulatory proteins that regulate the plant's responses to abiotic and biotic stress and were overexpressed to enhanced the plant resistance [14][15][16][17][18][19]. In the common vetch transcriptome, we identified 5313 TFs that were classified into 79 different common families (Table S4). The largest group of TFs was the MYB family (415, 7.81%), followed by bHLH (315, 5.93%), Orphans (245, 4.61%), AP2-EREBP (245, 4.61%), C3H (244, 4.59%) and WRKY (235, 4.42%). These results are similar to the Chrysanthemum morifolium transcriptome, where largest TF group is MYB, followed by Zinc finger, AP2/EREBP and HB families [20], as well as the ramie transcriptome where the largest TF groups belonged to the bZIP, MYB, AP2/ERF and WRKY families [21]. These results imply that bZIP, MYB, AP2/ERF and WRKY are TF superfamilies in plants. At the same time, on the one hand, we found that the MYB, bHLH, C3H, WRKY and bZIP families that are well-known in stress tolerance in plants were identified. Members of MYB (Cluster-8152.4887 and Cluster-8152.51806), bHLH (Cluster-8152.52494 and Cluster-8152.29098), C3H (Cluster-8152.48221 and Cluster-8152.68994), WRKY (Cluster-8152.31165), bZIP (Cluster-8152.63327) family are always up-regulation under drought stress, suggesting were positive regulation mechanism in the common vetch. On the other hand, there are a few members of the transcription factor family that have been down-regulated, such as member of bHLH (Cluster-8152.61311). The results indicate that TFs respond to drought stress in a variety of mechanisms and it is similar to what was found in Zea mays ssp. mexicana L. [22].

SSR Identification
Simple sequence repeats (SSRs) have much higher levels of polymorphisms than most other marker systems due to their codominance, hypervariability, high reproducibility and abundance in eukaryotic genomes. and we identified expression sequence tags-simple sequence repeats (EST-SSRs) in the transcriptome of the common vetch by analyzing the assembled contig templates. We identified a total of 24,914 distant SSR loci were identified (Table S5) and among these loci, SSR loci repeat number greater than five accounted for 11,181. Most of these satellites were mono-nucleotide motifs with more than 10 repeats, accounting for 13,733 (55.12%). AG/CT was the most frequent di-nucleotide SSR repeat and accounted for 2896 and AAG/TCC was the most frequent tri-nucleotide SSR repeat and accounted for 1284 (Table S5). Similarly, AG/CT is the most common di-nucleotide SSR repeat in Sorghum sudanense [23] and this may be due to the similar drought stress experienced by the two plants. Similarly, AAG/CTT is the most frequent tri-nucleotide SSR repeat in Ammopiptanthus mongolicus [24] and Sophora moorcroftiana [25], this is likely because the common vetch, A. mongolicus and S. moorcroftiana are all legumes that share similar genomic characteristics.

Differentially Expressed Genes under Drought Stress
We found a high number of unigenes with differential expression in drought-treated samples. We identified differentially expressed genes (DEGs) with a p value-adjusted (padj) < 0.05 cut-off, conducted hierarchical clustering of the DEGs. The resulting gene expression profiles of the control and drought treated samples were highly divergent (Figure 3). We discovered 3126 and 10368 genes when comparing D1 versus C1 and D2 versus C2 ( Figure 4A-B), respectively. A total of 1762 genes overlapped with those of D1 versus C1 and D2 versus C2 (Table S6), indicating that a shared set of genes was involved in response to drought stress at different time points ( Figure 4C).
SSR repeat and accounted for 1284 (Table S5). Similarly, AG/CT is the most common di-nucleotide SSR repeat in Sorghum sudanense [23] and this may be due to the similar drought stress experienced by the two plants. Similarly, AAG/CTT is the most frequent tri-nucleotide SSR repeat in Ammopiptanthus mongolicus [24] and Sophora moorcroftiana [25], this is likely because the common vetch, A. mongolicus and S. moorcroftiana are all legumes that share similar genomic characteristics.

Differentially Expressed Genes under Drought Stress
We found a high number of unigenes with differential expression in drought-treated samples. We identified differentially expressed genes (DEGs) with a p value-adjusted (padj) < 0.05 cut-off, conducted hierarchical clustering of the DEGs. The resulting gene expression profiles of the control and drought treated samples were highly divergent (Figure 3). We discovered 3126 and 10368 genes when comparing D1 versus C1 and D2 versus C2 ( Figure 4A-B), respectively. A total of 1762 genes overlapped with those of D1 versus C1 and D2 versus C2 (Table S6), indicating that a shared set of genes was involved in response to drought stress at different time points ( Figure 4C).

Functional Classification of the Drought-Responsive Stress Genes using Gene Ontology Analysis
We next analyzed DEGs using GO analysis gain an understanding of the function of the DEGs. In the D1 and C1 comparing, there were 1336 up-regulated and 1790 down-regulated DEGs and the significantly overrepresented GO terms were "single-organism metabolic process" and "metabolic process" in the Biological Process category and "catalytic activity" in the Molecular Function category ( Figure 5A). When comparing D2 C2, there were 4135 up-regulated and 6233 down-regulated DEGs and significant overrepresentation was found in "metabolic process" and "single-organism process" subcategories in the Biological Process category and "catalytic activity" subcategory in the Molecular Function category ( Figure 5B). Overall, the results suggest that "single-organism metabolic process", "single-organism process", "metabolic process" and "catalytic activity" were strongly affected in samples treated with drought stress, which likely led to a strong metabolic response.

Functional Classification of the Drought-Responsive Stress Genes using Gene Ontology Analysis
We next analyzed DEGs using GO analysis gain an understanding of the function of the DEGs. In the D1 and C1 comparing, there were 1336 up-regulated and 1790 down-regulated DEGs and the significantly overrepresented GO terms were "single-organism metabolic process" and "metabolic process" in the Biological Process category and "catalytic activity" in the Molecular Function category ( Figure 5A). When comparing D2 C2, there were 4135 up-regulated and 6233 down-regulated DEGs and significant overrepresentation was found in "metabolic process" and "single-organism process" subcategories in the Biological Process category and "catalytic activity" subcategory in the Molecular Function category ( Figure 5B). Overall, the results suggest that "single-organism metabolic process", "single-organism process", "metabolic process" and "catalytic activity" were strongly affected in samples treated with drought stress, which likely led to a strong metabolic response.
There was also an enrichment of DEGs categorized as "oxidation-reduction process" and "oxidoreductase activity", which are commonly observed categories in drought-treated plants, suggesting that our drought treatment was effective. Other enriched categories included "carbohydrate metabolic process", "lipid metabolic process", "cofactor binding" and "coenzyme binding", similar to previously published transcriptomes of high plants [23,25] and these results implicate that the interaction of different metabolic pathways is important for drought-response in plants. There was also an enrichment of DEGs categorized as "oxidation-reduction process" and "oxidoreductase activity", which are commonly observed categories in drought-treated plants, suggesting that our drought treatment was effective. Other enriched categories included "carbohydrate metabolic process", "lipid metabolic process", "cofactor binding" and "coenzyme binding", similar to previously published transcriptomes of high plants [23,25] and these results implicate that the interaction of different metabolic pathways is important for drought-response in plants.

KEGG Pathway Analysis of DEGs in Plants Exposed to Drought Conditions
To determine whether the drought stress-responsive genes belonged to specific pathways, DEGs were searched against the KEGG database. The top 20 enriched pathways are listed in Figure 6. Comparisons between D1 and C1 showed that DEGs were enriched in "Plant hormone signal transduction", "Glycolysis/Gluconeogenesis" and "Phenylpropanoid biosynthesis" ( Figure 6A, Table  S7). When comparing D2 and C2, the DEGs were enriched in "Starch and sucrose metabolism", "Phenylpropanoid biosynthesis" and "Glyoxylate and dicarboxylate metabolism" ( Figure 6B, Table  S8). In general, these data indicate that drought stress affects "Plant hormone signal transduction", "Glycolysis/Gluconeogenesis" and "Phenylpropanoid biosynthesis" in the common vetch.

KEGG Pathway Analysis of DEGs in Plants Exposed to Drought Conditions
To determine whether the drought stress-responsive genes belonged to specific pathways, DEGs were searched against the KEGG database. The top 20 enriched pathways are listed in Figure 6. Comparisons between D1 and C1 showed that DEGs were enriched in "Plant hormone signal transduction", "Glycolysis/Gluconeogenesis" and "Phenylpropanoid biosynthesis" ( Figure 6A, Table S7). When comparing D2 and C2, the DEGs were enriched in "Starch and sucrose metabolism", "Phenylpropanoid biosynthesis" and "Glyoxylate and dicarboxylate metabolism" ( Figure 6B, Table S8). In general, these data indicate that drought stress affects "Plant hormone signal transduction", "Glycolysis/Gluconeogenesis" and "Phenylpropanoid biosynthesis" in the common vetch. Under abiotic stress, plants process information from the environment through signaling pathways to activate adaptive responses [26]. Histidine-containing phosphotansfer proteins (HPTs) are involved in the cytokinin transduction pathway [27] and cytokinin activity plays an important role in the plant's response to salt, osmotic and drought stress [28]. Here, we found that HPTs Under abiotic stress, plants process information from the environment through signaling pathways to activate adaptive responses [26]. Histidine-containing phosphotansfer proteins (HPTs) are involved in the cytokinin transduction pathway [27] and cytokinin activity plays an important role in the plant's response to salt, osmotic and drought stress [28]. Here, we found that HPTs (Cluster-8152.101072) were only expressed under drought stress with FC D2 vs. D1 = 1.43 and it may be involved in the common vetch's resistance to environmental stress.
Changes in glycolysis and gluconeogenesis are common when plants respond to abiotic stress. Glycolysis is an important metabolic pathway that regulates carbohydrate metabolism and drought stress changes in sucrose and amino acid contents of plants [29]. DEGs in our analysis contained enzymes involved in glycolysis and glyconeogenesis and this is consistent with drought-mediated photosynthetic carbon metabolism [30,31]. In this study, the results show that the expression of some key enzymes in glycolysis/gluconeogenesis metabolism have changed under drought stress. For instance, putative phosphoglycerate mutase (PGAM), is regulated by an identified miRNA involved in the glycolysis pathway [32]. When water is insufficient, PGAM levels decrease [33,34]. We found that PGAM (Cluster-8152.18831) was significantly down-regulated under drought stress in the common vetch, suggesting that this gene may contribute to drought tolerance.
Signal transduction participate in numerous processes and has many pathways such as MAPK signaling pathway [35], Calcium signaling pathway [36], cAMP signaling pathway [37] and Plant hormone signal transduction [38,39]. In particular, plant hormone signal pathways, are extremely vital for plant development, growth, differentiation and adaptation to environmental stresses [40][41][42]. In this study, the results show that the expression of some key enzymes genes in plant hormone signal transduction were significantly up-regulated under drought stress. For example, protein phosphatase 2C (PP2C), have been shown to be key regulators of abscisic acid (ABA) signaling pathways, which regulate plant growth and development as well as tolerance to adverse environmental conditions [43]. The results showed that PP2C (Cluster-8152.72288) was significantly up-regulated (7.8251-fold) under drought stress, indicating this gene positively regulated the ABA signaling pathways and thus improved the drought resistance of plants.
Phenylpropanoids is a group of plant secondary metabolites derived from phenylalanine and are involved in differentiation and the protection of plant tissues against environment stresses [44]. Serine carboxypeptidase-like (SCPL) is a protease belonging to a family of hydrolases and is involved in the processing, modifying and degrading polypeptides and proteins during growth and development of plants [45][46][47][48]. As expected, we found that SCPL (Cluster-8152.36781) was significantly up-regulated under drought treatment, indicating that SCPL enhances drought resistance in the common vetch. Cinnamate 4-hydroxylase (C4H) is one of the most abundant P450s in plant [49] and is the key enzyme of the core reaction of the general phenylpropanoid pathway [44,50]. By comparing gene expression levels in drought and control conditions, we found that two candidate C4H genes, including Cluster-8152.77535 and Cluster-8152.50375, were significantly up-regulated. The results suggested that C4H protein participates in phenylpropanoid pathway to improve plants adaptation to the environment, as demonstrated by Yannick, B. [51].

Quantitative Real-Time-PCR Validation of DEGs from RNA-Seq
To confirm the gene expression data, 10 DEGs were randomly chosen for qRT-PCR analysis. The selected DEGs were all significantly down-regulated in drought-treated plants. The gene expression trends were similar in both the transcriptome and qRT-PCR data (Figure 7), validating our RNA-Seq data and DEG analysis. Figure 7. Unigene expression tendencies in both transcriptome and qRT-PCR analysis. The X-axis shows the different unigenes and the Y-axis represents expression in drought condition relative to the control. The numbers shown above the graphs indicated the fold changes for each unigene in the drought treatment relative to control conditions.

Plant Material and Drought Treatment
The surface of Vicia sativa seeds were sterilized with 75% ethanol for 5 min and rinsed with sterile distilled water. Seeds were germinated in plastic pots with 10 g seeds per pot (20 cm length, 15 cm width and 8 cm deep) filled with sterilized quartz. Pots were kept in a controlled growth chamber at the Sichuan Agricultural University in Chengdu (30°42′N, 103°51′E; Chengdu Wenjiang, Sichuan, China) and the chamber was set to a 12 h photoperiod cycle, 19 °C/15 °C day/night temperature and 500 μmol photons m −2 s −1 photosynthetic active radiation (PAR) with a relative humidity of 75%. Three-day-old seedlings were irrigated with full strength Hoagland's solution instead of distilled water, until the first leaf was expanded at about 13 cm high. Drought stress was imposed by 25% (w/v) polyethylene glycol (PEG) 6000 dissolved in Hoagland's solution for five days and control plants were treated with Hoagland's solution without PEG. Each treatment was performed in four independent replicates. Whole plants were collected at 0d as control (C0), 3d and 5d for control (C1 and C2), 3d and 5d for drought treatment (D1 and D2). Two independent biological replicates from each treatment were used for transcriptome sequencing.

RNA Extraction
Total RNA was extracted from the whole plant samples using the Trizol reagent (TransGen, Beijing, China) following the manufacturer's instructions. Total RNA quality was first monitored using 1% agarose gels. RNA purity was tested using the NanoPhotometer® spectrophotometer (IMPLEN, Palo Alto, CA, USA), concentration of the RNA was measured using the Qubit® RNA Assay Kit in a Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA).

Library Preparation for Transcriptome Sequencing
To construct the transcriptome library, 1.5 μg RNA per sample was used as input for each sequencing library. Sequencing libraries were generated using NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, San Diego, CA, USA) following the manufacturer's protocol and index codes were added to attribute sequences to each sample as follows. mRNA was purified from total RNA using poly-T oligo-attached magnetic beads and fragmentation was carried out using divalent cations under elevated temperatures in the NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (RNase H-). Second strand cDNA synthesis was performed using DNA Polymerase I and RNase H and the Figure 7. Unigene expression tendencies in both transcriptome and qRT-PCR analysis. The X-axis shows the different unigenes and the Y-axis represents expression in drought condition relative to the control. The numbers shown above the graphs indicated the fold changes for each unigene in the drought treatment relative to control conditions.

Plant Material and Drought Treatment
The surface of Vicia sativa seeds were sterilized with 75% ethanol for 5 min and rinsed with sterile distilled water. Seeds were germinated in plastic pots with 10 g seeds per pot (20 cm length, 15 cm width and 8 cm deep) filled with sterilized quartz. Pots were kept in a controlled growth chamber at the Sichuan Agricultural University in Chengdu (30 • 42 N, 103 • 51 E; Chengdu Wenjiang, Sichuan, China) and the chamber was set to a 12 h photoperiod cycle, 19 • C/15 • C day/night temperature and 500 µmol photons m −2 s −1 photosynthetic active radiation (PAR) with a relative humidity of 75%. Three-day-old seedlings were irrigated with full strength Hoagland's solution instead of distilled water, until the first leaf was expanded at about 13 cm high. Drought stress was imposed by 25% (w/v) polyethylene glycol (PEG) 6000 dissolved in Hoagland's solution for five days and control plants were treated with Hoagland's solution without PEG. Each treatment was performed in four independent replicates. Whole plants were collected at 0d as control (C0), 3d and 5d for control (C1 and C2), 3d and 5d for drought treatment (D1 and D2). Two independent biological replicates from each treatment were used for transcriptome sequencing.

RNA Extraction
Total RNA was extracted from the whole plant samples using the Trizol reagent (TransGen, Beijing, China) following the manufacturer's instructions. Total RNA quality was first monitored using 1% agarose gels. RNA purity was tested using the NanoPhotometer ® spectrophotometer (IMPLEN, Palo Alto, CA, USA), concentration of the RNA was measured using the Qubit ® RNA Assay Kit in a Qubit ® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA).

Library Preparation for Transcriptome Sequencing
To construct the transcriptome library, 1.5 µg RNA per sample was used as input for each sequencing library. Sequencing libraries were generated using NEBNext ® Ultra™ RNA Library Prep Kit for Illumina ® (NEB, San Diego, CA, USA) following the manufacturer's protocol and index codes were added to attribute sequences to each sample as follows. mRNA was purified from total RNA using poly-T oligo-attached magnetic beads and fragmentation was carried out using divalent cations under elevated temperatures in the NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (RNase H-). Second strand cDNA synthesis was performed using DNA Polymerase I and RNase H and the remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3 ends of the cDNA, NEBNext Adaptors with hairpin loop structures were ligated to prepare for hybridization. In order to select for 150-200 bp cDNA fragments, the library fragments were purified with the AMPure XP system (Beckman Coulter, Pasadena, CA, USA). Then 3 µL USER Enzyme (NEB, USA) was used with the size-selected and adaptor-ligated cDNA at 37 • C for 15 min and then incubated at 95 • C for 5 min before PCR. The PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. PCR products were purified using the AMPure XP system (Beckman Coulter, Pasadena, CA, USA) and library quality was assessed on the Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA). Index-coded samples were clustered on a cBot Cluster Generation System (Illumina, San Diego, CA, USA) using TruSeq PE Cluster Kit v3-cBot-HS (Illumina, San Diego, CA, USA) following the manufacturer's protocol. Once samples were clustered, libraries were sequenced on an Illumina Hiseq platform to generate paired-end reads.

Sequence Read Mapping, Assembly and SSR Detection
The raw data, or raw reads, in FASTQ format were processed through in-house Perl scripts. Clean data or clean reads were obtained by removing reads containing the adapter or ploy-N sequences, as well as removing low quality reads from the raw data. At the same time, Q20, Q30, GC-content and sequence duplication level of the clean data were calculated. All the downstream analyses were conducted on clean data that were of high quality. The transcriptome was assembled with the clean reads using Trinity [52] at default settings and with min_kmer_cov set to 2 by default. SSRs in the transcriptome were identified using MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html). Protein coding sequences (CDS) of the assembled unigenes were predicted using BLAST and then EST Scan (E value < 10-5) [53].

Gene Expression Quantification and Differential Expression Analysis
Gene expression levels were estimated by RSEM (http://deweylab.github.io/RSEM/) [54] for each sample. The clean data were mapped back onto the assembled transcriptome and read count for each gene was obtained from the data mapped onto the transcriptome. To identify differentially expressed genes (DEGs), differential expression analysis of two groups was performed using the DESeq R package (1.10.1) (http://www.bioconductor.org/packages/release/bioc/html/DESeq.html). DESeq determines differential expression in digital gene expression data using a model based on a negative binomial distribution. The p values of the DESeq analyses were adjusted using the Benjamini and Hochberg's approach to control for the false discovery rate. Genes with an adjusted p-value < 0.05 were assigned as differentially expressed.

Functional Annotation
Gene Ontology (GO) enrichment analysis of the DEGs was implemented by the GOseq R packages based Wallenius non-central hyper-geometric distribution [55], which can adjust for gene length bias in DEGs.
KEGG [56] is a resource for identifying high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information (http: //www.genome.jp/kegg/). We used KOBAS [57] to test for the statistical enrichment of DEGs in KEGG pathways.
The DEGs were searched against the genomes of a related species using BLASTx and the protein-protein interactions (PPI) were tested using the STRING database (http://string-db.org/). The PPI of DEGs were visualized in Cytoscape [58].

Quantitative Real-Time-RCR Analysis
In order to validate the RNA-Seq data, 10 genes were randomly selected analyzed by qRT-PCR normalized to a reference gene (GAPDH ; Table S9). RNA was isolated from relevant samples as described above. In a fluorescence quantitative PCR tube (TLS-0851; Bio-Rad), 2 µL of cDNA (30 ng/µL), 1.5 µL of reverse primer (10 µmol/L), 1.5 µL of forward primer (10 µmol/L), 10 µL 2 × SYBR Premix Ex Taq (5 U/µL) and 5 µL of ddH 2 O were added to a total volume of 20 µL. PCR was conducted using three biological replicates tested over three technical replicates. The amplification procedure was as follows: 95 • C for 30 s, followed by 95 • C for 5 s and 64 • C for 30 s, repeated 40 times, followed by an extension phase from 60 • C to 95 • C, where the temperature for each cycle increased by 0.5 • C for 5 s to obtain Tm and fluorescent signals for the melting curve. To determine the relative fold change for each sample in each experiment, the Ct value for the reference gene and candidate genes were calculated using the Ct method.

Conclusions
Here, we used RNA-Seq to generate the common vetch transcriptome and analyze changes in gene expression under drought stress. Based on the assembled de novo transcriptome, 3126 and 10,368 genes were discovered at three days and five days after inducing drought stress, respectively. The KEGG pathway analysis uncovered 'Plant hormone signal transduction,' 'Glycolysis/Gluconeogenesis' and 'Phenylpropanoid biosynthesis' as the important pathways associated with response to drought. We also developed new genetic markers, including SSRs, which can further be used for genetic studies in the common vetch.