Transcriptome Sequencing Analyses between the Cytoplasmic Male Sterile Line and Its Maintainer Line in Welsh Onion (Allium fistulosum L.)

Cytoplasmic male sterility (CMS) is important for exploiting heterosis in crop plants and also serves as a model for investigating nuclear–cytoplasmic interaction. The molecular mechanism of cytoplasmic male sterility and fertility restoration was investigated in several important economic crops but remains poorly understood in the Welsh onion. Therefore, we compared the differences between the CMS line 64-2 and its maintainer line 64-1 using transcriptome sequencing with the aim of determining critical genes and pathways associated with male sterility. This study combined two years of RNA-seq data; there were 1504 unigenes (in May 2013) and 2928 unigenes (in May 2014) that were differentially expressed between the CMS and cytoplasmic male maintainer Welsh onion varieties. Known CMS-related genes were found in the set of differentially expressed genes and checked by qPCR. These genes included F-type ATPase, NADH dehydrogenase, cytochrome c oxidase, etc. Overall, this study demonstrated that the CMS regulatory genes and pathways may be associated with the mitochondria and nucleus in the Welsh onion. We believe that this transcriptome dataset will accelerate the research on CMS gene clones and other functional genomics research on A. fistulosum L.


Introduction
As an important vegetable crop, The Welsh onion (Allium fistulosum L.) is cultivated worldwide from tropical Asia to Northeast Asia. Because it is rich in propylene sulfide having bactericidal and anti-inflammatory effects, the Welsh onion has also been used as an herbal medicine for diverse diseases, such as febrile disease, headache, diarrhea, abdominal pain, eye-related disorders, and habitual abortion [1]. The Welsh onion is a cross-pollinated crop, and heterosis has been extensively applied in breeding [2]. Artificial emasculation is difficult in this species because of its small flower. Therefore, cytoplasmic male sterility (CMS) plays a crucial role in heterosis breeding.
CMS is a phenotypic trait that is widespread among plants and results in the inability of a plant to produce viable pollen [3], using CMS lines to obtain F 1 hybrids, which is the preferred method in heterosis breeding. The CMS plants are usually obtained by natural populations [4,5], crossed by different genera and species plants [6,7], physical and chemical mutagenesis [8], and plant protoplast

Sequencing and Transcriptome Assembly
To maximize the range of transcript diversity, a mixed RNA sample from three inflorescences in the flowering stage of the Welsh onion was prepared for RNA-seq using the Illumina HiSeq TM 2000. After stringent quality assessment and data filtering, we generated 57.19 million read pairs that corresponded to 11.55 Gb of sequence data in May 2013, and 56.82 million read pairs that corresponded to 11.48 Gb of sequence data in May 2014 ( Table 1). The Q30 percentages exceeded 85% in both years. In May 2013, using the Trinity de novo assembly program, the next-generation short-read sequences were assembled into 97,230 transcripts which had a mean length of 917.56 bp. A total of 52,126 unigenes with an average length of 755.69 bp were obtained. The N50 values of the transcripts and unigenes were 1419 and 1247 bp, respectively. In May 2014, there were 308,353 transcripts with a mean length of 1122.65 bp, and there were 70,360 unigenes with an average length of 879.72 bp. The N50 values of the transcripts and unigenes were 1813 and 1267 bp, respectively (Table S1). The longest transcript was taken as the sample unigene for data from each year.  The distribution curve for each sample was relatively flat in the randomly fragmented transcriptome. The number of sample sequences reached saturation under current conditions ( Figure S1). The length distributions of the ORFs are shown in Figure S2. These results indicate that the sequencing quality and throughput were sufficient for the subsequent analyses.

Functional Annotation and Classification
Unigenes that had at least one significant match to an existing gene model in BLAST were determined using searches against the NCBI NR, Swiss-Prot, KEGG, GO, and COG. After screening all differentially expressed genes, we extracted identified information from the unigenes (Figure 1), and presented the overview of the results in Table S2.

Functional Annotation and Classification
Unigenes that had at least one significant match to an existing gene model in BLAST were determined using searches against the NCBI NR, Swiss-Prot, KEGG, GO, and COG. After screening all differentially expressed genes, we extracted identified information from the unigenes (Figure 1), and presented the overview of the results in Table S2. (B) Whole-study overview of log-fold changes in gene expression in CMS vs. cytoplasmic male maintainer in May 2014. The x-axis indicates the absolute expression levels (log2 RPKM (Reads Per Kilobase per Million mapped reads)). The y-axis indicates the log2-FC (fold changes) between the two samples. Genes for which differential expression is significant are shown as blue dots (Log2FC > 1 or < −1; FDR (False Discovery Rate) < 0.01).

Functional Annotation and Classification
Unigenes that had at least one significant match to an existing gene model in BLAST were determined using searches against the NCBI NR, Swiss-Prot, KEGG, GO, and COG. After screening all differentially expressed genes, we extracted identified information from the unigenes (Figure 1), and presented the overview of the results in Table S2.  (Table S2). The unigenes were annotated in 22 (May 2013) and 23 (May 2014) COG categories ( Figure 2). Among these, "the general function prediction" was the largest group, followed by "the replication, recombination and repair", "transcription", and "signal transduction mechanisms", whereas the "Nuclear structure", "Extracellular structures", and "Cell motility" groups were the smallest.

Pathway Enrichment Analysis of DEGs (Differentially expressed unigenes)
To categorize the biological functions of the DEGs, we performed a KEGG pathway enrichment analysis. The DEGs that significantly enriched pathways in May 2013 and May 2014 were the "plantpathogen interaction" pathway, followed by the "oxidative phosphorylation" and "plant hormone signal transduction" pathways. In addition, pathways such as "starch and sucrose metabolism", "glycerophospholipid metabolism", and "protein processing in endoplasmic reticulum" composed a large component of the DEGs with pathway annotation ( Table 2). In the above-mentioned DEGs with the best-represented KEGG pathways, there was consistency in enzyme regulation in the "oxidative phosphorylation" pathway between May 2013 and May 2014 ( Figure 4). The enzymes "NADH: ubiquinone reductase (H + -translocating)", "ND5" (NADH dehydrogenase 5), "COX2" (cytochromec oxidase 2), and "F-type ATPase α" (Eukaryotes) were related to down-regulated genes, whereas "inorganic diphosphatase", "ATP phosphohydrolase" (H + -exporting), and "F-type ATPase E" (Eukaryotes) were related to up-regulated genes. "ATP phosphohydrolase" (H + -transporting) was related to both up-and down-regulated genes.

Pathway Enrichment Analysis of DEGs (Differentially Expressed Unigenes)
To categorize the biological functions of the DEGs, we performed a KEGG pathway enrichment analysis. The DEGs that significantly enriched pathways in May 2013 and May 2014 were the "plant-pathogen interaction" pathway, followed by the "oxidative phosphorylation" and "plant hormone signal transduction" pathways. In addition, pathways such as "starch and sucrose metabolism", "glycerophospholipid metabolism", and "protein processing in endoplasmic reticulum" composed a large component of the DEGs with pathway annotation (Table 2). In the above-mentioned DEGs with the best-represented KEGG pathways, there was consistency in enzyme regulation in the "oxidative phosphorylation" pathway between May 2013 and May 2014 (Figure 4). The enzymes "NADH: ubiquinone reductase (H + -translocating)", "ND5" (NADH dehydrogenase 5), "COX2" (cytochrome-c oxidase 2), and "F-type ATPase α" (Eukaryotes) were related to down-regulated genes, whereas "inorganic diphosphatase", "ATP phosphohydrolase" (H + -exporting), and "F-type ATPase E" (Eukaryotes) were related to up-regulated genes. "ATP phosphohydrolase" (H + -transporting) was related to both up-and down-regulated genes.

Analysis of CMS Related Genes in the Welsh Onion
This study utilized RNA-seq technology to explore CMS related genes between a CMS line and its maintainer line in the Welsh onion. All of the DEGs obtained from two years of RNA-seq data were investigated by BLAST in the NCBI database, and 559 unigenes were predicted to associate with CMS. Due to the DEGs that significantly enriched oxidative phosphorylation pathways in May 2013 and May 2014, the mitochondrial respiratory chain enzymes and enzyme complexes in oxidative

Analysis of CMS Related Genes in the Welsh Onion
This study utilized RNA-seq technology to explore CMS related genes between a CMS line and its maintainer line in the Welsh onion. All of the DEGs obtained from two years of RNA-seq data were investigated by BLAST in the NCBI database, and 559 unigenes were predicted to associate with CMS. Due to the DEGs that significantly enriched oxidative phosphorylation pathways in May 2013 and May 2014, the mitochondrial respiratory chain enzymes and enzyme complexes in oxidative phosphorylation pathways were important to CMS lines in other plants [22][23][24]. This study obtained six unigenes (IDs: c116086, c175619, c159049, c160965, c113452 and c50467) related to mitochondrial respiratory chain enzymes and enzyme complexes (Table 3), and they were significantly differentially expressed between the CMS line and the CMM line ( Figure 5). These six unigenes were validated using RT-qPCR; the transcription expression of five unigenes was down-regulated in the cytoplasmic male maintainer line. Although most RT-qPCR results indicated smaller differences compared with the RNA-seq analysis, there was consistent expression. The unigenes functioned in "Energy production and conversion" (c175619, c159049) and "Posttranslational modification, protein turnover, and chaperones" (c173435) in the COG class annotation; they annotated to "Plasma membrane ATPase" (c116086), "Cytochrome c oxidase subunit 2" (c175619), "Cytochrome c oxidase subunit 3" (c159049), "Probable F-box protein" (c113452) and "Polygalacturonase inhibitor 1" (c50467) in the Swiss-Prot annotation; and, for the nt annotation, they were involved in the "Cucumissativus plasma membrane ATPase 4-like" (LOC101221564) (c116086), "Hibiscus cannabinus cultivar P3B cytochrome c oxidase subunit II (cox2) gene" (c175619), "Allium cepa cultivar saski cytochrome oxidase subunit 3 (cox3) gene, complete cds; mitochondrial" (c159049), and "Allium cepa cultivar saski NADH dehydrogenase subunit 1 (nad1) gene, partial cds; and ATPase α subunit (atpA) gene, complete cds; mitochondrial" (c160965).

The Welsh Onion CMS Related Genes Were Enriched in the Oxidative Phosphorylation Pathway of the Mitochondrial Respiratory Chain
In this study, we profiled the Welsh onion transcriptome using the Illumina HiSeq 2000 platform, and classified the functions of the unigenes using the COG and GO classification and pathway enrichment analysis. A total of 1504 unigenes (in May 2013) and 2928 unigenes (in May 2014) were differentially expressed between the CMS line and its maintainer line in the Welsh onion.
In this study, we found many DEGs enriched in oxidative phosphorylation of the mitochondrial respiratory chain. The GO terms that identified the DEGs were related to CMS, i.e., "mitochondrion", "mitochondrial respiratory chain", "phosphorylation", and "oxidation-reduction process", and the cluster frequency of these DEGs was >5% (cluster frequency is the ratio of the number of DEGs that are annotated to this GO term to the number of DEGs that are annotated to all GO terms). The DEGs from the pathway analysis also showed many enzymes associated with the respiratory chain, and the regulation of numerous enzymes enriched in the "Oxidative phosphorylation" pathway, which is consistent in the RNA-seq data from both years.
In electron transfer system of mitochondrial respiratory chain, the respiratory chain enzymes and enzyme complexes (NADH dehydrogenase, cytochrome reductase, and cytochrome oxidase) play important roles in plants, their major function is to link electron transfer with proton translocation out of the mitochondrion, generating a transmembrane proton motive force that subsequently drives ATP synthesis by H + -ATPase [25]. In addition, cytochrome oxidase is the marker enzyme of the mitochondrial inner membrane with strong activity. Numerous studies have shown that cytochrome oxidase is relevant to CMS in plants [22,23]. In the pepper CMS line, a novel chimeric gene orf 456, was found to be inserted into the 3 1 -end of the coxII gene, and it may change the sterile characterization in the CMS line [22]. Luo et al. demonstrated that a new mitochondrial gene WA352 encodes a protein that interacts with the nuclear-encoded mitochondrial protein COXII, thus causing premature tapetal programmed cell death and consequent male sterility [23]. In some studies, the CMS related candidate genes had close relationships with ATPase genes [2,11]. The F-type ATPase was proved to be embedded in the inner membranes of mitochondria, and this enzyme is a coupling factor in oxidative phosphorylation or photophosphorylation and plays an important role in energy conversion [24]. In this study, several key genes in the oxidative phosphorylation pathway were observed to be down-regulated by the RNA-seq approach; these gene-related enzymes were "NADH: ubiquinone reductase (H + -translocating)", "ND5" (NADH dehydrogenase 5), "COXII" (cytochrome-c oxidase II), and "F-type ATPase α" (Eukaryotes). Two unigenes, c175619 and c159049, were annotated in "Energy production and conversion (COG)"; the unigene c116086 was identified as a plasma membrane ATPase (Swiss-Prot annotation), and the unigene c160965 was an ATPase α subunit (atpA) gene (nr annotation). In our previous study, the analysis of Two-Dimensional Electrophoresis results showed the ATPase α subunit was specifically expressed in the CMS line compared with the cytoplasmic male maintainer line (unpublished data). Based on the above research, we conjectured that CMS was closely related to the respiratory chain enzymes and enzyme complexes.

The Nuclear Cellular Component Function Genes Are Important to CMS in the Welsh Onion
In addition, more DEGs genes were obtained in the nucleus than in the mitochondrion: their cellular component functions (GO) were clustered in "nucleus" (Cluster frequency 25.42% in May 2013 and 13.43% in May 2014), "cytoplasmic membrane-bounded vesicle" (Cluster frequency 13.84% in May 2013 and 16.24% in May 2014), and "membrane" (Cluster frequency 15.68% in May 2013 and 12.70% in May 2014). In the COG functional classification, the "replication, recombination and repair" (Cluster frequency 10.22% in May 2013 and 13.27% in May 2014) and "transcription" (Cluster frequency 10.02% in May 2013 and 11.14% in May 2014) were the largest groups, and the processes of gene transcription mainly occur in the nucleus. Some research has reported that the sterility is caused by chimeric mitochondrial genes regulated by nuclear genes [26][27][28][29]. Thus, the nucleus may play an important role in the mechanisms of cytoplasmic male sterility. The results from Liu et al. suggest the CMS is the result of the nuclear-mitochondrial interaction [30]. In addition, Hong et al. suggested that nuclear-mitochondrial interaction may regulate pollen development and that the failure of sporogenous cell differentiation leads to sterility [31]. Therefore, we hypothesized that the Welsh onion male sterility regulatory genes and pathways are not only implicated in the mitochondria, but are also implicated in nuclear-mitochondria interaction. In addition, the identification of genes associated with male sterility in the nucleus remains to be further validated.

Sample Preparation
Two Welsh onion varieties (termed 64) were used in this study: 64-1, a CMS mutant in natural populations, and 64-2, a cytoplasmic male maintainer line, which when bred with 64-1 can produce fertile offspring ( Figure 6). Both lines were bred at the Beijing Vegetable Research Center and the Beijing Academy of Agriculture and Forestry Sciences. For each variety, RNA was isolated from three inflorescences in the flowering stage of the Welsh onion in May 2013. We then collected experimental materials the same way in May 2014 as a repeat experiment. All samples were ground immediately after harvest in a mortar and pestle using liquid nitrogen. Total RNA was extracted using RNAiso for polysaccharide-rich Plant Tissue Kit (TaKaRa Biotechnology, Dalian, Liaoning Province, China), and RNA integrity was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). RNA-seq was performed at Beijing BioMarker Technologies (Beijing, China). The paired-end library preparation, cDNA library and sequencing were performed following standard Illumina methods on the Illumina sequencing platform (HiSeq™ 2000). The read length of sequencing in both years was 2ˆ103 bp. suggest the CMS is the result of the nuclear-mitochondrial interaction [30]. In addition, Hong et al. suggested that nuclear-mitochondrial interaction may regulate pollen development and that the failure of sporogenous cell differentiation leads to sterility [31]. Therefore, we hypothesized that the Welsh onion male sterility regulatory genes and pathways are not only implicated in the mitochondria, but are also implicated in nuclear-mitochondria interaction. In addition, the identification of genes associated with male sterility in the nucleus remains to be further validated.

Sample Preparation
Two Welsh onion varieties (termed 64) were used in this study: 64-1, a CMS mutant in natural populations, and 64-2, a cytoplasmic male maintainer line, which when bred with 64-1 can produce fertile offspring ( Figure 6). Both lines were bred at the Beijing Vegetable Research Center and the Beijing Academy of Agriculture and Forestry Sciences. For each variety, RNA was isolated from three inflorescences in the flowering stage of the Welsh onion in May 2013. We then collected experimental materials the same way in May 2014 as a repeat experiment. All samples were ground immediately after harvest in a mortar and pestle using liquid nitrogen. Total RNA was extracted using RNAiso for polysaccharide-rich Plant Tissue Kit (TaKaRa Biotechnology, Dalian, Liaoning Province, China), and RNA integrity was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). RNA-seq was performed at Beijing BioMarker Technologies (Beijing, China). The paired-end library preparation, cDNA library and sequencing were performed following standard Illumina methods on the Illumina sequencing platform (HiSeq™ 2000). The read length of sequencing in both years was 2 × 103 bp. All of the datasets from the Illumina sequencing platform can be found in the Short Read Archive (SRA) database of the National Center for Biotechnology Information (NCBI) under accession number SRP071555.

Sequence Data Analysis and Assembly
The Bcl2fastq software was used for pre-processing. The raw reads were cleaned by removing the adaptor sequences, duplication sequences, low-quality reads (reads with ambiguous bases "N"), All of the datasets from the Illumina sequencing platform can be found in the Short Read Archive (SRA) database of the National Center for Biotechnology Information (NCBI) under accession number SRP071555.

Sequence Data Analysis and Assembly
The Bcl2fastq software was used for pre-processing. The raw reads were cleaned by removing the adaptor sequences, duplication sequences, low-quality reads (reads with ambiguous bases "N"), and reads with more than 50% of the bases with a Q-value ď5 [32] to obtain high-quality clean read data. The Trinity method [33] was used for de novo assembly. First, contigs were obtained by extension based on the overlap between sequences. Next, the resultant contigs were joined into transcripts with the paired-end information, and the following parameters were used: seqTypefq, min kmer cov 2, min contig length 200, group pairs distance 500 and other default parameters. The longest transcript was taken as the sample unigene, and the unigenes were combined to produce the final assembly used for annotation. Getorf [34] was used to predict the open reading frame (ORF) of the unigenes, and the longest ORF was extracted for each unigene. The Bowtie [35] was used to compare between unigene database and each sequencing read sample, and default parameters of the software were used.

Functional Annotation and Differential Gene Expression Analysis
The unigenes were aligned with the NCBI's nucleotide (nt) databases and non-redundant (nr) protein, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, the Swiss-Prot protein database, and the Cluster of Orthologous Groups (COG) database. The Blast2GO program was used to assign GO terms [36]. The BLAST [37] program was used to performed COG and KEGG pathway annotation. The above searches were performed with a cut-off e-value of 10´5. The unigenes were aligned to the COG and KEGG databases to predict and classify the possible functions and metabolic pathways of the unigenes. For differential gene expression analysis, Differentially expressed genes were identified using DESeq software [38,39] in each comparison, and the results were analyzed as described previously [1].

Quantitative RT-PCR (RT-qPCR) Analysis
The cDNA was synthesized from the flowering stage of the Welsh onion with total RNA in a 10-µL reaction system using PrimeScript II RTase (TaKaRa Biotechnology, Dalian, Liaoning Province, China). The reverse transcription reaction mixture contained 5 µL of total RNA (0.8 µg), 3 µL of diethylpyrocarbonate (DEPC) water, 1 µL of oligo dT (50 µM) and 1 µL of dNTP (10 mM), and the reaction was performed following the kit protocol. Three biological replicates and three technical replicates for each experiment were performed. The qPCR reactions were performed in 96-well plates on the LightCycler 480 instrument (Roche Diagnostic, Mannheim, Germany) using SYBR Green I (TaKaRa Biotechnology, Dalian, Liaoning Province, China). The qPCR primers, which were designed using Primer3 (http://frodo.wi.mit.edu/primer3/), are shown in Table S3. Each gene was analyzed using the method described previously by Qianchun Liu et al. [1].

Conclusions
This study investigated the important CMS-related genes and pathways in the Welsh onion by transcriptome analysis. Several key genes such as F-type ATPase, NADH dehydrogenase, and cytochrome c oxidase were observed by RNA-seq and validated by RT-qPCR. These genes were not only closely related to mitochondria but also to the nuclear-mitochondria interaction. Further functional characterization is in progress to determine the specific functions of these genes in relation to CMS in Allium fistulosum L. This study will accelerate the research on CMS gene clones and other functional genomics research on A. fistulosum.