Int. J. Mol. Sci. 2014, 15(3), 4635-4656; doi:10.3390/ijms15034635
Abstract: The RNA helicases, which help to unwind stable RNA duplexes, and have important roles in RNA metabolism, belong to a class of motor proteins that play important roles in plant development and responses to stress. Although this family of genes has been the subject of systematic investigation in Arabidopsis, rice, and tomato, it has not yet been characterized in cotton. In this study, we identified 161 putative RNA helicase genes in the genome of the diploid cotton species Gossypium raimondii. We classified these genes into three subfamilies, based on the presence of either a DEAD-box (51 genes), DEAH-box (52 genes), or DExD/H-box (58 genes) in their coding regions. Chromosome location analysis showed that the genes that encode RNA helicases are distributed across all 13 chromosomes of G. raimondii. Syntenic analysis revealed that 62 of the 161 G. raimondii helicase genes (38.5%) are within the identified syntenic blocks. Sixty-six (40.99%) helicase genes from G. raimondii have one or several putative orthologs in tomato. Additionally, GrDEADs have more conserved gene structures and more simple domains than GrDEAHs and GrDExD/Hs. Transcriptome sequencing data demonstrated that many of these helicases, especially GrDEADs, are highly expressed at the fiber initiation stage and in mature leaves. To our knowledge, this is the first report of a genome-wide analysis of the RNA helicase gene family in cotton.
RNA helicases are found in many organisms and have important roles in RNA metabolism. These enzymes participate in many metabolic pathways that involve the separation of double-stranded nucleic acid into single strands and the removal of nucleic-acid-associated proteins . Usually, they translocate along the bound strand unidirectionally and their specific polarity depends upon the bound strand. RNA helicases potentially regulate cellular growth and differentiation, as well as responses to abiotic stress, by affecting nuclear mRNA export, translation initiation, mRNA decay, rRNA processing, cell cycle progression, and the initiation of transcription [2–4].
Based on variations within the DEAD (Asp-Glu-Ala-Asp) motif, the RNA helicase superfamily proteins can be classified into three subfamilies, which are defined by the presence of either the DEAD-box, DEAH-box, or DExD/H-box [5,6]. Multiple members of each subfamily have been found in several plants. Many of them, especially DEAD-box genes, are key regulators of developmental processes and responses to diverse abiotic stresses, such as salt stress, oxygen levels, light, or temperature . In Arabidopsis, the DEAD-box RNA helicase LOS4 controls responses to low temperature and processes such as flowering and vernalization [7,8]. STRS1 and STRS2 mediate stress responses to various abiotic stresses . RCF1 maintains proper splicing of pre-mRNAs and regulates responses to changes in temperature . Tobacco VDL (for variegated and distorted leaf), a plastid DEAD-box RNA helicase, is essential for chloroplast differentiation and plant morphogenesis . Maize DRH1 interacts with the nucleolar protein fibrillarin, MA16, and controls the metabolism of ribosomal RNA . Rice ABP (ATP-binding protein) is upregulated in response to multiple abiotic stresses . We reported that the DEAD-box RNA helicase AvDH1 (Apocynum venetum ATP-dependent DEAD-box helicase) participates in the regulation of salt tolerance in the halophyte Apocynum venetum . The nucleolar DExD/H box RNA helicase AtMTR4 (mRNA transport protein) is required for normal rRNA biogenesis and development in Arabidopsis . ISE2, a DEVH-box RNA helicase, has been shown to be involved in plasmodesmata function during embryogenesis in Arabidopsis . The DEVH-box RNA helicase AtHELPS is important for responses to and tolerance of K+ deprivation in Arabidopsis . The Arabidopsis DEAH-Box RNA helicase RID1 controls pre-mRNA splicing .
The recent availability of genome sequences enabled systematic investigation of the families of RNA helicase genes from Arabidopsis, rice, tomato, maize, and soybean. A total of 32 DEAD-box RNA helicases have been identified in Arabidopsis ; 113 and 115 RNA helicase genes have been found in Arabidopsis and rice, respectively . Xu et al.  described the complete analysis and classification of 157 RNA helicase genes in tomato. Soon after, they reported a comparative genome-wide analysis of the RNA helicase gene families in Arabidopsis, rice, maize, and soybean . The RNA helicase genes from Arabidopsis, rice, maize, and soybean were classified into three subfamilies, with the following respective numbers of genes for each subfamily: DEAD-box (50, 51, 57, and 87 genes), DEAH-box (40, 33, 31, and 48 genes), and DExD/H-box (71, 65, 50, and 78 genes) .
Cotton is an economically important crop grown worldwide as a source of fiber and edible oil [20,21]. RNA helicases are ubiquitous proteins that are believed to play important roles in cotton development and stress tolerance. However, only a few RNA helicases have been characterized in cotton [22,23]. Gossypium raimondii is a diploid species of cotton. Release of the G. raimondii genome enabled us to identify and analyze the family of RNA helicase genes in this species. Presently, there are a few reports on studies of cotton transcriptomics [22,24]. Here, we looked at expression of the different RNA helicases in transcriptome data obtained at Peking University (Beijing, China) and accessed through NCBI [25,26]. The present study identified 161 RNA helicase genes from the G. raimondii genome. These were classified into three subfamilies, which were defined by the presence of either a DEAD-box, DEAH-box, or DExD/H-box. Detailed information about the chromosomal locations, expansion, genomic structures, and phylogenetic relationships of the RNA helicase genes is provided. Transcript profiles of 161 RNA helicase genes in mature leaves and at the 0-day-post-anthesis (DPA) and 3-DPA ovule developmental stages were investigated using transcriptome-sequencing data. Our results show that most of these helicase proteins, especially GrDEADs, might function during the fiber initiation stage. This is the first report of a genome-wide analysis of the RNA helicase gene family of G. raimondii.
2.1. Identification of RNA Helicase Family Genes in G. raimondii
RNA helicase usually contains a highly conserved adenosine triphosphate (ATP)-binding domain and a classical C-terminal domain [27–29]. To identify all members of the RNA helicase gene family in G. raimondii, all known Arabidopsis helicase gene sequences and all identified tomato RNA helicase gene sequences were used to query the protein database of G. raimondii using BLAST (basic local alignment search tool) analysis. This identified 161 genes, of which 151 genes had apparent helicase domains with the remaining 10 genes having either incomplete or no helicase domains. Given that the annotation of the G. raimondii genome indicated that these 10 genes encode helicase proteins, they were subjected to further analysis. This enabled us to classify all 161 putative helicase genes into three subfamilies based on their phylogenetic relationships and the structural features of the motif II region. The three subfamilies were defined by the presence of the DEAD-box (51 genes), DEAH-box (52 genes), and DExD/H-box (58 genes) motifs (Table 1, File S2). Based on their subfamily and the order in which they are found on chromosomes 1 through 13, the genes were renamed from GrDEAD1 through GrDEAD51, from GrDEAH1 through GrDEAH52, and from GrDExD/H1 through GrDExD/H58 (Table 1). The lengths, molecular weights, isoelectric points, and predicted subcellular localizations of each Gossypium raimondii helicase protein are summarized in Table 1. We found that GrDEAD genes were distinct from GrDEAH and GrDExD/H genes. Whereas the average GrDEAD protein contains 609 amino acids (aa), GrDEAH and GrDExD/H proteins each average approximately 1100 aa in length. The average theoretical isoelectric point (pI) of GrDEAD proteins, which is approximately 8.21, is higher than that for members of the two other families. 
Of the helicase proteins analyzed, 74.5% were predicted to be located in the nucleus, 16.1% in the cytoplasm, and the remainder mainly in the chloroplast and mitochondrion (Table 1).
2.2. Phylogenetic Analysis of the RNA Helicase Family Genes in G. raimondii
The phylogenetic tree of each subfamily in our study showed that the DEAD-box (Figure 1A), DEAH-box (Figure 1B) and DExD/H-box (Figure 1C) subfamilies could be further classified into four, six, and six subgroups, respectively. However, the available phylogenetic analysis of RNA helicases from Arabidopsis, rice, maize, and soybean  places these subfamilies into many more subclades. That study classified the DEAD-box, DEAH-box and DExD/H-box RNA helicase proteins from tomato into three, three, or five large subgroups, respectively. The diversity of these subclades indicates the extent of the variation of RNA helicase genes in plants.
2.3. Chromosomal Position and Gene Duplications
The 161 RNA helicase genes of G. raimondii are distributed across all 13 chromosomes, with different densities of their distribution along different chromosomes (Figure 2). For example, whereas chromosome 7 contained 19 RNA helicase genes, only five RNA helicase genes were annotated on chromosome 2. Based on the definition of gene clusters, a chromosomal region containing two or more genes within 200 kb can be considered to be a gene cluster [31,32]. In G. raimondii, 18 helicase genes were identified in nine clusters (GrDEAD34–35, GrDExD/H46–47, GrDEAD40-GrDExD/H39, GrDEAH10-GrDExD/H15, GrDEAH6-GrDExD/H12, GrDEAD27-GrDEAH22, GrDEAH27-GrDExD/H31, GrDEAD48-GrDExD/H52, and GrDEAD47-GrDExD/H5) that were dispersed across several chromosomes. Two pairs of genes (GrDEAD40-GrDExD/H39 and GrDEAH10-GrDExD/H15) were arranged in tandem repeats on chromosomes 5 and 10, respectively. In addition, recent studies have shown that G. raimondii has undergone the hexaploidization event (γ-WGD) shared by the eudicots and a cotton-specific whole genome duplication . To analyze the relationship between the RNA helicase genes and genome-wide duplications, we mapped the G. raimondii helicase genes to the duplicated blocks. We found that 62 of the 161 helicase genes (38.5%) had syntenic relationships (Figure 2, File S1). Of these, 40 genes involved only two chromosome regions, nine (GrDEAD4-18-48, GrDExD/H4-30-54, and GrDEAH7-16-39) spanned three chromosome regions, eight (GrDEAD3-8-11-24 and GrDExD/H13-33-38-58) traversed four chromosome regions, and five (GrDEAD19-32-35-39-41) crossed five chromosome regions (Figure 2, File S1).
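The 200 kb cluster definition used above can be illustrated with a short Python sketch. It assumes that distance is measured between the start positions of neighbouring genes on the same chromosome and that clusters are built by chaining neighbours; the source does not state either detail. The function name `find_clusters` and all gene names and coordinates are hypothetical.

```python
from collections import defaultdict

CLUSTER_WINDOW_BP = 200_000  # 200 kb threshold from the cluster definition


def find_clusters(genes, window=CLUSTER_WINDOW_BP):
    """Group genes on the same chromosome whose neighbouring start
    positions lie within `window` bp of each other.

    `genes` is an iterable of (name, chromosome, start) tuples.
    Returns a list of clusters, each a list of two or more gene names.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start in genes:
        by_chrom[chrom].append((start, name))

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        prev_start = None
        for start, name in sorted(by_chrom[chrom]):
            if prev_start is not None and start - prev_start <= window:
                # Close enough to the previous gene: extend the chain.
                current.append(name)
            else:
                # Gap too large: keep the finished chain if it is a cluster.
                if len(current) >= 2:
                    clusters.append(current)
                current = [name]
            prev_start = start
        if len(current) >= 2:
            clusters.append(current)
    return clusters


# Hypothetical coordinates for illustration only (not taken from the paper).
example_genes = [
    ("geneA", "Chr05", 1_000_000),
    ("geneB", "Chr05", 1_150_000),  # 150 kb from geneA: same cluster
    ("geneC", "Chr05", 9_000_000),  # isolated
    ("geneD", "Chr10", 2_000_000),
    ("geneE", "Chr10", 2_500_000),  # 500 kb from geneD: no cluster
]
clusters = find_clusters(example_genes)
print(clusters)
```

Chaining neighbours means a cluster can span more than 200 kb in total if each adjacent pair is within the window; a fixed-window definition would be an equally reasonable reading of the text.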
We also examined the orthologous relationships between helicase genes from Gossypium raimondii and tomato, given that orthologs often retain equivalent functions during the course of evolution. We found that 66 (40.99%) helicase genes from Gossypium raimondii have one or several putative orthologs in tomato (Figure 3, File S1). Of these, 27, 16, and 23 were assigned to the DEAD, DEAH, and DExD/H-box subfamilies, respectively (File S1). One member of the DEAD-box family, GrDEAD7, is an ortholog of SIDEAD27 in tomato and STRS1 in Arabidopsis. GrDEAD37 is an ortholog of SIDEAD34 in tomato and LOS4 in Arabidopsis, whereas GrDExD/H35 is an ortholog of SIDExD/H21 in tomato and ISE2 in Arabidopsis (File S1).
2.4. Structures and Domain Analysis of the Putative Helicase Genes
Figure 4 shows the exon-intron structures and conserved domains of putative RNA helicase genes in each subfamily. The number and location of introns varied among subfamilies. In general, compared with the two other subfamilies, DEAD family genes had simpler structures and more conserved structural patterns than members of the two other subfamilies. The relative levels of conservation are exemplified by comparison of the high level of conservation evident in the comparisons GrDEAD21–30, GrDEAD25–31, GrDEAD27–50, GrDEAD4-18-48, GrDEAD23-34-42, GrDEAD1-37-45, GrDEAD5-12-29-51, GrDEAD3-8-11-24, and GrDEAD19-32-35-39-41 (Figure 4A), with the lower level of conservation amongst members of the GrDEAH-box and GrDExD/H-box helicase family genes evident in the comparisons GrDEAH4-18, GrDEAH38-3, GrDEAH35-28, GrDEAH25-8, GrDEAH33-29, and GrDEAH7-16-39 (Figure 4B), as well as GrDExD/H5–36, GrDExD/H43-1, GrDExD/H20–48, GrDExD/H54-30-4, GrDExD/H28-3-23, and GrDExD/H38-58-33-13-44 (Figure 4C). Gene structures within the same subgroup of all three subfamilies were also very diverse. In addition, we found that whereas genes duplicated in different parts of the genome (such as combinations GrDEAD35-41-19 and GrDEAD4-48-18) had the same or similar gene structures, genes that had been duplicated in tandem (GrDEAD40-GrDExD/H39 and GrDEAH10-GrDExD/H15) had different gene structures, especially in terms of the numbers of exons. Domain analysis indicated the presence of a highly conserved ATP-binding domain and a classical C-terminal domain in almost all of the predicted RNA helicases. Additionally, we found a Q motif in all members of the DEAD family, except for GrDEAD16. A WW domain was observed in three members of the DEAD family. We also found that many of the DEAH and DExD/H family genes were surrounded by defined folds, such as the zf-RING, dsRBDs, and HSA domains.
2.5. Expression of RNA Helicase Family Genes in G. raimondii
The full-length RNA helicase protein sequences in G. raimondii were aligned with MUSCLE (v3.8.31) and analyzed using the maximum likelihood (ML) method. The grouping of proteins in Figure 5 follows the grouping in Figure 1. Bootstrap values were calculated. The abundance of the transcripts that encode selected RNA helicases was examined at the fiber initiation stage and in mature leaves. The expression patterns of the 161 RNA helicase family genes are shown in Figure 5. Of the 161 predicted genes, only six genes (those that encode GrDEAD1, GrDEAD22, GrDEAD30, GrDEAD45, GrDEAH3, and GrDExD/H15) were not expressed, whereas 141 genes were expressed both in ovules and leaves. The GrDEAD genes (Figure 5A) showed more homogeneous levels of expression and an overall higher level of expression both in ovules and leaves when compared with GrDEAH genes (Figure 5B) and GrDExD/H genes (Figure 5C). Seven GrDEAD genes (GrDEAD13, GrDEAD19, GrDEAD21, GrDEAD28, GrDEAD32, GrDEAD35, and GrDEAD41), one GrDEAH gene (GrDEAH23), and three GrDExD/H genes (GrDExD/H33, GrDExD/H54, and GrDExD/H58) were expressed at high levels in all three samples. A member of the DEAD-box family, GrDEAD7, which is an ortholog of Arabidopsis STRS1, was highly expressed in 0-DPA ovules and mature leaves. More than half of the genes were mainly expressed in one of the development stages and tissues tested, with 50 genes (8 GrDEADs, 24 GrDEAHs, and 18 GrDExD/Hs) being the most abundant in 0-DPA ovules, 25 (7 GrDEADs, 5 GrDEAHs, and 13 GrDExD/Hs) being the most abundant in 3-DPA ovules, and 14 (3 GrDEADs, 6 GrDEAHs, and 5 GrDExD/Hs) being the most abundant in mature leaves. For example, GrDEAD37 (an ortholog of Arabidopsis LOS4) was highly expressed in 0-DPA ovules, whereas GrDExD/H35 (an ortholog of Arabidopsis ISE2) was expressed in mature leaves.
The similarity of the abundance profiles of transcripts encoded by duplicated genes, such as GrDEAD19, GrDEAD35, and GrDEAD41, suggested functional redundancy between these family members, whereas the different expression patterns of the members of the GrDEAD4-18-48 group indicated that other G. raimondii helicase genes have been preserved by sub-functionalization.
The expression of helicase genes was very diverse, even within one subgroup. For example, in GrDExD/H group II, six genes (GrDExD/H4, GrDExD/H30, GrDExD/H33, GrDExD/H38, GrDExD/H54, and GrDExD/H58) were highly expressed in all three samples and clustered together. However, eight genes, including GrDExD/H13 and GrDExD/H45, could only be detected in one to three organs, and were expressed at a low level in most organs. Nonetheless, given that members of this group have a Q motif, they may be GrDEAD genes that were assigned to the wrong subfamily owing to inadequate classification.
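The tally of genes by the sample in which they are most abundant can be sketched as follows. This is only an illustration: the source does not state the exact criterion used to call a gene "mainly expressed" in one sample, so the sketch assumes the simplest rule (the sample with the highest RPKM wins, and genes with no expression are left unassigned). The function name `peak_sample`, the gene names, and the RPKM values are hypothetical.

```python
from collections import Counter

# Sample order assumed for each tuple of RPKM values.
SAMPLES = ("0-DPA ovule", "3-DPA ovule", "mature leaf")


def peak_sample(rpkm_values, samples=SAMPLES):
    """Return the sample with the highest RPKM, or None if nothing is expressed."""
    if len(rpkm_values) != len(samples):
        raise ValueError("expected one RPKM value per sample")
    highest = max(rpkm_values)
    if highest <= 0:
        return None
    return samples[list(rpkm_values).index(highest)]


# Hypothetical expression values for illustration only.
demo_expression = {
    "geneA": (35.2, 12.1, 8.4),
    "geneB": (2.0, 9.5, 1.1),
    "geneC": (0.0, 0.0, 0.0),  # not expressed in any sample
    "geneD": (4.4, 3.9, 20.7),
}
peaks = {gene: peak_sample(values) for gene, values in demo_expression.items()}
tally = Counter(sample for sample in peaks.values() if sample is not None)
print(peaks)
print(tally)
```

A stricter rule, such as requiring a minimum fold difference over the other samples, would give smaller counts; the threshold-free version is used here only because it needs no additional assumptions.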
RNA helicases play important roles in plant development and responses to stress. However, only a few RNA helicases have been identified in plants. Genome-wide analysis is the first step to elucidating the biological roles of members of the RNA helicase family members in certain plant species. The recent availability of genome sequences has enabled systematic investigation of this family of genes in Arabidopsis, rice, tomato, maize, and soybean. However, no RNA helicases have been characterized in cotton. This study involved a complete analysis of the RNA helicase gene family in the G. raimondii genome, including gene classification and the analysis of chromosomal locations, gene expansion, phylogenetic relationships, and structures of the genes, as well as their expression profiles at the fiber initiation stage and in mature leaves under normal growth conditions.
3.1. RNA Helicases in G. raimondii
We identified 161 RNA helicase genes in the diploid genome of the cotton species G. raimondii (Table 1). This is close to the number of RNA helicase genes predicted from the analysis of the tomato (157), Arabidopsis (161), rice (149), maize (136), and soybean (213) genomes. The relatively large number of cotton RNA helicase genes identified likely reflects the detailed method we used to identify the RNA helicase genes. The presence of a large RNA helicase gene family in all of these species underscores that RNA helicases likely play important regulatory roles in various processes during plant growth and development. We classified these helicases into three subfamilies: the DEAD-box (51 genes), DEAH-box (52 genes), and DExD/H-box (58 genes) gene families (Table 1). Whereas each of the three subfamilies of cotton RNA helicase genes consists of a similar number of genes, Xu et al. have shown that the DEAD-box and DExD/H-box subfamilies are larger than the DEAH-box subfamily in Arabidopsis, rice, maize, and soybean. We also analyzed the predicted lengths, molecular weights, isoelectric points, and subcellular localizations of each putative helicase protein identified in the G. raimondii genome. We found that DEAD-box RNA helicases were distinct from DEAH-box and DExD/H-box RNA helicases. The apparently higher isoelectric points and smaller sizes of DEAD helicase proteins might be related to their relatively simple gene structures compared with other classes of RNA helicase. In addition, most DEAD-box and DExD/H-box RNA helicase proteins were predicted to be located in the nucleus and cytoplasm, whereas most DEAH-box RNA helicase proteins were predicted to reside in the nucleus (Table 1). Thus, we suggest that DEAH-box RNA helicase proteins may function mainly in nuclear RNA processing. Linder et al. have reported that two nuclear DEAH-box RNA helicases, ESP3 and MUT6, perform different roles (RNA splicing and decay) in nuclear RNA processing.
Several DEAD-box and DExD/H-box RNA helicase proteins were predicted to be located in the chloroplast and mitochondria. Recently, there have been a few reports about RNA helicase proteins in the chloroplasts or mitochondria of plants. Asakura et al. demonstrated that the chloroplast RH3 DEAD-box RNA helicases in maize and Arabidopsis function in the splicing of specific group II introns and affect chloroplast ribosome biogenesis. He et al. demonstrated that the mitochondrial ABO6 DExH-box RNA helicase in Arabidopsis is involved in regulating the splicing of several genes of complex I.
3.2. Expansion of the RNA Helicase Gene Family in G. raimondii
Recent studies have shown that the G. raimondii genome has undergone at least two rounds of genome-wide duplication . To detect possible relationships between RNA helicase genes and genome duplication events, we mapped 36 paralogous gene pairs (38.5%) of RNA helicase genes in G. raimondii (Figure 2, File S1). A similar percentage of paralogous pairs of RNA helicase genes was observed in Arabidopsis, rice, maize and soybean (35, 27, 25, and 62 pairs, respectively) . These results suggest that the expansion of RNA helicase gene family is associated with whole-genome duplication events. However, the 38.5% value is much lower than that of the family of genes that encode NAC (NAM/ATAF/CUC) transcription factors (NAC-TFs) in G. raimondii. Of the 127 G. raimondii NAC-TF genes, 76.37% were within 307 identified syntenic blocks . Given that segmental duplication events occur more often in the more slowly evolving gene families , the RNA helicase gene family may be evolving more rapidly than most other gene families in the cotton genome. In addition to the whole-genome duplication event, gene families can also arise through tandem amplification. For instance, in Chinese plum, tandem duplications played a key role in the expansion of the AP2/ERF family . In G. raimondii, Shang et al.  also detected 20 tandem duplications associated with NAC-TF genes. However, only two gene pairs in our study showed evidence of having participated in tandem duplication. The mechanisms that supported the expansion of the RNA helicase gene family may be more complicated than is suggested by our classification; the specific mechanisms involved require further investigation. Marchat et al.  reported that the evolution of RNA helicase proteins involved gene fusion. Moreover, we speculate that given that the majority of RNA helicase gene family members play vital and diverse role in plants, there would be strong selection against variation in their copy numbers. 
The DEAD genes showed the greatest degree of duplication. Given the more simple and conserved gene structures and domains, as well as higher expression of members of this subfamily than those of others, we propose that DEAD helicase genes may evolve more slowly and may play more basic roles in plant growth and development than the members of other subfamilies of RNA helicases.
3.3. Phylogenetic Analysis and Gene Structural Organization
The full-length RNA helicase protein sequences in tomato and G. raimondii were aligned by MUSCLE and analyzed using the more accurate maximum likelihood (ML) method (File S2). Members of the DEAD-box, DEAH-box, and DExD/H-box subfamilies were further classified into four, six, and six subgroups, respectively (Figure 1), although Xu et al. placed them into many more subclades following phylogenetic analysis of the RNA helicases of Arabidopsis, rice, maize, and soybean. Those workers classified the DEAD-box, DEAH-box, and DExD/H-box RNA helicase proteins from tomato into three, three, and five large subgroups, respectively. The diversity in the number and composition of the subclades from different species indicates variation in the composition of RNA helicase gene families among plant species. In addition, analysis of the exon-intron structures and sequences of the conserved domains can provide insights into the evolution of gene families. Our results showed that the numbers and locations of introns varied among subfamilies. Members of the DEAD-box subfamily had comparatively simple and conserved structural patterns. Genes from the GrDEAH-box and GrDExD/H-box subfamilies were less conserved and more diverse than those from the DEAD-box subfamily. It is noteworthy that genes duplicated genome-wide had the same or similar gene structures, whereas tandemly duplicated genes did not; this requires further analysis. Domain analysis indicated that an ATP-binding domain and a C-terminal domain were highly conserved in these putative RNA helicase proteins, whereas more diversity was evident among the other domains present in each subfamily. The most characteristic feature of the DEAD-box family is the conserved Q motif. We found Q motifs in all GrDEAD genes except GrDEAD16. In addition, three of the family members were found to have a WW domain.
The WW domain has been implicated in mediating protein-protein interactions and linking cell signaling to the membrane cytoskeleton. Whereas DEAH genes lack a Q motif, most members of group II of the DExD/H family have a Q motif. This might be attributed to the loose classification criteria used to distinguish between members of the DEAD and DExD/H subfamilies. Many of the DEAH and DExD/H family genes were surrounded by defined folds, such as the zf-RING, dsRBD, and HSA domains, which may extend the length of the helicase. These regions influence or even define the function of a helicase. The great diversity in these helicases may allow them to regulate many specific pathways in plants.
3.4. Expression Analysis Based on Transcriptome Sequencing Data
RNA helicases rearrange RNA secondary structure, potentially playing roles in any cellular process that involves RNA metabolism. Of the 161 predicted genes in our study, 141 (87.6%) RNA helicase genes were expressed both in ovules and leaves. Xu et al. reported that more than 80% of RNA helicase genes in Arabidopsis, rice, and maize were expressed in at least one of the development stages and tissues tested. The high expression level of this RNA helicase gene family in all of these species further indicates that the RNA helicases may play important roles in various processes. The GrDEAD genes were more homogeneous in terms of their level of expression and were expressed at higher levels both in ovules and leaves when compared with GrDEAH genes and GrDExD/H genes (Figure 5). The DEAD-box family member GrDEAD7, which is an ortholog of Arabidopsis STRS1, was highly expressed in 0-DPA ovules and mature leaves. The RNA helicases STRS1 and STRS2 have been shown to be involved in responses to various abiotic stresses. GrDEAD genes may be more important in diverse cellular processes than GrDEAH genes and GrDExD/H genes. However, Xu et al. have shown that DEAH-box RNA helicase genes are expressed in a higher proportion of the development stages and tissues in Arabidopsis, rice, and maize than DEAD-box and DExD/H-box RNA helicase genes. This difference might be attributed to diversity between the different crops investigated or the diversity of the development stages and tissues examined. Moreover, tissue specificity of gene expression is commonly observed in plants. For example, cyclin-dependent kinase-like proteins (CKLs) of Arabidopsis show strong tissue specificity of expression, with CKL12 being root specific, CKL3 induced in stem tumors and callus, and CKL6 showing strong expression only in leaf tissue.
Though Arabidopsis ISE2 has been shown to be involved in posttranscriptional gene silencing, and the absence of the ISE2 affects a critical factor required for correct plasmodesmata (PD) formation and function , the duration of PD closure was positively correlated with the final fiber length attained . Thus, GrDExD/H35 is not expressed at the initiation fiber stage.
4. Experimental Section
4.1. Identification of RNA Helicase Genes
The latest version (V2.0) of the Gossypium raimondii genome and protein sequences was downloaded from CottonGen. To identify the members of the RNA helicase gene family in Gossypium raimondii, all known Arabidopsis helicase gene sequences and the identified tomato RNA helicase gene sequences were used as queries to perform multiple database searches using BLASTP. The Arabidopsis and tomato query sequences were downloaded from The Arabidopsis Information Resource (TAIR) and the plantGDB database, respectively. After selection of G. raimondii proteins with at least 50% identity with the query sequence, the candidate helicase proteins were aligned with each other to ensure that no gene was represented multiple times. All of the remaining protein sequences were examined using the domain analysis program PROSITE with the default cutoff parameters. The length, molecular weight, and isoelectric point of each Gossypium raimondii helicase protein were calculated using the online ExPASy program. Subcellular localization was analyzed using the CELLO v2.5 server.
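The identity filter described above can be sketched in Python. This illustration assumes the BLASTP results are available in the standard 12-column tabular format (query ID, subject ID, percent identity, and so on) and removes redundancy simply by subject ID; the study instead aligned the candidates with each other, which is not reproduced here. The function name `candidate_subjects` and all sequence IDs and hit values are hypothetical.

```python
import csv
import io

MIN_IDENTITY = 50.0  # percent identity cutoff stated in the text


def candidate_subjects(blast_tabular, min_identity=MIN_IDENTITY):
    """Return the sorted, de-duplicated subject IDs whose best hit
    reaches `min_identity` percent identity.

    `blast_tabular` is a file-like object in BLAST tabular format
    (columns: qseqid, sseqid, pident, ...).
    """
    best = {}
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject, pident = row[1], float(row[2])
        best[subject] = max(pident, best.get(subject, 0.0))
    return sorted(s for s, p in best.items() if p >= min_identity)


# Hypothetical hits for illustration only.
demo = io.StringIO(
    "AtQuery1\tGrProt1\t72.5\t400\t100\t2\t1\t400\t5\t404\t1e-120\t350\n"
    "SlQuery7\tGrProt1\t55.0\t380\t150\t4\t1\t380\t9\t388\t1e-80\t250\n"
    "AtQuery2\tGrProt2\t41.0\t300\t170\t5\t1\t300\t1\t300\t1e-30\t120\n"
)
hits = candidate_subjects(demo)
print(hits)
```

Keeping the best hit per subject means a G. raimondii protein matched by several Arabidopsis or tomato queries is reported once.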
4.2. Phylogenetic Analysis
The full-length protein sequences of the helicase genes were aligned using the MUSCLE (v3.8.31)  program with the default settings. Phylogenetic trees were constructed by employing the maximum-likelihood (ML) method of the phyML (20120412) program  with the WAG (Whelan and Goldman) substitution model. Bootstrap values were calculated using the aLRT (approximate likelihood ratio test) model with the default cutoff parameters.
4.3. Chromosome Localization and Gene Duplications
Synteny analysis was conducted locally using a method similar to that developed for the Plant Genome Duplication Database . Mcscan  was employed to identify homologous regions, and syntenic blocks were evaluated using Circos-0.64 . Default parameters were used in all steps. Tandem duplication was characterized as multiple genes of one family located within the same or neighboring intergenic region .
4.4. Gene Structure and Domain Analysis
All of the putative protein sequences were analyzed using the domain analysis program PROSITE  at ExPASy . Exon-intron structure information of these helicase genes was parsed from the GFF (Generic Feature Format) file downloaded along with the genomic data. Gene structures of the helicase genes were generated using the GSDS (Gene Structure Display Server) algorithm .
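Extracting exon-intron structure from a GFF file can be sketched as follows. This is a minimal illustration, assuming GFF3-style lines with nine tab-separated columns, `exon` feature rows, and a `Parent=` attribute that names the transcript; the actual feature types in the annotation file used in the study are not stated in the text. The function name `exon_counts` and the demo records are hypothetical.

```python
from collections import defaultdict


def exon_counts(gff_lines):
    """Count exon features per parent transcript in GFF3-style lines."""
    counts = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only exon rows with the full nine columns
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        parent = attrs.get("Parent")
        if parent:
            counts[parent] += 1
    return dict(counts)


# Hypothetical annotation records for illustration only.
demo_gff = [
    "##gff-version 3",
    "Chr01\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "Chr01\tdemo\texon\t100\t300\t.\t+\t.\tID=tx1.e1;Parent=tx1",
    "Chr01\tdemo\texon\t400\t600\t.\t+\t.\tID=tx1.e2;Parent=tx1",
    "Chr01\tdemo\texon\t700\t900\t.\t+\t.\tID=tx1.e3;Parent=tx1",
    "Chr02\tdemo\texon\t50\t500\t.\t-\t.\tID=tx2.e1;Parent=tx2",
]
counts = exon_counts(demo_gff)
# A transcript with n exons has n - 1 introns.
introns = {tx: n - 1 for tx, n in counts.items()}
print(counts, introns)
```

Exon counts of this kind are what allow comparisons such as the differing numbers of exons between tandemly duplicated genes.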
4.5. Gene Expression Analyses
The expression pattern of the helicase genes was analyzed using transcriptome sequencing data from mature leaves, 0-DPA ovules, and 3-DPA ovules of G. raimondii. These data were obtained from the NCBI Sequence Read Archive (SRA). The accession numbers were SRX111367, SRX111365, and SRX111366, respectively. The search was performed using nucleotide signatures at least 20 nucleotides long. Read mapping was performed with BWA (0.7.5a-r405) using the default parameters, except that the seed length was set to 31. Sequenced reads that mapped to these helicase sequences were converted to reads per kilobase of transcript per million mapped reads (RPKM) in order to estimate gene expression levels [60,61]. The formula used was:
$$\mathrm{RPKM} = \frac{10^{9} \times C}{N \times L}$$
where C is the number of reads that were uniquely aligned to the transcript, N is the total number of reads that were uniquely aligned to all the transcripts in a specific sample, and L is the number of bases in the transcript.
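The RPKM conversion can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard definition RPKM = 10^9 × C / (N × L) with C, N, and L as defined in the text; the function name `rpkm` and the example read counts are hypothetical and not taken from the study.

```python
def rpkm(reads_on_transcript, total_mapped_reads, transcript_length_bp):
    """Reads per kilobase of transcript per million mapped reads.

    reads_on_transcript  -- C, reads uniquely aligned to the transcript
    total_mapped_reads   -- N, reads uniquely aligned to all transcripts
    transcript_length_bp -- L, number of bases in the transcript
    """
    if total_mapped_reads <= 0 or transcript_length_bp <= 0:
        raise ValueError("N and L must be positive")
    return 1e9 * reads_on_transcript / (total_mapped_reads * transcript_length_bp)


# Hypothetical numbers: 500 reads on a 2 kb transcript, 20 million mapped reads.
value = rpkm(500, 20_000_000, 2_000)
print(value)
```

The factor of 10^9 combines the per-kilobase (10^3) and per-million-reads (10^6) scalings, so values are comparable across transcripts of different lengths and samples of different sequencing depths.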
Our study has reported a genome-wide analysis of the important RNA helicase gene family in G. raimondii. Based on their expression analysis, we hypothesize that GrDEAD37 and GrDExD/H35 might function during the fiber initiation stage. Only 38.5% of the putative RNA helicase genes were mapped to the previously identified syntenic blocks. The specific mechanisms used during the expansion of the RNA helicase family might be more complicated than suggested by our analysis, and require further investigation. Of the subfamilies of RNA helicases from G. raimondii, the GrDEADs have undergone the greatest degree of duplication and have the most conserved structural patterns and highest levels of expression, when measured at the level of transcript abundance. This suggests that the GrDEAD gene subfamily may evolve more slowly than the two other subfamilies, and that GrDEAD genes play a more important and basic role in cotton than DEAH or DExD/H genes. This study should provide a solid foundation for future functional studies and for guiding future experimental work on helicase genes in plants.
This research was mainly supported by the China Major Projects for Transgenic Breeding (Grant Nos. 2011ZX08005-004 and 2011ZX08005-002) and the China Key Development Project for Basic Research (973) (Grant No. 2010CB12606).
Conflicts of Interest
The authors declare no conflict of interest.
- Umate, P.; Tuteja, R.; Tuteja, N. Genome-wide analysis of helicase gene family from rice and Arabidopsis: A comparison with yeast and human. Plant Mol. Biol. 2010, 73, 449–465. [Google Scholar]
- Lohman, T.M.; Bjornson, K.P. Mechanisms of helicase-catalyzed DNA unwinding. Annu. Rev. Biochem. 1996, 65, 169–214. [Google Scholar]
- Owttrim, G.W. RNA helicases and abiotic stress. Nucleic Acids Res. 2006, 34, 3220–3230. [Google Scholar]
- Schmid, S.; Linder, P. D-E-A-D protein family of putative RNA helicases. Mol. Microbiol. 1992, 6, 283–292. [Google Scholar]
- Xu, R.; Zhang, S.; Lu, L.; Cao, H.; Zheng, C. A genome-wide analysis of the RNA helicase gene family in Solanum lycopersicum. Gene 2013, 513, 128–140. [Google Scholar]
- Xu, R.; Zhang, S.; Huang, J.; Zheng, C. Genome-wide comparative in silico analysis of the RNA helicase gene family in Zea mays and Glycine max: A comparison with Arabidopsis and Oryza sativa. PLoS One 2013, 8, e78982. [Google Scholar]
- Gong, Z.; Dong, C.H.; Lee, H.; Zhu, J.; Xiong, L.; Gong, D.; Stevenson, B.; Zhu, J.K. A DEAD box RNA helicase is essential for mRNA export and important for development and stress responses in Arabidopsis. Plant Cell Online 2005, 17, 256–267. [Google Scholar]
- Gong, Z.; Lee, H.; Xiong, L.; Jagendorf, A.; Stevenson, B.; Zhu, J.K. RNA helicase-like protein as an early regulator of transcription factors for plant chilling and freezing tolerance. Proc. Natl. Acad. Sci. USA 2002, 99, 11507–11512. [Google Scholar]
- Kant, P.; Kant, S.; Gordon, M.; Shaked, R.; Barak, S. STRESS RESPONSE SUPPRESSOR1 and STRESS RESPONSE SUPPRESSOR2 two DEAD-Box RNA helicases that attenuate Arabidopsis responses to multiple abiotic stresses. Plant Physiol. 2007, 145, 814–830. [Google Scholar]
- Guan, Q.; Wu, J.; Zhang, Y.; Jiang, C.; Liu, R.; Chai, C.; Zhu, J. A DEAD box RNA helicase is critical for pre-mRNA splicing cold-responsive gene regulation and cold tolerance in Arabidopsis. Plant Cell Online 2013, 25, 342–356. [Google Scholar]
- Wang, Y.; Duby, G.; Purnelle, B.; Boutry, M. Tobacco VDL gene encodes a plastid DEAD box RNA Helicase and is involved in chloroplast differentiation and plant morphogenesis. Plant Cell Online 2000, 12, 2129–2142. [Google Scholar]
- Gendra, E.; Moreno, A.; Albà, M. Interaction of the plant glycine-rich RNA-binding protein MA16 with a novel nucleolar DEAD box RNA helicase protein from Zea mays. Plant J. 2004, 38, 875–886. [Google Scholar]
- Macovei, A.; Vaid, N.; Tula, S.; Tuteja, N. A new DEAD-box helicase ATP-binding protein (OsABP) from rice is responsive to abiotic stress. Plant Signal. Behav. 2012, 7, 1138–1143. [Google Scholar]
- Liu, H.; Liu, J.; Fan, S.; Song, M.; Han, X.; Liu, F.; Shen, F. Molecular cloning and characterization of a salinity stress-induced gene encoding DEAD-box helicase from the halophyte Apocynum venetum. J. Exp. Bot. 2008, 59, 633–644. [Google Scholar]
- Lange, H.; Sement, F.M.; Gagliardi, D. MTR4 a putative RNA helicase and exosome co-factor is required for proper rRNA biogenesis and development in Arabidopsis thaliana. Plant J. 2011, 68, 51–63. [Google Scholar]
- Kobayashi, K.; Otegui, M.S.; Krishnakumar, S.; Mindrinos, M.; Zambryski, P. INCREASED SIZE EXCLUSION LIMIT2 encodes a putative DEVH Box RNA helicase involved in plasmodesmata function during Arabidopsis embryogenesis. Plant Cell Online 2007, 19, 1885–1897. [Google Scholar]
- Xu, R.R.; Qi, S.D.; Lu, L.T.; Chen, C.T.; Wu, C.A.; Zheng, C.C. A DExD/H box RNA helicase is important for K+ deprivation responses and tolerance in Arabidopsis thaliana. FEBS J. 2011, 278, 2296–2306. [Google Scholar]
- Ohtani, M.; Demura, T.; Sugiyama, M. Arabidopsis ROOT INITIATION DEFECTIVE1 a DEAH-Box RNA helicase involved in pre-mRNA splicing is essential for plant development. Plant Cell Online 2013. [Google Scholar] [CrossRef]
- Aubourg, S.; Kreis, M.; Lecharny, A. The DEAD box RNA helicase family in Arabidopsis thaliana. Nucleic Acids Res. 1999, 27, 628–636. [Google Scholar]
- Rathore, K.S. Cotton. In Genetic Modification of Plants; Springer: Berlin/Heidelberg, Germany, 2010; pp. 269–285. [Google Scholar]
- Wendel, J.F.; Brubaker, C.; Alvarez, I.; Cronn, R.; Stewart, J.M. Evolution and natural history of the cotton genus. In Genetics and Genomics of Cotton; Springer: New York, NY, USA, 2009; pp. 3–22. [Google Scholar]
- Hovav, R.; Udall, J.A.; Hovav, E.; Rapp, R.; Flagel, L.; Wendel, J.F. A majority of cotton genes are expressed in single-celled fiber. Planta 2008, 227, 319–329. [Google Scholar]
- Hovav, R.; Udall, J.A.; Chaudhary, B.; Hovav, E.; Flagel, L.; Hu, G.; Wendel, J.F. The evolution of spinnable cotton fiber entailed prolonged development and a novel metabolism. PLoS Genet. 2008, 4, e25. [Google Scholar]
- Yao, D.; Zhang, X.; Zhao, X.; Liu, C.; Wang, C.; Zhang, Z.; Zhang, C.; Wei, Q.; Wang, Q.; Yan, H. Transcriptome analysis reveals salt-stress-regulated biological processes and key pathways in roots of cotton (Gossypium hirsutum L). Genomics 2011, 98, 47–55. [Google Scholar]
- Wang, K.; Wang, Z.; Li, F.; Ye, W.; Wang, J.; Song, G.; Yue, Z.; Cong, L.; Shang, H.; Zhu, S.; et al. The draft genome of a diploid cotton Gossypium raimondii. Nat. Genet. 2012, 44, 1098–1103. [Google Scholar]
- NCBI. Available online: http://www.ncbi.nlm.nih.gov/ (accessed on 1 October 2013).
- Marchat, L.A.; Orozco, E.; Guillen, N.; Weber, C.; Lopez-Camarillo, C. Putative DEAD and DExH-box RNA helicases families in Entamoeba histolytica. Gene 2008, 424, 1–10. [Google Scholar]
- Rocak, S.; Linder, P. DEAD-box proteins: the driving forces behind RNA metabolism. Nat. Rev. Mol. Cell Biol. 2004, 5, 232–241. [Google Scholar]
- Zou, J.; Chang, M.; Nie, P.; Secombes, C. Origin and evolution of the RIG-I like RNA helicase gene family. BMC Evol. Biol. 2009, 9, 85. [Google Scholar]
- Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797. [Google Scholar]
- Holub, E.B. The arms race is ancient history in Arabidopsis the wildflower. Nat. Rev. Genet. 2001, 2, 516–527. [Google Scholar]
- Zhang, C.; Zhang, H.; Zhao, Y.; Jiang, H.; Zhu, S.; Cheng, B.; Xiang, Y. Genome-wide analysis of the CCCH zinc finger gene family in Medicago truncatula. Plant Cell Rep. 2013, 1–13. [Google Scholar]
- Altenhoff, A.M.; Dessimoz, C. Phylogenetic and functional assessment of orthologs inference projects and methods. PLoS Comput. Biol. 2009, 5, e1000262. [Google Scholar]
- Linder, P.; Owttrim, G.W. Plant RNA helicases: Linking aberrant and silencing RNA. Trends Plant Sci. 2009, 14, 344–352. [Google Scholar]
- Asakura, Y.; Galarneau, E.; Watkins, K.P.; Barkan, A.; van Wijk, K.J. Chloroplast RH3 DEAD box RNA helicases in maize and Arabidopsis function in splicing of specific group II introns and affect chloroplast ribosome biogenesis. Plant Physiol. 2012, 159, 961–974. [Google Scholar]
- He, J.; Duan, Y.; Hua, D.; Fan, G.; Wang, L.; Liu, Y.; Chen, Z.; Han, L.; Qu, L.J.; Gong, Z. DEXH Box RNA helicase-mediated mitochondrial reactive oxygen species production in Arabidopsis mediates crosstalk between abscisic acid and auxin signaling. Plant Cell Online 2012, 24, 1815–1833. [Google Scholar]
- Shang, H.; Li, W.; Zou, C.; Yuan, Y. Analyses of the NAC transcription factor gene family in Gossypium raimondii Ulbr: Chromosomal location structure phylogeny and expression patterns. J. Integr. Plant Biol. 2013, 55, 663–676. [Google Scholar]
- Cannon, S.B.; Mitra, A.; Baumgarten, A.; Young, N.D.; May, G. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 2004, 4, 10. [Google Scholar]
- Du, D.; Hao, R.; Cheng, T.; Pan, H.; Yang, W.; Wang, J.; Zhang, Q. Genome-wide analysis of the AP2/ERF gene family in Prunus mume. Plant Mol. Biol. Rep. 2013, 31, 741–750. [Google Scholar]
- Tanner, N.K.; Cordin, O.; Banroques, J.; Doère, M.; Linder, P. The Q motif: A newly identified motif in DEAD Box helicases may regulate ATP binding and hydrolysis. Mol. Cell 2003, 11, 127–138. [Google Scholar]
- Ilsley, J.L.; Sudol, M.; Winder, S.J. The WW domain: Linking cell signalling to the membrane cytoskeleton. Cell. Signal. 2002, 14, 183–189. [Google Scholar]
- Klostermeier, D.; Rudolph, M.G. A novel dimerization motif in the C-terminal domain of the Thermus thermophilus DEAD box helicase Hera confers substantial flexibility. Nucleic Acids Res. 2009, 37, 421–430. [Google Scholar]
- Menges, M.; de Jager, S.M.; Gruissem, W.; Murray, J.A. Global analysis of the core cell cycle regulators of Arabidopsis identifies novel genes reveals multiple and highly specific profiles of expression and provides a coherent model for plant cell cycle control. Plant J. 2005, 41, 546–566. [Google Scholar]
- Lucas, W.J.; Ham, B.K.; Kim, J.Y. Plasmodesmata—Bridging the gap between neighboring plant cells. Trends Cell Biol. 2009, 19, 495–503. [Google Scholar]
- CottonGen. Available online: http://www.cottongen.org/ (accessed on 1 October 2013).
- Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. [Google Scholar]
- The Arabidopsis Information Resource. Available online: http://www.arabidopsis.org/ (accessed on 6 October 2013).
- PlantGDB. Available online: http://www.plantgdb.org/ (accessed on 6 October 2013).
- PROSITE Program. Available online: http://www.expasy.ch/prosite/ (accessed on 25 October 2013).
- Compute pI/Mw Tool. ExPASy Server. Available online: http://web.expasy.org/compute_pi/ (accessed on 25 October 2013).
- CELLO v2.5. Available online: http://cello.life.nctu.edu.tw/ (accessed on 25 October 2013).
- Guindon, S.; Lethiec, F.; Duroux, P.; Gascuel, O. PHYML Online—A web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 2005, 33, W557–W559. [Google Scholar]
- Tang, H.; Bowers, J.E.; Wang, X.; Ming, R.; Alam, M.; Paterson, A.H. Synteny and collinearity in plant genomes. Science 2008, 320, 486–488. [Google Scholar]
- Wang, Y.; Tang, H.; DeBarry, J.D.; Tan, X.; Li, J.; Wang, X.; Lee, T.-H.; Jin, H.; Marler, B.; Guo, H. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012, 40, e49. [Google Scholar]
- Krzywinski, M.; Schein, J.; Birol, İ.; Connors, J.; Gascoyne, R.; Horsman, D.; Jones, S.J.; Marra, M.A. Circos: An information aesthetic for comparative genomics. Genome Res. 2009, 19, 1639–1645. [Google Scholar]
- Bairoch, A. PROSITE: A dictionary of sites and patterns in proteins. Nucleic Acids Res. 1991, 19, 2241. [Google Scholar]
- Gasteiger, E.; Gattiker, A.; Hoogland, C.; Ivanyi, I.; Appel, R.D.; Bairoch, A. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003, 31, 3784–3788. [Google Scholar]
- GSDS v2.0. Available online: http://gsds.cbi.pku.edu.cn/ (accessed on 25 October 2013).
- Li, H.; Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25, 1754–1760. [Google Scholar]
- Mortazavi, A.; Williams, B.A.; McCue, K.; Schaeffer, L.; Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 2008, 5, 621–628. [Google Scholar]
- Yin, Z.; Wang, J.; Wang, D.; Fan, W.; Wang, S.; Ye, W. The MAPKKK gene family in Gossypium raimondii: Genome-wide identification classification and expression analysis. Int. J. Mol. Sci. 2013, 14, 18740–18757. [Google Scholar]
| Gene name | Gene identifier | Genomic position | Size (AA) | Mw | pI | Subcellular localization |
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).