Genome-Wide Identification and Analysis of MAPK and MAPKK Gene Families in Bread Wheat (Triticum aestivum L.)

The mitogen-activated protein kinase (MAPK) cascade is a universal signal transduction module that plays a vital role in regulating growth and development, as well as environmental stress responses in plants. Wheat is one of the most important crops worldwide. Although the MAPK kinase kinase (MAP3K) family in wheat has been investigated, the MAPK and MAPK kinase (MAP2K) gene families remain unknown at present. Here, 54 MAPK and 18 MAPKK genes were identified in wheat using recent genomic information. Phylogenetic analysis of Triticum aestivum L. MAPKs and MAPKKs (TaMAPKs and TaMAPKKs) together with homologous genes from other species classified them into four groups, and the clustering was consistent with the genomic exon/intron structures. Conserved motif analysis found that MAPK proteins contained a typical TXY phosphorylation site and MAPKK proteins contained an S/T-X5-S/T motif. RNA-seq data mapping analysis showed that MAPK and MAPKK genes in group IV had tissue-specific expression profiles, whereas each group I member showed relatively high expression in all organs. Expression patterns of TaMAPK and TaMAPKK genes under stress conditions were also investigated and stress-responsive candidates were identified. Co-expression network analysis identified 11 TaMAPK genes and 6 TaMAPKK genes involved in the interaction network pathway. Overall, this study provided useful information for evolutionary and functional surveys of MAPK and MAPKK gene families in wheat and beyond.


Introduction
Wheat is one of the most important cereal crops worldwide, and provides about 30% of the staple food source for humankind [1,2]. Various abiotic stresses, including salt, drought and cold, are major limiting factors to wheat production. Improving the tolerance of wheat in response to diverse stresses holds promise for addressing the ever-increasing human population as well as the threat of climate change.
The self-protection mechanisms of plants activate numerous genes during enduring abiotic stress [3]. Mitogen-activated protein kinase (MAPK) cascade genes code for well-conserved proteins that function as key signal transduction components, consisting of three MAPK members activated by sequential phosphorylation, namely (MAPK kinase kinase) MAP3K-(MAPK kinase) MAP2K-MAPK [4]. Extensive studies have revealed that the MAPKs have indispensable regulatory roles in response to abiotic stresses in plants. MAPK3 and MAPK6 play a critical role in promoting cell division in integuments specifically during ovule development as well as being involved in abiotic stress response [5]. MAPK4, MAPK5, MAPK6 and MAPK7 were specifically induced by abiotic stress treatments such as salt, drought and low temperature in maize [6]. GhMAPK16 might be relevant to abiotic stress signal transduction pathways and its over-expression in Arabidopsis led to significant tolerance to drought stress [7]. The MAPK cascade gene family has been well studied in rice, maize and Brachypodium distachyon [8][9][10]. In wheat, the genome organization, evolutionary features and expression profiles of the MAP3K gene family has also been systematically studied [4]. However, information on wheat MAPK and MAPKK gene families is not well-understood, especially those families involved in abiotic stress response.
In this study, 54 MAPK and 18 MAPKK genes were identified in wheat genome based on a bioinformatics search. The phylogenetic tree, conserved motifs, gene expression pattern and regulatory network of these MAPKs and MAPKKs were further systematically analyzed. Phylogenetic tree and conserved motifs analysis of wheat MAPK and MAPKK families classified them into four groups. Additionally, the expression patterns of 54 MAPK and 18 MAPKK genes under abiotic stress or in various tissues were comprehensively investigated by RNA-seq analyses. Finally, the interaction network of putative wheat MAPK and MAPKK genes was also investigated. Overall, this study provided basic information about the genomic organization of MAPK and MAPKK genes in wheat, which will facilitate further functional studies.

Identification of MAPK and MAPKK Genes in Wheat
Potential members of wheat MAPK and MAPKK gene families were identified according to the method described by Wang et al [4]. All available protein sequences for wheat genotype Chinese Spring (CS42) were downloaded from the Ensemble database [11]. The available MAPK and MAPKK genes in Arabidopsis thaliana, Oryza sativa and B. distachyon were used to construct an hidden Markov model (HMM) profile using the hmm-build tool embedded in HMMER 3.0, and then the HMM profile was used to search wheat proteins using the hmmsearch tool embedded in HMMER 3.0 [12]. Conserved domains of wheat MAPK and MAPKK members were confirmed by PFAM [13] and InterProScan database [14]. Finally, all sequences obtained were verified by a BLASTN (Nucleotide BLAST) similarity search against wheat expressed sequence tags (ESTs) deposited in the National Center for Biotechnology Information NCBI database. The theoretical pI (isoelectric point) and Mw (molecular weight) of candidate genes were calculated using compute pI/Mw tool online (http://web.expasy.org/compute_pi/). Subcellular localization of these genes was predicted using the CELLO v2.5 web server [15].

Multiple Alignments and Phylogenetic Analysis
Multiple sequence alignments were performed using ClustalW [16]. An unrooted phylogenetic tree was generated by MEGA 6.0 software using the neighbor-joining method and the bootstrap test method with 1000 replications [17]. The MEME program [18] was used to predict the conserved motifs of TaMAPKs and TaMAPKKs.

Analysis of the Expression Profiles of MAPK and MAPKK Genes by RNA-Seq Datasets
To study the expression of MAPK and MAPKK genes in different organs and response to stresses, a total of 28 RNA-seq datasets of wheat variety Chinese Spring obtained from WHEAT URGI (https://urgi.versailles.inra.fr/files/RNASeqWheat/) and NCBI Sequence Read Archive (SRA) database were used to investigate differential expression of TaMAPK and TaMAPKK genes (Table S1). TopHat and Cufflinks software were used to analyze gene expression based on these RNA-seq data with the default parameters [19]. The FPKM (fragments per kilobase of transcript per million fragments mapped) value was calculated for each gene, and the log10-transformed (FPKM + 1) values of MAPK and MAPKK genes were used to generate heat maps.

Co-expression Network Analysis
Correlation networks are increasingly being used in bioinformatics applications. The WGCNAR_1.49 package was used to construct the co-expression network of wheat MAPK and MAPKK genes by performing weighted correlation network analysis of RNA-seq data [20].

Genome-Wide Identification of the MAPK and MAPKK Gene Families in Wheat
MAPK and MAPKK gene families are the important protein kinases involved in many physiological processes [21]. In order to identify members of the two families, we performed a HMM search and a total of 54 non-redundant MPAK and 18 MAPKK genes were identified in the wheat genome ( Table 1). The predicted genes in the two families were named TaMAPK1 to TaMAPK54 and TaMAPKK1 to TaMAPKK18, respectively. The number of MAPK genes in wheat (54) was much higher than in rice (16), maize (19) and B. distachyon (16) [8][9][10]. The number of TaMAPKK (18) was also slightly higher than in B. distachyon (12) and cucumber (14) [22]. The locations of both groups of MAPK cascade genes were not random across the wheat chromosomes. Sixteen of the TaMAPK genes were located on homoeologous group 7 chromosomes, 14 were located in group 1 chromosomes, and 9, 7, 6 and 2 were in groups 6, 3, 4 and 5, respectively. There was no MAPK gene on a group 2 chromosome. The locations of TaMAPKK genes differed from TaMAPKs. Homologous group 4 contained the most TaMAPKK genes, at 10, followed by group 5 with 5 TaMAPKKs, then groups 3 and 6, harboring 2 and 1, respectively. No TaMAPKK gene was present in groups 1, 2 and 7.
To support the actual existence of these putative genes, we performed a BLASTN search against the wheat EST and UniGene database using the MAPK and MAPKK genes as queries (Table 1). Fifty-one MAPK genes and 5 MAPKK genes had EST support, representing 94.4% and 27.8% of the predicted genes, respectively. We speculated that the TaMAPKs and TaMAPKKs with no EST support might have very low expression that could not be detected experimentally, or was not expressed under any conditions we used. The lengths of putative TaMAPK proteins ranged from 184 to 598 amino acids, with putative molecular weights (Mw) ranging from 21.4 to 68.2 kDa and theoretical pI ranging from 5.19 to 9.17, respectively. The lengths of TaMAPKK proteins ranged from 188 to 647 amino acids, Mw from 20.7 to 71.6 kDa and pI from 5.37 to 8.95, respectively. Subcellular localization analysis found that most of TaMAPKs and TaMAKKs were localized in cytoplasmic (Table 1).

Multiple Alignments, Phylogenetic and Conserved Motif Analysis of TaMAPKs and TaMAPKKs
To further evaluate the phylogenetic relationships of the wheat MAPK and MAPKK genes, the full-length protein sequences of the 54 TaMAPKs and 18 TaMAPKKs were aligned using ClustalW software [16] and phylogenetic trees were constructed using the neighbor joining (NJ) method in MEGA 6.0 [17]. As shown in Figure 1A, the sequences of the TaMAPK genes in the motif regions were highly conserved. Through phylogenetic analyses relative to homologs in Arabidopsis and cucumber, we classified the TaMAPKs into four groups ( Figure 1B), each group including 7 (Group I), 3 (II), 8 (III), 36 (IV) members. The results indicated evolutionary differences in wheat MAPK genes, consistent with those in B. distachyon [8]. Each MAPK cluster classified by phylogenetic analysis shared similar conserved motif compositions. A total of ten motifs were identified in TaMAPK proteins. Motif TXY is an important phosphorylation site for MAPK activation. Groups I-III have a TEY motif in their activation loops, except TaMAPK47 where THE replaced the TEY. The THE motif was not found in rice, maize, M. domestica or B. distachyon [9,10,23,24] suggesting that it might be wheat-specific.
As shown in Figure 2A, members of the TaMAPKK family also clustered into four groups. Groups I, II and III contained a S/T-X5-S/T motif, which functions as a phosphorylation site, but was not present in Group IV. This motif is not only found in rice, Arabidopsis and B. distachyon, but also in some animals and fungi [24]. Most interestingly, the MAPKKs that include the S/T-X5-S/T motif were highly expressed in many tissues or responded to multiple stresses.

Tissue-Specific Expression Patterns of TaMAPK and TaMAPKK Genes
The expression patterns of TaMAPKs in various wheat tissues were investigated using RNA-seq data for five tissues. As shown in Figure 3A, the heat map indicates that all 54 detected genes were involved in various biological processes and were expressed in the majority of wheat tissues, but their expression levels were highly variable. TaMAPK14, 18, 28, 31, 32 and 46 were expressed at high levels in all five tissues. Some TaMAPKs were highly expressed in specific tissues. For example, TaMAPK17 and 35 were highly expressed in developing grain, suggesting that those genes may play indispensable roles in grain development. TaMAPK5 and 42 were relatively highly expressed in leaf and root tissues. TaMAPKs highly expressed in specific organs may have specific roles in growth and development of the corresponding tissues. It is noteworthy that tissue expression specificity of MAPK genes has been reported in B. distachyon, maize and cucumber [9,15,24]. All members of group II were highly expressed in multiple tissues, consistent with CsMAPK13 in cucumber and BdMAPK11 in B. distachyon [15,24]. data for five tissues. As shown in Figure 3A, the heat map indicates that all 54 detected genes were involved in various biological processes and were expressed in the majority of wheat tissues, but their expression levels were highly variable. TaMAPK14, 18,28,31,32 and 46 were expressed at high levels in all five tissues. Some TaMAPKs were highly expressed in specific tissues. For example, TaMAPK17 and 35 were highly expressed in developing grain, suggesting that those genes may play indispensable roles in grain development. TaMAPK5 and 42 were relatively highly expressed in leaf and root tissues. TaMAPKs highly expressed in specific organs may have specific roles in growth and development of the corresponding tissues. It is noteworthy that tissue expression specificity of MAPK genes has been reported in B. distachyon, maize and cucumber [9,15,24]. All members of group II were highly expressed in multiple tissues, consistent with CsMAPK13 in cucumber and BdMAPK11 in B. distachyon [15,24].
Expression profiles of TaMAPKK genes in different tissues showed that most genes were expressed in five different tissues. Interestingly, MAPKKs with the S/T-X5-S/T motif show high expression in specific tissues ( Figure 3B). For example, TaMAPKK9 and TaMAPKK10 showed relatively higher expression in leaves than in other tissues, indicating that they positively controlled leaf development. Members of Groups I, II and III with S/T-X5-S/T motifs had higher expression in all organs, consistent with BdMAPKK3 and BdMAPKK6 in B. distachyon and suggesting that this motif was an important phosphorylation site in relation to gene expression level [24]. Conserved domains identification and analysis may facilitate identification of functional units in these kinase genes and accelerate an understanding of their roles in plant growth and development as well as stress response.  Expression profiles of TaMAPKK genes in different tissues showed that most genes were expressed in five different tissues. Interestingly, MAPKKs with the S/T-X5-S/T motif show high expression in specific tissues ( Figure 3B). For example, TaMAPKK9 and TaMAPKK10 showed relatively higher expression in leaves than in other tissues, indicating that they positively controlled leaf development. Members of Groups I, II and III with S/T-X5-S/T motifs had higher expression in all organs, consistent with BdMAPKK3 and BdMAPKK6 in B. distachyon and suggesting that this motif was an important phosphorylation site in relation to gene expression level [24]. Conserved domains identification and analysis may facilitate identification of functional units in these kinase genes and accelerate an understanding of their roles in plant growth and development as well as stress response.

Expression Patterns of TaMAPK and TaMAPKK Genes under Abiotic Stress
There is increasing data showing that the MAPK and MAPKK gene families have key roles in determining response to abiotic stresses [21]. For example, ZmMAPK1, GhMAPK4 and SiMAPK7 were reported to regulate abiotic stress response [25][26][27]. We analyzed the expression patterns of TaMAPK genes to determine whether they were responsive to heat, drought, cold or salt stress. As shown in Figure 3A, 84% and 16% TaMAPKs displayed up-or down-regulation following stress treatments. TaMAPK38 was up-regulated after 6 h of heat stress, whereas TaMAPK5 and TaMAPK42 were suppressed. TaMAPK10 and TaMAPK45 were suppressed whereas TaMAPK29, TaMAPK33 and TaMAPK41 were induced by salt stress. This was consistent with previous research reporting that TaMAPK4 was strongly induced under high salinity stress [28], and that over-expression of OsMAPK33 in rice led to enhanced salt sensitivity [29]. However, multiple numbers of clustered genes, such as TaMAPK32, TaMAPK14 and TaMAPK28, had similar expression profiles under abiotic stress, indicating that these genes might have analogous physiological functions. The expression levels of clustered genes were not significantly altered by stress treatment as also found in B. distachyon [24]. Finally, the TaMAPK47 gene was specifically expressed after 24 h of salt treatment, indicating it might be a salt-responsive candidate. Genes that are expressed under abiotic stress conditions may be the specific genes responsible for regulation of stress response and tolerance. Compared with TaMAPK genes TaMAPKKs show more significant differences under various stress conditions. For example, transcripts of TaMAPKK8 and TaMAPKK15 were down-regulated under heat treatment for 1 h, but were induced after 6 h. This was contrary to reports for SbMAPKK in S. brachiata in which there was up-regulation followed by down-regulation under dehydration treatment [30]. TaMAPKK12 and 16 were induced and TaMAPKK7 and 17 were reduced under all stress treatments, whereas other genes were reduced or induced following just one or two stress treatments. TaMAPKK5 and TaMAPKK14 showed opposite expression patterns following heat and salt treatments, suggesting that functional differentiation had occurred in these genes and that they were involved in regulating different stress signaling pathways. In summary, wheat MAPK/MAPKK genes likely play vital roles in different biological process and various abiotic stresses.

Interactions between TaMAPK and TaMAPKK Family Members
In general, MAPK cascade kinase genes form conserved signaling modules, involving three functionally linked protein kinases: MAPKs, MAPKKs and MAP3Ks [8]. In this study, we constructed the regulatory network of the MAPK, MAPKK cascade based upon different tissues and stress treatments in wheat. The WGCNAR package [20] provides a comprehensive set of functions for performing weighted correlation network analysis by RNA-seq data. As shown in Figure 3C, 6 MAPKK genes and 11 MAPK genes were involved in a single cascade. The 6 MAPKK genes contained S/T-X5-S/T motifs showed high expression levels. Furthermore, 4 of the 11 MAPK genes in Group I, including TaMAPK9, TaMAPK15, TaMAPK25 and TaMAPK30, showed close associations with MAPKK genes. In Arabidopsis, Group I members AtMAPK3 and AtMAPK6 play important roles in anther cell differentiation and normal anther lobe formation, and AtMAPK6 is involved in seed formation and the modulation of lateral and primary root development [5,31], suggesting that TaMAPK9, TaMAPK15, TaMAPK25 and TaMAPK30 might be involved in the regulatory network that controls organ differentiation or development. The constructed interaction network between TaMAPK and TaMAPKK provides information for further studies on MAPK regulatory pathways in wheat and other species.