Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula

Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula.


Introduction
MAP kinase signaling cascade pathways have been well identified and characterized in many plants [1].These MAPK pathways are characterized with important roles in plant growth, development, and response to abiotic stress [2].The MAPK pathway is minimally consisted of three members, a MAPKKK (MAPK kinase kinase), a MAPKK (MAPK kinase), and a MAPK [3,4]; they interact and transmit signals from upstream receptors to downstream function targets by phosphorylation function [5,6].Among these MAPK families, the MAPKKK family is the largest, with more gene members than other families.They are further classified into three subfamilies, RAF, MEKK, and ZIK, based on characteristic sequence motifs [3].
Medicago truncatula is an annual legume plant that can form a symbiotic association with soil bacteria called rhizobia.Because of its small, diploid genome, short life cycle, self-fertility, and high genetic transformation efficiency, M. truncatula has become an excellent legume model plant [19].When its genome sequence was released, Neupane et al. investigated two MAPK families, MAPK and MAPKK, and their reports showed that MAPK signaling cascade pathways played important roles in tissue development, such as leaf, root, and nodule [20].However, the largest family, MAPKKK, has not been identified on a genome level; their function is poorly characterized in M. truncatula.
In this study, we performed a genome-wide analysis of the MAPKKK family in M. truncatula, including phylogenetic analysis, chromosomal localization, and gene duplication analysis.Meanwhile, we also investigated their expression profiles by microarray data and an RNA-seq experiment, and explored their function in plant development and response to stresses.These findings would be valuable for understanding MAPK cascade function and promoting their utilization in legumes' genetic improvement.

Identification of the MAPKKK Gene Family in Medicago truncatula
The Medicago truncatula genome sequences were downloaded from JCVI (http://jcvi.org/medicago/, Mt4.0) [19].MAPKKK protein sequences of Arabidopsis were collected and used as queries to search against the M. truncatula genome using the BLASTP program with e-values of 1E-5.The blast hits were confirmed to contain protein kinase domain (PF00069) [21] using HMMER [22] tools.These proteins from the same gene locus were identified as protein duplications, and the redundancies were removed with the longest one kept.The remaining one was identified as a MAPKKK family member.All of the putative MtMAPKKK family genes were aligned to Arabidopsis MAPKKK proteins to classify them into different subfamilies as Janitza et al. described [23].Meanwhile, all of the annotation information of these MtMAPKKK genes was retrieved from the M. truncatula genome website, and their structures were displayed using GSDS software [24].

Phylogenetic Analysis of the MtMAPKKK Genes in M. truncatula
The protein sequences of MtMAPKKK genes were aligned using ClustalW with the default parameters [25].The results were used for phylogenetic analysis using MEGA (Version 4.0), and an unrooted phylogenetic tree was generated using the neighbor-joining (NJ) method with the following parameters: Poisson correction, pair-wise deletion, and 1,000 bootstrap replicates [26].

Chromosomal Location and Gene Duplication of MtMAPKKK Genes
Positional information about all of the MAPKKK genes was retrieved from the M. truncatula genome, and the nucleotide sequences of these genes were used as query sequences for a BLASTN search against each other to explore gene duplication, with similarities of more than 85%.In addition, duplications between the MAPKKK genes were also identified and complemented using the PGDD database (http://chibba.agtec.uga.edu/duplication/)[27].Based on the space between duplication gene pairs, these duplications were classified into tandem duplications (TD, separated by four or fewer gene loci) and segmental duplications (SD, separated by more than five genes), as in our previous description [28].The chromosome locations of MAPKKK genes in M. truncatula were drawn using the Circos software (http://circos.ca/)[29], and duplicated genes between different chromosomes or loci were also linked with colored lines in the diagrams.

Expression Analysis of MtMAPKKK Genes in Growth and Development
Gene expression data involving major organ systems development, particularly the development of nodules and seeds, were downloaded from the Medicago truncatula Gene Expression Atlas (MtGEA) Project (MtGEA, http://mtgea.noble.org/v3/)[30].Meanwhile, genome-wide transcriptome data from M. truncatula in different tissues during development were downloaded from the NCBI short read archive database (SRA database) (http://www.ncbi.nlm.nih.gov,Accession numbers SRX099057-SRX099062).The expressional profiles of MtMAPKKK genes were retrieved from these expression data, and they were analyzed, clustered, and displayed using ggplot2 of R software (Version 3.1.0).

Expression Analysis of MtMAPKKK Genes' Response to Abiotic Stress
An RNA-seq data previously reported by our group, Shu et al. [28], was analyzed for differential expression of MtMAPKKK genes involved in abiotic stress.In brief, the RNA-seq experiment was performed as follows: the seeds of M. truncatula (cv.Jemalong A17) were germinated and grown for eight weeks.Then, these seedlings were grown under normal conditions, cold stress (4 ˝C), freezing stress (´8 ˝C), osmotic stress (300 mM mannitol), salt stress (200 mM NaCl), and ABA (100 µM ABA).For each condition, five randomly chosen whole seedlings were pooled to form a biological replicate after three hours' treatment.All plant samples were frozen in liquid nitrogen and stored at ´80 ˝C until use.Total RNA was extracted from six samples, and they were sent to BGI-Shenzhen Ltd. (Shenzhen, China) for construction of pair-end cDNA libraries and performing Illumina sequencing.MtMAPKKK gene expressions across six treatment samples were evaluated using the TopHat [31] and Cufflinks [32] software, and they were analyzed, clustered, and displayed using the ggplot2 of R software (Version 3.1.0).

Identification and Characterization of MAPKKK Family in M. truncatula
To identify the MAPKKK genes family, we used 80 Arabidopsis MAPKKK genes as query sequences to perform a blast search against the M. truncatula genome sequence.In total 73 protein sequences from the M. truncatula were homologous to Arabidopsis MAPKKK genes, with protein kinase domain (PF00069), and they were identified as MAPKKK family genes, named MtMAPKKK01-73 based on their locations on the chromosomes, as Table 2 shows.According to homology with Arabidopsis MAPKKKs, they were classified into three subfamilies, MEKK (28 members), RAF (20 members), and ZIK (25 members).The amino acid sequence lengths of MtMAPKKK varied from 118 (MtMAPKKK73) to 1107 (MtMAPKKK19) amino acids (aa); average length was 532 aa.The number of introns was highly divergent, from one to 14, as Figure 1 shows, which is consistent with MAPKKK genes in Arabidopsis.The introns' distribution is subfamily specific: members of the ZIK subfamily generally contained the most and the longest introns; next was the MEKK subfamily, and last was the RAF subfamily, which may be involved in expression regulation [33].

Phylogenetic Analysis of MtMAPKKK Genes
To investigate the evolutionary relationships of MtMAPKKK genes, we performed multiple sequence alignment and phylogenetic analysis.The results showed that these MtMAPKKK genes were clearly divided into three subfamilies (see Figures 2 and S1), which confirmed their previous classification in Arabidopsis homologues.As shown in Figure 2, there was only one branch in the MEKK subfamily, which suggested that the MEKK subfamily was highly conserved.On the other hand, there were two branches in the phylogenetic tree of the RAF subfamily and three in the ZIK subfamily, which implied that their functions diverged.In the three types, there were many MtMAPKKK genes that diverged less and clustered together; for example, MtMAPKKK01-03 and MtMAPKKK14-17 in MEKK subfamily.These clusters indicated that the MtMAPKKK genes have undergone expansion through gene duplication during the M. truncatula genome evolution process, and these clusters conferred a number of paralogous genes, which might perform the same function in biological processes.

Chromosomal Location and Duplication Analysis of MtMAPKKK Genes
Based on physical locations of the MtMAPKKK genes on M. truncatula chromosomes, they were displayed using Circos software, as Figure 3 shows.The results showed that the MtMAPKKK genes (except MtMAPKKK28) are distributed across eight chromosomes, and each chromosome holds different contents of MtMAPKKK genes, ranging from five to 14 members.The chromosomes MtChr4 and MtChr8 contained the most MtMAPKKK genes (14 members), while chromosomes MtChr7 and MtChr9 held the fewest members (five genes).In addition, by blast analysis and

Chromosomal Location and Duplication Analysis of MtMAPKKK Genes
Based on physical locations of the MtMAPKKK genes on M. truncatula chromosomes, they were displayed using Circos software, as Figure 3 shows.The results showed that the MtMAPKKK genes (except MtMAPKKK28) are distributed across eight chromosomes, and each chromosome holds different contents of MtMAPKKK genes, ranging from five to 14 members.The chromosomes MtChr4 and MtChr8 contained the most MtMAPKKK genes (14 members), while chromosomes MtChr7 and MtChr9 held the fewest members (five genes).In addition, by blast analysis and database search, we identified 35 pairs of gene duplication events in these MtMAPKKK genes, which arose from tandem duplications (22 pairs) and segment duplications (13 pairs).These duplications led to expansion of the MtMAPKKK family in the M. truncatula genome.Among these duplications, tandem duplications have resulted in MtMAPKKK gene clusters or hot regions; for instance, MtMAPKKK19-22 results in an MEKK subfamily cluster in MtChr7.The segment duplication has resulted in MAPKKK members in different chromosomes-for example, duplication between MtMAPKKK01-03 and MtMAPKKK14-17 had expanded the MEKK subfamily from MtChr1 to MtChr5; the clustering was also confirmed by phylogenetic analysis.
Genes 2016, 7, 13 10 of 17 expression profiles of these MtMAPKKK genes from high-throughput sequencing were confirmed by the results in microarray expression (see Figure 5A); for example, MtMAPKKK40, 49, and 58 were specifically expressed in flowers both in microarray and high-throughput sequencing.
Similarly, MAPKKK44 was also confirmed to be expressed in buds by two datasets.However, there were 39 MtMAPKKK genes detected in high-throughput sequencing data, which was slightly more than in the microarray dataset (32 MtMAPKKK genes).

In Silico Expression Analysis of MtMAPKKK Genes Involved in Growth and Development
To investigate MtMAPKKK genes' expression in plant growth and development, expression data of M. truncatula were collected from MtGEA and NCBI, including microarray data and high-throughput sequencing data.Expression profiles of MtMAPKKK genes were retrieved and analyzed, as Figures 4  and 5 show.According to the microarray data, these MtMAPKKK genes were clustered into three groups.Group A genes, including MtMAPKKK05, 06, 16, 44, 47, 49, 51, 66, and 67, were expressed in different tissues; expression was associated with tissue development.Group B genes, including MtMAPKKK19, 31, 33, 34, 41, 57, 63, 64, and 71, were expressed during seed development.In the last group, including MtMAPKKK03, 06, 09, 18, 24, 25, 38, 42, 49, 52, 54, 66, 68, and 71, expression was associated with the nodulation process in M. truncatula.These results suggested that MtMAPKKK genes were expressed in specific tissues or during different stages of development, with a potential role in the development processes of M. truncatula.In addition, expression profiles of these MtMAPKKK genes from high-throughput sequencing were confirmed by the results in microarray expression (see Figure 5A); for example, MtMAPKKK40, 49, and 58 were specifically expressed in flowers both in microarray and high-throughput sequencing.Similarly, MAPKKK44 was also confirmed to be expressed in buds by two datasets.However, there were 39 MtMAPKKK genes detected in high-throughput sequencing data, which was slightly more than in the microarray dataset (32 MtMAPKKK genes).

Expression Analysis of MtMAPKKK Genes in Response to Abiotic Stresses
With the development of next sequencing technology, we have performed RNA-seq to identified MtMAPKKK genes' response to abiotic stress.There were 32 MtMAPKKK genes expressed in six samples, while 15 MtMAPKKK genes were differentially expressed under abiotic stress, including cold (five members), freezing (five members), osmotic (two members), salt (eight members) and ABA (four members) treatments (see Figure 5B).Among these MtMAPKKK genes, MtMAPKKK49 was upregulated by all stresses, while other MtMAPKKK genes were differentially affected by treatment.For example, MtMAPKKK36, 41, 51, 54, 66, 67, 71, and 72 were highly expressed under cold stress, but had no expression or very low expression in other stresses.Similarly, MtMAPKKK18, 65, 26, 46, 06, and 38 were specifically expressed under freezing stress; MtMAPKKK32, 44, 59, 62, 69, and 70 were expressed under osmotic stress; MtMAPKKK01, 04, 33, 39 and 49 were present in salt stress; and MtMAPKKK31, 52, 57, 59, and 60 were responsive to ABA stimuli.Compared to ABA treatment, we found that there were some correlation between osmotic and salt stress with ABA treatment, such as in MtMAPKKK52, 57, 59, and 60.However, there were no common MAPKKK members expressed in both cold and freezing conditions with ABA treatment, implying that these MtMAPKKK genes may have a role in osmotic and salt stresses through the ABA regulation pathway, but may not have a role in cold and freezing stresses.

Discussion
In previous studies, MAPK signaling cascade pathways played important roles in various processes, including developmental processes, biotic, and abiotic stress responses [1,8].To date, a large number of MAPKKK genes have been identified and characterized in plants, including Arabidopsis [14], rice [12], grapevines [34], maize [16], soybeans [17], and tomatoes [35].However, the M. truncatula genome has been reported [19], and both the MAPK and MAPKK gene families have been identified and characterized [20]; until now the MAPKKK gene family has not been reported.In the present research, we performed genome-wide analysis for the MAPKKK family in M. truncatula, and 73 MtMAPKKK genes were identified by homologous searching and domain analysis.The number of MtMAPKKK genes is similar to their members in Arabidopsis, rice, maize, and tomatoes, but half that of the soybean MAPKKK family, which is an allopolyploid species (see Table 3).The results showed that the MAPKKK family is highly conserved in the plants.Based on characteristic sequence motifs, they were divided into three subfamilies, MEKK, RAF, and ZIK, as is done for other plants [12,14].However, the numbers of subfamilies diverge from other plants, such as Arabidopsis, rice, maize, and tomatoes.The MEKK subfamily has 28 members, which is consistent with other plants (see Table 3), implying that it is the most conservative subfamily in M. truncatula.By gene duplication analysis, we identified 24 duplication events in the MEKK subfamily; 19 of them are tandem duplication events, which have conferred MEKK gene clusters in the M. truncatula genome, such as MtMAPKKK01-03, MtMAPKKK04-06 on MtChr01, MtMAPKKK14-17 on MtChr05, and MtMAPKKK19-23 on MtChr07.These results strongly suggested that tandem duplications mainly contributed to conservation of the MEKK subfamily in M. truncatula (see Figure 2).Compared to the MEKK subfamily, the RAF and ZIK subfamilies are more divergent.There are RAF genes in M. truncatula, but fewer than in other plants (see Table 3).However, the ZIK subfamily has more members (25), implying that ZIK genes have expanded.This divergence is present not in terms of numbers of members, but in gene structures.The MEKK genes contain 4 to 7 introns, while RAF has 0-6 introns, and the ZIK subfamily varied from 0 to 13 (see Table 2 and Figure 1).The intron is an important regulator of gene expression in eukaryotes; more introns generally indicate more complex regulation, which may be an important role in complex biological processes, such as tissue development in response to abiotic stress [33].It is worth noting that MtMAPKKK gene duplication conferred similar gene structures, such as MtMAPKKK01::03 (MEKK), MtMAPKKK05::06 (MEKK), MtMAPKKK36::41 (RAF), MtMAPKKK71::72 (ZIK), etc., which suggests they would have similar expression profiles and function in M. truncatula.In Arabidopsis, the MAPKKK family has undergone a large expansion through gene duplication, resulting in 80 members across three subfamilies [36].In the expansion process, there were a number of paralogous genes produced by gene duplication, whose expressions and functions differed in the evolutionary process [37].In M. truncatula, MtMAPKKK genes have also undergone a gene duplication process, which conferred paralogous gene pairs, such as MtMAPKKK01::03 (MEKK), MtMAPKKK05::06 (MEKK), MtMAPKKK32::36::41 (RAF), MtMAPKKK71::72 (ZIK), as previously described.Generally, these genes have similar expression profiles, such as MtMAPKKK05::06 (MEKK).These two MEKK genes were both highly expressed in leaves, while they were absent or less expressed in the seed development process (see Figure 4).A similar process happened in MtMAPKKK71::72 (ZIK): they were both highly induced by cold stress, but less expressed in response to other stresses (see Figure 5).However, the duplication gene pairs can also differ in expression level, for example, MtMAPKKK32::36::41.They were expanded to three chromosomes (MtChr03, 04, and 06; see Figure 3) by segment duplication events, and they had different expression levels (see Figure 6).MtMAPKKK36 was more highly expressed than MtMAPKKK41, and MtMAPKKK41 was more highly expressed than MtMAPKKK32.In addition, MtMAPKKK36 and 41 expression constituted the tissue development and response to abiotic stresses, implying their identical and essential functions during plant development and abiotic stress response processes.By contrast, MtMAPKKK32 was expressed in tissue development, specifically in flowers, which indicated its important role in flower development (see Figure 6).differed in the evolutionary process [37].In M. truncatula, MtMAPKKK genes have also undergone a gene duplication process, which conferred paralogous gene pairs, such as MtMAPKKK01::03 (MEKK), MtMAPKKK05::06 (MEKK), MtMAPKKK32::36::41 (RAF), MtMAPKKK71::72 (ZIK), as previously described.Generally, these genes have similar expression profiles, such as MtMAPKKK05::06 (MEKK).These two MEKK genes were both highly expressed in leaves, while they were absent or less expressed in the seed development process (see Figure 4).A similar process happened in MtMAPKKK71::72 (ZIK): they were both highly induced by cold stress, but less expressed in response to other stresses (see Figure 5).However, the duplication gene pairs can also differ in expression level, for example, MtMAPKKK32::36::41.They were expanded to three chromosomes (MtChr03, 04, and 06; see Figure 3) by segment duplication events, and they had different expression levels (see Figure 6).MtMAPKKK36 was more highly expressed than MtMAPKKK41, and MtMAPKKK41 was more highly expressed than MtMAPKKK32.In addition, MtMAPKKK36 and 41 expression constituted the tissue development and response to abiotic stresses, implying their identical and essential functions during plant development and abiotic stress response processes.By contrast, MtMAPKKK32 was expressed in tissue development, specifically in flowers, which indicated its important role in flower development (see Figure 6).On a global scale, plant growth and development are threatened by various abiotic stresses, such as cold, drought, and salinity [38].Therefore, plants should have the ability to constantly adapt to unfavorable environmental conditions.They employ complex regulatory mechanisms for undergoing physiological and biochemical changes in response to stresses [8,39].MAPK signaling cascade pathways play a remarkably important role in the sensing and transmitting of stress signals, which is an essential step in the establishment of tolerance to various stresses [1,10].The MAPKKK family has the most members in MAPK signaling pathways, and they are widely expressed to regulate plant processes, including growth, development, and response to abiotic stresses [6].In our study, we investigated the expression profiles of MtMAPKKK genes using microarray and RNA-seq data from NCBI and our previous research [28].The expression data were clustered and visualized using a heat-map method, and the results show a wide range of expression levels and distinct regulation during plant development and response to biotic and/or abiotic stresses.In M. truncatula growth and tissue development, the expression of MtMAPKKK genes overlaps among these tissues and organs.It is notable that many MtMAPKKK genes are typically expressed in specific tissues, such as MtMAPKKK05, 06, and 16 (MEKK subfamily) in leaves (see Figure 4); for MtMAPKKK40, 41, present in flowers, this finding is consistent with how MAPKKK genes are expressed in Arabidopsis [38].Based on our analysis of abiotic stress expression data, we found that a large number of MtMAPKKK genes were highly responsive to abiotic stress; some of them were specifically responsive to selected abiotic stresses (see Figure 5).For example, MtMAPKKK66, 71, and 72 (ZIK subfamily) were induced by cold stress; MtMAPKKK33 was induced by salt stress; MtMAPKKK69, 70 were repressed by freezing stress.However, Menges et al. have investigated the expressions of all members of the MAPKKK family through a large number of microarrays in Arabidopsis [37]; because of limited samples, there are still 38 MtMAPKKK genes that have not been implicated in M. truncatula development and/or response to abiotic stress in the present research.We need more biological samples involving various development processes, or more time points of abiotic stress, to reveal the expression profiles of MtMAPKKK genes, which will be useful for determining their function in the future.

Conclusions
In summary, we have identified 73 MtMAPKKK genes in M. truncatula; they were classified into three subfamilies based on phylogenetic analysis.Meanwhile, their expression profiles have been investigated using microarray and a high-throughput sequencing dataset; the results revealed their regulation roles in plant growth and tissue development, especially their essential functions in nodule development.In addition, an RNA-seq experiment was performed to explore their regulation in response to abiotic stresses, implying that MAPKKK family genes broadly participated in the abiotic response process in M. truncatula.The information from this investigation will be useful for the identification and characterization of MtMAPKKK genes whose function will be explored in the future.

Figure 2 .
Figure 2. Phylogenetic tree analysis of the MAPKKK gene family in Medicago truncatula.

Figure 3 .
Figure 3. Chromosomal distribution and expansion analysis of MtMAPKKK genes in Medicago truncatula.Red lines show duplications between members of the MEKK subfamily, blue lines show duplications between members of the RAF subfamily, and purple lines show duplications between members of the ZIK subfamily.

Figure 3 .
Figure 3. Chromosomal distribution and expansion analysis of MtMAPKKK genes in Medicago truncatula.Red lines show duplications between members of the MEKK subfamily, blue lines show duplications between members of the RAF subfamily, and purple lines show duplications between members of the ZIK subfamily.

Figure 4 .
Figure 4. Expression profile cluster analysis of MtMAPKKK genes involved in growth and development.The expression data were clustered and displayed using software ggplot2.The DAP infers days after pollination, while DPI infers days after inoculation; the information on samples is from Benedito et al. [30].

Figure 4 .
Figure 4. Expression profile cluster analysis of MtMAPKKK genes involved in growth and development.The expression data were clustered and displayed using software ggplot2.The DAP infers days after pollination, while DPI infers days after inoculation; the information on samples is from Benedito et al. [30].

Figure 5 .
Figure 5. Expression profile cluster analysis of MtMAPKKK genes involving in tissue development (A) and response to abiotic stress (B).

Figure 5 .
Figure 5. Expression profile cluster analysis of MtMAPKKK genes involving in tissue development (A) and response to abiotic stress (B).

Figure 6 .
Figure 6.The expression profiles of MtMAPKKK32::36::41 involved in tissue development and response to abiotic stress.

Figure 6 .
Figure 6.The expression profiles of MtMAPKKK32::36::41 involved in tissue development and response to abiotic stress.

Table 1 .
List of all MtMAPKKK genes identified in the Medicago truncatula genome.

Table 2 .
List of all MtMAPKKK genes identified in the Medicago truncatula genome.

Table 3 .
The numbers of MAPKKK genes in plant genomes.