Identification and Characterization of Mitogen-Activated Protein Kinase (MAPK) Genes in Sunflower (Helianthus annuus L.)

Mitogen-Activated Protein Kinase (MAPK) genes encode proteins that regulate biotic and abiotic stresses in plants through signaling cascades comprised of three major subfamilies: MAP Kinase (MPK), MAPK Kinase (MKK), and MAPKK Kinase (MKKK). The main objectives of this research were to conduct genome-wide identification of MAPK genes in Helianthus annuus and examine functional divergence of these genes in relation to those in nine other plant species (Amborella trichopoda, Aquilegia coerulea, Arabidopsis thaliana, Daucus carota, Glycine max, Oryza sativa, Solanum lycopersicum, Sphagnum fallax, and Vitis vinifera), representing diverse taxonomic groups of the Plant Kingdom. A Hidden Markov Model (HMM) profile of the MAPK genes utilized reference sequences from A. thaliana and G. max, yielding a total of 96 MPKs and 37 MKKs in the genomes of A. trichopoda, A. coerulea, C. reinhardtii, D. carota, H. annuus, S. lycopersicum, and S. fallax. Among them, 28 MPKs and eight MKKs were confirmed in H. annuus. Phylogenetic analyses revealed four clades within each subfamily. Transcriptomic analyses showed that at least 19 HaMPK and seven HaMKK genes were induced in response to salicylic acid (SA), sodium chloride (NaCl), and polyethylene glycol (Peg) in leaves and roots. Of the seven published sunflower microRNAs, five microRNA families are involved in targeting eight MPKs. Additionally, we discussed the need for using MAP Kinase nomenclature guidelines across plant species. Our identification and characterization of MAP Kinase genes would have implications in sunflower crop improvement, and in advancing our knowledge of the diversity and evolution of MAPK genes in the Plant Kingdom.


Introduction
Plant responses to abiotic and biotic stresses involve protein kinases that are crucial to signal transduction pathways [1]. The protein kinases are involved in a phosphorylation of Serine/Threonine and Tyrosine sidechains of proteins [2]. Among these protein kinases, Mitogen-Activated Protein Kinase (MAPK) cascade genes are key components of signal transduction pathways in animals, plants, and fungi [3] that help transduce extracellular signals to intracellular responses [4]. Discovered in 1986, the MAPK gene family was originally found in animal cells as a microtubule-associated protein kinase [5]. The first reports of plant MAPK gene family in 1993, identified MsERK1 in alfalfa [6] and D5 kinase in pea [7]. MsERK1 is believed to play a role as an inducer of mitosis in root nodules during Sunflower protein sequences from INRA inbred genotype XRQ whose genome is 3.6 gigabases and encodes 52,243 proteins distributed over 17 chromosomes [42] were analyzed in the present study. The 20 MPK and ten MKK sequences of A. thaliana [25] along with 38 MPK and 11 MKK sequences of G. max [26] were used as reference sequences for the identification of MPK and MKK proteins. The multiple sequence alignment of these reference sequences was employed in HMM profiling using the program HMMER (version 3.1b2) [49] at a threshold e-value of 0.01. MPK and MKK genes were further identified using InterProScan (version 5.27) [50], Pfam ID [51], and PROSITE ID (http://prosite.expasy.org/). The proteins with PfamID of MAPK domain (PS01351), ATP-binding domain (PS00107), protein kinase domain (PS50011), and serine/threonine protein kinase active site (PS00108) were used for identification of corresponding MPK and MKK proteins ( Figure 1). Multiple expectation maximization for motif elicitation (MEME) [52] and multiple sequence alignment analysis was performed to confirm the presence of the following signature motifs: (a) the phosphate binding P-loop, GxGxxG [1], where ATP binds in protein kinases; (b) the catalytic C-loop, D(L/I/V)K, found within the S/T PK active site signature; and (c) the activation-or T-loop, T(D/E)Y in MPK and GTxxYMSPER in MKK proteins. The following parameters for MEME were employed: maxsize: 100,000; mod: zoops; nmotifs: 10; minw: 6; and maxw: 25. Furthermore, MKK genes were identified using BLAST [53], with an e-value cutoff of 0.01, in which A. thaliana MKK sequences were used as a query, and the top ten hits for each A. thaliana MKK query sequence were employed for MKK gene identification. The protein theoretical molecular weight and isoelectric point were predicted using compute pI/Mw tool available in ExPASy (http://au.expasy.org/tools). Subcellular localization of the putative MPK and MKK genes of sunflower were analyzed using TargetP 1.1 [54].

Phylogenetic Tree Construction and Homology Assessment
The multiple sequence alignment of identified MPK and MKK proteins of H. annuus and other species used in this study was performed using CLUSTALW [55] and MUSCLE [56] in Geneious [57] and subjected to phylogenetic analysis employing the maximum likelihood (ML; with 100 replicates) using MEGA (version 7.0.14) [58]. The phylogenetic analyses employed an evolutionary model 'Jones-Taylor-Thornton with gamma distribution and invariant sites (JTT+G+I)', the best evolutionary model resulted from the ModelTest analysis using MEGA7. The trees using MPK and MKK sequences were rooted with corresponding human MAPK proteins [HsMAPK1 (GenBank: NP_002736.3) and HsMAPKK1 (GenBank: AAI37460.1), respectively] as an outgroup. Timetree was

Chromosomal Locations and Gene Structure
All 17 chromosome sequences of H. annuus accessed from the Phytozome database were uploaded into the program Geneious [57]. The chromosome locations of MPK and MKK genes of sunflower were visualized using annotation file in Generic Feature Format (GFF) obtained from the annotation database of Phytozome. The exon-intron distribution pattern was obtained by the Gene Structure Display Server [61].

Nomenclature of MPKs and MKKs
Nomenclature of sunflower MPKs and MKKs was carried out using MAPK gene nomenclature guidelines [3,4]. The nomenclature uses the following format: (a) the first letter (upper case) of the genus name followed by two to three letters of species (lower case) was used; (b) a number was provided based on the homology to the Arabidopsis MAPK cascade genes; and (c) the number was followed by a hyphen and a number if paralogs were present. Such guidelines for nomenclature of MPKs and MKKs have been employed in many studies [1,4,26,27,[33][34][35][36][62][63][64][65]. In this study, we renamed GSVIVT01005924001 (VvMPK2) and GSVIVT0102277001 (VvMPK10), identified by Cakir and Kılıçkaya 2015 [37], as VvMPK22 and VvMPK23, respectively, which were not identified in a study by Mohanta et al. 2015 [1].

Expression Analysis and miRNA Prediction of Sunflower MPKs and MKKs
The expression pattern of sunflower MPKs and MKKs was investigated using data accessed from NCBI SRA SRP092742 [SRR4996815 (polyethylene glycol or peg)-treated pooled root samples), SRR4996819 (NaCl-treated pooled root samples), SRR4996823 (Peg-treated pooled leaf samples), SRR4996828 (pooled control root samples), SRR4996834 (NaCl-treated pooled leaf samples), SRR4996836 (pooled control leaf samples), SRR4996839 (salicylic acid-treated pooled leaf samples), and SRR4996847 (salicylic acid-treated pooled root samples)]. These data are the result of the application of one hormone treatment (0.05 µM SA), two abiotic stresses [Peg 6000 (100 g/l), which creates osmotic stress, and NaCl (100 mM) for salt stress], and control [dimethyl sulfoxide (DMSO) only] collected from root and leaf samples. The detailed experiment is described in Badouin et al. 2017 [42]. Briefly, roots and first leaves were collected after six hours of treatment (SA, Peg, NaCl, and DMSO), and applied to two-week-old sunflower seedlings (INRA inbred genotype XRQ) grown in a hydroponic system. The collection was repeated three times and was pooled after separate RNA extractions in equimolar concentration. RNA sequencing of root and leaf samples was performed as non-oriented pair end libraries (2*76 bp for roots and 2*100 for leaves). The quality control of these reads was accessed by running the FastQC program (version 0.11.3) [66], and trimming was done using Btrim64 (version 0.2.0) [67] to remove low-quality bases (QC value > 20; 5-bp window size). High-quality pair-end reads were mapped against the coding sequences of H. annuus (Hannuus: Hannuus_494_r1.2.transcript.fa.gz) obtained from the Phytozome database using the Salmon (version 0.9.1) [68] in Bioconda [69]. The codes that were used for data processing are available as Supplementary File S1. The obtained transcript estimated quantification reads for each treatment were compared with their respective reads from the controls to calculate the log 2 Fold Change (log 2 FC) and visualized using integrated Differential Expression and Pathway analysis (iDEP 0.81 R/Bioconductor packages; http://bioinformatics.sdstate.edu/idep/) [70]. The heatmap was generated using the following criteria: distance-correlation, linkage-average and cut-off Z score-4 to study the hierarchical clustering and expression pattern of MPK and MKK genes in different tissues under different treatments. k-means clustering was done using the standardization normalization technique. For identifying the potential miRNA targeting sites, the nucleotide sequences of the identified sunflower MPKs and MKKs were subjected to a plant small RNA (psRNATarget) target analysis server [71] against seven published H. annuus microRNAs, selecting Schema V2 (2017 release) as a scoring option.

Tajima's Relative Rate and Neutrality Test
Tajima's relative rate test [72] was conducted to study the statistical significance of variations in molecular evolution in a different group of plants. The same MEGA files used in phylogenetic tree construction were used in the program MEGA7. In this test, three random sequences of either MPKs or MKKs of different plant species were selected, considering one of the sequences as the outgroup, and the χ 2 test statistic was applied. A p-value of less than 0.05 was used to reject the null hypothesis of equal rates of evolution between selected sequences of different plant groups. All positions containing gaps and missing data were eliminated. Tajima's test of neutrality [73] was performed to understand and distinguish the evolutionary pattern of randomly evolved MPKs or MKKs with non-randomly evolving MPKs or MKKs. During the neutrality test, all positions with less than 95% site coverage were eliminated. Therefore, fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. The groupings of A, B, and C represent the statistical groups, which should not be confused with MPK or MKK clades.

Phylogenetic Analyses
Full-length amino acid sequences of MPKs and MKKs of sunflower, Arabidopsis and soybean were employed for evaluating evolutionary relationships, as well as for nomenclature of the MPKs and MKKs of species under study. These sequences were subjected to multiple sequence alignment and subsequent phylogenetic analyses. Phylogenetic analyses included MPK and MKK gene sequences from, A. coerulea, A. thaliana, A. trichopoda, C. reinhardtii, D. carota, G. max, H. annuus, O. sativa, S. lycopersicum, and V. vinifera.

MPKs
Sunflower MPK (HaMPK) protein sequence length ranged from 349 to 588 amino acid (aa), except for HaMPK4, which was only 157 aa. The average length of MPKs was 425 aa, with isoelectric points ranging from 5.22 (HaMPK13-1) to 9.65 (HaMPK23-1) and a predicted average molecular mass of 48523.772 Da (Table 1). Twenty-eight HaMPKs identified in this study were nested into four clades (A-D; each with bootstrap support > 70%) ( Figure S4), which corresponded to their homologs in A. thaliana and G. max, except for the Clade C MPK members (Table S3). The Clade A members in this study include the previously identified group A and B members of A. thaliana MPKs [3,4]. Likewise, Clade B consists of previously identified group C members of A. thaliana MPKs. In addition, Clade C includes the members identified in group E of soybean MPKs [26]. The number of HaMPKs in Clades A, B, C, and D were nine, four, five, and ten, respectively. Sunflower MPK Clade C included five members with HaMPK22 (a homolog to GmMPK22-1 and GmMPK22-2) and HaMPK23-1/23-2/23-3/23-4 (homologs to the corresponding GmMPK23-1/23-2/23-4/23-4). Clade A and B consisted of members with phosphorylation motif TEY (except for HaMPK23-1 and HaMPK23-2 that are nested within Clade C), while those with the TDY motif were found in Clade C and D. The sunflower MPK orthologs are shown in Table S4 Figure S1).
Phylogenetic analysis of full-length protein sequences was conducted to study evolutionary patterns of the MPKs in ten plant species with sequences of C. reinhardtii ( Figure 3). The MPKs were nested in four clades (Clade A-D; Table S3) (Table 3). Table 3. Consensus common docking sites in the MPK proteins belonging to clades A-D.

Clades
Consensus Common Docking Sites

Clades
Consensus common docking sites

MKKs
Sunflower HaMKK protein sequence length ranged from 308 to 520 aa. The average length of proteins for MKKs was 372 aa, with isoelectric points ranging from 5.43 (HaMKK2) to 9.25 (HaMKK5) and a predicted average molecular mass of 42688.86 (Table 1). Corresponding with their homologs in Arabidopsis and G. max, the eight identified HaMKKs are divided into four distinct clades ( Figure  S5). The MKK homologs of MKK1/2/6-1/6-2/3/4/5/9 were only found in sunflower. The clades' divergence followed serine/threonine amino acid motif patterns in sunflower. For example, Clade A contained SxxxxxS/TxxxxxT, Clade B with SxxxxxTxxxxxT, Clade C with SxxxxxTxxxxxS, and D with SxxxxxSxxxxxT. The HaMKKs in Clades A, B, C, and D were four, one, two, and one, respectively (Table S5). The orthologs of identified MKKs of sunflower in different plant species are represented in Table S6. In the HaMKK gene family, the number of introns ranged from zero (HaMKK9, HaMKK4, and HaMKK5) to 11 (HaMKK3) (Table 2, Figure S3). Clade A members HaMKK6-1 and HaMKK6-2 consisted of eight exons and are paralogs to each other. The remaining Clade A members, HaMKK2 and HaMKK1, consisted of nine and ten exons, respectively. The only member of Clade B, HaMKK3, consisted of twelve exons. Interestingly, the members of Clade C and D (HaMKK9, HaMKK4, and HaMKK5) had no introns.
Phylogenetic analysis of full-length MKK amino acid sequences from the plant species with sequences of C. reinhardtii under this study revealed four distinct clades (Clades A-D, Figure 5).

MKKs
Sunflower HaMKK protein sequence length ranged from 308 to 520 aa. The average length of proteins for MKKs was 372 aa, with isoelectric points ranging from 5.43 (HaMKK2) to 9.25 (HaMKK5) and a predicted average molecular mass of 42688.86 (Table 1). Corresponding with their homologs in Arabidopsis and G. max, the eight identified HaMKKs are divided into four distinct clades ( Figure S5). The MKK homologs of MKK1/2/6-1/6-2/3/4/5/9 were only found in sunflower. The clades' divergence followed serine/threonine amino acid motif patterns in sunflower. For example, Clade A contained SxxxxxS/TxxxxxT, Clade B with SxxxxxTxxxxxT, Clade C with SxxxxxTxxxxxS, and D with SxxxxxSxxxxxT. The HaMKKs in Clades A, B, C, and D were four, one, two, and one, respectively (Table S5). The orthologs of identified MKKs of sunflower in different plant species are represented in Table S6. In the HaMKK gene family, the number of introns ranged from zero (HaMKK9, HaMKK4, and HaMKK5) to 11 (HaMKK3) (Table 2, Figure S3). Clade A members HaMKK6-1 and HaMKK6-2 consisted of eight exons and are paralogs to each other. The remaining Clade A members, HaMKK2 and HaMKK1, consisted of nine and ten exons, respectively. The only member of Clade B, HaMKK3, consisted of twelve exons. Interestingly, the members of Clade C and D (HaMKK9, HaMKK4, and HaMKK5) had no introns.
Phylogenetic analysis of full-length MKK amino acid sequences from the plant species with sequences of C. reinhardtii under this study revealed four distinct clades (Clades A-D, Figure 5).

Expression Analysis and miRNA Prediction of Sunflower MPKs and MKKs
The functional analysis of both HaMPKs and HaMKKs was studied using RNA-seq data available in NCBI. Since the sunflower genome was recently available, the expression data for pathogen stress were not available in the public databases. We investigated the expression pattern of

Expression Analysis and miRNA Prediction of Sunflower MPKs and MKKs
The functional analysis of both HaMPKs and HaMKKs was studied using RNA-seq data available in NCBI. Since the sunflower genome was recently available, the expression data for pathogen stress were not available in the public databases. We investigated the expression pattern of MPKs and MKKs in leaves and roots treated with one hormone treatment (SA) and two abiotic stresses (NaCl and Peg). We did observe expression patterns for all HaMPKs and HaMKKs except for HaMPK4 (Supplementary File S5). The k-means clustering result showed that the HaMPKs and HaMKKs were clustered into four groups ( Figure S7 and Table S7). Cluster A consisted of seven HaMPKs (from Clades A, B, and D) and four HaMKKs (from Clades A, B, and C). Cluster B consisted of three HaMKK genes (from Clades A and D) and two HaMPK genes (from Clade A). Cluster C consisted of three genes belonging to both HaMPKs (from Clades A and D) and one HaMKK (from Clade C). Cluster D consisted of 15 genes belonging to HaMPKs (belonging to clades A-D). The log 2 FC for each gene and hierarchical clustering of HaMPKs and HaMKKs representing the functional divergence of these genes are represented in Figure S8 and Figure 6, respectively. Some genes were upregulated in response to the treatments compared to the control of their respective tissues. For instance, in leaves, HaMKK5, HaMKK6-2, HaMPK3-2, HaMPK11-1, HaMPK14, HaMPK1, HaMPK6-2, HaMPK19-1, and HaMPK18 showed log 2 FC > 1 in response to Peg; HaMKK5, HaMKK6-2, HaMPK11-1, HaMPK14 showed log 2 FC > 1 in response to NaCl; HaMPK11-1 showed log 2 FC > 1 in response to SA. In roots, HaMKK4, HaMKK1, HaMKK2, HaMPK3-2, HaMPK13-2, HaMPK23-2, HaMPK9-2 and HaMPK11-2 showed log 2 FC > 1 in response to Peg; HaMKK9, HaMPK13-2, HaMPK6-1, and HaMPK3-1 showed log 2 FC in range of 0.7 to 1.45 in response to SA; HaMPK6-1, HaMPK2, HaMPK23-2, and HaMPK17 showed log 2 FC > 0.9 in response to NaCl. In contrast, some genes were downregulated in response to the treatments compared to the control of their respective tissues. For example, in leaves, HaMKK9, HaMKK2, and HaMPK13-2 showed log 2 FC in a range of −0.6 to −0.8 in response to Peg; HaMKK9, HaMPK7, HaMPK23-1 showed log 2 FC in a range of −0.6 to −0.8 in response to NaCl; HaMKK4, HaMPK7, and HaMPK11-2 showed log2fold change in a range of −0.58 to −2.11 in response to SA. Likewise, in roots, HaMPK14 showed log 2 FC of −0.53 in response to Peg; HaMKK6-2, HaMPK13-2, HaMPK14, and HaMPK9-2 showed log2fold change in a range of −0.62 to −1.50 in response to NaCl; HaMPK14, HaMPK19-1, and HaMPK9-2 showed log 2 FC in a range of −0.68 to −1.6 in response to SA. In addition, the expression of HaMPKs, HaMKKs showed functional divergence in response to stresses as the clustering of these genes in a heatmap was not in accordance with the nesting pattern within clades in phylogenetic trees. The potential miRNA target sites in MPKs and MKKs identified using psRNATarget server revealed five (han-miR156a/b/c, han-miR160a, han-miR3630-5p) of seven miRNA families that may be involved in targeting sunflower MPKs only (Table S8). HaMPK16-2, HaMPK11-1, and HaMPK23-3 were found to be targeted by both miRNAs (han-miR156a/b).

Tajima's Relative Rate and Neutrality Tests on MPKs and MKKs
Separate statistical analyses were performed selecting three random sequences from MPKs and MKKs group. For Tajima's relative rate test for MPKs and MKKs, the sequences were selected from the species representing a diverse taxonomic group: monocot, a dicot, basal angiosperm, bryophytes, and algae. For the analysis of MPK genes following a group of sequences were selected: (a) OsMAPK4 (monocot) and HaMPK6 (dicot) with AmtMPK13-1 (basal angiosperm); (b) OsMAPK4 (monocot) and HaMPK6 (dicot) with sequence SfMPK4-1 (bryophyte); and (c) OsMAPK4 (monocot) and HaMPK16-1 (dicot), with sequence CreMPK2 (algae) ( Table S9). The plant group combination in column 1, 2 and 3 of MPKs resulted in p-values of 0.01, 0.0053, and 0.0007, with χ 2 values of 6.54, 7.78 and 11.46, respectively. For MKKs, the following group of sequences were selected: (a) OsMAPKK5 (monocot) and HaMKK6-1 (dicot) with AmtMKK6 (basal angiosperm); (b) OsMAPKK5 (monocot) and HaMKK6-1 (dicot) with sequence SfMKK3 (bryophyte); and (c) OsMAPKK5 (monocot) and HaMKK6-1 (dicot) with CreMKK3 (algae) ( Table S10). The plant group combination in columns 1, 2 and 3 of MKKs resulted in p-values of 0, 0.04965 and 0.05687 with χ 2 values of 100.55, 3.85 and 3.36, respectively. Tajima's Relative Rate test is commonly used to analyze variation in both DNA and amino acid sequences [78]. This test has been applied to various genes belonging to different gene families, such as MAPKs and WRKY transcription factors [1,78]. In this study, the p-value (less than 0.05) and χ 2 statistic showed randomly selected sequences of MPKs and MKKs of different plant groups to be statistically significant, rejecting the null hypothesis of equal rates between selected sequences of different plant groups. The interpretation of Tajima's D is as follows: D = 0 (observed variation is similar to the expected variation, which shows evidence of no selection), D < 0 (presence of excessive rare alleles, suggesting recent selection sweep and recent population expansion), and D > 0 (lack of rare alleles, suggesting balanced selection and sudden population contraction) [72,73]. The values in the ranges greater than 2 or less than −2 are considered to be statistically significant [72,73]. In our study, Tajima's neutrality test statistics (D) were found to be 5.391062 for MPKs and 5.928839 for MKKs (Table S11). This suggests that both MPKs and MKKs have undergone a balanced selection with contraction in gene family size. Also, the average heterozygosity of both MAPKs and MKKs is more than those of the segregating sites, suggesting a high frequency of polymorphism.

Nomenclature of MPKs and MKKs
A recent study on various Triticeae species (wheat, barley, rye, and triticale) by Goyal et al. 2018 [35] reported numerous discrepancies in MAPK nomenclature of wheat and barley and suggested a new name based on sequence homology. A consistent nomenclature of proteins belonging to the same gene family across species based on orthology facilitates easy prediction and understanding of the function of a particular protein [92]. Cakir and Kılıçkaya 2015 [37] reported MAP kinase cascade genes in V. vinifera and confirmed the orthology of VvMPK14, VvMPK12, VvMPK11, VvMPK13, VvMPK7, VvMPK3, VvMKK5, VvMKK3, and VvMKK2 to Arabidopsis AtMPK6, AtMPK3, AtMPK13, AtMPK12, AtMPK16, AtMPK9, AtMKK3, AtMKK6, and AtMKK2, respectively. Likewise, MAP Kinase cascade genes analyses in Ziziphus jujuba [30] provided nomenclature of MAP kinase cascade genes based on the order of appearance in different groups in the phylogenetic tree, and not based on orthology (or sequence homology) to Arabidopsis MAP Kinase cascade genes. The proper nomenclature of these MAP Kinase cascade genes should be used following an orthology or sequence homology-based MAPK gene nomenclature guidelines to maintain consistency across the plant kingdom.

Diversity and the Phylogenetic Relationship of MPKs
Our identification of MPKs yielded a slight variation in the number of genes from the previous studies; for example, we identified 15 MPKs in S. lycopersicum, which is different from Kong et al. 2012 [93], who reported 16 MPKs, and Mohanta et al. 2015 [1], who found 17 MPKs in the tomato genome. The number of AcMPKs in this study was 11, whereas Mohanta et al. 2015 [1] reported only 10 AcMPKs. In C. reinhardtii, six CreMPKs identified in this study were consistent with Mohanta et al. 2015 [1], whereas Dóczi et al. 2012 [39] reported only five CreMPKs. The variation in several genes within the same species in different studies might come as a result of different statistical and stringency parameters employed during HMM profiling and further downstream motif analysis. The detailed study of MPKs of D. carota, A. trichopoda, S. fallax, and H. annuus has never been reported in previous studies. The number of MPK genes in sunflower is higher than that previously identified in numerous other plant species, such as Arabidopsis (119Mb) [3] and rice (420Mb) [94], and lower than in soybean (1100Mb) [26]. Even the size of the sunflower genome, which is believed to have undergone the first whole genome triplication approximately 38-50 MYA, and whole genome duplication approximately 29 MYA, is about 3.5 times larger [95] than that of the soybean genome: the number of MPKs is lower in sunflower than soybean. Soybean has undergone two polyploidization events, approximately 59 and 13 MYA [75,96]. Thus, recent polyploidy in plants has resulted in extra copies of genes to their genome [97,98]. The slightly lower number of MPKs in sunflower might be due to past polyploidization events and the recent amplification of repetitive elements causing highly similar and related sequences [99]; the sunflower genome also encodes 52,243 proteins [42], which is slightly fewer than the soybean genome (56,044 proteins) [75].
Phylogenetic analysis of HaMPKs revealed four distinct clades, which were consistent to the MPKs previously identified in Arabidopsis [100], poplar [101], rice [102], Brachypodium distachyon [33], Malus domestica [32], Ziziphus jujuba [30], Triticeae species [35], Brassica rapa [28], and Fragaria vesca [103]. In Clade A, sunflower has one extra copy of MPK3, MPK6, MPK11, and MPK13 genes that might be because of duplications after the divergence from Arabidopsis. Such extra copies of these genes have also been observed in soybean [26]. The two copies of MPK3 and MPK6 were also found in D. carota. The nesting pattern of sunflower and other species' MPK genes with the characterized Arabidopsis MPKs suggest their potential role in respective functions. AtMPK3 is involved in various signaling pathways related to various stresses, such as wounding and hypersensitive responses elicited by Avr-R gene interaction [8,104]. The MAP kinase genes, IbMPK3 and IbMPK6, in sweet potato (Ipomoea batatas), and homologs of AtMPK3 and AtMPK6, provide resistance to Pseudomonas syringae pv. tabaci (Pta) bacteria in tobacco leaves, and were induced in various abiotic stresses, as well [84]. In maize, ZmMPK3, a homolog of AtMPK3 is induced in response to various environmental stresses [105]. Similarly, AtMPK4 and AtMPK6 are involved in response to abiotic and biotic stress such as cold, drought, touch and wounding, resulting in the production of reactive oxygen species in Arabidopsis [106,107]. AtMPK4 is phosphorylated and activated by the upstream components AtMEKK1 and AtMKK2 upon cold and salt stress signaling in Arabidopsis [107,108]. Clade A also consists of AtMPK5, the homolog of which in rice, OsMPK5, is well characterized to regulate stress responses [109]. All copies of MPK1/2, MPK7/14 are retained in soybean in sunflower, soybean, and Arabidopsis. Among them, AtMPK1, AtMPK2, AtMPK7, AtMPK14 are phosphorylated by AtMKK3 upon abscisic acid application in A. thaliana plantlets [110]. AtMPK1 is induced upon salt stress, whereas some MPKs in rice and alfalfa such as BWMK1 and TDY1, respectively, are activated upon wounding by pathogens [111][112][113]. G. max MAP kinase 1 (GMK1), a homolog of AtMAPK1, is activated in response to salt stress in soybean [114]. Likewise, a homolog of AtMPK7 in maize, ZmMPK7 is involved in the removal of reactive oxygen species upon induction by abscisic acid and hydrogen peroxide in maize [115]. Another homolog of AtMPK1 in Hordeum vulgare (HvMPK4) showed enhanced resistance to Magnaporthe grisea and enhanced tolerance to salt stress [85]. Clade C members include the homologs of G. max GmMAPK22-1/22-2 and GmMAPK23-1/23-2/23-3/23-4 [26] with no MPKs in Arabidopsis. A single copy of GmMAPK22-1/GmMAPK22-2 ortholog is retained in sunflower, and hence it is named HaMPK22. Meanwhile, all copies of GmMAPK23-1/23-2/23-3/23-4 are retained in sunflower and are hence named HaMPK23-1/23-2/23-2/23-3/23-4. All the members of Clade D consist of the TDY motif in the T-loop and are homologs to various Arabidopsis and soybean MPKs belonging to MPK16/19/18/8/15/17/9.

Diversity and Phylogenetic Relationship of MKKs
Sunflower MKKs also formed four distinct clades (A-D) with previously identified MKKs of Arabidopsis and soybean. These four clades (A-D) are consistent with the MKKs of various plant species such as Arabidopsis [100], rice [102], poplar [101], B. distachyon [33] and apple [32]. MKK clades consist of well-characterized MKK proteins such as AtMKK1/2/3/4/5 [118][119][120][121]. Clade A consists of HaMKKs grouped with AtMKK1/6/2, GmMAPKK6-1/6-2, GmMAPKK1, GmMAPK2-1/2-2. Sunflower and soybean have extra one copy of MKK6 compared to Arabidopsis and other plant species under study, including S. fallax. This suggests that the extra one copy of MKK6 was not seen until soybean diverged from Arabidopsis. Also, the retention of at least one copy of MKK6 in all species suggests its important role in signaling mechanisms during various stresses. We did not find a copy of MKK2-2 in sunflower, as is found in soybean (GmMAPKK2-2). The characterized AtMKK1 protein (orthologue of HaMPKK1) is induced upon the application of various stresses such as wounding, drought, cold, and high salinity in Arabidopsis seedlings [118]. AtMKK2 (ortholog of HaMKK2) is activated upon cold and salt stress signaling in Arabidopsis and mediates the phosphorylation of downstream MPKs [107]. Clade B consists of MKKs from the MKK3 proteins across all species under study, including C. reinhardtii. All species have a single copy of the MKK3 proteins except G. max, with two copies (GmMAPKK3-1/3-2). Two copies of MKK3 proteins in soybean is expected, as they underwent two duplication events to become a tetraploid. A divergence-time estimation based on Clade B sequences (each from all species) revealed how MKK3 proteins are conserved and retained in Algae, Bryophyte, Amborellales, Monocots, Ranunculales, Rosids, and Asterids. The divergence time analysis of MKK3 with CreMKK3 as the outgroup showed bryophyte and Amborellales being sister to the land plants and other extant species, which is consistent to previous studies [122,123] and follows the evolutionary history inferred on the Angiosperm Phylogeny Website [124]. One of these MKK3s, AtMKK3 is activated upon exposure to various stresses, such as cold, salt, hyperosmotic and ABA treatments [120]. This suggests the potential role of HaMKK3 in such stresses. Clade C consists of both copies of AtMKK4 and AtMKK5 only in A. trichopoda, O. sativa, and sunflower. However, V. vinifera, S. lycopersicum, and D. carota consist copies of MKK5 (MKK4 group absent). AtMKK4 and AtMKK5 are activated in Arabidopsis, mediating cell death and production of hydrogen peroxide [119].
In Clade D, the orthologs for MKK9 were found in all angiosperms except in soybean and O. sativa. Interestingly, we found three copies of MKK10 in S. fallax, as in O. sativa, and one copy of MKK10 in basal angiosperm, A. trichopoda, and Ranunculales, A. coerulea. We did not find any copy of MKK10 in sunflower, S. lycopersicum, D. carota, or V. vinifera. We observed HaMKK4/5/9 with one exon each that correlates to the At1g51660 (AtMKK4), At3g21220 (AtMKK5), and At1g73500 (AtMKK9), consisting of one exon per gene (https://www.Arabidopsis.org/index.jsp). Also, members belonging to Clade C and D in Gossypium raimondii had one exon in each [116]. This suggests that gene members belonging to Clade C and D encode proteins that are well conserved across plant species. Altogether, the diversity in the exon-intron structures might imply that duplication events caused the evolution of these genes under different environmental conditions. Also, AtMKK1 and AtMKK2 are involved in maintaining ROS homeostasis in Arabidopsis [121]. Since the paralog pairs, HaMKK6-1/6-2 and SfMKK10-1/10-2, are present on their different respective chromosomes, we infer a possible role of segmental duplications.

Expression Analysis and miRNA Prediction
In this study, we explored the expression pattern of MPKs and MKKs of sunflower under one hormone treatment, SA and two simulated abiotic stresses: NaCl for salinity, and Peg for osmotic stress in leaves and roots from the publicly available RNA-seq data. The expression of all sunflower MPKs and MKKs was detected in both leaves and roots, except for HaMPK4. In response to hormone SA, HaMPK11-1 was upregulated in leaves; HaMKK9, HaMPK13-2, HaMPK6-1, and HaMPK3-1 were upregulated in roots; HaMKK4, HaMPK7, and HaMPK11-2 were downregulated in leaves; HaMPK19-1, HaMPK14, and HaMPK9-2 were downregulated in roots. It has been established that SA is directly involved in MAPK phosphorylation [125]. SA-induced protein kinase (SIPK; AtMPK6) and wound-induced protein kinase (WIPK; AtMPK3) are important in balancing salicylic acid or jasmonic acid during herbivore wounding [126]. In Arabidopsis, AtMKK9 and AtMPK6 play important role in leaf senescence, which is a complex process caused by various factors including salicylic acid [127]. Also, ZmMPK3 in Zea mays is activated upon the application of SA hormone [128]. Thus, HaMPK3-1, HaMKK9, and HaMPK6-1 might play an important role in leaf senescence and salicylic acid pathways in sunflower. In response to NaCl, HaMKK5, HaMKK6-2, HaMPK11-1 were upregulated in leaves; HaMPK14, HaMPK6-1, HaMPK2, HaMPK23-2, and HaMPK17 were upregulated in roots; HaMKK9, HaMPK7, HaMPK23-1 were downregulated in leaves; HaMKK6-2, HaMPK13-2, HaMPK14, and HaMPK9-2 were downregulated in roots. Among them, HaMPK17 play an important role under salinity stress, as its ortholog in Gossypium hirsutum, GhMPK17, was induced by salt, osmosis and abscisic acid [129]. The expression pattern of some genes depended on different parts of the plant, for example, HaMKK6-2 was upregulated in leaves and downregulated in roots in response to NaCl. In response to Peg, HaMKK5, HaMKK6-2, HaMPK3-2, HaMPK11-1, HaMPK14, HaMPK1, HaMPK6-2, HaMPK19-1, and HaMPK18 were upregulated in leaves; HaMKK4, HaMKK1, HaMKK2, HaMPK3-2, HaMPK13-2, HaMPK23-2, HaMPK9-2, and HaMPK11-2 were upregulated in roots; HaMKK9, HaMKK2, and HaMPK13-2 were downregulated in leaves; HaMPK14 was downregulated in roots. This reveals that at least 19 HaMPK and seven MKK genes were induced upon these treatments, as compared to the control. Among them, some genes are induced upon multiple treatments. For example, HaMKK4 and HaMKK6-2 were induced upon both NaCl and Peg; HaMPK6-1 was induced upon NaCl and Peg; HaMPK16-2 was induced upon both SA and NaCl. The functional divergence can be observed on both HaMPKs and HaMKKs, as the hierarchical clustering patterns of expression of these genes do not follow the nesting pattern within clades in the phylogenetic trees, except for a few genes. For example, in MPKs, HaMPK22/23-3 that belonged to Clade C, HaMPK3-1/3-2/11-2 that belonged to Clade A, and HaMPK9-2/16-2/17 that belonged to clade D showed hierarchical clustering for expression of these genes. However, only HaMKK6-1/6-2 that belonged to Clade A of the MKK subgroup showed hierarchical clustering for expression of these genes. This shows the functional divergence and convergence of the HaMPK and HaMKK genes within and among the clades under different stress responses. Among seven published H. annuus microRNAs, five families of miRNAs are involved in possibly targeting eight MPKs. We did not find any miRNAs targeting HaMKK genes. Previous studies have reported the role of miRNAs in MAPK signaling pathways of animal systems in chronic myeloid leukemia [130], papillary thyroid carcinoma [131], Caenorhabditis elegans [132]. Not only in animals, but studies also reported the prediction of miRNAs targeting MAPK genes of plants such as Gossypium hirsutum (ghr-miR5272a regulating MAPKK6) [133] and Oryza sativa (miR1429_5p targeting MPK17-1 and miR531 families targeting various MKKK transcripts) [134].

Conclusions
This study represents the first genome-wide identification, analysis and nomenclature of MPKs and MKKs in H. annuus, D. carota and, S. fallax, as well as reassessment of these genes in A. coerulea, A. trichopoda, C. reinhardtii, and S. lycopersicum. We identified 28 MPKs and eight MKKs in sunflower, and studied their genomic architecture, phylogenetic relationships, and functions in relation to nine other plant species (including A. thaliana, G. max, O. sativa, and V. vinifera). While the 3.6 gigabase sunflower genome is one of the largest among plants with available complete genome sequences, more MPKs and MKKs were found in soybean, which has a genome size of 975 Mbs. Analyses of P-loop, catalytic C-loop, and T-loop showed that HaMPKs and HaMKKs could be classified into four clades, which are comparable to those groups identified in A. thaliana and G. max. However, clades such as Clade A, B, and C of MPKs consisted of different group members of A. thaliana and G. max. Among the MPK and MKK genes studied, the MKK3 proteins were well-conserved and retained in all species included in this study, including the outgroup C. reinhardtii, which warrants further exploration of these proteins across a wide array of species. The transcriptome data generated under hormone and abiotic stress treatments revealed diverse expression patterns of sunflower MPKs and MKKs exhibiting a dynamic role in adaptation to changing environmental conditions. We observed functional divergence of the HaMPK and HaMKK genes within the gene members of the same clade. The results of this study are important for understanding diversity and evolution of the MAPK gene family in plants and enhancing our knowledge of MAPK signaling pathways in sunflower. These findings can inform cultivar improvement in sunflower through stress-tolerance breeding.
Supplementary Materials: Supplementary files are available at http://www.mdpi.com/2223-7747/8/2/28/s1, Table S1: MPKs identified in different species in the present study. Table S2: MKKs identified in different species in the present study. Table S3: Abundance of the MPK gene family of sunflower and ten other species. Table  S4: Sunflower MPKs with their orthologs in other plant species. Table S5: Abundance of the MKK gene family of sunflower and ten other species.