Genome-Wide Mining and Identiﬁcation of Protein Kinase Gene Family Impacts Salinity Stress Tolerance in Highly Dense Genetic Map Developed from Interspeciﬁc Cross between G. hirsutum L. and G. darwinii G.

: Abiotic stress is an important limiting factor in crop growth and yield around the world. Owing to the continued genetic erosion of the upland cotton germplasm due to intense selection and inbreeding, attention has shifted towards wild cotton progenitors which o ﬀ er unique traits that can be introgressed into the cultivated cotton to improve their genetic performance. The purpose of this study was to characterize the Pkinase gene family in a previously developed genetic map of the F 2 population derived from a cross between two cotton species: Gossypium hirsutum (CCRI 12-4) and Gossypium darwinii (5-7). Based on phylogenetic analysis, Pkinase (PF00069) was found to be the dominant domain with 151 genes in three cotton species, categorized into 13 subfamilies. Structure analysis of G. hirsutum genes showed that a greater percentage of genes and their exons were highly conserved within the group. Syntenic analysis of gene blocks revealed 99 duplicated genes among G. hirsutum, Gossypium arboreum and Gossypium raimondii . Most of the genes were duplicated in segmental pattern. Expression pattern analysis showed that the Pkinase gene family possessed species-level variation in induction to salinity and G. darwinii had higher expression levels as compared to G. hirsutum. Based on RNA sequence analysis and preliminary RT-qPCR veriﬁcation, we hypothesized that the Pkinase gene family, regulated by transcription factors (TFs) and miRNAs, might play key roles in salt stress tolerance. These ﬁndings inferred comprehensive information on possible structure and function of Pkinase gene family in cotton under salt stress. of At7, At12, Dt1, Dt3, Dt16, Dt18, A02, A03, A07, A09, A011, D01, D03, D09, D12 D13)


Introduction
Cotton (Gossypium spp.) is an important crop cultivated for its fiber and oil around the world, producing about 35% of the fiber worldwide. The annual global production is estimated to be approximately $500 billion per annum [1]. The genus Gossypium is comprised of 53 identified species, including 7 tetraploid (2n = 4x = 52) and 46 diploid (2n = 2x = 26) species. Four species, two diploids (Gossypium arboreum, Gossypium herbaceum) and two tetraploids (Gossypium hirsutum, Gossypium barbadense), are cultivated for their fiber production [2][3][4]. It has been proposed that two major events have contributed to the evolution of the genus. First, about 5 to 10 million years ago (MYA), D genome diploids diverged from A genome diploids [5], followed by interspecific hybridization between genome D, ancestor to Gossypium raimondii, and genome A, ancestor to G. arboreum, about 1 to 2 MYA, resulted in the evolution of the heterogeneous polyploid G. hirsutum genome [6]. Exogenous polyploid G. hirsutum is inherently diverse from its parent species G. raimondii and G. arboreum, both morphologically and economically [7,8]. G. darwinii (AD5) is a tetraploid wild species endemic to the Galapagos Islands, characterized by its fine fiber quality and resistance to major abiotic stresses and fungal wilt diseases [9]. The use of wild progenitors has been of great significance in boosting the genetic diversity of various crops [10]. In other crops such as rice, several new and improved genotypes have been developed by crossing the cultivated species to its wild relatives [11].
Salt stress, the most common abiotic stress, is a serious threat to agriculture and the environment by limiting crop productivity worldwide. A complex combination of factors causes high salt concentrations in soil, including arid climates, high underground water levels, seawater infiltration and improper management practices [12,13]. High concentrations of Na + causes ionic imbalance, disrupts osmotic adjustments, metabolic dysfunction, membrane disorganization, damage to cellular structures within plant cells and inhibits photosynthesis leading to a reduction in growth [14,15]. Various hormones, Ca 2+ -related and reactive oxygen species (ROS) signaling pathways play the key role in signal transduction [16]. It is well known that abscisic acid (ABA), ethylene (ET), salicylic acid (SA) and jasmonates (JA) are the major mediators in regulating plant defense response against pathogens and abiotic stresses [17,18].
Plants have evolved an integrated homoeostasis mechanism to cope with ionic imbalance involving numerous transcription factors such as MYB (myeloblastosis), TCP (teosinte branched1 cycloidea PCF1) and WRKY (tryptophan, arginine, lysine, tyrosine containing domain), which control many of the crucial natural processes in plants [19]. The eukaryotic protein kinases are defined as enzymes that use the γ-phosphate of adenosine triphosphate (ATP) to phosphorylate serine, threonine or tyrosine residues in protein [20]. Protein kinases have been reported to regulate the responses of plants to salt and osmotic stress. Salt overly sensitive (SOS) pathway, a novel pathway linking Ca 2+ signaling in response to salt stress [21,22], results in the exclusion of excess Na + ions out of the cell via the plasma membrane. Briefly, following exposure to salt stress, Ca 2+ signals are generated, that are sensed by Ca 2+ sensors or Ca 2+ -binding proteins, leading to strong responses of downstream factors, for instance, SOS3 kinase; the binding of calcium and myristoylation of SOS3 activates its function which in turn activates SOS2 kinase. Once activated, SOS2 phosphorylates the SOS1 Na + /H + antiporter, which pumps Na + out of the cytosol and reinstates cellular ion homeostasis [23].
Cotton is a moderately salt-tolerant crop, with a salinity threshold level of 7.7 dS m −1 [24]. However, the productivity and quality of cotton is adversely affected by salinity stress in different ways. Germination and seedling growth are particularly affected with increasing salinity resulting in lower crop quantity and quality ultimately reducing total agricultural production. Moreover, the continual soil salinization process is gradually limiting the arable land for cultivation. Owing to the significant agricultural and economic importance of cotton as a major source of fiber and oil, sustainable cotton production is deemed paramount. Therefore, the growing demand for cotton requires the development of new cotton varieties with enhanced productivity in saline environments [25,26]. As such, researchers have focused on exploring the key molecular factors involved in response to salt stress in order to breed salt-tolerant cotton cultivars. Although, Pkinases from various plants have been identified and the Agronomy 2019, 9,560 3 of 26 regulatory mechanisms in plant development or responses to stress have been explored [27,28], little is known about the involvement of Pkinase genes in response to salt stress in cotton. In the present study, a total of 151 candidate Pkinase genes were identified, with 75 genes in G. hirsutum, 41 in G. raimondii and 35 in G. arboreum. Further detailed investigations such as collinearity analysis of the genetic and physical map of the At and Dt subgenome of G. hirsutum, phylogenetic tree analysis, gene structure and syntenic analysis, subcellular localization, possible functions based on gene ontology and expression analysis in different organs of G. hirsutum and G. darwinii in response to salt stress, were performed to assess the importance of these genes in cotton. These analyses can be used as an important indicator that Pkinase genes could have key roles in response to salt stress. Our findings provide a fundamental background by revealing the evolutionary history of the Pkinase gene family and presents valuable data for efficient use in breeding salt-tolerant cotton varieties.

Plant Material
A genetic map generated from the F 2 population was developed from an interspecific cross between G. hirsutum L. var. CCRI 12-4 and G. darwinii 5-7 as female and male parents, respectively [9]. The maternal parental line is currently the predominantly cultivated variety (over 90%) in China, but its production is limited due to its low tolerance to drought and salt stress [29]. G. hirsutum var. CCRI 12-4 was established by the Institute of Cotton Research, Chinese Academy of Agricultural Sciences (CAAS), Anyang, Henan Province, People's Republic of China. G. darwinii is a wild species originating from the Galapagos Islands; it is genetically close to G. barbadense and has many valuable agronomic traits such as fiber fineness, salinity and drought tolerance, verticillium and fusarium wilt resistance [9]. A genetic map was constructed using JoinMap (4.0) [30]. This genetic map was first developed using the same interspecific cross for 188 F 2 population mainly composed of expressed sequence tagged-simple sequence repeat (EST-SSR) primers accessible at Cotton Marker Database (CMD) http://www.cottonmarker.org [9]. A total of 2763 polymorphic markers were used in this study. The physical positions of 1523 markers available from the cotton functional genomics database, were downloaded from the website (https://cottonfgd.org) [31], and the collinearity analysis between the genetic and physical maps of these markers was performed using CIRCOS software (v 0.69).

Gene Mining within SSR Markers
The complete sequences of 1523 polymorphic markers were obtained from the cotton functional genomic database (cottonFGD) [31] and BLASTx with an E-value cut-off ≤ 1 × 10 −5 and identity ≥90%, which was used to find homologous genes in the genomes of G. hirsutum, G. raimondii and G. arboreum. The mining of genetic structures within the marker regions used for constructing genetic maps has been applied previously [32]. The protein kinase domain of each Pkinase gene was confirmed from Simple Modular Architecture Research Tool (SMART) and Protein Families (PFAM) databases [33,34].

Phylogenetic Analysis
The full-length amino acid sequences encoded by Pkinase genes were aligned by ClustalX program (version 2.0) with default settings and then manually adjusted in Molecular Evolutionary Genetic Analysis (MEGA v6.06) [35]. Subsequently, the evolutionary distance was inferred using the neighbor-joining (NJ) method with 1000 bootstrap replicates using the Jones-Taylor-Thornton (JTT) substitution model in MEGA 6.06 for phylogenetic tree construction. To identify the orthologous genes in cotton, the protein sequences of G. hirsutum Pkinases were subjected to a BLASTp search against the protein database of G. arboreum and G. raimondii; hits with E-values ≤1 × 10 −5 and ≥90% similarity were considered significant.

Subcellular Localization and Structure Analysis
Subcellular location prediction of G. hirsutum genes was conducted using the TargetP1.1 server (http:/www.cbs.dtu.dk/services/TargetP/) and validation for the determination of the possible cell compartmentalization of all the Pkinase proteins was done by WoLFPSORT [36]. The structure of genes was graphically visualized by the Gene Structure Display Server (GSDS) (version 2.0) [37]. Mapchart software was used to plot the gene loci on G. hirsutum, G. arboreum and G. darwinii chromosomes [38] 2.5. Gene Ontology (GO) Annotation Functional classification of genes including cellular component (CC), biological process (BP) and molecular functions (MF) was determined using the cotton functional genomic database (cottonFGD) [31]. A heat map was constructed with R statistical software package based on RNA expression data.

Syntenic Analysis and Duplication of Genes
Chromosomal distributions of Pkinase genes were mapped on cotton chromosomes by CIRCOS (Circular Genome Data Visualization) software (v 0.69) [39]. Distribution of genes on chromosomes was examined based on gene position. Homologous genes of G. hirsutum, G. raimondii and G. arboreum were identified using BLASTp with a threshold of >80% similarity and at least an 80% alignment ratio based on the protein length. All orthologous genes sequences were aligned using MEGA 6.06 software selecting clustalX with default parameters for alignment. An aligned FASTA file which resulted from MEGA 6.06 was submitted to DNaSP 5.10 (DNA Sequence Polymorphism) to conclude the non-synonymous (Ks), synonymous (Ka) and synonymous/nonsynonymous substitutions (Ka/Ks) [40]. MCScanX, a package from the Plant Genome Duplication Database (PGDD) (http://chibba.pgml.uga.edu/duplication/), was used to perform syntenic analysis of orthologous genes among the genomes of the three cotton species as described previously [41].

Prediction of miRNA Target and Transcription Factor Binding Sites (TFBS) Analysis
The microRNA (miRNA) sequences of G. hirsutum were obtained from miRBase [42], the plant microRNA database (PMRD) [43] and EST database (dbEST) [44]. Pkinase genes targeted by miRNAs were predicted by searching their coding sequence (CDS) regions for complementary sequences atcottonFGD [31], and the sequences were blasted through online software http://bioinformatics. cau.edu.cn/PMRD, to obtain miRNA sequences. In order to find the gene targeted by miRNAs, both 5 and 3 untranslated regions and CDS sequences of all the mined genes were searched for complementary sequences of the cotton miRNAs using the psRNATarget server with default parameters [45]. Additionally, for cis-element or TFBS analysis, the genomic sequences of 1.5 kb upstream of the translation start site (TSS) of each Pkinase gene was extracted from the assembly files of G. hirsutum, downloaded from CottonFGD [31]. The putative transcription response elements (TREs) of the Pkinase gene promoter regions were predicted using the online tool at the PLACE database [46].

Plant Material and Hydroponic Culture
The experiment was carried out in a controlled environment at the Institute of Cotton Research (ICR), Chinese Academy of Agricultural Sciences (CAAS) Anyang, Henan Province, China. Hydroponic culture used for this experiment was based on the method described by Oluoch [47], with three replicates. Seeds of cotton (G. hirsutum CCRI 12-4 and G. darwinii 5-7) were surface-disinfected in 0.5% sodium hypochlorite (NaClO) for 5 min and washed five times with sterile distilled water. Afterwards, healthy seeds were sorted out after seed grading and transferred to autoclaved petri plates containing sterile double-layered Whatman filter paper in an incubator at room temperature for 3 days until germinated. The filter papers were soaked in 100 mL of distilled water and retained in the incubator at 33 • C. Three similar seedlings were chosen and transferred into holes of thermo-pore sheets and fixed in the tray with the help of a soft sponge. About 7 L of half strength modified Hoagland nutrient solution was maintained in every container [48], air pumps were attached to each container for aeration in the solution to ensure proper root growth. Temperature in the greenhouse was kept between 27-30 • C with photoperiod of 14-h/10-h light/dark cycle. At the three-leaf stage, the nutrient solution was supplemented with 200 mM sodium chloride (NaCl) solution, while the control plants were treated with sterile distilled water in the same way. The samples for RNA extraction were collected at 0, 1, 3, 6 and 12 h after stress exposure from leaves, roots and stems of both stressed and control plants. All samples were directly frozen in liquid nitrogen and stored at −80 • C for extraction of RNA.

RNA Sequence and RT-qPCR Analysis
The RNA sequence data was extracted from the website at cottonFGD [31], with reference to control and salt stress expression profiles at different time intervals. SIGMA Life Science STRN50-1KT, Spectrum TM RNA kit (Sigma-aldrich, Darmstadt, Germany) was used for extraction of RNA according to the manufacturer's protocol. The concentration and quality of RNA samples were determined by NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA) and RNA at the standard spectrum of 260-280 and range of 1.85-2.0 were used for RT-qPCR analysis. Single strand cDNA was synthesized using reverse transcriptase and TranScript-All-in-One Single Strand cDNA Synthesis SuperMix (TransGen Biotech kit, Beijing, China) for RT-qPCR analysis following the manufacturer's procedure. Primers for 12 genes were designed using NCBI (National Center for Biotechnology Information) primer blast tool. Detailed information about primers is presented in Table S1. Fast real-time PCR machine (7500) with 10 µL of FastStart Universal SYBR Green Master (Rox) sigma solution (Roche Diagnotics, Mannheim, Germany) for each sample was used to perform RT-qPCR analysis. The final reaction volume was adjusted to 20 µL, containing 6 µL double distilled water, 2 µL cDNA, 10 µL SYBR green, 1 µL each of forward and reverse primer and 2 µL of Ghactin7 primer pair used as a reference. Three technical repeats were used for RT-qPCR. The expression profile of genes was calculated using the formula E = 2 −∆∆Ct .
The expression value of cotton (FPKM, fragments per kilobase per million reads), calculated from high-throughput RNA-sequencing data was downloaded from CottonFGD [31] and used to systematically analyze the expression profiling of cotton Pkinase genes in different tissues and under different stress regimes. Gene expression levels were calculated according to the log10 of FPKM values and the default empirical abundance threshold of FPKM >1 was used to identify the expressed gene.

Genetic Linkage Map Features
The genetic map used in this study was based on Chen et al. [9], consisting of 2922 markers amplifying 2763 loci plotted into 26 linkage groups corresponding to 26 chromosomes. It comprised 4176.7 cM of the genome with an average inter loci distance of 1.51 cM. In total, 636 eSSRs were distributed on the At subgenome while 683 loci were on the Dt subgenome. In addition, 715 gSSRs were distributed on the At subgenome and 729 gSSRs on the Dt subgenome. More loci were distributed on the Dt rather than the At subgenome.

Assessment of Collinearity of the Genetic and Physical Map of G. hirsutum
A total of 2763 polymorphic markers were blasted against At and Dt subgenomes of G. hirsutum and 1523 matches were obtained after removing redundant markers. Finally, 1523 markers were used for collinearity analysis between the genetic and physical map of At and Dt subgenomes of G. hirsutum. In the At subgenome, chromosomes 1-13 in the genetic map showed collinearity with At and Dt subgenomes of the physical map. In the At subgenome, chr1 homologs with A1 and chr2 with A2 in sequence, but in the Dt genome the chromosomes have random collinearity such as chr14 homologs with D1 and chr22 homologs with D4 ( Figure 1). The results indicated that most of the markers on At and Dt subgenomes of G. hirsutum exhibited good collinearity blocks (Table S2). hirsutum. In the At subgenome, chromosomes 1-13 in the genetic map showed collinearity with At and Dt subgenomes of the physical map. In the At subgenome, chr1 homologs with A1 and chr2 with A2 in sequence, but in the Dt genome the chromosomes have random collinearity such as chr14 homologs with D1 and chr22 homologs with D4 ( Figure 1). The results indicated that most of the markers on At and Dt subgenomes of G. hirsutum exhibited good collinearity blocks (Table S2).

Phylogenetic Analysis
Phylogenetic tree analysis provides valuable knowledge on the lines of evolutionary descent of different genes or proteins from a common ancestor, and a powerful tool for structure classifications, biological diversity and for providing insight into events that occurred during gene evolution [49]. A total of 1257 genes were predicted from the blast of 1523 SSR polymorphic markers in all three species. Since it was technically unfeasible to analyze all the gene domains simultaneously, the domains with the highest frequency of occurrence, termed as the dominant domains, were selected for further analysis. Based on this criterion, 151 genes belonging to the Pkinase domain were analyzed; G. raimondii, G. arboreum and G. hirsutum were found to have 41, 35 and 75 genes of the Pkinase domain (PF00069), respectively (Table S3, Figure 2a). A neighbor-joining phylogenetic tree was constructed to comprehend the evolutionary history and relationships among Pkinase gene classes in cotton species ( Figure 3).
The resulting phylogenetic tree showed that the cotton Pkinase genes tend to cluster together. Based on the clustering pattern, Pkinase genes in cotton were further classified into 13 subfamilies including probable types of serine/threonine-protein kinases (four genes), serine/threonine-protein kinase WNK11 (six genes), serine/threonine-protein kinase WNK2 (six genes), serine/threonine-protein kinase WNK1 (eight genes), serine/threonine-protein kinase WNK5 (four genes), serine/threonine-protein kinase WNK8 (eight genes), serine/threonine-protein kinase WNK4 (four genes), serine/threonine-protein kinase WNK6 (six genes) and serine/threonine-protein kinase WNK7 (three genes); as well as leucine-rich repeat receptor-like serine/threonine-protein kinase (14 genes), L-type lectin-domain containing receptor kinase (two genes), cysteine-rich receptor-like protein kinase 3 (four genes) and leaf rust resistance kinase (82 genes) (Figure 2b). Leaf rust resistance kinase (LRK) was the largest group with 82 genes from G. hirsutum (41), G. raimondii (22) and G. arboreum (19); the second largest group was LSPK (leucine-rich repeat receptor-like serine/threonine-protein kinase), with 14 genes, while the smallest number belonged to L-type lectin-domain containing receptor kinase (two genes). Several Pkinases contained other domains in addition to the kinase domain with a maximum number of five different domains coded by a single gene (Table S3). For instance, leucine-rich repeat receptor-like serine/threonine-protein kinase was

Phylogenetic Analysis
Phylogenetic tree analysis provides valuable knowledge on the lines of evolutionary descent of different genes or proteins from a common ancestor, and a powerful tool for structure classifications, biological diversity and for providing insight into events that occurred during gene evolution [49]. A total of 1257 genes were predicted from the blast of 1523 SSR polymorphic markers in all three species. Since it was technically unfeasible to analyze all the gene domains simultaneously, the domains with the highest frequency of occurrence, termed as the dominant domains, were selected for further analysis. Based on this criterion, 151 genes belonging to the Pkinase domain were analyzed; G. raimondii, G. arboreum and G. hirsutum were found to have 41, 35 and 75 genes of the Pkinase domain (PF00069), respectively (Table S3, Figure 2a). A neighbor-joining phylogenetic tree was constructed to comprehend the evolutionary history and relationships among Pkinase gene classes in cotton species ( Figure 3).
The resulting phylogenetic tree showed that the cotton Pkinase genes tend to cluster together. Based on the clustering pattern, Pkinase genes in cotton were further classified into 13 subfamilies including probable types of serine/threonine-protein kinases (four genes), serine/threonine-protein kinase WNK11 (six genes), serine/threonine-protein kinase WNK2 (six genes), serine/threonine-protein kinase WNK1 (eight genes), serine/threonine-protein kinase WNK5 (four genes), serine/threonine-protein kinase WNK8 (eight genes), serine/threonine-protein kinase WNK4 (four genes), serine/threonine-protein kinase WNK6 (six genes) and serine/threonine-protein kinase WNK7 (three genes); as well as leucine-rich repeat receptor-like serine/threonine-protein kinase (14 genes), L-type lectin-domain containing receptor kinase (two genes), cysteine-rich receptor-like protein kinase 3 (four genes) and leaf rust resistance kinase (82 genes) (Figure 2b). Leaf rust resistance kinase (LRK) was the largest group with 82 genes from G. hirsutum (41), G. raimondii (22) and G. arboreum (19); the second largest group was LSPK (leucine-rich repeat receptor-like serine/threonine-protein kinase), with 14 genes, while the smallest number belonged to L-type lectin-domain containing receptor kinase (two genes). Several Pkinases contained other domains in addition to the kinase domain with a maximum number of five different domains coded by a single gene (Table S3). For instance, leucine-rich repeat receptor-like serine/threonine-protein kinase was involved in others domains including LRRNT_2 (leucine-rich repeat N-terminal domain, PF08263), LRR_8 (leucine-rich repeat, PF13855) and LRR_1 (leucine-rich repeat, PF08263). The total number of orthologous genes among G. hirsutum, G. arboreum and G. raimondii were 92 out of 151 genes, translating to 60.92% of the mapped genes in the phylogenetic tree. Thirteen subfamilies were made into separate groups in the phylogenetic tree with each one indicated by a different color as shown in Figure 3.
Among the three Gossypium species, the sum total of Pkinase genes in G. arboreum and G. raimondii, regarded as A-genome and D-genome ancestors of the allotetraploid species, respectively, were roughly equal to G. hirsutum, which has been a consistent feature regarding the respective diploids and tetraploids genomes (Table S3). Additionally, the Pkinases localized in the At subgenome of upland cotton mainly clustered with that of G. arboreum, and those situated in the Dt subgenome clustered with the Pkinases of G. raimondii (Figure 3). These results are consistent with the hypothesis that allopolyploid cotton types arose as a result of the two diploid species reuniting geographically followed by chromosome polyploidization approximately 1-2 million years ago (MYA) [6,50], suggesting that the Pkinase gene family of allotetraploid cotton types might have expanded by hybridization and a subsequent polyploidization event.
Agronomy 2019, 9, x FOR PEER REVIEW 7 of 26 involved in others domains including LRRNT_2 (leucine-rich repeat N-terminal domain, PF08263), LRR_8 (leucine-rich repeat, PF13855) and LRR_1 (leucine-rich repeat, PF08263). The total number of orthologous genes among G. hirsutum, G. arboreum and G. raimondii were 92 out of 151 genes, translating to 60.92% of the mapped genes in the phylogenetic tree. Thirteen subfamilies were made into separate groups in the phylogenetic tree with each one indicated by a different color as shown in Figure 3. Among the three Gossypium species, the sum total of Pkinase genes in G. arboreum and G. raimondii, regarded as A-genome and D-genome ancestors of the allotetraploid species, respectively, were roughly equal to G. hirsutum, which has been a consistent feature regarding the respective diploids and tetraploids genomes (Table S3). Additionally, the Pkinases localized in the At subgenome of upland cotton mainly clustered with that of G. arboreum, and those situated in the Dt subgenome clustered with the Pkinases of G. raimondii (Figure 3). These results are consistent with the hypothesis that allopolyploid cotton types arose as a result of the two diploid species reuniting geographically followed by chromosome polyploidization approximately 1-2 million years ago (MYA) [6,50], suggesting that the Pkinase gene family of allotetraploid cotton types might have expanded by hybridization and a subsequent polyploidization event.

Chromosomal Distribution of Pkinase Genes
A blast search was conducted to identify a total of 151 Pkinase genes in all three species, using the sequences of 1523 polymorphic markers on both At and Dt subgenomes. Distribution of genes on chromosomes was determined based on their relative positions and information retrieved from reference cotton genome sequences. Chromosomal distribution was revealed using BLASTn against the three reference cotton genomes in the cotton functional genomic database [31] ( Figure S1 and Table S4). In total, 75 Pkinase genes were mapped on 21 chromosomes of G. hirsutum. Two genes could not be mapped on any chromosome, alternatively they were assumed to be located in scaffolds. The highest numbers of mapped genes were distributed on the Dt subgenome as compared to the At subgenome. More genes were distributed on the chromosomes At11 and Dt15. In G. hirsutum, At8, At10, Dt14 and Dt24 contained smaller number of genes; one in each At and Dt subgenome (Table 1).

Chromosomal Distribution of Pkinase Genes
A blast search was conducted to identify a total of 151 Pkinase genes in all three species, using the sequences of 1523 polymorphic markers on both At and Dt subgenomes. Distribution of genes on chromosomes was determined based on their relative positions and information retrieved from reference cotton genome sequences. Chromosomal distribution was revealed using BLASTn against the three reference cotton genomes in the cotton functional genomic database [31] ( Figure S1 and Table S4). In total, 75 Pkinase genes were mapped on 21 chromosomes of G. hirsutum. Two genes could not be mapped on any chromosome, alternatively they were assumed to be located in scaffolds. The highest numbers of mapped genes were distributed on the Dt subgenome as compared to the At subgenome. More genes were distributed on the chromosomes At11 and Dt15. In G. hirsutum, At8, At10, Dt14 and Dt24 contained smaller number of genes; one in each At and Dt subgenome (Table 1). In the A genome of G. arboreum, 35 genes for the Pkinase gene family were mapped and dispersed on 11 different chromosomes, one gene was observed in the scaffold. The chromosomes A01 and A11 contained the highest number of genes (seven genes) accounting for 20% of the total genes in the A genome.
In G. raimondii, 41 genes for the Pkinase gene family were mapped on 13 chromosomes of the D genome. The highest number of genes (nine genes) was found on chromosome D02 with 21.95% in the D genome (Table 1). A distinct clustering scheme was observed in particular regions of the chromosomes of the three species at the time of mapping; most of the genes were clustered on the top and lower arm regions of the chromosomes (At1, At7, At12, Dt1, Dt3, Dt16, Dt18, Dt19, Dt20, A02, A03, A07, A09, A011, D01, D03, D09, D12 and D13) while only a smaller number occurred in the middle region. Agronomy 2019, 9, x FOR PEER REVIEW 9 of 26

Structural Analysis and Localization of Genes
Analysis of the exon/intron structure of the Pkinase genes in G. hirsutum was done using the Gene Structure Display Server [37]. Gene structural diversity is regarded as a possible indicator for the evolution of multigene families [51]. To gain further information into the structural diversity of cotton Pkinase genes, the exon/intron organization in the full-length cDNAs was analyzed in comparison with their corresponding genomic DNA sequences of individual genes in G. hirsutum (Figure 4a,b), and it was found that a greater percentage of the Pkinase genes and their exons were highly conserved within the group (Table S5). Pkinase genes are generally identified with the presence of three exons and two introns. The maximum number of exons and introns were noted in Gohir.D01G048100 (21 exons, 20 introns) and Gohir.A01G061500 (20 exons, 19 introns). Interestingly, five genes had just one exon and no introns. Exons and introns for different Pkinase genes were observed to be dissimilar based on their lengths. For instance, 16 genes have similar structures having two exons and one intron. Another 20 genes have three exons and two introns; the number of exons in genes ranged between nine and one while introns ranged between zero to eight. The remarkable structural diversity suggests the divergent functions performed by this group of the protein family in upland cotton. The detailed information such as number of exons and introns, isoelectric point, grand average hydropathy, molecular weight and the length of protein in G. hirsutum is given in Table S5.
The physicochemical parameters of each Pkinase gene were calculated using the online tool, ExPASy [52]. All the Pkinases in G. hirsutum had negative grand average of hydropathy (GRAVY)

Structural Analysis and Localization of Genes
Analysis of the exon/intron structure of the Pkinase genes in G. hirsutum was done using the Gene Structure Display Server [37]. Gene structural diversity is regarded as a possible indicator for the evolution of multigene families [51]. To gain further information into the structural diversity of cotton Pkinase genes, the exon/intron organization in the full-length cDNAs was analyzed in comparison with their corresponding genomic DNA sequences of individual genes in G. hirsutum (Figure 4a,b), and it was found that a greater percentage of the Pkinase genes and their exons were highly conserved within the group (Table S5). Pkinase genes are generally identified with the presence of three exons and two introns. The maximum number of exons and introns were noted in Gohir.D01G048100 (21 exons, 20 introns) and Gohir.A01G061500 (20 exons, 19 introns). Interestingly, five genes had just one exon and no introns. Exons and introns for different Pkinase genes were observed to be dissimilar based on their lengths. For instance, 16 genes have similar structures having two exons and one intron. Another 20 genes have three exons and two introns; the number of exons in genes ranged between nine and one while introns ranged between zero to eight. The remarkable structural diversity suggests the divergent functions performed by this group of the protein family in upland cotton. The detailed information such as number of exons and introns, isoelectric point, grand average hydropathy, molecular weight and the length of protein in G. hirsutum is given in Table S5.  The physicochemical parameters of each Pkinase gene were calculated using the online tool, ExPASy [52]. All the Pkinases in G. hirsutum had negative grand average of hydropathy (GRAVY) values ranging from −0.701 (Gohir.D01G048100) to −0.015 (Gohir.D01G034300), which implied that the cotton Pkinases are hydrophobic, a common property found among the stress inductive genes, such as late embryogenesis abundant (LEA) genes with average GRAVY values lower than 0 [53]. Similarly, the molecular weights in kilodalton (kDa) varied from 22.818 (Gohir.A01G042500) to 112.001 (Gohir.D03G032800). Protein lengths ranged from 198 (Gohir.A01G042500) to 1026 amino acids (Gohir.D03G032800, Gohir.A02G147600), while the isoelectric point (PI) ranged between 4.332 (Gohir.D03G086500) and 9.373 (Gohir.D09G050600). Very low GRAVY values <0 and a high net charge on the Pkinases point to a unique feature also observed among the cyclin-dependent kinase (CDK) genes in Arabidopsis [54]. Various stress inducible proteins are commonly characterized by low hydrophobicity and a high net charge, which enables them to be totally or partially disordered; this feature gives the Pkinase proteins the ability to form flexible molecular structures such as molecular chaperones, which are central to the protection of plants from desiccation effects [55].
The online WoLFPSORT analysis indicated that Pkinase proteins are distributed in six different sites with the majority of the proteins found to be localized within the nucleus (25) or embedded in the plasma membrane (24), accounting for 33% and 32% of all the Pkinase proteins, respectively, while a lesser number were found to be localized in chloroplast (eight), vacuole (nine), cytoplasm (five) and extracellular matrix (four) (Figure 4a,b and Table S6). Subcellular location prediction was elaborated using the TargetP1.1 server and validated for possible cell compartmentalization by WoLFPSORT [36].

Gene Annotation by GO Analysis
Putative functions of 75 genes in G. hirsutum Pkinase gene family, including biological processes (BP), molecular functions (MF) and cellular components (CC), were identified using cotton functional genomics database [31]. In biological processes, the functions include protein phosphorylation

Duplication and Syntenic Analysis of Genes
Three types of duplication events are primarily responsible for the expansion of gene families, namely, tandem, segmental and whole genome duplications [56]. The genomes of the three cotton species were combined to explore relationships between gene synteny and duplication in the Pkinase family. A total of 99 genes were found to be duplicated across three cotton species subjected to analysis ( Figure 6, Table S7). Most of the gene duplication occurred between G. hirsutum and its progenitors, G. raimondii and G. arboreum, resulting in the origin of polyploid AD genome from A and D genomes and the evolution of the new species, G. hirsutum. Results indicated that mostly two types of duplication events occurred in Pkinase (tandem and segmental). Most of the duplicated genes belonged to the segmental type, implying that segmental duplication played a crucial role during the course of evolution of gene families ( Figure 6). The resultant calculation of Ka/Ks gene orthologs indicated that 92 pairs had values less than one while seven genes had more than one Ka/Ks value, which shows that genes have negative/purifying selection (Table S8). Previously, the Ka/Ks ratio of 156 paralogous pairs had values less than one and 20 pairs had more than one for the LEA family in cotton [57]. The highest Ka ratio of 0.2912 was noted for gene pairs in Dt vs. DD genome of gene pairs. The Ka value of At vs. AA ranged from 0-0.0848. The Ka/Ks value was higher for Dt vs. DD genome gene pairs (2.69014) as compared to At vs. AA and AA vs. DD ( Figure 6; Table  S8). These results are similar with previous findings for other gene families in the same cotton species, such as in the case of reactive oxygen species (ROS) gene family where Ka/Ks value of 1.8 for Dt vs. DD genome, 1.7 for At vs. AA and 1.0 for AA vs. DD genomes was observed [58].

Prediction of Transcription Factor Binding Sites
Transcription factors (TFs) temporarily and spatially regulate the transcription of their target genes through binding cis-acting regulatory elements and also acting as terminal points in the signal transduction process [59]. The cis-regulatory promoters are located upstream of genes which function as binding sites for transcription factors (TFs) that are essential in determining the tissue-specificity or stress responsive expression patterns of the genes [60]. To explore regulatory

Duplication and Syntenic Analysis of Genes
Three types of duplication events are primarily responsible for the expansion of gene families, namely, tandem, segmental and whole genome duplications [56]. The genomes of the three cotton species were combined to explore relationships between gene synteny and duplication in the Pkinase family. A total of 99 genes were found to be duplicated across three cotton species subjected to analysis ( Figure 6, Table S7). Most of the gene duplication occurred between G. hirsutum and its progenitors, G. raimondii and G. arboreum, resulting in the origin of polyploid AD genome from A and D genomes and the evolution of the new species, G. hirsutum. Results indicated that mostly two types of duplication events occurred in Pkinase (tandem and segmental). Most of the duplicated genes belonged to the segmental type, implying that segmental duplication played a crucial role during the course of evolution of gene families ( Figure 6). The resultant calculation of Ka/Ks gene orthologs indicated that 92 pairs had values less than one while seven genes had more than one Ka/Ks value, which shows that genes have negative/purifying selection (Table S8). Previously, the Ka/Ks ratio of 156 paralogous pairs had values less than one and 20 pairs had more than one for the LEA family in cotton [57]. The highest Ka ratio of 0.2912 was noted for gene pairs in Dt vs. DD genome of gene pairs. The Ka value of At vs. AA ranged from 0-0.0848. The Ka/Ks value was higher for Dt vs. DD genome gene pairs (2.69014) as compared to At vs. AA and AA vs. DD ( Figure 6; Table S8). These results are similar with previous findings for other gene families in the same cotton species, such as in the case of reactive oxygen species (ROS) gene family where Ka/Ks value of 1.8 for Dt vs. DD genome, 1.7 for At vs. AA and 1.0 for AA vs. DD genomes was observed [58].

Prediction of Transcription Factor Binding Sites
Transcription factors (TFs) temporarily and spatially regulate the transcription of their target genes through binding cis-acting regulatory elements and also acting as terminal points in the signal transduction process [59]. The cis-regulatory promoters are located upstream of genes which function as binding sites for transcription factors (TFs) that are essential in determining the tissue-specificity or stress responsive expression patterns of the genes [60]. To explore regulatory interactions between TFs and cotton Pkinase promoter elements, we collected the upstream 1.5 kb genomic sequences relative to the translation start site (TSS) of each Pkinase gene as putative promoter regions. Several stress responsive cis-acting regulatory elements were present abundantly in Pkinase genes in G. hirsutum such as ABRE (abscisic acid responsiveness), MBS (drought-inducibility), MYBS (abiotic stress tolerance), TCA element (salicylic acid responsiveness) and CCAAT-box (plant MYBHv1 binding site), among others ( Table 2). Several of the identified cis-regulatory elements such as ABRE, DRE and MYB have been previously known to be linked with top-ranked plant stress responsive genes [61]. The presence of promoter elements involved in stress response strongly suggests the possible role of upland cotton Pkinase genes in growth and development processes as well as responding to various stress stimuli in cotton. interactions between TFs and cotton Pkinase promoter elements, we collected the upstream 1.5 kb genomic sequences relative to the translation start site (TSS) of each Pkinase gene as putative promoter regions. Several stress responsive cis-acting regulatory elements were present abundantly in Pkinase genes in G. hirsutum such as ABRE (abscisic acid responsiveness), MBS (drought-inducibility), MYBS (abiotic stress tolerance), TCA element (salicylic acid responsiveness) and CCAAT-box (plant MYBHv1 binding site), among others ( Table 2). Several of the identified cis-regulatory elements such as ABRE, DRE and MYB have been previously known to be linked with top-ranked plant stress responsive genes [61]. The presence of promoter elements involved in stress response strongly suggests the possible role of upland cotton Pkinase genes in growth and development processes as well as responding to various stress stimuli in cotton.

miRNA Target Analysis of Genes
MicroRNAs (miRNAs) are small RNA molecules, which function as regulators of gene expression in a range of developmental and signaling pathways in plants and animals. A number of studies have shown that abiotic stress trigger aberrant expression of many miRNAs, thus indicating that miRNAs may be a new target for genetically improving plant tolerance to certain stresses [62]. Similarly, in cotton, G. hirsutum, a group of miRNAs have been found to target several of the identified genes, and some of them respond to salt and drought stress [63]. To preliminarily explore the miRNA-mediated posttranscriptional regulatory mechanisms of the Pkinase gene family in cotton, we searched putative target sites of cotton miRNAs in CDS sequences using the psRNATarget server. The prediction of different genes targeted by miRNAs by the use of psRNATarget has already been applied for other functional genes in cotton [45,53]. Upon examining the association of G. hirsutum genes to the known cotton ghr-miRNAs, it was found that a total of 33 genes out of the 75 were targeted by 17 ghr-miRNAs, translating to 44% of all the genes of the protein kinase group.

RNA Sequence Data of Salt Tolerant Genes
RNA sequence is an important tool for providing the cumulative role of genes. It delivers information about the richness of genes in many plant organs as well as gene expression when plants are treated with abiotic and biotic stress. RNA sequence data of 1, 3, 6 and 12 h of control and salt stress of 75 salt responsive genes were downloaded from the cotton functional genomic database.
RNA sequence data of these 75 genes was changed into log10 and used to determine the expression levels at 1, 3, 6 and 12 h in control and salt stress conditions. The genes were divided into two groups: genes in the first group were mostly upregulated, while in the second group, almost all genes were downregulated (Figure 7). Twelve genes in the first group were highly upregulated in control and salt stress conditions. These highly upregulated genes including Gohir.A10G149800, Gohir.A02G147500, Gohir.D03G032800, Gohir.D12G024300, Gohir.D12G078600, Gohir.A01G061500, Gohir.D02G038500, Gohir.A02G147600, Gohir.D11G171000, Gohir.D07G077100, Gohir.A07G072600 and Gohir.A02G031800 were selected for further validation.

RT-qPCR Analysis of the Candidate Genes under Salt Stress
We used twelve highly upregulated genes, as per the RNA sequence data, for RT-qPCR to determine the expression of genes in roots, stems and leaves at different time intervals under stress conditions (Table S1). Two parent species, a wild G. darwinii and elite cultivated G. hirsutum, were grown in a greenhouse under a controlled environment. The RT-qPCR analysis was performed on RNA samples collected from leaves, roots and stem tissues after 0, 1, 3, 6, and 12 h of exposure to stress treatment. The results presented indicate that genes were differentially expressed in different tissues under 200 mM salt stress conditions (Figure 8a,b).
In G. hirsutum, gene expression profiles were clustered into two groups. Group 1 contained three genes which were upregulated in all tissues except stem, after 3 and 12 h of salt stress, while high levels of expression were observed after 6 and 12 h in roots and after 3 h in stem tissue. Group 2 contained nine genes, in which seven genes were differentially expressed and two genes were downregulated in all tissues (Figure 8a).
Similarly, in G. darwinii, the genes expressed in response to salt were also patterned into two groups: group 1 has four genes, most of which were upregulated in all tissues at 12 h stress exposure except Gohir.D03G032800, which was only partially expressed. Group 2 consists of eight genes, in which five genes were differentially expressed at 0, 1, 3, 6 and 12 h after salt treatment (Figure 8b). Three genes showed downregulated expression in all tissues. The results indicated that these genes were directly or indirectly involved in salt stress with major effects on leaves and roots (Figure 8b).
Gohir.D12G024300 and Gohir.D12G078600 were highly upregulated genes in the roots of both species after 12 h of salt treatment, which exhibited their role in root development in salt tolerance. The Gohir.A02G147600 and Gohir.A02G147500 were highly upregulated in leaves and stem tissue in both species, respectively. Furthermore, the two highly upregulated genes in G. darwinii, Gohir.A02G147600 and Gohir.D03G032800 showed the reverse pattern in G. hirsutum. Most of the genes had higher expression levels in roots of G. darwinii. This is comprehensible because the first part of the plant that is affected by stress is the root. In G. hirsutum, most of the genes were observed with high expression in leaves. In G. darwinii, more genes were upregulated as compared to G. hirsutum, which is an indication that the wild variety has enhanced salt stress tolerance in contrast to its cultivated relative. Similar results have been previously reported for Fructose-1, 6-bisphosphate Aldolase (FBA) genes with a higher expression of genes in G. darwinii as compared to G. hirsutum [64].
Gohir.D12G024300 and Gohir.D12G078600 were highly upregulated genes in the roots of both species after 12 h of salt treatment, which exhibited their role in root development in salt tolerance. The Gohir.A02G147600 and Gohir.A02G147500 were highly upregulated in leaves and stem tissue in both species, respectively. Furthermore, the two highly upregulated genes in G. darwinii, Gohir.A02G147600 and Gohir.D03G032800 showed the reverse pattern in G. hirsutum. Most of the genes had higher expression levels in roots of G. darwinii. This is comprehensible because the first part of the plant that is affected by stress is the root. In G. hirsutum, most of the genes were observed with high expression in leaves. In G. darwinii, more genes were upregulated as compared to G. hirsutum, which is an indication that the wild variety has enhanced salt stress tolerance in contrast to its cultivated relative. Similar results have been previously reported for Fructose-1, 6-bisphosphate Aldolase (FBA) genes with a higher expression of genes in G. darwinii as compared to G. hirsutum [64].

Discussion
There has been a decline in cotton production due to the adverse effects of various abiotic and biotic stresses which have been exasperated by the narrow genetic base of elite cotton cultivars [65]. Wild cotton progenitors provide an important reservoir of agronomic characters that can be introgressed into cultivated varieties to enhance their performance in the face of different biotic and abiotic stress epidemics in cotton plants [66]. The application of simple sequence repeat (SSR) markers in mapping F2 population developed from two wild cotton species of the D genome, has led to the identification of key genes such as the NAC (NAM, ATAF1/2 and CUC2) genes, which have

Discussion
There has been a decline in cotton production due to the adverse effects of various abiotic and biotic stresses which have been exasperated by the narrow genetic base of elite cotton cultivars [65]. Wild cotton progenitors provide an important reservoir of agronomic characters that can be introgressed into cultivated varieties to enhance their performance in the face of different biotic and abiotic stress epidemics in cotton plants [66]. The application of simple sequence repeat (SSR) markers in mapping F 2 population developed from two wild cotton species of the D genome, has led to the identification of key genes such as the NAC (NAM, ATAF1/2 and CUC2) genes, which have previously been implicated in alleviating the various stress effects in plants [66]. Therefore, the detection of myriad genes of the protein kinase family, based on already developed highly dense genetic maps in G. hirsutum, G. arboreum and G. raimondii, could offer a better alternative in solving the problem of salt stress in cotton breeding approaches. In the present study, we identified 151 Pkinase genes in three cotton species, G. raimondii, G. arboreum and G. hirsutum, which were found to have 41, 35 and 75 genes, respectively. The number identified in upland cotton (G. hirsutum) was relatively higher (more than double) than in G. arboreum and G. raimondii, which indicates that whole genome duplication (WGD) or polyploidy events played a major role in Pkinase gene family expansion [67]. Allotetraploid cotton possibly emerged from the polyploidization of the A and D genomes, suggesting that the expansion of Pkinase family genes from diploid to allotetraploid cotton is mainly a consequence of gene duplication and the conservation of the Pkinase genes during the polyploidization process. This indicates the significant role played by this group of genes in the process of plant growth and development [68]. Similar findings have been reported in the case of the FBA gene family with 19, 9 and 9 genes and CDK family with 31, 15 and 12 genes in G. hirsutum, G. raimondii and G. arboreum, respectively [64,69].
As a consequence of the selection pressure imposed by exposure to extreme environmental conditions, plants undergo drastic transformation by modulating stress regulatory responses and sensing mechanisms. One of the significant strategies employed by plants to combat stress response is gene duplication [70]. Gene duplication is a major feature of genomic architecture, with a central role in plant genomic evolution by acting as a source of new genetic materials for genetic drift, mutation and selection, expanding the existing gene pool that enable plants to gain new gene functions and explore novel gene networks [71]. Thus, gene duplication mechanism not only leads to the expansion of genomic content but assists in the diversification of gene function to ensure adequate adaptability to changing environments [56]. In this study, the majority of the genes were found to be duplicated segmentally. Segmental duplication is known to be dominant over the tandem duplication in the process of gene evolution. For instance, large-scale segmental duplications were uncovered upon examining the complete sequence of A. thaliana genome [72]. Similarly, in Arabidopsis, 22 cell cycle genes belonging to the CDK family were found to have emerged through segmental duplication [73]. The role of segmental duplicated genes in relation to drought stress has been reported, such as, the expansion of Hsf genes in sesame was found to have occurred through segmental duplication [74]. Moreover, the tetraploid cotton G. hirsutum, has gone through whole genome duplication and allopolyploidization events during the course of evolution [5,75]. Therefore, this process is significant for the ability of the plants to sense and respond in new ways to abiotic stimuli which is critical for plant survival [76]. Tetraploidization of G. hirsutum occurred about 1.5 MYA and corresponds to a Ks peak [5]. In our study, we also identified G. hirsutum Pkinase orthologs Ks peaks (0-0.6) ( Table S8), indicating that the tetraploidization event of G. hirsutum was important for Pkinase expansion. We also identified several PK collinearities among G. raimondii (D), G. arboreum (A), and G. hirsutum (At, Dt) ( Figure 6).
Phylogenetic tree analyses provide valuable information on the patterns of gene evolution by discerning the evolutionary descent of different genes/proteins from a common ancestor. It provides evidence of the contribution of whole genome duplication on the excess of upland cotton Pkinase genes as compared to its parental donors. The genomes of G. arboreum and G. raimondii contained 35 and 41 genes in comparison to the 75 genes in G. hirsutum. These results showed that the gene number and phylogeny of Pkinases were consistent with the genome paleo-history, which suggested that the Pkinase gene family expanded and plant genomes became more complex as result of gene expansion during evolution. Briefly, it can be inferred that Gossypium Pkinases descended from one common ancestor, which expanded in diploid cotton types by a duplication event after splitting from the cacao lineage about 16.6 MYA, followed by a reunion and polyploidization of the diploid cotton species forming the allopolyploid cotton approximately 1-2 MYA, combining and expanding the Pkinase gene family along with A and D genome [77]. Moreover, the gene distribution across the whole cotton genome was mostly limited to the chromosome arms, while some of the chromosomes were found to lack the genes altogether. Pkinase genes were mapped on 21, 11 and 12 chromosomes across the three cotton species (Table 1). It has been observed previously that this type of expansion can be ascribed to large segmental duplications that have been subjected to scrambling by chromosomal rearrangements in CDK genes in cotton, tomato and Arabidopsis [69,78].
The structural analysis of the Pkinase genes in cotton revealed a variation in terms of the exon/ intron ratios, the number of exons in genes ranged between nine and one while introns ranged between zero to eight. Interestingly, out of 75 genes, five had one exon and no intron. Our results are consistent with previous findings about the reduced intron numbers in stress responsive genes, such as the trehalose-6-phosphate synthase gene family which plays an important role in abiotic stress and metabolic regulation [79]. Similarly, about 45% and 41% of genes in the F-box subfamily are speculated to be intronless in Arabidopsis and Oryza sativa, respectively. In Arabidopsis and O. sativa, the genes without introns accounts for 20.7% and 19.9% of the whole genomes, respectively [80,81]. Gene integration and reverse transcription might be responsible for the loss of introns from genes in plants [82]. The presence of introns has been presumed to cause considerable burden in terms of transcription efficiency since intron splicing requires a large molecular complex called the spliceosome composed of five sRNAs and approximately 150 proteins; the expression and assembly of these components costs the cell substantial amounts of time and energy [83,84]. Therefore, intronless genes are known to be effective in transcription initiation and elongation as compared to spliced genes [85]. The grand average hydropathy value of all genes was less than zero, indicating that they were hydrophilic in nature. Hydrophilic protein-encoding genes have a high solubility, hence these proteins may have a significant role in desiccation tolerance [68]. The variation in Pkinases reflects their diverse cellular functions and is consistent with previous reports on other species [86,87].
GO annotation is a useful tool to classify genes based on their functions in cells including BP, MF, and CF [88]. The biological process, molecular function and cellular function are basic characteristics, which allow a basic understanding of the diverse molecular role of genes. According to GO annotations, the 75 genes of upland cotton (G. hirsutum) were classified into biological process and molecular function. In biological processes, various genes have direct correlation to stress factors such as protein phosphorylation (GO:0006468), intracellular signal transduction (GO:0035556) and abscisic acid activated signaling pathway (GO:0009738). In higher organisms, protein kinases are important elements playing key roles in signal transduction in response to metabolic changes and during biotic and abiotic stress regimes [89]. In molecular functions, the functions include protein kinase activity (GO:0004672), ATP binding (GO:0005524), polysaccharide binding (GO:0030247), protein binding (GO:0005515) and carbohydrate binding (GO:0030246) ( Figure 5). The molecular functions identified encompass various molecular strategies adopted by the plants in order to survive and tolerate various stress factors. The majority of the molecular functions are important components of the signaling pathways; for instance, protein tyrosine kinase has been reported to be involved in the abscisic acid signaling pathway [90]. Protein kinases have been described to regulate responses of plants to salt and osmotic stresses by modulating the calcium signaling pathway. For instance, following perception of salt stress on the plasma membrane, Ca 2+ signals are generated, which are perceived by SOS3 kinase, the binding of calcium and myristoylation of SOS3 activates its function and then, SOS3 activates the SOS2 kinase; the subsequently activated SOS2 kinase phosphorylates the SOS1 Na + /H + antiporter, which pumps Na + out of the cytosol [23].
In carrying out a detailed study of the dominant domain of Pkinase genes, obtained from the genetic map of a wild parent, cis-element, miRNA analysis and GO functional annotations revealed the fact that Pkinase genes could have a significant role in stress within plants. Gene promoters, also termed as cis-element or transcription-factors binding sites, play various key roles in the transcriptional regulation of genes controlling several abiotic stress and plant hormone responses. Phytohormones improve plant adaptability to changing environments and adverse conditions like stress. Many abiotic stress-related and plant hormones-related cis-elements, including TCA-elements, W-Box, DRE, MBS and ABRE have been identified previously [60,91]. These stress-responsive elements were relatively abundant in the promoters of the upland cotton Pkinase genes, specifically MYBS, indicating an important functional role of these elements in salt stress tolerance in G. hirsutum. In total, 55 genes were found to be regulated by MYBS. MYBS is well-studied cis-acting promoter element with a key role in the abscisic acid-dependent signaling pathway in response to drought, salt and cold [92]. ABRE is also an important cis-element that plays a key role in abscisic acid signaling in response to abiotic stress [60]. In Arabidopsis, the role of cis-acting regulatory elements (CARE) to enhance tolerance against high salt, drought and cold stress has been reported previously [93].
The small RNAs are a diverse class of non-coding regulatory RNAs with important functions by regulating gene expression through targeting the RNA transcripts of endogenous as well as exogenous genes involved in stress response [94,95]. Growing evidence reveals that miRNA-guided regulation of ROS-related genes at the post-transcriptional level is essential for normal growth and development [96], and adaptation to stress conditions including salinity [97,98]. However, reports on Pkinase gene expression and regulation mediated by miRNAs under salt stress has not been carried out in detail. In our study, we predicted the miRNA-mediated posttranscriptional regulation of pkinases and found some putative targets of cotton miRNAs (Table S9). Various cotton miRNAs such as miR162, miR172, miR396 and miR156, among others, detected in this study, have been found to be associated to some of the top ranked genes related to drought and salinity, for example, NAC, MYB, and MAPK (mitogen-activated protein kinase) [99]. The miRNA miR482, has been known to play a functional role in the regulation of nucleotide binding site-leucine-rich repeat (NBS-LRR) defense genes during fungal pathogen infection [100]. The ghr-miR395a/b is also likely to be involved in sulfur metabolism by regulating sulphate adenylyl transferase in cotton. Thus, these stress-related miRNAs and their targets might also play roles in response to drought and salinity stress [63]. Similarly, a recent study on small RNA sequencing revealed that salinity and drought stress interrupt miR156 expression, indicating a novel role for miR156 in response to salinity and drought stress [63]. Furthermore, the miR399 was involved in regulating phosphate homeostasis in Arabidopsis after exposure to both NaCl and polyethylene glycol (PEG) [101]. This indicates the essential roles for miRNAs in maintaining the target gene expression under stress, which are directly or indirectly involved in salt stress tolerance mechanism in plants.
Expression analysis is an imperative tool that provides functional information about genes. High expression levels for tested genes were observed among different tissues of plant as depicted by the expression pattern in the heat map (Figure 8a,b). Results of RT-qPCR revealed that the 12 key genes have better expression levels in G. darwinii as compared to G. hirsutum as per the log 10 (FPKM) range. Gohir.A02G147600 and Gohir.D03G032800 were highly upregulated in G. darwinii while downregulated in G. hirsutum. The genes exhibited a significant upregulation in the roots as compared to leaf and stem tissues, which indicated the functional conservation of the subfamily. Gohir.A02G147600 and Gohir.D03G032800 belong to leucine-rich repeat receptor-like serine/threonine-protein kinase BAM1. Leucine-rich repeats receptor-like kinases (LRR-RLKs) play important roles in plant growth and development as well as in the signal perception and transduction to abiotic stress responses [102]. It is well known that LRR-RLKs are important regulators of plant response to salt stress [103][104][105]. For instance, the Arabidopsis LRR-II type RLK genes were induced by many environmental stresses, such as cold, osmotic stress, auxin and abscisic acid, suggesting that they may participate in the general abiotic stress response [106]. The expression of rice LRR-RLKs (OsSIK1, OsGIRL1, OsLP2) were regulated by salt, drought, abscisic acid, salicylic acid, jasmonic acid and H 2 O 2 stresses, indicating that rice LRR-RLKs might be involved in multiple signaling pathways regulating developmental and stress processes [77,104,107]. In the Antarctic moss, Pohlia nutans, expression of PnLRR-RLK27 significantly increased the transcript levels of the major transcription factors (ABF3, DREAB2A and MYB2) and the stress-related genes (AtRD22, AtRD29A, AtRD29B, AtKIN1 and AtCOR47) after salt treatment. Similarly, cotton Pkinases GhMKK1, GhMKK3 and GhMKK5 are shown to be involved in drought and salinity stress resistance [108,109]. These results validated that these salt-tolerant genes most likely came from the tolerant parent (G. darwinii). The variation in expression between G. hirsutum and G. darwinii could be broadly based on changes in environmental conditions; G. darwinii exhibits divergence signals that are associated with directionally selected traits and are functionally related to stress responses. These results suggest that stress adaptation in G. darwinii might have involved the evolution of salt-tolerant genes that can be introgressed into elite upland cotton, in order to boost their performance for breeding of salt-resistant cotton varieties in the future.

Conclusions
The use of genetic maps has become increasingly significant in understanding marker-assisted selection, gene cloning and breeding, yet intensive analysis of genes within the SSR regions has not been extensively studied thoroughly. In the current study, a genetic map of G. hirsutum/G. darwinii F 2 population was used for identification of candidate genes for salt stress tolerance. We analyzed 75, 35 and 41 genes of dominant domain, Pkinase, in G. hirsutum, G. arboreum and G. raimondii, respectively. The Pkinase gene family was categorized into 13 subfamilies based on the phylogenetic tree relationship. Chromosomal mapping and syntenic analysis revealed that genes were dispersed randomly on all chromosomes of the three species, with gene clusters located on the upper or lower arms of the chromosomes indicating that segmental duplication played a major role in expansion of the Pkinase family. The majority of the genes had lower Ka/Ks values indicating that purified selection worked on most of the duplicated gene pairs with only a minimal number of genes going through negative selection. Two genes, Gohir.A02G147600 and Gohir.D03G032800, have higher expression levels in roots of G. darwinii species when exposed to salt stress pointing to the fact that these salt-tolerant genes were most likely derived from G. darwinii. The analysis of these genes provides a strong indication that genes within the Pkinase subfamily might play a significant role in combating salt stress tolerance, based on the expression pattern, miRNA, cis-element analysis and GO functional annotations. This study provides a solid foundation for further in-depth evaluation of these genes in understanding the specific role of these genes in cotton in relation salt stress tolerance and paves the way for breeding elite cotton cultivars with potential resistance to abiotic stresses.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4395/9/9/560/s1, Figure S1(a): Gene distribution in tetraploid upland cotton (G. hirsutum) chromosomes; Figure S1(b): Gene distribution in A genome of G. arboreum and D genome of G. raimondii chromosomes; Table S1: PCR primers; Table S2: Markers inconformity between genetic map and the physical map of At and Dt subgenomes of G. hirsutum; Table S3: Protein kinase genes identified in G. hirsutum, G. raimondii and G. arboreum; Table S4: Genomic locations of genes in G. raimondii, G. arboreum and G. hirsutum; Table S5: The structural analysis of Gh Pkinase; Table S6: Subcellular localization of genes; Table S7: Syntenic analysis of homologous genes among G. hirsutum, G. arboreum and G. raimondii; Table S8: Determined Ka, Ks and Ka/Ks values for orthologous Pkinase gene pairs in G. hirsutum, G. arboreum and G. raimondii genomes; Table S9: miRNA targets of genes.

Acknowledgments:
We express gratitude to all of our team and lab fellows who have helped us in our research experiment.

Conflicts of Interest:
All authors declare no conflicts of interest.