Genome-Wide Identification and Evolutionary Analysis of Sarcocystis neurona Protein Kinases

The apicomplexan parasite Sarcocystis neurona causes equine protozoal myeloencephalitis (EPM), a degenerative neurological disease of horses. Due to its host range expansion, S. neurona is an emerging threat that requires close monitoring. In apicomplexans, protein kinases (PKs) have been implicated in a myriad of critical functions, such as host cell invasion, cell cycle progression and host immune response evasion. Here, we used various bioinformatics methods to define the kinome of S. neurona and the phylogenetic relatedness of its PKs to those of other apicomplexans. We identified 97 putative PKs clustering within the various eukaryotic kinase groups. Although it contains the universally-conserved PKA (AGC group), the S. neurona kinome was devoid of PKB and PKC. Moreover, the kinome contains the six conserved apicomplexan CDPKs (CAMK group). Several OPK atypical kinases, including ROPKs 19A, 27, 30, 33, 35 and 37, were identified. Notably, S. neurona is devoid of the virulence-associated ROPKs 5, 6, 18 and 38, as well as the Alpha and RIO kinases. Two out of the three S. neurona CK1 enzymes had high sequence similarities to Toxoplasma gondii TgCK1-α and TgCK1-β and the Plasmodium PfCK1. Further experimental studies on the S. neurona putative PKs identified in this study are required to validate their functional roles and to understand their involvement in mechanisms that regulate various cellular processes and host-parasite interactions. Given that PKs are essential for apicomplexan survival, the current study offers a platform for future development of novel therapeutics for EPM, for instance via application of PK inhibitors to block parasite invasion and development in the host.


Introduction
Equine protozoal myeloencephalitis (EPM) is an infectious, progressive, degenerative neurological disease of horses caused by the apicomplexan parasite, Sarcocystis neurona [1]. To complete its life cycle, this heteroxenous parasite requires a reservoir host (i.e., opossums; Didelphis virginiana, Didelphis albiventris) and an aberrant (horses) or intermediate host (cats, skunks, raccoons and sea otters) [2]. Opossums become infected upon ingestion of sarcocysts containing hundreds of bradyzoites. The bradyzoites undergo gametogony and sporulate into mature oocysts that are then shed in the feces. After ingestion by the intermediate or aberrant hosts, the oocysts transform into the environmentally-resistant sporozoites that chronically parasitize the neural and inflammatory cells of the host's central nervous system (CNS). Clinical EPM symptoms depend on the part of the CNS that is parasitized and generally include abnormal gait, dysphagia and muscle atrophy in affected horses [3].
The intracellular nature of S. neurona and its ability to evade the host's immune surveillance [4] make EPM treatment expensive, lengthy and challenging. Traditionally, clinical treatment of EPM

Sarcocystis neurona Encodes 97 Putative Kinases
To date, at least 15 apicomplexan genomes (coccidians, gregarines, hemosporidians and piroplasmids) have either been fully sequenced or partially annotated [24]. In the current study, we conducted an exhaustive genome-wide search of the newly-sequenced S. neurona genome [35] and identified 97 putative PKs (Table 1). The identified PKs contained the characteristic PK (IPR000719) or PK-like (IPR011009) domains and three conserved amino acids constituting the catalytic triad (Lys30, Asp125, Asp143). The PKs ranged in size from 152 to 6544 amino acids, with relative molecular weights between 15.94 and 671.51 kDa. The majority of the PKs had an isoelectric point (pI) greater than 7.0, suggesting that they may have low turnover rates, since acidic proteins are generally thought to be degraded more rapidly than neutral or basic proteins [36].
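The mass range reported above can be sanity-checked directly from sequence. The following is a minimal Python sketch, not the ExPASy implementation used in this study: it assumes the common approximation that the average molecular weight of an unmodified protein is the sum of the average residue masses plus one water molecule. The function name `molecular_weight` is illustrative only.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified protein."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        # Ambiguity codes (X, B, Z, ...) are rejected rather than guessed.
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[res] for res in seq) + WATER_MASS


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the call.
    peptide = "MKVLAGDE"
    print(f"{molecular_weight(peptide) / 1000:.3f} kDa")
```

Dividing the result by 1000 gives kDa, the unit used for the range quoted above. Post-translational modifications and ambiguous residues are deliberately ignored in this sketch.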

The AGC Group
The numbers of apicomplexan AGCs range from four (in Babesia bovis) to 15 (in T. gondii) [17]. Based on our Blast2GO annotations and BLASTp homology searches against the kinome database, five out of the nine S. neurona AGCs (SRCN_3339, SRCN_3990, SRCN_5165, SRCN_5610 and SRCN_1312) were homologs of the universally-conserved PKAs that are found in N. caninum and T. gondii (see Table 1). The PKAs are essential for the completion of schizogony (asexual reproduction) in Plasmodium parasites [40]. Further, S. neurona contains a putative PKG (SRCN_4518), which shows high homology (92%) to the T. gondii TgPKG1 (Table 1); PKGs are essential in apicomplexans [41].

Table 1. Description of the 97 putative protein kinases (PKs) identified in the kinome of Sarcocystis neurona. The putative PKs could be classified into eight groups. The amino acid coordinates of the conserved PK domains in the protein sequences and the PK homologies to other apicomplexan PKs are shown in Columns 7-12.

The CAMK Group
CAMKs form the second-largest apicomplexan PK group (after OPKs). Apicomplexan kinomes contain varying numbers of CAMKs, ranging from seven (in B. bovis) to 29 (in T. gondii) [17]. The most important CAMK family is the CDPK family, which appeared to constitute almost 50% of the S. neurona putative CAMKs (see Table 1). In terms of homologies, the S. neurona kinome contained orthologs of the T. gondii CDPK1 (SRCN_3314), CDPK2B (SRCN_2165), CDPK3 (SRCN_3701), CDPK4 (SRCN_6606), CDPK5 (SRCN_3583), CDPK6 (SRCN_3011), CDPK7 (SRCN_6597) and CDPK8 (SRCN_5948). Other CDPK orthologs were of the N. caninum CDPK2 (SRCN_4390) and Hammondia hammondi CDPK9 (SRCN_5812) (Table 1). Inhibition of TgCDPK1 has been shown to disrupt the motility, host cell invasion and egress of T. gondii [43]. Because mammals lack CDPK homologs, the relatively large number of CDPK homologs identified in S. neurona could be exploited in the rational design of anti-parasitic therapeutics.

The CMGC Group
The CMGC is the largest PK group in apicomplexans; CMGC numbers range from 15 in B. bovis to 23 in Plasmodium vivax [17], and the 19 CMGCs we identified in the S. neurona kinome fall within this range (see Table 1). Notable among these were the two GSK homologs (SRCN_1731 and SRCN_1732). This finding is similar to what has been observed in Plasmodium parasites, in which two GSK-3 enzymes have been reported, both of which are essential for the parasite [46]. Homology searches showed considerable sequence similarity (51% and 41% for SRCN_1731 and SRCN_1732, respectively) to the PfGSK-3 enzymes (data not shown). Notably, eight of the 19 CMGCs in S. neurona were CDKs, including CDK7 (SRCN_4674, SRCN_2759 and SRCN_761), CDK10 (SRCN_895) and CDK11 (SRCN_977). Available data show that CDKs are essential in P. falciparum [24]. We also identified two putative MAPK homologs (SRCN_4209 and SRCN_5365) and ERK7 (SRCN_6472) (see Table 1), a result that is comparable to the two MAPKs in the kinome of P. falciparum [17].

The OPK Group
The apicomplexan-specific OPKs are a tight cluster of PKs without clear relation to any of the other major PK groups. Notable among these are the ROPKs, which have high sequence divergence and have been thought to be largely restricted to T. gondii [47], which has a total of 34 members spread in over 40 distinct sub-families [23]. Although their diversification in apicomplexans is poorly understood, some ROPKs are key virulence factors in T. gondii [23]. At least nine putative ROPKs could be identified in S. neurona, including ROPK19A (SRCN_6184), ROP27 (SRCN_3247), ROP30 (SRCN_2076), ROP33 (SRCN_7082 and SRCN_7086), ROP35 (SRCN_2183, SRCN_2123, SRCN_7083 and SRCN_4410) and ROP37 (SRCN_7084), implying that the ROPKs are not restricted to T. gondii. Although largely presumed to be inactive, ROPKs are implicated in the regulation of host transcription [47], and their presence in S. neurona may support the hypothesis that the ROPKs have unique activation mechanisms in their regulatory functions that facilitate apicomplexan pathogenesis [24,48]. Other notable OPKs included two parasite-specific eukaryotic initiation factor-2 (eIF2) kinases (eIF2K-C (SRCN_1606) and eIF2K-B (SRCN_4503)), four NEKs (SRCN_4528, SRCN_2630, SRCN_286 and SRCN_3151) and four ULKs (SRCN_3444, SRCN_3669, SRCN_6812 and SRCN_6157) (Table 1). The eIF2Ks are conserved in apicomplexans and are important for the induction of parasite differentiation into bradyzoite cysts, which are clinically important [34].

The STE Group
The STEs are poorly represented in apicomplexans, and although most apicomplexans have one or two STE genes per genome, some parasites, such as C. parvum, are reported to harbor up to six STEs [17,20]. Our results suggest that S. neurona has at least one putative STE (Table 1). STEs are thought to function in MAPK pathway cascades despite the fact that this pathway is absent in apicomplexans. The small repertoire of apicomplexan STEs is in contrast to that reported in other parasites, such as trypanosomatids, in which these enzymes regulate the length of the flagella [49].

The TKL Group
Apicomplexans harbor a maximum of seven TKL-coding genes, which makes it notable that we identified six putative TKLs in S. neurona (Table 1). Reverse genetics studies have demonstrated that some of the conserved TKLs, for instance PfTKL3, are essential for asexual Plasmodium proliferation [27], making them potential drug targets. Two of the six S. neurona putative TKLs had considerable sequence similarities to the Plasmodium TKLs, including SRCN_3466 (36% similar to Plasmodium malariae TKL1) and SRCN_1435 (49% similar to Plasmodium ovale TKL3) (data not shown).

The aPK Group
The aPKs have been detected in apicomplexan parasites, such as P. falciparum [17,18] and T. gondii, which has at least four genes thought to encode these enzymes, the products of which are hypothesized to be part of the ovoid mitochondrial cytoplasmic (OMC) complex [50], a composite assembly of organelles observed only in growing tachyzoites of T. gondii. An exhaustive search of the S. neurona proteome revealed four putative PIKKs (SRCN_3988, SRCN_6464, SRCN_6465, SRCN_1259) and one PDHK (SRCN_1743) (Table 1). Whereas PIKKs have been identified in at least 12 apicomplexan kinomes, PDHKs seem to have been identified only in the T. gondii kinome [17]. Our analyses of the putative S. neurona PKs did not yield any homologs of the Alpha and RIO kinases, implying that these PKs are absent from the kinome of this parasite; RIO kinases have been reported in P. falciparum [17,18], as well as in the kinomes of other apicomplexans including C. parvum, T. gondii and B. bovis [17].

Evolution of S. neurona Protein Kinases
We investigated the evolutionary relationships among the various S. neurona PK groups and their homologs in related apicomplexans. Our analysis revealed valuable insights into the biology of these organisms. The kinome of S. neurona comprises slightly fewer AGCs (n = 9) than the kinomes of T. gondii (n = 11), N. caninum (n = 13) and H. hammondi (n = 15). In general, the phylogenetic clustering of the S. neurona AGCs mirrored the homologies of these enzymes to those of the three apicomplexans used in this study (Figure 1; compare with Table 1). Sequence analysis of S. neurona AGCs revealed significant divergence, with only ~30% sequence similarity amongst members of this group. Two S. neurona AGCs, SRCN_5610 (SnPKA1) and SRCN_3990 (SnPKA2), clearly cluster with T. gondii PKAs TGME49_028420 and TGME49_015670 [51] (Figure 1). Moreover, SRCN_5610 shares high (~60%) full-length sequence identity with its ortholog, TgPKA1. It is also notable that the single putative PKG (SRCN_4518) distinctly clustered with its T. gondii ortholog, TGME49_111360 (TgPKG) (Figure 1). It has recently been shown that P. falciparum PKG acts as a signaling hub that plays a central role in a number of core parasite processes [52]. In addition to the kinase domain, SnPKA1, SRCN_3339, SRCN_5165 and SRCN_4518 possess the AGC-kinase C-terminal domain, which contains two of the three conserved phosphorylation sites in AGCs (data not shown). These conserved sites serve as phosphorylation-regulated switches in the control of both intra- and inter-molecular interactions [53]. Like T. gondii, S. neurona lacks PKB and PKC. However, S. neurona contains a putative phosphoinositide-dependent kinase-1, PDPK1 (SRCN_1312), that clusters with the T. gondii PDPK1 (TGME49_268210) [51].
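Percent identity figures such as those quoted above depend on how the denominator is defined. As an illustration only, the Python sketch below computes identity over aligned columns in which neither sequence has a gap; this convention is an assumption made here, since the text does not state which one was used. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Columns where either sequence has a gap are excluded from both the
    numerator and the denominator (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 4 identical residues over 5 ungapped columns.
    print(percent_identity("MKV-LA", "MRVQLA"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, and these give lower values when gaps are present, so the convention should be stated whenever identities are compared across studies.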
Despite the absence of PKC in S. neurona, CAMK family members were identified, which perhaps underscores the importance of Ca2+ regulation in this apicomplexan parasite. The majority of the identified S. neurona CAMKs segregated with their orthologs in T. gondii, N. caninum and H. hammondi in clades with robust bootstraps (Figure 2), thus validating the annotation of the CAMKs. Amongst the CAMKs, SRCN_2544 clustered with T. gondii PK1 (TGME49_243500) of the AMPK/SNF1 sub-family. There were also three additional SNF1 members in S. neurona (SRCN_5410,
TGME49_040390 that is annotated as a CDPK and SRCN_4076 that clusters with TGME49_106480, also annotated as a CDPK (Figure 2).
Moreover, SRCN_3142 (SnPIK3R4) segregates with T. gondii TgPIK3R4 (TGME49_018550). NEKs are involved in cell cycle regulation, while Aurora kinases play pivotal roles in endodyogeny, duplication rate and parasite virulence [33]. Taken together, the presence of a variety of ROPKs in S. neurona is interesting given the fact that in T. gondii, ROPKs are key virulence factors [63].

Discussion
The kinomes of apicomplexans range from 35 PKs (in B. bovis) to 135 PKs (in T. gondii) [24]. We identified a total of 97 putative PKs in the kinome of S. neurona, compared to the PKs reported in the kinomes of P. falciparum (n = 99), T. gondii (n = 135), N. caninum (n = 130) and H. hammondi (n = 124) [17,19]. Although the total number of S. neurona PKs appeared markedly reduced compared to that of its close coccidian relatives (T. gondii and N. caninum [23]), taken as a percentage of total genome size, the proportion of S. neurona PKs is comparable to the 2% observed in humans [13] and other coccidians [23]. The contraction of the S. neurona kinome could be attributed to genome compaction, which occasionally offsets lineage-specific expansions of specific gene families. Notably, genome contraction is a common mode of genomic evolution in intracellular parasites, including apicomplexans [64,65]. As such, the evolution of PKs may proceed in tandem with the overall genomic adaptive strategies of these parasites.
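The size of the reduction can be made explicit from the kinome counts quoted above. The short Python sketch below uses only those counts; the dictionary and function names are illustrative and not part of the original analysis.

```python
# Kinome sizes as quoted in the text (number of putative PKs).
KINOME_SIZES = {
    "S. neurona": 97,
    "P. falciparum": 99,
    "T. gondii": 135,
    "N. caninum": 130,
    "H. hammondi": 124,
}


def relative_kinome_size(species: str, reference: str) -> float:
    """Kinome size of `species` as a fraction of that of `reference`."""
    return KINOME_SIZES[species] / KINOME_SIZES[reference]


if __name__ == "__main__":
    for reference in ("T. gondii", "N. caninum", "H. hammondi", "P. falciparum"):
        ratio = relative_kinome_size("S. neurona", reference)
        print(f"S. neurona vs {reference}: {ratio:.1%}")
```

On these counts the S. neurona kinome is roughly 72-78% the size of those of its coccidian relatives but nearly the same size as that of P. falciparum.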
Using a hierarchical scheme based on the major PK groups, the S. neurona kinases could be classified and phylogenetically clustered into the various PK families. A complement of nine putative AGC kinases was identified in S. neurona, which is reduced compared with that of T. gondii, N. caninum and H. hammondi. Despite this potential gene loss, seven of the nine AGCs (SRCN_5165, SRCN_5610, SRCN_3339, SRCN_4249, SRCN_3990, SRCN_5430 and SRCN_1312) had orthologs in T. gondii, N. caninum and H. hammondi. In agreement with the observation that PKA is conserved in apicomplexans [23], two PKAs (SRCN_5610 and SRCN_3990) were identified in S. neurona. In T. gondii, increases in cytosolic cAMP levels activate PKA to trigger the developmental switch from the rapidly proliferating tachyzoites to the quiescent bradyzoites [66]. Additionally, two other S. neurona AGCs (SRCN_5165 and SRCN_3339) were putative PKAs given that they contained the characteristic GxGxxG motif found in PKA [51]. Notably, based on orthology, S. neurona contains a single putative PKG (SRCN_4518) that distinctly clustered with T. gondii PKG (TGME49_111360).

Conclusions and Future Perspectives
The kinome of S. neurona contains members of the major classes of PKs, including AGC, CMGC, GSK, CAMK, CK, TKL, aPKs and several PKs in the OPK family. Similar to other apicomplexans, the S. neurona kinome is devoid of PKC, the TKs, Alpha kinases, as well as RIO kinases. Further, the S. neurona kinome harbors two putative MAPK homologs, a finding that is similar to some apicomplexans, such as P. falciparum. The S. neurona kinome also lacks some of the ROPKs that have been implicated in the virulence of T. gondii. Given the central roles played by PKs in the regulation of host-parasite interactions and in the facilitation of parasite proliferation and differentiation, delineation of the S. neurona kinome offers a platform for future development of efficacious drugs for EPM, for instance via parasite transmission-blocking strategies that rely on specific inhibition of the parasite's PKs. This approach is made possible by the differences between parasite and host PK homologs [76]. Zhang et al. [77] reviewed the applications and the progress made in the targeting of specific PKs as antimalarial drugs against Plasmodium parasites. Proof of principle of this approach has been demonstrated by the inhibition of human PKs using chemical ligands to treat cancers and other diseases [78,79]. Recently, Ojo et al. [80] provided evidence that PKs can be targeted with rationally-designed drugs that can potently inhibit the growth of S. neurona. The technology is available and approved for therapeutic intervention, thus offering a unique prospect of repurposing chemical ligands to manage S. neurona infections [81]. It is, however, important to note that the S. neurona putative PKs must be validated experimentally to facilitate the development of anti-parasitic interventions. A potential approach is the application of genetically-encoded sensors to identify inhibitors of important parasite signaling pathways.

Genome-Wide Identification of Putative S. neurona PKs
The predicted S. neurona proteome was downloaded from the Toxoplasma Genomics Resource database (Release 28; Version May 2016) [42]. A hidden Markov model (HMM) profile of signature PK domains obtained from the Kinomer database v 1.0 [82] was used to search for S. neurona kinases using HMMER v 3.1b2 [83]. Sequences containing the PK domain (IPR000719) or the PK-like domain (IPR011009) were considered putative kinases. Annotation of the putative kinase sequences was performed by BLASTp search against the non-redundant (nr)-NCBI protein and UniProtKB/Swiss-Prot databases at an e-value of ≤10^-6. The identified S. neurona putative PKs were subsequently classified by BLASTp searches against KinBase [84]. Gene ontology (GO) mapping was performed using Blast2GO v 4.0.7 [37]. The molecular weight (Mw) and isoelectric point (pI) were obtained using the ExPASy compute pI/Mw tool [85]. Motif analysis was performed with the MEME Suite v 4.11.2 [86]. The parameters were as follows: number of repetitions, any; maximum number of motifs, 30; and optimum motif widths, between 6 and 200 residues.
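The e-value cutoff used in the annotation step can be applied to BLAST results with a few lines of code. The Python sketch below is an illustration, not the pipeline used in this study: it assumes the hits were written in the standard 12-column BLAST+ tabular format (`-outfmt 6`), which the text does not specify, and it keeps the highest-bit-score hit per query. The function name and the example identifiers are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, max_evalue: float = 1e-6) -> dict:
    """Return the best hit per query among hits with e-value <= max_evalue."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Made-up hits: query2 fails the e-value cutoff and is dropped.
    example = io.StringIO(
        "query1\tsubjA\t92.0\t300\t24\t0\t1\t300\t1\t300\t1e-150\t520\n"
        "query1\tsubjB\t40.0\t280\t160\t8\t5\t284\t10\t289\t2e-30\t120\n"
        "query2\tsubjC\t25.0\t90\t60\t7\t1\t90\t1\t90\t0.5\t30\n"
    )
    for query, hit in best_hits(example).items():
        print(query, hit["sseqid"], hit["evalue"])
```

Ranking by bit score rather than e-value avoids ties among very strong hits whose e-values are all reported as 0.0.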

Phylogenetic Analysis
Phylogenetic trees were constructed to decipher the orthologous and paralogous relationships of S. neurona kinases. Protein kinase domains from putative S. neurona kinase groups were extracted and aligned with protein kinase domains from their homologs in T. gondii [42], H. hammondi and N. caninum using MUSCLE [87]. The alignments were subsequently curated manually in Jalview [88] to remove uncertain regions caused by gaps and poor alignment. Phylogenetic reconstruction was undertaken using the maximum likelihood programs PhyML 3.0 [89] and RAxML v 8.0 [90] and the Bayesian inference program MrBayes v 3.2 [91]. For PhyML, the LG substitution model was selected, assuming an estimated proportion of invariant sites and four gamma-distributed rate categories to account for rate heterogeneity across sites. The gamma shape parameter was estimated directly from the data. The robustness of internal branches was evaluated using 100 bootstraps. MrBayes was run for 5,000,000 generations with two runs and four chains in parallel and a burn-in of 25%. The resulting trees were rendered with the Interactive Tree of Life server (iTOL) [92].
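To make the 25% burn-in concrete, the following Python sketch shows how many posterior samples would be retained from a 5,000,000-generation run. The sampling frequency is not reported above, so sampling every 1,000 generations is a purely hypothetical value chosen for illustration; the function name is likewise illustrative and does not correspond to any MrBayes command.

```python
def discard_burn_in(samples: list, burn_in_fraction: float = 0.25) -> list:
    """Drop the first `burn_in_fraction` of MCMC samples and return the rest."""
    if not 0.0 <= burn_in_fraction < 1.0:
        raise ValueError("burn_in_fraction must be in [0, 1)")
    n_discard = int(len(samples) * burn_in_fraction)
    return samples[n_discard:]


if __name__ == "__main__":
    total_generations = 5_000_000   # from the text
    sample_every = 1_000            # assumption: not stated in the text
    # Generation numbers at which a tree would have been sampled.
    sampled = list(range(sample_every, total_generations + 1, sample_every))
    kept = discard_burn_in(sampled, burn_in_fraction=0.25)
    print(len(sampled), len(kept), kept[0])
```

Under these assumptions each run would contribute 3,750 of its 5,000 sampled trees to the consensus, starting from generation 1,251,000.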