Unique Mitochondrial Single Nucleotide Polymorphisms Demonstrate Resolution Potential to Discriminate Theileria parva Vaccine and Buffalo-Derived Strains

Distinct pathogenic and epidemiological features underlie different Theileria parva strains resulting in different clinical manifestations of East Coast Fever and Corridor Disease in susceptible cattle. Unclear delineation of these strains limits the control of these diseases in endemic areas. Hence, an accurate characterization of strains can improve the treatment and prevention approaches as well as investigate their origin. Here, we describe a set of single nucleotide polymorphisms (SNPs) based on 13 near-complete mitogenomes of T. parva strains originating from East and Southern Africa, including the live vaccine stock strains. We identified 11 SNPs that are non-preferentially distributed within the coding and non-coding regions, all of which are synonymous except for two within the cytochrome b gene of buffalo-derived strains. Our analysis ascertains haplotype-specific mutations that segregate the different vaccine and the buffalo-derived strains except T. parva-Muguga and Serengeti-transformed strains suggesting a shared lineage between the latter two vaccine strains. Phylogenetic analyses including the mitogenomes of other Theileria species: T. annulata, T. taurotragi, and T. lestoquardi, with the latter two sequenced in this study for the first time, were congruent with nuclear-encoded genes. Importantly, we describe seven T. parva haplotypes characterized by synonymous SNPs and parsimony-informative characters with the other three transforming species mitogenomes. We anticipate that tracking T. parva mitochondrial haplotypes from this study will provide insight into the parasite’s epidemiological dynamics and underpin current control efforts.


Introduction
The protozoan parasite Theileria parva that causes East Coast fever (ECF) and Corridor Disease (CD) is considered among the most debilitating tick-borne pathogens in cattle over its endemic range pure parasite DNA free from host-DNA contamination [26]. An accurate determination of the origin (buffalo or cattle derived) and geographic spread of strains will help intervention and control efforts. Additionally, accurate characterization of T. parva strains will help to track their frequency and distribution in specific populations, and to characterize breakthroughs in areas of live vaccine field deployments. Further, since T. parva has sexual reproductive phases that are associated with genetic recombination [27], unraveling the parasite genotypes could enhance the understanding of the long-term effects of live vaccine components in the field.
Owing to limited or no recombination, uniparental inheritance patterns and a high substitution rate relative to nuclear genomes, mitochondrial genome studies on related apicomplexan parasites have provided clues of the geographical origin and variants of parasites [28,29]. However, the utility of mitochondrial genomes in T. parva in delimiting the strains and their geographical origin remains unexplored. In this study, we sequenced the mitochondrial genomes of ten T. parva strains, found within the parasite's currently known endemic range, as well as some characterized isolates used as vaccine strains. We also included the mitogenomes of nine other T. parva isolates assembled from their whole-genome data that are publicly available [30]. Further, this study assessed the divergence of T. parva from the closely related host-leukocyte transforming species T. annulata, T. taurotragi and T. lestoquardi with an aim to identify phylogenetically informative mitochondrial characters.

Source of Isolates
The parasite material for the different strains consisted of infected frozen ground-up tick supernatants (GUTS), salivary glands (SG), cattle whole blood, or infected lymphocyte cell lines ( Table 1). The GUTS and SG were collected from archived T. parva-infected R. appendiculatus specimens from early live vaccination projects. Parasite DNA from GUTS, SG, and cell culture sample sources was extracted using the NucleoSpin Tissue kit (Macherey-Nagel, Düren, Germany), whereas the NucleoSpin Blood Mini (Macherey-Nagel) was used to extract DNA from blood samples.  [39] * Indicates the year the material used for this study was created, which may differ from the time when the parasite was first isolated for some strains. n.a.: not available.

Next-Generation Sequencing (NGS) T. parva Datasets
We additionally obtained publicly available whole-genome datasets of nine T. parva strains (DRR002439-46), downloaded in FASTQ from the NCBI (SRA accession number: DRA000613) for assembly of their mitogenomes sequences (Supplementary Table S2). The details of the parasite strains are described in a previous study [30]. All NGS datasets comprised 36 nucleotide, single-end sequence runs performed on the Illumina GAII Analyzer [30].

Mitogenome Amplification and Sequencing
Primers were designed based on an alignment of T. parva and T. annulata mitogenomes available in the GenBank (Accession nos. AB499089 and NW_001091933, respectively). A 5808 bp fragment was amplified from all isolated DNA extracts (0.5-5 ng) in 25 µL reaction volumes comprising; 0.5 U of S7 Fusion polymerase (Biozym Scientific, Hessisch Oldendorf, Germany), 5× GC Phusion buffer (ThermoFisher Scientific GmbH, Darmstadt, Germany), 200 mM of dNTPs mix, and 0.5 µM of each primer (Supplementary Table S1). The cycling conditions were as follows: 98 • C for 30 s, followed by 44 cycles of 98 • C for 15 s, 60 • C for 25 s, and 72 • C for 4 min. The final extension step was maintained at 72 • C for 10 min. Amplification of expected~5.8 kb fragments was confirmed on 1.5% agarose gels stained with GRGreen DNA stain (Excellgene, Monthey, Switzerland) under UV trans-illumination. The amplicons were purified using GeneJET PCR Purification Kit (ThermoFisher Scientific GmbH, Darmstadt, Germany) before cloning using the Strataclone blunt vector (Agilent Technologies, USA) under the manufacturer's instructions. The plasmid was purified using the GenUP™ Plasmid Kit (biotechrabbit GmbH, Berlin, Germany) and evaluated for targeted inserts based on the EcoRI digestion (ThermoFisher Scientific GmbH, Darmstadt, Germany). The plasmids were sequenced by Sanger technology using standard vector primers (LGC Genomics GmbH, Berlin, Germany) and 10 primers designed in this study to amplify overlapping regions of the mitogenome (Supplementary Table S1).

Assembly, Mapping, and Annotation
The Sanger generated sequences were assembled in Geneious prime 2020.2.3 (www.geneious.com) by creating consensus sequences from the approximately 1000 bp overlapping reads aligned to a reference mitogenome (GenBank Accession: AB499089), which is based on T. parva Muguga vaccine strain. Similarly, the Illumina NGS reads were mapped with reference to (AB499089) using the Geneious mapper under medium-low sensitivity with fine-tuning of at least five iterations. Consensus sequences were generated from contigs based on a threshold of at least 60% of the adjusted chromatogram quality of contributing bases, while ignoring reads mapped to multiple locations on the reference. The same GenBank reference was used to map and annotate protein-coding genes (PCGs) and the known rRNA genes based on nucleotide similarities.

Phylogenetic Analysis and Identification of Informative Single Nucleotide Polymorphisms (SNPs)
Mitogenome sequence alignments generated using MAFFT [40] as well as concatenated alignments of cox1 and cob sequences with additional GenBank retrieved sequences of non-transforming Theileria spp. and Babesia spp. were used to infer maximum-likelihood phylogenies. We selected best-fit models for nucleotide substitution based on the lowest Bayesian information (BIC) scores calculated using the jModel Test 2 program and tested nodal support with 100 bootstrap replicates [41]. Phylogenetic trees were generated using PhyML implemented as a plugin within the Geneious software platform [42].
To avoid the challenges of missing data due to incomplete read coverage of the Illumina assemblies, we used only the Sanger data to generate the multiple alignments used for the SNPs detection. We aligned the ten T. parva Sanger-generated mitogenome sequences together with three other host-transforming species; T. taurotragi, T. lestoquardi and T. annulata (retrieved from GenBank: NW_001091933). SNPs were identified in Geneious prime with reference to the T. parva Muguga GenBank AB499089 sequence under default settings, with analysis of the variants on protein translations based on Mold-Protozoan Mitochondrial genetic code. We determined informative SNPs for the 13 mitogenomes under the parsimony optimality criterion with equal weights for all characters using PAUP*4.0 software [43].

T. parva Mitogenomes Haplotypes Definition and Network Analysis
Using a modified approach from [30], a second set of SNPs with consideration to the ten T. parva Sanger mitogenomes was extracted from the initial parsimony-informative SNPs. We used DnaSP v.6.12.03 on the second SNP data set to generate T. parva haplotype data [44] and a median-joining (M-J) network was constructed using Network V. 10 software (https://www.fluxus-technology.com/) under default settings to examine relationships among the T. parva mitogenomes [45]. Of the Illumina assembled mitogenomes, strains that had missing data with respect to the second SNP data set were excluded from the haplotype analysis.

Results
At least ten bidirectional overlapping Sanger reads were obtained for each strain, which were assembled into mitogenomes sequences ranging in size from 5800 to 5811 bp. The sequences are archived in NCBI's GenBank under accession numbers MW172707-MW172717; MW218514. The content and gene order for all ten T. parva, one T. taurotragi, and one T. lestoquardi mitogenomes were consistent with previous data, comprising three PCGs, fragmented rRNAs, and no tRNA [46] (Figure 1).

Divergence of T. parva from Other Host-Lymphocyte Transforming Theileria sp.
Due to length variations, we considered 5793 positions in the multiple alignment of the sanger-sequenced T. parva mitogenomes and the three additional host-lymphocyte transforming Theileria spp. Of the positions considered, there were 42 indels and 1036 SNPs. However, only 662 of the SNPs were parsimony informative across all mitogenome sequences. In terms of percentage identities of the PCG, cob was the most diverse gene, having 73.2-79.6% identity between T. parva Muguga strain and the three host-lymphocyte transforming Theileria ( Figure 2). Phylogenetic analyses were congruent both using the whole mitogenomes and the concatenated gene sequences. In all instances, T. annulata and T. lestoquardi consistently formed an outgroup clade to T. parva and T. taurotragi ( Figure 3; Appendix A, Figure A1).
Life 2020, 10, x FOR PEER REVIEW 7 of 16

Divergence of T. parva from Other Host-Lymphocyte Transforming Theileria sp.
Due to length variations, we considered 5793 positions in the multiple alignment of the sangersequenced T. parva mitogenomes and the three additional host-lymphocyte transforming Theileria spp. Of the positions considered, there were 42 indels and 1036 SNPs. However, only 662 of the SNPs were parsimony informative across all mitogenome sequences. In terms of percentage identities of the PCG, cob was the most diverse gene, having 73.2-79.6% identity between T. parva Muguga strain and the three host-lymphocyte transforming Theileria ( Figure 2). Phylogenetic analyses were congruent both using the whole mitogenomes and the concatenated gene sequences. In all instances, T. annulata and T. lestoquardi consistently formed an outgroup clade to T. parva and T. taurotragi ( Figure 3; Appendix A Figure A1).

T. parva Haplotype Analysis
We used the extracted second set of SNPs, which comprised nine informative SNPs for T. parva haplotype analysis. As previously noted, we included three Illumina assembled mitogenomes (Nyakizu from Rwanda, MandaliZ22H10, and Buffalo Z5E5 from Zambia) that had data on all the determined nine informative SNPs, irrespective of the other missing regions lacking reads coverage. In total, 13 T. parva strains were considered for the haplotype analysis. The SNPs segregated the T.

Divergence of T. parva from Other Host-Lymphocyte Transforming Theileria sp.
Due to length variations, we considered 5793 positions in the multiple alignment of the sangersequenced T. parva mitogenomes and the three additional host-lymphocyte transforming Theileria spp. Of the positions considered, there were 42 indels and 1036 SNPs. However, only 662 of the SNPs were parsimony informative across all mitogenome sequences. In terms of percentage identities of the PCG, cob was the most diverse gene, having 73.2-79.6% identity between T. parva Muguga strain and the three host-lymphocyte transforming Theileria (Figure 2). Phylogenetic analyses were congruent both using the whole mitogenomes and the concatenated gene sequences. In all instances, T. annulata and T. lestoquardi consistently formed an outgroup clade to T. parva and T. taurotragi (Figure 3; Appendix A Figure A1).

T. parva Haplotype Analysis
We used the extracted second set of SNPs, which comprised nine informative SNPs for T. parva haplotype analysis. As previously noted, we included three Illumina assembled mitogenomes (Nyakizu from Rwanda, MandaliZ22H10, and Buffalo Z5E5 from Zambia) that had data on all the determined nine informative SNPs, irrespective of the other missing regions lacking reads coverage. In total, 13 T. parva strains were considered for the haplotype analysis. The SNPs segregated the T.

T. parva Haplotype Analysis
We used the extracted second set of SNPs, which comprised nine informative SNPs for T. parva haplotype analysis. As previously noted, we included three Illumina assembled mitogenomes (Nyakizu from Rwanda, MandaliZ22H10, and Buffalo Z5E5 from Zambia) that had data on all the determined nine informative SNPs, irrespective of the other missing regions lacking reads coverage.
In total, 13 T. parva strains were considered for the haplotype analysis. The SNPs segregated the T. parva strains used into seven haplotypes, which we have identified in this study by assigning the TpmtH prefix numerically beginning with Muguga as a reference sequence (Figure 4). The Muguga strain isolate was assigned into one haplotype identical to Onderstepoort and Serengeti isolates (TpMtH1). Similarly, T. parva lawrencei (Manyara) isolated from an African buffalo, and Pugu I, both from Tanzania, together with a Zambian buffalo isolate (Buffalo Z5E5), formed one haplotype (TpMtH7) that was characterized by two SNPs within both the cox 1 and cob genes. We presumed Pugu I to have originated from buffalo T. parva based on analysis of its sporozoite surface (p67) antigen gene, which showed that it lacked the typical 129 bp deletion that is present in cattle transmissible T. parva [47,48]. The p67 sequence generated (See Appendix A for primer information and cycling conditions) for the Pugu I isolate is archived under accession no MW183674 in the GenBank. parva strains used into seven haplotypes, which we have identified in this study by assigning the TpmtH prefix numerically beginning with Muguga as a reference sequence (Figure 4). The Muguga strain isolate was assigned into one haplotype identical to Onderstepoort and Serengeti isolates (TpMtH1). Similarly, T. parva lawrencei (Manyara) isolated from an African buffalo, and Pugu I, both from Tanzania, together with a Zambian buffalo isolate (Buffalo Z5E5), formed one haplotype (TpMtH7) that was characterized by two SNPs within both the cox 1 and cob genes. We presumed Pugu I to have originated from buffalo T. parva based on analysis of its sporozoite surface (p67) antigen gene, which showed that it lacked the typical 129 bp deletion that is present in cattle transmissible T. parva [47,48]. The p67 sequence generated (See appendix A for primer information and cycling conditions) for the Pugu I isolate is archived under accession no MW183674 in the GenBank. Further, the Marikebuni strain originating from the Kenyan coast, Mandali from Zambia, and Satinsyi strain from Rwanda formed one haplotype (TpMtH3) characterized by two SNP transitions relative to T. parva Muguga (Table 2; Figure 4). The Marula and Kiambu-V isolates formed independent haplotypes (TpMtH4 and 5), but differed with one SNP position (119) between them ( Figure 5). The Boleni isolate (TpMtH6) possessed a transversion mutation within the cox1 gene (SNP 584). This transversion mutation was also notable within the buffalo haplotype (TpMtH7). Interestingly, all haplotypes deviate from TpMtH1 by a transition (A→G) at SNP position 4060, which lies in a currently functionally unknown region, but appears to be ancestral in the other transforming Theileria (Table 2; Figure 5). This transition is the single defining SNP of the Nyakizu (Rwanda) strain from Muguga, and makes TmMtH2 a central node from which all other haplotypes deviate. However, there was no apparent differentiation by geographic origin as the M-J network nodes associated with multiple haplotypes clustered isolates of diverse origin (Figure 4). Further, the Marikebuni strain originating from the Kenyan coast, Mandali from Zambia, and Satinsyi strain from Rwanda formed one haplotype (TpMtH3) characterized by two SNP transitions relative to T. parva Muguga (Table 2; Figure 4). The Marula and Kiambu-V isolates formed independent haplotypes (TpMtH4 and 5), but differed with one SNP position (119) between them ( Figure 5). The Boleni isolate (TpMtH6) possessed a transversion mutation within the cox1 gene (SNP 584). This transversion mutation was also notable within the buffalo haplotype (TpMtH7). Interestingly, all haplotypes deviate from TpMtH1 by a transition (A→G) at SNP position 4060, which lies in a currently functionally unknown region, but appears to be ancestral in the other transforming Theileria (Table 2; Figure 5). This transition is the single defining SNP of the Nyakizu (Rwanda) strain from Muguga, and makes TmMtH2 a central node from which all other haplotypes deviate. However, there was no apparent differentiation by geographic origin as the M-J network nodes associated with multiple haplotypes clustered isolates of diverse origin (Figure 4).  TpMtH3  Transition  C→T  GCC→GCT  951  TpMtH4  Transition  C→T  CTG→TTG  76  TpMtH5  Transition  G→A  GTG→GTA  36  Transition  C→T  CTG→TTG  76  TpMtH6  Transversion  A→C  GTA→GTC  501  TpMtH7  Transversion  A→C  GTA→GTC  501  Transition  C→T  TAC→TAT TpMtH2  Transition  A→G  4060  TpMtH3  Transition  A→G  4060  TpMtH4  Transition  A→G  4060  TpMtH5  Transition  T→C  1924  Transition  A→G  4060  TpMtH6  Transition  A→G  4060  TpMtH7  Transition  A→G  4060  Transversion  T→A  4382 Life 2020, 10, x FOR PEER REVIEW 9 of 16 Table 2. SNPs among the seven haplotypes based on the GenBank reference AB499089. The reference sequence matches haplotype TpMtH1 in the present study. Cox1  TpMtH3  Transition  C→T  GCC→GCT  951  TpMtH4  Transition  C→T  CTG→TTG  76  TpMtH5  Transition  G→A  GTG→GTA  36  Transition  C→T  CTG→TTG  76  TpMtH6  Transversion  A→C  GTA→GTC  501  TpMtH7  Transversion  A→C  GTA→GTC  501  Transition  C→T  TAC→TAT TpMtH2  Transition  A→G  4060  TpMtH3  Transition  A→G  4060  TpMtH4  Transition  A→G  4060  TpMtH5  Transition  T→C  1924  Transition  A→G  4060  TpMtH6  Transition  A→G  4060  TpMtH7  Transition  A→G  4060  Transversion T→A 4382

Intraspecific Divergence among T. parva Strains
Relative to the GenBank reference AB499089, an alignment of 10 Sanger-sequenced T. parva mitogenomes showed variation at 11 sites, all of which were SNPs with no indels observed ( Figure  1). Of the 11 SNPs, only three were found within the intergenic region and involved one transversion within the haplotype associated with buffalo T. parva. In total, there were three transversion SNPs positions, two of which were observed within the cob gene of the buffalo-associated haplotype. Among the genes, cox3 was most conserved with only a single SNPs position within the Boleni

Intraspecific Divergence among T. parva Strains
Relative to the GenBank reference AB499089, an alignment of 10 Sanger-sequenced T. parva mitogenomes showed variation at 11 sites, all of which were SNPs with no indels observed (Figure 1). Of the 11 SNPs, only three were found within the intergenic region and involved one transversion within the haplotype associated with buffalo T. parva. In total, there were three transversion SNPs positions, two of which were observed within the cob gene of the buffalo-associated haplotype. Among the genes, cox3 was most conserved with only a single SNPs position within the Boleni mitogenome, while the cox1 gene had five mutated positions, all of which were synonymous. The remaining two SNPs that were found within the cob gene sequences were non-synonymous and resulted in two contiguous amino acid substitutions in their predicted proteins (Table 2). Both substitutions involved the valine codon, which was replaced by an alanine amino acid codon. Notably, these substitutions were only in the haplotype associated with the buffalo T. parva isolates.

Discussion
In this study, we describe promising mitogenome-based SNPs that demonstrate precision and convenience in characterizing T. parva strains. Previously identified nuclear-based markers mainly based on a panel of mini-and micro-satellites are sometimes biased due to selective amplification of predominant strain clonotypes during passages through cattle and ticks [27,49]. In addition, since the design of the initial markers relied on the genome of T. parva Muguga stock, some markers are possibly biased in detecting diversity within this stock [50]. Although whole-genome SNPs analysis has been demonstrated to have high discriminatory power in typing vaccine strains, it is yet to find field applications [50,51]. Additionally, obtaining pure parasite DNA for whole-genome sequencing, especially for buffalo-derived T. parva is a hurdle due to its biology and may be complicated in the field where the parasite exists as a mixed diverse population [23,27].
Mitochondrial genomes and their individual genes have been extensively used to study phylogeny and applied in species identification and delimitation across broad taxonomic levels [52]. The majority of apicomplexan mitochondrial genomes that have been sequenced to date exhibit an extreme size reduction, containing at most three protein genes (cox1, cox3, and cob) and fragmented rRNA genes [53]. The extreme mitogenomes size reduction and a faster coalescence make mtDNA attractive to study differentiated T. parva population strains. Indeed, our analysis revealed haplotype-defining SNPs within the T. parva mitogenomes, which are parsimonious with other host-leukocytes transforming Theileria species. Based on median joining (MJ) parsimony analysis, the T. parva mtDNA sequences generated were segregated into an unambiguous network, congruent with the existence of multiple linked lineages. With respect to the T. parva Muguga isolate haplotype, each haplogroup was defined by synonymous nucleotide changes, except for non-synonymous changes leading to amino acid substitutions within the cob of the buffalo-associated strains; T. parva lawrencei, T. parva Buffalo Z5E5, and one field isolate, T. parva Pugu I.
Our analysis identifies T. parva Muguga and Serengeti-transformed as belonging to the same haplotype characterized by nine defining SNPs positions from the strains used in this study. This is not surprising as previous studies have demonstrated a similar monoclonal antibody profile and conservation on their known T. parva antigen coding genes [50,54]. Further, the two strains are strikingly similar at the whole genome level, with only 420 non-synonymous substitutions in Serengeti-transformed relative to the Muguga reference genome reported [50]. These substitutions occur in a paltry 53 genes (out of over 4000 T. parva genes) mainly within polymorphic multicopy gene families and ATP-binding cassette transporter genes located in subtelomeric ends [50]. With the almost similar identity of the two strains, our results, in addition to previous studies, question the necessity of both Muguga and Serengeti-transformed in the trivalent cocktail instead of a divalent cocktail containing either of the two and Kiambu-V. Interestingly, both Muguga and Serengeti-transformed T. parva strains also shared the same haplotype with a historical isolate T. parva Onderstepoort, a laboratory maintained stock isolated in 1937 on the farm Schoonspruit in the Transvaal, South Africa prior to ECF eradication in this country [1,37,55]. Earlier analyses on three T. parva antigen proteins; the Polymorphic immunodominant molecule (PIM), sporozoite surface protein (p67), and p104, have shown that these nuclear-encoded antigen genes are, in fact, identical to those of the Muguga parasite [50,56]. It is thus conceivable that the ECF-causing strains derive from a common lineage that can be inferred at the mitochondrial genome.
An important finding of this study is the clustering of buffalo-derived T. parva strains under one haplotype (TpMtH7) with the same nine SNPs. It is noteworthy that the buffalo strains used in this study originate from two different countries (Zambia and Tanzania), while the field isolate (Pugu I) was isolated during vaccine field trials in Tanzania. And although a Kenyan Buffalo (T. parva LAWR) from the NGS assembly was not included in the haplotype analysis due to missing data on SNP position (159), all its other SNPs positions also matched haplotype (TpMtH7) (data not shown). The buffalo has long been recognized as the natural reservoir of T. parva. The T. parva strains maintained in cattle are considered a subset population from that maintained in buffalo [23,25]. However, there has not been a definitive genetic basis to differentiate what constitutes a buffalo-derived T. parva and a cattle-derived T. parva or whether their designation as a single species is justified [23,56]. The available approach of their differentiation based on the p67 alleles only provides a preliminary indication of presumptive exposure of cattle to buffalo T. parva based on alleles-2, 3, and 4, which are considered highly probable to be of buffalo origin in contrast to allele-1 that is found in cattle transmitted T. parva, but does not necessarily preclude its presence in buffalo [48,49,57]. Our analysis suggests strain defining mitochondrial SNPs that are potential markers for buffalo-derived T. parva lineages.
Noticeably, the Boleni strain formed a separate haplotype (TpMt6) that shared a transversion mutation within the coxI gene with the buffalo haplotype. This strain was isolated from Zimbabwe from a farm that had experienced a severe theileriosis outbreak in January 1978 [31]. Under the now obsolete trinomial nomenclature of T. parva, it was named Theileria parva bovis, which was associated to what was referred to as January disease [58]. The delineation of this strain from our data is thus a significant find as it agrees with the epidemiological distinctions that have been apparent from earlier investigations on theilerioses caused by T. parva. Further, our analysis identifies the Kenyan-Marekebuni, Zambian-Mandali, and Rwandese Satinsyi strains as one haplotype (TpMtH3). Although the shared haplotypes from widely separated regions may suggest a lack of geographical differentiation of the haplotypes, our observations could also be because of a limited sample size as well as through spread by carrier animals.
A high level of interspecies divergence among the transforming Theileria is observed that is characterized by up to 42 indels with respect to T. parva Muguga. However, a limited polymorphism is observed amongst the 13 T. parva mitogenomes analyzed, which is also observed in other apicomplexan species such as Plasmodium falciparum [59]. Of the eleven T. parva SNPs observed, only nine were informative. We modestly suppose this may be convenient compared to whole-genome-based SNPs in which up to >120,000 SNPs have been observed in buffalo strains alone [30]. Additionally, we think our approach to defining SNPs that are foremost parsimonious with other leukocyte transforming Theileria provides initial indications on the potential of the identified SNPs to be informative for typing of recently diverged field T. parva strains from common leukocyte transforming ancestor. Nonetheless, further investigation to test the utility of the SNPs is necessary with a larger field population across the T. parva endemic range, especially in wildlife-livestock areas where 'breakthrough' infections against the trivalent live vaccine are known to occur.
The phylogenetic analysis of both the full-length near-complete mitochondrial genomes and the concatenated cox1 and cob genes place T. parva and T. taurotragi in one clade, consistent with previous analyses using nuclear genes such as the 18S RNA gene. The same phylogenetic tree topology is maintained with the sporozoite surface protein gene and its orthologues in respective leukocyte host-transforming species [60]. Thus, the mitogenomes data's observed congruency with nuclear-based data rules out possibilities of inheritance patterns specific to mitochondria in our analysis.
Our data indicate T. annulata and T. lestoquardi form an outgroup clade among the transforming parasites, reflecting an allopatric speciation separation from T. parva and T. taurotragi, and conforms to their currently known demography. Noticeably, T. taurotragi was initially described as a parasite of the eland (Taurotragus oryx) [61], but is also reported to cause infections in cattle in the known endemic range (Eastern, Central, and Southern Africa) of T. parva and its tick vector R. appendiculatus, alongside other tick vectors [38]. As such, co-infections of T. taurotragi and T. parva are, in fact, frequently common [62]. While the pathogenicity of T. taurotragi in cattle is not clearly understood, it has been shown to transform a wide range of host cells in in vitro studies [63], and has been associated with cases of cerebral theileriosis (BCT) [38,64].
Similarly sympatric, T. annulata and T. lestoquardi, occur within the same currently known endemic range (N. Africa, S. Asia, and S. Europe) and are transmitted by ticks belonging to Hyalomma genus. Both are important parasites responsible for heavy economic losses and have an intertwined epidemiology that poses interpretation challenges in their overlap in affected countries [65]. Theileria lestoquardi is a parasite of small ungulates and causes malignant ovine theileriosis, while T. annulata causes bovine tropical theileriosis but also co-infects with the former in sheep [65][66][67].
In conclusion, this study catalogs SNPs based on mitogenomes of characterized T. parva strains and vaccine stocks that can facilitate their tracking in the field. We identify haplotypes defined by SNPs that are initially parsimonious among transforming Theileria; T. parva, T. annulata, T. taurotragi, and T. lestoquardi mitogenomes, the latter two reported herein for the first time. We anticipate that the knowledge of the circulating haplotypes with reference to the live vaccine strains haplotypes will be insightful in characterizing T. parva epidemiology with important implications for control, and have a predictive value on the success of live vaccine deployments besides characterization of breakthrough infections.