An Update of Bovine Hemoplasmas Based on Phylogenetic and Genomics Analysis

Mycoplasma wenyonii and ‘Candidatus Mycoplasma haemobos’ are bacteria that have been described as significant hemoplasmas that infect cattle worldwide. Currently, three bovine hemoplasma genomes are known. This work aimed to describe the main genomic characteristics and the evolutionary relationships between hemoplasmas, and provide a list of epitopes predicted by immunoinformatics as diagnostic candidates for bovine hemoplasmosis. Thus far, there is no vaccine to prevent this disease that economically impacts cattle production worldwide. Additionally, there is a lack of vaccines against bovine hemoplasmosis. In this work, we performed a genomic characterization of hemoplasmas, including two Mexican strains reported in bovines in the last few years. The generated information is a new scenario about the phylogeny of hemoplasmas. Also, we show genomic features among hemoplasmas that strengthen their characteristic genome plasticity of intracellular lifestyles. Finally, the elucidation of antigenic proteins in Mexican strains represents an opportunity to develop molecular detection methods and diagnoses.


Introduction
Hemotrophic mycoplasmas (hemoplasmas) are a group of erythrocytic bacteria pathogens of the Mollicutes class that infect a wide range of vertebrate animals [1,2]. At first, these small and uncultivable in vitro bacteria were classified as genera Haemobartonella and Eperythrozoon within the Anaplasmataceae family and Rickettsiales order [2]. However, the genetic analysis of the 16S ribosomal RNA (rRNA) gene and morphologic similarities showed that these bacteria are closely related to the Mycoplasma genus [1,3]. In 2001, a formal proposal was presented to transfer these organisms to the genus Mycoplasma within the Mycoplasmataceae family [4]. Currently, 12 hemoplasma genomes have been identified in the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/, accessed on 15 August 2022), including Mycoplasma (M.) wenyonii strains: Massachusetts and INIFAP02, and one Candidatus (Ca.) Mycoplasma haemobos strain, INIFAP01 [5][6][7]. These hemoplasma genomes provided relevant information about possible pathogenic mechanisms, metabolism, and divergence compared to other mycoplasma species [1].
Currently, M. wenyonii and Ca. M. haemobos are significant hemoplasmas infecting cattle worldwide [8][9][10]. In cattle, acute hemoplasma infections are rare and characterized by anemia, fever, depression, and diarrhea [11,12]. Chronic bovine hemoplasma infections are associated with variable clinical signs (that is often confused with clinical signs of diseases such as anaplasmosis), including low-grade bacteremia, weight loss, decreased

Genome Sequences and Annotation
In Mexico, the Anaplasmosis Unit (CENID-SAI, INIFAP) reported the draft genomes of two Mexican strains of hemoplasmas that infect cattle: M. wenyonii INIFAP02 and 'Ca. M. haemobos' INIFAP01. This whole-genome shotgun project was deposited at DDBJ/ENA/GenBank under accession no. LWUJ00000000 and QKVO01000000. Both genomes were de novo assembled using the SPAdes version 3.11.1 program.
The 12 hemoplasma genomes (two Mexican strains and ten strains from other countries) that infect different hosts, included in this study and reported in the GenBank database (https://bit.ly/314fOre, accessed on 20 August 2022, are listed in Table S1. The general features of the 12 hemoplasma genomes were obtained using the quality assessment tool for genome assemblies (QUAST) (v5.0.2) program [21] with default settings. All genomes were annotated automatically to predict the coding sequences (CDS) using the rapid annotation using subsystem technology (RAST) (v2.0) server (https://bit.ly/2XjTTey, accessed on 22 August 2022) [22] with the classic RAST algorithm.

Phylogenetic and Pangenome Analysis
For the phylogenetic reconstruction, we used 39 16S rRNA gene sequences that correspond to 12 16S rRNA gene sequences reported in the 12 hemoplasma genomes, including the two Mexican strains, and 3 16S rRNA gene sequences reported for bovine hemoplasmas whose genomes have not been reported. We also included 22 16S rRNA gene sequences from hemoplasmas that do not infect bovines, but other animals such as swine, alpaca, felines, sheep, goats, and others. As an outgroup, we used two 16S rRNA gene sequences from Ureaplasma spp. of the family Mycoplasmataceae that infects humans. In total, 39 16S rRNA gene sequences were used in the phylogenetic reconstruction.
Two pangenome analyzes were performed using the GET_HOMOLOGUES (v3.3.2) software package [29] with the following options: (i) among the 12 hemoplasma genomes, and (ii) among the 3 genomes of bovine hemoplasmas. The Fasta amino acid (FAA) annotation files of hemoplasma genomes were used as input files by the GET_HOMOLOGUES software package. The get_homologues.pl and compare_clusters.pl Perl scripts were used to compute a consensus pan-genome, resulting from clustering the all-against-all protein BLAST (Blastp) results with the COGtriangles and OMCL algorithms. The pan-genomic analysis was performed using the binary (presence-absence) matrix.

Comparative Genomics
The average nucleotide identity (ANI) values of 12 hemoplasma genomes were calculated using the calculate_ani.py Python script (https://bit.ly/2X96hho, accessed on 27 August 2022) with the BLAST-based ANI (ANIb) algorithm. Ultimately, the level of conserved genomic sequences of bovine hemoplasmas was visualized by the alignment of the genomes of the Mexican strains ('Ca. M. haemobos' INIFAP01 and M. wenyonii INIFAP02) against the reference genome of M. wenyonii, Massachusetts, using the NUCmer program of MUMmer (v3.0) software package [30] to obtain the positions of nucleotides that were aligned, and Circos (v0.69-9) software package [31]. The circular comparative genomic map of bovine hemoplasmas was edited with Adobe Photoshop CC (v14.0 x64).

Prediction of Antigenic Proteins
After RAST annotation, we identified several proteins of the six subsystems: virulence, disease and defense, cell division and cell cycle, fatty acids, lipids and isoprenoids, regulation and cellular signaling, stress response, and DNA metabolism.
For Ca. M. haemobos INIFAP01, the proteins were selected from three subsystems including virulence, disease and defense, cell division and cell cycle, and stress response. For M. wenyonii INIFAP02, the subsystems selected were virulence, disease and defense, cell division and cell cycle, fatty acids, and lipids and isoprenoids. The selection of the subsystems was based on the fact that these were proteins with the potential to be antigenic, as shown in an initial antigenicity prediction with SVMTrip (http://sysbio.unl.edu/SVMTriP/, accessed on 20 August 2022). Finally, we selected 11 protein sequences of M. wenyonii INIFAP02 and 12 proteins of Ca. M. haemobos INIFAP01 that were submitted to the VaxiJen v2.0 server to predict protective antigens (http://www.ddgpharmfac.net/vaxijen/VaxiJen/VaxiJen.html, accessed on 20 August 2022) with default parameters.

Prediction of Subcellular Localization and Stability of Proteins
Predicted antigenic proteins of M. wenyonii INIFAP02 and Ca. M. haemobos INI-FAP01 were submitted to predict the secondary structure server Raptor X (http://raptorx. uchicago.edu/, accessed on 21 August 2022).

Linear B-Cell Epitope Prediction and Three-Dimensional Modeling
B-cell epitopes of M. wenyonii INIFAP02 and Ca. M. haemobos INIFAP01 were predicted using BCEpred (https://webs.iiitd.edu.in/raghava/bcepred/bcepred_submission.html, accessed on 21 August 2022) predicts based on physicochemical properties such as hydrophilicity, flexibility, accessibility, turns, exposed surface, polarity, and antigenic propen- The PHYRE2 server was used to predict the tridimensional structure of the proteins of both hemoplasmas. Phyre2 PDB files were visualized with the Protter tool.

General Features of Genomes
Of the twelve hemoplasma genomes, two genomes assembled as contigs (Ca. M. haemobos INIFAP01 and M. wenyonii INIFAP02) and ten genomes as a single chromosome. The features of the 12 hemoplasma genomes are shown in Table 1.  The mapping of rRNA genes shows that hemoplasmas are separated into two groups. The four genomes of group 1 contain one copy of the 16S-23S-5S rRNA operon ( Figure 1A). The 16S rRNA gene sequence length of group 1 ranges from 1429 to 1486 bp. Conversely, seven genomes of group 2 contain one copy of the 16S rRNA gene ranging from 1469 to 1508 pb, separated from one copy of the 23S-5S rRNA operon ( Figure 1B,C). In addition, the genome of M. ovis Michigan of group 2 contains two copies of the 16S rRNA gene with lengths of 1479 and 1493 bp, separated from each other and separated from one copy of the 23S-5S rRNA operon ( Figure 1D). rRNA gene with lengths of 1479 and 1493 bp, separated from each other and separated from one copy of the 23S-5S rRNA operon ( Figure 1D).

Phylogenetic and Pangenome Analyzes
The model of nucleotide substitution of the phylogenetic tree based on the 16S rRNA gene of hemoplasmas was GTR + I + G. The phylogenetic tree shows that the hemoplasmas arrange into two groups (

Phylogenetic and Pangenome Analyzes
The model of nucleotide substitution of the phylogenetic tree based on the 16S rRNA gene of hemoplasmas was GTR + I + G. The phylogenetic tree shows that the hemoplasmas arrange into two groups (Figure 2  Pangenome analysis among the 12 hemoplasmas shows that the core, softcore, shell, and cloud genomes are composed of 110, 146, 787, and 3099 gene clusters, respectively (Figures 3 and S1). Additionally, the core genomes of groups 1 and 2 of hemoplasmas are composed of 236 and 149 gene clusters, respectively. Pangenome analysis among the three genomes of bovine hemoplasmas shows that the core genome comprises 154 gene clusters.

Comparative Genomics
ANIb values between different hemoplasma species show that the alignment coverage is less than 79% ( Figure 4A and Table S2), and the identity is less than 83% ( Figure 4B and Table S3)

Comparative Genomics
ANIb values between different hemoplasma species show that the alignment coverage is less than 79% ( Figure 4A and Table S2), and the identity is less than 83% ( Figure 4B and Table S3) INIFAP01 genome only has three small regions (red lines highlighted with a green marker in the inner track) greater than 78% identity aligned with the M. wenyonii Massachusetts genome (the black circle in the outer track). Furthermore, the circular map shows that M. wenyonii INIFAP02 has few regions (blue lines in the intermediate track) greater than 78% identity aligned with the M. wenyonii Massachusetts genome ( Figure S2).

Selection and Prediction of B-Cell Epitopes in Proteins
After the hemoplasmas genomes were annotated in the RAST server, only 18% of Ca. M. haemobos INIFAP01 proteins are classified in the different subsystems of RAST and 22% proteins for M. wenyonii. This fact exhibits a low percentage of known proteins in this database. For this reason, we manually reviewed the annotated proteins in the subsystems of both hemoplasmas. Then we selected those proteins whose category suggests a protein with potential antigenicity. The sequences of 11 proteins of M. wenyonii and 12 proteins of Ca. M. haemobos were submitted to VaxiJen, and the prediction of antigen/non-antigen for each selected protein was calculated ( Table 2). The tridimensional structure of the proteins was performed to locate the predicted epitopes.  The collection of B-cell epitopes (linear antigens) was predicted with SVMTrip, and BCEPred server for those proteins predicted as "Antigen" by VaxiJen (Table 3).

Discussion
Hemoplasmas have constantly been detected in cattle in the last few years [32][33][34][35][36]. However, their transmission by ticks lacks evidence. On the contrary, their mechanical transmission mediated by blood-sucking flies and contaminated veterinary instruments could be essential for hemoplasmas dispersion [37,38]. In this scenario, information about hemoplasmas contributes to a better understanding of these pathogens and their impact on animal health. In this regard, we performed comparative genomic and immune-informatics studies of several hemoplasmas genomes reported worldwide.
This study aimed to provide information about hemoplasmas derived from a comparative genomic and how these data, along with immuno-informatics, provided potential antigenic candidates with the potential to be used in detection and diagnosis methods. This work comprises an analysis of the hemoplasmas reported in the last few years, including two Mexican strains, representing the most recent information in the field.
Hemoplasmas have undergone phylogenetic reclassification after several studies based on molecular markers [39]. Their genome size variation, positional shuffling of genes, and poorly conserved gene synteny is evidence of the high dynamics of their genomes [1]. All the genomic differences of the Mexican bovine hemoplasmas confirm this dynamic. In group 1, Ca. M. haemobos' INIFAP01, we found the canonical structure of the rRNA operon, 16S-23S-5S; however, in group 2 M. wenyonii INIFAP02, we identified an unlinked rRNA gene structure (Figure 1). The canonical structure in bacteria allows a rapid response to the demand of changing growth conditions, whereas unlinked rRNA genes result in the genome degradation typical of obligate intracellular lifestyles, as reported in members of the order of Rickettsiales [40][41][42].
We also present 12 genomes of hemoplasmas classified into two groups (groups 1 and 2) and with different numbers of CDS and tRNAs. Additionally, the distribution of rRNA genes is specific to each species. Specifically, the genome of 'Ca. M. haemobos' INIFAP01 is significantly longer than the two genomes of M. wenyonii species, but 'Ca. M. haemobos' INIFAP01 has a lower G + C content. Again, these features denote the substantial gene gain/loss throughout the evolution observed in hemoplasmas [1].
The phylogenetic reconstruction shows that 'Ca. M. haemobos' INIFAP01 is phylogenetically distant through evolution from M. wenyonii INIFAP02 and M. wenyonii Massachusetts, whereas the strains INIFAP02 and Massachusetts are closely related through the evolution of group 2.
The number of genes in the core (those present in all considered genomes); softcore (found in 95% of genomes); shell (present in more than two genomes and less than 95% of genomes); and cloud genes (found in not more than two genomes) revealed in the pan-genomic analysis suggest that there is considerable loss/gain in genes through the evolution of the 12 hemoplasmas genomes.
Regarding Surprisingly, the alignment coverage and identity percentages among both genomes of M. wenyonii (strains INIFAP02 and Massachusetts) suggest that these strains may not belong to the same species because the ANI values are <95%, the species ANI cutoff value [43][44][45].
Since bovine hemoplasmas show significant differences at the genomic level, and their impact on cattle health causes economic losses, we performed an immune-informatic analysis to identify B-cell epitopes that could be used to design molecular detection methods of bovine anaplasmosis. The immune-informatics studies predict several B-cell epitopes (Table 3), which could be applied in molecular detection methods and vaccines. Peptides that contain epitopes are widely applied for pathogen detection by serological methods [46,47], immunolocalization of pathogen proteins [48], and vaccines against animal diseases [49][50][51]. For Ca. M. haemobos INIFAP01, the proteins predicted with antigenic peptides are DNA gyrase subunit B, SSU ribosomal protein S7p and S12p, and translation elongation factor G. All of them participate in vital processes. We observe something similar in M. wenyonii INIFAP02, where the proteins predicted with antigens also participate in vital processes (SSU ribosomal protein S7p and S12p, translation elongation factor G, ribosomal protein LSU L35p) ( Table 3).
The epitope collection generated in this work will help to design molecular tools to diagnose bovine hemoplasmosis. Undoubtedly, this approach could have further applications as part of the prevalence of studies with animals from different geographic regions. Also, this strategy should consider the best alternative of the antigenic peptide to discriminate from other pathogens that infect bovines.

Conclusions
This work describes the evolutionary relationships between 39 hemoplasmas reported until now, including genomes, isolates, and clones. Also, we present the main genomic characteristics and pangenome analyses of twelve hemoplasma genomes and a prediction of B-cell epitopes in proteins annotated in M. wenyonii INIFAP02 and Ca. M. haemobos INIFAP01. Regarding Ca. M. haemobos INIFAP01, the proteins predicted with antigenic potential that include components of vital processes (DNA gyrase subunit B, SSU ribosomal protein S7p and S12p, translation elongation factor G) are the candidates to initiate studies for molecular detection of this hemoplasma. We observe something similar in M. wenyonii INIFAP02, where the predicted antigens also participate in vital processes (SSU ribosomal protein S7p and S12p, translation elongation factor G, ribosomal protein LSU L35p). Finally, the data presented here about antigenic peptides of M. wenyonii INIFAP02 and Ca. M. haemobos INIFAP01, identified by immuno-informatics, have potential use in detecting and preventing hemoplasmosis.

Data Availability Statement:
The genomic data used to support the findings of this study were deposited in the GenBank repository with accession numbers NZ_QKVO00000000.1 and LWUJ00000000.1.