Comparative Genomic Analysis of Phytopathogenic Xanthomonas Species Suggests High Level of Genome Plasticity Related to Virulence and Host Adaptation

Xanthomonas bacteria are known phytopathogens difficult to control in the field, which cause great losses in many economically important crops. Genomic islands are fragments acquired by horizontal transference that are important for evolution and adaptation to diverse ecological niches. Virulence and pathogenicity islands (PAIs) enhance molecular mechanisms related to host adaptation. In this work, we have analyzed 81 genomes belonging to X. campestris, and a complex group of X. citri, X. axonopodis, and X. fuscans belonging to nine different pathovars and three subspecies, to analyze and compare their genomic contents. Xanthomonas pan-genome is open and has a massive accessory genome. Each genome showed between three and 15 exclusive PAIs, well conserved through strains of the same pathovar or subspecies. X. axonopodis pv. anacardii had higher general similarity to X. citri subsp. citri and X. fuscans subsp. aurantifolii, with which a few PAIs were shared. Genomic synteny was even for almost all strains, with few rearrangements found in X. axonopodis pv. anacardii. The prophage regions identified in the genomes were mostly questionable or incomplete, and PAI13 in X. campestris pv. campestris ATCC33913 matched a prophage region of 19 transposable elements. Finally, PAIs in Xanthomonas are pathovar-specific, requiring individual strategies of combat.


Introduction
Bacteria of the genus Xanthomonas affect several crops of economic importance, responsible for diseases such as bacterial black spot, citrus canker, and the bacterial canker of grapevine, for instance [1,2]. Even though these phytobacterioses cause considerable economic losses [3], there is still no adequate method to control their spread and reduce disease incidence in agriculture, specially those associated with X. campestris and X. citri [4][5][6][7]. Studies regarding this and other Xanthomonas species, revealed the enormous potential of comparative genomics approaches focused on the accessory genome and mobile elements to elucidate aspects of pathogenicity explicitly assigned to adaptive traits, cell physiology, and host specificity [8][9][10][11][12]. When applied to different pathovars of the same species, this type of approach aids in plant-host relationship studies, since it is possible to identify specific genes and metabolic pathways based on the genome structure and composition as well [13]. Consequently, studies for the identification and characterization of virulence factors and protein effectors shared with different strains of the same species allow the discovery of specific genomic profiles suggesting an increased or reduced virulence for a given host [14][15][16].
Previous studies based on the morphology and other phenotypic characteristics of Xanthomonas have misclassified it taxonomically, such as strains that infect grape, cashew, and mango crops, amongst others [17]. As whole genome-based techniques were improved, topologies in the Xanthomonas genus were changed in many species. For instance, whole-genome sequencing, DNA-DNA hybridization and multilocus analyses in strains of X. axonopodis have shown biased taxonomy in species, subspecies, and pathovar levels, supporting reclassification to X. citri pv. anacardii X. citri pv. vignicola, and X. citri pv. glycines instead [18]. Moreover, other phylogenies confirmed with these new topologies, reclassifying X. fuscans as a pathovar of X. citri, and X. albilineans would form a joint clade linking the Xanthomonas genus to Xyllela fastidiosa [19]. Similarly, recent phylogenetic and ANI (average nucleotide identity) analyses have supported transference of X. campestris pv. arecae and X. campestris pv. musacearum strains to X. vasicola [20].
Even though those species might be similar, it has not yet been elucidated how similar their genomic features as a whole might be, including the dispersion of molecular features related to pathogenicity. Thereby, this study aims to understand the influence of the genomic content of the X. citri complex and X. campestris on virulence, pathogenicity, and the specific host-pathogen correlation.

Genome Annotation of Xanthomonas species
We obtained 80 complete genomes from the National Centre for Biotechnology Information (NCBI) belonging to X. citri and X. campestris species. The genomes of those species were selected due to the interest of the research group in species that affect agriculture in Northeast Brazil. All genomes are listed in Table A1, according to their strain, isolation source, and related disease.

Genomic Characterization and Enrichment Analysis
Initially, all genomes were automatically annotated by Rapid Annotation using Subsystem Technology (RAST) performed to predict putative CDSs, tRNA, and rRNA regions [21]. Subsequently, we used the Artemis tool [22] for manual gene characterization according to alignments done at UniProt, Pfam, and InterProScan databases. To carry out genetic enrichment, we used the Gene Ontology Functional Enrichment Annotation Tool (GO Feat) [23] for the characterization of genes and proteins at three different ontological levels (molecular function, biological process, cellular component) of all 80 genomes of Xanthomonas strains. We also used the Kyoto Encyclopedia of Genes and Genomes (KEGG) [24] to characterize metabolic pathways for both pangenomes and individual sets of genomes. We only performed the characterization for individual sets of genomes to X. citri and X. campestris, due to the low number of representative X. vasicola genomes available.

Pan-Genome Analyses
For the pan-genome analyses, we used Bacterial Pan-Genome Analyses pipeline (BPGA) [25] to identify clusters of orthologous sequences (COG) among Xanthomonas genomes. We also used Roary [26] for phylogenomic inference and FigTree [27] for phylogenomic tree visualization. Finally, we used the Virulence Factor Database (VFDB) [28] to determine a cluster map of virulence factors throughout all Xanthomonas genomes.

Prediction and Characterization of Pathogenicity Islands
We used the Genomic Island Prediction Software (GIPSy) tool [29], which is one of the most cited and recommended tools in the literature [30], to obtain genomic islands in all genomes, focusing on pathogenicity islands (PAIs). Then, we added and correlated GO terms to CDS in each pathogenicity island in X. citri and X. campestris strains. In this step, we only considered exclusive PAIs, since many regions were associated with more than one type of genomic island (metabolic, resistance, or symbiotic). We used BLAST Ring Image Generation (BRIG) [31] to represent and evaluate the position of genomic islands in different strains of Xanthomonas, as well as the similarity between strains of the same and different pathovars and/or subspecies. To understand island distribution and conservation through different genomes, the graphs were plotted using representative pathovars, each with a reference strain, so that the relationships between PAIs and hosts would be easier to approach. We chose genomes with the highest number of PAIs as representative genomes for each pathovar. The faded regions in the rings denoted lower similarity (<70%) whilst strong colors denoted higher similarity (>90%). We also used the MAUVE tool [32] to identify probable genomic rearrangements among strains of interest by comparing strainspecific regions correlated with genomic islands of reference strains and checking if there were any disruption, change of location, or change of sense between different strains.

Subcellular Localization of Proteins inside PAIs
The online platform PSORTb v.3 [33] was used to predict the subcellular location of the coding sequences products inside pathogenicity islands of interest. Specific multifasta files were manually created for each island of interest, with sequences available in the Artemis interface.

Prediction of Prophages Regions in Xanthomonas genomes
We performed a comparison of prophage regions between different pathogens using PHASTER [34]. This tool identified and classified each prophage region in representative pathovars of X. citri and X. campestris according to scoring in three categories: intact (score green > 90), questionable (score blue 70-90), and incomplete (score red < 70).

Functional Genomics and Comparative Genomics Insights of Xanthomonas species
All 80 genomes presented around 65% GC content, all relevant information regarding strains is available in Table A1. The automatic annotation on RAST predicted 4287 coding sequences (CDS) in the selected reference genome X. campestris pv. campestris ATCC 33913. The GO results for all 80 strains' genomes showed that most proteins are involved in molecular functions, representing a percentage higher than 1/3 of the total protein repertoire. The other two categories, biological processes and cellular components ranged from 1925 to 2223 and 1509 to 1813 proteins, respectively ( Figure S1). Overall, the COG profile of both X. campestris and X. citri were very similar. Categories of general or unknown function (R and S) had the highest percentages (around 40% and 20%, respectively)in the core, accessory, and unique genomes of these two species, possibly due to the lack of studies on the functional characterization of representative genomes. Other categories of COG with considerable percentages of genes were related to (1) cell wall/membrane/envelope and biogenesis, (2) signal transduction mechanisms, (3) intracellular trafficking, secretion, and vesicular transport, and (4) transcription ( Figure S2). For the pan-genome analysis, the distribution of unique, accessory, and core genomes among the categories was almost uniform, except for nucleotide transport and metabolism, in which no unique genes were identified for X. citri, and a small portion of unique with no accessory genes found for X. campestris ( Figure S2). Roary prediction of genome shows that the accessory genome of Xanthomonas had a higher number of genes compared to the core and unique genomes, corroborating with BPGA analysis ( Figure S3) (Table 1). Moreover, BPGA results indicate the pan-genome is still open, as the alpha value was lower than 1. However, as the curve flattened and stabilized with the increasing number of genomes added to the analysis, it may be closed soon enough, as seen in Figure 1.  Agreeing to that, the distribution of metabolic pathways on KEGG for the Xanthomonas pan-genome showed that some categories had more sequences belonging to the accessory genome than to the core, specifically categories that are somehow related to pathogenicity and virulence, such as cell motility, glycan biosynthesis and metabolism, infectious diseases, membrane transport, signal transduction, xenobiotics degradation and metabolism ( Figure 2). General categories like cellular processes and environmental information processing also showed more genes related to the accessory pan-genome. The pan-genome matrix based on the core genome revealed that X. campestris exhibited clusters of genes different from the other species' genomes ( Figure S4). The core phylogenomic tree on this figure is also available in Figure S5.
Finally, the search for virulence factors revealed 14 genes present through the genomes, but only 4 were identified in all 81 genomes: htpB, pilG, cheY, and fliN. Other virulence factors such as pilT, vipB, clpV1, ugd, and vipA were also common for most of the genomes. Only the genomes of X. campestris pv. raphani 756C, X. citri subsp. malvacearum XcmN1005 and X. citri subsp. malvacearum MS14003 presented exclusive virulence factors fliA and pilC, respectively. A cluster map for all virulence factors is available in Figure S6.

Genomic Islands Reveal a New Potential Regarding the Pathogenicity of the Genus
All genomes analyzed presented PAIs, and most of them matched with genomic regions of GC content deviation. The genomes of X. campestris pv. campestris 17, X. citri pv. fuscans ISO118C5, X. citri pv. fuscans ISO118C3, X. citri pv. phaseoli var. fuscans CFBP6167 and X. citri pv. phaseoli var. fuscans CFBP6996R presented only three PAIs, making the lowest number found in this analysis. On the other hand, X. citri subsp. malvacearum XcmH1005 exhibited the highest quantity, with 15 exclusive PAIs. The total number of PAIs found for all genomes is available in Table A2. Most of the islands found on the genomes constituted CDSs annotated as hypothetical proteins, only being able to infer their function based on the enrichment analysis. Among the X. citri subsp. citri genomes, strain TX160197 exhibited the highest number of PAIs (14) and high genomic similarity with other strains, except for AW13, AW14, AW15, AW16, TX160042, and TX160149. Besides, all different strains from this subspecies exhibited similarity below 70% for PAI24 and gaps inside this genomic region ( Figure 3). As for X. campestris, strain ATCC 33913 contained ten PAIs, of which most were highly conserved in this species' strains, except for strain 17 and X. campestris pv. raphani 756C. As for X. vasicola pv. arecae NCPB2649 and X. vasicola pv. musacearum ( Figure 4).
The previous scenario was similar for X. citri subsp. malvacearum: PAIs in the reference strain XcmH1005 had high conservation, although gap regions were present in large islands, such as PAI14 ( Figure 5). However, in X. citri pv. phaseoli var. fuscans, almost all islands found in the selected reference strain CFBP7767 consisted of gap regions in the other strains of this pathovar, except for strains CFBP6992 and CFBP4885 ( Figure 6), also similar to what was observed in genomes of X. citri pv. fuscans . Figure 3. Alignments of X. citri subsp. citri genomes and their genomic islands. PAI24 is well conserved for (A) strains AW13, AW14, AW15, and AW16, but not for others such as (B) BL18, FB19, gd2, and gd3. It is possible to observe that PAI24 does not present the complete region in the strains BL18, FB19, gd2, and gd3 showing a low identity (<70%) region.

A B
X. campestris pv. campestris  ForX. citri pv. aurantifolii genomes, strain FDC1559 presented 11 well-conserved and exclusive PAIs. Moreover, strain 1566 had a similarity of 100% with the reference genome ( Figure 7). For X. citri pv. vignicola, X. citri pv. anacardii, X. axonopodis pv. vignicola, and X. axonopodis pv. anacardii, most of the PAIs identified in the reference genome were gap regions in the other strains ( Figure 8). Finally, X. citri pv. glycines exhibited the highest conservation of pathogenicity islands among their strains ( Figure 9). Xanthomonas citri pv. glycines Figure 9. Alignments of X. citri pv. glycines genomes and genomic islands. All pathogenicity islands identified in reference strain 12-2 are conserved in strain 8ra, except for PAI5.
As for interspecies comparisons, genomes of X. axonopodis pv. anacardii showed high genomic identity when compared against the genome of X. campestris pv. campestris ATCC 33913. However, none of the PAIs identified in the latter showed high conservation amongst X. axonopodis pv. anacardii genomes, as most of its PAIs, matched with significant genomic gaps ( Figure 10). In contrast, PAIs found in X. citri subsp. citri TX160197 had higher conservation and similarity. Similar profiles were observed for PAIs in X. citri subsp. malvacearum Xcmh1005, X. citri pv. phaseoli var. fuscans CFBP7767, X. axonopodis pv. vignicola CFBP7111 and X. axonopodis pv. glycines 12-2. The highest similarity of PAIs was observed in comparison with the genome of X. fuscans subsp. aurantifolii FDC1559 (PAIs 6, 11, 12, and 28) ( Figure 11). B A Figure 10. Genomic islands comparisons with (A) X. campestris pv. campestris ATCC33913 and (B) X. citri subsp. citri TX160197 against strains of X. citri pv. anacardii. The islands showed higher similarity for X. citri subsp. citri genomes than for X. campestris, but no genomic islands of both species were conserved in X. citri pv. anacardii genomes. Although some of these islands were conserved in different pathovars, they presented different levels of integrity. An example is CDS XCG122_1740, a virulence regulator located in PAI8 of X. citri pv. glycines 12-2, which suffered an inversion in X. citri pv. anacardii IBSBF2579 PAI6, corresponding to CDS IBSBF_1976 ( Figure S7). On the other hand, some genomic islands exhibited exceptional conservation in certain strains of the same pathovar, exhibiting almost no changes, such as for PAI21 of X. citri pv. vignicola CFBP7111, CFBP7112, and CFBP7111 ( Figure S8).

Cellular Localization and Ontology of Pathogenicity Islands
Most PAIs contained hypothetical proteins that were secondly annotated as putative virulence factors, such as sequences related to carbohydrate metabolic process (GO:0005975), hydrolase activity (GO:0016787), DNA-binding transcription factor activity (GO:0003700), and signaling receptor activity (GO:0038023).
Subcellular analysis of 28 well-conserved PAIs indicated that most proteins had cytoplasmic or unknown cellular localization. Only 119 coding sequences had different localizations, while only nine were described as extracellular proteins: three found in X. citri pv. anacardii IBSBF2579, one in X. citri pv. vignicola CFBP7111 and X. citri pv. aurantifolii FDC1559, and two in X. citri pv. glycines 12-2 and X. citri pv. fuscans ISO188C1 genomes. No extracellular proteins were found in well-conserved PAIs belonging to X. campestris or other X. citri genomes. Finally, five proteins were found for the outer membrane, and 15 proteins might be in multiple positions inside the cell. All the other proteins found were for cytoplasmic membrane location.
Interestingly, some genomic islands and prophage regions overlapped. For instance, in the genome of X. campestris pv. campestris ATCC33913, PAI13, a genomic island of identity above 90% when compared against other genomes of the same species, comprehended mostly insertion sequences (IS) transposases and matched a region of an intact prophage. Those sequences belonged to the Isxcc1 and IS3 families (19 out of 23 CDSs), and others were annotated as phage-related proteins. The different sequences were putative virulence factors related to molecular functions, such as carbon-sulfur lyase activity (GO:0016846) and oxidoreductase activity (GO:0016491). In addition, many transposable elements in well-conserved PAIs were seen in PAIs 23 and 24 of X. citri subsp. citri TX160197, PAI4 of X. citri subsp. malvacearum AR81009 (which was well conserved in the genomes of strains XcmH1005 and XcmN1003, but not in MSCT and MS14003), and PAIs 11 and 12 of the X. citri subsp. malvacearum AR81009 PAI11. The genome of X. citri pv. phaseolis var. fuscans CFBP7767 showed sequences related to bacterial conjugation, pili, transposases, and virulence factors within PAI13. As for X. citri pv. anacardii IBSBF2579 genome, PAI8 was not only counted with many transposable elements but also a few putative virulence factors as well. However, this region did not match any prophage region detected in the genome, intact or not.

Insights Provided by the Pan-Genome of Xanthomonas
Some pangenomics studies on certain species of Xanthomonas have been carried out to find out what makes strains more or less virulent. For instance, pan-genome tools have been addressed to aid in taxonomy revisions of X. arboricola, and also to find relationships between different lineages of Xanthomonas species based on the core genome [35][36][37]. Some effectors, such as TALEs and T3SEs, are present in most strains of X. citri pv. glycines, indicating a closed pan-genome due to low differences in genomic content [38,39]. For X. perforans, however, it has been found that most of the pan-genome is made of more than 4000 genes present in accessory genomes, indicating an open genome due to recombination mechanisms of this species [16]. Studies on the sister clade of Xanthomonas and Stenotrophomonas have shown that almost only 10% of the genes make part of a core genome of this genus [40]. In the present study, we described the pan-genome for the most representative species of the genus. We suggest the reason it is still open is due to a high diversity of accessory genomes, especially regarding their pathogenicity traits. These results open new perspectives for the study of evolutionary genomics of Xanthomonas, focusing on their accessory genomes.

Enrichment Analysis in Pathogenicity Islands
The genomic features annotation was more in-depth than in previous studies [41,42], considering mainly the number of identified pseudogenes. Moreover, the high percentage of unique genes with general or unknown functions in X. citri strains had already been associated with its wide range of diversification [43]. Two significant characters of integrative conjugative elements (ICE) are found in a high number of Xanthomonas PAIs: integrase-like sequences and type IV secretion systems (T4SS) genes. Since ICEs may also originate from genomic islands, the high quantity of these types of sequence may indicate Xanthomonas might have acquired genomic islands through ICE events, and the well-conserved ones among strains are those containing mobile elements encoded by ICE [44]. Furthermore, as PAIs evolve to carry only the genes essential for their maintenance and transference [45], considering the PAIs here identified lack the usual genetic features commonly associated with pathogenicity (such as virulence factors, toxins, adhesins, and others), perhaps the genomic islands of Xanthomonas might have already gone through strong evolutionary pressure. As seen in Figure S7, a few putative virulence factors changed their genomic syntenic region when transference occurred, which may indicate genome shuffling. Although component genes of T4SS were not directly associated with virulence in different Xanthomonas species, the presence of T4SS genes in PAIs is essential because it may aid in interspecific competition whenever contact between different plant microbiome bacteria happens, thus, facilitating host colonization [42,46]. Our results present significant ontological terms of biological, molecular, and cellular processes implicated in pathogenicity and bacterial virulence. Considering biological process features analysis, a considerable number of GO terms classified as DNA-mediated transposition was found in genomes of X. citri. The recombinant DNA-mediated gene can produce xanthan gum, which is essential for colonization and bacterial growth [47,48], promoting infection success. In molecular mechanisms, genes encoding type III effectors play a crucial role in plant-pathogen interaction, stimulating defense responses [49,50]. Sequences related to the nucleic acid-binding ontology, present in most PAIs, have already been described in X. campestris [51]. Those proteins may act as catabolic factors and are involved in phytopathogenesis regulation; however, it is important to note that not only nucleic acid-binding proteins are responsible for gene regulation [52]. Activities related to the growth and proliferation of bacteria within the host are also associated with the presence of iron-sulfur binding proteins, essential for micronutrient acquisition required for pathogenic bacteria [53][54][55]. They play critical roles in the control of many cellular activities essential to pathogenicity, such as metabolic pathways and intracellular transport molecules [56]. Genomic evolution studies of X. campestris pv. campestris and X. citri pv. citri identified 30 genomic islands that originated from the lateral transference of genes [57]. This suggests that it could be associated with the improved ability of a pathogen to adapt to specific environments and/or hosts. In our study, homologous genes for resistance were found on the pathogenicity islands as well. Metallopeptidases, for instance, are typically associated with catalytic activities such as the cleavage of reactive oxygen species (ROS) in dismutation reactions [58]. The burst of ROS has an antimicrobial effect on the plant cell, so those genes are associated with defense within the host [59]. It demonstrates how genes located in PAIs do not strictly act as virulence factors but may also aid in the pathogen protection against natural defenses on host.
Finally, this work is the first study to carry out an in-depth comparative analysis of Xanthomonas species presenting ontological characterization and excellent curation of genes and their predicted products in these complete genomes. The pathogenicity island analysis revealed an essential role in the infection process and a wide field for experimental investigation in pathways acting directly on plant-pathogen response. Due to the environmental microbial diversity, this community undergoes selective pressures over the years, making it difficult to understand the association of a pathovar/disease to a particular genomic repertoire. More studies aimed at the microbial community can open a potential spectrum on this subject. Due to the high similarity among PAIs belonging to X. citri pv. anacardii, X. citri pv. glycines and X. citri pv. fuscans strains, we propose that those pathovars might have similar virulence mechanisms. Therefore, the same methods for diagnostic and preventive treatment might also be considered for the affected crops. The response mechanism that those different hosts display to stop these pathogens also needs to be considered.

Prophage Regions and the Evolution of Bacterial Phytopathogens
Many studies have tried to infer the origin and role of prophage regions in bacteria to elucidate a disease or the ability to interact with the host; nevertheless, the mechanisms still seem unknown [60,61]. Plant-pathogen interactions allow a high genetic variation and exchange both between the environment and between the host, so the higher is the host range of a pathogen, the higher genetic variability it displays. Transposable element insertion may alter the frame of determining genes, reducing or increasing their expression level and, therefore, affecting bacterial virulence. The presence of many genetic elements in these genomes reinforces the occurrence of past recombination or horizontal transfer events [8].
Prophage regions have already been reported for X. campestris pv. campestris ATCC 33913, attributed to insertion events before the evolution and separation events among X. campestris and Xanthomonas oryzae species, with identity superior to 90% [60]. Since we observed overlapping regions with PAIs and prophage sequences, it suggests that phage insertions occurred in one original strain and disseminated through genomic island sharing events. Most strains of the same pathovar were obtained in the same geographical region, reinforcing that they probably shared those genomic regions before the diversification of those species.

Conclusions
Overall, the participation of pathogenicity islands in the Xanthomonas genus' accessory genome is undeniable since most islands were specific in each lineage subgroup, despite the overall great genomic similarity. Up until now, a significant part of Xanthomonas' molecular features is still unknown, and as they are also present in PAIs, they may contain vital information for understanding its virulence. On this thought, future studies are fundamental to help understand pathogen-host relationships better, and that knowledge would also influence the development of preventive biotechnological tools in modern agriculture, especially for plant immunization. A broad accessory pan-genome influencing pathogenicity means developing specific strategies to fight each pathovar on their own, but a soon-to-be-closed pan-genome means those strategies might be effective for a considerable time. In addition to that, the characterization of the different PAIs investigated in this study suggests that some may be indirectly involved in virulence, as many of them contain transposable elements that can alter the actual genomic configuration and control of gene expression originating from past phages. Finally, Xanthomonas bacteria are highly adapted to phytopathogenicity due to their big genetic arsenal.