Comparative Genome Analyses of Plant Rust Pathogen Genomes Reveal a Confluence of Pathogenicity Factors to Quell Host Plant Defense Responses

Switchgrass rust caused by Puccinia novopanici (P. novopanici) can significantly reduce the biomass yield of switchgrass, an important biofuel crop in the United States. A comparative genome analysis of P. novopanici with the genomes of rust pathogens infecting the monocot cereal crops wheat, barley, oats, maize and sorghum revealed large structural variations contributing to their genome sizes. A comparative alignment of the rust pathogen genomes identified collinear and syntenic relationships between P. novopanici and P. sorghi; between P. graminis tritici 21-0 (Pgt 21) and P. graminis tritici Ug99 (Pgt Ug99); and between Pgt 21 and P. triticina (Pt). Repeat element analysis indicated a strong presence of retroelements among the different Puccinia genomes, with repeat elements making up ~1 to 3% of the genomes and contributing to genome size variation. A comparative look at the enriched protein families of Puccinia spp. revealed a predominant role of restriction of telomere capping (RTC) proteins, disulfide isomerases, polysaccharide deacetylases, glycoside hydrolases, superoxide dismutases and multi-copper oxidases (MCOs). All the proteomes of Puccinia spp. share a common repertoire of 75 secretory and 24 effector proteins, including glycoside hydrolases (cellobiohydrolases), peptidyl-prolyl isomerases, polysaccharide deacetylases and protein disulfide-isomerases, that remain central to their pathogenicity. Comparison of the predicted effector proteins from Puccinia spp. genomes with the validated proteins from the Pathogen–Host Interactions database (PHI-base) identified matches to the validated effector proteins PgtSR1 (PGTG_09586) from P. graminis and Mlp124478 from Melampsora larici-populina across all the rust pathogen genomes.


Introduction
Cereal rust pathogens cause major crop losses, threatening food security and sustainability of crop production in 31 countries across the world (http://www.fao.org/agriculture/crops/thematic-sitemap/theme/pests/wrdgp/en/, accessed 14 July 2022). Rust pathogens evolve to generate new virulent races by sexual reproduction and somatic hybridization with the ability of long-distance transmission via air-borne urediniospores [1][2][3]. a broad range of hosts, including wheat, maize, sorghum and switchgrass. It would also help us to answer important questions, such as: does genome synteny signal a similar infection strategy employed by the rust pathogens? Do all monocot rust pathogens secrete similar types of effector proteins? Is there any enrichment of a particular gene family in rust pathogens based on their adaptation on specific hosts?
In this study, we aimed to perform a comprehensive analysis of cereal rust pathogen genomes, with a focus on their secretory proteins and effectors. Further, we aimed to understand the structural variation between the monocot cereal rust pathogen genomes and the role of repeat elements in shaping their genomes. Our results as presented here showcase the complexity of the cereal rust pathogen genomes and their structure and novelty in their variations in terms of their gene families, repeat content and effector proteins.

Analysis of P. novopanici Genome and Comparison with Other Puccinia Species
In our previous study, a de novo hybrid assembly of P. novopanici of 101,620,558 bp was generated from PacBio and Illumina data [18]. The gene annotation of this assembly identified 19,064 gene models, resulting in 16,622 non-redundant transcripts [18]. Gene ontology (GO) analysis identified 9427 proteins with GO terms (Table S1A). GO classification placed 7683 predicted proteins in biological process or other molecular function categories (Table S1A), ~975 had predicted enzymatic activities and ~2515 had transmembrane (TM) domains (Table S1B,C; Figure S1A). Approximately 42% of the predicted proteins are predicted to be nuclear-localized and 7% cytosolic (Figure S1B).

Structural Variation and Phylogeny of Puccinia Species
Genome plasticity allows fungal pathogens to quickly adapt to changing environments and thus conquer new frontiers in host invasions. This is particularly interesting for rust genomes, as they are constantly coevolving with their cereal and non-cereal hosts [37]. To study the structural variation in the Puccinia spp. genomes, we performed genomic alignments of Pgt 21-0, Pgt Ug99, Pgt 75-36-700-3, Pt BBBD1, P. novopanici, Pst CY32, Pst 78, Pst 38S102, P. coronata, P. hordei, P. sorghi and an outlier, Melampsora larici-populina (98AG31) (Figure 1), with progressiveMauve [38]. In each of the genomes, two strands of information represent the positive and negative strands of DNA, with the sense strand on the upper side of the bar (Figure 1). Following the alignment of the Puccinia spp. genomes, a set of 26,175 locally collinear blocks (LCBs) was identified that appear in the same order and orientation in the genomes (Figure 1, Table S3). P. novopanici gene annotations were adapted to guide identification of regions/blocks during the alignment (Table S1). A few predicted transcripts were conserved across all rust genomes (e.g., cytochrome b5 reductase and aconitase hydratase); some had abundant copies in P. novopanici and P. sorghi (e.g., S/T phosphatase, histone N-methyl transferase); and a few differed in presence or absence between P. novopanici and the other genomes (e.g., CTD kinases, ammonium transporters). The phylogenetic tree suggests that P. novopanici is more closely related to P. sorghi than to the wheat rust genomes (Figure S2). A local alignment of the P. sorghi genome to the P. novopanici genome identified 6251 LCBs (Figure 2 and Table S3), showing closer synteny between these two genomes. BLAST analysis of the P. novopanici predicted transcripts shows clear similarity to P. sorghi, with 76% identity.
Substantial synteny can be observed at the whole-genome level between Pgt 21 and Pgt Ug99; P. novopanici and P. sorghi; Pgt 21 and Pt; P. novopanici and Pt; P. sorghi and Pt; and Pst 78 and Pt (Figure 3). A moderate amount of synteny exists between P. novopanici and P. coronata; Pgt 21 and P. hordei; and P. coronata and P. hordei (Figure 3). P. novopanici and P. sorghi were non-collinear with the wheat rust pathogen genomes Pgt 21 and Pst 78 (Figure 3). Strong segmental gaps and genomic rearrangements in a few blocks are observed between Pgt 21 and Pst 78; Pgt 21 and Pt; and Pt and P. novopanici (Figure 3). The structural variations between the cereal rust pathogen genomes are consistent with a birth-death model for genes, which makes the pathogens adaptable to their environments [39][40][41].

Gene Family Identification and Their Functional Relevance
The comparison of Puccinia spp. genomes identified several gene families that were analyzed iteratively to derive their phylogenetic relatedness (Table S4). All the proteins from Puccinia spp. and M. larici-populina were searched for homology to the proteins in the PANTHER database [42][43][44], and the resulting proteins were grouped into 2462 protein families with 9176 subfamilies (https://www.zhaolab.org/P_novopanici/download, accessed 28 June 2022; Table S4). The largest protein families include helicases, zinc finger proteins, lysophospholipases, transcription factors and transporters. Interestingly, P. novopanici has a significantly higher number of restriction of telomere capping 4 (RTC4; PTHR41391) proteins (45 RTC4 proteins) in comparison to P. sorghi (15 RTC4 proteins) (https://www.zhaolab.org/P_novopanici/download, accessed 28 June 2022). These proteins were identified previously in a genetic screen in budding yeast [45] and were thought to counteract specific aspects of the DNA damage response (DDR) in fungi [46]. In addition, other protein families that were enriched in P. novopanici include multicopper oxidases (MCOs) and lysophospholipases (Table S4).
In the MCO protein family (PTHR11709-SF414), there are 58 protein family members across all the rust pathogens analyzed in this study. MCO family members are enzymes that oxidize their substrate by accepting electrons and reduce molecular oxygen to two molecules of water. MCO-coding genes were previously found to be redundant in fungal genomes because of their involvement in different physiological roles depending on environmental conditions [47]. As with the RTC4 family, P. novopanici has a higher number of MCO family members (20) than the other Puccinia spp. (Table S3C). The phylogenetic tree for the MCO protein family of Puccinia spp. shows five subgroups with paralogs in each of the subgroups (Figure S4). Consistent with the phylogenetic relationship, P. novopanici MCO protein family members are closer to those of P. sorghi than to those of the wheat and poplar rust genomes (Figure S4).
Lysophospholipases (PTHR10728-SF33) found in fungi are involved in diverse processes, such as membrane homeostasis, nutrient acquisition, microbial pathogenesis and virulence [48]. We identified 135 protein family members in all the Puccinia spp., analyzed through profile hidden Markov model using PANTHER databases (Table S4). Phylogenetic classification of the Lysophospholipase family members resulted in four subgroups ( Figure S5A). Interestingly, we identified P. novopanici to have 36 Lysophospholipase family members in comparison to 20 in wheat rust genomes, 17 in P. sorghi and 21 in M. larici-populina (Table S4). Similar to the MCO family, we found that P. novopanici is phylogenetically closer to P. sorghi compared to other rust pathogens analyzed ( Figure S5). Interestingly, each subgroup has several orthologs and more than one paralog for each genome analyzed ( Figure S5).
As expected, some of the predicted protein families are either missing or have a reduced representation in M. larici-populina compared to Puccinia spp. (Table S4) due to the weak phylogenetic relationship between them. Interestingly, DNA helicase (PTHR10492:SF76) protein family members are more present in P. novopanici and P. sorghi compared to wheat and poplar rust pathogens. In contrast, few other protein families, such as cell surface superoxide dismutase (PTHR10003), were significantly enriched in wheat rust pathogens compared to P. novopanici and P. sorghi (Table S4). Phylogenetic similarity and synteny of cereal rust pathogens may help us to better understand the mechanisms of infection. All other protein family phylogeny trees were placed into a zip folder and are available for download (https://www.zhaolab.org/P_novopanici/download, accessed 28 June 2022).

Repetitive Elements from the Comparisons of Puccinia Genomes Species
Transposable elements (TEs) were shown to be the primary contributors to fungal genome diversity resulting from genome wide rearrangements, insertions and segmental deletions [49]. To understand the genome size variations within the rust pathogens, we analyzed the repeat elements in their genomes [50][51][52] (Table 2). TEs in fungi are broadly classified into two major classes; retroelements and DNA transposons based on the type of replication mechanisms [53]. Repeat element analysis of all the Puccinia spp. reveals that the genome size is directly correlated with the number of repeat elements and the lengths they have in a genome (Table 2). Puccinia genomes have a variable number of repeat elements from 1 to 3%, and a vast majority of those repeat elements are found to be retroelements ( Table 2). The retroelements class comprises short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and long terminal repeats (LTR) elements, of which the LTR elements class represents more than 80% (Table 2). A full-length analysis of the repeat elements with the lengths of each repeat element and their percentage of the genomes in all the Puccinia spp. is presented as supplemental information (Table S5). The highest number of retroelements, 7644, are identified from P. hordei (genome size~207 MB) in comparison to other genomes (Tables 2 and S5). Among the Puccinia spp., retroelements are comparatively higher in P. coronata (4365), Pgt 21-0 (3439) and Pgt Ug99 (3449), and the lowest number of retroelements are present in P. striiformis 38S102 (964) ( Tables 2 and S5). Interestingly, Puccinia spp. with a smaller size of the genomes (Pst 38S102, Pgt 75-36-700-3) have a smaller number of repeat elements and retroelements compared to the relatively larger Puccinia genomes ( Table 2). The second most abundant class of the repeat elements are the DNA transposons (Tables 2 and S5). 
DNA transposons comprise 0.02-0.2% of the Puccinia genomes and include the hobo-Activator, Tc1, PiggyBac, Harbinger and En-Spm classes of elements. Simple repeats and other classes of repeat elements follow the same correlation with genome size (Tables 2 and S5).

Table 2 (fragment; column headers and some row labels were lost in extraction)
[row label missing]: 160, 129, 272, 278, 139, 86, 127, 92, 95, 217, 288, 79
Tc1-IS630-Pogo: 68, 81, 83, 80, 35, 49, 77, 47, 13, 82, 1760, 49
En-Spm: [values missing]
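The percentages quoted for repeat classes are simple ratios of repeat-covered bases to assembly size. The following is a minimal sketch of that arithmetic, assuming per-class repeat lengths in base pairs are already available (for example from a repeat-masking summary). The function name and the input numbers are hypothetical and are not taken from Table 2.

```python
from typing import Dict


def repeat_percentages(class_bp: Dict[str, int], genome_bp: int) -> Dict[str, float]:
    """Return the percentage of the genome covered by each repeat class.

    class_bp maps a repeat class name to the number of bases it covers.
    A "total" key is added with the summed coverage.
    """
    if genome_bp <= 0:
        raise ValueError("genome size must be positive")
    pct = {name: 100.0 * bp / genome_bp for name, bp in class_bp.items()}
    pct["total"] = 100.0 * sum(class_bp.values()) / genome_bp
    return pct


# Hypothetical illustration only: a 100 Mb genome with three repeat classes.
example = repeat_percentages(
    {"LTR": 1_600_000, "LINE": 300_000, "DNA transposon": 100_000},
    genome_bp=100_000_000,
)
```

With these made-up inputs the LTR class accounts for most of the repeat fraction, which mirrors the pattern described in the text without reproducing its actual values.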

Effector Proteins of P. novopanici in Comparison to Other Puccinia spp.
Plant pathogenic fungi, particularly biotrophic ones, target the host defense system by silencing defense genes in various compartments of the plant cell [54]. The primary criteria for classifying a protein as an effector are the presence of a signal peptide, a high effector-probability score and presence in the cytoplasm [55][56][57]. The entire proteomes of the Puccinia spp. Pgt 21-0, Pgt Ug99, Pgt 75-36-700-3, Pt 77, P. novopanici, Pst 78, P. coronata and P. sorghi were analyzed for the presence of signal peptides, and the secretory proteins are summarized in Table S6. The highest numbers of secretory proteins were identified in Pgt 21-0 (15%, 5493 proteins) and Pgt Ug99 (14%, 5352 proteins), while lower percentages of proteins with a signal peptide were found in P. novopanici (6%, 1031) and P. sorghi (5%, 950) (Table S6). All Puccinia spp. share 75 predicted secretory proteins (Figure S3). The common secretory proteins include superoxide dismutase, glucan endoglucosidases, glucanases, vacuolar proteases, pectin esterases, cuticle-degrading proteases, chitin deacetylases, lysophospholipases, endochitinases and transporter proteins, all of which are involved in host cell wall disruption or in activities that help fungi survive in their respective hosts (Table S6). P. novopanici shares an additional 165 secretory proteins with P. sorghi, with 90% protein homology between the two species (Figure S3); these include superoxide dismutases, cell wall degrading enzymes and iron transport multicopper oxidases that may help them survive in their natural host environments (Table S6). P. striiformis f. sp. tritici shares 171 predicted secretory proteins with P. triticina and 243 proteins with P. sorghi. Each of the Puccinia spp. has its own unique set of secretory proteins (P. novopanici, 89; P. sorghi, 115; Pst 78, 195; Pt, 115; P. coronata, 93; and Pgt 21, 3782) that is not shared with other Puccinia genomes and probably evolved because of host specialization (Figure S3).
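The shared and species-specific counts above are, in essence, set operations over per-species secretomes. The sketch below illustrates that idea under a loud assumption: each secreted protein has already been mapped to a common group identifier (for example an ortholog or family ID), so that membership can be compared across species. The function name, identifiers and toy data are hypothetical; this is not the procedure used to produce Figure S3.

```python
from typing import Dict, Set, Tuple


def core_and_unique(
    secretomes: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split per-species secretomes into a shared core and unique sets.

    secretomes maps a species label to the set of group IDs found in its
    predicted secretome. Returns (core, unique) where core holds IDs present
    in every species and unique[species] holds IDs seen in that species only.
    """
    if not secretomes:
        return set(), {}
    core = set.intersection(*secretomes.values())
    unique: Dict[str, Set[str]] = {}
    for species, groups in secretomes.items():
        others: Set[str] = set()
        for other_species, other_groups in secretomes.items():
            if other_species != species:
                others |= other_groups
        unique[species] = groups - others
    return core, unique


# Toy data with hypothetical group IDs (not real annotations).
toy = {
    "Pn": {"g1", "g2", "g3"},
    "Ps": {"g1", "g2", "g4"},
    "Pt": {"g1", "g5"},
}
core, unique = core_and_unique(toy)
pairwise_pn_ps = (toy["Pn"] & toy["Ps"]) - core  # shared by Pn and Ps beyond the core
```

Pairwise overlaps such as the P. novopanici and P. sorghi comparison follow from the same intersection, with the core subtracted when "additional" shared proteins are wanted.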
Following the prediction of proteins that contain a signal peptide, the resulting proteins were analyzed with EffectorP version 3.0 [56] to identify effector proteins (Table S7). Effector proteins made up approximately 10% of the Puccinia spp. proteomes. Similar to the SignalP predictions, the highest numbers of effector proteins were predicted in Pgt 21-0 (11%, 4072 proteins) and Pgt Ug99 (11%, 3075 proteins), while lower percentages of effector proteins were predicted in P. novopanici (2%, 519) and P. sorghi (2%, 495) (Figure 4). The effector proteome comparison identified 24 common effector proteins (Table S7). The most commonly identified effector proteins include surface superoxide dismutase proteins, ATPases with a role in protein import into the endoplasmic reticulum (ER), NADH dehydrogenases, ER vesicle proteins, phosphatidyl glycerol/phosphatidylinositol transfer proteins, sodium-dependent amino acid transporters, glucanases, carbohydrate esterase proteins and laccase/multicopper oxidases. All the effector proteins were searched by BLASTP, anchoring them to either P. novopanici or Pgt 21-0, to derive common annotations. The effector proteins from the different genomes are presented in a Venn diagram (Figure 4). P. sorghi and P. novopanici share 76 effectors, while P. graminis and P. triticina share 417 effector proteins (Figure 4). Significantly enriched effector protein families or classes of proteins are summarized for all the Puccinia spp. in Table 3. These proteins include DNA helicases, fatty acid synthases, proteins involved in transport and phosphorylation, and host modification enzymes, such as hydrolases or dehydrogenases (Table 3). Summary analysis of all secretory and effector proteins from Puccinia spp. suggests that each of the Puccinia spp. allocates roughly 10% of its genome to secreted proteins.
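The two-stage selection described here (signal peptide first, effector prediction second) can be pictured as a simple filter over per-protein prediction records. The sketch below is an illustration only: the record fields, the 0.5 probability cutoff and the function names are assumptions made for this example, and it does not parse real SignalP or EffectorP output.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-protein summary of upstream predictions."""
    protein_id: str
    has_signal_peptide: bool
    effector_probability: float


def candidate_effectors(
    predictions: Iterable[Prediction], min_probability: float = 0.5
) -> List[str]:
    """Keep proteins with a signal peptide and a sufficiently high effector score.

    The default cutoff is an assumption for illustration, not a value from the text.
    """
    return [
        p.protein_id
        for p in predictions
        if p.has_signal_peptide and p.effector_probability >= min_probability
    ]


def percent_of_proteome(n_selected: int, proteome_size: int) -> float:
    """Express a count as a percentage of the proteome size."""
    if proteome_size <= 0:
        raise ValueError("proteome size must be positive")
    return 100.0 * n_selected / proteome_size


# Toy records with hypothetical IDs.
records = [
    Prediction("prot_a", True, 0.20),
    Prediction("prot_b", True, 0.91),
    Prediction("prot_c", False, 0.95),
    Prediction("prot_d", False, 0.10),
]
selected = candidate_effectors(records)
share = percent_of_proteome(len(selected), len(records))
```

Keeping the two criteria in one predicate makes it easy to see that a high effector score alone (as for `prot_c`) is not enough without a signal peptide.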
We further analyzed the cell surface superoxide dismutase (SOD) protein family and constructed its phylogenetic tree (Figure S5B). SOD helps the invading pathogen by detoxifying the reactive oxygen species (ROS) in host plants, thus evading host defense responses [58]. Apart from SOD, other enzymatic proteins are abundant in P. novopanici, which probably makes it suited to infecting switchgrass, unlike other rust pathogens. All the secretome and effector protein family phylogenies can be downloaded as Newick trees (https://www.zhaolab.org/P_novopanici/download, accessed 28 June 2022). Publicly available gene expression studies confirmed the expression of the genes corresponding to the predicted effector proteins from P. novopanici [18,20].

Table 3 note: The effector protein classes were predicted and analyzed using EffectorP version 3.0 [56]. The classes that had representation in more than two genomes were considered as enriched for effector classes. Abbr: Puccinia graminis tritici (Pgt); Puccinia striiformis tritici (Pst); Puccinia triticina (Pt); Puccinia coronata avenae (Pc); Puccinia novopanici (Pn); Puccinia sorghi (Ps); Puccinia hordei (Ph).

Identification of Host Pathogenicity-Related Genes in Puccinia spp.
Virulence factors are the most important class of proteins in pathogens, as they can counteract the defense mechanisms of the host and enhance the spread of the pathogen [59]. The Pathogen-Host Interactions database (PHI-base) catalogs experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens of animal, plant, fungal and insect hosts [60]. To find the experimentally validated pathogenicity factors, virulence factors and effector proteins of plant rust pathogens, the proteomes of Pgt 21-0, Pgt Ug99, Pgt 75-36-700-3, Pt, P. novopanici, Pst 78, P. coronata and P. sorghi were used as queries against the 7544 PHI-base proteins [60][61][62]. BLASTP analysis against PHI-base identified validated proteins involved in pathogenicity, virulence and effector-related functions (Table S8). Potential matches were identified based on an identity percentage greater than 45% (Table S8). The potential pathogenicity genes identified in this analysis may play important roles in the infection and development of the fungi, as their matches were shown to be functional pathogenicity factors in related fungi. A comparative analysis of the predicted effector proteins from all rust pathogen genomes and the corresponding PHI-base validated proteins, with their mutant phenotypes and gene functions, is summarized in Table 4. All the Puccinia genomes studied share a conserved group of pathogenicity genes, while distinct subgroups of virulence, pathogenicity and effector genes are shared between subsets of the Puccinia genomes. The most prominent effector proteins with validated functions identified in all the Puccinia genomes are conserved glycoside hydrolase family 7 cellobiohydrolase, effector protein, peptidyl-prolyl cis-trans isomerases, polysaccharide deacetylases and protein disulfide-isomerases (Table S8). One of these proteins, a polysaccharide deacetylase, was validated in P. striiformis interaction studies. The corresponding gene, Pst_13661, has a mutant phenotype of reduced virulence. Other commonly found effector proteins from our analysis matched the validated proteins PgtSR1 (PGTG_09586) from P. graminis and Mlp124478 from M. larici-populina (Table S8). We identified a total of 132 known validated proteins from all the rust genomes: twenty-six from Pgt 21; twenty-three from Pgt Ug99; twenty-four from Pst 78; ten from Pt 77; nine from Pgt 75-36-700; six from P. sorghi; eleven from P. novopanici; eight from P. coronata and fifteen from M. larici-populina (Table 4).
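The identity cutoff used for the PHI-base matches can be applied as a post-filter on BLASTP results. The sketch below assumes the hits are stored in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); that file layout, the choice to keep one best hit per query by bit score, and all identifiers in the toy data are assumptions for illustration rather than details reported in the text.

```python
import csv
from typing import Dict, Iterable, Tuple


def best_hits(
    lines: Iterable[str], min_identity: float = 45.0
) -> Dict[str, Tuple[str, float, float]]:
    """Return the best hit per query among hits above the identity cutoff.

    Each input line is one tab-separated BLAST tabular record. The result
    maps query ID to (subject ID, percent identity, bit score).
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        query, subject = row[0], row[1]
        pident, bitscore = float(row[2]), float(row[11])
        if pident <= min_identity:
            continue  # the text keeps matches with identity greater than 45%
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


# Toy records with hypothetical query and subject IDs.
sample = [
    "query_1\tphi_A\t52.3\t200\t90\t2\t1\t200\t5\t204\t1e-50\t180.0",
    "query_1\tphi_B\t61.0\t150\t55\t1\t1\t150\t1\t150\t1e-60\t210.0",
    "query_2\tphi_C\t40.0\t120\t70\t1\t1\t120\t1\t120\t1e-10\t60.0",
]
hits = best_hits(sample)
```

In practice an e-value or alignment-coverage filter is often combined with the identity cutoff; the text only states the identity threshold, so none is added here.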

Discussion
We compared the rust pathogen genomes from wheat, maize, sorghum and switchgrass to understand the underlying variability and the causal factors for infection diversity. A large number of syntenic blocks were observed between P. novopanici and P. sorghi, suggesting more collinearity among these two genomes in comparison to other Puccinia spp., although quite a number of collinear blocks are observed between all the rust pathogen genomes compared.
A large structural variation identified among the rust genomes Pgt 21-0, Pgt Ug99, Pgt 75-36-700-3, Pt BBBD1, P. novopanici, Pst CY32, Pst 78, Pst 38S102, P. coronata, P. hordei and P. sorghi might be due to the insertions, deletions and various genomic re-arrangements of the genomic fragments during the course of pathogen evolution. It was known that TEs promote chromosomal rearrangements through homologous recombination and alternative transposition [24]. We verified the presence of the repeat elements in all the Puccinia spp. and found that the repeat elements occupy 1-3% of the genomes, which might not have been pivotal to the genome size contribution. A recent extensive analysis of repeat content in 18 fungal genomes, including strains of the same species and species of the same genera, concluded that an exceptional variability of 0.02% to 29.8% exists within their genomes due to TEs [49]. Another study that compared 10 different fungal genomes for their TEs content identified a very low rate of repeat induced point mutations (RIP) in Ascomycota and Basidiomycota, which leaves their genome more vulnerable for repeat expansion [63]. Recent comprehensive analyses of fungal TEs show an exceptional variability in the repeat content [64,65], in which amplification events tend to be more related to the fungal lifestyle than to phylogenetic proximity [63,66].
Effectorome and secretome studies from all the Puccinia spp. (Pgt 21-0, Pgt Ug99, Pgt 75-36-700-3, Pt BBBD1, P. novopanici, Pst 78, P. coronata and P. sorghi) identified proteins involved in signal transduction (protein kinases), protein degradation (ubiquitin-related), DNA unwinding (helicase domain proteins) and other proteins useful for pathogen survival in host environments (cellulases, phosphokinases and aminotransferases). A recent study comparing the enrichment of gene families in the two rust fungal genomes Melampsora larici-populina (poplar leaf rust) and P. graminis (wheat stem rust) identified gene families encoding host-targeted, hydrolytic enzymes acting on plant biopolymers, such as proteinases, lipases and several sugar-cleaving enzymes (carbohydrate-active enzymes; CAZymes), to be highly up-regulated in both rust pathogen transcriptomes [14]. Further, we were also able to confirm the expression of the genes corresponding to the effector protein predictions from RNAseq studies published on P. novopanici [18,20].
A secretory repertoire of enzymes, including the hydrolytic enzymes or cell wall degrading enzymes, are often employed by the rust pathogens in mounting a successful infection strategy. The effector proteins that were identified can be validated through reverse genetics by host-induced gene silencing (HIGS). Similar mechanisms were instrumental in generation of wheat plants with resistance against Pst [31,67,68] and Pt [69]. The identification and characterization of effectors and their cognate R genes is an important first step to understanding the host-pathogen biology in rusts and, consequently, to our ability to develop sustainable and potentially more durable resistance breeding strategies. In direct evidence of effector suppression of host defenses, rust effector protein Mlp124478 was shown to have a virulence effect in Arabidopsis, and it suppresses host immune responses by binding to the TGA1a promoter [30]. Some oomycetes secretory proteins with special signatures, such as RXLX [EDQ] or RXLR motifs in pathogens, function as effectors that manipulate and/or destroy host cells [70]. The RXLR motif, however, has not been observed as readily in rust fungal proteins, and no other consensus motif has been identified that easily distinguishes rust effectors [71]. Some of the most common effector proteins include chitin binding effectors, protease inhibitors, cysteine protease inhibitors, peroxidase inhibitors, glucoside hydrolases and fungal phospholipases [72]. Wheat stem rust fungus Pgt produces a tryptophan 2-monooxygenase (Pgt-IaaM) specifically in the haustorium to produce excessive indole acetic acid (IAA) in the host cells during infection in wheat to disrupt phytohormone-based defense signaling pathways [73]. 
Genes corresponding to secreted protein families, such as cutinases, pectin esterases, endo-1,4-β-D-glucanases and mannanases, showed gene expansion in Pst and Pgt; however, this phenomenon was not observed in the genomes of Pt or other Puccinia spp. [74]. All the rust pathogen genomes, Pgt 21-0, Pgt Ug99, Pgt 75-36-700-3, Pt, P. novopanici, Pst 78, P. coronata and P. sorghi, share 75 predicted secretory proteins and 24 common effector proteins.
A significant number of pathogenesis-related (PR) genes, such as TaPR5 (thaumatin-like), TaPR10 and TaGlu (glucan endo-1,3-beta-glucosidase GII precursor), have been previously shown to be induced during stripe rust infection [75,76]. Secretome analysis of seven stripe rust isolates identified species-specific proteins, suggesting the diverse roles they play in their interactions with wheat hosts [77]. In our gene family identification and comparison of effector proteins across Puccinia rust pathogens, cell surface SOD was one of the families whose number of gene family members varied across different Puccinia spp. SOD helps the invading pathogen to detoxify ROS in host plants, thus evading one of the host defense responses [58]. Therefore, SOD may be one of the contributing factors in host specificity. All the wheat rust pathogen genomes carry a significant number of SOD gene family members (15–18) in comparison to seven in P. sorghi and six in P. novopanici. Experimental validation of effectors or secretory proteins is challenging as Puccinia spp. are obligate biotrophs.
RecQ DNA helicases are another variable class of the effector family found to be significantly enriched in rust pathogens. RecQ DNA helicases are known for their ability to unwind various DNA structures and also contribute to stabilization and repair of damaged DNA replication forks, telomere maintenance, homologous recombination and DNA damage checkpoint signaling [78]. A strong presence of the family members of DNA helicases also suggests the aggressive repair mechanisms to defend and survive in a diverse host environment. Apart from DNA helicases, P. novopanici also has a higher number of RTC4 proteins, which, in combination with DNA helicases, can help the pathogen counter DDR responses. In combination with the RTC protein complex and DNA helicases, there seem to be more specific mechanisms fortified in Puccinia genomes towards DNA damage response.
In a recent study of 16 plant fungal genomes for plant cell wall (PCW)- and fungal cell wall (FCW)-degradation-associated CAZymes, genes encoding CAZymes were shown to be lower in the Puccinia spp. studied [79]. In comparison to necrotrophic and hemibiotrophic fungi, genes encoding PCW- and FCW-degradation-associated CAZymes were significantly lower in the wheat rust pathogens Pgt, Pt and Pst [79]. Perhaps the higher numbers of PCW- and FCW-degradation-associated CAZymes in P. novopanici and P. sorghi show the requirement of additional gene family members to support their infection in their hosts.
Consistent with these results, comparison of these effector proteins with the validated pathogen-host interaction proteins from PHI-base identified effectors, glycoside hydrolases, peptidyl-prolyl cis-trans isomerases, polysaccharide deacetylases and protein disulfide-isomerases. One of the effector proteins identified in our study matched Pst_13661, an effector validated in interaction studies whose mutant phenotype exhibits reduced virulence. A few other effector proteins identified in our study also showed homology to the known validated proteins PgtSR1 (PGTG_09586) from P. graminis and Mlp124478 from M. larici-populina. Many of these predicted effector proteins are candidates for functional characterization and can also serve as a panel of effectors in rust pathogen 'effectorome' studies.

Assembly information was generated using the assembly-scan tool (https://github.com/rpetit3/assembly-scan, accessed 28 June 2022), and NG50/LG50 measures were calculated as described [35,36].
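For readers unfamiliar with the NG50/LG50 measures, the sketch below follows the commonly used definition: contigs are sorted from longest to shortest and accumulated until they cover at least half of the estimated genome size; NG50 is the length of the contig that crosses that threshold and LG50 is how many contigs were needed. This is an illustration under that assumed definition, not the code of the cited tool, and the function name and toy lengths are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def ng50_lg50(
    contig_lengths: Iterable[int], genome_size: int
) -> Optional[Tuple[int, int]]:
    """Return (NG50, LG50) for an assembly, or None if it is too small.

    genome_size is the estimated genome size. Passing the total assembly
    length instead yields the N50/L50 pair under the same logic.
    """
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    target = genome_size / 2
    running = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return None  # the assembly covers less than half of the estimated genome


# Toy contig lengths (arbitrary units).
lengths = [50, 40, 30, 20, 10]
ng = ng50_lg50(lengths, genome_size=200)          # estimated genome larger than assembly
n = ng50_lg50(lengths, genome_size=sum(lengths))  # N50/L50 as a special case
```

Because NG50 is tied to the estimated genome size rather than to the assembly length, it allows assemblies of different total sizes to be compared on the same footing.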

Gene Prediction and Annotations
Genome annotation of P. novopanici was performed as previously described [18] and further validated with P. novopanici RNAseq data [20]. Genes were predicted using FGENESH against Puccinia spp. (P. graminis, P. triticina, P. striiformis and P. sorghi) with default parameters. FGENESH output and sequences were parsed. Motifs and domains were annotated using InterProScan24 by searching against GO databases. Finally, the annotations from the KOG, GO, KEGG, NR, Swissprot and TrEMBL databases were combined to obtain the final annotation of the P. novopanici genome. Complete gene feature file (gff) annotations, along with protein FASTA files for the P. novopanici genome, are available in the portal (https://www.zhaolab.org/P_novopanici/download, accessed 28 June 2022).

Phylogenetic Analysis of Gene Family
The maximum likelihood (ML) method was used to infer the phylogenetic relationships within the detected gene families. The protein sequences in each gene family were aligned with MAFFT (https://mafft.cbrc.jp/alignment/software/, accessed 28 June 2022). ML trees were then built with RAxML (https://cme.h-its.org/exelixis/web/software/raxml/index.html, accessed 28 June 2022) using 100 bootstrap replicates. Trees were visualized using the Interactive Tree of Life (iTOL; https://itol.embl.de/, accessed 28 June 2022), an online tool developed for the display, annotation and management of phylogenetic trees [82][83][84]. Each of the classes identified was analyzed for its gene family structure and is available for visualization. All the phylogenies can be downloaded from https://www.zhaolab.org/P_novopanici/download, accessed 28 June 2022.
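The alignment and tree-building steps can be sketched as a small Python wrapper. This is an illustrative sketch only: the text specifies MAFFT, RAxML and 100 bootstrap replicates, whereas the executable name `raxmlHPC`, the substitution model `PROTGAMMAAUTO`, the random seeds, the `--auto` MAFFT strategy and all file names are assumptions that may differ from the settings actually used.

```python
import subprocess


def mafft_command(fasta_in):
    # MAFFT writes the alignment to standard output.
    # "--auto" (automatic strategy selection) is an assumed setting.
    return ["mafft", "--auto", fasta_in]


def raxml_command(alignment, run_name, bootstraps=100,
                  model="PROTGAMMAAUTO", seed=12345):
    # "-f a" runs a rapid bootstrap analysis plus a best-scoring ML tree search.
    # The model and seed values are assumptions, not taken from the text.
    return [
        "raxmlHPC",
        "-f", "a",
        "-m", model,
        "-p", str(seed),
        "-x", str(seed),
        "-#", str(bootstraps),
        "-s", alignment,
        "-n", run_name,
    ]


def build_family_tree(fasta_in, alignment_out, run_name):
    """Align one gene family and build its ML tree (requires both tools on PATH)."""
    with open(alignment_out, "w") as handle:
        subprocess.run(mafft_command(fasta_in), stdout=handle, check=True)
    subprocess.run(raxml_command(alignment_out, run_name), check=True)


# Print the commands for a hypothetical family file without running them.
print(" ".join(mafft_command("family.fasta")))
print(" ".join(raxml_command("family.aln", "family")))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact invocation for each gene family before running it.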

Gene Family Detection
All the protein sequences were matched against the PANTHER database (PANTHER 15.0; http://pantherdb.org/, accessed 28 June 2022) with the pantherScore2.0 program and HMMER3 (http://hmmer.org/, accessed 28 June 2022) and grouped by PANTHER family ID. All proteins from Puccinia spp. and M. larici-populina were searched for homology to the proteins in the PANTHER database and grouped into 2462 protein families with 9176 subfamilies. An iterative analysis was performed to deduce the phylogenetic relationships between these families. All other protein family phylogeny trees are packaged in a zip folder and are available for download (https://www.zhaolab.org/P_novopanici/download, accessed 28 June 2022).
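The grouping step can be illustrated with a short Python sketch. It assumes the hits have already been reduced to (protein ID, PANTHER ID) pairs and relies on the PANTHER convention that a subfamily ID is the family ID followed by a colon and an `SF` suffix. The function name, the protein IDs and the PANTHER IDs shown are made up for illustration and are not taken from the study.

```python
from collections import defaultdict


def group_by_panther_id(hits):
    """Group proteins by PANTHER family and by subfamily.

    hits: iterable of (protein_id, panther_id) pairs, where panther_id is
    either a family ID ("PTHR00001") or a subfamily ID ("PTHR00001:SF1").
    Returns (families, subfamilies) as dicts mapping IDs to sets of proteins.
    """
    families = defaultdict(set)
    subfamilies = defaultdict(set)
    for protein_id, panther_id in hits:
        family_id = panther_id.split(":", 1)[0]
        families[family_id].add(protein_id)
        if ":" in panther_id:
            subfamilies[panther_id].add(protein_id)
    return dict(families), dict(subfamilies)


# Hypothetical hits, for illustration only.
example_hits = [
    ("protA", "PTHR00001:SF1"),
    ("protB", "PTHR00001:SF2"),
    ("protC", "PTHR00002"),
]
families, subfamilies = group_by_panther_id(example_hits)
print(len(families), "families;", len(subfamilies), "subfamilies")
```

Counting the keys of the two dictionaries gives family and subfamily totals of the kind reported above.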

Identification of Repeat Elements
Repeat elements belonging to various classes, including LTRs, non-LTRs and DNA transposon elements (TE), were identified by BLAST homology comparison of the whole genome sequences against the repeat databases [85][86][87][88][89]. Repeat elements were identified in the genomes using RepeatMasker version 4.1.2 [50][51][52] on the genome FASTA files downloaded from NCBI, as described earlier [50][51][52][90]. RepeatMasker was run with the default parameters on 32 parallel cores.
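To show how a repeat percentage can be derived from annotated repeat positions, the sketch below merges overlapping repeat intervals per sequence and divides the covered bases by the genome length. It is a hypothetical helper, not part of RepeatMasker: the input is assumed to be (sequence name, start, end) tuples with 1-based inclusive coordinates already extracted from the annotation output, and the toy values are invented.

```python
def repeat_percentage(intervals, genome_length):
    """Percentage of the genome covered by repeats.

    intervals: iterable of (sequence_name, start, end), 1-based inclusive.
    Overlapping intervals on the same sequence are merged so that no base
    is counted twice.
    """
    by_sequence = {}
    for name, start, end in intervals:
        by_sequence.setdefault(name, []).append((start, end))

    covered = 0
    for spans in by_sequence.values():
        spans.sort()
        current_start, current_end = spans[0]
        for start, end in spans[1:]:
            if start <= current_end:
                # Overlap: extend the current merged interval.
                current_end = max(current_end, end)
            else:
                covered += current_end - current_start + 1
                current_start, current_end = start, end
        covered += current_end - current_start + 1

    return 100.0 * covered / genome_length


# Toy annotation, for illustration only.
toy_repeats = [("contig1", 1, 100), ("contig1", 51, 150), ("contig2", 1, 50)]
print(repeat_percentage(toy_repeats, genome_length=1000))
```

Merging before summing matters because nested or overlapping repeat annotations would otherwise inflate the reported percentage.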

Conclusions
Surveying different Puccinia spp. genomes and their gene families helped us to understand the complex evolutionary forces that shaped the structure of cereal rust genomes and their fitness to colonize and infect their hosts. Stronger synteny and collinearity were observed between P. novopanici and P. sorghi; between P. graminis tritici 21-0 (Pgt 21) and P. graminis tritici Ug99 (Pgt Ug99); and between Pgt 21 and P. triticina (Pt), showing the conserved family and gene structure among them. Repeat element analysis indicated a strong correlation of repeat elements with the genome size variation of ~1-3%. All the Puccinia spp. share a common repertoire of 75 secretory and 24 effector proteins, including glycoside hydrolases, cellobiohydrolases, peptidyl-prolyl isomerases, polysaccharide deacetylases and protein disulfide-isomerases, that remain central to their pathogenicity. The comparison of the predicted effector proteins from Puccinia spp. genomes with the validated proteins from the Pathogen-Host Interactions database (PHI-base) resulted in the identification of the validated effector proteins PgtSR1 (PGTG_09586) from P. graminis and Mlp124478 from Melampsora laricis across all the rust pathogen genomes. Many of these predicted effector proteins, which were shown to be functional through pathogen-host interactome studies, can be characterized further and can also be used as a panel of effectors in rust pathogen 'effectorome' studies. Further, the effector proteins that were identified can be validated through reverse genetics by host-induced gene silencing (HIGS) and can be deployed for durable resistance in cereal crop plants.

Table S2: BUSCO analysis of Puccinia spp. genomes. All the selected Puccinia spp. genomes (Pgt 21-0, Pgt Ug99, Pgt 75-36-700-3, Pt BBBD1, P. novopanici, Pst CY32, Pst 78, Pst 38S102, P. coronata, P. hordei and P. sorghi) and an outlier, Melampsora laricis, were analyzed to test the completeness of their genomes.
Supplemental Table S3: (A) Locally collinear blocks between P. novopanici and other Puccinia spp. for first 100 positions. (B) Locally collinear blocks between P. novopanici and P. sorghi. Supplemental Table S4: PANTHER database gene family summary. (A) All the gene families were recursively searched in the PANTHER database and were summarized. (B) Protein family homology of Puccinia spp. using PANTHER database. Supplemental Table S5: Repeat element analysis of Puccinia spp. All the repeat elements from all the Puccinia genomes (Pgt 21-0, Pgt Ug99, Pgt 75-36-700-3, Pt BBBD1, P. novopanici, Pst CY32, Pst 78, Pst 38S102, P. coronata, P. hordei and P. sorghi) and an outlier, Melampsora laricis, were analyzed for their numbers, lengths and percentages. Supplemental Table S6