Seventeen Ustilaginaceae High-Quality Genome Sequences Allow Phylogenomic Analysis and Provide Insights into Secondary Metabolite Synthesis

The family of Ustilaginaceae belongs to the order of Basidiomycetes. Despite their plant pathogenicity causing, e.g., corn smut disease, they are also known as natural producers of value-added chemicals such as extracellular glycolipids, organic acids, and polyols. Here, we present 17 high-quality draft genome sequences (N50 > 1 Mb) combining third-generation nanopore and second-generation Illumina sequencing. The data were analyzed with taxonomical genome-based bioinformatics methods such as Percentage of Conserved Proteins (POCP), Average Nucleotide Identity (ANI), and Average Amino Acid Identity (AAI) analyses indicating that a reclassification of the Ustilaginaceae family might be required. Further, conserved core genes were determined to calculate a phylogenomic core genome tree of the Ustilaginaceae that also supported the results of the other phylogenomic analysis. In addition, to genomic comparisons, secondary metabolite clusters (e.g., itaconic acid, mannosylerythritol lipids, and ustilagic acid) of biotechnological interest were analyzed, whereas the sheer number of clusters did not differ much between species.


Introduction
The family of Ustilaginaceae belongs to the order of Ustilaginomycetes. It consists of 14 genera, including Ustilago, Sporisorium, and Macalpinomyces [1]. The taxonomy of the Ustilaginaceae, which is still a matter of debate, was focused on during several studies resulting in several taxonomic revisions [2]. For instance, Kirk et al. (2008) proposed that the Ustilaginaceae family comprises 17 genera comprising 607 species [3]. The ever-increasing genomic data should settle the discussion at one point. Ustilaginaceae are known for their plant pathogenicity and the capability to infect economically essential crops, including barley, sugarcane, wheat, and oats [3]. Even though these plant diseases cause crop loss, smut fungi attracted particular attention in the field of industrial biotechnology.
Ustilaginaceae show a versatile product spectrum including organic acids (e.g., itaconate, malate, succinate), polyols (e.g., erythritol, mannitol), and extracellular glycolipids, which are considered value-added chemicals with potential applications in the pharmaceutical, food, and chemical industries [4,5]. For instance, mannosylerythritol lipids (MELs) are surface-active glycolipids secreted by various fungi, including U. maydis. MELs can be used as biosurfactants, displaying the much-asked biodegradability resource for the

Nanopore Library Preparation and GridION ® Sequencing
A sequencing library with genomic DNA from the different fungal strains was prepared using the Nanopore Rapid DNA Sequencing kit (SQK-RAD04, Oxford Nanopore Technologies, Oxford, UK) according to the manufacturer's instructions. Sequencing was performed on an Oxford Nanopore GridION Mk1 sequencer using an R9.4.1 flow cell, which was prepared according to the manufacturer's instructions.

Illumina Library Preparation and MiSeq Sequencing
Whole-genome-shotgun PCR-free libraries were constructed from 5 µg of gDNA with the Nextera XT DNA Sample Preparation Kit (Illumina, San Diego, CA, USA) according to the manufacturer's protocol. Quality of the resulting libraries was controlled by using an Agilent 2000 Bioanalyzer with an Agilent High Sensitivity DNA Kit (Agilent Technologies, Santa Clara, CA, USA) for fragment sizes of 500-1000 bp. Paired-end sequencing was performed on the Illumina MiSeq platform (2 × 300 bp, v3 chemistry). Adapters and low-quality reads were removed by an in-house software pipeline prior to polishing as recently described [21].

Base Calling, Reads Processing, and Assembly
MinKNOW (Oxford Nanopore Technologies) was used to control the run using the 48 h sequencing run protocol; base calling was performed offline using Bonito. The assembly was performed using canu v2.1.1 [22]. The resulting contigs were polished with Illumina short-read data using Pilon [23] run for ten iterative cycles. BWA-MEM [24] was used for read mapping in the first five iterations and Bowtie2 v2.3.2 [25] in the second set of five iterations.

Gene Prediction and Genome Annotation
Gene prediction was performed by applying GeneMark-ES 4.6.2 [26] using default settings. Predicted genes were functionally annotated using a modified version of the genome annotation platform GenDB 2.0 [27] for eukaryotic genomes as previously described [28]. For automatic annotation within the platform, similarity searches against different databases, including COG [29], KEGG [30], and SWISS-PROT [31], were performed.

Comparative Genome Analyses and Phylogenetic Analysis
The genomes of the sequenced and annotated fungal strains were used for comparative genome analyses. Comparative analyses between fungal genomes were accomplished using a modified version of the comparative genomics program EDGAR designed to handle eukaryotic genomes and their multi-exon genes [34,35] as described recently [36]. Within the EDGAR workflow, classification of genes as core genes (genes shared in all genomes within a tested set) or singletons was performed based on BLAST Score Ratio Values (SRVs) and visualized in a Venn diagram. In addition, Average Nucleotide Identity (ANI) and Average Amino Acid Identity analyses (AAI) were performed based on the GeneMark prediction similarly to previously described methods [36] to determine the relationship between the different species. Multiple alignments for all core protein sequences were created using MUSCLE [37]. Pairwise Percentage of Conserved Proteins (POCP) analysis was performed according to [38] and as previously described [19,39,40].
For the AAI method, the percent identity values between amino acid sequences of orthologous genes as computed by the BLAST algorithm are analyzed [41]. In comparison, ANI values are computed based on an all-against-all BLAST comparison of 1020-bp pieces of the nucleotide sequences of the selected genomes as described by Goris et al. (2007) [42]. An ANI or AAI of 97% for a pairwise genome comparison is the currently recommended replacement for the 70% DDH values for species delineation of fungi [43]. The average nucleotide and amino acid identity between two genomes can be used for fungal species delineation, but it is not suitable for genus demarcation. We used the Percentage of Conserved Proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP could serve as a robust genomic index for establishing the genus boundary [19,38].

Genomic Data
Within this work, 17 different Ustilaginaceae strains from 14 species were chosen for genome sequencing by application of third-generation Oxford Nanopore sequencing in combination with second-generation Illumina sequencing. Long-read nanopore sequencing was combined with the short-read Illumina sequencing method to improve base accuracy and thus significantly reduce error rates in the final genomes. The strain selection was based on previous work, including relevant species such as U. cynodontis [44], U. trichophora [45], U. vetiveriae [11], and U. maydis [6,46]. Several U. maydis isolates were included as previous studies determined differences in the secondary metabolite profiles. Apart from the named Ustilago species, a broad species selection was aimed for, including Macalspinomyces [47], Pseudozyma [48], Sporisorium [47], and Ustanciosporium.
The amount of obtained reads by Nanopore sequencing ranged from 175,216 to 621,889 with an average read count of 350,192, whereas the mean read length shows an average of 13,253 bp (8214-17,321 bp). Assemblies polished by Illumina short reads resulted in an average contig number of 66 (22-316) with a 72-fold average coverage (45.8-95.3 fold). Established genome sequences range in size from 18 to 36 Mb and feature GC contents around 54%, which is common for eukaryotes as they barely reach GC values above 60%. Table 2 shows the assembly results in more detail, incorporating genome size, contig numbers, GC-content, and the number of annotated genes.
A previous study presented draft genome sequences from nine different Ustilaginaceae that are known for itaconic acid production [17]. Nevertheless, so far, no study has focused on a broader spectrum of Ustilaginaceae isolates for whole-genome sequencing, including genomic and phylogenomic analyses.
In summary, the combinatorial approach of Illumina and nanopore sequencing resulted in 17 high-quality and state-of-the-art fungal draft genome sequences that can be used for further analysis.

Phylogenomic Analysis
To deduce the phylogeny of the Ustilaginaceae isolates, the comparative genomics platform EDGAR 3.0 was applied. Based on all core genes determined for the 14 species-in total 2370 genes-a phylogenetic tree was computed ( Figure 1). In the case of two strains-U. maydis No. 482 and U. gigantosporum Uma 706-two isolates were sequenced. Thus, they were labelled with -1 and -2, respectively. The tree consists of six different clusters. In total, two outgroups are visible representing Pseudozyma antartica NRRL Y -7808 and U. gigantosporum UMa706 (group 1, highlighted in blue), as well as M. ordensis BRIP 26904 a (group 2), followed by two clusters consisting of two different Ustilago species. The largest cluster consists of all U. maydis isolates (highlighted in orange), U. vetiveriae, and M. mackinlayi. The last cluster includes all Sporisorium isolates and U. curta. A previous study presented draft genome sequences from nine different Ustilaginaceae that are known for itaconic acid production [17]. Nevertheless, so far, no study has focused on a broader spectrum of Ustilaginaceae isolates for whole-genome sequencing, including genomic and phylogenomic analyses.
In summary, the combinatorial approach of Illumina and nanopore sequencing resulted in 17 high-quality and state-of-the-art fungal draft genome sequences that can be used for further analysis.

Phylogenomic Analysis
To deduce the phylogeny of the Ustilaginaceae isolates, the comparative genomics platform EDGAR 3.0 was applied. Based on all core genes determined for the 14 speciesin total 2370 genes-a phylogenetic tree was computed ( Figure 1). In the case of two strains-U. maydis No. 482 and U. gigantosporum Uma 706-two isolates were sequenced. Thus, they were labelled with -1 and -2, respectively. The tree consists of six different clusters. In total, two outgroups are visible representing Pseudozyma antartica NRRL Y -7808 and U. gigantosporum UMa706 (group 1, highlighted in blue), as well as M. ordensis BRIP 26904 a (group 2), followed by two clusters consisting of two different Ustilago species. The largest cluster consists of all U. maydis isolates (highlighted in orange), U. vetiveriae, and M. mackinlayi. The last cluster includes all Sporisorium isolates and U. curta. Figure 1. Phylogenetic tree of the investigated Ustilaginaceae strains. The phylogenetic tree is based on the core genes (2370 genes) of the selected strains. The SH-like local support values were computed using FASTtree within the comparative genomics tool EDGAR 3.0 [49]. 1,2 Two isolates were sequenced. a Additional strain isolated from sample #2836* (Table 1).
Previous comparative studies on genomes of smut fungi have indicated that U. maydis is more closely related to other taxa than to species of Ustilago [1]. Other systematic studies confirmed this fact and further showed that U. maydis is closely related to species of Sporisorium and Anthracocystis [50,51]. The close relationship can be explained by similarities in their host-plant infection mechanism. McTaggart et al. (2016) recovered U. maydis in a clade, among others, with M. mackinlayi, which all form hypertrophied sori in inflorescences of their hosts [1]. Furthermore, the authors considered that localized, host-derived, hypertrophied sori were an apomorphy for this group [1]. During our phylogenomic analysis based on the core genes of the selected strains, the close relationship between U. maydis, M. mackinlayi, and Sporisorium species was also observed, confirming previous studies ( Figure 1). The question remains if U. maydis or the other Ustilago species require a reclassification. The phylogenetic tree is based on the core genes (2370 genes) of the selected strains. The SH-like local support values were computed using FASTtree within the comparative genomics tool EDGAR 3.0 [49]. 1,2 Two isolates were sequenced. a Additional strain isolated from sample #2836* (Table 1).
Previous comparative studies on genomes of smut fungi have indicated that U. maydis is more closely related to other taxa than to species of Ustilago [1]. Other systematic studies confirmed this fact and further showed that U. maydis is closely related to species of Sporisorium and Anthracocystis [50,51]. The close relationship can be explained by similarities in their host-plant infection mechanism. McTaggart et al. (2016) recovered U. maydis in a clade, among others, with M. mackinlayi, which all form hypertrophied sori in inflorescences of their hosts [1]. Furthermore, the authors considered that localized, host-derived, hypertrophied sori were an apomorphy for this group [1]. During our phylogenomic analysis based on the core genes of the selected strains, the close relationship between U. maydis, M. mackinlayi, and Sporisorium species was also observed, confirming previous studies ( Figure 1). The question remains if U. maydis or the other Ustilago species require a reclassification.

Comparative Genomics
In order to determine the similarities within the Ustilaginaceae species, comparative genomic tools such as Percentage of Conserved Proteins (POCP, Figure 2), Average Nucleotide Identity (ANI, Figure 3), and Average Amino Acid Identity (AAI, Figure 4) were applied to the whole genome sequences. The conserved proteins between a pair of genomes were determined by aligning all the protein sequences of one genome (query genome) with all the protein sequences of another genome using the BLASTP program [38]. Proteins from the query genome were considered conserved when they had a BLAST match with an E-value of less than 1 × 10 −5 , a sequence identity of more than 40%, and an alignable region of the query protein sequence of more than 50% [38]. Two isolates were sequenced in case of U. maydis No. 482 and U. gigantosporum UMa706.

Average Nucleotide Identity
Apart from POCP, ANI ( Figure 3) was performed to determine the nucleotide-level genomic similarity between two genomes. The obtained ANI values range from 76.6 to 99.9% identity. Again, the highest identities and similarities were observed for the U. maydis isolates ranging from 96.7 to 99.9%. Further, the other comparisons showed a maximum identity of 99.8% for U. curta and U. xerochloae. Thus, a clear boundary between inter-and intraspecies comparison is not feasible for our obtained data set using ANI analysis only. Furthermore, differentiation of interspecific and intergeneric identities was impossible due to overlapping identities similar to POCP.  observed that members of the Hypoxylaceae share at least 70% of their nucleotide content [19]. The different Ustilaginaceae isolates compared during this study showed at least 77% nucleotide content, which is comparable to the previous data received from other fungal families. This threshold is identical to the one estimated for the POCP analysis.
For prokaryotic genome data sets, ANI analyses have been widely used to identify genomic variations and, thus, define species boundaries [52]. Concerning fungal data sets, studies are available for comparably small taxon selections [53]. For selected Rhizoctonia solani isolates, an approximate sequence identity of 80% was shown. In a recent study of a larger data set on Hypoxylaceae, a sequence identity within the family of 73.2-93.3% The conserved proteins between a pair of genomes were determined by aligning all the protein sequences of one genome (query genome) with all the protein sequences of another genome using the BLASTP program [38]. Proteins from the query genome were considered conserved when they had a BLAST match with an E-value of less than 1 × 10 −5 , a sequence identity of more than 40%, and an alignable region of the query protein sequence of more than 50% [38]. Two isolates were sequenced in case of U. maydis No. 482 and U. gigantosporum UMa706.

Pairwise Percentage of Conserved Proteins
The POCP analysis has been proposed as a novel method to define genus boundaries in prokaryotes [38]. In prokaryotic studies, POCP analysis revealed that all pairwise comparisons of species from different families resulted in values lower than 50%, the proposed threshold for a genus boundary. A recent study by  proposed that this threshold cannot be directly applied to fungal genomes [19]. Since fungal genes are much more conserved than prokaryotic genes, the threshold was proposed to be set at 70% separating family and non-family members [19]. POCP analysis can be used to compare two strains to estimate their evolutionary and phenotypic distance. Thereby, the quality of POCP analysis depends on the reliability of the applied gene prediction models.  showed that protein sequences derived from GeneMark delivered an adequate coverage of the gene content, which is comparable to RNA-Seq-based prediction pipelines [19]. obtained identities (76.6 to 99.9%) [19,54]. When comparing the ANI and POCP between species pairs in the family of Ustilaginaceae, it can be seen that POCP values are overall higher. Comparing U. curta BRIP 2629a with U. maydis isolates, POCP identities of 91.5-91.8 were obtained. ANIs for the comparison resulted in values of 79.5%. A similar phenomenon, higher conservation of protein sequence, was observed for the family of Hypoxylaceae, confirming the obtained results during this study [19].

Figure 3.
Pairwise Average Nucleotide Identity (ANI) analysis between genome-sequenced Ustilaginaceae. The ANI between the query genome and the reference genome was calculated as the mean identity of all BLASTN matches [42]. Two isolates were sequenced in the case of U. maydis No. 482 and U. gigantosporum Uma706.

Average Amino Acid Identity
AAI ( Figure 4) measures the genomic difference of the investigated strains based on orthologous proteins [19]. The AAI identities of the different Ustilaginaceae range from 67.7-99.9%, indicating that the strains share at least two-thirds of the presented proteincoding genes. Excluding U. gigantosporium and U. xerochloae with a contaminated or diploid genome, the identities increase to 74.6-99.9%. The different sequenced U. maydis isolates show AAIs ranging from 99-100%. Comparing isolates from different genera, AAIs up to 91.2% were observed, e.g., comparing U. curta BRIP 26929 and S. walkeri RK031. Comparing the different U. maydis isolates and S. walkeri RK031 results in AAIs of 80%. U. xerochloae BRIP 2629a and U. maydis isolates showed identities of 78%. During a previous study,  proposed an intergeneric and interfamilial threshold value of 75% for the tested Hypoxylaceae [19]. This study focused on one fungal Ustilaginaceae family only. Nevertheless, AAIs above 74.6% were observed, supporting the findings of the previous study.
In comparison to ANI (Figure 3), the AAIs were higher, which can be explained by the degenerated genetic code as nucleotide sequence changes do not necessarily result in an amino acid change [55].  The AAI between the query genome and the reference genome was calculated as the mean identity of all BLASTN matches [42]. Two isolates were sequenced in the case of U. maydis No. 482 and U. gigantosporum Uma706.

Gene-Based Comparison
Results of the genomic comparison by analyzing the number of core and individual genes between different U. maydis and Sporisorium isolates, respectively, are displayed in Figure 5. Similar to the POCP, the numbers strongly depend on the accuracy of gene prediction and hence cannot be considered as exact measurements. Nevertheless, it gives insights into the distribution of orthologous genes between related and unrelated species.
U. maydis isolates ( Figure 5A) share 6245 core genes and contain between 136 (U.  The POCP identities determined in this study ranged from 52-99% ( Figure 2). Thereby, the observed identities of <70% could be explained by the impurity of another fungus in samples of U. gigantosporium and U. xerochloae. Excluding this artifact, the observed identities are comparable to the previously published ones showing POCP identities from 70-99% between the tested family members [19]. The different U. maydis strains within one species even share 96-99% identity. The remaining species show identities up to 95%. Nevertheless, U. maydis displays the only species that contained several strains within the obtained analysis. In order to verify a potential species threshold, various additional isolates should be genome-sequenced. Differentiation of interspecific and intergeneric identities between the different Ustilaginaceae isolates was not possible, which was also observed in previous studies due to overlapping identities [19]. Thus, our results combined with the phylogenomic analysis (Section 3.2) indicate that a reclassification of the Ustilaginaceae family might be required.

Average Nucleotide Identity
Apart from POCP, ANI (Figure 3) was performed to determine the nucleotide-level genomic similarity between two genomes. The obtained ANI values range from 76.6 to 99.9% identity. Again, the highest identities and similarities were observed for the U. maydis isolates ranging from 96.7 to 99.9%. Further, the other comparisons showed a maximum identity of 99.8% for U. curta and U. xerochloae. Thus, a clear boundary between inter-and intraspecies comparison is not feasible for our obtained data set using ANI analysis only. Furthermore, differentiation of interspecific and intergeneric identities was impossible due to overlapping identities similar to POCP.  observed that members of the Hypoxylaceae share at least 70% of their nucleotide content [19]. The different Ustilaginaceae isolates compared during this study showed at least 77% nucleotide content, which is comparable to the previous data received from other fungal families. This threshold is identical to the one estimated for the POCP analysis.
For prokaryotic genome data sets, ANI analyses have been widely used to identify genomic variations and, thus, define species boundaries [52]. Concerning fungal data sets, studies are available for comparably small taxon selections [53]. For selected Rhizoctonia solani isolates, an approximate sequence identity of 80% was shown. In a recent study of a larger data set on Hypoxylaceae, a sequence identity within the family of 73.2-93.3% was demonstrated [19]. Moreover, in a study on Arthrodermataceae-related species, identities of 76.4-90.0% were observed, which are comparable to the other studies and our obtained identities (76.6 to 99.9%) [19,54]. When comparing the ANI and POCP between species pairs in the family of Ustilaginaceae, it can be seen that POCP values are overall higher. Comparing U. curta BRIP 2629a with U. maydis isolates, POCP identities of 91.5-91.8 were obtained. ANIs for the comparison resulted in values of 79.5%. A similar phenomenon, higher conservation of protein sequence, was observed for the family of Hypoxylaceae, confirming the obtained results during this study [19].

Average Amino Acid Identity
AAI (Figure 4) measures the genomic difference of the investigated strains based on orthologous proteins [19]. The AAI identities of the different Ustilaginaceae range from 67.7-99.9%, indicating that the strains share at least two-thirds of the presented protein-coding genes. Excluding U. gigantosporium and U. xerochloae with a contaminated or diploid genome, the identities increase to 74.6-99.9%. The different sequenced U. maydis isolates show AAIs ranging from 99-100%. Comparing isolates from different genera, AAIs up to 91.2% were observed, e.g., comparing U. curta BRIP 26929 and S. walkeri RK031.
Comparing the different U. maydis isolates and S. walkeri RK031 results in AAIs of 80%. U. xerochloae BRIP 2629a and U. maydis isolates showed identities of 78%. During a previous study,  proposed an intergeneric and interfamilial threshold value of 75% for the tested Hypoxylaceae [19]. This study focused on one fungal Ustilaginaceae family only. Nevertheless, AAIs above 74.6% were observed, supporting the findings of the previous study.
In comparison to ANI (Figure 3), the AAIs were higher, which can be explained by the degenerated genetic code as nucleotide sequence changes do not necessarily result in an amino acid change [55].

Gene-Based Comparison
Results of the genomic comparison by analyzing the number of core and individual genes between different U. maydis and Sporisorium isolates, respectively, are displayed in Figure 5. Similar to the POCP, the numbers strongly depend on the accuracy of gene prediction and hence cannot be considered as exact measurements. Nevertheless, it gives insights into the distribution of orthologous genes between related and unrelated species. In addition to the core genome, the regions involved in secondary metabolite synthesis for the different Ustilaginaceae isolates were analyzed and compared (Table 3). In general, these fungi include-also in comparison with other Basidiomycota, e.g., R. solani [19]-only a few secondary metabolite synthesis genes and gene clusters, which agrees with previous reports [56]. The number ranges between 10 and 15 regions per genome.
Here, the differences between species are not very pronounced. Most of the predicted regions belong to the core genome of the isolates. However, also unique regions were predicted, e.g., the RIPP region in U. gigantosporum Uma706 or a second T1PKs region in M. mackinlayi BRIP 52549a. Here, additional experiments are needed, such as RNA sequencing experiments to predict the function of the metabolites and to predict if it is involved in the pathogenic interaction with the host plant. Differences between the two respective isolates of U. maydis No. 482 and U. gigantosporum Uma706 can be explained by separate cultivation and isolation experiment. In the case of eukaryotes, such genetic differences are a common phenomenon. In general, the presented results solely display a prediction on the chosen enzyme families and do not cover all potential secondary metabolites. In the case of polyketide synthases (PKSs), antiSMASH predicts one type 1 PKS, whereas it is known that U. maydis has three polyketide synthases (pks3, pks4, and pks5) for the production of the pigment melanin [57]. Indeed, antiSMASH updates are announced to further improve the predictions of secondary metabolite synthesis gene clusters in varying organisms [58]. Another example of missed secondary metabolites are isoprenoids derived from carotenoids-for example, the plant hormone abscisic acid, of which its synthesis was reported for U. maydis [59]. The largest class of genes encoding enzymes involved in secondary metabolite synthesis identified here are NRPSs (Table 3). A promi- In addition to the core genome, the regions involved in secondary metabolite synthesis for the different Ustilaginaceae isolates were analyzed and compared (Table 3). In general, these fungi include-also in comparison with other Basidiomycota, e.g., R. solani [19]-only a few secondary metabolite synthesis genes and gene clusters, which agrees with previous reports [56]. The number ranges between 10 and 15 regions per genome. Here, the differences between species are not very pronounced. Most of the predicted regions belong to the core genome of the isolates. However, also unique regions were predicted, e.g., the RIPP region in U. gigantosporum Uma706 or a second T1PKs region in M. mackinlayi BRIP 52549a. Here, additional experiments are needed, such as RNA sequencing experiments to predict the function of the metabolites and to predict if it is involved in the pathogenic interaction with the host plant. Differences between the two respective isolates of U. maydis No. 482 and U. gigantosporum Uma706 can be explained by separate cultivation and isolation experiment. In the case of eukaryotes, such genetic differences are a common phenomenon. In general, the presented results solely display a prediction on the chosen enzyme families and do not cover all potential secondary metabolites. In the case of polyketide synthases (PKSs), antiSMASH predicts one type 1 PKS, whereas it is known that U. maydis has three polyketide synthases (pks3, pks4, and pks5) for the production of the pigment melanin [57]. Indeed, antiSMASH updates are announced to further improve the predictions of secondary metabolite synthesis gene clusters in varying organisms [58]. Another example of missed secondary metabolites are isoprenoids derived from carotenoids-for example, the plant hormone abscisic acid, of which its synthesis was reported for U. maydis [59]. The largest class of genes encoding enzymes involved in secondary metabolite synthesis identified here are NRPSs (Table 3). A prominent example of a Ustilaginaceae product requiring NRPS enzymes is the metal chelator ferrichrome [60].

Itaconate Cluster Identification
Based on the obtained genomes, secondary metabolite clusters were investigated in order to verify genetic similarities and differences between the Ustilaginaceae. During previous empirical studies, differences in itaconic acid production were observed for various Ustilaginaceae strains [5]. Thus, the itaconate gene cluster was compared at first ( Figure 6). The complete itaconate cluster was identified in six isolates: U. maydis No. 198, 485, 482, and 512, as well as in U. vetiveriae RK075 and U. cynodontis. M. mackinlayi BRIP 52549a did not contain mtt1 and ria1, which encode a mitochondrial tricarboxylate transporter and the regulator of itaconic acid biosynthesis, respectively [14]. Thus, itaconate production was tested during a cultivation experiment (Supplementary Figure S1). An itaconate production of 2.2 ± 0.01 g L −1 after 48h was observed, indicating a functional itaconic acid synthesis cluster that is also reacting to nitrogen starvation as previously reported for the other Ustilago itaconate producer [9]. Previous studies focused on the role of different itaconate cluster genes as, e.g., a deletion of mtt1 in U. maydis MB215 (No. 512) led to a strong decrease in itaconate production, but its production was not completely abolished [14]. This transporter is known as the rate-limiting step in itaconate biosynthesis in U. maydis [61]. Most likely in M. mackinlayi BRIP 52549a, different less specialized, and therefore less efficient, transport proteins substitute the dedicated itaconic acid transporter, as most eukaryotic mitochondrial transporters have a diverse substrate spectrum with different affinities [62].
Four isolates, U. gigantosporum UMa706, S. walkeri RK031, and U. xerochloae BRIP 60876a, only harbor itp1, encoding the itaconate transport protein lacking the remaining itaconic acid cluster genes. This transporter secretes itaconic acid, itatartarate, and 2-hydroxyparaconate from the cytosol to the exterior of the cell [62]. As itaconate reduction was observed in some long-lasting experiments (Supplementary Figure S2), itaconate import might be facilitated by Itp1. Indeed, the degradation of itaconate might be important in some niches due to the antimicrobial properties of itaconic acid [63,64]. Geiser et al. [17] presented draft genome sequences of itaconate-producing Ustilaginaceae, including S. iseilematis-ciliati BRIP 60887a, P. tsukubaensis NBRC 1940, P. hubeiensis NBRC 105055, as well as U. maydis and U. vetiveriae strains. All strains showed the complete itaconate cluster except for P. tsukubaensis NBRC 1940. Available genome sequences of investigated Ustilaginaceae strains could be included in the obtained data, which helps facilitate further research on the biology of Ustilaginaceae and increase the list of tools for metabolic engineering of itaconate production by Ustilaginaceae.
To further investigate the itaconic acid cluster genes, amino acid sequence identities were compared ( Figure 6B). Each enzyme shares amino acid sequence identities with the corresponding enzymes in U. maydis No. 512, with identities ranging from 35-100%. Among those, the highest identities were found for the different U. maydis isolates ranging from 96-100%. The lowest identity of 96% is obtained comparing U. maydis No. 485 and No. 512 for genes rdo1 and mtt1. Mtt1 is directly involved in itaconic acid biosynthesis as it encodes a mitochondrial tricarboxylate transporter. Rdo1 is not directly involved in the itaconic acid synthesis, but it is proposed to encode an enzyme converting (S)-2-hydroxyparaconate to itatartarate [62]. An empirical study by Geiser et al. (2014) confirmed a lower itaconate production of U. maydis 485, which might result from genetic differences within the itaconic acid gene cluster [5]. Comparing the different genes, cyp3 encoding a cytochrome P450 monooxygenase and itp1 encoding an itaconate transport protein are highly conserved, showing amino acid identities of 100% between the tested U. maydis isolates.
For U. cynodontis and U. vetiveriae, the sequence identity of proteins encoded by the itaconate biosynthesis genes is mostly conserved in a range of 68-89% compared to the U. maydis No. 512 sequence. The most divergent protein of the itaconate cluster is Ria1, showing identities from 45-46% for U. vetiveriae and U. cynodontis, respectively. Geiser et al. observed a similar identity of 43% between U. maydis and U. vetiveriae even though they are phylogenetically closely related [13]. Further, Geiser et al. (2018) were able to activate silent itaconate clusters by overexpression of Ria1 originating from related species, although the amino acid sequences of Ria1 regulators are less conserved than the enzymes involved in itaconic acid synthesis [13]. Additionally, the authors could enhance itaconate production in weak producers up to 4-fold [13]. Thus, it was shown that activation of silent secondary metabolite clusters could be achieved in a range of related species with reduced genetic engineering efforts.
To conclude, this study confirmed a highly conserved itaconate cluster of the investigated U. maydis strains (96-100% identity), even though there are differences in itaconate production and by-product formation such as (S)-2-hydroxyparaconate, which were observed during previous empirical studies. Thus, the obtained results can be used to further increase the list of tools for metabolic engineering of itaconate production by Ustilaginaceae.

MEL Cluster Identification
Mannosylerythritol lipids (MELs) are glycolipid biosurfactants produced by various basidiomycetous yeasts belonging to the genera Ustilago, Pseudozyma, Moesziomyces, and Sporisorium [66]. Apart from their interfacial activity, MELs are known to repair damaged human skin and hair. Thus, they have been commercialized in the cosmetics industry for more than a decade [66]. Hewald et al. (2006) [67,68] identified the gene cluster involved in MEL biosynthesis and outlined the pathway for MEL biosynthesis in U. maydis. The cluster consists of four enzymes and a transporter: erythritol/mannose transferase (emt1), two acyltransferases (mac1 and mac2), acetyltransferase (mat1), and the putative transporter (mmf1) (Figure 7). production and by-product formation such as (S)-2-hydroxyparaconate, which were observed during previous empirical studies. Thus, the obtained results can be used to further increase the list of tools for metabolic engineering of itaconate production by Ustilaginaceae. Blue color indicates genes directly involved in itaconate production and transport [13]. Orange-colored genes are involved in itaconic acid conversion into its derivates [14]. Cyp3 (P450 monooxygenase), rdo1 (glyoxalase domain-containing protein RDO1), tad1 (trans-aconitate decarboxylase 1) itp1 (itaconate transport protein), adi1 (aconitate-delta-isomerase 1), mtt1 (mitochondrial tricarboxylate Figure 6. Schematic overview of the itaconate biosynthesis gene cluster in U. maydis. Modified from [65] (A): Organization of the cluster genes is compared to the sequenced Ustilaginaceae isolates. Blue color indicates genes directly involved in itaconate production and transport [13].
Orange-colored genes are involved in itaconic acid conversion into its derivates [14]. Cyp3 (P450 monooxygenase), rdo1 (glyoxalase domain-containing protein RDO1), tad1 (trans-aconitate decarboxylase 1) itp1 (itaconate transport protein), adi1 (aconitate-delta-isomerase 1), mtt1 (mitochondrial tricarboxylate transporter 1), ria1 (regulator of itaconic acid biosynthesis role in pathogenic development or biofilm formation [70]. The findings in our study display a valuable genomic data set facilitating further understanding of MEL production in different Ustilaginaceae strains. The identification of gene clusters via whole-genome sequencing is essential for the modification of the MEL biosynthesis pathway. Recently, the gene clusters involved in MEL biosynthesis in various Pseudozyma strains have been identified by genomic sequencing [6]. The cluster genes were also identified in Ustilaginaceae strains, including S. graminicola and U. hordei, during recent work [69]. Within this study, 14 isolates exhibited the whole MEL cluster. S. scitamenium exhibited all genes except mac2, encoding an acyl transferase. Based on the identification of the MEL cluster, new potential MEL producers were reported within this work: U. curta, U. lituana, U. trichophora, U. xerochloae, M. ordensis, and M. mackinlayi. Nevertheless, production levels could differ due to regulation of the secondary metabolite clusters as observed for itaconate production. MEL production of strains U. cynodontis and S. exsertum was already observed during cultivation experiments [6,69]. Geiser et al. (2014) identified M. eriachnes as a natural MEL producer [5]. No MEL cluster genes were identified in U. gigantosporum, U. vetiveriae, and S. walkeri, asking for further analyses into the function of MELs.
Further, amino acid identities between the different MEL cluster genes were compared between U. maydis No. 512 and the different Ustilaginaceae strains ( Figure 7B). Amino acid identities ranging from 52-100% compared to U. maydis No. 512 were computed. A study from Saika et al. (2018) reported similar identities of 54-82% comparing different Pseudozyma strains to U. maydis No. 512 [6]. The data set obtained during this study extends the existing set with several U. maydis strains enabling the interspecies comparison of the different isolates. MEL cluster genes are highly conserved, showing identities ranging from 99-100% for the different U. maydis strains. Comparing other Ustilago species (such as U. curta, U. cynodontis, U. xerochloae, U. lituana and U. trichophora) to U. maydis No. 512 identities ranging from 55-80% were observed. Comparing the different analyzed Pseudozyma strains by Saika et al. (2018), identities from 55-82% were obtained, confirming our intraspecies (intergeneric) observations [6]. Nevertheless, no clear amino acid identity boundary can be set comparing different genera such as Macalpinomyces, Pseudozyma, and Sporisorium.
The obtained genomic and gene cluster data can be used for modifications in MEL production, e.g., via metabolic engineering. Previous studies identified mac1 and mac2 as essential genes encoding two acyltransferases for MEL production in U. maydis as the deletion strains ∆mac1 and ∆mac2 lacked any MEL synthesis [6]. Furthermore, gained knowledge about MEL cluster genes can be used to tailor MEL production in Ustilaginaceae, exploiting more strains than the existing model organisms.
During this study, 14 out of 17 isolates, i.e., 82%, exhibited genes responsible for MEL production. In general, MEL improves the accessibility of hydrophobic nutrients and enhances attachment to nonpolar surfaces. In addition to these more general functions, some biosurfactants also bind heavy metals, display antimicrobial activity, or play an important role in pathogenic development or biofilm formation [70]. The findings in our study display a valuable genomic data set facilitating further understanding of MEL production in different Ustilaginaceae strains.
The genes responsible for UA in U. maydis No. 512 production are displayed in Figure 8, which also includes the gene cluster comparison to the other isolates. Based on the given data set, only the reference genome of U. maydis No. 512 shows all genes responsible for ustilagic acid production. The remaining U. maydis isolates U. maydis No. 482, 198, and 485 lack uat2 encoding an acetyltransferase. Furthermore, another group was identified consisting of U. xerochloae, M. ordensis, M. mackinlayi, and S. scitamineum showing all UA cluster genes except orf2, which encodes an ustilagic acid biosynthesis cluster protein with a so far unknown function [65]. To conclude, 9 out of 17 isolates (52%) show UA cluster genes. Thereby, this study shows similar findings compared to Geiser et al., confirming U. maydis, S. scitamenieum, S walkeri, and M. eriachnes as UA-producing strains during cultivation experiments [5]. Furthermore, smut fungi such as U. cynodontis, U. vetiveriae, and S. exertum did not show UA production during the experiments [5]. Our genomic comparison revealed that this could be explained by the complete lack of the UA cluster. U. xerocholoae was identified as a potential novel UA producer as its UA production was not observed during previous studies.

Macro Synteny Plot of U. maydis Isolates
Genomic synteny of the tested U. maydis isolates was compared to U. maydis No. 512, the reference strain in biotechnology (Figure 9). When comparing the different isolates, a high genomic synteny can be observed between the respective organism pairs. The chromosomes structure of the U. maydis strains is highly conserved. As stated previously, in the case of two strains-U. maydis No. 482 and U. gigantosporum Uma 706-two isolates were sequenced. Thus, they were labeled with -1 and -2, respectively. Only the two isolates of U. maydis No. 482 show different rearrangements of the second chromosome. Differences between the two isolates might result from separate cultivation, assembly errors, or different read length in the respective region. Nevertheless, the two isolates show rearrangements in the same region of the second chromosome which indicates a difference to chromosomes of U. maydis No. 512.
To conclude, the highly conserved chromosome structures of the U. maydis strains confirm the high identities obtained from POCP, AAI, and ANI resulted in high identities between the different strains.
ferences between the two isolates might result from separate cultivation, assembly errors, or different read length in the respective region. Nevertheless, the two isolates show rearrangements in the same region of the second chromosome which indicates a difference to chromosomes of U. maydis No. 512.
To conclude, the highly conserved chromosome structures of the U. maydis strains confirm the high identities obtained from POCP, AAI, and ANI resulted in high identities between the different strains.

Conclusions
Here, we reported high-quality genomes of 17 Ustilaginaceae strains covering 14 species combining nanopore and Illumina sequencing. A broad strain selection for genomic investigation was aimed for based on previous work on relevant Ustilaginaceae species (i.e., U. cynodontis [44], U. trichophora [45], U. vetiveriae [11], U. maydis [6,46]). By phylogenomic comparison and the application of taxonomical genome-based bioinformatics methods such as POCP, ANI, and AAI analyses on a larger set of related fungal species, we gained insights into their relationships and were able to deduce taxonomic hierarchies. The obtained results indicate that a reclassification of the Ustilaginaceae family might be required. In addition to genomic comparisons, secondary metabolite cluster analysis was performed. The numbers of identified secondary metabolite clusters did not differ much between species and ranged between 10 and 15 regions per genome. Here, the differences between the different isolates were not very pronounced. Notably, few of the identified regions could be associated with the secondary metabolite synthesized, asking for additional experiments such as RNA sequencing to predict the function of the metabolite and to predict if it is involved in the pathogenic interaction with the host plant.
With the presented high-quality draft genomes (N50 > 1 Mb), the investigation of Ustilaginaceae biodiversity is enabled, as well as metabolic engineering of secondary metabolites production (e.g., MELs, UA, itaconic acid) is fostered. The generated genomic data set enhances our toolbox for investigating and engineering the fungal Ustilaginaceae family.