Characterization of Host-Specific Genes from Pine- and Grass-Associated Species of the Fusarium fujikuroi Species Complex

The Fusarium fujikuroi species complex (FFSC) includes socioeconomically important pathogens that cause disease in numerous crops and synthesize a variety of secondary metabolites that can contaminate feedstocks and food. Here, we used comparative genomics to elucidate processes underlying the ability of pine-associated and grass-associated FFSC species to colonize tissues of their respective plant hosts. We characterized the identity, possible functions, evolutionary origins, and chromosomal positions of the host-range-associated genes encoded by the two groups of fungi. The 72 and 47 genes identified as unique to the respective genome groups were potentially involved in diverse processes, ranging from transcription, regulation, and substrate transport through to virulence/pathogenicity. Most genes arose early during the evolution of Fusarium/FFSC and were only subsequently retained in some lineages, while some had origins outside Fusarium. Although differences in the densities of these genes were especially noticeable on the conditionally dispensable chromosome of F. temperatum (representing the grass-associates) and F. circinatum (representing the pine-associates), the host-range-associated genes tended to be located towards the subtelomeric regions of chromosomes. Taken together, these results demonstrate that multiple mechanisms drive the emergence of genes in the grass- and pine-associated FFSC taxa examined. They also highlight the diversity of the molecular processes potentially underlying niche-specificity in these and other Fusarium species.


Introduction
The genus Fusarium represents a diverse group of Ascomycetes that are important in industry, agriculture, and medicine [1,2]. While known species in the genus include pathogens of humans [3][4][5], animals [6] and insects [7,8], the majority are destructive plant pathogens [9][10][11][12][13][14]. Additionally, Fusarium species generate a variety of toxic secondary metabolites that can contaminate feedstocks and food, affecting the quality and quantity of most agriculturally important crops [15]. Due to their importance in forestry, agriculture and medicine, the genomes of many Fusarium species have been determined [16].
Fusarium species associate with diverse plant species, ranging from gymnosperms to angiosperms, including many monocots in the grass family, Poaceae [17][18][19]. Examples of these species are included in the phylogenetically defined lineage of Fusarium,
Analysis with the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool v. 3.0.2 [45], using the "Sordariomyceta" database, showed that all six genomes were 97.3% to 99.1% complete (Supplementary Table S1). These genomes were also similar in size, gene density, and G + C content (Table 1). Comparison of the four genomes assembled into pseudochromosomes (F. circinatum, F. fracticaudum, F. pininemorale and F. temperatum) further showed that chromosome size was conserved across Chromosomes 1-11, which differed by no more than a factor of 1.00-1.10 (Supplementary Table S2). However, Chromosome 12 showed extreme chromosome length polymorphism (CLP) and differed by as much as a factor of 2.08, ranging from 0.5 Mb in F. circinatum to 1.1 Mb in F. fracticaudum (Supplementary Table S2).
Table 1, footnote 2: As determined using MAKER2 [46].
For further analysis, we used the F. circinatum genome as a reference for pine-host-associated species and the F. temperatum genome as a reference for grass-host-associated species. Analysis of the telomere-associated repeat sequence (TTAGGG/CCCTAA) [47,48] in the reference genomes indicated that most of the chromosomes had a telomeric cap (Supplementary Table S3). In the F. circinatum assembly, seven chromosomes had two telomeric caps and five had only one telomeric cap. In the F. temperatum assembly, nine chromosomes had two telomeric caps, two chromosomes had one telomeric cap and one chromosome had no telomeric cap.

Genes Unique to Pine- and Grass-Host-Associated Species
Analysis with OrthoFinder v. 2.3.1 [49] facilitated the identification of 72 and 47 genes that were unique to pine- or grass-host-associated Fusarium species, respectively (Supplementary Tables S4 and S5). Multiple paralogs of a few of these genes occurred in some genomes: one paralog of one gene in F. circinatum and F. pininemorale, two paralogs of two genes in F. fracticaudum and F. subglutinans and three paralogs of one gene in F. konzum. Interestingly, the paralogs of a particular gene were present together on the same chromosome, except in F. fracticaudum, where two of these genes were located on different chromosomes. No inferences could be made regarding the chromosomal location of the multiple-copy genes of F. konzum and F. subglutinans due to the fragmented nature of these assemblies. The phylogenetic relationship between the paralogous genes and the unique gene could be determined for both F. fracticaudum paralogs, the one F. pininemorale paralog and only one of the two F. subglutinans paralogs. The phylogenetic relationship for the remaining F. subglutinans paralog, along with the three F. konzum paralogs, could not be determined, as the ancestral origins of these genes were unknown and/or the genes had no significant BLAST hit in the NCBI database (Supplementary Figure S1). All paralogs shared the same ancestral origins as the unique gene, except in F. fracticaudum, where the paralog of the unique gene was located on a different chromosome and grouped within the ancestral origin group FFSC and FOSC, whereas the unique gene had an ancestral origin outside Fusarium but within the Ascomycetes.
Most of the host-range-associated genes had a gene ontology (GO) description available from Blast2GO [50]. Of the 72 unique F. circinatum genes, no GO terms were available for 11 of the genes. Apart from these 11 genes, 26 genes were either hypothetical or predicted to encode uncharacterized proteins. The remaining 35 could be organized into four broad groups based on the proteins they encode, i.e., those involved in virulence, transcriptional regulators, substrate transporters and permeases, and proteins potentially involved in the metabolism of carbohydrates, fatty acids, and steroids (Supplementary Table S4). Of the 47 unique F. temperatum genes, five did not have a hit in BLAST analysis. A further 17 unique F. temperatum genes were either hypothetical or predicted to encode uncharacterized proteins. The remaining 25 genes could be organized into five broad groups based on the likely functions of their protein products, i.e., those involved in virulence, substrate transporters and permeases, proteins potentially involved in the metabolism of carbohydrates, fatty acids, amino acids, and steroids, and an HET-domain protein (Supplementary Table S5).
To determine whether certain GO terms in the respective unique sets were significantly enriched relative to the rest of each respective genome, we employed Fisher's Exact test implemented in Blast2GO [50]. In the pine-host-associated set, five GO terms were significantly (p < 0.05) enriched in comparison to the whole genome (Supplementary Table S6A), whereas 21 GO terms were significantly (p < 0.05) enriched in the grass-host-associated set (Supplementary Table S6B). Of the genes associated with these enriched GO terms, one gene was classified as having a "biologically relevant" role (i.e., glutamate metabolism), and four as having "essential molecular functions", such as coding for a gene product involved in glutamate decarboxylase activity, steroid dehydrogenase activity and RNA-DNA hybrid ribonuclease activity. Furthermore, one gene coded for a product associated with cellular components, such as the microtubule and kinesin complex.
We also used previous expression data for F. circinatum [51,52] to search for evidence of expression of the pine-host-associated genes (Supplementary Table S7). The combined results indicated that 62 of the 72 pine-host-associated genes were transcribed under the conditions used to generate the expression data [51][52][53]. No expression data are currently available for any of the grass-host-associated Fusarium species.

Phylogenetic Origins of the Host-Range-Associated Genes
The ancestry of genes in the pine- and grass-host-associated sets was inferred using their protein sequences in BLAST searches against NCBI's non-redundant protein database (performed on 04 October 2021), supplemented in some cases by several rounds of alignment of the retrieved sequences followed by phylogenetic analyses. Based on these data, we assigned the 119 genes from the two sets to eight groups (Table 2 and Table S8, Supplementary Figures S2-S8). Genes included in the first seven groups had evolutionary origins within the FFSC, the broader Fusarium genus, lineages outside Fusarium but within the Ascomycetes, or outside the kingdom Fungi (group 7). The last group contained those genes lacking significant BLAST hits in the NCBI database, as well as those that did not comply with the cut-off values stated in Section 4.3.
Table 2. Numbers of genes, and their inferred evolutionary origins, included in the pine- and grass-host-associated sets of host-range-associated genes.
The results of these analyses indicated that the ancestral origin for most of the genes was in members of the FFSC or other lineages of Fusarium. Indeed, based on their phylogenies, the genes included in groups 1-4 (i.e., 39 genes in the pine-associated set and 25 in the grass-associated set) likely emerged early during the evolution of Fusarium and/or the FFSC, after which they were retained in only some lineages. By contrast, 12 pine-host-associated and seven grass-host-associated genes lacked homologs in Fusarium and were more closely related to homologs in other Ascomycetes (groups 5 and 6) or outside of the kingdom Fungi (group 7). A considerable proportion of the genes had no detectable homologs in GenBank.

Genomic Distribution of the Host-Range-Associated Genes
All F. circinatum chromosomes included at least one pine-host-associated gene, while all but two F. temperatum chromosomes included at least one grass-host-associated gene (Figures 1, 2, S9 and S10). Chi-squared tests indicated that significantly (p < 0.05) more of these genes were located in subtelomeric regions than outside of them (Table 2). That is, about half of the host-associated genes in each species were restricted to subtelomeric regions, which constitute approximately one quarter of each genome (Supplementary Tables S9 and S10). Of the 38 host-range-associated genes in subtelomeric regions of F. circinatum, 25 yielded significant hits in BLAST analysis (and evidence of expression; see Supplementary Table S7). By contrast, 16 of the 30 host-range-associated genes in subtelomeric regions in F. temperatum had BLAST hits. Phylogenetic analyses of these genes indicated that most had evolutionary origins within fungi (see Table 2). However, the origins of a substantial number of subtelomeric genes remained unknown, as there were no homologs for them in GenBank, while one gene in both sets apparently has origins outside of fungi (see groups 7 and 8, respectively, in Table 2).

Figure 1.
The distribution of host-range-associated genes from pine-host-associated Fusarium species and conservation of synteny across and among chromosomes and genomes. The distribution of pine-associated genes across each of the chromosomes is indicated by vertical blue lines. The conservation of synteny and inversion between the relevant genomes are indicated in the brown blocks and red lines. FCIR = F. circinatum; chromosome size is given in Kb.

Figure 2.
The distribution of host-range-associated genes from Poaceae-host-associated Fusarium species and conservation of synteny across and between chromosomes and genomes. The distribution of Poaceae-associated genes across each of the chromosomes is indicated by the blue lines.

The density of host-range-associated genes differed among chromosomes in both F. circinatum and F. temperatum (Supplementary Table S11). Most chromosomes, except for Chromosomes 5, 10 and 12, had higher unique gene densities for F. temperatum when the assemblies for F. circinatum and F. temperatum were compared. Homologous chromosomes of F. circinatum and F. temperatum had similar densities of host-range-associated genes, ranging from 1.04× (Chromosome 4) to 1.97× (Chromosome 7). Moderate differences (more than two-fold) in host-range-associated gene density appeared for Chromosomes 9 and 12, where F. circinatum contained more than F. temperatum, and vice versa, for these two chromosomes. Lastly, a major difference (more than seven-fold) occurred on Chromosomes 3 and 6, where F. circinatum had more host-range-associated genes compared to the same chromosomes in F. temperatum. The sequence of F. circinatum Chromosome 6 did not include a telomeric cap. It is also worth noting that Chromosome 12 in F. temperatum contained more host-range-associated genes compared to the same chromosome in F. circinatum.
Our data showed that some of the host-range-associated genes occurred in clusters or were located in close proximity to one another, and that such genes could have the same or different ancestral origins (Figure 3; Supplementary Table S12). For example, F. circinatum genes FCIR_6_gene_2.26 and FCIR_6_gene_2.131 were adjacent to one another on Chromosome 6 and were in the same ancestral origin group (Group 5) (Figure 3A; Table 2), whereas genes FCIR_6_gene_2.2 and FCIR_6_gene_2.142 were near one another on Chromosome 6 but were in different ancestral origin groups (Groups 8 and 7, respectively) (Figure 3A,B; Table 2).

Figure 3.
Bold letters above the green and orange genes indicate ancestral origin group as described in Table 2.
Using SynChro [54], we investigated whether the regions flanking clustered host-range-associated genes exhibited synteny in F. circinatum and F. temperatum (Supplementary Figures S11 and S12). For this analysis, we included 10 genes flanking each host-range-associated gene, five upstream and five downstream. We used the analysis to determine the frequency with which 1-10 genes flanking host-range-associated genes were syntenic in F. circinatum and F. temperatum. These analyses indicated that the genomic locations around host-range-associated genes in F. circinatum were normally distributed (Shapiro-Wilk's test for departure from normality; p > 0.05; W = 0.95) [55], but not in F. temperatum (p < 0.05; W = 0.86) (Figure 4).
The difference in distribution pattern was particularly evident when comparing clusters in the two genomes that had five and six of the 10 flanking genes syntenic. In F. temperatum, the clusters with five out of 10 syntenic genes appeared to be severely depleted, indicating minimal synteny when compared to the F. circinatum genome. However, the clusters with six out of 10 syntenic genes in F. temperatum were expanded compared to F. circinatum. The general trend was that genes flanking host-range-associated genes in F. circinatum were frequently syntenic with genes in F. temperatum, whereas genes flanking host-range-associated genes in F. temperatum were less frequently syntenic with genes in F. circinatum. This trend suggests higher conservation of the genes and/or genomic regions neighbouring or flanking host-range-associated genes in F. circinatum compared to F. temperatum.
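The flanking-gene tally described above can be illustrated with a minimal sketch. It assumes that the gene order along a chromosome and the set of genes judged syntenic are already available (for example, parsed from SynChro output); the function names, gene identifiers and toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

def count_syntenic_flanks(gene_order, target, syntenic_ids, flank=5):
    """Count syntenic genes among the `flank` genes on each side of `target`."""
    i = gene_order.index(target)
    upstream = gene_order[max(0, i - flank):i]
    downstream = gene_order[i + 1:i + 1 + flank]
    return sum(g in syntenic_ids for g in upstream + downstream)

def synteny_histogram(gene_order, targets, syntenic_ids, flank=5):
    """Frequency of each syntenic-flank count (0 to 2 * flank) over all targets."""
    return Counter(
        count_syntenic_flanks(gene_order, t, syntenic_ids, flank) for t in targets
    )

# Toy chromosome with 20 genes (hypothetical identifiers g1..g20).
chromosome = [f"g{i}" for i in range(1, 21)]
syntenic = {"g5", "g6", "g7", "g9", "g11", "g12", "g13", "g14", "g15"}

histogram = synteny_histogram(chromosome, ["g10", "g2"], syntenic)
print(sorted(histogram.items()))
```

The resulting counts per host-range-associated gene are the kind of values that could then be passed to a normality test such as Shapiro-Wilk.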

Discussion
In the current study, we used a comparative genomics approach to explore the molecular basis of plant-fungus interactions by making use of two groups of Fusarium species in the FFSC, i.e., one associated with pine and the other with grass. This allowed for the identification of sets of genes that are unique to each of the two groups of Fusarium species, and that markedly differed in terms of their identity and the function of the proteins they encode. The two sets of genes also showed large differences in their ancestral origins, and they tended to occur in subtelomeric regions of chromosomes. Overall, these genes highlighted the different evolutionary origins and molecular processes that underpin the capacity of these fungi to colonize their respective plant hosts.
To conduct this study, we used two high-quality reference genome assemblies: one for the pine pathogen F. circinatum and the other for the maize pathogen F. temperatum. Within these assemblies, most chromosomes had telomere-to-telomere coverage, which allowed us to compare the localization and clustering of the identified host-range-associated genes. This also allowed for comparisons of chromosomal architecture among four of the six studied Fusarium species.
The host-range-associated genes identified in the genomes of the pine-host-associates (F. circinatum, F. fracticaudum and F. pininemorale) and in those of the grass-host-associates (F. konzum, F. subglutinans and F. temperatum) were preferentially localized in subtelomeric regions, possibly due to the greater genomic instability occurring in these regions compared to elsewhere on the chromosome [56]. It has been speculated that the preferred subtelomeric location facilitates gene-switching and expression, and the generation of new gene variants [57,58]. The subtelomeric location of genes has been studied in several organisms, including Homo sapiens [59], Drosophila melanogaster [60], Plasmodium falciparum [61], Trypanosoma brucei [62], Saccharomyces cerevisiae [63], and the fungal parasite Encephalitozoon cuniculi [64]. In genome analyses of S. cerevisiae, Louis [63] found that subtelomeric genes belong to several gene families encoding proteins involved in carbon source utilization. In humans, genes encoding olfactory receptors (i.e., proteins capable of binding odor molecules) form the largest and most studied gene family located in subtelomeric regions [65,66].
The presence of subtelomeric regions provides an evolutionary advantage and is potentially important for replication and chromosome stability. These regions also seem to play important roles in the reversible silencing of genes, mediated by proteins binding to the telomere, and in ectopic recombination with other subtelomeres, likely resulting in gene diversification [58,67,68]. The subtelomeric regions are also thought to have higher genomic instability than elsewhere on the chromosome, which possibly allows for tight regulation of, and occasional switching of expression between, different gene copies, enabling organisms to evade the immune system of their respective hosts [67,68]. In addition, the literature suggests that genes located in subtelomeres play important evolutionary roles in adaptation, such as contributing to niche- or host-specificity and virulence in pathogens [69][70][71][72]. For example, the fungus Pneumocystis jiroveci has large and highly variable gene families located in its subtelomeric regions that encode surface proteins with tightly regulated, yet switchable, expression patterns [73]. Fusarium subtelomeres have been reported to contain virulence genes involved in host-specificity, e.g., in F. graminearum [36,74,75] and F. fujikuroi [29]. In F. circinatum, a growth-rate-determining locus was also characterized from a subtelomeric region [71]. In the case of the Fusarium species examined in this study, a large proportion of the genes potentially conferring host-specificity were positioned in subtelomeres, pointing to their possible roles in the development, adaptation, and survival of these Fusarium species.
The presence/absence of host-range-associated genes in the two sets of genomes, as well as their placement in subtelomeric regions, is consistent with what is expected for genes belonging to the accessory genomic compartment, since these regions are considered to be components of the accessory genome. In other words, they represent accessory genes likely involved in niche-associated traits, which are typically enriched for functions related to cell-cell interactions, secondary metabolism, and stress responses, all potential contributors to pathogenicity and virulence [34,76]. The dynamic evolutionary processes associated with the accessory genome [34,41,42,77,78] likely allowed for the preferential loss and/or gain of genes with new functions. This ultimately led to the pine- and grass-host-associates examined here being able to colonize and survive on the tissue of their respective hosts.
Our findings are consistent with the notion that horizontal gene transfer (HGT) plays a major role in the evolution of accessory genomes [34,41,42,77,78,79,80,81]. HGT is increasingly recognized as a significant driver of adaptive evolution in fungi and of the pathogenic traits of pathogens [82][83][84]. For example, the ability of Valsa mali to grow on its host and be pathogenic to it was acquired via HGT from diverse sources [84]. HGT can even cause basic lifestyle changes, e.g., facilitating the transition of the plant-associated ancestor of Metarhizium into the entomopathogenic taxon known today [85]. Among FFSC species, multiple independent HGT events were reported to have given rise to the growth-determining locus in F. circinatum [71]. Many of the host-range-associated genes identified in the current study have polyphyletic origins and were likely also acquired from such diverse external sources (see groups 5-8 in Table 2), while most of the remainder likely originated within Fusarium or the FFSC and were retained only in certain lineages (groups 1-4 in Table 2). More research is needed to determine the possible role of internal genomic mutations (due to duplication, displacement, and translocation events) [30,86,87] in determining the host associations of the examined fungi.
Despite the existence of genes unique to the respective pine- and grass-host-associated fungi, their genomes were remarkably syntenic [29,30]. Variability was most evident when inversions were compared. Inversions closer to the ends of chromosomes were previously reported in fungi [36,88,89]. Inversions were most noticeable for Chromosome 8, especially when considering the host-range-associated genes on this chromosome. All host-range-associated genes from Chromosome 8 were located in regions prone to inversion. The variability of Chromosome 8 is likely due to the sizable reciprocal translocation with Chromosome 11 within some FFSC taxa, including all species investigated in this study [30,90,91,92,93]. Chromosome 11 from F. circinatum had only one unique gene located outside of the inverted regions, whereas the same chromosome in F. temperatum lacked any host-range-associated genes. This suggests that Chromosome 8 is more predisposed to chromosomal structural changes. Recent advances in sequencing technologies, which provide longer reads and telomere-to-telomere coverage, have highlighted the importance of chromosome rearrangement events in cellular processes [94]. Translocation events have been reported to drive the evolution of secondary metabolite biosynthetic gene clusters, whereas chromosome rearrangements may result in gene loss and/or gain, genetic diversity, and changes in pathogenicity [95][96][97][98]. For example, effector genes located in rearranged chromosomal regions are more prone to evolve and are more likely to contribute to adaptation and speciation [99,100].
The importance of Chromosome 12 in possessing host-range-associated genes poses an interesting question. This chromosome is a dispensable chromosome within the FFSC [42,101,102,103] and can be strain-specific [29,102]. Chromosome 12 is not essential for pathogenicity [29,102,104] and its role in niche-specificity is unknown. However, the larger Chromosome 12 of F. temperatum contained more host-range-associated genes that were potentially involved in niche-specificity among Fusarium species that are associated with Poaceae. This may imply that some elements of the chromosome length polymorphism between F. circinatum and F. temperatum account for the addition of host-range-associated genes and that this chromosome is rapidly evolving. Chromosome 12 requires further investigation and would, therefore, be an important target for future studies.
Based on their inferred products, the host-range-associated genes could mostly be grouped as being involved in virulence, substrate transport, and metabolism of carbohydrates, and in the case of F. temperatum, also as a heterokaryon incompatibility (HET)-domain protein. This also highlights the evolutionary changes that Poaceae-associated Fusarium species had to undergo to become successful colonizers of members of Poaceae. The presence of these groups of genes is indicative of their role in the survival and adaptation of these Fusarium species, likely providing a competitive advantage as well as a protective advantage against the host defense responses.
In conclusion, this study used near-complete and high-quality genome data to identify a set of genes that potentially underpin the association of six Fusarium species with their pine and grass hosts. These genes are involved in diverse processes that could potentially provide the fungi with ecological advantages in their respective niches [105][106][107][108][109][110]. Furthermore, these genes preferentially form part of the accessory genome of the examined fungi, where diverse processes likely determined their evolution and development. This work thus forms a strong foundation from which future studies could explore the functional relevance of the identified genes, as well as their expression, especially given their chromosomal locations. Such studies are needed if we are to understand the ecology and biology of the fungi that threaten the health of socioeconomically important plants and crops.
A reference genome was selected to represent each of the pine- and grass-host-associates. This allowed a detailed comparison, based on two well-assembled genomes, of Fusarium species colonizing different plant hosts. Fusarium circinatum was chosen to represent the pine-host-associates and F. temperatum represented the grass-host-associates. The abundance of the telomere-associated repeat sequence (TTAGGG/CCCTAA) [47,48] was evaluated using a motif search performed in CLC Genomics Workbench v. 11 (CLC bio, Aarhus, Denmark) to further investigate the completeness of the two reference genomes (F. circinatum and F. temperatum). For the motif search, a window size of 10,000 bp with 5000 bp increments was used. Only repeats with ≥80% similarity to the telomere repeat were considered in this analysis.
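As an illustration of the kind of sliding-window motif search described above, the following sketch counts telomere-like hexamers in 10,000 bp windows moved in 5000 bp steps. It is not the CLC Genomics Workbench implementation: here, "≥80% similarity" is interpreted as per-position identity to TTAGGG or CCCTAA (i.e., at most one mismatch in six), which is an assumption, and all function names are hypothetical.

```python
MOTIFS = ("TTAGGG", "CCCTAA")  # telomere repeat and its reverse complement

def matches(kmer, motif, min_identity=0.8):
    """True if kmer has at least min_identity positional identity to motif."""
    same = sum(a == b for a, b in zip(kmer, motif))
    return same / len(motif) >= min_identity

def telomere_repeat_counts(seq, window=10_000, step=5_000, min_identity=0.8):
    """Return (window_start, hit_count) for each sliding window along seq."""
    seq = seq.upper()
    k = len(MOTIFS[0])
    results = []
    for start in range(0, max(len(seq) - k + 1, 1), step):
        chunk = seq[start:start + window]
        hits = sum(
            any(matches(chunk[i:i + k], m, min_identity) for m in MOTIFS)
            for i in range(len(chunk) - k + 1)
        )
        results.append((start, hits))
    return results

# Toy sequence: ten telomere repeats followed by filler (small windows for brevity).
toy = "TTAGGG" * 10 + "A" * 140
for start, hits in telomere_repeat_counts(toy, window=100, step=50):
    print(start, hits)
```

Under this interpretation, windows at chromosome ends with high hit counts would be read as evidence of a telomeric cap; the threshold for calling a cap is left open here because the source does not state one.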
The MAKER2 pipeline [113] was utilized for structural annotations of all six genomes in order to identify protein-coding genes. Gene prediction was performed in MAKER using SNAP [114], GeneMark ES [115] and AUGUSTUS [116]. As additional evidence, gene model data from F. circinatum [93], F. fujikuroi [29], F. verticillioides and F. graminearum [42], as well as F. mangiferae and F. proliferatum [19], were included. These isolates were selected due to the availability of their genomic information on the NCBI public database (https://www.ncbi.nlm.nih.gov/ (accessed on 1 December 2021)).

Genes Unique to Pine- and Grass-Host-Associated Species
The gene content of all six Fusarium genomes was evaluated to determine which genes are unique to the species associated with the two groups of plant hosts. For this purpose, OrthoFinder v. 2.3.1 (https://github.com/davidemms/OrthoFinder (accessed on 1 December 2021)) was implemented [49]. The genes occurring only in the genomes of the pine- or grass-host-associates were labeled as "unique" genes.
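The selection of "unique" genes can be sketched as a filter over per-species gene counts for each orthogroup, such as those reported in OrthoFinder's gene-count table. The sketch below is an assumption-laden illustration rather than the authors' procedure: the source does not state whether a gene had to be present in all three species of a host group, so this is exposed as the hypothetical `require_all` parameter. The species label FKON and the orthogroup names are also hypothetical.

```python
PINE = ("FCIR", "FFRAC", "FPIN")    # pine-host-associates
GRASS = ("FKON", "FSUB", "FTEMP")   # grass-host-associates (FKON label assumed)

def host_unique_orthogroups(counts, group, other, require_all=True):
    """Orthogroups present in `group` species and absent from all `other` species.

    counts: dict mapping orthogroup ID -> dict of species -> gene count.
    require_all: if True, every species in `group` must carry the orthogroup
                 (an assumption; the source does not specify this criterion).
    """
    present = all if require_all else any
    return sorted(
        og for og, c in counts.items()
        if present(c.get(s, 0) > 0 for s in group)
        and all(c.get(s, 0) == 0 for s in other)
    )

# Toy gene-count table (hypothetical orthogroups).
counts = {
    "OG1": {"FCIR": 1, "FFRAC": 1, "FPIN": 1, "FKON": 1, "FSUB": 1, "FTEMP": 1},
    "OG2": {"FCIR": 1, "FFRAC": 2, "FPIN": 1},
    "OG3": {"FKON": 3, "FSUB": 1, "FTEMP": 1},
    "OG4": {"FCIR": 1},
}

print(host_unique_orthogroups(counts, PINE, GRASS))
print(host_unique_orthogroups(counts, GRASS, PINE))
```

Counts above one within a species (as for OG2 and OG3 in the toy table) correspond to the paralogs noted for some genomes.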
Functional annotation was performed using the Blast2GO v. 6 [50] plugin (Valencia, Spain) for CLC Genomics Workbench. A two-tailed Fisher exact test was implemented (p < 0.05) in Blast2GO [50] to detect the GO terms that were overrepresented in the host-range-associated gene sets of both Fusarium representatives, using the whole genome of each as reference.
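For readers unfamiliar with the test, a two-tailed Fisher exact test on a 2 × 2 table (genes with and without a given GO term, in the unique set versus the rest of the genome) can be sketched with the standard library alone. This is a generic textbook formulation, not Blast2GO's code, and it omits any multiple-testing correction; the gene counts in the example are invented for illustration.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: unique-set genes with the GO term     b: unique-set genes without it
    c: remaining genes with the GO term      d: remaining genes without it
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x term-carrying genes in the unique set.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Invented example: 4 of 47 unique genes carry a term versus 20 of 14,953 others.
p_value = fisher_exact_two_tailed(4, 43, 20, 14_933)
print(f"p = {p_value:.3g}")
```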

Phylogenetic Origins of the Host-Range-Associated Genes
The two unique gene sets were uploaded onto NCBI to perform BLASTp searches against the non-redundant database using the online position-specific iterative (PSI) BLAST tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi (accessed on 1 December 2021)) [117]. All highly divergent protein sequences were excluded by considering only those sequences with at least 40% amino acid identity over 70% of the query sequence length and that had E-values ≤ 1 × 10^-5 [71]. All the sequences were aligned using the constraint-based alignment tool (COBALT) [118] and phylogenetic trees were viewed to determine the ancestral origins of the host-range-associated genes. The host-range-associated genes with unclear ancestral origin were selected to construct phylogenetic trees. To infer phylogenies, all sequences were aligned with Multiple sequence Alignment based on Fast Fourier Transform (MAFFT) v. 7.0 with default settings [119]. These alignments included the relevant Fusarium sequences, together with those from other Ascomycota. MEGA v. 7.0.26 (https://www.megasoftware.net/ (accessed on 1 December 2021)) [120] was used to draw initial tree(s) for the heuristic search by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances to estimate the best substitution model to use. The Maximum Likelihood branch support was estimated using bootstrap analyses based on 100 pseudoreplicates.
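The cut-off values above can be expressed as a simple filter on BLAST hits. The sketch below assumes that each hit is available as a record with fields named after the BLAST+ tabular specifiers `pident`, `qstart`, `qend`, `qlen` and `evalue`; since the searches in this study were run through the online tool, this record layout, the accession strings and the function name are assumptions made for illustration.

```python
def passes_cutoffs(hit, min_identity=40.0, min_query_cov=0.70, max_evalue=1e-5):
    """Apply the identity, query-coverage and E-value cut-offs to one BLAST hit."""
    query_cov = (hit["qend"] - hit["qstart"] + 1) / hit["qlen"]
    return (
        hit["pident"] >= min_identity
        and query_cov >= min_query_cov
        and hit["evalue"] <= max_evalue
    )

# Hypothetical hits for a 300-residue query.
hits = [
    {"sseqid": "hitA", "pident": 62.5, "qstart": 1, "qend": 290, "qlen": 300, "evalue": 1e-80},
    {"sseqid": "hitB", "pident": 35.0, "qstart": 1, "qend": 300, "qlen": 300, "evalue": 1e-20},
    {"sseqid": "hitC", "pident": 80.0, "qstart": 50, "qend": 150, "qlen": 300, "evalue": 1e-30},
    {"sseqid": "hitD", "pident": 45.0, "qstart": 1, "qend": 250, "qlen": 300, "evalue": 1e-3},
]

retained = [h["sseqid"] for h in hits if passes_cutoffs(h)]
print(retained)
```

Genes with no hit surviving such a filter would fall into the group with unknown ancestral origin.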

Genomic Distribution of the Host-Range-Associated Genes
The location and distribution of the host-range-associated genes were plotted across the 12 chromosomes in each of the two reference genomes using KaryoploteR v. 3.9 (http://bioconductor.org/packages/karyoploteR (accessed on 1 December 2021)) [121]. The difference in the distribution of genes in the different genomic regions was evaluated using Chi-squared tests (p < 0.05), by comparing those within and outside subtelomeric regions, with the null hypothesis that they are characterized by similar frequencies. Here, we regarded the first and last 500 kb from chromosome ends as subtelomeres [71,122,123,124]. Furthermore, the synteny and conservation of the location of the different host-range-associated genes were studied with SynChro, which revealed the synteny breakpoints between the reference and query genomes [54]. Lastly, the Shapiro-Wilk test was performed to test for normality by detecting all departures from normality for genes belonging to F. circinatum and F. temperatum.
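A minimal sketch of the subtelomere classification and the Chi-squared comparison follows. It assumes a goodness-of-fit formulation in which the expected number of subtelomeric genes is proportional to the fraction of the genome that is subtelomeric; the exact formulation used in the study is not stated, so this is one plausible reading. The example uses the 38 of 72 F. circinatum genes reported as subtelomeric and the approximate one-quarter genome fraction mentioned in the Results; function names are hypothetical.

```python
from math import erfc, sqrt

SUBTELOMERE_BP = 500_000  # first and last 500 kb of each chromosome

def is_subtelomeric(position, chrom_length, window=SUBTELOMERE_BP):
    """True if a 1-based position lies within `window` bp of either chromosome end."""
    return position <= window or position > chrom_length - window

def chi_squared_two_bins(observed_in, observed_out, expected_fraction_in):
    """Chi-squared goodness-of-fit statistic and p-value for two bins (1 d.f.)."""
    total = observed_in + observed_out
    expected_in = total * expected_fraction_in
    expected_out = total - expected_in
    chi2 = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For one degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# 38 of 72 genes subtelomeric; subtelomeres taken as roughly 25% of the genome.
chi2, p = chi_squared_two_bins(38, 72 - 38, 0.25)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```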

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/pathogens11080858/s1, Figure S1. Phylogenetic relationship of paralogs with regard to the respective host-range-associated gene. FCIR = Fusarium circinatum; FFRAC = Fusarium fracticaudum; FPIN = Fusarium pininemorale; FSUB = Fusarium subglutinans; Figure S2. Host-range-associated genes with ancestral origins that emerged within the FFSC. The investigated host-range-associated genes are highlighted in yellow; FCIR = Fusarium circinatum and FTEMP = Fusarium temperatum; Figure S3. Host-range-associated genes with ancestral origins that emerged within the FFSC and FOSC. The investigated host-range-associated genes are highlighted in yellow; FCIR = Fusarium circinatum and FTEMP = Fusarium temperatum; Figure S4. Host-range-associated genes with ancestral origins that emerged within the broader Fusarium outside the FFSC and FOSC. The investigated host-range-associated genes are highlighted in yellow; FCIR = Fusarium circinatum and FTEMP = Fusarium temperatum; Figure S5. Host-range-associated genes with fewer than 10 ancestral origin hits and mostly Fusarium. The investigated host-range-associated genes are highlighted in yellow; FCIR = Fusarium circinatum and FTEMP = Fusarium temperatum; Figure S6. Host-range-associated genes with ancestral origins hits and mostly not Fusarium. The investigated host-range-associated genes are highlighted in yellow; FCIR = Fusarium circinatum and FTEMP = Fusarium temperatum; Figure S7. Host-range-associated genes with ancestral origins outside Fusarium but in the Ascomycetes.
The investigated host-range-associated genes are highlighted in yellow; FCIR = Fusarium circinatum and FTEMP = Fusarium temperatum; Figure S8. Host-range-associated genes with ancestral origins outside Fungi. The investigated host-range-associated genes are highlighted in yellow; FCIR = Fusarium circinatum and FTEMP = Fusarium temperatum; Figure S9. The distribution of host-range-associated genes from pine-host-associated Fusarium species and conservation of synteny across and between chromosomes and genomes. Pine-host-associated gene distribution across each of the chromosomes is indicated by the blue lines. The conservation of synteny and inversion between the relevant genomes are indicated by the brown blocks and red lines. FCIR = F. circinatum; chromosome size is given in kbp; Figure S10. The distribution of host-range-associated genes from Poaceae-host-associated Fusarium species and conservation of synteny across and between chromosomes and genomes. Poaceae-host-associated gene distribution across each of the chromosomes is indicated by the blue lines. The conservation of synteny and inversion between the relevant genomes are indicated by the brown blocks and red lines. FTEMP = F. temperatum; chromosome size is given in kbp; Figure S11. The syntenous relationship between genes from F. circinatum versus F. temperatum; Figure S12. The syntenous relationship between genes from F. temperatum and F. circinatum; Table S1. BUSCO results for the relevant Fusarium genomes; Table S2. The size difference between the chromosomes of four of the six Fusarium species; Table S3. Presence of telomeres at chromosomal ends for the two representative Fusarium species examined; Table S4. The Blast2GO data for the 72 unique pine-host-associated genes specifically for (A) Fusarium circinatum, (B) F. fracticaudum and (C) F. pininemorale; Table S5. The Blast2GO data for the 47 unique Poaceae-host-associated genes, specifically for (A) Fusarium konzum, (B) F. subglutinans and (C) F. temperatum; Table S6. The Fisher's exact test data for (A) the 72 unique pine-host-associated genes and (B) the 47 unique Poaceae-host-associated genes; Table S7. The EST and RNA-seq data for F. circinatum, obtained from Wingfield et al. [52] and Phasha et al. [53], respectively; Table S8. The placement of host-range-associated genes in groups that infer their evolutionary origins; Table S9. The gene information for the unique F. circinatum genes, in terms of chromosome location, subtelomeric placement, ancestral origin and BLAST description; Table S10. The gene information for the unique F. temperatum genes, in terms of chromosome location, subtelomeric placement, ancestral origin and BLAST description; Table S11. The host-range-associated gene density for both F. circinatum and F. temperatum; Table S12. The SynChro data for genes downstream and upstream of the host-range-associated genes of both the pine- and Poaceae-host-associated Fusarium species; FCIR = Fusarium circinatum and FTEMP = Fusarium temperatum.