The First Telomere-to-Telomere Chromosome-Level Genome Assembly of Stagonospora tainanensis Causing Sugarcane Leaf Blight

Leptosphaeria taiwanensis Yen and Chi (sexual morph), whose asexual morph is Stagonospora tainanensis W. H. Hsieh, is an important necrotrophic fungal phytopathogen that causes sugarcane leaf blight, resulting in losses of cane tonnage and sucrose in susceptible sugarcane varieties. Decoding its genome and understanding the basis of its virulence are vital for devising effective disease control strategies. Here, we present a 38.25-Mb high-quality genome assembly of S. tainanensis strain StFZ01, assembled de novo from 10.19 Gb of Nanopore long reads (~267×) and 3.82 Gb of Illumina short reads (~100×). The genome assembly consists of 12 contigs with an N50 of 2.86 Mb, of which 5 are telomere-to-telomere (T2T) chromosomes. It contains 13.20% repeat sequences and 12,206 protein-coding genes encoding 12,543 proteins, with BUSCO completeness of 99.18% at fungi (n = 758) and 99.87% at ascomycota (n = 1706), indicating the high accuracy and completeness of our gene annotations. In silico virulence analysis revealed 2379 PHIs, 599 CAZys, 248 membrane transport proteins, 191 cytochrome P450 enzymes, 609 putative secreted proteins, and 333 effectors in the StFZ01 genome. The genomic resources presented here will not only support the development of specific molecular markers and diagnostic techniques, population genetics, molecular taxonomy, and disease management, but also provide a precise genomic reference for future studies of ascomycete genomes, the necrotrophic lifestyle, and pathogenicity.


Introduction
Sugarcane (Saccharum spp. hybrids), cultivated in more than 120 countries, is a crucial sugar crop, accounting for 80% of the world's and nearly 90% of China's sugar production [1,2]. Like other crops, sugarcane is exposed to many diseases during cultivation. Among them, fungal diseases are the most serious because the pathogens produce large numbers of conidia that are dispersed by air, wind, and rain splash, and these diseases have led to the elimination of many elite cultivars [3][4][5][6]. Stalk-infecting diseases, such as smut and pokkah boeng caused by Sporisorium scitamineum and Fusarium sp., respectively, cause serious economic losses in susceptible varieties almost every year. In contrast, foliar diseases are confined to leaves and break out in susceptible varieties during and after the monsoon season, when wounding, high humidity, and prolonged wet weather favor infection. Foliar diseases therefore do not cause outbreaks every year, and their economic impact is uncertain. As a result, less attention has been paid to foliar diseases, especially those with limited distribution, such as sugarcane leaf blight (SLB). However, concern about foliar diseases in sugarcane is increasing because minor diseases are becoming major ones. For example, pokkah boeng was a minor disease in past decades [7], but it has become a major disease in India [8,9]. The first severe outbreaks of brown rust caused by Puccinia melanocephala on sugarcane were reported in Florida in 1978 [10] and, more recently, in India [11]. Orange rust caused by Puccinia kuehnii was also considered a minor disease in most countries, including Australia, before 1999; however, severe epidemics occurred in cultivar Q124 during 1999-2001, causing yield losses of 50% and overall losses estimated at A$150-210 million [4], and an outbreak of this disease was reported in America in 2007 [12].
Additionally, brown spot caused by Cercospora longipes in India [13] and peanut collar rot caused by Aspergillus niger in Asia [14,15] have become alarming, and brown stripe caused by Helminthosporium stenospila has become a major disease in China [16]. Similarly, SLB has recently become alarming in Yunnan and Guangxi provinces in China owing to the changing climate and the expanded planting of susceptible varieties, such as Guitang42, Taitang25, and Liucheng03-182 [16].
Sugarcane leaf blight caused by Stagonospora tainanensis was first reported in Taiwan, China, in 1952. Its asexual morph was first named Cercospora tainanensis [17] and later renamed S. tainanensis W. H. Hsieh on the basis of further isolation and inoculation studies together with the pathogen's morphological features [18]. The disease occurred throughout the year in the high-rainfall east coast area but not in the drier west coast area. The conditions favoring SLB are similar to those of Stagonospora nodorum blotch of wheat, caused by Stagonospora nodorum [19,20], whose sexual morph is Leptosphaeria nodorum [21]. SLB is one of the most harmful fungal diseases threatening the sugarcane industry, causing high cane yield and sugar losses in susceptible cultivars [17,22], because the highly virulent conidia cause blight symptoms on sugarcane leaves and reduce photosynthetic capacity [23]. Nevertheless, research on SLB has long been limited to pathogen isolation and identification. Morphologically, S. tainanensis is an ascomycete belonging to the genus Stagonospora (Phaeosphaeriaceae), within the order Pleosporales of the class Dothideomycetes. Its asci are slightly curved, 62-115 × 21-33 µm in size, and borne in scattered, dark brown perithecia [22]. Each ascus contains eight fusiform, bent ascospores with one septum, and its cylindrical conidia are straight to slightly curved, generally with three septa, and contain three to eight oil droplets [20]. Recently, an efficient PCR detection technique for S. tainanensis and SLB was developed based on genomic information [24], and SLB resistance-associated loci/genes were identified [25,26]. These advances will improve our understanding of the molecular mechanisms of pathogen infection and host resistance.
Over the past decade, revolutionary progress in third-generation sequencing (TGS) technologies, led by PacBio and Oxford Nanopore Technologies (ONT), has brought genome research into a new era [27,28]. TGS produces long reads of 10 kb to 1 Mb, which dramatically reduces the time and cost of genome assembly and makes it possible for a small lab to complete a high-quality genome assembly of a non-model fungus of approximately 50 Mb [29]. For model plant-pathogenic fungi, such as Pyricularia oryzae and Fusarium graminearum, TGS-based chromosome-level reference genomes and more than 100 genome assemblies are available in NCBI [30][31][32][33]. For fungi in the family Massarinaceae, genomes of only three species, Byssothecium circinans [34], Massarina eburnea [35], and Stagonospora sp. [36], have been reported. Within our genus of interest, Stagonospora, only S. nodorum, a model species of necrotrophic Pleosporales pathogens, had been sequenced. The S. nodorum genome was the first Dothideomycete genome to be sequenced (in 2005; published in 2007) [37], and, given the limited genomic information then available, it had a revolutionary impact on the understanding of this important pathogen and of other fungal pathogens.
Until now, the genome of S. tainanensis has not been publicly available. In this study, we combined Nanopore and Illumina sequencing to produce a near telomere-to-telomere, chromosome-level genome assembly and an RNA-seq-based gene annotation of the necrotrophic fungus S. tainanensis strain StFZ01. This resource provides a more precise understanding of the pathogen and its pathogenicity, offers a catalog of putative pathogenesis-related proteins such as effectors, and should therefore benefit the development of new disease management strategies and the improvement of leaf blight resistance in sugarcane.

Sample Preparation and Sequencing
The SLB-susceptible sugarcane cultivar Yuetang93-159 was planted in Fuzhou, China (26°5′0″ N, 119°13′45″ E), and S. tainanensis isolates were collected from its leaves showing typical SLB symptoms. After the isolates were verified by ITS detection (Figure 1) and pathogen morphology [24], one isolate, designated StFZ01, was used for genome sequencing and analysis. Fresh mycelia cultured in potato dextrose broth (PDB) were collected for DNA and RNA extraction. For long-read genomic sequencing, high-quality genomic DNA was extracted, large DNA fragments (>20 kb) were selected with the BluePippin DNA size selection system, and the sequencing library was prepared with the Ligation Sequencing Kit (SQK-LSK110) following the manufacturer's instructions; sequencing was conducted on the PromethION platform from Oxford Nanopore Technologies (ONT). For Illumina short-read sequencing, genomic DNA and mRNA were extracted, purified, and prepared into sequencing libraries using Illumina DNA Prep Kits (Illumina, Inc., San Diego, CA, USA) and Illumina Stranded mRNA Prep (Illumina, Inc., San Diego, CA, USA), respectively. Illumina genomic DNA sequencing and RNA-seq were performed on the Illumina HiSeq 3000 platform (350 bp library, paired-end 150 bp reads).

De Novo Genome Assembly
The de novo assembly of ONT long reads was performed using NextDenovo v2.5.0 (https://github.com/Nextomics/NextDenovo) (accessed on 10 January 2022) with the "correct-then-assemble" strategy. Base errors (SNPs/indels) in the draft assembly were then corrected by NextPolish v1.4.0 [40] using both ONT long reads and Illumina short reads (task = best) to generate a highly contiguous and accurate genome assembly of strain StFZ01.

Genome Completeness Assessment
The software Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.3.2 [41] was used to evaluate the completeness of the genome assembly and the annotated genes with the lineage datasets fungi_odb10 (n = 758) and ascomycota_odb10 (n = 1706). The completeness of the assembly was also assessed by mapping the sequenced reads back to it: RNA-seq reads were aligned to the repeat-masked assembly using HISAT2 v2.2.1 [42], and ONT long reads and genomic Illumina short reads were mapped to the unmasked assembly with minimap2 v2.21-r1071 [43] and BWA-MEM2 v2.2.1 [44], respectively (Table S1).
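The read-mapping check above reduces to counting how many reads have a primary alignment. The sketch below shows that calculation on uncompressed SAM text using only the bitwise FLAG column. It is an illustration under the assumption of SAM input, not the tool chain used in the study; in practice `samtools flagstat` reports similar summary figures, and BAM input would need a library such as pysam.

```python
UNMAPPED = 0x4         # FLAG bit: segment unmapped
SECONDARY = 0x100      # FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # FLAG bit: supplementary alignment


def mapping_rate(sam_lines):
    """Return (mapped, total, rate) over primary alignment records in SAM text.

    Secondary and supplementary records are skipped so that each read (or each
    mate of a pair) is counted once; the FLAG is the second tab-separated field.
    """
    mapped = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # header or blank line
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped, total, (mapped / total if total else 0.0)


if __name__ == "__main__":
    toy_sam = [
        "@HD\tVN:1.6",
        "r1\t0\tctg1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
        "r1\t256\tctg5\t70\t0\t10M\t*\t0\t0\tACGTACGTAC\t*",  # secondary: ignored
    ]
    mapped, total, rate = mapping_rate(toy_sam)
    print(f"{mapped}/{total} primary records mapped ({rate:.2%})")
```

Skipping secondary and supplementary records keeps multi-mapping or split long reads from inflating the denominator.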

Repeat Masking
Transposable elements (TEs) in the StFZ01 genome were annotated using a combination of ab initio and homology-based methods. First, a high-quality ab initio TE library was constructed with RepeatModeler v2.0.2 [46]. Next, RepeatMasker v4.1.2-p1 (http://repeatmasker.org/) (accessed on 20 April 2021) was used to perform a homology-based TE search across the StFZ01 genome with this ab initio TE library. Finally, two repeat-masked versions of the genome were generated: a hard-masked version (repeat sequences replaced with N) for mapping RNA-seq reads, and a soft-masked version (repeat sequences in lowercase) for gene annotation.
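The two masked forms described above differ only in how repeats are represented. As a minimal sketch, assuming a soft-masked FASTA in which every lowercase base is a repeat, a hard-masked copy can be derived by replacing lowercase bases with N. RepeatMasker can also write soft-masked output directly (its -xsmall option), so this only illustrates the relationship between the two forms; it is not the step the study ran.

```python
def hard_mask(seq: str) -> str:
    """Turn soft-masked (lowercase) bases into N; uppercase bases are unchanged."""
    return "".join("N" if base.islower() else base for base in seq)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked; header lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else hard_mask(line)


if __name__ == "__main__":
    soft = [">ctg1", "ACGTacgtnnACGT"]
    print("\n".join(hard_mask_fasta(soft)))
```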
Putative secreted proteins were identified following the pipeline of a previous study [30]: proteins with a signal peptide were identified by SignalP v5.0 [58], proteins without transmembrane helices by TMHMM v2.0 [59], and proteins with an extracellular location using ProtComp v9.0 from MolQuest v2.4 (Softberry Inc., New York, USA). These putative secreted proteins were then scanned by EffectorP v3.0 [60] to predict effectors, which were divided into cytoplasmic and apoplastic effectors.
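The secreted-protein criteria above combine three per-protein predictions with a logical AND, followed by a split on the EffectorP class. The sketch below assumes the outputs of SignalP, TMHMM, ProtComp, and EffectorP have already been parsed into one record per protein; the `ProteinCall` fields and the label strings are hypothetical and do not reproduce any tool's native output format.

```python
from dataclasses import dataclass


@dataclass
class ProteinCall:
    # Hypothetical merged record; field names and labels are illustrative only.
    protein_id: str
    signal_peptide: bool              # SignalP: signal peptide predicted?
    tm_helices: int                   # TMHMM: number of predicted helices
    location: str                     # ProtComp: e.g. "Extracellular"
    effectorp: str = "Non-effector"   # EffectorP class, e.g. "Apoplastic effector"


def putative_secreted(calls):
    """Keep proteins with a signal peptide, no TM helix, and extracellular location."""
    return [c for c in calls
            if c.signal_peptide and c.tm_helices == 0
            and c.location.lower() == "extracellular"]


def split_effectors(secreted):
    """Group secreted proteins by EffectorP class; a dual label lands in both groups."""
    groups = {"cytoplasmic": [], "apoplastic": []}
    for c in secreted:
        label = c.effectorp.lower()
        if "cytoplasmic" in label:
            groups["cytoplasmic"].append(c.protein_id)
        if "apoplastic" in label:
            groups["apoplastic"].append(c.protein_id)
    return groups


if __name__ == "__main__":
    calls = [
        ProteinCall("p1", True, 0, "Extracellular", "Cytoplasmic effector"),
        ProteinCall("p2", True, 1, "Extracellular", "Apoplastic effector"),  # has TM helix
        ProteinCall("p3", False, 0, "Extracellular"),                        # no signal peptide
    ]
    print(split_effectors(putative_secreted(calls)))
```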

Comparative Genomic Analysis
Whole-genome protein sequences of nine fungal species, three from the family Massarinaceae and six from the family Pleosporaceae in the order Pleosporales, were downloaded from NCBI (Table S2). The longest protein for each gene was selected, and proteins were clustered using OrthoFinder v2.5.4 [51] with the parameters -S diamond and -M msa [62]. Single-copy core orthologous proteins were aligned using MAFFT v7.490 [63], and the species phylogenetic tree was constructed with FastTree v2.1.11 [64] and visualized with the Interactive Tree Of Life (iTOL) v6.5.8 online service [65].
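Selecting the longest protein per gene before clustering keeps alternative isoforms from being counted as separate genes. A minimal sketch follows. It assumes the protein-to-gene relationship can be read from each FASTA header; the `gene=` tag and the `gene_of` helper used in the example are hypothetical, since header formats differ between NCBI downloads.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_gene(records, gene_of):
    """Keep the longest protein per gene; gene_of maps a header to a gene ID."""
    best = {}
    for header, seq in records:
        gene = gene_of(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


if __name__ == "__main__":
    # Hypothetical header convention: "<protein_id> gene=<gene_id>".
    fasta = [">t1 gene=g1", "MKV", ">t2 gene=g1", "MKVLA", ">t3 gene=g2", "MA"]
    gene_of = lambda h: h.split("gene=")[1].split()[0]
    for gene, (header, seq) in longest_per_gene(read_fasta(fasta), gene_of).items():
        print(gene, header, len(seq))
```

Ties keep the first protein seen, which makes the selection deterministic for a given input order.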

The Morphology of the Pathological Lesions and Pathogenic S. tainanensis Used for Genome Sequencing
The typical single mature or early-mature SLB lesion caused by S. tainanensis on infected leaves of variety Yuetang93-159 is spindle-shaped and elongated (Figure 2A). The color of diseased leaf tissue changes from yellowish to yellow and then to red-brown (Figure 2B); the typical SLB symptoms are therefore relatively easy to identify in later stages of lesion development (Figure 2B). Mycelia of the pathogenic S. tainanensis strain StFZ01 used for DNA isolation were collected from cultures grown on PDA medium (Figure 2C), and pathogenic conidia (Figure 2D) as well as sexual asci and ascospores (Figure 2E) were observed.

Genome Sequencing and Assembly
After quality control, a total of 10.19 Gb clean ONT long reads (depth: ~267×, N50: 21,784 bp, maximum length: 133,873 bp) were used for de novo genome assembly, 3.82 Gb Illumina short reads (depth: ~100×, 2 × 150 bp) for estimation of genome size and polishing of draft genome assembly, and 6.08 Gb RNA-seq reads (2 × 150 bp) for gene annotation ( Figure 3A and Table S1).
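The depth and N50 values quoted above follow from simple arithmetic. The sketch below shows the standard N50 definition and mean depth computed as total bases divided by genome size. The inputs in the usage block are taken from this paper (10.19 Gb of ONT reads, 3.82 Gb of Illumina reads, 38.25 Mb assembly); rounding of these inputs can account for small differences from the reported ~267× and ~100×.

```python
def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def mean_depth(total_bases, genome_size):
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    assembly_size = 38.25e6  # assembly length reported in the paper
    print(f"ONT depth      ~{mean_depth(10.19e9, assembly_size):.0f}x")
    print(f"Illumina depth ~{mean_depth(3.82e9, assembly_size):.0f}x")
    print("toy N50:", n50([5, 4, 3, 2, 1]))
```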

We estimated the genome size of S. tainanensis strain StFZ01 to be 40,445,307 bp (model fit = 98.13%, ploidy = haploid) based on the k-mer distribution (k = 21, average k-mer depth 76×) of Illumina short reads (Figure 3B and Table S3), with an estimated repeat content of 19.09% (Table 1 and Figure 3C). The genome assembly (38.25 Mb) is slightly smaller than the estimated genome size (94.58% of 40,445,307 bp). This genome size is comparable to those of S. nodorum (37.21 Mb), another species in the same genus, and Bipolaris maydis (36.23 Mb), which causes Southern corn leaf blight [66], but smaller than that of Leptosphaeria maculans (45.12 Mb), a pathogenic fungus closely related to S. nodorum, and much smaller than those of Colletotrichum higginsianum (53.4 Mb) and Colletotrichum graminicola (57.4 Mb) [67].
Figure 3C. Circos plot of the StFZ01 genome. Circles from outside to inside show contigs (1st circle; the smallest contig, ctg12, is not shown) and the distribution of protein-coding genes (2nd), TEs (3rd), and putative secreted proteins (4th) per 50 kb window (blue to red indicates low to high counts). Lines in the center show synteny blocks (≥10 kb) between contigs.
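The study estimated genome size with GenomeScope2 (Table S3), which fits a statistical model to the k-mer spectrum. The sketch below swaps in a much simpler, naive estimator to show the underlying idea: count canonical k-mers, build a histogram of k-mer multiplicities, discard low-multiplicity (likely erroneous) k-mers, and divide the remaining k-mer instances by the depth of the main peak. The `min_depth` cutoff is an illustrative assumption; with real data it is usually placed at the valley of the histogram.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_histogram(reads, k):
    """Map k-mer multiplicity -> number of distinct canonical k-mers."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _BASES:  # skip k-mers containing N or other codes
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def naive_genome_size(hist, min_depth=2):
    """Sum of k-mer instances at depth >= min_depth, divided by the peak depth."""
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak = max(usable, key=usable.get)  # depth shared by the most distinct k-mers
    total = sum(depth * n for depth, n in usable.items())
    return total / peak


if __name__ == "__main__":
    toy_hist = {1: 1000, 2: 50, 20: 10, 25: 100, 30: 10, 50: 5}
    print(naive_genome_size(toy_hist))
```

Unlike a fitted model, this estimator ignores heterozygosity and the broad shoulders of the peak, so it should be read as a back-of-the-envelope check only.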
A total of nine contigs (ctg1-ctg9) were found to start or end with telomeric repeats, (5′-TTAGGG-3′)n or (5′-CCCTAA-3′)n, of which five contigs (ctg1, ctg3, ctg5, ctg6, ctg7, and ctg9) contain telomeric repeats at both ends, indicating that these contigs are gapless, telomere-to-telomere (T2T) chromosomes [45] (Table S4).
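Whether a contig end carries a telomere can be checked by counting copies of the repeat unit in a terminal window. The sketch below is one simple way to do this, not the method used in the study (see reference [45] and Table S4 for the actual result); the 1,000 bp window and the threshold of 10 copies are illustrative assumptions.

```python
TELO_FWD = "TTAGGG"  # G-rich repeat unit, expected at the right-hand (3') end of a contig
TELO_REV = "CCCTAA"  # its reverse complement, expected at the left-hand (5') end
# This sketch accepts either unit at either end.


def has_telomere(region, min_copies):
    """True if either repeat unit occurs at least min_copies times (non-overlapping)."""
    return max(region.count(TELO_FWD), region.count(TELO_REV)) >= min_copies


def telomere_ends(seq, window=1000, min_copies=10):
    """Return (start_has_telomere, end_has_telomere) for one contig sequence."""
    seq = seq.upper()
    return (has_telomere(seq[:window], min_copies),
            has_telomere(seq[-window:], min_copies))


def classify_contigs(contigs, **thresholds):
    """Label each contig 'T2T', 'one end', or 'none' from its terminal windows."""
    labels = {}
    for name, seq in contigs.items():
        start, end = telomere_ends(seq, **thresholds)
        labels[name] = "T2T" if start and end else ("one end" if start or end else "none")
    return labels


if __name__ == "__main__":
    toy = {
        "ctgA": "CCCTAA" * 20 + "ACGT" * 500 + "TTAGGG" * 20,
        "ctgB": "ACGT" * 500 + "TTAGGG" * 15,
    }
    print(classify_contigs(toy))
```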

Genome Quality Assessment
The BUSCO completeness of the StFZ01 genome assembly was estimated at 99.34% for fungi (n = 758) and 97.54% for ascomycota (n = 1706) (Figure 4A). All clean ONT long reads and Illumina genomic reads were aligned to the unmasked genome assembly, with mapping rates of 99.20% and 99.02%, respectively (Figure 4B and Table S1). Furthermore, one RNA-seq sample of StFZ01 was mapped to the repeat-masked genome assembly, and 91.89% (85.9% properly paired) of the RNA-seq reads mapped uniquely to gene regions (Figure 4B and Table S1). Together, these results attest to the high continuity and completeness of our assembled genome.

Repeat Analysis
Repetitive sequences were identified using a combination of ab initio and homology-based approaches. In total, 13.20% (5,048,126 bp) of the assembled StFZ01 sequence was annotated as repeats (Tables 1 and 2), a lower proportion than in C. graminicola (22.3%) but higher than in C. higginsianum (9.1%) [67]. Interspersed repeats, the major component (90.41% of total repeats), account for 11.28% of the genome, including 1,927,917 bp of long terminal repeats (LTRs), 1,242,571 bp of DNA transposons, 498,662 bp of long interspersed nuclear elements (LINEs), 12,515 bp of short interspersed nuclear elements (SINEs), and 1,242,571 bp of unclassified interspersed repeats (Table 2). Because polymorphic repeat insertions in phytopathogenic fungi are often associated with virulence variation, high-frequency interspersed repeats could serve as candidate DNA markers for distinguishing strains of different virulence, similar to Pot2 rep-PCR fingerprinting in the rice blast fungus [68]. In addition, we identified several types of non-coding RNAs, including 162 tRNAs, 142 rRNAs, and 48 other ncRNAs (Table 1).

Gene Structural Annotation
In total, 12,206 high-confidence protein-coding genes were predicted by the BRAKER2 pipeline. This is more than in Magnaporthe grisea PMg_Dl (10,218 genes), although the S. tainanensis genome (38.25 Mb) is smaller than that of M. grisea (47.89 Mb) [69]; the same pattern was observed in the necrotrophic fungal pathogens Pyrenophora teres f. teres (41.95 Mb, 11,799 genes) [70] and L. maculans 'brassicae' WA74 (44.20 Mb, 10,624 genes) [71]. However, the gene number and genome size of S. tainanensis are comparable to those of M. oryzae (on average 12,684 genes and 40.12 Mb) [72]. The gene number of S. tainanensis is lower than that of the model species S. nodorum SN15 (37.02 Mb, 17,580 genes) (https://www.ncbi.nlm.nih.gov/assembly/GCA_016801405.1/) (accessed on 30 August 2022) in the same genus, whose annotated gene count, revised from 10,792 genes supported by EST [35] to 12,382 supported by integrated multidimensional omics [73] and now 17,580 [74], is substantially higher than those of known filamentous fungi; by contrast, the gene number of S. tainanensis is comparable to those of other typical filamentous fungi. These genes encode 12,543 proteins, and 97.42% (11,891) of the genes were predicted to encode only one protein isoform. Only 315 genes were predicted to have alternatively spliced isoforms, including 294 genes encoding two isoforms, 20 genes encoding three, and 1 gene encoding four (Table S5). The number of exons per gene ranged from 1 to 18, and most genes contained 1 to 5 exons (Table S6). The BUSCO completeness of the genes is 99.18% for fungi (n = 758) and 99.87% for ascomycota (n = 1706) (Figure 4A), indicating the high accuracy and completeness of our gene annotations.
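The isoform figures above are internally consistent, which can be checked with a few lines of arithmetic: the number of proteins equals the sum, over genes, of the isoforms per gene. The sketch below uses the distribution reported in the text (Table S5) as input; `isoform_distribution` shows how such a distribution could be tallied from a hypothetical gene-to-protein mapping.

```python
from collections import Counter


def isoform_distribution(gene_to_proteins):
    """Map isoforms-per-gene -> number of genes, from gene ID -> list of protein IDs."""
    return Counter(len(proteins) for proteins in gene_to_proteins.values())


def isoform_summary(distribution):
    """Return (genes, proteins, fraction of single-isoform genes)."""
    genes = sum(distribution.values())
    proteins = sum(k * n for k, n in distribution.items())
    return genes, proteins, distribution.get(1, 0) / genes


if __name__ == "__main__":
    reported = {1: 11891, 2: 294, 3: 20, 4: 1}  # distribution as given in the text
    genes, proteins, single = isoform_summary(reported)
    print(genes, proteins, f"{single:.2%}")
```

With the reported distribution, this yields 12,206 genes, 12,543 proteins, and 97.42% single-isoform genes, matching the text.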
Genes are not uniformly distributed along the contigs. In most cases, gene density is inversely related to the density of repetitive sequences, especially in the repeat-rich telomeric regions at both ends of the contigs, which contain almost no genes (Figure 3C). This pattern is common in repeat-rich pathogenic fungi, such as the rice blast fungus P. oryzae [30] and the soil-borne plant pathogen Verticillium dahliae [75].
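The inverse relationship between gene and repeat density can be quantified by counting features in fixed windows, as in the 50 kb windows of the circos plot (Figure 3C), and correlating the two count vectors. This is a hedged sketch: counting features by start coordinate and summarizing with a Pearson correlation are assumptions for illustration, and the paper reports the pattern visually rather than with this statistic. The coordinates in the usage block are hypothetical.

```python
def window_counts(starts, contig_length, window=50_000):
    """Count features by 1-based start coordinate in consecutive fixed-size windows."""
    counts = [0] * ((contig_length + window - 1) // window)
    for pos in starts:
        if 1 <= pos <= contig_length:
            counts[(pos - 1) // window] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Hypothetical 200 kb contig: genes in the middle, TEs near the ends.
    genes = window_counts([60_000, 80_000, 110_000, 130_000, 140_000], 200_000)
    tes = window_counts([5_000, 20_000, 40_000, 160_000, 190_000, 195_000], 200_000)
    print(genes, tes, round(pearson(genes, tes), 2))
```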
In addition, no protein-coding gene was identified on the smallest contig, ctg12 (154,536 bp). A BLASTN search against the NCBI nr database revealed that it is mitochondrial DNA; the most similar sequence is the mitochondrial genome of B. sorokiniana (NC_047242.1; 92.83% similarity, 26% coverage). We therefore did not show it in the genome circos plot (Figure 3C).
We identified 1323 proteins with a signal peptide, of which 609 proteins (encoded by 606 genes) with an extracellular location and no transmembrane helix were defined as putative secreted proteins (PSPs) (Table 3). Among these secreted proteins, we identified 333 effectors (encoded by 332 genes), including 187 cytoplasmic and 145 apoplastic effectors (Figure 6D and Table S17); this is far fewer than in M. grisea PMg-ID (594) [70]. Interestingly, the putative secreted proteins were enriched in or near highly repetitive regions (Figure 3C), consistent with other pathogenic fungi, such as Magnaporthe oryzae [46].

Secondary Metabolite Biosynthetic Gene Clusters (SMBGCs)
A total of 58 SMBGCs were identified, including 23 nonribosomal peptide synthetases (NRPS), 9 NRPS-like, 32 type I polyketide synthases (T1PKS), 1 type III polyketide synthase (T3PKS), 3 indole, and 13 terpene clusters (Figure 7A and Table S18). More than half of the SMBGCs (33) were located on ctg1, ctg2, and ctg3, the three longest contigs (Figure 7B). Eight SMBGCs showed more than 50% similarity to known SMBGCs, four of which showed 100% similarity to the clavaric acid biosynthetic gene cluster from Hypholoma sublateritium, the melanin biosynthetic gene cluster from B. oryzae, the AbT1 biosynthetic gene cluster from Aureobasidium pullulans, and the (-)-mellein biosynthetic gene cluster from Parastagonospora nodorum, respectively (Figure 7C). The remaining 50 novel SMBGCs offer an opportunity to mine novel secondary metabolites in S. tainanensis. Interestingly, clavaric acid has been reported to inhibit human Ras farnesyltransferase [77,78] and thus has antitumor activity [79], and its terpene biosynthetic gene cluster has also been detected in Aspergillus terreus [80] and Sporothrix species [76]. Melanin, a black pigment synthesized by T1PKS-type SMBGCs, plays a central role in the pathogenicity of plant-pathogenic fungi, such as the rice blast fungus P. oryzae [81,82].
J. Fungi 2022, 8, 1088.

Comparative Genomic Analysis
To examine genomic differentiation between S. tainanensis and closely related species, whole-genome orthologous gene cluster analysis was performed on S. tainanensis and nine other species in the order Pleosporales, three from the family Massarinaceae and six from the family Pleosporaceae (Table S2). Of the 125,793 genes collected from the 10 species, 114,719 (91.20%) were clustered into 14,038 orthogroups (Table S19). For S. tainanensis StFZ01, 11,477 of 12,206 genes (94.03%) were clustered into 9782 orthogroups, including 5981 (61.14%) core orthogroups shared with all nine other fungi and 47 (0.48%) species-specific orthogroups containing 163 genes (Figure 8A,B and Table S20). Together with the 729 unclustered genes, we identified 892 species-specific genes (7.31%) in S. tainanensis StFZ01 (Figure 8A and Table S20).
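The species-specific gene count above combines two sources: genes in orthogroups found only in S. tainanensis (163) and genes left unclustered (729), giving 892. The sketch below computes these summaries from an in-memory orthogroup table. The data structures are assumptions standing in for OrthoFinder's Orthogroups.tsv and Orthogroups_UnassignedGenes.tsv output files, whose parsing is omitted; the species names in the example are hypothetical.

```python
def orthogroup_summary(orthogroups, unassigned, focal, species):
    """Summarize orthogroup membership for one focal species.

    orthogroups: dict orthogroup ID -> dict species -> list of gene IDs
    unassigned:  dict species -> list of gene IDs placed in no orthogroup
    species:     all species in the analysis, including the focal one
    """
    focal_groups = {og: m for og, m in orthogroups.items() if m.get(focal)}
    core = [og for og, m in focal_groups.items() if all(m.get(s) for s in species)]
    specific = [og for og, m in focal_groups.items()
                if all(not genes for s, genes in m.items() if s != focal)]
    return {
        "clustered_genes": sum(len(m[focal]) for m in focal_groups.values()),
        "orthogroups": len(focal_groups),
        "core": len(core),
        "specific_orthogroups": len(specific),
        # genes in focal-only orthogroups plus genes not clustered at all
        "specific_genes": sum(len(focal_groups[og][focal]) for og in specific)
        + len(unassigned.get(focal, [])),
    }


if __name__ == "__main__":
    ogs = {
        "OG1": {"A": ["a1"], "B": ["b1"], "C": ["c1"]},  # core
        "OG2": {"A": ["a2", "a3"], "B": [], "C": []},    # specific to A
        "OG3": {"B": ["b2"], "C": ["c2"]},               # absent from A
    }
    print(orthogroup_summary(ogs, {"A": ["a4"]}, "A", ["A", "B", "C"]))
```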
We selected 4750 single-copy orthogroups to construct a phylogenetic tree (Figure 8C), which placed S. tainanensis and Stagonospora sp. on a branch outside Massarina eburnea and Byssothecium circinans.

Conclusions
In conclusion, this study presents the first T2T chromosome-level genome assembly and high-quality gene annotation of the pathogenic fungus S. tainanensis strain StFZ01, which causes sugarcane leaf blight, obtained by integrating Nanopore and Illumina sequencing. With its well-annotated repeats and genes, such as CAZys and effectors, this genome will serve as a reference for designing species-specific molecular markers and identifying pathogenicity-related genes in the future.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jof8101088/s1, Table S1: Summary of sequencing reads; Table S2: Whole-genome protein data of closely related species used for orthologous gene clustering; Table S3: Genome size estimated by GenomeScope2 with Illumina genomic reads; Table S4: Contigs with telomeric repeats; Table S5: Distribution of protein isoforms encoded by genes; Table S6: Distribution of exon numbers per gene; Table S7: GO annotation; Table S8: KEGG annotation; Table S9: Pfam annotation; Table S10: KOG annotation; Table S11: Summary of GO annotation; Table S12: Summary of KEGG pathway annotation; Table S13: Genes with cytochrome P450 annotation; Table S14: CAZys annotation; Table S15: Membrane transport proteins annotated by TCDB; Table S16: PHI annotation; Table S17: Summary of secreted proteins and effectors; Table S18: Annotation of secondary metabolite biosynthetic gene clusters by antiSMASH; Table S19: Summary of orthogroup analysis across all species; Table S20: Summary of orthogroup analysis per species.
Figure 8. (B) Summary of orthogroups among species: species-specific genes (white), total orthogroups (black, in the red dotted circle), and core orthogroups (red), shown from the outside to the inside of the flower plot. (C) Phylogenetic tree inferred from the alignment of single-copy core orthogroups.

Conflicts of Interest:
The authors declare no conflict of interest.