Genomic Characteristics and Comparative Genomics Analysis of Two Chinese Corynespora cassiicola Strains Causing Corynespora Leaf Fall (CLF) Disease

Rubber tree Corynespora leaf fall (CLF) disease, caused by the fungus Corynespora cassiicola, is one of the most damaging diseases in rubber tree plantations in Asia and Africa, and it also threatens rubber nurseries and young rubber plantations in China. C. cassiicola isolates display high genetic diversity, and virulence profiles vary significantly depending on cultivar. Although one phytotoxin (cassiicolin) has been identified, it cannot fully explain the diversity in pathogenicity between C. cassiicola strains, and some virulent C. cassiicola strains do not contain the cassiicolin gene. In the present study, we report high-quality gapless genome sequences, obtained using short-read sequencing and single-molecule long-read sequencing, of two virulent Chinese C. cassiicola strains. Comparative genomics of gene families in these two strains and a virulent CCP strain from the Philippines showed that all three strains experienced different selective pressures, and that metabolism-related gene families vary between the strains. Secreted protein analysis indicated that the quantities of secreted cell wall-degrading enzymes were correlated with pathogenesis, and the most aggressive CCP strain (cassiicolin toxin type 1) encoded 27.34% and 39.74% more secreted carbohydrate-active enzymes (CAZymes) than Chinese strains YN49 and CC01, respectively, both of which can only infect rubber tree saplings. The results of antiSMASH analysis showed that all three strains encode ~60 secondary metabolite biosynthesis gene clusters (SM BGCs). Phylogenomic and domain structure analyses of core synthesis genes, together with synteny analysis of polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) gene clusters, revealed diversity in the distribution of SM BGCs between strains, as well as SM polymorphisms, which may play an important role in pathogenic progress. The results expand our understanding of the C. cassiicola genome. Further comparative genomic analysis indicates that secreted CAZymes and SMs may influence pathogenicity in rubber tree plantations. The findings facilitate future exploration of the molecular pathogenic mechanism of C. cassiicola.


Introduction
The fungus Corynespora cassiicola (Berk. and Curt) C. T. Wei, belonging to the Ascomycota phylum, Dothideomycetes class, and Pleosporales order, is responsible for diseases in a wide range of plants, including rubber tree, tomato, cucumber, soybean, cotton, and various others [1]. This fungus has also been isolated from sponges, nematodes, and otics and pigments [22]. In phytopathogenic fungi, a variety of SMs are produced during interactions with plants, and used as weapons to invade host plants [23]. SM biosynthetic genes in filamentous fungi are typically organised into contiguous gene clusters in the genome, and clustered near chemical backbone synthesis genes, such as non-ribosomal peptide synthases (NRPSs), polyketide synthases (PKSs) and terpene synthases [24]. In contrast to obligate biotrophic fungi, necrotrophic and hemibiotrophic fungi often contain many SM BGCs [23]. However, synthetic products of SM gene clusters in C. cassiicola and their roles in pathogenesis remain unclear. SM gene clusters vary between C. cassiicola strains, and their relevance to pathogenicity requires further study.
In the present study, we determined gapless genome sequences of two highly virulent C. cassiicola strains that cause CLF disease in rubber trees in China. These are the first genomes reported for Chinese CLF disease-causing strains. In order to analyse pathogenesis-related genes and pathways, we compared the Chinese strains with the genome of the virulent reference isolate CCP. The results showed that all three strains evolved under different selective pressures, and possess different secreted CAZymes and SM gene cluster compositions. This indicates that secreted CAZymes and SMs may influence pathogenicity in fungi affecting rubber trees, and may also partially explain the high variability in the pathogenicity of C. cassiicola.

Fungal Growth Conditions and DNA Preparation
C. cassiicola YN49 (CGMCC 3.20259) and CC01 (CGMCC 3.20258) strains were isolated from rubber plantations in Hekou, Yunnan province, and Yangjiang, Guangdong province, respectively. Both strains were maintained in our laboratory. Fungi were grown on potato dextrose agar (PDA) medium and incubated at 28 °C for 10 days. Mycelia were harvested and DNA was extracted from ground mycelia using a genomic DNA kit (Qiagen, New York, NY, USA). Agarose gel electrophoresis, a NanoDrop 1000 spectrophotometer (Thermo, Bedford, MA, USA) and a Qubit fluorimeter (Thermo, Bedford, MA, USA) were used to analyse the integrity, quality, and concentration of DNA, respectively. Genomic DNA was further purified for sequencing (Oxford Nanopore Technology, Oxford, UK) using a BluePippin DNA size selection system (Sage Science, Beverly, MA, USA).

Pathogenicity Tests
Rubber tree cultivar Reyan7-33-97, one of the main rubber varieties in China, was selected to analyse the pathogenicity of C. cassiicola YN49 and CC01 according to the protocol of Déon et al., with modifications [14]. The C. cassiicola isolates were cultivated on PDA medium at 28 °C, and conidia were collected and resuspended in sterile water at a concentration of 10,000 conidia mL⁻¹. Three drops of each conidial suspension (20 µL) were applied to the abaxial surface of one detached rubber tree leaflet at developmental stage C (brownish to limp green), and leaflets were maintained in a moist environment at 28 °C for 24 h in the dark and then under alternating light (photoperiod 12 h/12 h).

Detection and Sequencing of the Cassiicolin-Encoding Genes
Detection of the cassiicolin gene was conducted by PCR on genomic DNA from C. cassiicola isolates YN49 and CC01. The primers and PCR conditions of Déon et al. were used for cassiicolin gene detection [14]. PCR products of the cassiicolin genes were sequenced by BGI (Shenzhen, China). Maximum likelihood trees based on amino acid sequences of A and KS domains were constructed with MEGA version 6.0 using the method described previously by Déon et al. (2014).

Genome Sequencing and Assembly
After repairing DNA damage, an SQK-LSK108 ligation kit (Oxford Nanopore Technology, UK) was used to construct a library, and a Qubit fluorimeter was used to assess the quality of the library. Single-molecule real-time sequencing of long reads was conducted on a GridION X5 platform (Oxford Nanopore Technology, UK). After filtering the sequencing adapters and low-quality sequences, clean data (YN49, 5.1 Gb clean reads, 115 × sequencing depth; CC01, 10.5 Gb clean reads, 228× sequencing depth) were obtained and assembled using CANU (version 1.3) [25] with default parameters. In addition, a separate paired-end (PE) DNA library was sequenced using an Illumina HiSeq4000 platform (Illumina, San Diego, CA, USA). The sequencing data (filtered reads = 2.99 Gb, sequencing depth = 90×) were used to further improve the assembly using bwa mem (version 0.7.12-r1039) and two runs of Pilon (version 1.22) with continuous iteration correction [26]. The integrity of assembly was evaluated using BUSCO (benchmarking universal single-copy orthologs) version 3.0.1 (https://busco.ezlab.org/ accessed on 13 October 2020) [27].

Gene Prediction and Annotation
A combination of Augustus and Glimmer was used for de novo prediction of protein-coding genes by constructing models. GeneWise [28] was then used to predict protein-coding genes via homology analysis with known protein sequences from related species (C. cassiicola, Cercospora canescens, Bipolaris maydis, Pyricularia oryzae, Fusarium oxysporum and Pseudocercospora fijiensis). EVidenceModeler (EVM) (http://evidencemodeler.github.io/ accessed on 13 October 2020) was subsequently used to compute weighted consensus gene structure annotations. We obtained the final gene sets after removing genes with transposable elements using TransposonPSI (http://transposonpsi.sourceforge.net/ accessed on 13 October 2020). Functional annotations for all predicted gene models were made using multiple databases, including Swiss-Prot, NR, KEGG and COG, and BlastP, with E-values ≤ 1.0 × 10⁻⁵.

Gene Family Expansion Analysis
Using the OrthoMCL gene family results, CAFE (computational analysis of gene family evolution, version 4.0.1) [32] was employed to detect gene family expansion and contraction (using divergence time instead of branch length).

Positive Selection
To detect positive selection, we obtained a new gene set of orthologous gene pairs using the genomes of the 12 phytopathogenic fungi mentioned before. Using BLAST (version 2.2.30) with an E-value cut-off ≤ 1.0 × 10⁻⁵, we identified orthologous gene pairs as reciprocal best hits among the 12 species. We estimated the dN/dS ratio (ω) using PAML version 4.9e [33] with the coding sequence alignments above to determine the selection pressure on corresponding gene pairs. Genes with positively selected sites were detected using branch-site models (model = 2 and NSsites = 2). For the null hypothesis we used the parameters fix_omega = 1 and omega = 1, whereas for the alternative hypothesis we used fix_omega = 0 and omega = 1.5. We used a false discovery rate (FDR)-corrected likelihood ratio test (LRT) with an adjusted LRT p-value cut-off ≤ 0.05 to identify genes with positively selected sites.
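The LRT and FDR step described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the test statistic 2(lnL_alt − lnL_null) is compared against a chi-square distribution with one degree of freedom, and it assumes the Benjamini-Hochberg procedure as the FDR correction, neither of which is stated in the text. The function names and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test for two nested models.

    Assumption: the statistic follows a chi-square distribution with
    1 degree of freedom (a common choice for the branch-site test).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical (lnL_null, lnL_alt) pairs, one per orthologous gene.
    fits = {"geneA": (-2500.0, -2490.0), "geneB": (-1800.0, -1799.9)}
    names = list(fits)
    raw = [lrt_pvalue(*fits[name]) for name in names]
    for name, p_adj in zip(names, benjamini_hochberg(raw)):
        print(name, "selected" if p_adj <= 0.05 else "not selected", p_adj)
```

In practice the two log-likelihoods per gene would be read from the codeml output of the null (fix_omega = 1) and alternative (fix_omega = 0) runs.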
Lipases, proteases and the secretome were predicted according to the procedure described previously by Lopez and colleagues for Corynespora cassiicola [16]. Hmmsearch (HMMER 3.1b1; http://hmmer.org/ accessed on 13 October 2020) was used with predicted protein sequences as queries to search against the Lipase Engineering Database (version 3) with an E-value inclusion threshold set at 0.01 [37]. Proteases were predicted in the YN49, CC01 and CCP strain genomes by performing a BLASTP search (E-values ≤ 1.0 × 10⁻⁵) against the MEROPS blast database (version 9.12) with predicted protein sequences as queries [38]. SignalP (http://www.cbs.dtu.dk/services/SignalP/ accessed on 13 October 2020) was used to predict signal peptides and cleavage sites of predicted proteins. Proteins with a SignalP D-score = Y were scanned for transmembrane (TM) spanning regions using TMHMM (version 2.0; http://www.cbs.dtu.dk/services/TMHMM/ accessed on 13 October 2020), and all proteins with 0 TMs, or with 1 TM if it was located in the predicted N-terminal signal peptide, were retained. Proteins potentially secreted through endoplasmic reticulum (ER)/Golgi-independent pathways were not taken into account in this study.
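The retention rule for secreted proteins can be written as a small filter. The sketch below is illustrative only: the `ProteinPrediction` record and its field names are hypothetical containers for values parsed from SignalP and TMHMM output, and "located in the signal peptide" is interpreted here as the TM helix ending at or before the predicted cleavage site, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProteinPrediction:
    """Hypothetical record combining SignalP and TMHMM results."""
    protein_id: str
    signalp_call: str                 # "Y" if a signal peptide is predicted
    cleavage_site: int                # last residue of the signal peptide (1-based)
    tm_helices: List[Tuple[int, int]] = field(default_factory=list)  # (start, end)


def is_secreted(p: ProteinPrediction) -> bool:
    """Apply the rule: SignalP = Y and (0 TMs, or 1 TM inside the signal peptide)."""
    if p.signalp_call != "Y":
        return False
    if not p.tm_helices:
        return True
    if len(p.tm_helices) == 1:
        _start, end = p.tm_helices[0]
        # Assumption: the helix must lie entirely within the signal peptide.
        return end <= p.cleavage_site
    return False


if __name__ == "__main__":
    candidates = [
        ProteinPrediction("p1", "Y", 20),
        ProteinPrediction("p2", "Y", 22, [(5, 21)]),
        ProteinPrediction("p3", "Y", 19, [(150, 172)]),
        ProteinPrediction("p4", "N", 0),
    ]
    print([p.protein_id for p in candidates if is_secreted(p)])
```

A single TM helix overlapping the N-terminal signal peptide is often a hydrophobic signal-peptide core misread as a membrane span, which is why such proteins are kept rather than discarded.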

C. cassiicola Cas5 and Cas2 Isolates from China Differ in Virulence toward Rubber Tree Leaves
C. cassiicola YN49 and CC01 strains were isolated from rubber plantations in Hekou, Yunnan province, and Yangjiang, Guangdong province, respectively. Based on the sequence of the cassiicolin-encoding gene, the YN49 strain belongs to the Cas5 isolates and CC01 belongs to the Cas2 isolates (Figure S1, Supplementary Materials File S7). YN49 and CC01 mycelia grown on PDA were fluffy and grey (Figure 1), similar to the CCP strain [16]. YN49 produced different pigments on PDA compared with other strains, and the PDA medium turned light orange as the mycelia aged (Figure 1). Five days after inoculation with conidia, both YN49 and CC01 caused necrotic spots and typical darkening of the veins on the leaves of Hevea brasiliensis clone Reyan7-33-97, one of the main rubber tree cultivars in China. Compared with CC01, YN49 was more pathogenic toward Reyan7-33-97 (Figure 1).

General Genome Features and Annotation
Single-molecule real-time sequencing of long reads was conducted on a GridION X5 platform, and the genome of YN49 was sequenced with 115× coverage, while the genome of CC01 was sequenced with 228× coverage. CANU was used for de novo assembly of the sequencing data with Pilon-based continuous iteration correction using Illumina HiSeq4000 sequencing data, which generated 32 contigs with an N50 length of 2.51 Mb for the YN49 genome, and 33 contigs with an N50 length of 2.56 Mb for the CC01 genome. The genome sizes of YN49 (45.1 Mb) and CC01 (47.1 Mb) are slightly larger than those of HGCC (42.7 Mb), CCP (44.8 Mb) and UM591 (41.4 Mb). The GC contents of YN49 (51.09%) and CC01 (50.96%) are lower than those of HGCC (51.78%), CCP (51.89%) and UM591 (52.47%; Table 1). The completeness of the genome assembly was assessed using BUSCO, which showed that 96.9% and 97.5% of the gene groups were correctly assembled for the YN49 and CC01 scaffolds, respectively (Supplementary Materials File S1: Table S1). The YN49 genome contains 388 noncoding RNAs (ncRNAs) comprising 152 ribosomal RNAs (rRNAs), 44 small nuclear RNAs (snRNAs) and 192 transfer RNAs (tRNAs), while the CC01 genome contains fewer ncRNAs (338) comprising 79 rRNAs, 45 snRNAs and 214 tRNAs (Supplementary Materials File S1: Table S2). Furthermore, 8.18% of the YN49 genome and 9.18% of the CC01 genome are repetitive based on de novo and reference-based repeat analysis results (Supplementary Materials File S1:
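For readers unfamiliar with the assembly statistics quoted here, the following sketch shows how N50 and GC content are conventionally computed from a set of contigs. It uses toy inputs rather than the YN49 or CC01 assemblies, and the function names are illustrative.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of the assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def gc_content(sequence):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / (gc + at)


if __name__ == "__main__":
    toy_lengths = [5, 4, 3, 2, 1]      # toy contig lengths, not real data
    print("N50:", n50(toy_lengths))
    print("GC:", gc_content("ATGCGCNNAT"))
```

A high N50 together with a low contig count, as reported for both strains, indicates that most of the genome sits in a few long contigs.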

Analysis of Orthologues and Phylogenetic Relationships between C. cassiicola YN49 and CC01 and Other Fungi
We clustered the annotated genes of C. cassiicola YN49 and CC01, and the other 10 phytopathogenic fungi, into gene families, including 3,388 single-copy genes, which were used for phylogenetic tree construction. A maximum likelihood phylogenetic tree was generated using RAxML with the GTRGAMMA model (Figure 2). The results revealed that C. cassiicola YN49 and CC01 are evolutionarily closely related to C. cassiicola CCP, a strain isolated in the Philippines that causes CLF disease in rubber trees.
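Selecting single-copy gene families for tree building amounts to keeping only those families with exactly one member in every species. The sketch below illustrates that selection on a toy input; the dictionary layout, species labels and gene identifiers are hypothetical and do not reflect the actual OrthoMCL output format.

```python
def single_copy_families(families, species):
    """Keep families with exactly one gene in each listed species and no others.

    families: dict mapping family id -> dict mapping species -> list of gene ids
    species:  list of all species expected in the analysis
    """
    selected = {}
    for fam_id, members in families.items():
        total_genes = sum(len(genes) for genes in members.values())
        one_per_species = all(len(members.get(s, [])) == 1 for s in species)
        if one_per_species and total_genes == len(species):
            selected[fam_id] = {s: members[s][0] for s in species}
    return selected


if __name__ == "__main__":
    species = ["YN49", "CC01", "CCP"]   # toy subset of the 12 genomes
    families = {
        "fam1": {"YN49": ["y1"], "CC01": ["c1"], "CCP": ["p1"]},
        "fam2": {"YN49": ["y2", "y3"], "CC01": ["c2"], "CCP": ["p2"]},
        "fam3": {"YN49": ["y4"], "CC01": ["c3"]},
    }
    print(sorted(single_copy_families(families, species)))
```

Restricting the tree to such families avoids paralogues, so that each aligned column compares genuinely orthologous sequences.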

Gene Family Expansion and Contraction, and Positive Selection of Genes
The expansion and contraction of gene families is thought to be important in adaptive phenotypic diversification [41]. For plant pathogenic fungi, continuous coevolution with host plants gives rise to constant selective pressure for the preservation of expanded gene families relevant to virulence and host-based nutrient usage [42]. Based on sequence homology, we identified 98 (432 genes), 120 (479 genes) and 168 (734 genes) gene families showing expansion in the YN49, CC01 and CCP genomes, respectively (Figure 2 and Supplementary Materials File S2: Table S4). We also identified 206 (212 genes), 115 (178 genes) and 118 (131 genes) gene families showing contraction in the YN49, CC01 and CCP genomes, respectively (Figure 2 and Supplementary Materials File S2: Table S5). KEGG pathway enrichment analysis of expanded and contracted gene families indicated that most of these gene families are associated with primary and secondary metabolism pathways, such as amino acid, fatty acid, terpenoid and aflatoxin metabolism (Figure 3 and Figure S2; Supplementary Materials File S2: Tables S1 and S2). In addition to the expanded and contracted gene families, genes showing positive selection commonly contribute to adaptive phenotypic evolution. Herein, 61, 49 and 24 genes were identified as positively selected genes in the YN49, CC01 and CCP genomes, respectively (Supplementary Materials File S2: Table S6). KEGG (metabolic pathway) enrichment analysis of these positively selected genes revealed that some of the significantly enriched KEGG pathways were related to carbon metabolism and amino acid metabolism (Figure 4 and Supplementary Materials File S2: Table S3).

Whole-Genome Synteny Comparisons between C. cassiicola YN49, CC01 and CCP
Phylogenomic analysis revealed that C. cassiicola YN49 and CC01 are evolutionarily closely related to C. cassiicola CCP. We therefore performed a synteny comparison between these three strains. The resulting synteny dot-plot displays macrosynteny between the three genomes, and high levels of sequence homology with each other; 19 contigs of YN49, 23 contigs of CC01 and 15 contigs of CCP have conserved syntenic blocks ( Figure 5).

Figure 5. Synteny analysis of C. cassiicola YN49, CC01 and CCP strains. The synteny dot-plot shows macrosynteny between the three genomes, and high levels of sequence homology between strains. There are 19 contigs in YN49, 23 contigs in CC01 and 15 contigs in CCP with conserved syntenic blocks.

Secretome and Putative Pathogenicity Genes
The secretome of a plant pathogenic fungus comprises extracellular secreted proteins that are deployed to the host-pathogen interface during infection, including important virulence factors such as cell wall-degrading enzymes and effector proteins that manipulate host cell dynamics [43]. A secretome prediction pipeline for C. cassiicola [16] was implemented to predict the secretomes of C. cassiicola YN49, CC01 and CCP. A total of 1563 secreted proteins in the YN49 genome, 1474 in the CC01 genome, and 1534 in the CCP genome were predicted, accounting for 10.78%, 8.56% and 8.93% of their proteomes, respectively (Figure 6

Carbohydrate-Active Enzymes
Degradation of carbohydrates by secreted enzymes is an important component of fungal pathogenicity and virulence. Based on catalytic activity, CAZymes are classified into auxiliary activities (AAs), carbohydrate esterases (CEs), glycoside hydrolases (GHs), glycosyl transferases (GTs), and polysaccharide lyases (PLs) [36]. We examined the CAZymes of C. cassiicola YN49, CC01 and CCP. Using the common CAZy annotation pipeline for the genomic analysis of fungi, we identified 417 putative secreted CAZymes falling into 83 CAZyme families in YN49, 380 falling into 79 families in CC01, and 531 falling into 101 families in CCP (Supplementary Materials File S3: Tables S1-S3). To overcome the barrier of the plant cell wall, phytopathogenic fungi produce enzymes that degrade cell wall polymers such as cellulose, pectin and cutin [19]. CAZymes involved in plant cell wall degradation, such as cellulose, hemicellulose, pectin and cutin degradation, are listed in Table 2, according to the classifications of Chang [46] and Kubicek [19]. The results indicate that CCP possesses 223 secreted plant cell wall degradation-related CAZymes, 14.95% more than YN49 (194) and 21.19% more than CC01 (184).
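The relative differences quoted for CAZyme counts follow from simple ratios of the counts given in the text. The short sketch below reproduces the arithmetic; the helper name `percent_more` is illustrative. For the last comparison (223 versus 184) the ratio rounds to 21.20% at two decimals, which agrees with the 21.19% quoted above to within rounding.

```python
def percent_more(a, b):
    """Return by how many percent count a exceeds count b."""
    if b <= 0:
        raise ValueError("reference count must be positive")
    return (a - b) / b * 100.0


if __name__ == "__main__":
    # Counts taken from the text: secreted CAZymes and the subset
    # related to plant cell wall degradation.
    secreted = {"YN49": 417, "CC01": 380, "CCP": 531}
    cell_wall = {"YN49": 194, "CC01": 184, "CCP": 223}
    for label, counts in (("secreted CAZymes", secreted),
                          ("cell wall-related CAZymes", cell_wall)):
        for strain in ("YN49", "CC01"):
            diff = percent_more(counts["CCP"], counts[strain])
            print(f"CCP vs {strain}, {label}: {diff:.2f}% more")
```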

Secondary Metabolite Gene Clusters
Phytotoxic SMs are crucial weapons that phytopathogenic fungi use to invade target plants, and many are derived from polyketides, non-ribosomal peptides, terpenes and alkaloids [47]. Compared with the SMs mentioned above, beta-lactones are rarely found in plant pathogens. However, with improved biochemical knowledge and bioinformatic predictions, new beta-lactones with novel functions and related biosynthetic gene clusters are being identified in fungi [48,49]. Herein, antiSMASH 5.1.2 (fungi view) was used to identify SM BGCs in the genomes of C. cassiicola YN49, CC01 and CCP, and all putative SM BGCs are listed in Supplementary Materials File S1: Table S5. As shown in Table 3, YN49 and CCP both have 57 SM BGCs, while CC01 has 62. All three strains share a similar number of BGCs for most types of SM, including NRPSs, PKSs (type I and III), and indole and terpene BGCs, but the numbers of PKS/NRPS, PKS/indole and beta-lactone BGCs differed somewhat between the target strains (Table 3). CC01 has 10 PKS/NRPS BGCs, nine more than YN49 and six more than CCP. There is a single beta-lactone BGC in both the YN49 and CCP genomes, but none in the CC01 genome. YN49 is the only strain possessing a PKS/indole BGC (Table 3).

Phylogenomic Analysis of NRPS, PKS and PKS/NRPS Genes, and Domain Structure Analysis
In order to determine differences between the secondary metabolomes of C. cassiicola YN49, CC01 and CCP, we analysed the phylogenomic relationships of the NRPS, PKS, and PKS/NRPS BGCs identified in the three strains, based on the sequence of A and KS domains, which are relatively conserved in NRPSs and PKSs [50,51]. The phylogenomic relationships of A and KS domains indicated that most NRPSs and PKSs are conserved in these three strains, with many small clades containing three KS or A domains from three different strains (Figures 7 and 8

Synteny Analysis of PKS/NRPS Gene Clusters
Our results showed that the number of PKS/NRPS clusters differs significantly between C. cassiicola YN49, CC01 and CCP (Table 3). Additionally, phylogenomic relationship analysis indicated that the PKS/NRPS genes shared high homology, and could be clustered into one clade based on the sequences of the A or KS domains (Figures 7 and 8). An intact SM BGC contains not only backbone synthesis genes, whose enzymatic products form the core metabolite, but also genes involved in product modification, transport, and transcriptional regulation [52]. To analyse the differences between PKS/NRPS clusters in the target C. cassiicola strains, synteny of the PKS/NRPS clusters was analysed. Among the seven tested clusters, only three (CC01-21.1, CC01-25.1 and CC01-30.1) displayed good synteny. CCP-23.1, CC01-30.2 and CC01-30.1 share some synteny, but CC01-12.3 and CCP-7.1 have almost no synteny with the other analysed clusters (Figure 9). These results indicate large differences among the PKS/NRPS gene clusters in the target C. cassiicola strains, which suggests that secondary metabolism may vary between C. cassiicola strains.

Figure 9. Synteny analysis of PKS/NRPS gene clusters in C. cassiicola CC01 and CCP.

Discussion
C. cassiicola strains have been isolated from different plants and display various lifestyles, including endophytes, saprophytes, and many necrotrophic pathogens [16,53-55]. Rubber tree CLF disease, caused by C. cassiicola, is a devastating leaf disease affecting rubber plantations in many countries in Asia and Africa, and it also threatens rubber production in China [14,16,56]. C. cassiicola isolates show high genetic diversity, and virulence profiles vary significantly between rubber tree cultivars [14]. Based on the amino acid sequence of the phytotoxin cassiicolin, C. cassiicola is clustered into the following seven types: Cas1-6 and Cas0. Cas2 isolates have only been found in China [15]. Apart from cassiicolin, little is known about the pathogenesis and associated pathogenic factors of C. cassiicola. Although 41 C. cassiicola genome assemblies are available in the NCBI database (https://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/31373/ accessed on 13 October 2020), most assemblies are not annotated, and are composed of thousands of contigs. CCP is the only annotated assembly, but it also contains hundreds of contigs [16]. In the present study, we determined high-quality gapless genome sequences, acquired by single-molecule real-time sequencing, of two C. cassiicola strains isolated from rubber trees in China with different virulence, including one Cas2 isolate. To decipher the genomic basis underlying the pathogenesis of CLF disease, we performed a comparative analysis of genomic data between the Chinese strains and the highly virulent Philippine strain CCP.
Utilisation of host-based nutrition is critical for fungal parasitism, and gene families can contract or expand to adapt to distinct host-based nutrients [57]. For plant pathogenic fungi, advantageous substitutions that enhance the capacity to infect hosts or adapt to new environments are likely to be rapidly fixed in the population [58]. Rubber trees are traditionally planted between 15° N and 10° S, but China has 1.13 million hectares of rubber plantations between 18° N and 24° N. Due to the higher latitude, cold weather is the most serious threat to rubber plantations in China. Thus, cold-resistant rubber clones have been developed and are widely cultivated in Chinese rubber tree plantations [59,60]. Different rubber tree cultivars are associated with different C. cassiicola CLF strains, including Cas2 type isolates that infect pepper, cucumber and papaya worldwide, but infect rubber trees only in China. The virulence profiles of C. cassiicola vary significantly depending on rubber tree cultivars, and also display geographical specialisation [14,56]. Two hypotheses have been proposed to explain the host and geographical specialisation of C. cassiicola: the coevolution of isolates with different rubber cultivars, and host switching from other plants. Based on our results, the Chinese Cas2 isolate CC01 appears to have diverged phylogenetically from CCP approximately 6.95 (5.29-9.18) million years ago (MYA), and the Chinese Cas5 isolate YN49 diverged ~9.81 (7.64-12.82) MYA (Figure 2). Since rubber trees have been grown in China for less than 100 years, CC01 and YN49 have presumably transferred from other host plants to rubber trees. We also found that the most expanded and contracted gene families in the genomes of the two Chinese C. cassiicola CLF strains and the Philippine strain are related to metabolic pathways (Figure 3 and Supplementary Materials, Table S4), and a large proportion of positively selected genes, which reflect the evolutionary pressure imposed by natural selection, in these three strains are metabolism-related (Figure 4 and Additional File 2: Table S6), including genes involved in carbohydrate metabolism and secondary metabolism. The diversity of metabolism-related genes allows C. cassiicola to adapt to different rubber cultivars, and this may lead to differences in pathogenesis and/or pathogenic factors, as YN49 and CC01 showed different virulence (Figure 1).
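The timescale argument above can be checked with simple arithmetic. The following Python sketch uses only the divergence estimates and the "less than 100 years" cultivation history quoted in the text; the function names are illustrative and are not part of the analysis pipeline used in this study.

```python
# Divergence-time estimates from CCP, in million years ago (MYA), as quoted
# in the text: (point estimate, lower bound, upper bound).
DIVERGENCE_MYA = {
    "CC01": (6.95, 5.29, 9.18),
    "YN49": (9.81, 7.64, 12.82),
}

# Upper limit on the history of rubber cultivation in China, in years
# (the text states "less than 100 years").
CULTIVATION_YEARS = 100


def predates_cultivation(lower_bound_mya, cultivation_years=CULTIVATION_YEARS):
    """Return True if even the youngest plausible divergence is older
    than the cultivation history."""
    return lower_bound_mya * 1_000_000 > cultivation_years


def fold_older(lower_bound_mya, cultivation_years=CULTIVATION_YEARS):
    """Ratio of the youngest plausible divergence age to the cultivation history."""
    return lower_bound_mya * 1_000_000 / cultivation_years


for strain, (estimate, lower, upper) in DIVERGENCE_MYA.items():
    print(strain, predates_cultivation(lower), round(fold_older(lower)))
```

Even the lower bounds of both intervals exceed the cultivation history by several orders of magnitude, which is why the divergence between these lineages cannot be attributed to adaptation on rubber trees in China alone.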
Plant pathogenic fungi can secrete a series of proteins that are deployed to the host-pathogen interface during infection, including enzymes interacting with plant substrates (CAZymes, peptidases and lipases), together with proteins of unknown function [61]. Necrotrophic and hemibiotrophic plant pathogenic fungi usually secrete more enzymes that are important for host invasion than biotrophs do [62,63]. Our results showed that all three tested C. cassiicola strains possess a large number of secreted protein-coding genes in their genomes. In agreement with the results for expanded and contracted gene families, the number of secreted protein-coding genes differed between all three tested C. cassiicola strains. The Cas1 isolate (CCP) has more secreted protein-encoding genes than the Cas2 isolate (CC01) and the Cas5 isolate (YN49), consistent with previous reports [16] and phenotype results (Figure 1).
CAZymes are important for carbon acquisition and metabolism in fungi, and CWDEs are used by phytopathogenic fungi as powerful weapons to penetrate and degrade the plant cell wall [64]. The Cas2 isolate (CC01) and the Cas5 isolate (YN49) were found to contain fewer CWDEs than the Cas1 isolate (CCP) in the present study, especially the Cas2 isolate (Table 2). The most obvious difference was in the number of pectin- and cutin-degrading enzymes, and of GH5 enzymes, which are related to cellulose degradation (Table 2). Pectinolytic enzymes are important in pathogenesis as potential virulence factors, especially in phytopathogens affecting dicots, since the pectin content of the dicot cell wall (30%) is much higher than that of the monocot cell wall (10%) [65,66]. Other pathogenic fungi affecting rubber trees, such as Colletotrichum spp., also secrete pectin lyases to invade rubber leaves [67,68]. CCP encodes many more secreted pectinolytic enzymes than YN49 and CC01, including members of GH28, PL1, and three other families, and CC01 possesses the fewest pectinolytic enzyme-encoding genes. This suggests that pectin degradation ability varies between different C. cassiicola strains, and that Cas1 isolates might degrade pectin components more effectively than Cas2 and Cas5 isolates; this could partially explain the difference in pathogenicity between C. cassiicola strains.
Rubber tree leaves are covered with a thick, waxy cuticle, and the thickness is related to pathogen resistance [18,69]. The cuticle consists of a matrix of mid-chain hydroxy and/or epoxy C16 and/or C18 fatty acid monomers and waxes, including numerous very-long-chain fatty acids (VLCFAs; C20-C40). Both the cuticle and the waxes cover the surface of leaves to prevent water loss and invasion by pathogens [70]. To breach these hydrophobic layers, many phytopathogenic fungi secrete cutinases and lipases in the early stages of infection, as demonstrated for Valsa mali, Blumeria graminis and F. graminearum [45,47,71]. One CCP strain was found to encode 11 putative cutinases, more than are present in hemibiotrophs, saprotrophs and ectomycorrhizal fungi [16]. Herein, we analysed the distribution of secreted cutinases between different cassiicolin toxin classes, and the results indicated that Cas1 isolates possess more secreted cutinase-encoding genes than Cas2 and Cas5 isolates, with Cas2 isolates containing the fewest. The trend in the number of secreted lipase-encoding genes differed from the results of cutinase analysis; CCP encodes fewer secreted lipase genes than both YN49 and CC01. To further evaluate lipid degradation ability, we combined the results of secreted lipase and PHI analyses, and found that CCP possesses more pathogenicity-relevant lipases than CC01, but fewer than YN49 (Figure 3). Thus, Cas1 isolates (CCP) and Cas5 isolates (YN49) appear to encode more putative cuticle and wax degradation enzymes than Cas2 isolates (CC01), and this might be one of the reasons why CC01 showed less virulence than YN49 (Figure 1).
Phytotoxins (PTs) are largely represented by low-molecular-weight SMs that disrupt vital activities of plant cells and/or cause death at concentrations below 10 mM [72]. Cassiicolin is the only phytotoxin of C. cassiicola characterised to date, and the cassiicolin gene has only been detected in 47% of isolates. Based on the presence of the cassiicolin toxin and its amino acid sequence, C. cassiicola strains are divided into seven categories (Cas1-6 and Cas0) [14]. Since C. cassiicola shows high genetic diversity and little correlation between genetic clades and the traits of strains, many studies have attempted to connect cassiicolin types with pathogenicity and geographical locations. Most such studies have been carried out on rubber trees, and the results vary between different cultivars. The only consistent finding is that Cas1 isolates, especially CCP strains, are more virulent toward some specific cultivars [8,14,73].
In our previous study, we analysed the pathogenicity of three C. cassiicola isolates of three different toxin classes (Cas2, Cas5 and Cas0), which are the dominant populations in Chinese rubber plantations, toward four different rubber tree cultivars. All three isolates showed differences in pathogenicity toward the four clones, and there were no obvious trends [74]. In the present work, YN49 showed stronger virulence than CC01 toward Hevea brasiliensis clone Reyan7-33-97 (Figure 1). Comparative genomic results showed that CCP contains many more CWDEs than YN49 and CC01, and CC01 has the fewest such enzymes. Thus, we speculate that the cassiicolin toxin is not the only pathogenic factor affecting the virulence of C. cassiicola toward rubber trees, and that secreted CWDEs also play an important role during pathogenesis, particularly in the early stages of infection. Secreted CWDEs are potent weapons used by C. cassiicola to breach the cell wall barrier, and they assist the release of cassiicolin and other pathogenic factors. Thus, a thick cuticle and wax layer covering rubber tree leaves, and other enhanced cell wall structures, may effectively prevent the invasion of C. cassiicola strains lacking CWDEs, but may be unable to block the penetration of C. cassiicola strains containing numerous CWDEs. Since the resistance of rubber tree leaves and the CWDEs of different C. cassiicola strains remain unclear, the results of pathogenicity tests on the CLF strains were somewhat difficult to interpret. This may partially explain why different rubber tree clones show differences in susceptibility to CLF disease. For example, GT1 is resistant in Africa but highly sensitive in Thailand [8].
Numerous C. cassiicola strains do not encode the cassiicolin toxin, and some can still cause CLF disease in rubber trees in many areas, including China, India and Thailand [8,62]. Similarly, several Cas0 isolates were found to infect cucumber [15]. These findings indicate that C. cassiicola contains other pathogenic factors in addition to the cassiicolin toxin. Phytopathogenic fungi, especially filamentous fungi, usually contain dozens of SM BGCs encoding a diverse array of small molecules that function as phytotoxins, antibiotics and pigments [24,52]. Herein, we identified ~60 SM BGCs in three different C. cassiicola strains, many more than the average for phytopathogenic fungi, consistent with previous studies, indicating that C. cassiicola possesses the genomic basis for the synthesis of various SMs [16,75]. Based on the genomes of 66 different Aspergillus fumigatus strains, Lind et al. (2017) summarised five general types of variation in SM BGCs within a fungal species, and revealed diversity and discontinuity in the distributions of SM BGCs, including between strains sharing high levels of synteny [52]. Although the total number of SM BGCs is similar in the three analysed C. cassiicola strains, they have significant differences in the number of genes encoding beta-lactone, PKS/indole and NRPS/indole SM BGCs, and especially PKS/NRPS BGCs (Table 3). PKS/indole and NRPS/indole BGCs are only present in YN49 and CC01, and beta-lactone BGCs are not present in CC01, indicating whole-gene cluster polymorphisms in C. cassiicola for SM BGCs. Further phylogenomic and domain structure analyses showed that although most PKSs and NRPSs could be clustered into one clade with high bootstrap value support, some parts had distinct domain structures, such as the PKS of the CCP-7.5 cluster and the YN49-11.2 cluster, and the NRPS of the CCP-10.1 cluster (Figures 7 and 8). This is consistent with single nucleotide polymorphisms (SNPs) and short indel polymorphisms.
All the PKS/NRPSs could be clustered into one clade based on A or KS domain sequences, suggesting that PKS/NRPSs are relatively highly conserved between C. cassiicola strains (Figures 7 and 8). However, synteny analysis of clusters containing PKS/NRPSs gave diverse results; only three clusters showed synteny, while the rest shared almost no synteny (Figure 9). This indicates gene content polymorphisms in C. cassiicola SM BGCs, even between clusters with similar core genes. C. cassiicola SM BGC polymorphisms likely correlate with differences in SMs between strains, and this may be one of the reasons why C. cassiicola displays complex pathotypic diversity.
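The whole-cluster polymorphisms described above can be summarised as a presence/absence table. The Python sketch below encodes only the three BGC classes for which the text states presence or absence; the presence of beta-lactone BGCs in both CCP and YN49 is an inference from the text (it states only that they are absent from CC01), and the helper names are hypothetical. It does not reproduce the antiSMASH output or the counts in Table 3.

```python
STRAINS = ("CCP", "YN49", "CC01")

# Presence (True) or absence (False) of each BGC class per strain.
# PKS/indole and NRPS/indole: stated as present only in YN49 and CC01.
# beta-lactone: stated as absent from CC01; presence in CCP and YN49 is
# an assumption inferred from the text, not a value taken from Table 3.
BGC_PRESENCE = {
    "PKS/indole": {"CCP": False, "YN49": True, "CC01": True},
    "NRPS/indole": {"CCP": False, "YN49": True, "CC01": True},
    "beta-lactone": {"CCP": True, "YN49": True, "CC01": False},
}


def is_polymorphic(by_strain):
    """A BGC class is polymorphic if it is present in some strains but not all."""
    return len(set(by_strain.values())) > 1


def absent_in(presence):
    """Map each BGC class to the list of strains that lack it."""
    return {
        bgc: [strain for strain in STRAINS if not by_strain[strain]]
        for bgc, by_strain in presence.items()
    }


for bgc, missing in absent_in(BGC_PRESENCE).items():
    status = "polymorphic" if is_polymorphic(BGC_PRESENCE[bgc]) else "shared"
    print(f"{bgc}: {status}; absent in {', '.join(missing) or 'none'}")
```

A table of this kind captures only whole-cluster gain or loss; the gene content polymorphisms revealed by the synteny analysis would require comparing the genes within each cluster.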

Conclusions
This study determined high-quality whole-genome sequences of two Chinese isolates (YN49 and CC01) of C. cassiicola that cause CLF disease of rubber trees. Comparative genomics of gene families in these two strains and the virulent CCP strain from the Philippines showed that C. cassiicola strains with different geographic origins have experienced different selective pressures, resulting in differences in metabolism-related gene families. Secreted protein analysis indicates that the number of secreted CWDEs is correlated with pathogenesis, and the structural resistance of rubber tree leaves may also influence the pathogenicity of CLF strains. C. cassiicola strains encode numerous SM BGCs, and there is significant diversity and discontinuity in the distribution of SM BGCs between C. cassiicola strains. This implies SM polymorphisms between C. cassiicola strains, which may play an important role in pathogenic progress. These findings form the basis for further experimental studies on the pathogenesis of rubber tree CLF disease.

Data Availability Statement:
The whole-genome sequencing datasets from this study have been submitted to the BioProject database of the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/, accessed on 13 October 2020) under accession numbers PRJNA687613 and PRJNA687612. All raw sequencing data were uploaded to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra, accessed on 13 October 2020) database under the GenBank accession numbers SRR13318044 and SRR13316927.