Genomic Features of Cladobotryum dendroides, Which Causes Cobweb Disease in Edible Mushrooms, and Identification of Genes Related to Pathogenicity and Mycoparasitism.

Cladobotryum dendroides, which causes cobweb disease in edible mushrooms, is one of the major fungal pathogens. Our previous studies focused on the genetic and morphological characterization of this fungus, as well as its pathogenicity and the identification of appropriate fungicides. However, little is known about the genome characters, pathogenic genes, and molecular pathogenic mechanisms of C. dendroides. Herein, we reported a high-quality de novo genomic sequence of C. dendroides and compared it with closely-related fungi. The assembled C. dendroides genome was 36.69 Mb, consisting of eight contigs, with an N50 of 4.76 Mb. This genome was similar in size to that of C. protrusum, and shared highly conserved syntenic blocks and a few inversions with C. protrusum. Phylogenetic analysis revealed that, within the Hypocreaceae, Cladobotryum was closer to Mycogone than to Trichoderma, which is consistent with phenotypic evidence. A significant number of the predicted expanded gene families were strongly associated with pathogenicity, virulence, and adaptation. Our findings will be instrumental for the understanding of fungi-fungi interactions, and for exploring efficient management strategies to control cobweb disease.


Introduction
Cobweb disease, together with dry bubble (Lecanicillium fungicola), green mold (Trichoderma aggressivum), and wet bubble (Mycogone perniciosa), are currently considered the four most serious diseases of edible mushrooms that are caused by parasitic fungi [1]. These fungal diseases pose threats to mushroom production, and cause economic losses in all mushroom-growing countries worldwide [2,3]. The incidence of these diseases is increasing at an alarming rate, causing serious damage to a variety of edible mushrooms, including Agaricus bisporus [4,5], Pleurotus ostreatus [6], P. eryngii [7], Flammulina velutipes [3], Hypsizygus marmoreus [8], and Ganoderma sichuanensis [9]. Cobweb disease is characterized by the rapid growth of abundant coarse mycelium over the affected mushrooms [10]. Typical symptoms of this disease are cottony fluffy white or yellowish to pink colonies on mushrooms, rapid colonization,

Genome Sequencing and Assembly
We obtained genomic DNA from a single spore pure culture of C. dendroides (CCMJ2807) isolated from L. edodes. The genome was sequenced using the Sequel sequencing platform, and yielded 5.40 Gb of sequence data at 164× coverage. A total of 38.48 Mb of reads were assembled into eight contigs, with a contig N50 of 4.76 Mb, an N90 of 4.40 Mb, and a maximum span of 7.02 Mb. The size of the assembled genome was smaller than previously reported for C. protrusum (39.09 Mb) [17] and M. perniciosa (44.00 Mb) [18], but larger than other species within the Hypocreaceae, including T. longibrachiatum (32.24 Mb) [25], T. reesei (33.39 Mb) [26], and Escovopsis weberi (27.20 Mb) [27]. The guanine-cytosine (GC) content of the C. dendroides genome was 48.2%. The completeness of the assembled genome was evaluated by mapping the Benchmarking Universal Single-Copy Orthologs (BUSCO) set of 3725 fungal genes to the assembly. This analysis identified 3662 complete BUSCOs in the C. dendroides gene sets, which indicated a high-quality genome with an estimated completeness of 98.31% (Table 1 and Pathogens 2020, 9, 232 3 of 16 Figure 1). Telomeres comprised of TTAGGG repeats are commonly identified at the molecular end of the linear chromosomes in filamentous fungi [28]. Among the eight contigs, four contigs (utg0, 1, 3, and 11) contained complete telomere-to-telomere structures, including remarkably regular repeat sequences ([T]TTAGGG). The remaining four contigs (utg2, 4, 9, and 83) could also be arranged into two complete telomere pairs, as each was comprised of one telomeric repeat. Consequently, there were six potential chromosomes in the C. dendroides genome, similar to related Trichoderma species, which have three to seven chromosomes [22]. These results indicated that our assembled genome was contiguous and sufficient for further analysis. Our new, high-quality, whole genome sequence of C. dendroides will provide insights into its evolution and pathogenic mechanisms.

Gene Prediction and Annotation
There were ~1.95 Mb of repetitive elements in the C. dendroides genome, which represented 5.07% of the total genome; this was consistent with previous findings in C. protrusum [17]. These repetitive elements comprised all of the major transposable element (TE) types, including DNA repeats, long interspersed nuclear elements (LINEs), short interspersed elements (SINEs), and long terminal repeats (LTRs). LTRs were the most abundant TE, representing 2.25% (0.86 Mb) of the genome, followed by LINEs (0.47%; 0.79 Mb) and DNA transposable elements (0.36%). Variations in genome size and structure, as well as in the numbers of predicted genes in the genome of a given species, are potentially closely associated with differences in repetitive DNA content; these differences may be due to natural selection, species lifestyle, and ecological niche [29]. C. dendrodides and C. protrusum have fewer repetitive elements, especially TEs, as compared M. perniciosa [17]. This might be because M. perniciosa has more efficient DNA removal mechanisms [18], and might explain the smaller size of the M. perniciosa genome. Genome size might be affected by host-and environment-associated

Gene Prediction and Annotation
There were~1.95 Mb of repetitive elements in the C. dendroides genome, which represented 5.07% of the total genome; this was consistent with previous findings in C. protrusum [17]. These repetitive elements comprised all of the major transposable element (TE) types, including DNA repeats, long interspersed nuclear elements (LINEs), short interspersed elements (SINEs), and long terminal repeats (LTRs). LTRs were the most abundant TE, representing 2.25% (0.86 Mb) of the genome, followed by LINEs (0.47%; 0.79 Mb) and DNA transposable elements (0.36%). Variations in genome size and structure, as well as in the numbers of predicted genes in the genome of a given species, are potentially closely associated with differences in repetitive DNA content; these differences may be due to natural selection, species lifestyle, and ecological niche [29]. C. dendrodides and C. protrusum have fewer repetitive elements, especially TEs, as compared M. perniciosa [17]. This might be because M. perniciosa has more efficient DNA removal mechanisms [18], and might explain the smaller size Pathogens 2020, 9, 232 5 of 16 of the M. perniciosa genome. Genome size might be affected by host-and environment-associated adaptations as well as evolution [30,31]. We also performed a high-throughput screening of simple sequence repeat (SSR) loci, and identified 2520 SSR loci, including 817 dinucleotide repeats (DNRs), 644 trinucleotide repeats (TNRs), 782 tetranucleotide repeats (TTNRs), 203 pentanucleotide repeats (PNRs), and 74 hexanucleotide repeats (HNRs). These repeats included 454 types of motif sequences. The mean motif length was 27.72 bp, and the longest motif was 224 bp long, consisting of the TTNR "TTAT" repeated 56 times. The largest numbers of DNR, TNR, PNR, and HNR repeats were 47, 65, 26, and 12, respectively.
Gene predictions were performed using a combination of homology-based and de novo-based approaches. Using these approaches, we identified 9548 protein coding genes, with an average length of 1819.49 bp, in the C. dendroides genome. Across the identified genes, the average sizes of the exons and introns were 564.7 and 108.75 bp, respectively, which was similar to other species of Hypocreaceae [17,18]. We then functionally annotated these coding genes against seven typical databases, and found that 9369 (98.13%) genes had homologs in at least one database. Specifically, 9345 (99.74%) were homologous to known proteins in the National Center for Biotechnology Information (NCBI) non-redundant database (Nr); 4637 (49.49%) to known proteins in the Nucleotide Sequence Database (Nt); 5003 (53.40%) to known proteins in the Eukaryotic Clusters of Orthologous Groups (KOG) database; 7230 (77.17%) to known proteins in SwissProt; 4446 (47.45%) to known proteins in the Gene Ontology (GO) database; 5203 (55.53%) to known proteins in the Kyoto Encyclopedia of Genes and Genomes (KEGG); and 5381 (57.43%) to known proteins in Pfam. The total length of the non-coding RNA (ncRNA) was 34,626 bp, accounting for 0.09% of the assembled genome. The ncRNA contained 225 transfer RNAs (tRNAs), 26 small nuclear RNAs (snRNAs), and 56 ribosome RNAs (rRNAs).

Identification of Mating-Type Idiomorphs
We identified two previously-characterized MAT genes (MAT1-1-2 and MAT1-1-3) grouped together in the genome at a single locus (contig utg0) (Supplementary Figure S1). The MAT 1-2 idiomorph was not found in C. dendroides, incongruent with results in C. protrusum and M. perniciosa, but consistent with results in Macrophomina phaseolina, T. harzianum, and T. longibrachiatum [21,22]. In addition, the genes adjacent to the MAT genes were all located upstream on this locus, including complex I intermediate-associated protein 30 (CIA30), anaphase-promoting complex (APC5), cytochrome C oxidase subunit VIa (Cox), and DNA lyase (APN2). Notably, the flanking genes in C. dendroides were the same as those in M. perniciosa (contig utg16) and were oriented in the same direction [18]. The identification of these mating-type structures in the C. dendroides genome provides some insights into the sexual lifestyle of this organism.

CAZymes Analysis
At the beginning of infection, pathogens primarily use CAZymes to destroy the polysaccharide component of the host cell wall [32]. We identified 327 C. dendroides genes across the six categories of CAZymes. We found that 163 genes, representing the largest proportion of the genome, encoded glycoside hydrolases (GHs), 103 genes encoded glycosyl transferases (GTs), 27 genes encoded carbohydrate-binding modules (CBMs), and 27 genes encoded auxiliary activities (AAs). The carbohydrate esterases (CEs; 4 genes) and polysaccharide lyases (PLs; 3 genes) were poorly represented in the genome. C. dendroides had fewer CAZymes than several other species in the Hypocreaceae: Across the three pathogens (C. dendroides, C. protrusum, and M. perniciosa), most genes encoded GH and GT enzymes, which might be used to degrade the host cell barrier during the fungi-fungi infection process. Most of the differences between the two Cladobotryum and three Trichoderma (T. harzianum, T. virens, and T. guizhouense) genomes were due to the high copy number of GH and CBM families in these Trichoderma genomes. Most chitinase-and cellulose-degrading enzymes are categorized within GH classes. The ability of C. dendroides to decompose chitin is suggested by the abundance of genes that belongs to the GH families. Of the GH families, GH18 is encoded by the most genes (19), followed by GH3 (13). GH18 contains the class III and V chitinases. The abundance of GH18 and GH3 was consistent with the efficient degradation of chitinase, cellulose, and hemicellulose [33], and suggested that these enzymes might play a role in the C. dendroides genome. In addition, we found that the C. dendroide genome contained a large number of CBMs, which occur as fusions to GHs, CEs, or AAs. CBM18 is most abundant in the CBM family in C. dendroides: approximately nine residues in modules. These modules were attached to a number of chitinase catalytic domains. Chitin-binding functions have been demonstrated in many cases [34]. Our results suggested that C. dendroides might destroy the host cell wall by secreting a large amount of chitinase Across the three pathogens (C. dendroides, C. protrusum, and M. perniciosa), most genes encoded GH and GT enzymes, which might be used to degrade the host cell barrier during the fungi-fungi infection process. Most of the differences between the two Cladobotryum and three Trichoderma (T. harzianum, T. virens, and T. guizhouense) genomes were due to the high copy number of GH and CBM families in these Trichoderma genomes. Most chitinase-and cellulose-degrading enzymes are categorized within GH classes. The ability of C. dendroides to decompose chitin is suggested by the abundance of genes that belongs to the GH families. Of the GH families, GH18 is encoded by the most genes (19), followed by GH3 (13). GH18 contains the class III and V chitinases. The abundance of GH18 and GH3 was consistent with the efficient degradation of chitinase, cellulose, and hemicellulose [33], and suggested that these enzymes might play a role in the C. dendroides genome. In addition, we found that the C. dendroide genome contained a large number of CBMs, which occur as fusions to GHs, CEs, or AAs. CBM18 is most abundant in the CBM family in C. dendroides: approximately nine residues in modules. These modules were attached to a number of chitinase catalytic domains. Chitin-binding functions have been demonstrated in many cases [34]. Our results suggested that C. dendroides might destroy the host cell wall by secreting a large amount of chitinase during infection. The CBM1 and CBM50 binding Pathogens 2020, 9, 232 7 of 16 domains have been previously described in the order Hypocreales [22,35]. We found additional CBMs that putatively bind to starch, xylanase, fructans, cellulose, and glucans. This implied that C. dendroides makes full use of these domains, which might result in the faster and more competitive degradation of the respective polymers.
Using SECRETOOL, 371 candidate secretory proteins were predicted. Remarkably, 287 peptidase proteins with an identity ≥70% were identified against the peptidase database (MEROPS) database (Supplementary Table S3). Among these secretory proteins, seven homologs were identified in MEROPS, and three homologs were identified in PHI. Fungal effectors can play a vital molecular role in fungus-plant communication, targeting and regulating the phytohormone signaling of hosts by changing or operating them [39]. A set of 400 putative effector proteins were predicted using the filtered set SignalP (identify ≥70%), out of which seven were unique genes. Previous studies have predicted several effector proteins in C. protrusum [17] and M. perniciosa [18]. Pathogens may optimize their own effector sets to adapt to hosts [40]. Overall, these fungal effectors in C. dendroides might lead to the adaptation of pathogens with a broad host range to a specific host over time [41]. We also identified 336 (3.52%), 175 (1.83%), and 48 (0.50%) genes encoding for cytochrome P450 (CYP), major
We next phylogenomically analyzed 2287 single copy orthologs that were conserved across all of the fungi analyzed ( Figure 3C). Our phylogeny of the 17 species indicated that Cladobotryum, Mycogone, and Trichoderma were distantly-related genera within the family Hypocreaceae; this was consistent with previous reports [18,22,43]. Furthermore, Cladobotryum was more genetically similar to Mycogone than to Trichoderma, which was consistent with morphological taxonomy, traditional classification, and life history. Our phylogeny also recovered C. dendroides and C. protrusum in a monophyletic clade, and showed that these species diverged relatively recently. Based on phylogenetic placement, we also inferred that, within the Trichoderma clade, subgroup I (consisting of T. virens, T. harzianum, and T. guizhouense) evolved independently of subgroup II (T. reesei and T. koningii), and occurred species differentiation. This conclusion was consistent with that of Kubicek et al. [43]. These results indicated that our phylogeny tree accurately reflected evolutionary relationships. We next phylogenomically analyzed 2287 single copy orthologs that were conserved across all of the fungi analyzed ( Figure 3C). Our phylogeny of the 17 species indicated that Cladobotryum, Mycogone, and Trichoderma were distantly-related genera within the family Hypocreaceae; this was consistent with previous reports [18,22,43]. Furthermore, Cladobotryum was more genetically similar to Mycogone Pathogens 2020, 9, 232 9 of 16 than to Trichoderma, which was consistent with morphological taxonomy, traditional classification, and life history. Our phylogeny also recovered C. dendroides and C. protrusum in a monophyletic clade, and showed that these species diverged relatively recently. Based on phylogenetic placement, we also inferred that, within the Trichoderma clade, subgroup I (consisting of T. virens, T. harzianum, and T. guizhouense) evolved independently of subgroup II (T. reesei and T. koningii), and occurred species differentiation. This conclusion was consistent with that of Kubicek et al. [43]. These results indicated that our phylogeny tree accurately reflected evolutionary relationships.
We found that gene family contraction in C. dendrodides was more common that gene family expansion. A total of 120 gene families in C. dendroides expanded during evolution, fewer than in C. protrusum (248) and M. perniciosa (385) ( Figure 3C). Functional analyses showed that the significant expansion of gene families in C. dendroides were mainly associated with the major facilitator superfamily (MFS), ankyrin repeats (three copies), and NACHT domains (P < 0.05). MFS transporter proteins allow the uptake of sugars to permit growth on varied substrates [52]. MFS gene families with three duplications were expanded in C. dendrodides, providing this fungus with the versatile ability to utilize sugars for growth. NACHT domains and ankyrin repeats are two kinds of heterokaryon incompatibility protein (HET) related domains. NACHT domains are predicted NTPases, which play a role in programmed cell death (PCD); PCD might be ancient, preceding the radiation of animals and fungi [53,54]. The ankyrin proteins modulate protein-protein interactions among HET proteins [55]. HET and related genes are essential for genetic information transfer in Pyrenochaeta lycopersici [55]. Interestingly, MFS transporters and ankyrin repeats were also expanded in M. perniciosa and Trichoderma species [18,22]. Furthermore, in order to adapt to unfavorable conditions and to host defense mechanisms, pathogens must generate variation [55]. Therefore, this expansion of multiple gene families in C. dendrodides might play a significant role in the pathogenicity of this species, as well as its ability to effectively infect macrofungi, to tolerate varied climates, and to adapt to different host lifestyles [56]. We identified 2.75-fold more contracted gene families in C. dendroides (1085) than have been found in C. protrusum (395) [17], but twice as many were identified in M. perniciosa (2192) [18]. These contracted gene families may serve to decrease the host range of Cladobotryum and Mycogone compared to Trichoderma species [18]. In addition, all of the noticeably contracted gene families were predicated to be P450 members (CYP5150A2, CYP5151A1, and CYP620H1) (P < 0.05). We identified 278 significantly positively selected genes in C. dendroides (P < 0.01), which were mainly related to transport and catabolism, the metabolism of cofactors and vitamins, the immune system, signal transduction, sorting and degradation, and cell growth and death. These genes might play a crucial role in the adaptation of C. dendroides to harsh environments, and in its mycoparasitic lifestyle.
Whole-genome collinearity analyses for C. dendroides and C. protrusum were then performed ( Figure 4). Most of the contigs in C. dendroides were highly conserved syntenic blocks shared with C. protrusum. For example, contig9 and contig0 from C. dendroides corresponded with contig102 and contig83, respectively, from C. protrusum. We identified a few inversions and rearrangements between the homologous regions of the two genomes. For example, we observed a large inversion between contig3 from C. dendroides and contig67 from C. protrusum, as well as rearrangements in contig1 and contig4 from C. dendroides and contig14 and contig28 of C. protrusum. This suggested that a set of fusions or breakages might have arisen among the chromosomes of Cladobotryum species during their long evolutionary history. Notably, we observed an inversion between the middle region of contig9 from C. dendroides (145 genes) and the middle part of contig83 from C. protrusum (150 genes). Functional analyses showed that the 145 C. dendroides genes were mainly associated with GH18 (chitinase), GH35 (beta-galactosidase), serine/threonine-protein kinase, MFS, P450, and transcription factors. Additional genes located in other inversions and rearranged regions were related to amidase, glutathione S-transferase, GH3, GH16, signal peptidases, and zinc-finger proteins (C2H2 type). This genomic diversity might play a crucial role in the morphological formation, lifestyle, adaptation, and resistance of these two species.

Culture Conditions and Genomic DNA Extraction
A single virulent C. dendroides spore was isolated from a specimen of L. edodes infected with symptomatic cobweb disease in Qingyuan, Zhejiang Province (China). This strain was defined as CCMJ 2808, and it was used for whole genome sequencing. We confirmed the identification of this strain as C. dendroides using morphological characters, molecular analysis of ITS DNA sequences, and artificial inoculation [15]. After culturing strain CCMJ 2808 in potato dextrose agar (PDA) at 25 º C in the dark for three days, mycelial mats of pure isolates were harvested and frozen using liquid nitrogen. Genomic DNA extraction was performed using Nuclean plant genomic DNA kits (CWBIO, Beijing, China), following the manufacturer's protocol. We measured DNA integrity, purity, and concentration using 0.6% agarose gel, a Nanodrop 2000 (Thermo Fischer Scientific, Life Technologies, USA), and a Qubit 3.0 (Thermo Fischer Scientific, Life Technologies, USA), respectively. Strain CCMJ 2808 was maintained in the Engineering Research Center of Chinese Ministry of Education for Edible and Medicinal Fungi (ERCCMEEMF, Changchun, China).

Genome Sequencing and Assembly
We constructed 20-kb libraries for C. dendroides, and the genome was sequenced with a PacBio Sequel long-read sequencing platform [17,18,23,57]. To redress mismatch rate and increase sequencing read lengths, we performed next generation sequencing. SMARTdenovo

Culture Conditions and Genomic DNA Extraction
A single virulent C. dendroides spore was isolated from a specimen of L. edodes infected with symptomatic cobweb disease in Qingyuan, Zhejiang Province (China). This strain was defined as CCMJ 2808, and it was used for whole genome sequencing. We confirmed the identification of this strain as C. dendroides using morphological characters, molecular analysis of ITS DNA sequences, and artificial inoculation [15]. After culturing strain CCMJ 2808 in potato dextrose agar (PDA) at 25 ºC in the dark for three days, mycelial mats of pure isolates were harvested and frozen using liquid nitrogen. Genomic DNA extraction was performed using Nuclean plant genomic DNA kits (CWBIO, Beijing, China), following the manufacturer's protocol. We measured DNA integrity, purity, and concentration using 0.6% agarose gel, a Nanodrop 2000 (Thermo Fischer Scientific, Life Technologies, USA), and a Qubit 3.0 (Thermo Fischer Scientific, Life Technologies, USA), respectively. Strain CCMJ 2808 was maintained in the Engineering Research Center of Chinese Ministry of Education for Edible and Medicinal Fungi (ERCCMEEMF, Changchun, China).

Genome Sequencing and Assembly
We constructed 20-kb libraries for C. dendroides, and the genome was sequenced with a PacBio Sequel long-read sequencing platform [17,18,23,57]. To redress mismatch rate and increase sequencing read lengths, we performed next generation sequencing. SMARTdenovo (https://github.com/ruanjue/ smartdenovo) was used for the de novo assembly of the genome. BUSCOs [58] were used to assess the completeness of the genome assembly. The assembled genome of C. dendroides was submitted to the NCBI database and deposited with the accession number WWCI01000000.

Gene Prediction and Genome Annotation
Repeat elements (including SSRs) and non-coding genes were predicted as described previously [17,18]. Coding genes were predicted using a combination of sequence homology and de novo prediction. The homology approach was based on representative reference genomes, including those of Cor. militaris [48], T. reesei [26], C. protrusum [17], F. oxysporum [50], and F. graminearum [51]. De novo predictions were performed using Augustus [59] and GlimmerHMM [60]. Finally, MAKER2 [61] was used to combine the gene models generated by both de novo prediction and protein homology matching. The predicted genes were used for subsequent analyses. The predicted protein-coding genes were functionally annotated against seven databases, with a cutoff e-value of 1 × 10 −5 : NCBI Nr, Nt, KOG [62], SwissProt [63], GO [64], KEGG [65], and Pfam [66]. The presence of mating-type genes in C. dendroides was confirmed by homologous alignment with mating-type genes and flanking sequences from the order Hypocreales. Biological sequences 1.0 was used to draw the gene structures [67].

Orthological, Phylogenetic, Evolutionary, and Whole-Genome Collinearity Analyses
OrthoMCL (v.2.0.9) [72] was used to identify the orthologous gene families based on the CDSs of C. dendroides, C. protrusum, M. perniciosa, and 14 other selected genomes downloaded from the NCBI. The shared single-copy genes were screened and aligned using Clustal omega [73]. To construct a genome-based phylogenetic tree, we used the maximum likelihood (ML) algorithm in RAxML [74], with P. oryzae as the outgroup. Computational Analysis of Gene Family Evolution (CAFE) 3.1 [75] was used to predict the expansion and contraction of the orthologous gene families; those with a P-value < 0.05 were considered to be significant candidates. Positively selected genes were identified using the branch-site model of the CodeML tool in PAML [76]. The whole-genome collinearity of C. dendrodides and C. protrusum was analyzed using the Python version of MCscan, with default parameters (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)).

Conclusions
In this study, we obtained a high-quality genome sequence of C. dendroides, a causal agent of cobweb disease on cultivated edible mushrooms, using the PacBio Sequel platform. From our genomic and comparative analyses, we drew three conclusions. First, compared with M. perniciosa, Cladobotryum (C. dendroides and C. protrusum) lost many repeated sequences over evolutionary time, which resulted in a contraction in the sizes of both genomes. Second, C. dendroides shared highly conserved syntenic blocks with C. protrusum. In addition, within the Hypocreaceae, Cladobotryum was more closely related to Mycogone than to Trichoderma. Third, a significant number of the predicted expanded gene families in C. dendroides were strongly associated with pathogenicity, virulence, resistance, and adaptation, providing insights into the molecular biology, genetics, evolution, and pathogenicity of this species. Overall, our genomic data provide a valuable resource with which to accelerate functional studies of the pathogen C. dendroides, subsequently supporting the development of control strategies as well as breeding programs designed to improve cobweb disease resistance in edible mushrooms.