Hybrid De Novo Whole-Genome Assembly, Annotation, and Identification of Secondary Metabolite Gene Clusters in the Ex-Type Strain of Chrysosporium keratinophilum

Chrysosporium is a polyphyletic genus belonging (mostly) to different families of the order Onygenales (Eurotiomycetes, Ascomycota). Certain species, such as Chrysosporium keratinophilum, are pathogenic for animals, including humans, but are also a source of proteolytic enzymes (mainly keratinases) potentially useful in bioremediation. However, only a few studies have been published regarding their bioactive compounds, whose production is largely unpredictable due to the absence of high-quality genomic sequences. In this study, the genome of the ex-type strain of Chrysosporium keratinophilum, CBS 104.62, was sequenced and assembled using a hybrid method. The results showed a high-quality genome of 25.4 Mbp in size spread across 25 contigs, with an N50 of 2.0 Mb, 34,824 coding sequences, 8002 protein sequences, 166 tRNAs, and 24 rRNAs. The functional annotation of the predicted proteins was performed using InterProScan, and the KEGG pathway mapping using BlastKOALA. The results identified a total of 3529 protein families and 856 superfamilies, which were classified into six levels and 23 KEGG categories. Subsequently, using DIAMOND, we identified 83 pathogen–host interactions (PHI) and 421 carbohydrate-active enzymes (CAZymes). Finally, the analysis using AntiSMASH showed that this strain has a total of 27 biosynthetic gene clusters (BGCs), suggesting that it has a great potential to produce a wide variety of secondary metabolites. This genomic information provides new knowledge that allows for a deeper understanding of the biology of C. keratinophilum, and offers valuable new information for further investigations of Chrysosporium species and the order Onygenales.


Introduction
The genus Chrysosporium was proposed by Corda to introduce a single species, Chrysosporium corii [1]. However, Saccardo [2] synonymized that genus with Sporotrichum and, consequently, the former fell into oblivion. More than fifty years later, Hughes [3] reintroduced Chrysosporium for C. corii and Chrysosporium pannorum (syn. Geomyces pannorum), restricting the generic concept of Sporotrichum to those species with wide hyphae, dark conidia and the absence of intercalary conidia. In a revision carried out by Carmichael [4], Blastomyces, Emmonsia, Geomyces, Myceliophthora, and Zymonema were synonymized with Chrysosporium, leaving that genus morphologically highly one-sided. Dominik [5] expanded Carmichael's concept of Chrysosporium a little more, including Sepedonium, a genus that, like Sporotrichum, was later demonstrated to have phylogenetic links with basidiomycetous fungi [6]. Van Oorschot [6], in her monograph on Chrysosporium and allied genera, restored order to the genus, disaggregating Emmonsia, Geomyces, Myceliophthora, and Zymonema from it, and introducing the genus Trichosporiella, based on colony features, conidial morphology, temperature resistance, and keratin degradation, among other phenotypic characters. Van Oorschot [6] also remarked the connection between the species
the low-quality reads were removed by Trimmomatic v0.39 [24], using the ILLUMINACLIP, SLIDINGWINDOW and MINLEN options. Hybrid assemblies with short and long reads were performed using the SPAdes v3.13.0 [25] and MaSuRCA v4.0.5 [26] software, with default settings. All assemblies obtained were evaluated using QUAST v5.1.0rc.1 [27] and BUSCO v5.3.1 [28] to assess the completeness of the genome. Based on the QUAST and BUSCO results, only one assembly was considered for downstream analysis. The best draft assembly was polished using Illumina short-read data with POLCA (from MaSuRCA v4.0.5).

Genome Information and Comparison with the Closest Species
We present the first hybrid de novo genome sequencing of the ex-type strain of Chrysosporium keratinophilum using short- and long-read technologies. The QUAST analysis showed that the best assembly was obtained with MaSuRCA. The resulting polished genome consisted of 25.4 Mbp, spread across 25 contigs with an N50 of 2.0 Mb and a BUSCO score of 96.0%. This last result is comparable to those of the C. immitis RS, C. posadasii C735 delta SOWgp and A. verrucosus IHEM 4434 genome assemblies (96.8%, 96.8% and 96.3%, respectively), indicating that our assembly is of comparable completeness (Figure 1). A total of 166 tRNAs, with a length ranging from 67 bp to 129 bp, and 24 rRNAs were predicted in the genome.
Assembly statistics of Chrysosporium keratinophilum and its closest phylogenetically related species are given in Table 2.
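For readers less familiar with the N50 statistic reported above, the following minimal sketch shows how it is commonly computed from a list of contig lengths: the contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions, not values from this assembly, and tools such as QUAST may differ in implementation details.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp).

    Illustrative helper (hypothetical name): contigs are visited from
    longest to shortest, and the length of the first contig at which the
    cumulative sum reaches at least half of the total assembly size is
    returned.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length


# Toy example with made-up contig lengths (not the real assembly).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

In the toy example the total length is 290, so the running total passes the halfway point (145) at the second-longest contig.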

Average Nucleotide Identity
Based on whole-genome alignment, the average nucleotide identity (ANI) values among the analyzed members of the Onygenales ranged from 72.48% to 96.14%. These results confirmed that Chrysosporium keratinophilum belongs to the family Onygenaceae, showing a close relationship with Aphanoascus verrucosus IHEM 4434, with an ANI value of 81.19%, whereas it is more distantly related to the other species of the Onygenales analyzed (Amauroascus niger UAMH 3544, Brunneospora queenslandica CBS 280.77, Coccidioides immitis RS, Coccidioides posadasii C735 delta SOWgp, Ophidiomyces ophiodiicola CBS 122913 and Uncinocarpus reesii UAMH 1704) (Figure 2). Based on the ANI results, we accept Chrysosporium keratinophilum CBS 104.62 as belonging to the genus Aphanoascus, as has been previously proposed [22].

In the present study, the highest ANI value obtained was between Coccidioides immitis RS and Coccidioides posadasii C735 delta SOWgp (ANI value = 96.1%), and the lowest values were shown by Ophidiomyces ophiodiicola CBS 122913 when it was compared with the other analyzed strains (ANI values ≤ 72.7%). Brunneospora queenslandica CBS 280.77 and Amauroascus niger UAMH 3544 showed an ANI value of 83.43%. Our results suggest that this ANI value is too high for two strains belonging to different genera, because previous studies have obtained ANI values close to 79% for fungi of the same genus [38]. Therefore, an exhaustive taxonomic review of the Onygenales is recommended in order to look for possible errors in the taxonomic assignment or for limitations of ANI in discriminating between the genera of that order.
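As a rough illustration of what an ANI value summarizes, the sketch below averages the percent identity of genome fragments that align well enough to a second genome. This is a simplified, hypothetical formulation: the class name, the function name and the two cut-off parameters are assumptions made for illustration, and the tool and settings actually used in this study may differ.

```python
from dataclasses import dataclass


@dataclass
class FragmentHit:
    """Best alignment of one query fragment against the other genome."""
    identity: float          # percent identity of the alignment (0-100)
    aligned_fraction: float  # fraction of the fragment that aligned (0-1)


def average_nucleotide_identity(hits, min_identity=30.0,
                                min_aligned_fraction=0.7):
    """Mean percent identity over fragments passing both cut-offs.

    The default cut-offs are illustrative assumptions, not the settings
    used in the study. Returns None when no fragment qualifies.
    """
    kept = [h.identity for h in hits
            if h.identity >= min_identity
            and h.aligned_fraction >= min_aligned_fraction]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Toy input: the third fragment is excluded because only half of it aligned.
toy_hits = [FragmentHit(82.0, 0.9), FragmentHit(80.0, 0.8),
            FragmentHit(95.0, 0.5)]
print(average_nucleotide_identity(toy_hits))
```

Because poorly aligned fragments are discarded before averaging, an ANI value describes only the shared, alignable portion of two genomes, which is one reason it may have limited power to separate distantly related genera.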

Prediction of Genes from the Assembled Genome
Gene annotation, using the BRAKER2 pipeline, resulted in 34,824 coding sequences (CDS) and 8002 protein sequences. Functional annotation, using InterProScan with the Pfam and SUPERFAMILY options, produced a total of 3529 protein families and 856 superfamilies (Supplementary Tables S1-S4). Annotations based on the Pfam and SUPERFAMILY databases assigned functions to 76.6% and 62.7% of the predicted proteins, respectively. The most prevalent Pfam domains included WD domain G-beta repeat, protein kinase domain, reverse transcriptase (RNA-dependent DNA polymerase), Ankyrin repeats (three copies), and mitochondrial carrier protein. In the case of superfamilies, the analysis showed that the five most prevalent were: P-loop containing nucleoside triphosphate hydrolases, protein kinase-like (PK-like), ribonuclease H-like, NAD(P)-binding Rossmann-fold domains and DNA/RNA polymerases.
Previous studies have shown a fluctuating number of gene families in some members of the Onygenales [39][40][41]. The genome analysis of C. keratinophilum showed a reduction in the number or an absence of gene families related to the degradation of the plant cell wall, such as cellulase (glycosyl hydrolase family 5), fungal cellulose-binding domain and glycosyl hydrolase family 61. At the same time, the analysis showed a higher number of genes from families related to the degradation of animal material, such as the protein tyrosine kinase and subtilase families. Regarding other protein families, we would like to highlight the high frequency of the LysM domain, with a total of 36 genes, the largest number reported within the order Onygenales [41][42][43]. The LysM domain is linked to various functions, such as improving fungal-fungal interactions and chitin and keratin degradation, the latter being fundamental in a keratinophilic fungus.
In recent years, various keratinases have been identified in both bacteria and fungi. In bacteria, these enzymes have been reported in some species of Bacillus, Pseudomonas and Stenotrophomonas, among others, and in fungi in genera such as Microsporum, Onygena and Trichophyton [44]. Keratinases are distributed across various families belonging to the serine proteases and metalloproteases [45]. In the current genome, various families of peptidases that were previously associated with keratin degradation [45][46][47] were identified, such as peptidase family S41, dipeptidyl peptidase IV (DPP IV), peptidase family M16, peptidase family M28, the fungalysin metallopeptidase (M36), peptidase family M3 and peptidase family M48, which is consistent with the fact that C. keratinophilum has been described as a keratinophilic species. Accordingly, keratin degradation by C. keratinophilum could proceed along the following pathway: first, the keratin disulfide bonds are broken by bisulfite reductases; then, the endoproteases of the M36 family act, providing small peptides; next, exoproteases of the M28 family and dipeptidyl peptidase IV (DPP IV) hydrolyze the peptides into oligopeptides; and, finally, enzymes of the peptidase M3 family hydrolyze these oligopeptides.
The BlastKOALA tool is a KEGG web service that annotates genomes in order to understand the biological functions and interactions of genes [48]. KEGG pathway mapping assigned the annotated genes to six levels and distributed them across 22 KEGG categories. Of the six levels, the most prevalent was metabolism (2921, 39.5%), followed by human diseases (1811, 24.5%) and genetic information processing (808, 10.9%). The annotated genes were then grouped by functional category. The five most prevalent were: genetic information processing (1597, 43%), carbohydrate metabolism (317, 9%), cellular processes (226, 6%), protein families: signaling and cellular processes (178, 5%) and amino acid metabolism (176, 5%) (Figure 3).
The carbohydrate-active enzymes (CAZymes) are a broad class related to the breaking down of complex carbohydrates and polysaccharides into small molecules [49]. Analysis of CAZymes showed that the genome of C. keratinophilum encodes a large, varied set of CAZyme families that resulted in the identification of 421 genes (Table 3 and Supplementary Tables S5 and S6), a lower value compared to human pathogenic species of the same order, such as Blastomyces dermatitidis, C. immitis or C. posadasii [50]. Based on the results obtained with DIAMOND, glycoside hydrolases (GHs) were the most prevalent family, with 61 enzymes. The next most prevalent were glycosyltransferases (GTs) with 41, the third was the carbohydrate-binding module (CBM) group with 22, followed by the families of auxiliary activities (AAs), carbohydrate esterases (CEs) and polysaccharide lyases (PLs) with 15, 7 and 2, respectively. The glycosyltransferase enzymes catalyze the formation of glycosidic bonds by the transfer of sugar moieties from activated donor molecules to specific acceptor molecules [51]. In the present study, the most prevalent glycosyltransferases were GT2, GT1 and GT22. The GT2 family was the group with the highest number of genes (18), and is one of the groups of enzymes that synthesize chitin [52]. Previous investigations have shown that the GT2 family is the most common in most fungal species [51]. The GT1 family includes sterol glucosyltransferase, which catalyzes the synthesis of sterol glycosides and membrane-bound lipids, and is widespread in some algae, fungi, bacteria, and animals [53]. Finally, the GT22 family is involved in α-1,2-mannosyltransferase activity, which was previously found to contribute to virulence in fungi [54].
Glycoside hydrolases (GHs) hydrolyze the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety [55]. In the present study, GH18, GH47 and GH125 were the most prevalent families of this class. Chitinases from family GH18 have been reported previously in fungi and plants. In fungi, these enzymes are related to nutrition, growth, mycoparasitism and virulence [49]. The GH47 and GH125 families have α-mannosidase activity, although no information on the function of these enzymes has been available until now [56].
For the other families, the most prevalent were CBM50, CE8, AA3 and PL3_2. The CBM50 family is associated with chitinase catalytic domains, implicated in binding chitin [57]. The CE8 family has pectin methylesterase activity, which is essential for the metabolism of pectin [58]. The AA3 family has FAD-dependent (GMC) oxidoreductase activity, relating to the formation of metabolites such as hydroquinones or H2O2, required by other AA enzymes [59]. Finally, the PL3_2 family is a pectin lyase that catalyzes the scission of pectin [58].
Previous studies of pathogenic fungal genera related to Chrysosporium, such as Blastomyces, Coccidioides, Histoplasma and Sporothrix, have shown the absence of CAZymes of the PL class [50,60]. In the genome of strain CBS 104.62, PL3_2 and PL1_7, both related to pectin degradation, were identified [58,61]. Moreover, another CAZyme related to pectin hydrolysis, GH28, which is also absent from the Coccidioides genome, was identified. The presence of these families in the analyzed genome could be due to the fact that C. keratinophilum is a saprophytic fungus with soil as its main ecological niche.
The PHI-base is a database that contains verified information on virulence-related genes that affect the outcome of pathogen-host interactions [36]. Based on the PHI analysis, we identified a total of 83 putative PHI genes in the C. keratinophilum genome (1.06% of total genes) (Figure 4 and Supplementary Table S7), with Aspergillus fumigatus being the species with the highest number of homologous genes (30 genes), followed by Fusarium graminearum (20 genes), Magnaporthe oryzae (15 genes) and other fungal species (20 genes). Among these genes, the reduced virulence group showed the highest number (35 genes), followed by unaffected pathogenicity with 21, and mixed with 21 genes. The high number of genes in the reduced virulence and unaffected pathogenicity groups may indicate that C. keratinophilum CBS 104.62 has a weak pathogenic ability. However, various studies consider some strains of Chrysosporium spp. to be opportunistic pathogens, causing skin and nail diseases, and deeper infections in immunocompromised patients [62].
The secondary metabolite analysis, using AntiSMASH, classified 27 BGCs into nine types, which, according to the principle of genomic organization implicated in transcriptional regulation, could have a role in the production of secondary metabolites by this strain [63] (Figure 5): six non-ribosomal peptide synthetase (NRPS) clusters, six type 1 polyketide synthase (T1PKS) clusters, three terpene clusters, one indole cluster, two type 3 polyketide synthase (T3PKS) clusters, one lasso peptide, one non-ribosomal peptide synthetase (NRPS)-like cluster, one beta-lactone and six hybrid clusters. Of these, one BGC was identified as okaramine B, with 85% similarity, and three others as UNII-YC2Q1O94PT (ACR toxin I), clavaric acid and dimethyl coprogen, with 100% similarity. UNII-YC2Q1O94PT (ACR toxin I) is associated with the leaf spot disease caused on rough lemon by Alternaria alternata [64]; clavaric acid is an antitumor isoprenoid compound that acts as an inhibitor of Ras farnesyl transferase, previously described in Hypholoma sublateritium [65]; and, finally, dimethyl coprogen is well known as a siderophore used by Alternaria alternata to chelate iron under iron-depleted conditions [66].

A previous study determined transcriptionally active genes, as well as their enzymatic products after classifying the biosynthetic genes, using the fungal genomes of anaerobic fungi from the class Neocallimastigomycetes under laboratory conditions [67]. Although our results suggest the probable production of secondary metabolites associated with C. keratinophilum, more studies are needed to prove the production of these compounds by this strain.
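The breakdown of biosynthetic gene clusters reported in the secondary metabolite analysis can be checked with a few lines of code. The sketch below simply tallies the per-type counts stated in the text and confirms that they add up to the 27 clusters and nine types reported; the dictionary keys and the function name are illustrative labels, not AntiSMASH output fields.

```python
# Per-type BGC counts as stated in the text (labels are illustrative).
bgc_counts = {
    "NRPS": 6,
    "T1PKS": 6,
    "terpene": 3,
    "indole": 1,
    "T3PKS": 2,
    "lasso peptide": 1,
    "NRPS-like": 1,
    "beta-lactone": 1,
    "hybrid": 6,
}


def summarize_clusters(counts):
    """Return (total number of clusters, number of cluster types)."""
    return sum(counts.values()), len(counts)


total, n_types = summarize_clusters(bgc_counts)
print(f"{total} clusters across {n_types} types")
```

A tally of this kind is a quick consistency check when cluster counts are transcribed from a figure or a supplementary table into the running text.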

Conclusions
In this study, we present the only genome of Chrysosporium keratinophilum that has been sequenced and published using a hybrid assembly strategy to date. The genome annotation and the genomic analysis provide new knowledge that will allow us to deepen our understanding of the biology of Chrysosporium keratinophilum, and gather new information for further investigations within the Onygenales. In addition, its genetic capability to produce secondary metabolites was successfully determined by the elucidation of the biosynthetic gene pathways, suggesting that the studied strain has a great biosynthetic potential to produce compounds of biotechnological interest. However, future analysis will be necessary to corroborate the in vitro production of such molecules.