Whole-Genome Sequence and Comparative Analysis of Trichoderma asperellum ND-1 Reveal Its Unique Enzymatic System for Efficient Biomass Degradation

Abstract: The lignocellulosic enzymes of Trichoderma asperellum have been intensely investigated for the efficient conversion of biomass into high-value chemicals and industrial products. However, the lack of genome data is a considerable hurdle for studies of its hydrolase systems, and the enzymes secreted by the newly isolated T. asperellum ND-1 during lignocellulose degradation are poorly known. Herein, a high-quality genomic sequence of ND-1, obtained with both the Illumina HiSeq 2000 and PacBio single-molecule real-time sequencing platforms, has an assembly size of 35.75 Mb comprising 10,541 predicted genes. Secretome analysis showed that 895 proteins were detected, with 211 proteins associated with carbohydrate-active enzymes (CAZymes) responsible for biomass hydrolysis. Additionally, T. asperellum ND-1, T. atroviride IMI 206040, and T. virens Gv-298 shared 801 orthologues that were not identified in T. reesei QM6a, indicating that ND-1 may play critical roles in biological control. In-depth analysis suggested that, compared with QM6a, the genome of ND-1 encoded a unique enzymatic system, especially hemicellulases and chitinases. Moreover, a comparison of the lignocellulase activities of ND-1 and other fungi showed that ND-1 displayed higher hemicellulase (particularly xylanase) activities and comparable cellulase activities. Our analysis, combined with the whole-genome sequence information, offers a platform for designing advanced T. asperellum ND-1 strains for industrial uses such as bioenergy production.


Introduction
Lignocellulose from agricultural wastes, such as corn stover and sugarcane bagasse, is a widespread, renewable, and readily available resource [1][2][3]. Its components are abundant and complex polymers, including hemicellulose, cellulose, and lignin [4][5][6]. In particular, hemicellulose and cellulose are becoming potential biomass feedstocks for the generation of high-value chemicals or bioenergy products [7][8][9]. Efficient catalytic conversion of lignocellulose depends mainly on the availability of carbohydrate-active enzymes (CAZymes) [10,11], typically glycoside hydrolases (GHs), which degrade lignocellulosic biomass into simple sugars [12], a critical process for the production of second-generation bioethanol [13]. Despite remarkable progress in the enzymatic biodegradation of lignocellulosic materials [14,15], the high production cost of lignocellulases is still a major hurdle that must be overcome before commercial-scale implementation of cellulosic ethanol [16].
In nature, the complete hydrolysis of biomass polysaccharides is usually carried out by synergetic action of various CAZymes (hemicellulases, cellulases, and lignin-modifying yielded 10,541 genes for T. asperellum ND-1 and 12,802 genes for T. asperellum CBS 433.97, respectively, both greater than the estimate for T. asperellum IC-1 (8803) (Table S1).

Features
Mobile elements of T. asperellum ND-1. Repetitive DNA elements and transposable elements (TEs) play critical roles in gene function, evolution, and genome structure in filamentous fungi [52]. The repeated sequences of the T. asperellum ND-1 genome total approximately 591,590 bp and include simple repeats, low-complexity regions, small RNA, interspersed repeats, and satellites (Table S2). They represent 1.66% of the genome. Simple repeats accounted for 78% of the repetitive elements, whereas LINEs were estimated at only 2% (Figure S1, Supplementary Materials). Low-complexity regions and small RNA accounted for 16% and 4%, respectively.
Prediction and analysis of the T. asperellum ND-1 secretome. The secretory proteome is believed to play an important role in determining the capacity of fungi to interact with distinct natural environments [38]. Using the online software SignalP (version 4.1), a total of 895 secreted proteins (8.5% of the protein-coding genes) were predicted and annotated in the T. asperellum ND-1 genome, more than in T. reesei Rut C30 (636 proteins) [53]. Of these, 529 putative secreted proteins were assigned GO terms in the three GO groups, namely, biological process (730), molecular function (561), and cellular component (675) (Figure S2). In the cellular component group, secretory proteins for membrane and membrane part, cell and cell part, organelle and organelle part, extracellular region, and macromolecular complex were highly abundant. Within the biological process group, metabolic process, cellular process, localization, single-organism process, cellular component organization or biogenesis, biological regulation, regulation of biological process, and response to stimulus were highly represented. Under the molecular function category, proteins related to binding, nucleic acid binding transcription factor activity, transporter activity, catalytic activity, electron carrier activity, and antioxidant activity were most abundant.
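The genome-level percentages quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming that 1 Mb equals 10^6 bp and that the rounded assembly size of 35.75 Mb is used; because of that rounding, the repeat fraction may differ from the reported 1.66% in the last digit. The variable names are illustrative only.

```python
# Figures quoted in the text; the assembly size is the rounded 35.75 Mb value.
ASSEMBLY_SIZE_BP = 35_750_000   # assumption: 1 Mb = 1,000,000 bp
REPEAT_BP = 591_590             # total repeated sequence
PREDICTED_GENES = 10_541
SECRETED_PROTEINS = 895


def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


repeat_pct = percent(REPEAT_BP, ASSEMBLY_SIZE_BP)
secretome_pct = percent(SECRETED_PROTEINS, PREDICTED_GENES)

# Reported share of each repeat class (percent of all repetitive elements).
repeat_classes = {
    "simple repeats": 78,
    "low complexity": 16,
    "small RNA": 4,
    "LINEs": 2,
}

print(f"repeats: {repeat_pct:.2f}% of the assembly")
print(f"secretome: {secretome_pct:.1f}% of predicted genes")
print(f"repeat classes sum to {sum(repeat_classes.values())}%")
```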
As for potential pathogenesis-related proteins in the T. asperellum ND-1 secretome, 175 secreted proteins identified within the PHI database were assigned to various categories. Among them, 81 (41%) proteins were associated with reduced virulence, 76 (38%) were of unaffected pathogenicity, 18 (9%) were related to increased virulence (hypervirulence), and 18 (9%) were related to loss of pathogenicity (Figure 2).
The cytochrome P450 (CYP450) monooxygenase superfamily is involved in numerous metabolic processes of filamentous fungi, including those related to secondary metabolites, lifestyle, and pathogenicity [54][55][56]. In T. asperellum ND-1, 163 CYP proteins were confirmed, of which 99 showed homologous counterparts in the PHI database.
A large number of extracellular enzymes secreted by T. asperellum have been recognized, many of which are involved in the degradation of complex biomass carbohydrates in various environments [38,41]. Using the CAZy database and a HMMER (version 3.3) scan against the profiles compiled in dbCAN release 2.0, we identified 67% GHs, 12% auxiliary activities (AAs), 10% carbohydrate esterases (CEs), 6% glycosyl transferases (GTs), 3% polysaccharide lyases (PLs), and 2% carbohydrate binding modules (CBMs) in the T. asperellum ND-1 secretome (Figure 3A). LPMOs are monocopper enzymes widely distributed in nature that catalyze the oxidative cleavage of glycosidic bonds in cellulose, the most abundant polysaccharide available in nature [57,58]. Secretomic analysis revealed that T. asperellum ND-1 encodes two predicted LPMOs, from AA9 and AA11, respectively. Moreover, the AA9 family could have important roles as copper-dependent LPMOs that oxidatively cleave biomass cellulose [38]. Additionally, this work contributes to the broader mapping of enzyme activity in the Auxiliary Activity family (particularly AA9, AA11, and AA14) and provides new biocatalysts for potential applications in biomass modification.
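As an illustration of how an HMMER scan against CAZyme family profiles can be turned into class-level tallies, the sketch below parses rows in HMMER's `--domtblout` column layout (profile as target, protein as query, as produced by hmmscan) and counts hits per CAZy class. The E-value and coverage cutoffs, the example rows, and the protein names are assumptions made for this sketch; the text does not report the thresholds actually used.

```python
import re
from collections import Counter

# Illustrative cutoffs (assumptions, not taken from the text).
MAX_I_EVALUE = 1e-15
MIN_HMM_COVERAGE = 0.35


def cazy_class(family):
    """Map a family label such as 'GH18' or 'AA9' to its class prefix."""
    match = re.match(r"[A-Z]+", family)
    return match.group(0) if match else "unknown"


def parse_domtblout(lines):
    """Return (protein, family) pairs that pass both cutoffs."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        family = fields[0][:-4] if fields[0].endswith(".hmm") else fields[0]
        hmm_length = int(fields[2])          # tlen: length of the profile
        protein = fields[3]                  # query name
        i_evalue = float(fields[12])         # independent domain E-value
        hmm_from, hmm_to = int(fields[15]), int(fields[16])
        coverage = (hmm_to - hmm_from + 1) / hmm_length
        if i_evalue <= MAX_I_EVALUE and coverage >= MIN_HMM_COVERAGE:
            hits.append((protein, family))
    return hits


# Invented rows in domtblout layout, for demonstration only.
example = """\
# target - tlen query - qlen E-value score bias # of c-Evalue i-Evalue ...
GH18.hmm - 296 prot_001 - 400 1e-80 270.0 0.1 1 1 1e-82 1e-80 269.5 0.1 1 296 30 330 28 332 0.98 -
AA9.hmm - 220 prot_002 - 350 1e-40 140.0 0.0 1 1 1e-42 1e-40 139.0 0.0 5 210 20 230 18 232 0.95 -
GH5.hmm - 300 prot_003 - 500 1e-05 20.0 0.0 1 1 1e-06 1e-05 19.0 0.0 100 150 10 60 8 62 0.80 -
GH18.hmm - 296 prot_004 - 380 1e-60 200.0 0.0 1 1 1e-62 1e-60 199.0 0.0 10 290 25 310 23 312 0.97 -
""".splitlines()

hits = parse_domtblout(example)
class_counts = Counter(cazy_class(family) for _, family in hits)
print(hits)
print(dict(class_counts))
```

In this sketch a protein with several passing domains is counted once per domain; a real tally would need a rule for multi-domain proteins.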
The CAZy categories were then analyzed for the families of biomass-hydrolyzing enzymes. The 141 genes encoding glycoside hydrolases were divided into 49 families. Eighteen GH families in the T. asperellum ND-1 secretome contained three or more genes, with GH18 being the largest family (17 genes), followed by GH16 (11 genes), GH55 (8 genes), GH3 (7 genes), GH92 (7 genes), and GH5 (6 genes) (Figure 3F). PL7 (3 genes), PL20 (2 genes), PL1 (1 gene), and PL8 (1 gene) of the PL families were also identified (Figure 3C). Previous studies reported that members of the genus Trichoderma (particularly T. atroviride and T. harzianum) are widely utilized as agricultural biocontrol agents [59,60], and both secondary metabolites and GH18 enzymes (chitinases) could play critical roles in growth and in attacking pathogens [61]. Additionally, out of the seven CE families confirmed, family CE10 contained the most genes (7 genes), followed by CE5 (4 genes), CE8 (3 genes), CE4 (2 genes), CE3 (2 genes), and CE1 (1 gene) (Figure 3E). Carboxylesterase activity has been reported in both the CE10 and CE1 families [62]. Further, the enzymes providing auxiliary functions for the degradation of polysaccharides were represented by four families of carbohydrate binding modules (CBM6, CBM24, CBM42, CBM66), glycosyl transferases (9 families) (Figure 3B), and auxiliary activities (9 families) (Figure 3D). Among them, the numbers of CBM42, AA7, and GT90/GT22/GT15 genes were significantly higher. Moreover, the secretory proteins of T. asperellum ND-1 also contained an assortment of proteases, transferases, and chitinases. These results imply that the T. asperellum ND-1 secretome consists of various functional proteins, with proteolytic and cellulolytic enzymes as major components, which are crucial for hydrolyzing host plant material to obtain essential nutrients and for adapting to various environments.
Phylogenetic relationships. The evolutionary relationships of T. asperellum ND-1 and other selected fungal species were evaluated using the proteomes of these fungi. According to the phylogenetic analysis, all the selected Trichoderma species were grouped into a single primary cluster (Figure 4). The majority of Trichoderma species are commonly applied in agriculture as effective agents for biological control against many phytopathogenic microorganisms; examples are T. asperellum T203 [63], T. harzianum [59], and T. asperellum SKT-1 [64]. The isolated lignocellulolytic fungus T. asperellum ND-1 is evolutionarily close to T. asperellum CBS 433.97 (Figure 4). T. asperellum ND-1 is also close to two other Trichoderma species, T. atroviride IMI 206040 and T. gamsii T6085, suggesting that T. asperellum ND-1 may have biocontrol functions applicable in agriculture. Moreover, T. asperellum ND-1 and T. reesei QM6a (a representative producer of plant biomass-degrading enzymes) were placed in different subclusters (Figure 4). In addition, all the Aspergillus species were grouped into another single clade that is distantly related to T. asperellum ND-1. Grifola frondosa 9006-11 served as the outgroup in the phylogenomic analysis.

Comparative analysis of orthologous genes between different Trichoderma species.
The annotated proteome of T. asperellum ND-1 was further compared with those of three other species, T. reesei QM6a, T. virens Gv-298, and T. atroviride IMI 206040, using orthoMCL [65]. Among the four Trichoderma species, a total of 7073 common clusters were identified (Figure 5). The common clusters accounted for 60-80% of each of the four fungal proteomes, which indicated that the vast majority of the genes were conserved in the Trichoderma group. However, T. asperellum ND-1, T. atroviride IMI 206040, and T. virens Gv-298 contained about 1381, 1618, and 1991 species-specific clusters, respectively, whereas T. reesei QM6a had only 520 unique clusters (Figure 5), consistent with previous studies showing that T. reesei contained fewer exclusive orthologous genes than other sequenced fungi [43,44]. Moreover, T. virens and T. atroviride are probably the most widely investigated biocontrol agents utilized in various agricultural fields [60]. In this study, we found that T. asperellum ND-1, T. atroviride IMI 206040, and T. virens Gv-298 shared 801 orthologues that were not detected in T. reesei QM6a (Figure 5), which may partly account for the biological control function of T. asperellum ND-1 [63,64]. In addition, we identified 7250 orthologous genes shared between T. asperellum ND-1 and T. reesei QM6a (Figure 5), indicating that T. asperellum ND-1 may have strong biomass degradation ability [38,41]. In total, 7889, 8179, and 8957 common clusters were also predicted (Figure 5) when comparing T. reesei QM6a vs. T. atroviride IMI 206040, T. reesei QM6a vs. T. virens Gv-298, and T. atroviride IMI 206040 vs. T. virens Gv-298, respectively.
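The shared and species-specific cluster counts described above correspond to regions of a four-way Venn diagram. A minimal sketch of that bookkeeping with Python sets follows; the cluster identifiers and memberships are invented toy data, and the function names are hypothetical rather than part of orthoMCL.

```python
# Toy data (invented): cluster ID -> set of species with at least one member.
clusters = {
    "OG1": {"ND-1", "QM6a", "Gv-298", "IMI 206040"},
    "OG2": {"ND-1", "Gv-298", "IMI 206040"},
    "OG3": {"ND-1", "QM6a"},
    "OG4": {"ND-1"},
    "OG5": {"QM6a"},
    "OG6": {"ND-1", "Gv-298", "IMI 206040"},
}


def exactly_shared(clusters, species):
    """Clusters whose membership is exactly this species set (one Venn region)."""
    target = set(species)
    return sorted(cid for cid, members in clusters.items() if members == target)


def shared_by(clusters, species):
    """Clusters containing all the given species, whatever else they contain."""
    target = set(species)
    return sorted(cid for cid, members in clusters.items() if target <= members)


all_species = ["ND-1", "QM6a", "Gv-298", "IMI 206040"]
core = exactly_shared(clusters, all_species)
absent_from_qm6a = exactly_shared(clusters, ["ND-1", "Gv-298", "IMI 206040"])
nd1_specific = exactly_shared(clusters, ["ND-1"])
nd1_and_qm6a = shared_by(clusters, ["ND-1", "QM6a"])

print("core:", core)
print("shared by the three but absent from QM6a:", absent_from_qm6a)
print("ND-1 specific:", nd1_specific)
print("ND-1 and QM6a (any region containing both):", nd1_and_qm6a)
```

The distinction between the two functions matters when reading pairwise totals: a pairwise count that includes every region containing both species is necessarily at least as large as the four-way core.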
All five Trichoderma species also produced a large series of enzymes, the majority of which are known to be associated with chitin degradation. For example, the GH18 family, containing various enzymes linked to chitin hydrolysis [61], was remarkably expanded in the genomes of T. virens Gv-298, T. atroviride IMI 206040, T. asperellum CBS 433.97, and T. asperellum ND-1 (32, 27, 27, and 26 genes, respectively) relative to T. reesei QM6a (19 genes) (Figure 6). Fungal cell walls contain substantial amounts of chitin, and chitinolytic enzymes are therefore an indispensable part of mycoparasitic attack [48]. Moreover, hydrolases from GH75 (chitosanases) and GH18 (endo-β-N-acetylglucosaminidases) also play a critical role in the degradation of fungal cell walls [43,61]. The most abundant glycoside hydrolase family in the T. asperellum ND-1 genome was GH18, comprising 26 chitinolytic enzymes (Figure 6), which is consistent with a previous study [73]. Therefore, T. asperellum ND-1 may serve as an effective and environmentally friendly biocontrol agent, similar to T. virens Gv-298 and T. atroviride IMI 206040, against numerous phytopathogenic microorganisms [43]. In addition, the amylolytic enzymes identified in T. asperellum ND-1 comprised six α-amylases (GH13) and three glucoamylases (GH15) (Figure 6). Consequently, T. asperellum ND-1 could have great potential for producing value-added biomolecules such as maltose from α-glucans like starch.
Comparative analysis of lignocellulolytic enzyme activities. Efficient catalytic degradation of lignocellulose depends on the synergistic action of various enzymes that hydrolyze lignocellulosic biomass into fermentable sugars [17,19,20]. The present results show that T. asperellum ND-1 and other filamentous fungi displayed different time-course profiles of lignocellulase activities (Figure 7). Among hemicellulases, xylanase activity in the T. asperellum ND-1 extract rose sharply over time and reached its highest level of 173.25 ± 3.14 U/mL after 5 days of cultivation (Figure 7A). Xylanases produced by T. asperellum ND-1 were found to efficiently hydrolyze xylan into xylobiose as the major product [11]. For P. decumbens, the corresponding activity increased slowly to reach its maximum value (80.83 ± 4.55 U/mL) on day 3 (Figure 7A). Xylanase activity produced by T. reesei increased gradually until the end of the cultivation, reaching a maximum of 68.84 ± 2.98 U/mL on day 6, while it was much lower in the G. frondosa, F. solani, A. tamarii, A. niger ND-1, and M. thermophila extracts (Figure 7A). β-xylosidase activity of T. asperellum ND-1 increased over time and peaked (0.54 ± 0.08 U/mL) on day 3 (Figure 7B). For A. niger ND-1 and P. decumbens, the activity reached a maximum (0.43 ± 0.003 U/mL and 0.31 ± 0.02 U/mL, respectively) after 4 days (Figure 7B) and then remained relatively stable. In contrast, the enzyme activity of T. reesei, M. thermophila, and F. solani fluctuated at a low level between 0.046 ± 0.003 and 0.12 ± 0.02 U/mL during the cultivation time (Figure 7B). Minimal β-xylosidase activity was observed in the G. frondosa and A. tamarii extracts (Figure 7B). In addition, side-chain-degrading enzymes are reported to play a crucial role in the degradation of biomass [17]. α-L-arabinofuranosidase activities were detected in all selected fungi and reached their maximum level on day 5. In T. asperellum ND-1, the enzyme activity reached a maximum value of 1.58 ± 0.06 U/mL (Figure 7C), which was 5.5-fold, 4-fold, and 1.3-fold higher than that of M. thermophila, T. reesei, and P. decumbens, respectively. Taken together, these results reveal that T. asperellum ND-1 produced various hemicellulases with significantly higher activities, which can be utilized in various fields, particularly in the production of valuable biomolecules (prebiotics, xylooligosaccharides).
In terms of cellulolytic enzymes, endoglucanase activity of T. asperellum ND-1 reached its highest level (6.33 ± 0.11 U/mL) on day 5, comparable with that of T. reesei (8.78 ± 0.03 U/mL) (Figure 7D). The corresponding enzyme activity in P. decumbens and M. thermophila increased rapidly to the maximum value (17.71 ± 0.03 U/mL and 11.20 ± 0.84 U/mL, respectively) on day 5 (Figure 7D), but it remained very low in A. niger ND-1, A. tamarii, F. solani, and G. frondosa throughout the cultivation period. In addition, the two other major cellulase activities produced by T. asperellum ND-1, cellobiohydrolase (0.23 ± 0.01 U/mL) and β-glucosidase (0.087 ± 0.01 U/mL), were much higher than those of several other selected fungi (Figure 7E,F), including A. niger ND-1, F. solani, and G. frondosa. The maximum exoglucanase activity (1.01 ± 0.06 U/mL) in the P. decumbens extract appeared on day 4 (Figure 7E), and the β-glucosidase secreted by A. tamarii reached a maximum activity of 0.14 ± 0.01 U/mL on day 6 (Figure 7F). T. reesei is well known for its involvement in the degradation of complex biomass carbohydrates and is used as a main industrial producer of cellulases [34,35]. These lignocellulase activity profiles indicated that T. asperellum ND-1 generated an enzyme mixture with a cellulose hydrolysis capability similar to that of T. reesei. Moreover, genome sequencing and analysis of the biomass-degrading fungus T. asperellum ND-1 were performed to pave the way for designing enhanced T. asperellum ND-1 strains toward a more rapid conversion of lignocellulose into soluble sugars for bioenergy production.

Conclusions
The whole-genome sequence and lignocellulase activities of the newly isolated T. asperellum ND-1 were determined for the first time. A high-quality genomic sequence of ND-1 has an assembly size of 35.75 Mb comprising 10,541 predicted genes. Secretome analysis showed that 895 proteins were detected, with 211 proteins associated with CAZymes, possessing remarkable potential for utilization in biomass decomposition. Comparative genome analysis suggested that the genome of ND-1 contained many genes involved in biological control, which would be useful for investigating Trichoderma species as biocontrol agents. Furthermore, the genome of ND-1 encoded a higher diversity of polysaccharide-degrading enzymes, especially those associated with hemicellulose deconstruction. Compared with T. reesei (CICC 40932), ND-1 produced higher hemicellulase (particularly xylanase) activities and similar cellulase activities. These results will help us understand the unique hydrolytic enzyme system of T. asperellum ND-1 and promote the investigation of more efficient and cost-effective enzymes for the degradation of lignocellulosic biomass.
All fungi were precultured on potato dextrose agar (PDA) at 28 °C for 4 days. In total, 5 g of unpretreated, dry corn stover (milled to 2 cm), 0.2 g tryptone, 0.2 g yeast extract, and 100 mL of a basal salt solution (0.5 g/L MgSO4·7H2O, 1 g/L K2HPO4·3H2O, 2 g/L NH4Cl, 0.5 g/L KCl, 0.02 g/L FeSO4·7H2O, 0.03 g/L CaCl2, and 0.02 g/L ZnSO4·7H2O) were added to 250 mL Erlenmeyer flasks, and the mixtures were sterilized at 121 °C for 30 min and used as the inducing medium. For lignocellulase (cellulase, hemicellulase) activity analysis, suspensions of T. asperellum ND-1 and other fungi were inoculated into sterile inducing medium at approximately 5 × 10^8 spores and cultured with agitation at 200 rpm and 28 °C for 6 days. Culture samples were taken every day and centrifuged at 12,000× g for 10 min to collect the supernatant. The supernatant containing the crude enzymes was then used directly for enzyme assays. Experiments were performed in triplicate.
Genomic DNA preparation and quality assessment. The T. asperellum ND-1 (MH496612) strain cultured on PDA medium at 28 °C for 2 days was inoculated into potato dextrose broth (PDB) and incubated at 28 °C and 200 rpm for 3 days. Fungal biomass (3.5 g) from 500 mL of culture was collected by centrifugation at 4000 rpm for 15 min and stored in liquid nitrogen. Genomic DNA of T. asperellum ND-1 was isolated using the Omega Fungal DNA Kit D3390-02, according to the fungal DNA extraction protocol. The purity and concentration of genomic DNA were quantified with the NanoDrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA) and TBS-380 (Turner BioSystems Inc., Sunnyvale, CA, USA) methods, respectively.
Sequencing and assembly. The T. asperellum ND-1 genome was sequenced using a combination of PacBio Sequel single-molecule real-time (SMRT) [42] and Illumina sequencing platforms (MajorBio Co., Shanghai, China). DNA libraries containing ~400-bp and 10-kb inserts were prepared. The 400-bp library was constructed according to the NEXTflex™ Rapid DNA-Seq Kit protocol, including fragmentation of genomic DNA, end repair, adaptor ligation, and PCR amplification. The 400-bp library was used for paired-end Illumina sequencing (2 × 150 bp) on an Illumina HiSeq 2000 and assembled with SOAPdenovo version 2.04 (http://soap.genomics.org.cn/, accessed on 5 April 2022). The 10-kb library was prepared using PacBio's standard methods. DNA fragments were purified, end-repaired, and ligated with SMRTbell sequencing adapters following the manufacturer's instructions (Pacific Biosciences, Menlo Park, CA, USA). The 10-kb library was evaluated with a 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA) and sequenced by SMRT, and the sequencing results (filtered reads: 4.92 G, sequencing depth: 123×) were assembled into contigs with CANU (version 1.7) using default parameters [74]. Furthermore, error correction of the PacBio assembly was performed using the Illumina reads, and gaps were filled with GAPCLOSER version 1.12 [75]. Finally, quality assessment of the genome assembly was carried out using CEGMA (version 2.5) and BUSCO (version 3.0) software.
Phylogenetic analyses. The evolutionary relationships of T. asperellum ND-1 and other selected fungal species were evaluated using the proteomes of these fungi. Protein sequence alignment was performed using ClustalW software [85], and the phylogenetic tree was constructed in MEGA version 7.0 [86] with the UPGMA method. In addition to T. asperellum ND-1, the proteomes of other selected fungi available from the DOE Joint Genome Institute [87] were included: T. asperellum CBS 433.97 (GenBank assembly accession GCA_003025105.1), T.
Comparison analysis of orthologous gene families. In order to identify the orthologous genes of the four Trichoderma species (T. asperellum ND-1, T. reesei QM6a, T. atroviride IMI 206040, T. virens Gv-298), we used orthoMCL on pairwise similarity matches to define orthologous groups across the Trichoderma genomes [65,88]. The genes that were defined as orthologs from clusters of paralogs were subtracted, then the rest of species-specific gene sets of the cluster group expanded because of the most recent common ancestor (MRCA) of the four Trichoderma genomes [43].
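The UPGMA method named in the phylogenetic analysis repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The sketch below is a generic, minimal illustration of that procedure starting from a ready-made distance matrix; it is not MEGA's implementation, and the taxon labels and distances are invented.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels
    dist:  symmetric matrix (list of lists) of pairwise distances
    Returns a nested-tuple tree and the node height of each merge.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    heights = []
    while len(clusters) > 1:
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        heights.append(d_ab / 2)  # ultrametric node height
        # Distance from the merged cluster to every remaining cluster.
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            merged[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop entries involving the two merged clusters, then add the new one.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, value in merged.items():
            d[(c, next_id)] = value  # c is always smaller than next_id
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, heights


# Invented example: four hypothetical taxa with ultrametric distances.
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, heights = upgma(names, dist)
print(tree)
print(heights)
```

In practice the distances would be derived from the aligned protein sequences; here they are chosen so that the expected grouping (A with B, then C, then D) is easy to verify by hand.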
Enzyme assays. β-glucosidase, β-xylosidase, α-L-arabinofuranosidase, and cellobiohydrolase activities were evaluated [91]. A reaction mixture containing 100 µL crude enzyme and 100 µL of 5 mM pNPG, pNPX, pNPAf, or pNPC substrate was incubated in 50 mM sodium acetate buffer (pH 5.0) at 50 °C for 10 min. The reaction was terminated using 100 µL sodium carbonate (1.0 M). A mixture without enzymes was used as the control. The amount of liberated pNP was quantified by determining the absorbance at 405 nm, and one unit was defined as the amount of enzyme required to release 1 µmol pNP per min.
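The unit definition above (1 µmol pNP released per min) can be turned into a small calculation. The sketch below assumes a linear pNP standard curve whose slope relates blank-corrected A405 to µmol pNP in the assay; the slope value, the absorbance readings, and the function name are invented for illustration, while the 10 min incubation and 100 µL enzyme volume come from the assay description.

```python
from statistics import mean, stdev

REACTION_MIN = 10.0   # incubation time stated in the assay description
ENZYME_ML = 0.100     # 100 µL crude enzyme per reaction

# Assumption: slope of a pNP standard curve, in absorbance units per µmol pNP
# present in the assay. The value is invented for this sketch.
A405_PER_UMOL = 4.0


def pnp_activity(a405_sample, a405_blank, dilution=1.0):
    """Return activity in U/mL of crude enzyme (1 U = 1 µmol pNP per min)."""
    umol_pnp = (a405_sample - a405_blank) / A405_PER_UMOL
    return umol_pnp / (REACTION_MIN * ENZYME_ML) * dilution


# Invented triplicate readings with a common enzyme-free blank.
replicates = [pnp_activity(a405, 0.05) for a405 in (0.85, 0.89, 0.81)]
activity_mean = mean(replicates)
activity_sd = stdev(replicates)
print(f"{activity_mean:.2f} ± {activity_sd:.2f} U/mL")
```

Reporting the mean with the sample standard deviation mirrors the "value ± error" form used for the triplicate measurements in the results.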
Endoglucanase and endoxylanase activities were assayed using the 3,5-dinitrosalicylic acid (DNS) method [92], with 1% (w/v) CMC-Na and BWX as substrates, respectively. The reaction system (150 µL of 1.0% (w/v) substrate with 50 µL crude enzyme) was incubated in 50 mM sodium acetate buffer (pH 5.0) for 10 min at 50 °C, and the reaction was stopped by adding 50 µL of 1 M NaOH. A mixture without enzymes was used as the control. After boiling at 100 °C for 5 min, the amount of reducing sugar was determined from the absorbance at 540 nm, with one activity unit defined as the amount of enzyme (endoglucanase or endoxylanase) that liberated 1 µmol of reducing sugar (glucose or xylose equivalents) per min from CMC-Na or BWX under the assay conditions. The respective standard curves were obtained with 0.1-0.7 mg/mL glucose and xylose. All enzyme activity assays were performed in triplicate.
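Converting a DNS reading into activity units involves two steps: fitting the standard curve and converting the reducing-sugar mass to µmol per min per mL of enzyme. The following is a minimal sketch under stated assumptions: the standard-curve absorbances are invented and perfectly linear, the standards are taken to refer to the sugar concentration in the 200 µL reaction mixture, and the function names are hypothetical.

```python
GLUCOSE_MW = 180.16   # g/mol; use 150.13 for xylose equivalents
REACTION_MIN = 10.0   # incubation time from the assay description
ENZYME_ML = 0.050     # 50 µL crude enzyme
REACTION_ML = 0.200   # 150 µL substrate + 50 µL enzyme


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented glucose standards within the 0.1-0.7 mg/mL range (A540 = 1.2 * conc).
std_conc = [0.1, 0.3, 0.5, 0.7]
std_a540 = [0.12, 0.36, 0.60, 0.84]
slope, intercept = fit_line(std_conc, std_a540)


def dns_activity(a540_sample, a540_blank, mw=GLUCOSE_MW):
    """Return activity in U/mL of crude enzyme (1 U = 1 µmol sugar per min)."""
    mg_per_ml = (a540_sample - a540_blank) / slope   # blank-corrected
    umol = mg_per_ml * REACTION_ML / mw * 1000.0     # mg -> µmol
    return umol / (REACTION_MIN * ENZYME_ML)


activity = dns_activity(0.65, 0.05)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, activity={activity:.3f} U/mL")
```

Subtracting the enzyme-free control before dividing by the slope cancels the intercept, so only the slope of the standard curve enters the calculation.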

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/catal12040437/s1, Figure S1. Percentage distribution of different types of repetitive elements in the genome of T. asperellum ND-1; Figure S2. Functional annotation of the T. asperellum ND-1 secretome showing top 20 hits of different category. MF, molecular function; CC, cellular component; BP, biological process; Figure S3. Statistical analysis of CAZymes of T. asperellum ND-1 genome. Different colors of the pie chart represent different CAZy classifications, and their areas represent the proportion of genes in the classification; Table S1. Genome features of T. asperellum ND-1, T. asperellum IC-1 and T. asperellum CBS 433.97; Table S2. Repetitive elements identified in the T. asperellum ND-1 genome; Table S3. Glycoside hydrolases (GHs) identified in the genome of T. atroviride IMI 206040, T. virens Gv-298, T. reesei QM6a and T. asperellum ND-1.

Data Availability Statement:
The whole genome sequence data reported in this paper has been deposited in the Genome Warehouse [93] at the National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, under accession number GWHAOQT00000000, which is publicly accessible at https://bigd.big.ac.cn/gwh, accessed on 5 April 2022.

Conflicts of Interest:
The authors declare no competing interest.