Hybrid Assembly Improves Genome Quality and Completeness of Trametes villosa CCMB561 and Reveals a Huge Potential for Lignocellulose Breakdown

Trametes villosa is a wood-decaying fungus with great potential to be used in the bioconversion of agro-industrial residues and to obtain high-value-added products, such as biofuels. Nonetheless, the lack of high-quality genomic data hampers studies investigating genetic mechanisms and metabolic pathways in T. villosa, hindering its application in industry. Herein, applying a hybrid assembly pipeline using short reads (Illumina HiSeq) and long reads (Oxford Nanopore MinION), we obtained a high-quality genome for the T. villosa CCMB561 and investigated its genetic potential for lignocellulose breakdown. The new genome possesses 143 contigs, N50 of 1,009,271 bp, a total length of 46,748,415 bp, 14,540 protein-coding genes, 22 secondary metabolite gene clusters, and 426 genes encoding Carbohydrate-Active enzymes. Our CAZome annotation and comparative genomic analyses of nine Trametes spp. genomes revealed T. villosa CCMB561 as the species with the highest number of genes encoding lignin-modifying enzymes and a wide array of genes encoding proteins for the breakdown of cellulose, hemicellulose, and pectin. These results bring to light the potential of this isolate to be applied in the bioconversion of lignocellulose and will support future studies on the expression, regulation, and evolution of genes, proteins, and metabolic pathways regarding the bioconversion of lignocellulosic residues.


Introduction
Lignocellulosic biomass (LB), including agro-industrial residues, municipal solid wastes, and forest litter is one of the largest stocks of carbon and energy on Earth [1]. These are a sustainable, renewable, and abundant source of raw material, which can be (bio-) converted into high value-added products, such as bio-based chemicals, polymers, and fuels [1,2]. LB is composed mainly of cellulose (40-50%), hemicellulose (25-30%), and lignin (15-25%), as well as other compounds in lower proportions, such as pectin, proteins, extractables, and ash [3]. Although lignocellulosic materials are inexpensive, abundant, and considered a valuable feedstock for biorefineries (bio-based industry), a major challenge a size of 46.748 Mb, 14,540 proteins-encoding genes, and 22 secondary metabolite gene clusters (SMGCs). In addition, a wide array of genes encoding lignocellulose-modifying enzymes was identified, revealing a huge potential of the isolate T. villosa CCMB561 to act in the degradation of all lignocellulose polymers, making it a high-potential strain to be industrially used.

Fungal Strain and Extraction of Genomic DNA
The fungal strain T. villosa CCMB561 was isolated from field-collected basidiomata growing on a decaying tree branch (unidentified angiosperm) in the semiarid region of northeastern Brazil (Serra das Candeias, Quijingue, Bahia, Brazil; Lat: 39 • 04 30 W and Long: 10 • 55 16 S). Dehydrated basidiomata were deposited in the HUEFS herbarium (HUEFS108280), and the culture derived from the basidiomata tissue was preserved in sterile distilled water and deposited in the Culture Collection of Microorganisms of Bahia (CCMB, Feira de Santana, Bahia, Brazil) under access code CCMB561. The isolate was grown on Malt Extract Agar (2% Malt Extract, 2% dextrose and 2% Agar) at 28 ± 2 • C for seven days. Then, the total DNA was extracted using the ZymoBIOMICS TM DNA Miniprep Kit (Zymo Research, Irvine, CA, USA). The DNA sample was analyzed qualitatively by agarose gel electrophoresis 1%, and quantitatively by a Nanodrop 1000 ND spectrophotometer (Thermo Scientific, Waltham, MA, USA) and Qubit fluorometer (Invitrogen, Waltham, MA, USA). For species-level certification of the extracted DNA, the internal transcribed region (ITS1-5.8S-ITS2) was amplified and sequenced using the ITS 6 (5 -TTCCCGCTTCACTCGCAGT-3 ) and ITS 8 (5 -AGTCGTAACAAGGTTTCCGTAGGTG-3 ) primers [33]. Amplification reaction, purification, and sequencing of the amplicons were carried out according to the methods described by Tomé et al. 2019 [34].

MinION Library Preparation and Sequencing
We fragmented genomic DNA [8 µg] to approximately 8 Kbp using the Covaris g-TUBE (Covaris, Woburn, MA, USA). After fragmentation, 1200 ng of DNA was purified using the AMPureXP beads (Beckman Coulter Inc., Brea, CA, USA), not adopting the DNA repair step. The sequencing library was prepared using the Ligation Sequencing Kit 1D (SQK-LSK108), the Native Barcoding Kit 1D (EXP-NBD103), and the Library Loading Bead Kit (EXP-LLB001), following the recommendations of Oxford Nanopore Technologies. The library was sequenced for 48 h in the flowcell FLO-MIN106 (ID: FAK07371) using the MinKNOW program with the real-time base calling function enabled. Porechop software (https:// github.com/rrwick/Porechop, accessed on 15 January 2020) was used to demultiplex the libraries and remove the adapters ( Figure S1).

Illumina Library Preparation and Sequencing
The sequencing library was prepared from genomic DNA [1 µg] using the NEBNext Fast DNA Fragmentation and Library Preparation Kit (New England Biolabs, Ipswich, MA, USA) following the manufacturer's recommendations. The library quality was assessed using the Agilent 2100 Bioanalyzer equipment, and the paired-end DNA sequencing was carried out in the Illumina HiSeq 2500 platform. After sequencing, the raw reads quality was assessed using the FastQC v0.11.5 software (https://github.com/s-andrews/FastQC, accessed on 15 January 2020). Adapter sequences and bases with low quality (Phred score <20) were trimmed using BBDuk software (https://sourceforge.net/projects/bbmap/, accessed on 15 January 2020) ( Figure S1). Genome features such as size, heterozygosity, and repetitiveness were assessed prior to genome assembly using Jellyfish and GenomeScope 2.0 [35,36].

Genome Annotation and Gene Ontology Analyses
Genome annotation was performed using the MAKER2 v2.31.9 software [46][47][48] and the following ab initio gene prediction software: SNAP [49], Augustus [50], and Gene-Mark [51] ( Figure S2). Low-and high-complexity repetitive genomic regions were masked using the RepeatMasker [52], the Repbase database, and the RepeatRunner software [48]. The identification of gene regions and the prediction of proteins were performed through the alignment of ESTs (Expressed Sequence Tags) and proteins of the genus Trametes (obtained from NCBI until September 2020) using the BLAST algorithm and the Exonerate program. After the annotation by evidence, the software SNAP, Augustus, and Gene-Mark were used in the further annotation steps. The annotation metrics, such as the number of genes, exons, and introns, were obtained using GAG software (Genome Annotation Generator) [53]. The assignment of gene function was carried out with the support of tools provided by MAKER [47], the makeblastdb application, the UniProt database (uniprot_sprot.fasta), and the blastp algorithm. Gene ontology analyses were carried out in the web server GoFeat (Gene Ontology Functional Enrichment Annotation Tool), with the support of the following databases: Uniprot, NCBI protein, KEGG, InterPro, Pfam, EMBL, and Gene Ontology [54]. The prediction of transfer RNA (tRNA) was performed using the software tRNAscan-SE [55]. Secondary metabolite gene clusters (SMGCs) were predicted using the online tool antiSMASH 6.0.1 [56].

Repeat Annotation
Transposable elements (TE) were identified de novo using the RepeatModeler package (repeatmasker.org/RepeatModeler, accessed on 15 September 2020) with the support of RepeatMasker [52], RECON [57], RepeatScout [58], TRF [59], and RMBlast. The obtained TE library (consensus sequences) was filtered by removing all sequences <100 bp and those showing significant hits with proteins not identified as TE using blastx and the UniProt database [7]. The classification and number of occurrences of TE were assessed using the RepeatMasker tool.

Comparative Genomics and Phylogenomics
The genome of the fungus T. villosa CCMB561 was compared with the following seven genomes publicly available at the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/, accessed on 15 March 2020): T. coccinea (GCA_002092935.1), T. sanguinea (GCA_008973685.1), T. cinnabarina (GCA_000765035.1), T. hirsuta (GCA_001302255.2), T. polyzona (GCA_001939255.1), T. pubescens (GCA_001895945.1), and T. versicolor (GCF_000271585.1); and two publicly available at the Joint Genome Institute (JGI, https://genome.jgi.doe.gov/portal/, accessed on 15 March 2020): T. ljubarskyi (CIRM1659) and T. elegans (CIRM1663, synonym: Artolenzites elegans). The completeness and the main metrics of the retrieved genomes were assessed using BUSCO and QUAST, respectively. In order to standardize and improve the accuracy of the comparative analyses, all genomes were reannotated, and the transposable elements were identified using the methods described in Sections 2.5 and 2.6, respectively. Phylogenomic analyses were carried out using the script BUSCO_phylogenomics (https://github.com/jamiemcg/BUSCO_ phylogenomics, accessed on 15 October 2020), in which single-copy ortholog genes were aligned using MUSCLE. The alignment was trimmed using trimAl [60], and the estimation of the best-fit model was performed using ModelFinder. The maximum likelihood phylogenetic tree was generated using the IQ-TREE software, adopting the supermatrix method [61]. The consensus tree was constructed considering 1000 bootstrap replicates and visualized in FigTree v1.4.3. The species Polyporus brumalis (Polyporales, Basidiomycota-GCA_001792895.1) was used in the phylogenomic analyses as the outgroup. Network and correlation analyses were performed based on the genome length, number of genes, TE coverage, GC content, and number of tRNAs, using the PAST 4.04 software. The Bray-Curtis dissimilarity index (edge cutoff: 50%) was adopted for the network analyses while the correlation analyses were conducted using Pearson's correlation.

CAZy Annotation and Potential for Lignocellulose Degradation
The Carbohydrate Active enzymes (CAZymes) of the Trametes species were functionally annotated using the dbCAN2 web server (http://bcb.unl.edu/dbCAN2/, accessed on 15 November 2020) with the integration of the following automated annotation tools/databases: (i) HMMER, (ii) DIAMOND, and (iii) Hotpep [62]. The dbCAN outputs were manually curated. Proteins identified by two or three tools (HMMER, DIAMOND, and HOTPEP) were considered correctly classified while those identified by only one tool were subjected to blastp analyses (protein-protein BLAST) to confirm the dbCAN classification. After annotation and manual curation, proteins related to cellulose, hemicellulose, lignin, and pectin degradation were counted and heat maps were generated using the pheatmap package (1.0.12) in R software (R 4.0.3).

Illumina and MinION Sequencing
After adapter removal and quality trimming, 48,347,940 short reads were obtained through the Illumina sequencing (read length about 150 pb), corresponding to approximately 14 Gb (Table 1). Using these data and the GenomeScope software, the genome of the isolate T. villosa CCMB561 was estimated to have 44,895,640 bp in length, a homozygosity rate of 97.4%, and heterozygosity of 2.6% (Supplementary Data S1). The sequencing using the Oxford Nanopore platform generated 1,043,247 long reads, totaling 8.1 Gb. The long reads had an average size of 4.47 kb, N50 of 5.1 kb, and the longest read with 21,613 bp (Table 1). According to the estimated genome size of the CCMB561 strain, coverages of 129× and 93× were obtained through sequencing on the Illumina and Oxford Nanopore platforms, respectively. In previous studies, genomes with lengths similar to the estimated length of T. villosa CCMB561 were assembled with good contiguity and completeness using even smaller coverages than those obtained in this study. For example, the genome of the fungus Leptosphaeria maculans Nz-T4, which has a size of 43.42 Mb, was assembled in 288 contigs, using 56× long reads and 98× short reads [63]. In another study, the genome of the alga Chlorella variabilis, with a size of 46.67 Mb, was assembled into 302 contigs, using 56× long reads and 78× short reads [64]. Thus, the sequencing coverage obtained in this study was considered sufficient for the high-quality assembly of the genome of T. villosa CCMB561.

Genome Assembly and Assessment
In this study, eight assembly strategies were tested: one using exclusively short reads (Illumina HiSeq), four using only long reads (Oxford Nanopore MinION), and three based on the hybrid assembly, combining short and long reads ( Table 2). The best result was obtained using the assembly workflow MaSuRCa-Purge_Dups (Hybrid assembly) ( Table 2 and Figure 1a), which used MaSuRCa software to generate a primary assembly and the Purge_Dups program to identify and remove haplotypic duplications. The use of this workflow resulted in a genome with 143 contigs, a total length of 46,748,415 bp, the largest contig with 9,749,168 bp, and N50 of 1,009,271 bp ( Table 2). This genome had the smallest difference according to the genome size estimated by GenomeScope2 (difference of 1,852,775 bp) and presented the best completeness index through the BUSCO analysis. The assembled genome presented 99.1% of the orthologous genes searched, of which 96.7% are single copies, 2.4% are duplicated, and 0.1% are fragmented. Finally, only 0.8% of the genes were not found (Table 3).     The assemblies in which the Purge_Dups software was not used showed a high degree of gene duplication and/or genome size greater than expected for the Trametes genus (~44 Mb), except when the CANU-smartdenovo pipeline was used (Tables 2 and 3). Previous studies have already demonstrated that, in order to facilitate the genome sequencing and assembly from dikaryotic fungi, dedikaryotization and thus the obtainment of a monokaryotic isolate is an essential step [27]. Hence, the high rate of gene duplication and genome size larger than expected may be related to the sequencing of the dikaryotic mycelium. Nevertheless, the use of Purge_Dups software [43] allowed us to remove duplications and increase genome contiguity without the need to obtain a monokaryotic isolate, which is laborious and time-consuming.
A high degree of fragmentation was detected when the genome was assembled using only Illumina sequencing data (Table 2). Conversely, when only long reads were used, the genome showed low fragmentation, but smaller completeness (Tables 2 and 3). This result is due to Illumina HiSeq sequencing generating reads with sizes between 100 and 250 bp, which leads to greater fragmentation of the genome. On the other hand, the MinION approach, despite generating long reads, which could exceed 2 Mb, presents a higher error rate (1D sequencing), which may imply a lower completeness rate [65].
Compared to the preliminary and draft genome of T. villosa CCMB561 deposited at the NCBI, herein, using the MaSuRCa-Purge_Dups assembly workflow (Figure 1a), we obtained a much higher-quality genome, with a significant reduction in the number of contigs (from 10,327 to 143) and improvement in the metrics N50, L50, and size of the largest contig ( Figure 1b). Furthermore, based on the BUSCO analysis (Figure 1c), the newly assembled genome has better completeness indices and fewer duplicated, fragmented, and missing genes. Similarly, Maggiori et al. (2021) demonstrated that by using a hybrid assembly approach (HiSeq + MinION), it was possible to obtain a greater number of genomes from metagenomic samples, with longer contigs, more coding sequences, higher completeness, less contamination, and higher N50 [66].

Genome Annotation and Gene Ontology (GO) Analysis
The annotation results showed that T. villosa CCMB561 has 14,540 protein-coding genes, 86,516 exons, and 71,976 introns, which correspond to 66% of the genome (Table S1). Each gene had, on average, six exons and five introns with sizes of 265 bp and 112 bp, respectively, which agrees with other species of the genus (Table S1). In total, 274 transposable elements (TEs) were identified in the CCMB561 isolate, corresponding to 7.13% of the genome. The number of transport RNA (tRNAs) identified was 334.
According to gene ontology (GO) analysis, 8169 proteins of T. villosa CCMB561 were associated with GO terms, corresponding to 56.18% of the predicted sequences. Functionally annotated proteins were classified into three categories: (i) molecular function, (ii) cellular component, and (iii) biological process ( Figure 2). In the "cellular component" category ( Figure 2c), most proteins were associated with the terms "integral component of membrane" (2395), "nucleus" (638), and "cytoplasm" (319), which are terms related to the cell anatomy.

Secondary Metabolite Gene Clusters (SMGCs)
Fungi possess many gene clusters responsible for producing Secondary Metabolites (SMs), which have important ecological functions [70]. SMs are not essential for the normal growth of the organism but may act as defense compounds (e.g., against fungi and bacteria) and signaling molecules, being fundamental for ecological interactions and survival [71]. Because of their bioactive pharmacological properties, these molecules have been widely studied and tested in the healthcare industries to be used as antibiotics, antifungals, anti-inflammatory, and anticancer agents [71,72].
In T. villosa CCMB561, we identified 22 SM biosynthesis clusters, which comprise one non-ribosomal peptide synthetase (NRPS) cluster, one NRPS-like/betalactone cluster, three Type I Polyketide synthase (T1PKS) clusters, six NRPS-like clusters, and eleven Terpene clusters (Figure 3a and Supplementary Data S2). The NRPS-type cluster (Region 33.1/scf7180000000809) was identified with 100% of similarity with the basidioferrin compound cluster from Gelatoporia subvermispora (BGC0001527.1) (Polyporales, Basidiomycota). Basidioferrin is widely distributed in basidiomycetes and is part of the siderophore synthetases family, which are enzymes responsible for the biosynthesis of siderophores and iron metabolism [73]. Previous studies have reported that in most bacteria and fungi (pathogenic and non-pathogenic), the acquisition of high-affinity iron is mediated by siderophore-dependent pathways [74]. In the categories "biological processes" (Figure 2b) and "molecular function" (Figure 2d), terms related to the degradation of lignocellulosic biomass, such as "carbohydrate metabolic process" (GO:0005975), "hydrolase activity" (GO:0016787) "oxidoreductase activity" (GO:0016491), and "heme-binding" (GO:0020037) are among the most representative. This is an expected result since species from the genus Trametes act in the degradation of the main components of lignocellulosic biomass through the production and secretion of a large set of hydrolytic and oxidative enzymes [7,9,12,67]. Additionally, 205 proteins associated with "transmembrane transporter activity" (GO:0022857) have been identified (Figure 2d). This term could indicate enzymes that are secreted and have extracellular activity, such as lignocellulose-degrading enzymes [6].
In the genome of the CCMB561 strain, 147 cytochrome P450 (CYP) genes and one NADPH-cytochrome P450 reductase (CPR) gene were identified. Similar results were found by Sun et al. (2018), who identified in T. versicolor the presence of only one NADPH genecytochrome P450 reductase and multiple sequences belonging to genes of the cytochrome P450 family [68]. These genes are widely known for their importance in the degradation of lignin and organic pollutants (aromatic and xenobiotic compounds) [68]. Genes from the CYP family also play a role in the metabolism and adaptation of fungi to specific ecological niches [6,69]. Complementarily, Liu et al. (2019) described that the fungus Trametes trogii S0301 has 158 CYPs that may be related to a variety of metabolic functions [6].

Secondary Metabolite Gene Clusters (SMGCs)
Fungi possess many gene clusters responsible for producing Secondary Metabolites (SMs), which have important ecological functions [70]. SMs are not essential for the normal growth of the organism but may act as defense compounds (e.g., against fungi and bacteria) and signaling molecules, being fundamental for ecological interactions and survival [71]. Because of their bioactive pharmacological properties, these molecules have been widely studied and tested in the healthcare industries to be used as antibiotics, antifungals, antiinflammatory, and anticancer agents [71,72].
In T. villosa CCMB561, we identified 22 SM biosynthesis clusters, which comprise one non-ribosomal peptide synthetase (NRPS) cluster, one NRPS-like/betalactone cluster, three Type I Polyketide synthase (T1PKS) clusters, six NRPS-like clusters, and eleven Terpene clusters (Figure 3a and Supplementary Data S2). The NRPS-type cluster (Region 33.1/scf7180000000809) was identified with 100% of similarity with the basidioferrin compound cluster from Gelatoporia subvermispora (BGC0001527.1) (Polyporales, Basidiomycota). Basidioferrin is widely distributed in basidiomycetes and is part of the siderophore synthetases family, which are enzymes responsible for the biosynthesis of siderophores and iron metabolism [73]. Previous studies have reported that in most bacteria and fungi (pathogenic and non-pathogenic), the acquisition of high-affinity iron is mediated by siderophore-dependent pathways [74].  Most of the secondary metabolites biosynthesis clusters identified in T. villosa CCMB561 were assigned to the terpene type (11 clusters) (Figure 3a). Terpenoids have multiple biological activities and comprise sesquiterpenoids, diterpenoids, and triterpenoids. Their activities in inducing the apoptosis of human tumor cells, antibacterial, antimetastasis, and anti-HIV activity have already been demonstrated [72]. Using the MIBiG database (Minimum Information about a Biosynthetic Gene cluster), whole or partial genes of some known terpene clusters were identified. Our results showed that fragments of the clusters squalestatin S1 from Aspergillus sp. (BGC0001839.1), geosmin from Streptomyces coelicolor (BGC0001181.1), and koraiol from Fusarium fujikuroi (BGC0001642.1) were identified in the CCMB561 genome.
It is important to highlight that most clusters (21 clusters), despite being classified according to the type, had no similarity in the MIBiG database with the biosynthesis cluster of known compounds. Therefore, the CCMB561 isolate has the potential to produce a variety of secondary metabolites; however, these metabolites have not yet been identified or reported in the literature. Most of the secondary metabolites biosynthesis clusters identified in T. villosa CCMB561 were assigned to the terpene type (11 clusters) (Figure 3a). Terpenoids have multiple biological activities and comprise sesquiterpenoids, diterpenoids, and triterpenoids. Their activities in inducing the apoptosis of human tumor cells, antibacterial, antimetastasis, and anti-HIV activity have already been demonstrated [72]. Using the MIBiG database (Minimum Information about a Biosynthetic Gene cluster), whole or partial genes of some known terpene clusters were identified. Our results showed that fragments of the clusters squalestatin S1 from Aspergillus sp. (BGC0001839.1), geosmin from Streptomyces coelicolor (BGC0001181.1), and koraiol from Fusarium fujikuroi (BGC0001642.1) were identified in the CCMB561 genome.

CAZome Annotation
It is important to highlight that most clusters (21 clusters), despite being classified according to the type, had no similarity in the MIBiG database with the biosynthesis cluster of known compounds. Therefore, the CCMB561 isolate has the potential to produce a variety of secondary metabolites; however, these metabolites have not yet been identified or reported in the literature.
The Carbohydrate-binding module (CBM) domain was detected in 33 genes encoding enzymes belonging to classes AA, GH, and CE (Supplementary Data S3). Most of the CBM domains (18 in total) found in the annotated genes belong to the Carbohydrate-binding module family 1 (CBM1) and were found in enzyme-encoding genes acting in the cellulose (AA9, GH3, GH5_5, GH6, and GH131 families) and hemicellulose breakdown (CE1, CE15, GH5_7, GH10, and GH74 families). CBM domains promote an enzyme association with the substrate and increase enzymatic hydrolysis and degradation of polysaccharides [76,77]. In addition, four dye-decolorizing peroxidases (DyPs) with the potential to oxidize lignin-like compounds and other phenolic polymers were identified in the genome of the isolate T. villosa CCMB561 [10].
Similar results of our CAZome annotation have been reported for the species T. versicolor, which has 424 genes encoding Carbohydrate-Active enzymes (CAZymes) [18]. T. versicolor is closely related to T. villosa and is one of the most common and widespread species of white-rot and basidiomata-forming fungi, showing great potential to act in lignocellulose breakdown [7,18].

Comparative Genomics and Phylogenomics of the Genus Trametes
The Maximum Likelihood phylogenetic matrix (RaxML) included 11 sequences with 798,158 amino acids from 1346 concatenate proteins of each genome. From these amino acids, 146,797 had distinct patterns, 152,144 were parsimoniously informative, 128,942 were parsimoniously non-informative, and 517,072 were constant characters. According to the tree topology (Figure 4a), T. villosa, T. versicolor, and T. pubescens were grouped in the same clade with a 100% bootstrap while the monophyletic clade formed by T. coccinea, T. sanguinea, and T. cinnabarina is the most phylogenetically distant from Trametes villosa CCMB561. These clustering patterns have already been reported in a previous phylogenetic reconstruction of Polyporales fungi, based on LSU and ITS ribosomal DNA markers [78]. In the Complex network analysis (Figure 4b), as well as in the phylogenomic analysis, T. villosa, T. versicolor, and T. pubescens were grouped together, suggesting these species have similar structural genomic characteristics, reinforcing the phylogenetic proximity.
Genome size and the number of genes were the only features that had a positive and statistically significant correlation (p < 0.05) in the genomes of Trametes spp. (Figure 4c). Interestingly, TE coverage was not significantly correlated with genome size. Similar results were described by Castanera et al. (2017), who described for some genera of the Agaricomycotina subphylum, including Trametes species, a high correlation between genome size and gene content while the correlation between the number of TEs and genome size was unclear [82]. Overall, all the analyzed Trametes genomes displayed variability in relation to the analyzed features, with no clear pattern concerning the genome size, number of genes, number and coverage of TEs, and number of tRNAs. This variability may be related to the environment, selective pressures, and ecological and evolutionary factors to which each species is subjected to [84]. The fungal genome size can be impacted by different factors and is directly related to the fungal lifestyle, as well as adaptive and ecological needs [79][80][81]. Therefore, herein we evaluated the genome size of Trametes spp. and features that may impact the genome length, such as the number of genes, number and coverage of TEs, number of tRNAs, and GC content. Among the species analyzed, the genome sizes ranged from 32.758 Mb (T. coccinea) to 46.748 Mb (T. villosa CCMB561), and the number of genes from 10,725 (T. elegans) to 14,540 (T. villosa CCMB561) (Figure 4a). The average genome length for Basidiomycetes is 46.48 Mb, ranging from 9.82 (Wallemia sebi) to 130.65 Mb (Dendrothele bispora) [79]. The genomes of the Trametes spp. evaluated in this study had sizes within this range, and T. villosa CCMB561 was the species with a genome size closest to the mean described to the phylum Basidiomycota. This species also presented the highest number of genes, followed by the closely related T. versicolor and T. pubescens.
The coverage of TEs in the genomes ranged from 2.22% (T. coccinea) to 10.61% (T. hirsuta) (Figure 4a). Most of the classified TEs were LTR retrotransposons (long terminal repeats) belonging to Copia and/or Gypsy types (Table 4). Moreover, retrotransposons SINEs and LINEs, as well as DNA transposons and Helitrons were also identified. TEs can act in the modulation of genomes, through recombination and transposition, which can lead to chromosomal rearrangements and alter gene expression [80]. In Basidiomycota, the genome content corresponding to TEs can vary from 0.1 to 45.2% (average around 11%) and, usually, most of the TEs are LTR retrotransposons (Gypsy and Copia) [82]. A possible reason for the high copy number of TEs class I in Basidiomycota, including the studied species, is its transposition mechanism, which uses an intermediate RNA, resulting in the increased proliferative success of the TEs [82]. The GC content had no significant variation among the analyzed species and ranged from 55.7% (T. cinnabarina) to 59.5% (T. villosa) (Figure 4a). On the other hand, the number of tRNAs varied from 270 (T. versicolor) to 396 (T. polyzona), with a difference of more than 100 tRNAs among the species in this genus (Figure 4a). Transfer RNAs play a central role in protein biosynthesis and are involved in many biological functions in eukaryotic organisms, such as in the regulation of gene expression [83]. Therefore, this difference/expansion in the number of tRNAs could be related to specific evolutionary mechanisms and the lifestyle of each species [84].
Genome size and the number of genes were the only features that had a positive and statistically significant correlation (p < 0.05) in the genomes of Trametes spp. (Figure 4c). Interestingly, TE coverage was not significantly correlated with genome size. Similar results were described by Castanera et al. (2017), who described for some genera of the Agaricomycotina subphylum, including Trametes species, a high correlation between genome size and gene content while the correlation between the number of TEs and genome size was unclear [82].
Overall, all the analyzed Trametes genomes displayed variability in relation to the analyzed features, with no clear pattern concerning the genome size, number of genes, number and coverage of TEs, and number of tRNAs. This variability may be related to the environment, selective pressures, and ecological and evolutionary factors to which each species is subjected to [84].

Potential for Lignocellulose Breakdown by Trametes spp.
Exploring the presence, abundance, and composition of oxidoreductases and carbohydrate-active enzymes (CAZymes) in wood-decay fungi provides important information on its nutritional preferences and adaptations, the metabolic pathways used, as well as the expansion and evolution of gene families related to lignocellulose breakdown. This information is important for an understanding of fungal biology and further application in the biotechnology industry. In this study, ten genomes of different Trametes species were evaluated for the presence of 40 gene families encoding enzymes that act in the breakdown of lignin, hemicellulose, cellulose, and pectin ( Figure 5).
The lignin degradation is mainly performed by white-rot fungi and involves a series of enzymes classified as an Auxiliary Activity family (AA) [7,10,27,85]. In Trametes spp. genomes, many AAs were identified and have a central role in lignin modification (Figure 5a). The studied genomes harbor from 7 to 9 AA1-encoding genes, and in T. villosa, eight genes were recognized as AA1 (Figure 5a). The AA1 family encompasses multicopper oxidases (Laccases) that act directly on a wide range of aromatic and phenolic compounds, such as lignin [75,85,86]. These enzymes do not require cofactors for their activity, so they are of great interest for industrial applications [86].
The greatest number of lignin-modifying enzymes encoding genes was identified in families AA2 (9-25 genes) and AA3 (14-30 genes) (Figure 5a). The AA2 family includes Lignin Peroxidase (LiP), Manganese Peroxidase (MnP), and Versatile Peroxidase (VP). These enzymes are classified as class II peroxidases (PODs), since they use hydrogen peroxide (H 2 O 2 ) as a cofactor for lignin breakdown [7,75,85,86]. The results displayed in Figure 5a also demonstrated an expansion in the number of AA2-encoding genes in T. villosa (25 copies), T. hirsuta (18 copies), T. pubescens (20 copies), T. ljubarskyi (16 copies), T. polyzona (17 copies), and T. versicolor (22 copies). This result could indicate a possible metabolic adaptation of these species to initially use lignin as the main carbon source for their growth. It is worth noting that T. villosa was the species with the largest number of AA2-encoding genes.
hydrate-active enzymes (CAZymes) in wood-decay fungi provides important information on its nutritional preferences and adaptations, the metabolic pathways used, as well as the expansion and evolution of gene families related to lignocellulose breakdown. This information is important for an understanding of fungal biology and further application in the biotechnology industry. In this study, ten genomes of different Trametes species were evaluated for the presence of 40 gene families encoding enzymes that act in the breakdown of lignin, hemicellulose, cellulose, and pectin ( Figure 5).  In the metabolic pathway of lignin degradation, enzymes belonging to the AA3 and AA5 families play a fundamental role in the generation of H 2 O 2 and activation of PODs [75,86]. AA3 are flavoproteins containing a flavin-adenine dinucleotide (FAD)binding domain and are generally recognized as glucose-methanol-choline (GMC) oxidoreductases, which include Pyranose 2-oxidase, Alcohol oxidase, Glucose 1-oxidase, and Aryl-alcohol oxidase [75,86]. In our analyses, AA3 was the family with the highest number of identified genes, ranging from 14 to 30 copies per species. Additionally, T. villosa was the species with the highest number of AA3-encoding genes, possessing 30 copies.
The AA5 family is composed of copper radical oxidases and includes two described subfamilies: AA5_1 (glyoxal oxidase) and AA5_2 (galactose oxidases) [75,86]. The Trametes spp. genomes harbor 6 to 9 AA5-encoding genes, and in T. villosa, seven AA5 genes were identified. One AA6-encoding gene was conserved in all Trametes species. This enzymatic family includes 1,4-benzoquinone reductases, which are responsible for the intracellular cleavage of aromatic compounds and for the protection of fungal cells from reactive quinone compounds [75].
Cellulose is a linear polymer formed by residues of D-glucose linked by β-1,4-glycosidic bonds. This is the most abundant and the least complex polysaccharide of the plant cell wall and is degraded by three classes of enzymes, β-1,4-endoglucanases, cellobiohydrolases, and β-glucosidases [3,18,77]. Figure 5c exhibits the families of enzymes related to cellulose hydrolysis. The GH3 family (β-glycosidases) had the highest number of genes, ranging from 7 to 10 copies per genome. In general, families of β-1,4-endoglucanases (families GH5_5, GH5_22, GH9, GH12, GH45, GH131), exoglucanases/cellobiohydrolases (families GH6 and GH7), and β-glucosidases (GH1 and GH3 families) were identified in all fungi. The genes encoding the GH9 and GH45 families were absent in T. ljubarskyi. Likewise, in T. pubescens and T. versicolor, the GH45 gene family was not found (Figure 5c). Finally, 15 to 19 AA9encoding genes (LPMOs) were identified in the Trametes spp. genomes (Figure 5a). AA9 genes are classified as copper-dependent lytic polysaccharide monooxygenases (LPMOs) that act in the oxidative depolymerization of crystalline cellulose [75]. Figure 5d displays the genes encoding enzymes with activity in pectin degradation. As evidenced, the GH28 family, which includes part of the glycosidic hydrolases, especially endo-and exo-polygalacturonases and endo-and exo-rhamnogalacturonases, had the largest number of genes, ranging from 5 to 11 copies per genome. Moreover, other enzymatic families were observed to be involved in pectin hydrolysis, but in a smaller proportion, such as GH78 (two to three genes), GH88 (one gene), GH105 (one to two genes), and GH53 (one gene), which include α-rhamnosidases, unsaturated glucuronyl hydrolases, unsaturated rhamnogalacturone hydrolases, and β-endogalactanases, respectively. In all genomes, pectinmethylesterases (CE8) were identified, and in most of them, enzymes belonging to the CE12 family (pectin acetylesterase) were not found. Finally, it was observed that all Trametes spp. genomes contain one gene encoding PL4 (rhamnogalacturonan endolyase), except for T. polyzona, which has three genes encoding this enzymatic family.
From the exploratory analysis of CAZymes, a set of genes encoding cellulases, hemicellulases, pectinases, and lignin-modifying enzymes were identified in the genomes of the Trametes species. These enzymes act synergistically, contributing to the breakdown of all polymers that make up the plant cell wall [3,7,18,27,77]. Among the analyzed genomes, T. villosa CCMB561 was the species with the highest number of genes encoding ligninmodifying enzymes (91 genes) and pectinases (21 genes) and the second with the highest number of genes encoding cellulases (31 genes) and hemicellulases (45 genes). It is also worth mentioning that T. villosa CCMB561 harbors all 40 searched genes related to the lignocellulose breakdown. Therefore, this isolate has great potential to be applied in the bioconversion of lignocellulosic biomass in the industry.

Conclusions
In this study, we demonstrated that through the hybrid assembly using short (Illumina HiSeq) and long reads (Oxford Nanopore MinION), and the assembly workflow MaSuRCA-Purge_dups, a high-quality genome for the isolate T. villosa CCMB561 was obtained. The contiguity and completeness of the genome assembled and presented in this study significantly increased when compared to the preliminary and draft version of this isolate previously sequenced using only short reads (Illumina HiSeq). The accurate annotation of the new genome, the comparative genomic analyses, associated with the functional annotation of the CAZymes-encoding genes demonstrated the genetic potential of the isolate T. villosa CCMB561 to act in the degradation of all components of lignocellulose. Among the analyzed genomes, T. villosa was the species with the highest number of genes encoding lignin-modifying enzymes. Lignin is the most recalcitrant polymer of the plant cell wall and, thus, its removal is considered the most limiting step for the conversion of lignocellulosic biomass. Taken together, data generated in this study provide support for future studies using genomics, transcriptomics, and proteomics tools. Still, they contribute to the understanding of the complex mechanisms involved in the expression, regulation, and evolution of genes and proteins associated with lignocellulose breakdown.
Supplementary Materials: The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/jof8020142/s1, Figure S1: Workflow used for the genome sequencing and assembly of Trametes villosa CCMB561. Figure S2: Gene and protein annotation pipeline used for all Trametes species. Table S1: Gene annotation statistics generated by GAG software for all genomes from the Trametes genus analyzed in our study. Supplementary Data S1: GenomeScope results for the Trametes villosa