Genome Characteristics of the Endophytic Fungus Talaromyces sp. DC2 Isolated from Catharanthus roseus (L.) G. Don

Talaromyces sp. DC2 is an endophytic fungus that was isolated from the stem of Catharanthus roseus (L.) G. Don in Hanoi, Vietnam and is capable of producing vinca alkaloids. This study used PacBio Sequel technology to sequence the whole genome of Talaromyces sp. DC2. The genome analysis revealed that DC2 contains a total of 34.58 Mb spanned by 156 contigs, with a GC content of 46.5%. Functional protein-coding genes, tRNA, and rRNA were predicted and annotated using various BLAST databases, including the non-redundant (Nr) protein sequence, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG), and Carbohydrate-Active Enzymes (CAZy) databases. The genome of DC2 has a total of 149, 227, 65, 153, 53, and 6 genes responsible for cellulose, hemicellulose, lignin, pectin, chitin, starch, and inulin degradation, respectively. The antibiotics and Secondary Metabolite Analysis Shell (antiSMASH) analyses revealed that strain DC2 possesses 20 biosynthetic gene clusters responsible for producing secondary metabolites. Strain DC2 was also found to harbor the DDC gene encoding the aromatic L-amino acid decarboxylase enzyme. In conclusion, this study provides a comprehensive view of secondary metabolite biosynthesis in Talaromyces sp. DC2 and of the strain's ability to degrade plant cell walls.


Introduction
Catharanthus roseus (L.) G. Don is a flowering plant species in the family Apocynaceae [1]. C. roseus is widely distributed in the regions of America, Africa, Asia, southern Europe, Australia, and Vietnam [2]. The plant's secondary metabolites exhibit a diverse range of beneficial effects in combating various diseases (leukemia, various types of cancer) and illnesses (sore throat, fever, indigestion, septic wounds, diabetes) [1]. Moreover, the plant is highly valued in the field of medicine due to the existence of numerous alkaloids with pharmaceutical properties, such as vindoline, vinblastine, catharanthine, vincristine, ajmalicine, reserpine, serpentine, horhammericine, tabersonine, leurosine, and lochnerine [1]. Among these alkaloids, vincristine, vindesine, and vinblastine have been recognized for their anticancer properties [3]. However, the plant only produces a limited quantity of these beneficial alkaloids. Many research efforts have been undertaken to enhance the production of vinca alkaloids.
Endophytic fungi, such as Fusarium oxysporum [4], Talaromyces radius CrP20 [5], Curvularia verruculosa [6], Botryosphaeria laricina strain CRS1 [7], and Alternaria alternata AUMC14391 [8], have gained increased attention for their capacity to produce vinca alkaloids. Several studies on C. roseus have discovered that the use of biotic elicitors, such as fungal concentrate, can effectively increase the synthesis of secondary metabolites under in vitro conditions. Several endophytic fungal strains elicit the accumulation of vinca alkaloids in the leaves of C. roseus. Inoculation of C. roseus with endophytes (Curvularia sp. CATDLF5 and Choanephora infundibulifera CATDLF6) was found to enhance vindoline content. This was achieved by upregulating genes associated with terpenoid indole alkaloid biosynthesis in C. roseus [9]. Previous research also found that cell extracts of endophytic fungi, such as Fusarium solani RN1 and Chaetomium funicola RN3, greatly increased the accumulation of alkaloids in the cell suspension culture system [10].
Many studies have also revealed that Talaromyces species possess gene clusters associated with cell wall-degrading enzymes and secondary metabolites. Whole genome sequencing of strain T. piceus 9-3 revealed that its genome had a diverse set of lignocellulolytic enzymes, including two cellobiohydrolases, one endo-β-1,4-glucanase, and ten β-glucosidase gene clusters [19]. The genome of strain T. pinophilus 1-95 contained two cellobiohydrolases, eight β-1,4-endoglucanases, 29 β-glucosidases, 97 hemicellulose-degrading enzymes, 24 α-amylases, and 52 secondary metabolism gene clusters [20]. The genome of T. albobiverticillius Tp-2 contained eight distinct gene clusters responsible for the biosynthesis of secondary metabolites [17]. In this study, the whole genome sequencing of the Talaromyces DC2 strain was performed using the PacBio Sequel and Illumina NovaSeq 6000 sequencing platforms. The Talaromyces DC2 strain has already been identified by our research as a prolific producer of vinca alkaloids with anticancer properties [21]. The acquired whole genome sequencing data enrich our understanding of the relationships between gene clusters and metabolic products in the DC2 strain. Our results demonstrated that the DC2 strain has the capability to degrade pectin and starch, synthesize xylooligosaccharides and short-chain fructooligosaccharides, and produce swainsonine, varicidin A, asperterpenoid A, squalestatin S1, ustethylin A, and ilicicolin H, as well as perform the decarboxylation of L-tryptophan. Furthermore, the obtained whole genome data can serve as a valuable resource for future bioengineering research.

Fungal Strain
Strain DC2 was isolated from the surface-sterilized stem of the Catharanthus roseus (L.) G. Don plant cultivated in Hanoi, Vietnam, with a yellow-colored colony as previously described [21]. The protocol for obtaining endophytic fungi from plant materials has been described in a previous study [21]. Strain DC2 has already been proven to have the ability to produce anticancer compounds, including vincristine and vinblastine. Identification and quantification of vincristine and vinblastine produced by the DC2 strain were conducted by ultra-high performance liquid chromatography/multiple reaction monitoring mass spectrometry analyses.

Extraction of Genomic DNA
For DNA extraction, isolated endophytic fungi were inoculated in 100 mL potato dextrose broth (PDB; Sigma, Saint Louis, MO, USA) medium and cultured in 250 mL Erlenmeyer flasks at 25 °C in the dark on a rotary shaker at 200 rpm. After 7 days, the fungal biomass was harvested by centrifugation at 10,000 rpm for 15 min on an Eppendorf 5810R centrifuge (Eppendorf, Hamburg, Germany) and used for DNA extraction by the cetyltrimethylammonium bromide (CTAB) method with minor adjustments for optimization [22]. The quality and quantity of the extracted DNA were measured using a Nanodrop® 1000 spectrophotometer (Thermo Scientific, Waltham, MA, USA).

Genome Sequencing
The genomic DNA of strain DC2 was sequenced using the PacBio Sequel system (Menlo Park, CA, USA) and the Illumina NovaSeq 6000 (San Diego, CA, USA). For the PacBio sequencing library, 5-10 µg of genomic DNA was sheared into 10-15 kb fragments using a g-TUBE device (PerkinElmer, Ho Chi Minh City, Vietnam). Then the library was constructed using the SMRTbell Express Template Preparation Kit 2.0 (PacBio, Menlo Park, CA, USA), following the manufacturer's protocol. In brief, the process involved amplifying the DNA fragments using barcoded DNA primers, resulting in a pooled collection of all the samples. The Illumina sequencing library was prepared using the VAHTS Universal Pro DNA Library Prep Kit (Vazyme, Nanjing, China) following the manufacturer's protocol. The generated libraries were cleaned up, and sequencing was carried out using 2 × 150 bp paired-end (PE) and 10-15 kb read length configurations for Illumina and PacBio sequencing, respectively.
Secondary metabolite biosynthetic clusters were identified using the antiSMASH web server (fungal version 7.0.1) with the default settings [27].

Genome Sequencing, Assembly, and Genomic Features
The Illumina sequencing data yielded a total of 39,360,260 clean reads, which corresponds to 5,900,310,550 bases. Of these bases, 91.83% had a quality score of at least Q30. The PacBio sequencing data produced 107,913 raw reads, totaling 340,232,173 bases. The N50 value was 3423 bp. After assembly, a total of 156 scaffolds were obtained, with a total size of 34,575,287 bp (Table 1). The final assembly had a GC content of 45.94%. The genome size of strain DC2 was compared with the 75 Talaromyces genomes currently available in NCBI, which range from 26.6 Mb for Talaromyces piceae strain 9-3 (GCA_001657655.1) to 42.5 Mb for Talaromyces nanjingensis strain JP-NJ4 (GCA_031010415.1) (Supplementary Table S1).
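The assembly statistics quoted above follow standard definitions, which the minimal Python sketch below illustrates. The helper names (`n50`, `gc_content`, `depth`) and the toy contigs are illustrative assumptions and are not part of the study's pipeline; only the base totals and the assembly size are taken from the text.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


def gc_content(sequences):
    """Percentage of G and C bases across all sequences."""
    joined = "".join(sequences).upper()
    gc = joined.count("G") + joined.count("C")
    return 100.0 * gc / len(joined)


def depth(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by assembly size."""
    return total_bases / genome_size


# Hypothetical toy contigs, used only to exercise the helpers.
toy_contigs = ["ATGCGC", "ATAT", "GG"]
toy_n50 = n50([len(c) for c in toy_contigs])
toy_gc = gc_content(toy_contigs)

# Totals reported in the text.
ASSEMBLY_BP = 34_575_287
ILLUMINA_BASES = 5_900_310_550
PACBIO_BASES = 340_232_173

illumina_depth = depth(ILLUMINA_BASES, ASSEMBLY_BP)
pacbio_depth = depth(PACBIO_BASES, ASSEMBLY_BP)

print(f"toy N50 = {toy_n50} bp, toy GC = {toy_gc:.1f}%")
print(f"Illumina depth ~ {illumina_depth:.1f}x, PacBio depth ~ {pacbio_depth:.1f}x")
```

With the reported totals, this arithmetic gives an Illumina depth of roughly 170-fold and a PacBio depth of roughly 10-fold relative to the final assembly size.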
The 6509 genes were mapped to known enzyme pathways in six KEGG categories: cellular processes, environmental information processing, genetic information processing, human diseases, metabolism, and organismal systems (Figure 1). The most abundant pathways in DC2 include carbohydrate metabolism (978), amino acid metabolism (908), signal transduction (824), and xenobiotics biodegradation and metabolism (796). The abundance of genes in the xenobiotics biodegradation and metabolism pathways, as well as in the signal transduction pathways, suggests that strain DC2 is capable of metabolizing xenobiotics in its environment.
Lignin is a complex polymer that embeds in cellulose and hemicellulose to strengthen the structure of the plant cell wall. The primary enzymes involved in lignin degradation are the laccase and peroxidase families [39]. Our data analysis showed that the DC2 strain contains 15 genes that encode laccase and 1 gene that encodes peroxidase. This suggests that the DC2 strain has the potential to break down the lignin matrix (Table 2). Interestingly, the six fungal strains do not possess any genes that encode ligninolytic enzymes.
Inulin is a fructan polysaccharide found in plants that serves as a storage carbohydrate. It consists of β(2→1)-linked fructose units, typically with a glucose residue at the terminal end [46]. Inulin conversion involves the glycosyl hydrolase families GH32 and GH91, which include enzymes such as inulinase, invertase, levanase, 1-exohydrolases, fructan fructosyltransferases, and sucrose fructosyltransferases [47]. The genome of strain DC2 contained five genes encoding endo-inulinase from the GH32 family, which is consistent with the discovery made in Aspergillus niger [48]. Endo-inulinase cleaves the β(2→1) glycosidic bond to produce short-chain fructooligosaccharides [49]. Furthermore, strain DC2 was found to contain a gene encoding the inulin-binding domain, CBM38. The data suggest that strain DC2 is capable of degrading inulin as an endobiont. In addition, short-chain fructooligosaccharides serve as prebiotics [49]. Therefore, strain DC2 has the potential to be a good source for producing fructooligosaccharides similar to those produced by different Aspergillus strains [33,50,51].
Six secondary metabolism clusters have a gene similarity of 100% with six known biosynthetic clusters. These known clusters produce monascorubrin, YWA1, alternariol, ochratoxin, choline, and cyclic depsipeptide (Table 3). In region 9.1, one T1PKS was responsible for the biosynthesis of monascorubrin. This compound has been used as a natural red colorant for a wide range of foods in Asian countries [52]. Monascorubrin has been identified in Talaromyces species such as T. marneffei [52] and T. atroroseus [53]. In region 29.1, one T1PKS was found to be responsible for the biosynthesis of naphthopyrone YWA1. This compound is considered a precursor of dihydroxynaphthalene (DHN)-melanin in Aspergillus nidulans [54] and aurofusarin in Fusarium graminearum [55]. In region 30.2, one T1PKS was found to be accountable for the production of alternariol, a toxic metabolite in Alternaria that showed multiple potential pharmacological effects [30]. Alternariol has also been identified in T. pinophilus AF-02 [56]. Region 34.1 was responsible for the biosynthesis of ochratoxin A, a potent pentaketide nephrotoxin produced by Aspergillus and Penicillium species. This toxin can be detected in fungal-contaminated food, beverages, and feed [57]. However, ochratoxin A was not included in the list of 238 secondary metabolite substances produced by Talaromyces species [18]. In region 43.2, one NRPS-like cluster was responsible for the biosynthesis of choline, which is an essential metabolite for the growth of filamentous fungi and the regulation of mycelial morphology [58]. In region 69.1, one NRPS was identified as responsible for the production of cyclic depsipeptide. Depsipeptides form when amide groups in the peptide structure are substituted with lactone bonds, which is facilitated by the presence of a hydroxylated carboxylic acid [59]. Cyclic peptides were discovered in T. wortmannii [60].
Seven BGCs exhibit similarities ranging from 60% to 75%, including swainsonine (66%), varicidin A (71%), asperterpenoid A (66%), squalestatin S1 (60%), ustethylin A (70%), trichobrasilenol/xylarenic acid B/brasilane A/F/E/D (60%), and ilicicolin H (75%) (Figure 3). In region 7.1, one T1PKS was responsible for the biosynthesis of swainsonine, an indolizidine alkaloid that is produced by endophytic fungi and has the potential to be used as a drug for cancer therapy [61,62]. In region 9.2, a single NRPS was found to be linked to the production of varicidin A. Varicidin A is a naturally occurring antifungal compound that is produced via a Diels-Alderase-catalyzed reaction [63]. In region 24.1, a single terpene cluster was responsible for the production of asperterpenoid A, a compound that exhibits strong inhibitory activity against Mycobacterium tuberculosis protein tyrosine phosphatase B [64-66]. In region 30.1, a terpene cluster was found to be accountable for the biosynthesis of squalestatin S1, which acts as a highly potent picomolar inhibitor of squalene synthase [67]. Additionally, squalestatin S1 exhibits a wide range of antifungal properties and serves as a lead structure for the development of cholesterol-lowering drugs [68]. In region 40.2, one T1PKS was responsible for the biosynthesis of ustethylin A, a compound synthesized by Aspergillus ustus [69]. In region 78.1, a terpene cluster was identified as responsible for the biosynthesis of trichobrasilenol/xylarenic acid B/brasilane A/F/E/D. This is an unusual sesquiterpene alcohol synthesized by a sesquiterpene cyclase from Trichoderma sp. [70]. In region 95.1, one NRPS was found to be accountable for the biosynthesis of ilicicolin H. This compound is a broad-spectrum antifungal agent that acts
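The two similarity tiers discussed in this section can be tabulated with a short Python sketch. The similarity values are those quoted in the text; the tier labels, the cutoffs in `tier`, and the function names are illustrative assumptions rather than antiSMASH conventions.

```python
from collections import defaultdict

# Similarity (%) of each DC2 cluster to its closest known cluster, as quoted in the text.
BGC_SIMILARITY = {
    "monascorubrin": 100,
    "YWA1": 100,
    "alternariol": 100,
    "ochratoxin A": 100,
    "choline": 100,
    "cyclic depsipeptide": 100,
    "swainsonine": 66,
    "varicidin A": 71,
    "asperterpenoid A": 66,
    "squalestatin S1": 60,
    "ustethylin A": 70,
    "trichobrasilenol/xylarenic acid B/brasilane A/F/E/D": 60,
    "ilicicolin H": 75,
}


def tier(similarity):
    """Assign a descriptive tier; labels and cutoffs are assumptions for illustration."""
    if similarity == 100:
        return "identical"
    if similarity >= 60:
        return "partial"
    return "low"


def group_by_tier(similarities):
    """Group product names by their similarity tier."""
    groups = defaultdict(list)
    for product, value in similarities.items():
        groups[tier(value)].append(product)
    return dict(groups)


groups = group_by_tier(BGC_SIMILARITY)
for name, products in groups.items():
    print(f"{name}: {len(products)} clusters -> {', '.join(sorted(products))}")
```

Grouping the values this way reproduces the counts given in the text: six clusters at 100% similarity and seven clusters between 60% and 75%.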

The Indole Alkaloid Biosynthesis in the DC2
Decarboxylation of L-tryptophan leads to the formation of tryptamine, which serves as a common backbone for many secondary metabolites. One such group is the terpenoid indole alkaloids in plants [76]. In strain DC2, a DDC gene (Gene ID: g533) that encodes an aromatic L-amino acid decarboxylase (AADC) was identified (Figure 4). The open reading frame (ORF) of g533 had a length of 1536 nucleotides and corresponded to the coding sequence for 512 amino acids (Supplementary Figure S1). The g533 sequence shared the closest similarity with those of Talaromyces islandicus (CRG88687.1; 93.58%) and Talaromyces rugulosus (XP_035346356.1; 90.91%). The three sequences also formed a clade in the phylogenetic tree (Figure 5).
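As a hedged illustration of how a percentage such as those quoted above can be derived from a pairwise alignment, the Python sketch below counts identical columns in two pre-aligned sequences and checks the ORF arithmetic. The function name `percent_identity`, the choice of denominator (all columns that are not gaps in both sequences), and the toy sequences are assumptions; they are not the g533 sequences, and the study does not state which identity definition was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns with identical residues.

    Columns where both sequences carry a gap are ignored. Other conventions
    (for example, dividing by the shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


# Hypothetical toy alignment (not real AADC sequences).
toy_a = "MKT-LLGR"
toy_b = "MKSALLGR"
toy_identity = percent_identity(toy_a, toy_b)

# ORF arithmetic from the text: 1536 nucleotides read as triplets.
ORF_NT = 1536
codons, remainder = divmod(ORF_NT, 3)

print(f"toy identity = {toy_identity:.2f}%")
print(f"{ORF_NT} nt -> {codons} codons (remainder {remainder})")
```

The ORF length divides evenly into 512 triplets, which matches the 512 amino acids reported for g533.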
In contrast to AADCs from animals and plants, fungal AADCs have not been extensively studied. The first description of a fungal AADC was reported by Niedens et al. [77]. The authors demonstrated its broad substrate specificity, including L-tryptophan, L-tyrosine, L-phenylalanine, o-fluorophenylalanine, and p-fluorophenylalanine. Later, Kalb et al. [78] reported on the Ceriporiopsis subvermispora aromatic L-amino acid decarboxylases (CsTDCs) that were heterologously produced in a laboratory setting. The study found that CsTDC exhibited strict specificity towards L-tryptophan and 5-hydroxy-L-tryptophan. Interestingly, the AADC of strain DC2 in our study contains the same sequence, LGRRFR (residues 368-373), as CsTDC, LGRRFR (residues 350-355), where G351 is the active site. However, CsTDC has a phenylalanine at residue 329, whereas g533 has a tyrosine at the corresponding position. This is similar to the amino acid sequence of PcDHPAAS, which is capable of converting L-3,4-dihydroxyphenylalanine to 3,4-dihydroxyphenylacetaldehyde [79]. However, further investigation is required to assess the decarboxylation capacity of g533 towards aromatic amino acids.
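The motif comparison above amounts to locating a short peptide in each protein and reporting its residue numbers. A minimal Python sketch of that lookup follows; the function name `find_motif` and the toy protein are hypothetical, and only the motif LGRRFR and the residue numbers 368 and 350 come from the text.

```python
def find_motif(protein, motif):
    """Return 1-based (start, end) residue positions of every occurrence of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append((start + 1, start + len(motif)))
        start = protein.find(motif, start + 1)
    return positions


# Hypothetical toy protein, used only to show the 1-based numbering.
toy_protein = "MAAALGRRFRKK"
hits = find_motif(toy_protein, "LGRRFR")

# Motif start positions reported in the text for g533 and CsTDC.
G533_MOTIF_START = 368
CSTDC_MOTIF_START = 350
offset = G533_MOTIF_START - CSTDC_MOTIF_START

print(f"toy hits: {hits}")
print(f"motif numbering offset between g533 and CsTDC: {offset} residues")
```

Residue numbering is 1-based here to match the convention used for protein positions in the text.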

Conclusions
In summary, whole genome sequencing has provided a comprehensive understanding of Talaromyces sp. DC2, encompassing its overall functions of CAZymes and secondary metabolites. Genome analysis showed that strain DC2 might serve as a potential source for the degradation of pectin and starch, the synthesis of xylo-oligosaccharides and short-chain fructooligosaccharides, and the production of swainsonine, varicidin A, asperterpenoid A, squalestatin S1, ustethylin A, and ilicicolin H. Additionally, it has the genetic potential to carry out the decarboxylation of L-tryptophan. Furthermore, the obtained genome sequencing data can serve as a valuable resource for future bioengineering research. However, further investigations are required to confirm the distinct characteristics and feasibility of the Talaromyces DC2 strain.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jof10050352/s1, Figure S1: The ORF and amino acid translation of the DDC gene in the genome of strain DC2; Table S1: Comparison of genome features of strain DC2 and 75 available Talaromyces strains in NCBI.

Figure 4. The indole alkaloid biosynthetic pathway gene found in Talaromyces sp. DC2 is depicted by the red-colored box. AADC, aromatic L-amino acid decarboxylase.

Figure 5. Phylogenetic tree of the amino acid sequences of putative aromatic L-amino acid decarboxylases. MUSCLE v.5.0 was used for sequence alignment, and the tree was constructed with the neighbor-joining algorithm in MEGA X v.10.2.6.

Table 1. Genome summary statistics for Talaromyces sp. DC2 and related strains.

Table 3. Putative biosynthetic gene clusters of Talaromyces sp. DC2 that showed similarity to known gene clusters in the Minimum Information about a Biosynthetic Gene cluster (MIBiG) database.