Whole-Genome Sequence Analysis of an Endophytic Fungus Alternaria sp. SPS-2 and Its Biosynthetic Potential of Bioactive Secondary Metabolites

As one of the most commonly isolated endophytic fungi, Alternaria is known for producing numerous secondary metabolites (SMs). However, its detailed genomic features and SM biosynthetic potential have not been extensively studied thus far. The present work focuses on the whole-genome sequencing and assembly of an endophytic strain, Alternaria sp. SPS-2, derived from Echrysantha chrysantha Lindl., and on gene annotation using various bioinformatic tools. The results suggested that the genome of strain SPS-2 was 33.4 Mb in size with a GC content of 51% and a scaffold N50 of 2.6 Mb, and 9789 protein-coding genes, including 644 CAZyme-encoding genes, were discovered in strain SPS-2 through KEGG enrichment analysis. The antiSMASH results indicated that strain SPS-2 harbored 22 SM biosynthetic gene clusters (BGCs), 14 of which are cryptic and unknown. LC-MS/MS and GNPS-based analyses suggested that this endophytic fungus is a potential producer of bioactive SMs and merits further exploration and development.


Introduction
Endophytes have been recognized as an important source of bioactive natural products with therapeutic potential [1,2]. Alternaria strains are commonly detected in plants and have proven to be a treasure trove of secondary metabolites (SMs): a search of the Dictionary of Natural Products database (accessed on 10 July 2022) shows that as many as 482 substances have been isolated and structurally elucidated from this genus. Structurally, the SM inventory of the genus Alternaria consists of diverse groups, including terpenes, pyrones, cyclopeptides, nitrogen-containing compounds, and miscellaneous compounds, which show a variety of biological activities, such as antimicrobial effects, phytotoxicity, and cytotoxicity [3][4][5]. These SMs are biosynthesized by enzymes such as modular polyketide synthases (PKSs), non-ribosomal peptide synthetases (NRPSs), NRPS-PKS hybrids, and terpene synthases, or arise as ribosomally synthesized and post-translationally modified peptides (RiPPs). Thus far, however, the detailed genomic features and SM biosynthetic potential of Alternaria strains have seldom been reported.
In our continuing search for new SMs from endophytic fungi, an endophytic Alternaria strain SPS-2 was obtained from Echrysantha chrysantha Lindl., a traditional Chinese medicine used for the treatment of the swelling of the eye, ophthalmalgia, delacrimation, nephelium of the eye, and nocturnal emission [6,7]. To better understand this endophytic fungus and explore its SM biosynthetic potential, in this study, whole-genome sequencing and assembly were conducted by hybridizing next-generation sequencing (NGS) and using the Illumina MiSeq platform. Gene annotation was carried out against several databases, including the non-redundant (Nr) protein sequence, Swiss-Prot, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), EuKaryotic Orthologous Groups (KOG), and Carbohydrate-Active Enzyme (CAZy) databases. Additionally, the antibiotics and secondary metabolite analysis shell (antiSMASH) was used to determine the SM biosynthetic potential of strain SPS-2.

Strain, Cultivation, and Crude Extracts Preparation
Endophytic strain SPS-2 was isolated and purified from leaves of the coastal plant E. chrysantha Lindl., which grows naturally on the bank of the intertidal zone of Hangzhou Bay (China). A culture suspension containing mycelia in a potato dextrose agar (PDA) medium supplemented with glycerol (20% v/v) was stored at −80 °C in our lab at Zhejiang University of Technology (China).
Strain SPS-2 was cultured on PDA at 28 °C for 7 days. A balanced amount of fungal colony was transferred to a 500 mL Erlenmeyer flask containing 250 mL potato dextrose broth (PDB) consisting of potato 200 g/L and glucose 20 g/L. The flask was shaken at 200 rpm at 28 °C for 2 days to prepare the seed broth, which was then transferred to a liquid medium (glucose 20 g/L, maltose 20 g/L, mannitol 20 g/L, glutamate 10 g/L, peptone 5 g/L, and yeast extract 3 g/L) in a 1 L Erlenmeyer flask. After two weeks of incubation (200 rpm, 30 °C), all media (approximately 30 L) were extracted three times with the same volume of ethyl acetate (Fangping Chemical Co., Ltd., Hangzhou, China); the upper solvent layer was evaporated at 25 °C under vacuum to yield the crude extract (about 12.35 g).

Phylogenetic Analysis
Strain SPS-2 was cultured on PDA medium for a few days, followed by morphological observation and sequencing of the nuclear ribosomal internal transcribed spacer (ITS) amplicon. First, the ITS sequence was amplified via polymerase chain reaction (PCR) with the ITS1-ITS4 primer pair. Then, 3 µL of the PCR product was run on a 1% agarose gel to confirm the amplified fragment, and the products were recovered using an AxyPrep DNA Recovery Kit. Finally, sequencing was performed with an ABI3730-XL sequencer. The 1199 bp ITS sequence was submitted to GenBank in the NCBI database, and an accession number was acquired. A phylogenetic tree was constructed using MEGA (version 7.0, https://www.megasoftware.net/, accessed on 25 June 2022) by comparison with other highly similar Alternaria sequences in the NCBI database.

Genome Sequencing and Assembly
Strain SPS-2 was inoculated into the PDA medium and cultivated for three days at 28 °C. Fungal genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) extraction protocol. The integrity and purity of the DNA were assessed via 1% agarose gel electrophoresis, and the DNA was then dissolved in sterile water and adjusted to a concentration of 149 ng/µL.
Gene annotation is the functional analysis of all protein-coding genes, including predictions of information such as motifs, domains, protein functions, and metabolic pathways. Functional annotations on putative genes were carried out using the following bioinformatic tools: BLAST for Swiss-Prot and KOG databases (E-value threshold of 10⁻⁶), KAAS (version 2.1, Gokasho, Kyoto, Japan) for KEGG annotation [18], and InterPro (version 66.0, Hinxton, Cambridgeshire, UK) for GO annotation [19]. The Diamond (version 0.9.10.111, Tübingen, Germany) software was then used to run protein-coding genes against the NCBI nr database (E-value threshold of 10⁻⁶) [20].
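The E-value filtering step described above can be illustrated with a minimal Python sketch. It assumes the search results are in the standard 12-column tabular format produced by BLAST (outfmt 6) and Diamond, and it keeps the highest-bitscore hit per query among those passing the 10⁻⁶ threshold. The function name `best_hits` and the gene and protein identifiers are hypothetical; this is not the pipeline used in the study.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # threshold stated in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} with the best hit per query.

    Expects the 12-column tabular layout: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # hit is weaker than the threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example rows (all identifiers are made up for illustration).
example = (
    "gene_0001\tP12345\t78.5\t200\t43\t0\t1\t200\t5\t204\t1e-50\t310\n"
    "gene_0001\tQ67890\t60.0\t180\t72\t0\t1\t180\t9\t188\t1e-20\t150\n"
    "gene_0002\tP99999\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25\n"
)
hits = best_hits(example)
print(hits)
```

In this toy input, `gene_0002` has only a hit with an E-value of 0.5 and is therefore left unannotated, while `gene_0001` retains its stronger hit.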

Prediction of CAZymes
Carbohydrates contain a considerable amount of biological information and are therefore useful for analyzing the metabolic processes of strain SPS-2 and the differences between strains. The Hmmscan software (v3.1b2, http://hmmer.org/, accessed on 10 May 2022) was employed to predict and annotate carbohydrate-active enzyme (CAZyme)-related genes in strain SPS-2. Meanwhile, the data of 12 other Alternaria strains obtained from the CAZy database were compared with those of strain SPS-2 to further understand its carbohydrate degradation capacity.

Metabolomic Profiling of Strain SPS-2 Using LC-MS/MS
The LC-MS/MS analysis of the crude extract of strain SPS-2 was conducted on a QTOF mass spectrometer (AB SCIEX X500B). A 1.8 µm Agilent ZORBAX Extend C18 column (2.1 mm × 100 mm), maintained at 35 °C, was operated using a gradient elution of H2O and MeOH at 0.3 mL/min. The gradient program was as follows: 10% MeOH for 2 min, 10-100% MeOH for 16 min, and 100% MeOH for 5 min. Mass spectra were recorded in data-dependent acquisition (DDA) mode in positive ion mode with a spray voltage of 5.5 kV and a temperature of 550 °C. The raw LC-MS/MS data files were converted into mzXML format using MSConvert, and the data were analyzed with MZmine 2 (version 2.53, https://github.com/mzmine/mzmine2/releases, accessed on 16 July 2022). The data were then uploaded to the Global Natural Products Social Molecular Networking (GNPS) web platform for molecular networking [21]. The Cytoscape software (version 3.9.1, https://cytoscape.org/, accessed on 16 July 2022) was used to visualize the resulting molecular network, and known metabolites were annotated by comparing the mass and mass fragmentation pattern with GNPS spectral libraries [22].
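The gradient program above can be written as a small piecewise function, which makes the total run time and solvent use per injection explicit. The sketch assumes that the three steps run back to back and that the 10-100% ramp is linear; neither is stated in the text. The function name `methanol_percent` is hypothetical.

```python
FLOW_ML_PER_MIN = 0.3  # flow rate stated in the text


def methanol_percent(t_min):
    """Percent MeOH at time t_min (minutes), assuming a linear ramp."""
    if t_min < 0 or t_min > 23:
        raise ValueError("time outside the 23-min program")
    if t_min <= 2:
        return 10.0  # isocratic hold at 10% MeOH
    if t_min <= 18:
        # ramp from 10% to 100% MeOH over 16 min
        return 10.0 + (t_min - 2) * (100.0 - 10.0) / 16.0
    return 100.0  # final 5-min hold at 100% MeOH


total_run_min = 2 + 16 + 5
solvent_per_run_ml = FLOW_ML_PER_MIN * total_run_min

print(total_run_min, round(solvent_per_run_ml, 1))
print(methanol_percent(10))  # midpoint of the ramp
```

Under these assumptions, one injection takes 23 min and consumes about 6.9 mL of mobile phase.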

Morphology and Phylogenetic Analysis of Strain SPS-2
Spores of strain SPS-2 developed after culturing on a PDA medium for 6 days at 28 °C. Its white colonies grew rapidly on the PDA medium at the initial stage and darkened after 6 days (Figure 1). Subsequently, a phylogenetic tree was constructed for strain SPS-2 on the basis of its ITS sequence (GenBank accession no.: ON872220.1), and the analysis suggested that this strain is most closely related to the genus Alternaria (Figure 2).

Genome Sequencing and Assembly
The genome sequencing of strain SPS-2 afforded an assembly with a length of 33,400,178 bp, a G + C content of 51.0%, and an N50 value of 2,610,814 bp (Table 1). Additionally, the assembly integrity was 95.2%, indicating that the quality of the genome assembly was high (Table S1). The total CDS sequence length was 13,553,185 bp, accounting for 40.58% of the genome. Comparison with other Alternaria strains deposited in the NCBI database (Table 2) revealed that the genome size of strain SPS-2 was average, but it had the fewest protein-coding genes. For non-coding RNA, 12 rRNAs, 108 tRNAs, and 32 ncRNAs were predicted in the genome of strain SPS-2 (Table 3). The total length of repetitive sequences was 378,941 bp (1.13% of the genome), a lower proportion than in other Alternaria strains, such as A. solani (1.5%), A. alternantherae (16.5%), and A. avenicola (11.9%) [23]. Long terminal repeats totaled 26,157 bp, representing 0.08% of the whole genome, while DNA elements totaled 12,782 bp, only 0.04%.
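For readers unfamiliar with the N50 statistic, the following Python sketch shows how it is commonly computed and reproduces the genome fractions quoted above from the reported base-pair counts. The scaffold lengths in `toy` are invented for illustration, since the individual scaffold lengths of strain SPS-2 are not given here.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty input")


# Toy scaffold lengths (hypothetical): total 150, and 50 + 40 = 90 >= 75.
toy = [50, 40, 30, 20, 10]
print(n50(toy))  # 40

# Values reported in the text for strain SPS-2 (bp).
GENOME_BP = 33_400_178
features = {
    "CDS": 13_553_185,
    "repeats": 378_941,
    "LTR": 26_157,
    "DNA elements": 12_782,
}
shares = {name: round(100 * bp / GENOME_BP, 2) for name, bp in features.items()}
print(shares)
```

The computed shares agree with the percentages stated in the text (40.58%, 1.13%, 0.08%, and 0.04%).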

Genome Annotation
To functionally annotate the gene models of strain SPS-2, the putative protein-coding sequences were searched with BLAST against the NR, KOG, KEGG, Swiss-Prot, and GO databases. There were 6181 (66.31%) annotated genes for the 3 main GO categories of biological process, cellular component, and molecular function, including 48 subcategories (Figure 3a). The molecular function component was mainly distributed across molecular activity, ion binding, and oxidoreductase activity. Within biological processes, cellular nitrogen compound metabolic processes and biosynthetic processes contained the most proteins. By contrast, the molecular function component of A. alternata Y784-BC03 was reported to be involved in catalysis, binding, and transport [24]. To further understand the functions of the strain SPS-2 proteins, 3241 (34.77%) genes were annotated and assigned to 45 different KEGG pathways (Figure 3b). "Carbohydrate metabolism" was the most enriched pathway, followed by "amino acid metabolism" and "translation". These results suggested the presence of an enriched and varied array of carbohydrate and amino acid metabolic functions that enable higher energy conversion efficiency. Similarly, the KEGG analysis of the predicted genes of strain Y784-BC03 revealed an abundant number of metabolic pathways, and many of the predicted genes were associated with the biosynthesis of SMs. Among the 25 KOG functional categories, most of the genes were associated with "carbohydrate transport and metabolism", followed by "post-translational modification, protein", "secondary metabolite biosynthesis, transport, and catabolism", and "amino acid transport and metabolism" (Figure 3c).

CAZyme Analysis
CAZymes, one of the most important gene families in the fungal genome, are involved in lignocellulose degradation and some other biological processes [25,26]. A total of 644 genes were annotated as CAZyme families in strain SPS-2, including 257 glycoside hydrolases (GHs), 146 auxiliary activities (AAs), 118 carbohydrate esterases (CEs), 83 glycosyl transferases (GTs), 25 polysaccharide lyases (PLs), and 15 carbohydrate-binding modules (CBMs) (Figure 4, Table S2). Accordingly, GHs were the predominant class among the predicted CAZymes of strain SPS-2. Comparison with other Alternaria strains from the CAZy database showed that this strain harbored the most abundant CAZymes, including GHs, CEs, and PLs, suggesting that it has the strongest capability for plant biomass decomposition [27].
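The class counts reported above can be tallied in a few lines of Python, which confirms that they sum to the stated total and gives the relative share of each class. The dictionary keys are simply the abbreviations used in the text.

```python
# CAZyme class counts reported for strain SPS-2.
cazymes = {"GH": 257, "AA": 146, "CE": 118, "GT": 83, "PL": 25, "CBM": 15}

total_cazymes = sum(cazymes.values())
cazyme_shares = {
    name: round(100 * count / total_cazymes, 1) for name, count in cazymes.items()
}
largest_class = max(cazymes, key=cazymes.get)

print(total_cazymes)   # 644
print(largest_class)   # GH
print(cazyme_shares)
```

Glycoside hydrolases account for roughly 40% of the predicted CAZymes, which is the basis for calling them the predominant class.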

Analysis of Secondary Metabolite Biosynthetic Potential
The antiSMASH results indicated that strain SPS-2 possessed 22 BGCs for SM biosynthesis, which was similar to the other 10 Alternaria strains (Figure 5, Table S3). Its BGC inventory consisted of 10 NRPSs, 7 PKSs, 4 terpenes, and 1 fungal-RiPP, of which 8 BGCs with high similarity to known gene clusters are putatively responsible for the production of equisetin, betaenones A-C, alternariol, dimethylcoprogen, and melanin (Figure 6, Table S4). However, the biosynthetic products of the other BGCs in strain SPS-2 cannot yet be characterized and need to be further investigated.
Region 11.1, an NRPS BGC, displayed 45% similarity with the BGC from F. heterosporum (GenBank: KC439347.1) responsible for the biosynthesis of equisetin, which is an antibacterial agent and selectively inhibits Staphylococci and Mycobacteria by inhibiting specific ATPases or ionophores in bacterial and mitochondrial inner membranes [28]. Although some genes of region 11.1 showed high similarity with equisetin synthetase (Figure 7a, Table S5) [29], other genes did not share significant sequence homology, suggesting that region 11.1 might produce other alkaloids structurally similar to equisetin.
Region 21.2 displayed significant similarity with the BGC of betaenones A-C in Phoma betae (GenBank: LC011911.1). These metabolites exhibited strong inhibitory effects on PKC, CDK4, and EGF-R tyrosine kinases, with corresponding IC50 values of 36.0, 11.5, and 10.5 µM, respectively, as well as antiangiogenic activity [30]. Genes Bet1 and Bet3, with similarities of 80% and 90%, respectively, encode key enzymes in betaenone biosynthesis (Figure 7b, Table S6) [31].
Region 8.2 displayed high similarity with the alternariol (AOH) BGC from Parastagonospora nodorum SN15 (GenBank: KP941080.1). Core gene ctg_1223 encodes a PKS-related enzyme and had a significant BLAST hit with the key gene AKN45693.1 for the synthesis of AOH, an important mycotoxin that alters the action of glutathione and the enzymes involved in the redox system and also causes DNA damage [32]. Gene pksI is a core gene for AOH synthesis, and it is usually co-expressed with other genes encoding O-methyltransferases, FAD-dependent monooxygenases, short-chain dehydrogenases, and other enzymes to produce alternariol derivatives (Figure 7c, Table S7) [33].
One PKS BGC with 100% homology to that of A. alternata (GenBank: JQ973666.1), presumably for dimethylcoprogen biosynthesis, was discovered in region 19.1. This compound is a novel trihydroxamate siderophore [34], and its core NRPS gene was found to have a sequence highly similar to that of AFN69082.1 in BGC0001249 (Figure 7d, Table S8).
Region 8.1 displayed 100% similarity with the BGC (GenBank: JQ973666.1) responsible for the biosynthesis of melanin, a ubiquitous pigment that confers potent resistance to environmental stress such as UV radiation [35]. Its pksI was identified as the essential gene for melanin biosynthesis in Bipolaris oryzae (Figure 7e, Table S9) [36].
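The BGC inventory reported in this section can be summarized in a short Python sketch that checks the class counts against the stated total and lists the regions discussed individually. Similarity is recorded as `None` where the text gives only a qualitative description ("significant" or "high"); no percentages have been filled in beyond those stated.

```python
# BGC classes predicted by antiSMASH for strain SPS-2 (counts from the text).
bgc_classes = {"NRPS": 10, "PKS": 7, "terpene": 4, "fungal-RiPP": 1}
total_bgcs = sum(bgc_classes.values())

KNOWN_MATCHES = 8  # BGCs with high similarity to known clusters
uncharacterized = total_bgcs - KNOWN_MATCHES

# (region, putative product, percent similarity or None if not stated numerically)
regions = [
    ("11.1", "equisetin", 45),
    ("21.2", "betaenones A-C", None),
    ("8.2", "alternariol", None),
    ("19.1", "dimethylcoprogen", 100),
    ("8.1", "melanin", 100),
]

# Regions with a stated similarity below 100% may encode variants of the known product.
partial_matches = [region for region, _, sim in regions if sim is not None and sim < 100]

print(total_bgcs, uncharacterized)  # 22 14
print(partial_matches)              # ['11.1']
```

The 14 uncharacterized clusters are the ones referred to elsewhere in the paper as cryptic and unknown.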

Molecular Networking Analysis
The feature-based molecular network for the metabolite profile of strain SPS-2 was created with the Global Natural Products Social Molecular Networking (GNPS) analysis tool. In addition to five cyclodipeptides (1-5), a number of alkaloids (6-13) were detected and characterized in the crude extract of strain SPS-2 (Figure 8). Among these SMs, compound 7 was originally obtained from Sorangium cellulosum strain So ce38 and exhibited potent anticancer activity and a strong effect on lysosomes [37]. Compound 11 displayed a cytotoxic effect on HepG-2 cells and had the potential to antagonize depression and anxiety [38,39], while compound 12 had AChE inhibitory capacity, with an IC50 value of 12.24 ± 0.12 [40], and 13 showed an antiviral effect [41]. These results indicated that strain SPS-2 has great potential to biosynthesize many bioactive SMs. However, the GNPS-annotated compounds did not correspond to the products predicted by antiSMASH, which might be attributed to the fact that some BGCs in strain SPS-2 are silent under traditional laboratory conditions. Furthermore, substances with molecular weights of over 600 Da could not be matched in the GNPS repository, suggesting that this strain has the potential to produce new SMs.
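Molecular networking links spectra whose MS/MS fragmentation patterns are similar. As a rough illustration of the underlying idea, the Python sketch below computes a plain cosine score between two peak lists after greedily pairing fragment peaks within an m/z tolerance. It is a simplification: GNPS itself uses a modified cosine score that additionally allows peaks shifted by the precursor mass difference, which is omitted here. The function name, the tolerance value, and the example spectra are all hypothetical.

```python
import math


def cosine_score(spec_a, spec_b, tolerance=0.02):
    """Cosine similarity between two spectra given as [(mz, intensity), ...].

    Peaks are paired greedily (largest intensity product first) when their
    m/z values differ by at most `tolerance`; each peak is used at most once.
    """
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    candidates = [
        (ia * ib, x, y)
        for x, (mz_a, ia) in enumerate(spec_a)
        for y, (mz_b, ib) in enumerate(spec_b)
        if abs(mz_a - mz_b) <= tolerance
    ]
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, x, y in candidates:
        if x in used_a or y in used_b:
            continue  # each peak may be matched only once
        used_a.add(x)
        used_b.add(y)
        dot += product
    return dot / (norm_a * norm_b)


# Hypothetical spectra for illustration only.
spectrum_1 = [(70.065, 40.0), (120.081, 100.0), (154.074, 25.0)]
spectrum_2 = [(70.066, 35.0), (120.080, 90.0), (211.144, 50.0)]
spectrum_3 = [(300.100, 10.0), (450.200, 80.0)]

print(round(cosine_score(spectrum_1, spectrum_2), 3))
print(cosine_score(spectrum_1, spectrum_3))
```

Spectra sharing most of their intense fragments score close to 1 and would be connected in a network, whereas spectra with no fragments in common score 0.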

Conclusions
In this work, a high-quality de novo genome of an endophytic Alternaria sp. SPS-2 was obtained via whole-genome sequencing and assembly. Genome analysis suggested that strain SPS-2 is capable of encoding a wide array of CAZymes and secondary metabolic enzymes. The antiSMASH results indicated that this strain harbored 22 secondary metabolite BGCs, including 7 PKSs, 10 NRPSs, 4 terpenes, and 1 fungal-RiPP, and 14 of these BGCs are unknown and have the biosynthetic potential to yield new SMs. LC-MS/MS and GNPS-based analyses suggested that this endophytic fungus is a potential producer of bioactive SMs. Therefore, future efforts should focus on awakening these cryptic BGCs in strain SPS-2 to produce more SMs using various approaches, such as the OSMAC strategy [42], heterologous gene expression [43], and transcriptional regulation [44].
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/microorganisms10091789/s1, Table S1: Genome assembly integrity assessment for strain SPS-2; Table S2: CAZyme profiles of strain SPS-2 and twelve other Alternaria strains deposited in CAZyme databases; Table S3: The number and type of secondary metabolite BGCs in strain SPS-2 and other Alternaria strains deposited in antiSMASH database; Table S4: Secondary metabolite BGCs of strain SPS-2 by antiSMASH analysis;