Genome Mining and Molecular Networking-Based Metabolomics of the Marine Facultative Aspergillus sp. MEXU 27854

The marine-facultative Aspergillus sp. MEXU 27854, isolated from the Caleta Bay in Acapulco, Guerrero, Mexico, has provided an interesting diversity of secondary metabolites, including a series of rare dioxomorpholines, peptides, and butyrolactones. Here, we report its genome assembly, which consists of 11 contigs (N50 ~3.95 Mb) with a total length of ~30.75 Mb. Genome annotation resulted in the prediction of 10,822 putative genes. Functional annotation was accomplished by BLAST searches of the protein sequences against several public databases. Of the predicted genes, 75% were assigned gene ontology terms. Of the 67 BGCs identified, ~60% belong to the NRPS and NRPS-like classes. Putative BGCs for the dioxomorpholines and other metabolites were predicted by extensive genome mining. In addition, metabolomic molecular networking analysis allowed the annotation of all isolated compounds and revealed the biosynthetic potential of this fungus. This work represents the first report of whole-genome sequencing and annotation from a marine-facultative fungal strain isolated from Mexico.


Introduction
Fungi play an essential ecological role in both terrestrial and aquatic environments. Marine-derived fungi are now well established as an excellent source of novel natural products (NPs), from which numerous compounds have been isolated for drug development [1][2][3]. Marine Aspergillus have yielded over 30% of the total marine-microbial NPs [4]. This genus has 11 species listed in the World Register of Marine Species [5]. Among them, human pathogens and allergens (A. fumigatus), plant pathogens (A. flavus), model organisms (A. nidulans), and species with industrial applications (A. niger and A. terreus) can be found [5]. Efforts to sequence the genomes of Aspergillus species began in 2005, making it possible to address questions related to evolution, ecological adaptation, and pathogenicity [6][7][8]. Since then, more robust and less expensive sequencing technologies have been developed, which has increased the number of Aspergillus genome assemblies in public databases. Currently, there are 374 released isolate assemblies in the NCBI GenBank (Table S1) [9]. Of these, six are described as complete genomes, 19 are at the chromosome level, and the rest are at the scaffold or contig level (Table S1). A. terreus, the most important lovastatin producer, was the first member of the section Terrei whose genome was sequenced [7]; however, its assembly remains at the scaffold level. In our previous studies on the marine-facultative Aspergillus sp. MEXU 27854 (section Terrei), we isolated two new dioxomorpholines and three new derivatives, along with one new cyclic pentapeptide and the known compounds PF1233 A and B, and butyrolactone II [10,11]. Interestingly, these rare dioxomorpholines showed inhibition of P-glycoprotein, which is associated with drug resistance in cancer therapy [10,12,13].
To date, the only partially known dioxomorpholine biosynthetic pathway was described for acu-dioxomorpholines A and B, using a platform for screening and heterologous expression of intact gene clusters that combines fungal artificial chromosomes and metabolomic scoring (FAC-MS) [12]. Thus, studying the biosynthetic pathway of this group of compounds and its derivatives is of great interest. To better understand the biosynthesis of this class of compounds, we sequenced the genome of the strain MEXU 27854 and, using genome mining analysis, predicted a putative biosynthetic gene cluster (BGC) for the dioxomorpholines and other metabolites from this fungus.

General Genomic Features of Aspergillus sp. MEXU 27854
The marine-facultative Aspergillus sp. MEXU 27854 was subjected to whole-genome sequencing using the PacBio technology. A total of 7 GB of sequencing data with an average read length of 9183 bp were generated. For assembly, reads over 12 kb in length were used (~84× coverage), which resulted in 11 contigs with a total length of 30,756,112 bp and an N50 of 3,946,678 bp (GenBank accession no. JAGMTT000000000). The sequencing quality statistics and predicted genomic information for this strain are shown in Table 1. The completeness of the assembly (10,822 total predicted genes; Table S2) was relatively high, as indicated by a BUSCO score of 99.5% (complete and single copy, 4153; duplicated, 15; fragmented, 6; missing, 14; n, 4191) when compared with genes conserved in the Eurotiales. Moreover, a total of 2744 predicted genes encoding hypothetical proteins without apparent homologs to currently available sequences were found in the genome of the strain. According to the Gene Ontology (GO) database, 8078 predicted proteins, accounting for 75% of all predicted genes, were mainly distributed in four functional entries: binding, catalytic activity, transporter activity, and metabolic process (Figure 1A and Table S2). In addition, orthologs in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database were assigned to 1116 (14%) putative proteins (Figure 1B and Table S3).
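As an illustration of how the summary statistics above are derived, the following minimal Python sketch computes an N50 and the BUSCO completeness percentage. The individual contig lengths are not listed in the text, so the `toy_contigs` list is purely hypothetical; the function names are also illustrative and do not correspond to the software used in this study. Only the BUSCO counts and the gene totals are taken from the paragraph above.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total of lengths,
    sorted longest first, reaches half of the total assembly size."""
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def busco_complete_percent(single_copy, duplicated, total):
    """Complete BUSCOs (single-copy plus duplicated) as a percentage
    of the number of BUSCO groups searched."""
    return 100.0 * (single_copy + duplicated) / total


# Hypothetical contig lengths, for illustration only.
toy_contigs = [5, 4, 3, 2, 1]
print(n50(toy_contigs))  # 4

# Counts reported in the text: 4153 single copy, 15 duplicated, n = 4191.
print(round(busco_complete_percent(4153, 15, 4191), 1))  # 99.5

# Share of predicted genes with GO terms (8078 of 10,822).
print(round(100.0 * 8078 / 10822, 1))
```

With the real contig lengths of the assembly as input, `n50` would be expected to return the reported 3,946,678 bp.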
In an earlier study, MEXU 27854 was phylogenetically identified as belonging to the Aspergillus section Terrei, based on CaM phylogeny [14]. The strain occurred in an isolated position, in a clade sister to A. niveus, A. carneus, A. allahabadii, A. neoindicus, and A. aureoterreus. In the present study, sections of CaM and RPB2, single-copy protein-coding genes used for identification of Aspergillus, were extracted from the genome (Supplementary FASTA files). A BLAST search with these two regions also supported that MEXU 27854 showed homology with the above Aspergillus strains in the Terrei section, but only with ≥94% sequence similarity. These preliminary analyses suggest that MEXU 27854 might represent a new species of Aspergillus from Mexico.

[Table residue: columns "Adx Proteins (AA)" and "Function (% Identity to Corresponding Adx/Not Proteins)"; only the row "Transcriptional factor (2343)" survives.]
a Gene function predicted using BLAST search. (−), homology cannot be calculated due to unrelatedness.

Cyclic Peptides
Cyclic peptides from fungi are a well-known family of secondary metabolites with interesting structures and biological activities [25]. There are over 50 cyclic pentapeptides reported from fungi, and <20 produced by Aspergillus spp. [25]. Recently, we discovered the new N-methyl cyclic pentapeptide caletasin (8) from Aspergillus sp. MEXU 27854 [11], which is closely related to cotteslosins A and B produced by A. versicolor [26] and the sansalvamides produced by F. solani [27]. Genome mining, gene function, and protein sequence similarity analysis allowed the prediction of the caletasin BGC in the strain MEXU 27854 (Figure 5). The main protein, Calsyn, is a putative NRPS (GenBank accession no. MZ503796) organized in five modules, each with the adenylation and condensation domains essential for pentapeptide formation (Figure 6). The calsyn NRPS gene sequence has 30% identity to the NhNPS5 sansalvamide synthase (GenBank accession no. XP_003044554). Eight additional genes were also predicted, without a clear role in the peptide biosynthesis.
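For readers unfamiliar with how an identity value such as the 30% above is obtained, the sketch below computes percent identity from two already-aligned sequences. It is only an illustration under stated assumptions: the denominator used here (all alignment columns except those where both sequences have a gap) is one common convention and may differ from the one applied by BLAST or ClustalX, and the short sequences are hypothetical toy strings, not the Calsyn or NhNPS5 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns with identical residues.

    Assumes both inputs come from the same pairwise alignment and
    therefore have equal length. Columns where both sequences carry
    a gap are ignored; a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignments, for illustration only.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
print(percent_identity("AC-E", "ACDE"))              # 75.0
```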

Butyrolactones
A chemical study of a fresh organic extract of Aspergillus sp. MEXU 27854 yielded 3-O-methylbutyrolactone II (9), which was identified by NMR and HRMS analysis (Figures S8 and S9, and Table S5) [28]. This secondary metabolite was isolated in 2015 from a gorgonian-derived Aspergillus strain [28] and is the methyl derivative of butyrolactone II (10), previously identified in the MEXU 27854 strain [11]. Biosynthesis of 9 and 10 is carried out by an NRPS-like enzyme (btyA) and an S-adenosyl methionine (SAM)-dependent methyltransferase (btyB), as demonstrated in A. terreus and A. nidulans [29][30][31]. As expected, the comparative analysis of the genomes of the strain MEXU 27854 and A. terreus showed high similarity (80% with btyA and 75% with btyB) in the BGC of these compounds (Table 4).

Mass-Spectrometry-Based Metabolomics Analysis
The most commonly used technique for targeted and untargeted metabolomic analysis of NPs is liquid chromatography-tandem mass spectrometry (LC-MS/MS). This platform provides high sensitivity and selectivity for compound identification. Moreover, data analysis tools, such as principal component analysis and molecular networking (MN), are required to show the chemical space and diversity of the features or metabolites in the samples, which could be correlated to the functional phenotype of the natural source [32]. In this work, the Global Natural Products Social (GNPS) MN platform was used to further explore the biosynthetic potential of Aspergillus sp. MEXU 27854. The comprehensive network for this fungus was generated for spectra with a minimum of four fragment ions (Figure 7). Feature-based MN grouped the metabolite features into 52 chemical families (>3 nodes). Interestingly, only the known compounds asperphenamate and butyrolactone II (10) were annotated by GNPS (Table 5). Asperphenamate is produced by A. flavus and, even though it was not isolated here, it is likely to be produced by strain MEXU 27854 because we found its BGC with a high percentage of similarity in the genome of this strain (Table 3). In addition, we were able to manually annotate all isolated compounds from this strain: dioxomorpholines and derivatives 1-7, caletasin (8), and butyrolactones (9 and 10), because MS data from all pure compounds were included in the MN analysis (Table 5 and Figure 7).
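The grouping of features into chemical families can be pictured as finding connected components in the network and keeping those with more than three nodes. The following Python sketch shows that idea on a hypothetical edge list; it is a simplification, since the GNPS workflow also applies top-K and maximum component size filters before families are defined, and the function names and toy data are assumptions made for illustration.

```python
from collections import defaultdict


def molecular_families(edges):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def count_families(edges, min_nodes_exclusive=3):
    """Number of families with more than `min_nodes_exclusive` nodes."""
    return sum(
        1 for family in molecular_families(edges)
        if len(family) > min_nodes_exclusive
    )


# Hypothetical network: one family of four nodes and one pair.
toy_edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
print(count_families(toy_edges))  # 1
```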

Strain and DNA Isolation
Aspergillus sp. MEXU 27854 was isolated from sandy soil collected in the intertidal zone located in Caleta Bay, Acapulco, Guerrero, Mexico [10]. High-molecular-weight (HMW) genomic DNA was obtained from a pure culture of the strain using a modified phenol-chloroform DNA isolation protocol [33]. Briefly, the strain was cultivated in 30 mL of YESD (1% of yeast extract, 2% of soy peptone, 2% of dextrose) medium and incubated for 4 days at room temperature and 100 rpm (Lab Companion, Billerica, MA, USA). Ground mycelium (400-600 mg) was mixed with 300 µL of EB buffer (10 mM Tris-HCl, pH 8.5), then an equal volume of a PCI (phenol−chloroform−isoamyl alcohol) (25:24:1) solution was added. After vortexing for 1 min and centrifuging at 12,000 rpm for 5 min (HERMLE Labortechnik GmbH, Wehingen, Germany), the top aqueous layer was transferred into a new tube. An equal volume of a chloroform−isoamyl alcohol (24:1) solution was added to the new tube, and the mixture was vigorously vortexed for 1 min and centrifuged (12,000 rpm) for 5 min. The aqueous layer was again transferred into a new tube. After cooling at −20 °C for 2 h with ethanol (100%) and NH4OAc (0.75 M final concentration), a precipitate was obtained. The pellet was washed with ethanol (70%) and air-dried for 2 min. The DNA was then treated with RNase A and re-extracted with PCI solution. HMW DNA was quantified using a UV-Vis BioDrop µLITE+ (Biochrom, Cambridge, United Kingdom). The quality of the genomic material was assessed on a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and an automated electrophoresis 4200 TapeStation system (Agilent Technologies, Santa Clara, CA, USA).

Sequencing and Assembly
Genome sequencing of the HMW DNA of the fungal strain was performed at the Centre d'expertise et de services Génome Québec in Quebec, Canada, using single-molecule real-time (SMRT) sequencing (0.5 SMRT cells from a single library) with the Sequel II system (Pacific Biosciences, Menlo Park, CA, USA). PacBio sequence data was error-corrected and assembled with the SMRT Link v9.0 software (Pacific Biosciences). Benchmarking Universal Single-Copy Orthologs (BUSCO) software was used to assess the completeness of genome assembly with single-copy orthologs [34]. BUSCO v2.0 was run on the genome assembly (using A. terreus genome as template). The lineage dataset of BUSCO was Eurotiales_odb10 (creation date: 2020-11-10).

Genome Annotation and BGC Prediction
Gene prediction was performed with AUGUSTUS version 3.3.3 [35]. The resulting gene sets were integrated to obtain the most comprehensive and non-redundant reference genes. The functional annotations of predicted genes were mainly based on homology to known annotated genes within different databases, using the OmicsBox 1.4.12 platform as the main tool [36]. To obtain their corresponding annotation, protein models were aligned with the National Center for Biotechnology Information (NCBI) non-redundant (nr) database, Blast2GO, InterPro [37], GO (http://geneontology.org/; accessed on 2 September 2021), and KEGG (https://www.genome.jp/kegg/; accessed on 2 September 2021). The fungal version of antiSMASH 5.0 was employed to predict the gene clusters of secondary metabolites, with the cluster-finder algorithm for BGC border prediction and default settings [38]. Comparative bioinformatics analyses of the catalytic domains of the putative proteins and BGCs of the dioxomorpholines and other compounds were performed using ClustalX 2.1 [39].

Extract Preparation and Chemical Study
An organic extract (1.15 g) from a fresh solid culture (100 g of rice and 200 mL of H2O) of Aspergillus sp. MEXU 27854 was prepared as previously described [11]. From this, 1.0 g was fractionated via flash chromatography on a RediSep Rf Gold Si-gel column (40 g of Si-gel; Teledyne Inc., Thousand Oaks, CA, USA) using sequential mixtures of n-hexane−CHCl3−MeOH. Fifteen primary fractions were obtained according to their UV and ELSD profiles. Fraction eight (99 mg) was subjected to preparative HPLC (Kinetex C18 column 250 mm × 21.2 mm I.D., 5.0 µm, 100 Å; Phenomenex Inc., Torrance, CA, USA) using a gradient system from 60:40 to 100:0 of CH3CN-0.1% aqueous formic acid in 15 min at a flow rate of 21.24 mL/min, yielding 3-O-methylbutyrolactone II (9) (40 mg, tR = 5.2 min), which was characterized by comparing its NMR and HRMS spectra with those previously reported [28].

LC-MS/MS Analysis
A solution of the organic extract was prepared at 3 mg/mL and filtered through a 0.22 µm membrane before analysis on an Acquity ultraperformance liquid chromatography (UPLC) system (Waters Corp., Milford, MA, USA) coupled to a Q Exactive Plus (Thermo Fisher Scientific, Waltham, MA, USA) mass spectrometer. An Acquity BEH C18 column was used for UPLC separations (50 mm × 2.1 mm I.D., 1.7 µm; Waters) with a flow rate of 0.3 mL/min, equilibrated at 40 °C. The mobile phase consisted of a linear gradient of CH3CN-0.1% aqueous formic acid from 15% to 100% of CH3CN over 8 min, then held for 1.5 min at 100% CH3CN before returning to the starting conditions. High-resolution mass spectrometry (HRMS) data and MS/MS spectra were collected using an electrospray ionization (ESI) source (positive and negative modes) at a full scan range (m/z 150−2000), with the following settings: capillary voltage, 5 V; capillary temperature, 300 °C; tube lens offset, 35 V; spray voltage, 3.80 kV; sheath and auxiliary gas flow, 30 and 20 arbitrary units.
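The UPLC gradient described above can be written as a simple piecewise function of time, which may be convenient when aligning retention times with solvent composition. This is a minimal sketch: the function name is hypothetical, and because the duration of the return to starting conditions is not stated in the text, the function is only defined for the ramp and the hold (0 to 9.5 min).

```python
def acetonitrile_percent(t_min, start=15.0, end=100.0,
                         ramp_min=8.0, hold_min=1.5):
    """Percentage of CH3CN in the mobile phase at time t_min (minutes).

    Linear ramp from `start` to `end` over `ramp_min`, then a hold at
    `end` for `hold_min`. The re-equilibration step is not modeled
    because its duration is not given in the text.
    """
    if t_min < 0 or t_min > ramp_min + hold_min:
        raise ValueError("time outside the ramp and hold segments")
    if t_min <= ramp_min:
        return start + (end - start) * t_min / ramp_min
    return end


for t in (0, 4, 8, 9.5):
    print(t, acetonitrile_percent(t))
# 0 15.0
# 4 57.5
# 8 100.0
# 9.5 100.0
```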

Metabolomic Analysis
Raw MS/MS data from samples (extract and pure compounds 1-10), solvents (blank), and culture media (blank) were converted to .mzML file format using the Global Natural Products Social (GNPS) quick start converter and uploaded to the GNPS server (http://gnps.ucsd.edu; accessed on 2 September 2021) [40]. MN was performed using the reference GNPS data analysis workflow [40]. Briefly, for spectral networks, parent mass and fragment ion tolerances of 0.01 and 0.02 Da, respectively, were used. Different parameters (cosine and minimum matched fragment ions) were evaluated to determine the best networking conditions. For edge construction, a cosine score above 0.70 was required. A minimum of four matching ions, at least two nodes in the top 10 cosine scores, and a maximum connected component size of 100 were considered for the analysis. Afterwards, the network spectra were searched against GNPS spectral libraries, considering scores above 0.70 and at least four matched ions. The chemical classification was carried out with the MolNetEnhancer GNPS tool, where the score represents the percentage of nodes within a molecular family that are attributed to a given chemical class [41]. Graphic visualization of the MN and of the GNPS spectral library matches was performed in Cytoscape 3.8.1 [42]. Manual dereplication was carried out by comparing UV-absorption maxima and HRMS-MS/MS data against the MS/MS data of 1-10. The annotation of these compounds was at confidence level 2 according to the Metabolomics Standards Initiative [43], with exact mass accuracy <5 ppm.
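To make the edge criteria concrete, the sketch below scores two MS/MS spectra with a plain cosine similarity after matching fragment ions within 0.02 Da, and accepts an edge only if the score exceeds 0.70 with at least four matched ions. It is a simplified stand-in, not the GNPS implementation: GNPS uses a modified cosine that also considers peaks shifted by the precursor mass difference, and the greedy peak matching, function names, and toy spectrum here are assumptions for illustration. A small helper for the ppm mass error is included for the <5 ppm criterion.

```python
import math


def _unit_norm(spectrum):
    """Scale intensities so the spectrum has unit Euclidean norm."""
    norm = math.sqrt(sum(i * i for _, i in spectrum))
    return [(mz, i / norm) for mz, i in spectrum]


def cosine_score(spec_a, spec_b, tol=0.02):
    """Return (score, matched_ions) for two spectra given as
    lists of (m/z, intensity) pairs. Each peak is used at most once;
    pairs are chosen greedily by decreasing intensity product."""
    a, b = _unit_norm(spec_a), _unit_norm(spec_b)
    candidates = sorted(
        (
            (ia * ib, x, y)
            for x, (mza, ia) in enumerate(a)
            for y, (mzb, ib) in enumerate(b)
            if abs(mza - mzb) <= tol
        ),
        reverse=True,
    )
    used_a, used_b = set(), set()
    score, matched = 0.0, 0
    for product, x, y in candidates:
        if x in used_a or y in used_b:
            continue
        used_a.add(x)
        used_b.add(y)
        score += product
        matched += 1
    return score, matched


def is_edge(spec_a, spec_b, min_cosine=0.70, min_matched=4, tol=0.02):
    """Apply the edge criteria: cosine above 0.70, four or more matched ions."""
    score, matched = cosine_score(spec_a, spec_b, tol)
    return score > min_cosine and matched >= min_matched


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical spectrum, for illustration only.
toy = [(100.00, 50.0), (150.05, 100.0), (200.10, 30.0), (250.15, 80.0)]
print(is_edge(toy, toy))      # True: identical spectra, four matched ions
print(is_edge(toy, toy[:3]))  # False: only three ions can match
```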

Conclusions
This work represents the first report of genome sequencing and annotation of a marine-facultative fungal strain isolated from Mexico. The genomic data and the secondary metabolite biosynthetic potential of this fungus were assessed through the prediction of over 10,000 putative genes and 67 BGCs. Our work provides additional insight into the biosynthetic pathway of dioxomorpholines, whose BGC is only partially known. Finally, metabolomic MN analysis allowed us to highlight the biosynthetic capability of the fungus and to contribute to the GNPS community by providing data on rare compounds.

Data Availability Statement:
The authors confirm that the data supporting the findings of this study are available within the article and its supplementary material.