Distribution and Evolution of Nonribosomal Peptide Synthetase Gene Clusters in the Ceratocystidaceae

In filamentous fungi, genes in secondary metabolite biosynthetic pathways are generally clustered. In the case of those pathways involved in nonribosomal peptide production, a nonribosomal peptide synthetase (NRPS) gene is commonly found as a main element of the cluster. Large multifunctional enzymes are encoded by members of this gene family that produce a broad spectrum of bioactive compounds. In this research, we applied genome-based identification of nonribosomal peptide biosynthetic gene clusters in the family Ceratocystidaceae. For this purpose, we used the whole genome sequences of species from the genera Ceratocystis, Davidsoniella, Thielaviopsis, Endoconidiophora, Bretziella, Huntiella, and Ambrosiella. To identify and characterize the clusters, different bioinformatics and phylogenetic approaches, as well as PCR-based methods were used. In all genomes studied, two highly conserved NRPS genes (one monomodular and one multimodular) were identified and their potential products were predicted to be siderophores. Expression analysis of two Huntiella species (H. moniliformis and H. omanensis) confirmed the accuracy of the annotations and proved that the genes in both clusters are expressed. Furthermore, a phylogenetic analysis showed that both NRPS genes of the Ceratocystidaceae formed distinct and well supported clades in their respective phylograms, where they grouped with other known NRPSs involved in siderophore production. Overall, these findings improve our understanding of the diversity and evolution of NRPS biosynthetic pathways in the family Ceratocystidaceae.


Introduction
Fungi produce an extensive variety of secondary metabolites (SMs) [1]. These small organic molecules differ from primary metabolites as they are not essential for growth in vitro. In fungi, one class of secondary metabolites, nonribosomal peptides (NRPs), are known to function in basic metabolism, as well as an array of other biological processes including cellular development and morphology, pathogenicity, and stress responses [1]. This diverse family of natural products has also received attention because of its medicinal uses as immunosuppressive drugs or antibiotics [2].
NRPs are small bioactive peptides that are not synthesized via the main ribosome-based translational process, but instead by serial condensation of proteinogenic and/or nonproteinogenic amino acids [3,4]. At least 500 different monomers, including hydroxy acids, fatty acids, and nonproteinogenic amino acids, have been identified as NRP building blocks [5]. These building blocks contribute to both the structural adaptability and the diversity of biological activities present among NRPs [5]. In filamentous
Davidsoniella australis isolate CMW2333 [28] and D. neocaledoniae isolate CMW26392 [34] were obtained from the culture collection (CMW) of the Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, South Africa. Both isolates were grown on malt extract agar (MEA; 2% (w/v) Bacto™ malt extract (BD BioSciences, San Jose, CA, USA) and 2% (w/v) Difco™ agar (BD BioSciences, San Jose, CA, USA)) at 25 °C for 14 days. High-quality DNA was extracted using a method employing CTAB (cetyl trimethylammonium bromide) [35], and sent for sequencing at the Central Analytical Facility (University of Stellenbosch, Stellenbosch, South Africa). For this purpose, the Ion-Torrent™ (Thermo Fisher Scientific, Johannesburg, South Africa) Ion S5™ system and Ion 530 Chip Kit were used to produce 400-nucleotide single reads. The raw sequence reads were filtered for quality and used for a de novo assembly in CLC Genomics Workbench v.11.0.1 (Qiagen Bioinformatics, Aarhus, Denmark) with default settings. All contigs longer than 500 bases were submitted to the nucleotide repository at the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov/genbank/). The completeness of all of the genomes included in this study was assessed using the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool [36].

Identification of NRPS Genes and Clusters
To identify contigs that might contain NRPS clusters, each of the Ceratocystidaceae target genomes was submitted to the fungal version of antiSMASH v.4.0 [37] and the fungal-specific tool SMURF [22]. In cases where a cluster was predicted across multiple contigs, a reference assembly was used to confirm that these belonged to a single continuous sequence. To do this, the raw reads of the genome in question were mapped to a complete reference locus obtained from a close relative. For comparison, the repertoire of NRPS genes encoded by other Sordariomycetes was also determined using the antiSMASH and SMURF online tools. Full genome sequences for 16 representative Sordariomycete species were obtained from the NCBI database and the MycoCosm genomics resource [38] of the Joint Genome Institute's fungal program [39] (Supplementary file S-1).
The Ceratocystidaceae contigs identified by antiSMASH and SMURF as putatively containing NRPS clusters were manually annotated. To do this, the identified and reconstructed contigs were annotated using the Web AUGUSTUS Service (http://bioinf.uni-greifswald.de/webaugustus/; [40]) based on Fusarium graminearum gene models with the default program parameters. The results were first analyzed for any nonribosomal peptide synthetase that could form the basis of a NRPS cluster by BLASTp searches against the GenBank database using the translated genes as query. Once this gene was identified, the predicted open reading frames (ORFs) present in 15 kilobases (Kb) of sequence upstream and downstream of the putative NRPS gene were searched for similarity to genes implicated in SM biosynthesis clusters using the results of the BLASTp search. The results of this analysis were used to infer the boundary of the NRPS cluster.
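The flanking-window step described above can be sketched as follows. This is a minimal illustration only: the `Orf` class, the function name, and all coordinates are hypothetical, and the study itself relied on AUGUSTUS predictions and BLASTp searches rather than on this code.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    name: str
    start: int  # 1-based start position on the contig
    end: int    # 1-based inclusive end position on the contig


def orfs_in_window(orfs, core, flank=15_000):
    """Return ORFs overlapping the region `flank` bp either side of the core gene."""
    lo = max(1, core.start - flank)
    hi = core.end + flank
    return [o for o in orfs if o is not core and o.end >= lo and o.start <= hi]


# Hypothetical contig layout; none of these coordinates come from the study.
core = Orf("NRPS", 40_000, 46_000)
orfs = [
    Orf("orf1", 10_000, 12_000),
    Orf("orf2", 26_000, 28_000),
    core,
    Orf("orf3", 50_000, 52_000),
    Orf("orf4", 70_000, 71_000),
]
candidates = orfs_in_window(orfs, core)
print([o.name for o in candidates])
```

Only the ORFs returned by such a filter would then be compared against known SM biosynthesis genes to infer the cluster boundary.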
To further characterize the similarities between the Ceratocystidaceae NRPS sequences and those previously described, we examined the domain structure of the genes present in the defined cluster. The domains in these genes were characterized using the InterPro Scan tool (https://www.ebi.ac.uk/interpro/; [41]), the PKS/NRPS analysis website (http://nrps.igs.umaryland.edu/nrps; [42]), MOTIF search (http://www.genome.jp/tools/motif/), and NCBI's Conserved Domain Database [43]. In addition, the PKS/NRPS analysis website was used to examine genes in the defined cluster for the signature domains characteristic of NRP biosynthesis enzymes [2,8]. A putative NRPS gene was confirmed if it possessed at least one module containing all three of the conserved domains (i.e., A, T, and C) [44]. The results of this analysis were also used to deduce the NRPS module organization.
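The confirmation criterion above can be expressed as a short check. The sketch below assumes that a module means a contiguous A-T-C run in the predicted domain order, which is our reading of the criterion rather than something the text states; the function name is hypothetical.

```python
def has_complete_module(domains):
    """Return True if the domain order contains at least one contiguous A-T-C run."""
    return any(
        domains[i:i + 3] == ["A", "T", "C"]
        for i in range(len(domains) - 2)
    )


# Domain orders are written with hyphens, as in the text.
print(has_complete_module("A-T-C-T-C".split("-")))
print(has_complete_module("T-C-T".split("-")))
```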

Confirmation of Gene Order and Annotation
The gene order and content of the identified Ceratocystidaceae clusters were verified using a PCR-based approach. For this purpose, DNA was extracted from isolates that were grown on medium containing MEA, using a DNeasy Plant Mini Kit (Qiagen, Carlsbad, CA, USA). A series of primers was designed to amplify the genes and intergenic sequences for six representatives of the family (Ceratocystis manginecans, CMW17570; Thielaviopsis musarum, CMW1546; Endoconidiophora polonica, CMW20930; Huntiella bhutanensis, CMW8217; Davidsoniella virescens, CMW17339; and Bretziella fagacearum, CMW2656). To design the primers, each identified gene was submitted to the online primer design software Primer3web [45] to obtain a forward and a reverse primer. Details regarding the primers as well as the PCR conditions used are presented in Supplementary file S-2. Amplicons were purified with Sephadex G50 columns (Sigma-Aldrich, Modderfontein, South Africa), and sequenced using the BigDye Terminator 3.1 cycle sequencing premix kit (Applied Biosystems, Foster City, CA, USA). The products were analyzed on an ABI PRISM 3300 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) at the Bioinformatics and Computational Biology Unit of the University of Pretoria. Consensus sequences were constructed with CLC Main Workbench v.9.1.1 (QIAGEN Bioinformatics, Aarhus, Denmark).
The annotations of the genes and clusters were evaluated using RNA-Seq data available from previous studies on H. moniliformis and H. omanensis [46]. The respective sets of sequence reads were quality filtered in CLC Genomics Workbench based on Phred quality scores, and reads with a score below 20 (Q ≤ 0.01) were discarded as described previously [46]. These data were then mapped to the original contigs containing NRPS gene clusters, by making use of the RNA-legacy tool in CLC Genomics Workbench using minimum similarity fractions of 0.8 and minimum length fractions of 0.5.
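For readers unfamiliar with Phred scores, the relationship between a quality score Q and the base-calling error probability P is P = 10^(-Q/10), so a score of 20 corresponds to an error probability of 0.01. The following sketch shows the conversion in both directions. The function names are hypothetical, and this is not the filtering routine used by CLC Genomics Workbench.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the base-calling error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert a base-calling error probability to a Phred quality score."""
    return -10 * math.log10(p)


print(phred_to_error_prob(20))
print(error_prob_to_phred(0.01))
```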

Phylogenetic Analysis of the Mono- and Multimodular NRPS Genes
The protein sequences of the A domain of the mono- and multimodular NRPS genes were used as a proxy to examine the evolutionary relationships of the putative NRPS orthologs. This is because the A domain represents the most conserved domain in fungal NRPS genes [47]. For this, datasets including sequences identified in this study, as well as those obtained from NCBI's protein database, were subjected to phylogenetic analyses. The latter were identified by performing BLASTp searches against the GenBank database using the identified Ceratocystidaceae A domain NRPS proteins as query. Proteins identified with an expect value (E) ≤ 0.000001 were used in the analysis. Datasets were aligned with MAFFT (multiple alignment using fast Fourier transform) [48] using the E-INS-i function to allow for alignment with iterative refinement. Individual datasets were then used to construct neighbor-joining and maximum likelihood phylogenies based on suitable protein substitution models (LG + I + G + F for monomodular NRPS genes and JTT + I + G for multimodular NRPS genes) by making use of MEGA v.7 [49]. Branch support was estimated using bootstrap analysis of 1000 pseudoreplicates and the same model parameters.
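The E-value cutoff described above can be illustrated with a small filter. The sketch assumes hits exported in the default 12-column BLAST+ tabular layout, in which the E-value is the eleventh field; that export format, the function name, and the example hits are all assumptions for illustration, as the text does not say how the searches were run or parsed.

```python
def filter_hits(lines, max_evalue=1e-6):
    """Keep (subject_id, evalue) pairs whose E-value is at or below the cutoff."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])  # eleventh column: E-value
        if evalue <= max_evalue:
            kept.append((fields[1], evalue))
    return kept


# Hypothetical hits; identifiers and scores are invented for illustration.
example = [
    "query1\thitA\t95.0\t400\t20\t0\t1\t400\t1\t400\t1e-120\t700",
    "query1\thitB\t30.0\t150\t100\t5\t1\t150\t10\t160\t0.002\t45",
    "query1\thitC\t45.0\t380\t200\t4\t5\t385\t3\t380\t1e-06\t120",
]
print([subject for subject, _ in filter_hits(example)])
```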

Fe-CAS Blue Agar Test
The universal siderophore assay using the iron-chrome azurol S (Fe-CAS) dye complex was performed as described by Milagres et al. [49,50]. This assay was used to test the ability of fungi in the Ceratocystidaceae to produce iron-binding complexes in the medium. For this purpose, the dye solution was made by dissolving 36.5 mg hexadecyltrimethyl-ammonium bromide (HDTMA) in 20 mL distilled water, and then gradually supplementing it with 25 mL of CAS solution (1.21 g/L) and 10 mL of an iron solution (containing 1 mM FeCl3·6H2O, 10 mM HCl). The dye solution was then added to 375 mL water containing piperazine-N,N′-bis(2-ethanesulfonic acid) (PIPES; 40 g/L; pH 6.8) and 20 g/L agar for the preparation of Fe-CAS blue agar medium in Petri plates. To prepare the Fe-CAS blue assay plates, half of the contents of a Petri dish containing MEA growth medium was removed and replaced with Fe-CAS blue agar medium. The tested fungal strains were inoculated onto the MEA half of the plate and incubated at 25 °C for three weeks in the dark until a colour change was observed in the Fe-CAS-containing half of the assay plates.
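As a quick check of the quantities in this recipe, the short calculation below derives the dye solution volume, the mass of CAS it contains, and the approximate final volume of the medium. It assumes that volumes are additive on mixing, which is only an approximation, and the variable names are ours.

```python
# Quantities as stated in the recipe above.
HDTMA_WATER_ML = 20.0   # water used to dissolve the HDTMA
CAS_ML = 25.0           # volume of CAS solution
CAS_G_PER_L = 1.21      # concentration of the CAS solution
IRON_ML = 10.0          # volume of iron solution
BUFFER_ML = 375.0       # PIPES/agar solution

# Assumption: volumes are additive on mixing.
dye_volume_ml = HDTMA_WATER_ML + CAS_ML + IRON_ML
total_volume_ml = dye_volume_ml + BUFFER_ML

# mL multiplied by g/L gives mg.
cas_mass_mg = CAS_ML * CAS_G_PER_L

print(f"Dye solution: {dye_volume_ml:.0f} mL")
print(f"CAS in dye solution: {cas_mass_mg:.2f} mg")
print(f"Final medium: {total_volume_ml:.0f} mL")
```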

Genome Assemblies
The quality-filtered Ion-Torrent sequence libraries for D. australis and D. neocaledoniae consisted of 4,353,100 and 3,307,125 reads, respectively. These were assembled into genomes of 38.6 Mb and 35.3 Mb, respectively. All contigs over 500 bp were deposited at DDBJ/ENA/GenBank under the accessions RHLR00000000 for D. australis and RHDR00000000 for D. neocaledoniae, with the respective versions described in this paper being RHLR01000000 and RHDR01000000. Based on the BUSCO analysis, these assemblies, as well as those obtained from NCBI and other sources, showed high levels of completeness (>96%), thus allowing for meaningful genomic comparisons (Table 1).

Identification of NRPS Genes and Clusters
With the aid of SMURF and antiSMASH, contigs containing putative NRPS biosynthesis gene clusters were identified in both the Ceratocystidaceae and other Sordariomycetes genomes examined (Figure 1). Two putative NRPS biosynthesis gene clusters were identified in all the Ceratocystidaceae genomes examined, which included one multimodular and one monomodular NRPS gene cluster (Supplementary files S-3 and S-4). In two of the Davidsoniella species, namely D. australis and D. neocaledoniae, the genes of the monomodular clusters occurred on more than one contig. However, reference assemblies using the raw reads of these two fungi against the D. virescens assembly suggested that these genes are part of a single cluster in both species. The consensus sequences for the clusters in D. neocaledoniae and D. australis have been deposited in NCBI with the accession numbers MK694917 and MK694918, respectively. When compared with other Sordariomycetes, the Ceratocystidaceae fall at the lower end of the spectrum with regard to the number of NRPS biosynthesis gene clusters encoded. Among the fungi considered, Chaetomium globosum (11 clusters) and Diaporthe longicolla (nine clusters) had the greatest number of such clusters.

The gene content and order of the putative monomodular NRPS gene cluster were identical across all the Ceratocystidaceae genomes, and the cluster appeared to represent an extracellular-type siderophore biosynthesis cluster (Supplementary files S-3 and S-4). This is because the putative NRPS gene in this cluster shared high similarity (≥90% sequence similarity) with genes in the extracellular-type siderophore biosynthesis clusters of Colletotrichum, Fusarium, Trichoderma, and Metarhizium (Supplementary file S-5). In addition to the NRPS gene, this cluster also contained genes coding for a siderophore biosynthesis protein, a siderophore transporter, and an ABC-transporter (Figure 2).
The putative multimodular NRPS gene cluster identified in the Ceratocystidaceae genomes (Supplementary files S-3 and S-4) also appeared to represent a siderophore biosynthesis cluster. This is because the NRPS gene located in this cluster shared very high similarity (96-99%) with one encoding a ferricrocin synthetase [20]. The cluster contained three complete NRPS modules, resembling those of other known siderophore synthetases, specifically those of Fusarium oxysporum, Scedosporium apiospermum, and Colletotrichum gloeosporioides (Supplementary file S-5). It was also flanked by a gene encoding an L-ornithine N5-monooxygenase with high similarity (88-99%) to that encoded near the ferricrocin biosynthesis cluster of Colletotrichum gloeosporioides. The Ceratocystidaceae cluster comprised genes encoding an NRPS, an L-ornithine N5-monooxygenase, an endothiapepsin, and a transcription factor, as well as two hypothetical protein-coding genes (Figure 3).
Conserved domain analysis showed that the domain organization of the monomodular NRPS gene was highly conserved in all the Ceratocystidaceae genomes examined (see Figures 4 and 5). These genes contained the adenylation, thiolation, and condensation domains in the order A-T-C-T-C, similar to the NRPS genes reported for Trichoderma virens (XP_013952882) and Metarhizium anisopliae (XP_014549678). In contrast to the monomodular NRPS genes, the domains in the multimodular NRPS gene of the Ceratocystidaceae occurred in the order A-T-C-A-T-C-T-C-A-T-C-T-C-T (or slight variations of this). This was similar to those of Colletotrichum orbiculare (NCBI accession number ENH79738) and Trichoderma reesei (NCBI accession number XP_006969410). The domain structure in the Ceratocystidaceae multimodular NRPS genes was typical of the genes involved in the biosynthesis of ferricrocin-type siderophores (i.e., ferrichrome synthetase group) [51]. However, the Ceratocystidaceae multimodular NRPS genes differed in that they had a T-C next to the second A-T-C module, while the ferrichrome synthetase genes examined previously only had three complete A-T-C modules and a terminal T-C repeat [52].
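The domain orders reported here can be broken into units to make the module counts explicit. The sketch below uses a simple rule of our own: a new unit begins at every A domain, and at every T domain that does not directly follow an A domain. It is an illustration of the reported architectures, not the procedure used by the domain prediction tools.

```python
from collections import Counter


def split_units(order):
    """Split a hyphen-separated domain order into A-T-C modules and partial units."""
    units = []
    for domain in order.split("-"):
        # A new unit starts at an A domain, or at a T domain not preceded by A.
        starts_new = domain == "A" or (
            domain == "T" and (not units or units[-1][-1] != "A")
        )
        if starts_new or not units:
            units.append([domain])
        else:
            units[-1].append(domain)
    return units


multimodular = "A-T-C-A-T-C-T-C-A-T-C-T-C-T"
units = split_units(multimodular)
complete_modules = sum(unit == ["A", "T", "C"] for unit in units)
domain_counts = Counter(multimodular.split("-"))

print(["-".join(unit) for unit in units])
print(complete_modules, dict(domain_counts))
```

Applied to the multimodular order, this yields three complete A-T-C modules, two T-C units without an A domain, and a terminal T domain.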

Confirmation of Gene Order and Annotation
PCR amplification and sequencing of the genes and intergenic regions confirmed the gene content and order within each of the two clusters (Supplementary file S-2). In these analyses, lengths and sequences of the amplicons corresponded with those predicted from the respective assemblies. Analysis of the mapped RNA-seq data of H. moniliformis and H. omanensis confirmed the annotation of the clusters and genes while also confirming that these genes were expressed (Supplementary file S-6).

Figure 5. Phylogenetic relationships of taxa based on the inferred protein sequences of A domains of the multimodular NRPS genes examined. The panel to the right indicates the domain structure of the NRPS genes in the respective clusters. The tree was generated with the neighbor-joining (NJ) method in MEGA v.7 using the JTT + I + G model. Similar groupings were obtained using maximum likelihood (ML) analysis and the same model parameters. Branch support is indicated at the internodes (>50% bootstrap values based on 1000 replicates) in the order NJ/ML.

Phylogenetic Analysis of the Mono- and Multimodular NRPS Genes
The phylogenetic analysis revealed that the A domains of both the mono- and multimodular NRPS genes predicted in the genomes of Ceratocystidaceae formed distinct and well supported clades in their respective phylograms (Figures 4 and 5). In the case of the monomodular NRPS (Figure 4), the Ceratocystidaceae clustered with the A domains of monomodular NRPS sequences from other Ascomycetes (e.g., Trichoderma virens (NCBI accession number XP_013952882) and Metarhizium anisopliae (NCBI accession number XP_014549678)) that are known to be involved in the production of siderophores [51].
For the multimodular NRPS domain A phylogeny (Figure 5), sequences separated into three groups, where each contained the sequences for one of the Ceratocystidaceae NRPS A domains, together with those of fungal NRPS genes that have been functionally shown to produce siderophores [51,53]. For example, the first A domain grouped with the first A domain in the NRPS genes of Metarhizium rileyi (NCBI accession number OAA51510), Trichoderma citrinoviride (NCBI accession number XP_024748869), Trichoderma reesei (NCBI accession number XP_006969410), Fusarium culmorum (NCBI accession number PTD04828), Valsa mali (NCBI accession number KUI59087), and Colletotrichum orbiculare (NCBI accession number ENH79738).

Fe-CAS Blue Agar Test
Based on our Fe-CAS blue assay, all of the Ceratocystidaceae isolates examined produced iron-binding compounds (Table 2). All the isolates grew normally on the MEA half of the medium, but produced characteristic pink to purple halos when they reached the Fe-CAS blue half of the medium. This indicated that the growth of the fungus was associated with removal of Fe from the Fe-CAS containing medium [49]. However, the individual species differed in the strength of their iron-binding ability, as a range of colour changes (purple, pink, or purplish-red) was observed.
Table 2. Evaluation of siderophore production by different Ceratocystidaceae representatives included in this study using the Fe-CAS (iron-chrome azurol S) blue agar universal test [49].

Discussion
This study is the first to explore the repertoire of NRPS biosynthetic gene clusters in the Ceratocystidaceae. The use of genome data, together with various bioinformatic tools, facilitated the identification of two such clusters in these fungi. In this regard, Ceratocystidaceae falls on the lower end of the spectrum of fungi encoding NRPS biosynthetic gene clusters [54]. The genomes of fungi such as Colletotrichum gloeosporioides and Diaporthe longicolla contain many more genes and clusters potentially responsible for the production of NRPs (e.g., they respectively encode 11 and nine putative NRPS genes).
The two NRPS biosynthetic gene clusters identified in the Ceratocystidaceae likely encode siderophores. This is mostly based on the similarity of the core gene sequences of the clusters to those of other known NRPSs, but gene knockout studies are needed for verification. Siderophores are low molecular weight, iron-binding molecules that enable iron uptake in microorganisms [55,56]. Most fungi utilize NRPS to produce hydroxamate-type siderophores that share the structural unit N5-acyl-N5-hydroxyornithine (i.e., fusarinines, coprogens, and ferrichromes) [57,58]. By contrast, certain Zygomycetes and bacteria utilize NRPS-independent mechanisms to produce polycarboxylate siderophores [59–61]. However, all known siderophores of the Ascomycetes are produced by NRPSs [60].
One of the predicted Ceratocystidaceae NRPS genes is monomodular and the other is multimodular. Different from other multimodular NRPSs that usually terminate with a condensation-like domain involved in releasing the final peptide, the Ceratocystidaceae multimodular NRPS gene terminates with a C-T domain. This is similar to what has been found for Penicillium chrysogenum (Gene ID, Pc13g05250) [16]. However, previous studies have also shown that the occurrence of C and T domains at the carboxy terminus may be characteristic of iterative NRPS [47]. Indeed, C and T domains were often found at the C terminus of the majority of Aspergillus and Penicillium NRPSs [47,62]. With regards to monomodular NRPS genes, the one encoded by the Ceratocystidaceae terminates with a C, while those of other fungi typically have a T terminal domain [25]. Interestingly, this domain architecture is also identified at the carboxy terminus of the Ustilago maydis NRPS gene, and the protein product functions to close the growing tripeptide ring of the siderophore [63].
The monomodular NRPS gene identified in the Ceratocystidaceae is likely involved in the production of an extracellular siderophore. This is because the putative NRPS gene existing on this cluster shared ≥90% sequence similarity to those present in the extracellular-type siderophore biosynthesis clusters of other Ascomycetes (e.g., Fusarium, Metarhizium, and Trichoderma) [53,64]. Along with the NRPS gene, this cluster is also comprised of other important genes related to extracellular siderophore production (e.g., genes encoding a siderophore transporter and an ABC-transporter). A homologous cluster has been reported in various fungi including Trichoderma virens, Metarhizium anisopliae, Sordaria macrospora, and Fusarium graminearum [53,64]. In T. virens, the same cluster is responsible for the biosynthesis of an extracellular siderophore involved during host infection [51], and gene deletion studies revealed its capacity to produce many additional secreted siderophores [53]. During plant infection, extracellular siderophores are thought to supply the fungus with iron as an essential nutrient in planta, thereby enhancing its virulence [65,66]. These types of siderophores are often implicated in pathogenicity or virulence of fungal pathogens [67].
The multimodular NRPS of the Ceratocystidaceae is likely responsible for the production of an intracellular siderophore. This gene is highly conserved among Ascomycetes and encodes a ferricrocin synthetase [68]. The first step in intracellular siderophore biosynthesis is the formation of N5-hydroxy-L-ornithine through N5-hydroxylation of L-ornithine [20]. This reaction is catalyzed by an L-ornithine N5-monooxygenase. In most cases, the gene encoding this enzyme is located in the vicinity of an NRPS gene [20,69], and in the Ceratocystidaceae genomes examined, a homolog of it was located next to the NRPS gene. Similarly, in the genomes of F. pseudograminearum and Verticillium dahliae, homologs of these two genes are positioned next to each other [20]. The role of this cluster in ferricrocin biosynthesis has been confirmed experimentally [54]. Knockout studies in A. fumigatus of the gene encoding L-ornithine N5-monooxygenase revealed that the transformants were unable to synthesize ferricrocin and fusarinine C siderophores [66,70]. Ferricrocin siderophores appear to be essential for the growth and survival of many fungi. In Cochliobolus heterostrophus and Fusarium graminearum, ferricrocin has also been shown to be essential for sexual development [71]. Additionally, in Trichoderma virens ferricrocin is needed for conidial development [72], and in A. fumigatus it is an intracellular iron transporter involved in the distribution of iron during cellular development [73].
Similar to other NRPS enzymes, the Ceratocystidaceae NRPS contains the three domains (A, T, and C) required for peptide bond formation [2,8]. In some cases, T-C units lacking an A domain may be functional by interacting with nonadjacent A domains [74]. Such T-C units may be created either through independent duplication of T-C units or through loss of the associated A domain from a complete A-T-C module. Our results are congruent with the hypothesis of Schwecke and colleagues [74] that the ferrichrome synthetase has a hexamodular origin, with six complete A-T-C units followed by the loss of separate A domains [52]. For example, the Ceratocystidaceae multimodular NRPSs have six T domains but only three A domains. To the best of our knowledge, the mechanisms regulating iterative usage of NRPS domains are unknown. It is also likely that proteins with similar domain structures produce very similar secondary metabolites.
Given that Ceratocystidaceae likely encode two NRPS-dependent siderophores, our results correspond to those shown for F. graminearum, which also encoded one intracellular and one extracellular siderophore [65,71]. Various aspects of our results also point towards biological functionality of the two NRPS biosynthetic gene clusters found in Ceratocystidaceae. These NRPS genes are highly conserved and have high similarity to those of fungi known to produce siderophores, which would be expected for loci involved in essential biological functions [52,74]. The same is also true for their overall cluster structures and gene content. Additionally, we have found evidence in previously published RNA-seq data [46] that almost all of the genes in both of the clusters are expressed in at least two members of the family. Finally, isolates of Ceratocystidaceae exhibited siderophore-like activity by binding Fe when they were grown on medium containing the Fe-CAS dye complex [50]. Our future research would therefore seek to reveal the roles of NRPS genes and clusters in the overall biology, ecology, and host-associations of the Ceratocystidaceae.

Supplementary Materials:
The supplementary material is available at http://www.mdpi.com/2073-4425/10/5/328/s1.
Supplementary file S1. Genome sequence information for the 16 Sordariomycetes representatives included in this study. Secondary metabolite unique regions finder (SMURF; www.jcvi.org/smurf/; 22) was used to predict NRPS genes from the Sordariomycetes genomes available from the Joint Genome Institute (JGI; https://jgi.doe.gov/our-science/science-programs/fungal-genomics/) and the National Center for Biotechnology Information (NCBI; http://blast.ncbi.nlm.nih.gov/).
Supplementary file S2. To confirm the order of genes within the different NRPS clusters identified, a PCR-based approach was used. For each cluster type, primers were designed that allow amplification of individual genes, as well as the regions between them. Correlation between predicted and observed fragment sizes was used as evidence that the specific cluster was correctly assembled.
Supplementary file S3. BLAST hits for putative Ceratocystidaceae NRPS biosynthetic cluster genes. For confirmation of the antiSMASH results, we utilized secondary metabolite unique regions finder (SMURF; www.jcvi.org/smurf/; 22). For this purpose, genes located within 15 kb upstream and downstream of the identified NRPS genes were retrieved and submitted to the BLASTp server at the National Center for Biotechnology Information (NCBI; ftp://ftp.ncbi.nih.gov/blast/) for identification.
Supplementary file S4. The tables show the nonribosomal peptide synthetase (NRPS) gene clusters predicted by SMURF (secondary metabolite unique regions finder) (22; 37).
Supplementary file S5. Top 10 BLASTp hits for each NRPS sequence. The BLASTp search was done against the non-redundant protein sequences in the National Center for Biotechnology Information database (NCBI; http://blast.ncbi.nlm.nih.gov/), and the top 10 hits are indicated with the species in which they are found, E-value, percent sequence identity, and coverage.
Supplementary file S6. Mapping of RNA reads to different genes of the NRPS gene clusters of C. fimbriata, H. moniliformis, and H. omanensis. This included the genes of the monomodular NRPS gene clusters: A, hypothetical protein; B, nonribosomal peptide synthetase; C, siderophore transporter; D, siderophore biosynthesis; E, oxidoreductase; F, ABC transporter; and G, transporter. It also included mapping of RNA reads to different genes of the multimodular NRPS gene clusters of C. fimbriata, H. moniliformis, and H. omanensis: A, nonribosomal peptide synthetase; B, ornithine monooxygenase; C, endothiapepsin; D, RNA polymerase transcription subunit; and E, aldehyde dehydrogenase. Reads in green and red represent forward and reverse orientation, respectively.

Funding: Funding was received from the University of Pretoria, the South African National Research Foundation (NRF) and the South African Department of Science and Technology (DST) via the Centers of Excellence program (Center of Excellence in Tree Health Biotechnology) and the South African Research Chairs Initiative. The grant holders acknowledge that opinions, findings, and conclusions or recommendations expressed in publications generated by NRF-supported research are those of the authors, and that the NRF accepts no liability whatsoever in this regard.