Comparison of Expression of Secondary Metabolite Biosynthesis Cluster Genes in Aspergillus flavus, A. parasiticus, and A. oryzae

Fifty-six secondary metabolite biosynthesis gene clusters are predicted in the Aspergillus flavus genome. Despite this, the biosyntheses of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, have been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in gene clusters for the closely related fungi A. parasiticus, A. oryzae, and A. flavus S and L sclerotial morphotypes. The data help to refine the identification of probable functional gene clusters within these species. Our results suggest that A. flavus, a prevalent contaminant of maize, cottonseed, peanuts and tree nuts, is capable of producing metabolites besides aflatoxin that could be underappreciated contributors to its toxicity.


Introduction
Biosynthesis of many fungal secondary metabolites, including mycotoxins, typically requires enzymes encoded by sets of clustered genes [1]. With the availability of full genome sequences, genes can be associated with secondary metabolite biosynthesis by use of the software program SMURF [2]. This program allows automated search of the genome to identify sets of contiguous genes that include a "backbone" gene encoding a protein required for biosynthesis of a metabolite precursor [3], a transcription factor for regulation of gene expression, oxidases or reductases for modification of the metabolite precursor, and transporters for export or for moving the metabolite to vacuoles or vesicles within the cell [3,4]. For secondary metabolite formation, typical backbone enzymes include non-ribosomal peptide synthases (NRPSs), polyketide synthases (PKSs) [5,6] or geranylgeranyl pyrophosphate synthases (GGPSs) [7] for one or more of the biosynthesis steps. Also characteristic of some NRPS-derived metabolites is a step involving tryptophan prenylation, which is catalyzed by a cluster-associated dimethylallyltryptophan synthase (DMATS) [8]. The ability of fungi to coordinately regulate transcription of clustered genes usually depends on a single sequence-specific DNA-binding protein of the Zn2Cys6 type unique to a given cluster [9]. Expression of genes controlled by such transcription factors should define the boundaries of the gene cluster [10]. A recently described method that combines SMURF with microarray expression analysis could also help to better define the boundaries of secondary metabolite biosynthesis clusters [11].
In the present study, expression analysis by RNA-seq was performed on two sclerotial size variants of A. flavus (called S and L strains) and the non-aflatoxigenic variant, A. oryzae. These A. flavus variants are morphologically and phylogenetically distinct [12]. Analysis was also done on A. parasiticus, a close relative of A. flavus that produces G- in addition to B-aflatoxins. Although RNA-seq data were available for isolates of an A. flavus L strain and A. oryzae [13][14][15], they were not available for an S strain A. flavus or for A. parasiticus. The comparison of RNA-seq data described in this paper evaluates the potential of these fungi to produce secondary metabolites when grown on a typical fungal growth medium. Identifying which clusters are expressed is the first step toward rational assignment of a biosynthetic gene cluster to production of a specific metabolite.

Types of Backbone Genes
The gene clusters for secondary metabolism in A. flavus NRRL3357 previously identified by SMURF [16] were used for identification and annotation of homologous clusters in the related species: A. parasiticus, two variant A. flavus S strain isolates and A. oryzae. Putative backbone genes for gene clusters identified in A. flavus NRRL3357 are given in Tables 1-3. The PKS-encoding backbone genes in Table 1 are arranged by the types of proteins they are predicted to encode. Those encoding polyketide synthases with reducing domains are distinguished from those encoding proteins that lack such domains. The NRPS genes in Table 2 are divided into those predicted to encode proteins with repeated condensation (C) domains and those predicted to encode proteins with single or no C domains. For both types of secondary metabolite, putative PKSs and NRPSs with only a single, or at most two, catalytic domains are listed separately. Genes for clusters 23 and 55 are predicted to encode a single polypeptide containing both PKS and NRPS catalytic domains. In Tables 1 and 2, transcription factors associated with the putative gene clusters are listed separately. Only some of the gene clusters contain transcription factors within the putative cluster [10]. Gene clusters containing genes encoding GGPSs and DMATSs are listed in Table 3. One secondary metabolite whose biosynthesis has recently been studied, kojic acid, is derived from glucose [17]. Because of this difference in biosynthesis, it is not shown in these lists or in Table S1.

Table notes (beginning not recovered): [...] are shown in bold font; d not found: BLASTN search did not give hits with E value below 1e-10 and a percent identity above 80%.

Notes: a aa-length in amino acids; b Domains: KS-ketosynthase; AT-acyltransferase; DH-dehydratase; ER-enoyl reductase; KR-ketoreductase; PP-phosphopantetheine attachment site; M-methyltransferase; TE-thioesterase; A-adenylation; C-condensation; T-thiolation; R-reductase; SDR_e1-short-chain dehydrogenases/reductase; FabG-3-oxoacyl-(acyl-carrier-protein) reductase; CaiC-carnitine CoA ligase; NADB-NAD-binding; c RPKM values were determined for cultures grown for 40 h on PDA medium; d not found-tBlastX search did not give hits with E value = 0.

Comparison of Putative Secondary Metabolite Clusters from A. oryzae, A. flavus S and L morphotype Isolates and A. parasiticus
Tables 1-3 compare secondary metabolite backbone genes in the SMURF-identified gene clusters in A. flavus NRRL3357 [16] with homologs in the other isolates. Homologs were determined by reciprocal best hit BLASTN search against the GenBank database for A. flavus NRRL3357. Additionally, we selected only the BLAST hits that had an expect (E) value below 1e-10 and a percent identity above 80%. By these criteria, the PKSs encoded by genes in clusters 23, 33, 36, 38, 40, 43, and 49 were not identified in the A. parasiticus genome and PKSs in clusters 36 and 43 were not identified in A. oryzae (Table 1). Of the A. flavus NRPS backbone genes, those in clusters 4, 7, 22, 28, 48 and 53 were not identified in A. parasiticus, those in clusters 34, 48, and 53 were not identified in AF70, and those in clusters 9 and 48 were not identified in A. oryzae (Table 2). The GGPS gene associated with cluster 22 was not identified in A. parasiticus (Table 3). NRPS, PKS, DMATS and GGPS genes that were not recognized by SMURF as being in a secondary metabolite gene cluster in A. flavus NRRL3357 are shown in Table 4 with their putative homologs in the other isolates. Some of these genes may be in as yet unrecognized secondary metabolite biosynthesis clusters. While many of these genes are present in all isolates, seven are found only in A. parasiticus. These may represent genes encoding biosynthesis of metabolites unique to A. parasiticus. Supplementary Table S2 lists the genes surrounding some of these backbone genes.
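The homolog criteria described above (reciprocal best hit, E value below 1e-10, percent identity above 80%) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. It assumes that BLAST hits have already been parsed into simple records and that the "best" hit for a query is the passing hit with the highest bit score, a ranking rule the text does not specify. All gene identifiers and the names Hit, best_hits and reciprocal_best_hits are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical record layout)."""
    query: str
    subject: str
    percent_identity: float
    evalue: float
    bitscore: float


# Cutoffs stated in the text.
E_VALUE_CUTOFF = 1e-10
MIN_PERCENT_IDENTITY = 80.0


def passes_cutoffs(hit: Hit) -> bool:
    """Keep hits with E value below 1e-10 and identity above 80%."""
    return hit.evalue < E_VALUE_CUTOFF and hit.percent_identity > MIN_PERCENT_IDENTITY


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Map each query to its highest-bit-score hit that passes the cutoffs.

    Ranking by bit score is an assumption made for this sketch.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> Dict[str, str]:
    """Return query -> subject pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    pairs: Dict[str, str] = {}
    for query, hit in fwd.items():
        back = rev.get(hit.subject)
        if back is not None and back.subject == query:
            pairs[query] = hit.subject
    return pairs


# Hypothetical hits: reference genome genes searched against one isolate, and back.
forward = [
    Hit("ref_gene_1", "iso_gene_9", 95.0, 1e-50, 900.0),
    Hit("ref_gene_1", "iso_gene_3", 82.0, 1e-20, 300.0),
    Hit("ref_gene_2", "iso_gene_4", 70.0, 1e-30, 400.0),  # fails the identity cutoff
    Hit("ref_gene_3", "iso_gene_5", 90.0, 1e-40, 500.0),  # not reciprocal
]
reverse = [
    Hit("iso_gene_9", "ref_gene_1", 95.0, 1e-50, 900.0),
    Hit("iso_gene_4", "ref_gene_2", 70.0, 1e-30, 400.0),
    Hit("iso_gene_5", "ref_gene_7", 91.0, 1e-45, 600.0),
]
pairs = reciprocal_best_hits(forward, reverse)
print(pairs)  # {'ref_gene_1': 'iso_gene_9'}
```

In this sketch a reference gene with no entry in the returned mapping would correspond to a "not found" entry in the tables, either because no hit passed the cutoffs or because the best hit was not reciprocal.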

RNA-seq Analyses
To determine which backbone genes are actively transcribed, we grew the fungi for RNA-seq analysis on PDA, a medium previously found to stimulate production of a wide variety of fungal secondary metabolites, including the aflatoxins [18]. RNA-seq RPKM values are given in Tables 1-4 and in Supplemental Tables S1 and S2. For the purpose of comparison of these data, we consider that an RPKM value less than 1 represents, at most, only a low level of expression, whereas an RPKM value greater than 1 represents detectable expression. Based on these criteria, the RPKM values shown in Tables 1 and 2 suggest that under our growth conditions, only half of the 29 PKSs and 26 NRPSs for any one isolate can be considered to be expressed. In some cases, the backbone genes that were expressed in the different isolates had markedly different RPKM values. The most prominent differences were found for PKSs in clusters 5, 38, 46, and 52 (Table 1) and for NRPSs in clusters 21, 26, 37, and 55 (Table 2). Some of the backbone genes not previously assigned to gene clusters (Table 4) have RPKM values >1 and could potentially encode secondary metabolite biosynthesis enzymes. A. flavus CA42, an S strain isolate similar to AF70 (shown only in Tables 3 and S1), gives much higher RPKM values when grown for 168 h than when grown for only 40 h for the PKS genes in clusters 1, 27 and 39, the NRPS genes in clusters 12, 23, 25, 35, 37 and 55, and the DMATS and GGPS genes for aflatrem production in clusters 15 and 32. At these longer times, S strain A. flavus isolates produce abundant sclerotia. It is possible that timing of expression for some of the gene clusters is coordinated with sclerotial production and that the associated metabolites accumulate preferentially in sclerotia.
In support of this conjecture, we found in a separate study that aflatrem was produced abundantly by both S strain isolates only when sclerotia were formed (Ehrlich and Diana Di Mavungu, unpublished results), and under these conditions the genes for aflatrem biosynthesis (in clusters 15 and 32) were expressed with high RPKM values. Also, the gene for the cluster 27 PKS, which was shown to be necessary for most sclerotial pigmentation [19], is highly expressed only in cultures undergoing sclerotial formation (A. flavus CA42 in Table S1). Several of the non-reducing PKS genes that are differentially expressed in the different isolates, for example those in clusters 5, 36, 39 and 42, are predicted from homology to genes in other fungi [20] to be associated with production of polyketides required for pigment formation. The gene for the DMATS in cluster 19 was expressed at a high RPKM level in most isolates, while the GGPS of cluster 37 (an NRPS cluster) was expressed at the highest level in NRRL3357.
These data show that combining RNA-seq analysis of secondary metabolite gene expression with SMURF-derived tabulation of putative backbone biosynthetic genes and their clustered common decorating genes provides an accurate way to assess which secondary metabolite biosynthesis gene clusters are expressed under a given set of growth conditions. However, it is possible that, even if the genes in a cluster are expressed, the resulting protein(s) may not be functional. Most of the PKS and NRPS genes listed in Tables 1 and 2 as short sequences that encode only one or two domains of a PKS or NRPS gave no or low RPKM values in our study. The exceptions were the putative ketosynthase and enoyl reductase genes in cluster 36, the ketosynthase genes in clusters 7 and 8, and the epimerase gene in cluster 7 (Tables 1 and 2). While these backbone genes are annotated in the databases as PKS- or NRPS-encoding genes, such genes are usually quite large and encode multifunctional enzymes [5,6]. It is possible that for some of these clusters the genes were not annotated correctly in the database and that neighboring sequence should be included in establishing the identity of these protein-coding regions. However, given the lack of expression of most of these genes and their abnormal size, it is likely that such gene clusters, by themselves, do not encode proteins involved in formation of a secondary metabolite.
To prove that a gene cluster actually is involved in biosynthesis of a particular metabolite produced by these closely related Aspergilli (for a list of metabolites known to be produced by the isolates examined, see Supplemental Table S3), gene knockout and add-back experiments must be done to show that the mutant loses and then regains, respectively, the ability to produce the metabolite. Such gene knockout experiments have been done, so far, to confirm the roles of clusters 15 and 32 in production of aflatrem [7], clusters 35 and 48 in production of two related piperazines [21], cluster 27 in production of asparasone [19], cluster 54 in production of aflatoxin [22], and cluster 55 in production of cyclopiazonic acid [8]. In studies of A. flavus, A. oryzae and A. parasiticus, about 20 different classes of metabolites have been isolated from culture extracts [18,23]. Because the types of backbone biosynthetic enzymes often indicate the probable type of metabolite that can be produced, based on the catalytic properties of the main PKS or NRPS in the cluster [24,25], the RNA-seq data are consistent with production of about 20 different classes of metabolites. Since many of the putative backbone genes listed in Tables 1-4 were not expressed, it is possible that these inactive clusters could become active under different growth conditions. In the present study only one growth condition (PDA) was used. It was previously found that gene activity can be induced by association of fungi with the proper microbial or nutritional environment or by artificial alteration of the chromatin state of the genes in the cluster [24,26,27]. The availability of RNA-seq data should improve the chances of selecting a secondary metabolite backbone gene that, when disrupted, will actually result in loss of production of a specific metabolite.
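The expression call used in the RNA-seq analyses (RPKM greater than 1 taken as detectable expression) can be sketched as follows. The RPKM formula is the standard definition, reads per kilobase of transcript per million mapped reads. The read counts, gene lengths and gene names are hypothetical, and treating a value of exactly 1 as not detectably expressed is an assumption, since the text only distinguishes values below and above 1.

```python
from typing import Dict

# Threshold stated in the text: RPKM > 1 is treated as detectable expression.
EXPRESSION_THRESHOLD = 1.0


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_expressed(rpkm_value: float) -> bool:
    """Assumption: a value of exactly 1 is not counted as detectable expression."""
    return rpkm_value > EXPRESSION_THRESHOLD


def count_expressed(rpkm_by_gene: Dict[str, float]) -> int:
    """Number of genes whose RPKM exceeds the threshold."""
    return sum(1 for value in rpkm_by_gene.values() if is_expressed(value))


# Hypothetical backbone genes in a library of 10 million mapped reads.
TOTAL_MAPPED_READS = 10_000_000
example = {
    "hypothetical_pks_a": rpkm(500, 2000, TOTAL_MAPPED_READS),
    "hypothetical_pks_b": rpkm(5, 5000, TOTAL_MAPPED_READS),
    "hypothetical_nrps_c": rpkm(10, 1000, TOTAL_MAPPED_READS),
}
for gene, value in example.items():
    print(gene, round(value, 2), "expressed" if is_expressed(value) else "low or none")
print("expressed backbone genes:", count_expressed(example))
```

Applying the same call to each isolate's backbone genes would give a per-isolate tally comparable to the statement that only half of the PKS and NRPS genes are expressed under the single growth condition tested.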