RNA-Seq-Based Transcriptome Analysis of Aflatoxigenic Aspergillus flavus in Response to Water Activity

Aspergillus flavus is one of the most important producers of carcinogenic aflatoxins in crops, and the effect of water activity (aw) on growth and aflatoxin production of A. flavus has been previously studied. Here we found the strains under 0.93 aw exhibited decreased conidiation and aflatoxin biosynthesis compared to that under 0.99 aw. When RNA-Seq was used to delineate gene expression profile under different water activities, 23,320 non-redundant unigenes, with an average length of 1297 bp, were yielded. By database comparisons, 19,838 unigenes were matched well (e-value < 10−5) with known gene sequences, and another 6767 novel unigenes were obtained by comparison to the current genome annotation of A. flavus. Based on the RPKM equation, 5362 differentially expressed unigenes (with |log2Ratio| ≥ 1) were identified between 0.99 aw and 0.93 aw treatments, including 3156 up-regulated and 2206 down-regulated unigenes, suggesting that A. flavus underwent an extensive transcriptome response during water activity variation. Furthermore, we found that the expression of 16 aflatoxin producing-related genes decreased obviously when water activity decreased, and the expression of 11 development-related genes increased after 0.99 aw treatment. Our data corroborate a model where water activity affects aflatoxin biosynthesis through increasing the expression of aflatoxin producing-related genes and regulating development-related genes.

the A. flavus transcriptome as well as specific data regarding differentially expressed genes between 0.93 aw and 0.99 aw. This work improves the understanding of the effect of water activity on development and aflatoxin biosynthesis of A. flavus at the transcriptome level. These findings are significant for predicting the impact of climate change on aflatoxin production, which might be used to improve food safety and to develop specific approaches to control such carcinogenic natural metabolites in the food chain.

Fungal Strains and Growth Conditions
The A. flavus NRRL 3357 was kindly provided by Zhumei He (Sun Yat-sen University, Guangzhou, China). The strains were inoculated in YES medium (20 g yeast extract, 150 g sucrose, 1 g MgSO4· 7H2O, 1 L). Spores from a 7-day-old culture grown at 37 °C were dislodged with a sterile loop and placed into 10 mL of sterile water +0.05% DMSO in a 25 mL universal bottle. The spores were counted, and a 10 6 spore mL −1 concentration was prepared. The agar medium was modified with glycerol to adjust the water availability to 0.93 aw and 0.99 aw, and the following amounts were used per liter (108 mL, 0.99; 24.5 mL, 0.93) [17]. The 9 cm Petri plates containing media treatments were all overlaid with sterile 8.5 cm disc cellophanes and then centrally inoculated with a 10-μL-spore suspension. Replicates (five per treatment) were incubated at 28 °C.

Growth Assessment and Aflatoxin Analysis
After incubation at a different aw level, the colony morphology was observed after 5 days. For quantitative comparison of conidia production, conidia were washed off the agar plates using 0.05% DMSO solution and counted on a hemocytometer. Quantitative determination of aflatoxin B1 from fungal colonies was performed by TLC analysis. For this purpose, the biomass was removed from the cellophane surface for aflatoxin extraction. Extraction was performed using 40 mL of chloroform (twice with 20 mL each), and then the chloroform phase was filtered through filter paper and concentrated to dryness under 50 °C in an incubator. The residue was redissolved in 20 μL of methanol, and 10 μL of this solution was spotted and developed on a Si250 silica gel plate (Haiyang, Qingdao, China) with a solvent system of chloroform/acetone (90:10, v/v) [20]. Aflatoxin production was measured in micrograms per gram of culture biomass.

cDNA Preparation and Illumina Sequencing
Five day-old mycelium was removed from the cellophane surface for isolation of RNA, and cDNA was prepared according to a protocol with some modifications [21]. Genomic DNA was digested using DNase (New England Biolabs, Beijing, China), and total RNA was isolated using TRIzol reagent (Invitrogen, Shanghai, China). A Nano Drop 2000 and Agilent 2100 were used to evaluate the quality of RNA. After total RNA extraction and DNase I treatment, magnetic beads with oligo (dT) were used to isolate mRNA. Mixed with the fragmentation buffer, the mRNA was cleaved into short fragments. Then cDNA was synthesized using the mRNA fragments as templates. Short fragments were purified and resolved with elution buffer for end reparation and single nucleotide adenine addition. After that, the short fragments were connected with adapters. The suitable fragments were selected as templates for PCR amplification. During the QC steps, Agilent 2100 Bioanaylzer and ABI StepOnePlus Real-Time PCR System were used in quantification and qualification of the sample library. Lastly, the library was sequenced using an IlluminaHiSeqTM 2000.

Clean Reads and Sequence Assembly
Raw reads were filtered to remove adaptors, reads with more than 5% unknown nucleotides, and other low quality reads. After QC filtering, the following analysis was performed. Transcriptome de novo assembly was conducted with the short-reads assembly program Trinity. Trinity, including three independent software modules, Inchworm, Chrysalis, and Butterfly, was applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, which represent the transcriptional complexity at a given gene or locus. Then, each graph was independently processed to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. The result sequences from Trinity are called unigenes.

Annotation and Analysis of Unigenes
BLASTx alignment (e-value < 10 −5 ) between unigenes and protein databases, including Nr, Swiss-Prot, KEGG, and COG, was performed, and the best alignment results were used to decide sequence direction of unigenes. If results of different databases conflicted with each other, a priority order of Nr, Swiss-Prot, KEGG, and COG was followed. When a unigene happened to be unaligned to none of the above databases, ESTScan (http://estscan.sourceforge.net) [22], a program that can detect coding regions in low-quality sequences, was introduced to decide its sequence direction. To obtain protein functional annotation, unigenes were aligned by BLASTx to protein databases (e-value < 10 −5 ), and aligned by blastn to nucleotide databases nt (e-value < 10 −5 ), retrieving proteins with the highest sequence similarity with the given unigenes. With Nr annotation, the Blast2GO program was used to obtain GO annotation of unigenes. After obtaining GO annotation for every unigene, WEGO software was used to perform GO functional classification for all unigenes and to understand the distribution of gene functions [23]. With the help of KEGG database, we could further study the genes' biological complex behaviors, and using KEGG annotation we could obtain pathway annotation for unigenes.

Identification and Analysis of Differentially Expressed Genes
First, the RPKM method was used to calculate the expressed value of genes (Reads Per kb per Million reads). The RPKM method is able to eliminate the influence of different gene length and sequencing level on the calculation of gene expression. Therefore the calculated gene expression can be directly used for comparing the different expression between samples. Then, the p value was applied to determine differentially expressed unigenes. FDR (False Discovery Rate) control is a statistical method used in multiple hypothesis testing to correct for p-value. In our analysis, we choose those with FDR ≤ 0.001 and a ratio ≥ 2. Finally, differentially expressed genes (DEGs) were then subjected to GO functional analysis and KEGG pathway analysis.

Availability of Supporting Data
The raw Illumina sequencing dataset of Aspergillus flavus was submitted to the NCBI Sequence Read Archive under the accession number of SRP034649.

Effect of Water Activity on Growth and Aflatoxin Production of A. flavus
Growth and aflatoxin production by A. flavus at the phenotypic level was monitored in relation to changes in different treatments. As seen in Figure 1A, the colony diameter of strains at 0.93 aw was significantly smaller than that at 0.99 aw. When grown on YES plates at 37 °C, the strains under 0.93 aw exhibited decreased conidiation compared to that under 0.99 aw, and the strains under 0.93 aw displayed an approximately 16-fold conidia reduction compared with the strains under 0.99 aw (data not shown). As previously described, growth was highly influenced by water activity [24]. To identify the effect of different water activity on aflatoxin production, thin-layer chromatography analysis was performed with the standards of aflatoxin on the silica gel G plates, and the results are shown in Figure 1B. When aw was reduced, there was a sharply decrease in aflatoxin biosynthesis although the culture condition remained at 28 °C. Compared with that at 0.99 aw, aflatoxin production was very low, and only other extracted metabolites were observed at 0.93 aw. Our findings are consistent with a previous report that more aflatoxin was produced under 0.99 aw than under 0.93 aw [16]. The data indicates that aflatoxin production of A. flavus was obviously affected by water activity. This phenomenon may be due to complex regulation of the aflatoxin biosynthesis gene cluster of A. flavus in relation to various levels of water activity [17].

Illumina Sequencing and Reads Assembling
Two cDNA libraries were prepared at the fifth day and sequenced using the Illumina platform. Illumina sequencing generated a total of 41,004,372 reads (0.99 aw) and 40,712,492 reads (0.93 aw) that were 90 bp in length after stringent data cleaning and quality checks. The mean of Q20 percentage (proportion of nucleotides with quality value larger than 20 in reads), N percentage (proportion of unknown nucleotides in clean reads) and GC percentage are 96.54%, 0.00% and 52.96%, respectively. Trinity was used to assemble clean reads, producing a total of 60,039 contigs with a minor of N50 of 1623 nt (i.e., the median length of all unigenes) for A. flavus. After further processes of sequence splicing and redundancy removing, a total of 23,320 non-redundant unigenes were identified. Of these, 24,991 and 25,190 unigenes were generated from the 0.99 aw and 0.93 aw treatments, respectively ( Table 1). The length distribution in Figure 2A indicated that 47.08% unigenes (total 10,978 unigenes) had a length > 1000 nt (mean 1297 nt).
To evaluate the quality of RNA-Seq data, several quality control analyses were performed. Firstly, the ratio of the gap length of assembled unigenes was assessed, and the results indicate that gap lengths were less than 5% of the total length. In addition, the total coverage of reads from the 5' to the 3' end of genes was examined, and it revealed that both the 0.99 aw and 0.93 aw RNA-Seq reads were evenly distributed with the exception of the very 5' and 3' ends ( Figure 2B,C). Therefore, the assembled data are of high quality in current study. The Aspergillus genus are widely distributed molds in the environment, many of which are documented to cause human disease [25]. Some of the RNA-Seq data for Aspergillus has been published previously [26][27][28], and very recently, Chang et al. (2014) compared the different transcriptome profiles of A. flavus exposed and not exposed to decanal [29]. To our knowledge, this study was the first report on the complete transcriptome of Aspergillus in response to two different water activities using an Illumina paired-end sequencing strategy.

Annotation and Analysis of All-Unigenes
To understand the transcriptome of A. flavus, all unigenes were aligned against sequences from the NCBI non-redundant (nr) protein database by using the BLASTx algorithm with an e-value threshold of 10 −5 . BLASTx alignment analysis indicated that a total of 19,838 unigenes matched to known proteins in the Nr databases. Thus far, a total of 13,071 genes encoding proteins have already been annotated in the genome of A.flavus [30], whereas Lin et al. (2013) estimated that A. flavus has 14,510 genes by combining NCBI database with their RNA-Seq data [19]. However, the RNA-seq data presented in this work implies that more unigenes have the potential for translation into functional proteins, which serves to enrich the annotation of the A. flavus genome. A possible explanation for this phenomenon is posttranscriptional regulation, such as alternative splicing and RNA editing, enlarges their transcripts diversity [21]. Figure 3A,B show the similarity distribution of all unigenes in detail. The results indicate that 96.81% of all unigenes had an identification of more than 60% of the annotated genes. In a comparison with the nr database, we interestingly found that 38.5% of sequences matched to that of A. oryzae, but only 32.6% unigenes were well matched to that of A. flavus. However, Yu et al. (2008) found A. oryzae shared over 95% identity to A. flavus on the DNA level, and fewer than 300 genes were unique to each species [29].

Functional Analysis and Classification of All-Unigenes
To deeply understand the transcriptome of A. flavus, GO (Gene Ontology) and COG (Clusters of Orthologous Groups of proteins) were applied to classify functions of the predicted all unigenes. A total of 13,342 unigenes were grouped to at least one GO term, and these unigenes were classified into three functional categories ( Figure 4A). Sequences with GO terms corresponding to the "biological process" group were divided into 23 subcategories, "cellular component" into 16 subcategories and "molecular function" into 14 subcategories. As shown in Figure 4A, the largest subcategory found in the "biological process" group was "metabolic process" which comprised 32.1% of the unigenes in the subcategory. By applying COG platform, we obtained 18,394 sequences involved in COG classification, which were grouped into 25 categories ( Figure 4B). Among the 25 COG categories, "general function prediction only" was the most populated group (17.72%) followed by "carbohydrate transport and metabolism" (8.27%) and "amino acid transport and metabolism" (7.59%). Furthermore, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used to identify the biological pathways in A. flavus. A total of 12,232 annotated unigenes were grouped to 108 KEGG pathways (Table S1). The pathways with the most representation among the unique sequences were involved in metabolic pathways (28.78%, 3520), biosynthesis of secondary metabolites (12.61%, 1543) and starch and sucrose metabolism (4.85%, 593). As expected, most unigenes belong to metabolic pathways because of their involvement in the maintenance of basic biological processes of A. flavus. The A. flavus genome sequence contains remarkable enzymatic genes associated with secondary metabolite synthesis [2,30], which intimates it has the capacity to express more unigenes for biosynthesis of secondary metabolites under specific conditions. It has been documented previously that A. flavus produces numerous hydrolyses [31], including α-amylase precursor, α-amylase A precursor, α-L-arabinofuranosidase precursor, β-galactosidase, catalase (A and B), glutaminase A and α-mannosidase, which are believed to be important for fungal utilization of starch-rich. These results are in full agreement with the KEGG annotations of the unigenes.

Identification and Analysis of DEGs
To identify the differences of molecular response between 0.99 aw and 0.93 aw treatments, gene expression levels were calculated using the RPKM method [22]. Based on RPKM values, out of 23,320 unigenes, 5362 differentially expressed unigenes (with p < 0.05, FDR ≤ 0.001, |log2Ratio| ≥ 1) were identified ( Figure 5A). Among them, 3156 and 2206 genes displayed up-regulation under 0.99 aw and 0.93 aw treatments, respectively. All of the differentially expressed sequences were subjected to GO analysis, and the number of unigenes with GO annotations in 0.99 aw DEGs (1714) was more than that of 0.93 aw (1296). As shown in Figure 5C, the GO terms "transporter activity", "localization", "establishment of localization", and "biological regulation" were significantly over-represented at the transcriptional level at 0.99 aw compared with the 0.93 aw. In contrast, the GO categories "structural molecule activity", "catalytic activity", "nucleic acid binding transcription factor activity", "organelle part", "organelle", "membrane-enclosed lumen", "macromolecuar complex", and "cellular component organization or biogenesis" were expressed at high levels under 0.93 aw conditions, demonstrating that   To study the function of DEGs, KEGG metabolic pathways analysis was performed by initially aligning unigenes with sequences from GenBank. Among 2721 DEGs, 1516 annotated unigenes upexpressed in 0.99 aw conditions, and 1205 genes up-expressed in 0.93 aw conditions were grouped into 108 known metabolic or signaling pathway classes (Table S2). Although the pathway distributions of these up-expressed genes in both 0.99 aw and 0.99 aw were almost in accordance with each other, more genes displayed at least two-fold up-regulation in 0.99 aw conditions ( Figure 5B). For example, more genes related to fatty acid metabolism such as "fatty acid biosynthesis", "biosynthesis of unsaturated fatty acids" and "fatty acid elongation" displayed high transcriptional activity at 0.99 aw. Aflatoxins are known to begin with norsolorinic acid, which is synthesized in vivo by a specialized pair of fatty acid synthases (FAS-1 and FAS-2) and a separately transcribed polyketide synthase (PKS-A) [32].

Analysis of DEGs Involved in Aflatoxin Biosynthesis
To evaluate the effect of water activity on the regulation of aflatoxin biosynthesis, we used the sequence information of 33 candidate genes provided by NCBI to identify the putative aflatoxin biosynthesis genes in the A. flavus transcriptomes [15]. As shown in Table 2, a number of different expression genes related to aflatoxin biosynthesis in response to water stress were identified. Among the 33 candidate genes identified in the transcriptome of A. flavus, 16 genes were up-regulated more than twofold in 0.99 aw conditions compared with 0.93 aw conditions. Several genes coding for aflatoxin biosynthesis were significant differences between the two regimes. For example, aflF, aflU and aflG all have more than 10-fold transcriptional changes in 0.99 aw relative to 0.93 aw conditions. The gene aflF also named norB, shares 68% amino acid similarity to an aryl alcohol dehydrogenase encoded by an aflE (norA) gene, which is putatively involved in the conversion of NOR to AVN [33]. An additional gene, aflD (nor-1), was identified in the aflatoxin gene cluster encoding a ketoreductase that is capable of converting NOR to AVN [34]. Therefore, the presence of one of them was enough to catalyze NOR to AVN [35], which may help explain the phenomenon that only aflF up-regulated more than 2-folds. The gene aflU encodes a polypeptide of 498 amino acids, which has a typical heme-binding motif of cytochrome P450 monooxygenase [35]. Based on sequence analysis and the enzymatic requirement for G-group toxin biosynthesis, this gene is most likely involved in G-group toxin formation in aflatoxin biosynthesis [33]. A previous expression study revealed that its transcript was detected only under aflatoxin-conducive conditions and not on non-conducive conditions [35], which is in a good agreement with our findings. The gene aflG encodes a cytochrome P450 monooxygenase that converts AVN to HAVN [36]. A striking finding that aflG/aflL is contiguous only in the cluster of section Flavi species suggested aflG/aflL either was recruited from other genomic locations or reorganization of cluster genes from a sterigmatocystin ancestor [37]. The gene aflNa (hypD), first reported by expressed sequence tag data, has been predicted to encode a small integral membrane protein and suggested to affect both development and secondary metabolism of Aspergillus [38]. Interestingly, among the five most highly up-regulated genes, aflF, aflU and aflT genes are adjacent and located on the very end of the gene cluster, whereas aflG and aflNa are located next to each other in the middle of the gene cluster. Therefore, the gene aflF could be related to turning on/off aflatoxin pathway gene expression, and on chromosomal location these gene may be responsive to the environmental queue of water activity. The gene aflR is a Zn2Cys6-type transcription factor that is believed to be necessary for regulating most of the genes in the aflatoxin gene cluster in A. flavus [38], and they demonstrated that water activity had a significant effect on aflR transcription at lower aw (0.90) compared with higher aw (0.99) [39]. Curiously, the expression of this gene did not display somewhat difference even though strains were removed to a favor aflatoxin-producing regimes.

Analysis of DEGs Involved in Development
The control of secondary metabolism in fungi is often coordinated to fungal growth and development [40]. To further explore potential DEGs involved in aflatoxin biosynthesis in A. flavus, we analyzed 69 annotated sequences for the genes involved in development [41]. We found that the transcriptional patterns of most genes involved in development were down-regulated when A. flavus was treated with a lower water activity. For instance, flbC encoding C2H2 transcription factor, which is involved in asexual development, sexual development and germination, decreased its RPKM value from 135.08 to 5.67 (Table 3). In wild-type colonies, FlbC localizes in the nuclei of vegetative hyphae and in conidiophores, activates brlA, abaA, and vosA but not wetA [42]. Apart from flbC, four flb genes, flbA, flbB, flbD and flbE were taken into account in present study. FlbA encodes an RGS domain protein, which negatively regulates vegetative growth signaling [43]. FlbB encodes a fungal specific bZIP-type transcription factor, which is located within the cytoplasm at the hyphal apex during early vegetative growth and involved in asexual development [44]. FlbD, a c-Myb transcription factor, is uniquely involved in both asexual and sexual differentiation in A. nidulans [45]. FlbE localized at hyphal tips, which may protect FlbB from proteolytic degradation [46]. Although the flb genes are conserved in A. fumigatus, A. oryzae and A. nidulans [41], only flbC was un-regulated in the current study. This inconsistency may be explained by FlbC acting in a pathway parallel to that of other flb genes. Alternatively, the promoter-binding regions of FlbC and FlbB/FlbD may overlap [47].

Conclusions
Aspergillus flavus is an imperfect filamentous fungal pathogen causing diseases of many agricultural crops, such as maize, cotton, and peanuts, as well as tree nuts [48]. In the current work, a transcriptome database of A. flavus was constructed. From the two different treatments (0.99 aw and 0.93 aw), we identified differentially expressed genes by transcriptome analysis and found that numerous metabolic pathways related to biosynthesis were significantly over-expressed when treated with 0.99 aw, especially in the biosynthesis of aflatoxin in A. flavus. During treatment with 0.99 aw, unigenes involved in development, such as flbC, were significantly up-regulated. The relationship between the aflatoxin biosynthesis pathway and development of A. flavus is complex and further analytical work is required. Moisture is an important regime factor for fungi growth and mycotoxin production, but little transcription level information is available at present; therefore, our transcriptome provides a resource for further studies examining water activity, and fungi growth and aflatoxin production. Collectively, this study opens the way to future studies analyzing the effect of water activity on other fungi physiology.