Deletion and Gene Expression Analyses Define the Paxilline Biosynthetic Gene Cluster in Penicillium paxilli

The indole-diterpene paxilline is an abundant secondary metabolite synthesized by Penicillium paxilli. In total, 21 genes have been identified at the PAX locus, of which six have been previously confirmed to have a functional role in paxilline biosynthesis. A combination of bioinformatics, gene expression and targeted gene replacement analyses was used to define the boundaries of the PAX gene cluster. Targeted gene replacement identified seven genes, paxG, paxA, paxM, paxB, paxC, paxP and paxQ, that were all required for paxilline production, with one additional gene, paxD, required for regular prenylation of the indole ring post paxilline synthesis. The two putative transcription factors, PP104 and PP105, were not co-regulated with the pax genes and, based on targeted gene replacement, including the double knockout, did not have a role in paxilline production. The indole dimethylallyl transferases involved in prenylation of indole-diterpenes such as paxilline or lolitrem B group into two disparate clades, a grouping that is not explained by prenylation type (e.g., regular or reverse). This paper provides insight into the P. paxilli indole-diterpene locus and reviews recent advances in understanding paxilline biosynthesis.


Introduction
Paxilline is a member of a large and structurally diverse group of indole-diterpene secondary metabolites, many of which are potent tremorgenic mammalian mycotoxins, synthesized by filamentous fungi [1]. These metabolites have a common structural core comprised of a cyclic diterpene skeleton derived from geranylgeranyl diphosphate (GGPP) and an indole group that is proposed to be derived from indole-3-glycerol phosphate, a precursor of tryptophan [2,3]. Paspaline is proposed to be the first stable intermediate from which many of the other metabolites of this class are derived [4]. Further chemical elaboration of paspaline is proposed to occur by additional prenylations, different patterns of ring substitutions and different ring stereochemistry [5].
Understanding fungal indole-diterpene biosynthesis has progressed considerably in recent years principally through research on paxilline biosynthesis in Penicillium paxilli. This is an ideal organism for studying indole-diterpene biosynthesis because it grows rapidly, produces large quantities of paxilline in submerged culture and is readily amenable to genetic manipulation [6,7]. Using a combination of plasmid insertional mutagenesis and chromosome walking, a cluster of genes was isolated and shown to be required for paxilline biosynthesis [8]. Gene disruption and chemical complementation experiments have shown that paxG, paxP and paxQ are required for paxilline biosynthesis [8][9][10].
PaxG, a geranylgeranyl diphosphate (GGPP) synthase [11], is proposed to catalyze the first step in paxilline biosynthesis (Figure 1). Targeted deletion of paxG resulted in mutant strains that were completely blocked for indole-diterpene biosynthesis [8,11]. Using a P. paxilli mutant deleted for the entire pax gene cluster, we were able to show by gene reconstitution experiments that just four genes, paxG, paxM, paxB and paxC, were necessary and sufficient for paspaline biosynthesis [4]. Based on this study we proposed a biosynthetic scheme for paspaline biosynthesis involving condensation of indole-3-glycerol phosphate with GGPP to form 3-geranylgeranylindole (3-GGI), followed by epoxidation and cyclization of this intermediate to form paspaline (Figure 1). This scheme has recently been experimentally validated by reconstituting paspaline biosynthesis in the heterologous host Aspergillus oryzae [12]. Stepwise introduction of paxG, paxC, paxM and paxB into A. oryzae, combined with in vitro protein expression studies, demonstrated that PaxC is a prenyl transferase required for formation of 3-GGI and that PaxM and PaxB catalyze the stepwise epoxidation and cyclization of 3-GGI to paspaline [12]. Two cytochrome P450 monooxygenases, PaxP and PaxQ, are involved in the later steps of the pathway in which paspaline is converted to paxilline [9,10]. While deletion mutants of paxP and paxQ were blocked for paxilline biosynthesis, they accumulated paspaline and 13-desoxypaxilline, respectively, confirming that both genes were required for paxilline biosynthesis and that paspaline and 13-desoxypaxilline were the most likely substrates for the corresponding enzymes [9]. This was confirmed by feeding these compounds to strains lacking the pax cluster but containing ectopically integrated copies of paxP and paxQ [10]. Transformants containing paxP converted paspaline into 13-desoxypaxilline as the major product and β-PC-M6 as the minor product.
Transformants containing paxQ converted 13-desoxypaxilline into paxilline. These results confirmed that paspaline, β-PC-M6 and 13-desoxypaxilline are paxilline intermediates and that paspaline and β-PC-M6 are substrates for PaxP, and 13-desoxypaxilline is a substrate for PaxQ [10]. Stepwise introduction of the pax genes into A. oryzae showed that addition of paxG-C-M-B-P-Q was sufficient to reconstitute the machinery for paxilline biosynthesis [12].
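The late steps described above can be written down as a small lookup to make the deletion logic explicit. The following is a minimal sketch, not part of the original analysis: the linear ordering paspaline, β-PC-M6, 13-desoxypaxilline, paxilline is an assumption drawn from the substrate assignments in the text, and the names LATE_STEPS and predicted_accumulation are hypothetical.

```python
# Late pathway steps as (substrate, product, enzyme).
# Assumption: a simple linear order inferred from the substrate assignments in the text.
LATE_STEPS = [
    ("paspaline", "β-PC-M6", "PaxP"),
    ("β-PC-M6", "13-desoxypaxilline", "PaxP"),
    ("13-desoxypaxilline", "paxilline", "PaxQ"),
]


def predicted_accumulation(deleted_enzyme, steps=LATE_STEPS):
    """Return the substrate of the first step that needs the deleted enzyme.

    If no step needs that enzyme, the final product of the pathway is returned.
    """
    for substrate, _product, enzyme in steps:
        if enzyme == deleted_enzyme:
            return substrate
    return steps[-1][1]


if __name__ == "__main__":
    for enzyme in ("PaxP", "PaxQ"):
        print(enzyme, "deleted ->", predicted_accumulation(enzyme))
```

Under this simplified model, loss of PaxP predicts paspaline accumulation and loss of PaxQ predicts 13-desoxypaxilline accumulation, which agrees with the reported mutant phenotypes [9].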
Here we present a complete functional analysis of the PAX gene cluster locus. Using a combination of bioinformatics, gene expression and multiple targeted gene replacement analyses, we have demarcated the boundaries of the gene cluster and defined a set of seven genes required for paxilline biosynthesis in P. paxilli, plus one additional gene needed for paxilline prenylation. Collectively, the data presented here along with previously published results by us and others establish the P. paxilli pax gene cluster as a model system for understanding indole-diterpene biosynthetic pathways.

Results and Discussion
Our first reported annotation of the PAX locus predicted the involvement of 17 genes in the biosynthesis of paxilline, with the paxN and paxO boundaries flanked by genes encoding a putative lipase and an arabinase, respectively [8]. A re-analysis of the DNA sequence at this locus identified a total of 21 putative genes, reannotated as PP101 (paxN) to PP121 (paxO). The four new putative genes identified were predicted to encode a hypothetical protein (PP103), an acetyl transferase (PP109) and two integral membrane associated proteins (PaxA/PP114 and PaxB/PP116). The putative functions of all 21 genes are summarized in Table 1.
To define the core cluster of genes required for paxilline biosynthesis, a set of targeted gene deletion mutations was generated at the PAX locus (Figure 2). PCR-generated linear fragments of the gene replacement constructs were recombined into the genome of P. paxilli. PCR screening of hygromycin or geneticin resistant transformants identified putative replacements. Southern blot analysis was used to identify transformants containing a targeted gene replacement (Figure 2). These transformants were analyzed by normal phase TLC for their ability to synthesize paspaline, 13-desoxypaxilline and paxilline (Figure 3). This analysis showed that ΔpaxG [8,11], ΔpaxA, ΔpaxM, ΔpaxB and ΔpaxC mutants were unable to synthesize paxilline or any other indole-diterpene intermediates found in P. paxilli wild-type. The absence of any identifiable indole-diterpene compound in these extracts was confirmed by reverse phase HPLC analysis. As previously shown, ΔpaxP and ΔpaxQ mutants accumulate paspaline and 13-desoxypaxilline, respectively [9].
Deletion mutants of PP104 and PP105 (encoding putative transcription factors with Zn(II)2Cys6 binuclear cluster DNA-binding motifs), PP107 (encoding a putative dehydrogenase), PP112 (encoding a conserved hypothetical protein) and paxD (=PP120; encoding a putative indole dimethylallyl transferase) all accumulated paxilline and the other indole-diterpene intermediates found in P. paxilli wild-type. While the amount of paxilline present in the ΔPP112 sample is low (Figure 3), independent TLC analyses confirmed that this mutant did synthesize paxilline at levels comparable to the other mutants not involved in paxilline biosynthesis. The PP104-PP105 double mutant also had the same phenotype as wild-type, as did CYD-67, a deletion of paxD that extends through PP121 to an undefined point beyond both genes. This deletion analysis defines a set of 7 genes, paxG through paxQ, that are required for paxilline biosynthesis.
Figure caption: (a) Closed arrows indicate the direction of gene/ORF transcription. The genes shown to be involved in paxilline and prenylated paxilline biosynthesis are designated as pax and the other predicted genes as PP (Penicillium paxilli). The thin red or green lines under the PAX locus indicate the deleted region for each gene, or an arrow in the case of the mutant CYD-67, whose deletion extends beyond the genomic region shown. The color scheme depicts the role in paxilline biosynthesis based on gene deletion analysis: red, known role in paxilline biosynthesis; green, no role in paxilline biosynthesis, although paxD has a role in post-paxilline biosynthesis; (b) Normal phase TLC analysis for paxilline production in the P. paxilli strains deleted for the genes/ORFs mentioned in panel (a). For paxilline extraction, mycelium was harvested 6 days after inoculation. Abbreviations: 13-dp, 13-desoxypaxilline; pasp, paspaline; pax, paxilline.
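The way the seven-gene set follows from the deletion phenotypes can be sketched as a simple filter over a phenotype table. This is an illustrative summary only: the dictionary records just the compounds named in the text for each mutant, and the names DETECTED and genes_required_for_paxilline are hypothetical.

```python
# Indole-diterpenes reported for each single-gene deletion mutant.
# Only compounds explicitly named in the text are recorded here.
WILD_TYPE_LIKE = {"paspaline", "13-desoxypaxilline", "paxilline"}

DETECTED = {
    "paxG": set(),
    "paxA": set(),
    "paxM": set(),
    "paxB": set(),
    "paxC": set(),
    "paxP": {"paspaline"},
    "paxQ": {"13-desoxypaxilline"},
    "PP104": WILD_TYPE_LIKE,
    "PP105": WILD_TYPE_LIKE,
    "PP107": WILD_TYPE_LIKE,
    "PP112": WILD_TYPE_LIKE,
    "paxD": WILD_TYPE_LIKE,
}


def genes_required_for_paxilline(detected=DETECTED):
    """Return the genes whose deletion abolishes paxilline production."""
    return sorted(gene for gene, found in detected.items() if "paxilline" not in found)


if __name__ == "__main__":
    print(genes_required_for_paxilline())
```

The filter returns exactly paxA, paxB, paxC, paxG, paxM, paxP and paxQ, the seven genes identified by the deletion analysis.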
PaxC is predicted to be a prenyl transferase as it contains the five conserved domains found in other prenyl transferases [15] (Figure 4), including PaxG, which has recently been shown to be a functional GGPP synthase [11]. This superfamily of enzymes is characterized by the presence of two aspartate-rich motifs, DDXXD and DDXXN/D, located in Domains II and V, respectively, that are important for allylic substrate binding and catalysis. While the first aspartate-rich motif (DDISD) in PaxC conforms to this consensus, the second (NDXXN) does not, suggesting that PaxC has a novel function. The recent work by Tagami et al. [12] demonstrates that PaxC is a prenyl transferase that catalyzes the condensation of indole-3-glycerol phosphate with GGPP to form 3-GGI (Figure 1).
PaxM is predicted to be an FAD-dependent monooxygenase containing a modified Rossmann fold, as it contains the highly conserved dinucleotide binding motif (DBM), as well as the ATG, GD and G-helix motifs found in the functionally characterized salicylate hydroxylase (NahG) from Pseudomonas putida and zeaxanthin epoxidase from Nicotiana plumbaginifolia [16][17][18][19][20] (Figure 5). These same motifs are found in many closely related hypothetical proteins identified in the genomes of other filamentous fungi, including fruiting body maturation (Fbm-1) from Neurospora crassa [21]. The top hits to PaxM were to structurally and (mostly) functionally characterized bacterial FAD-dependent, NAD(P)H-binding proteins, including urate oxidase from Klebsiella pneumoniae (PDB ID: 3rp8; 22.3% identity) [22], 2,6-dihydroxypyridine 3-hydroxylase from Arthrobacter nicotinovorans (PDB ID: 2vou; 15.3% identity) [23], aklavinone-11-hydroxylase from Streptomyces purpurascens (PDB ID: 3ihg; 17.2% identity) [24] and a putative FAD-containing monooxygenase from Photorhabdus luminescens subsp. laumondii TTO1 (PDB ID: 4hb9; 18.6% identity). Reconstitution of paspaline biosynthesis in A. oryzae demonstrates that PaxM, together with PaxB (see below), is involved in two rounds of epoxidation/cyclization to first generate emindole SB and then paspaline [12] (Figure 1).
PaxA and PaxB appear to be a novel group of integral membrane proteins, containing 6 or 7 transmembrane domains (Figures 6 and 7). Despite their similarity in predicted secondary structure, they share very little sequence identity. They each contain a single intron, but the size (60 nt versus 87 nt) and location (345-404 and 519-605) of these introns differ. In addition, paxA utilizes a second 5' GT donor, upstream of the first (226-404; 170 nt intron), to generate an alternative mRNA isoform. Conceptual translation of this isoform generates a 77-, instead of 356-, amino acid polypeptide. The shorter (77 amino acid) predicted polypeptide contains no putative transmembrane domains. BLASTP analysis identified a number of closely related proteins in other fungal genomes but all are hypothetical conserved proteins. On the basis of their reconstitution experiments Tagami et al. [12] propose that PaxB is a novel indole-diterpene cyclase that works together with PaxM to convert 3-GGI to paspaline (Figure 1). However, the role of PaxA is unclear: reconstitution experiments in P. paxilli and A. oryzae demonstrated that paxG-M-B-C were required for the synthesis of paspaline [4,12], and in A. oryzae paxG-M-B-C-P-Q were sufficient for paxilline biosynthesis [12], yet the paxA deletion mutant was defective in paxilline biosynthesis and could be complemented by reintroduction of the wild-type paxA. Although the functional role of PaxA is still unclear, homologues of this gene are present in all Penicillium and Aspergillus indole-diterpene gene clusters identified to date [4,14]. Furthermore, a gene named idtS (ltmS), which encodes a gene product structurally similar to PaxA, is found in indole-diterpene gene clusters from the Clavicipitaceae [25,26].
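The intron sizes quoted for paxA and paxB can be checked against their coordinates with inclusive-range arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the first coordinate pair belongs to paxA and the second to paxB; the function name inclusive_length is hypothetical.

```python
def inclusive_length(start, end):
    """Length of a feature whose start and end coordinates are both included."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates as quoted in the text (assumed 1-based, inclusive).
INTRONS = {
    "paxA intron": (345, 404),
    "paxB intron": (519, 605),
    "paxA alternative intron": (226, 404),
}

if __name__ == "__main__":
    for name, (start, end) in INTRONS.items():
        print(f"{name}: {inclusive_length(start, end)} nt")
```

Under this convention the first two ranges give 60 nt and 87 nt, matching the stated sizes, while the range 226-404 gives 179 nt rather than the stated 170 nt; the sketch cannot determine whether the coordinates or the stated length should be revisited.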
Given the mutual requirement of PaxB and PaxM to effect the conversion of 3-GGI to paspaline [12], it is of note that PaxM is predicted by TMHMM to have an approximately 25-residue C-terminal transmembrane helix with the N-terminal region in the cytosol. This C-terminal tag would facilitate co-location of PaxM with the integral membrane protein PaxB. Furthermore, the region of helix II predicted for PaxB is not predicted to be a transmembrane helix in several other sequences (Pc-CAP80269, Nl-LtmB and Mg_XP_367501; Figure 7). This region carries the conserved WExx(Y/F) motif in its middle. For Pc-CAP80269, Nl-LtmB and Mg_XP_367501 the N-terminal sequence preceding helix I is predicted to lie on the cytosolic side of the membrane, placing the conserved WExx(Y/F) extracellularly. At least one positively charged residue, as well as at least one histidine, is found on the intracellular loops between helices III and IV and between helices V and VI. The latter contain strongly conserved hydrophobic residues at their N- and C-termini, respectively. Finally, the transmembrane predictor MEMSAT-SVM [27] suggests that PaxB has a propensity to form a pore. Based on all these observations we propose that PaxB may provide the proton(s) to break open the epoxide (the formation of which is mediated by PaxM) and orientate the 3-GGI in an internal pore so that the correct cyclization to paspaline takes place.
To further define the boundaries of the pax cluster, expression analysis was carried out on all proposed pax biosynthetic genes and on the genes immediately flanking the pax genes. This analysis showed that in addition to the 7 previously defined pax genes, paxD and PP121 were also up-regulated with the onset of paxilline biosynthesis (Figure 8). The multiple bands observed in the paxA, PP121 and PP122 lanes are potentially products of incomplete or alternative splicing. In contrast to these samples, the steady-state levels of β-tubulin, PP111, PP112 and PP122 are very similar across the time course of growth. These results suggest that paxD and possibly PP121 are coordinately regulated with the 7 core pax biosynthetic genes. The best characterized match to PaxD is AtmD from Aspergillus flavus, an indole dimethylallyl transferase that is predicted to catalyze the C4-reverse prenylation of paspalinine to form aflatrem [14,28] (Figure 1, Table 1). Therefore, a targeted disruption of paxD was made to determine whether there were any metabolite profile differences relative to wild-type that may be the result of additional prenylation steps. In screening the putative knockouts, both a single replacement deletion (CYD-162) and an extended deletion of undefined length (CYD-67) of paxD were identified (Figures 2 and 3). As the TLC analysis of the paxD deletions showed the presence of paxilline, mass spectrometry (MS) analysis was used to compare the chemical phenotype of wild-type with the paxD deletion mutants. LC-MS/MS analysis identified a novel indole-diterpene at 32.8 min within the wild-type sample with a peak at m/z 504.3 that is absent in ΔpaxD (Figure 9a). Based on these spectra, we assume that the prenylation occurs on the indole part of the molecule. However, the exact location of the prenyl group on the indole system remains to be elucidated.
These results demonstrate that PaxD is able to catalyze the further addition of an isoprene unit to the basic paxilline structure (Figure 1), a result confirmed experimentally by Liu et al. [13], who demonstrated that PaxD purified from E. coli could catalyze the conversion of dimethylallyl diphosphate and paxilline in vitro to mono- (m/z of 504.3) and di-prenylated (m/z of 572.3) paxilline. Analysis of the 1H- and 13C-NMR spectra confirmed that the major product was 21,22-diprenylated paxilline [13]. The gene PP121 is predicted to encode an oxidoreductase, but this gene has still to be deleted to determine whether it also has a role in post-paxilline biosynthesis. However, the LC-MS/MS analysis was unable to detect differences between the ΔpaxD (CYD-162) and the extended deletion mutant CYD-67, suggesting that if the PP121 gene product has a role as part of this biosynthetic gene cluster, it would act post PaxD.
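The m/z values of 504.3 and 572.3 are consistent with simple mass arithmetic for protonated paxilline carrying one or two prenyl groups. The sketch below is a back-of-the-envelope check rather than the authors' calculation: it assumes the molecular formula C27H33NO4 for paxilline (not stated in the text), a net gain of C5H8 per prenylation, detection as [M+H]+ in positive mode, and standard monoisotopic masses; the function names are hypothetical.

```python
# Monoisotopic masses (Da) of the elements involved, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647

# Assumption: paxilline is C27H33NO4; each prenylation adds a net C5H8.
PAXILLINE = {"C": 27, "H": 33, "N": 1, "O": 4}
PRENYL_GAIN = {"C": 5, "H": 8}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_protonated(n_prenyl):
    """m/z of singly protonated paxilline carrying n_prenyl prenyl groups."""
    neutral = monoisotopic_mass(PAXILLINE) + n_prenyl * monoisotopic_mass(PRENYL_GAIN)
    return neutral + PROTON


if __name__ == "__main__":
    for n in (0, 1, 2):
        print(f"{n} prenyl group(s): m/z {mz_protonated(n):.4f}")
```

The calculated values for one and two prenyl groups fall within 0.1 of the reported 504.3 and 572.3, which is the agreement one would expect from an ion trap instrument.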
Unlike other prenyltransferases (e.g., PaxC and PaxG), the indole dimethylallyl transferases found in fungi do not contain the two aspartate-rich motifs, DDXXD and DDXXN/D, are generally more divergent [29][30][31][32], and have broad indole derivative substrate specificity, yet only accept dimethylallyl diphosphate as the prenyl group donor [29]. The predicted active sites of two indole dimethylallyl transferases, CpaD (for α-cyclopiazonic acid) and FgaPT2 (first committed step in ergot alkaloid biosynthesis in A. fumigatus), have been characterized through mutagenesis and crystal structure, respectively [33,34]. CpaD and FgaPT2 both catalyze regular prenylation of the indole moiety at the C4 position and are found in the clade that contains the DmaW required for ergot alkaloid production (FgaPT2) or that catalyze a similar reaction (CpaD) [33][34][35][36]. Alignment of PaxD with these and other characterized dimethylallyl transferases shows some conservation across the sites proposed to be important for enzyme activity [33]. However, not all sites are conserved, and these differences may explain enzymatic variation between substrates and resulting products, where prenyl transfer occurs on different positions of indole moieties and depends on prenylation type.
Accession numbers are provided in Table A1, together with additional information on the position and type of prenylation, including the name of the metabolite and the reference.
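The two aspartate-rich consensus motifs, DDXXD and DDXXN/D, can be expressed as regular expressions to show why the first PaxC motif fits the consensus and the second does not. This is an illustrative sketch only: the residues at the X positions of the second PaxC motif are not given in the text, so the string "NDAAN" below is a hypothetical placeholder, as are the function names.

```python
import re

# Consensus motifs, with X meaning any residue.
FIRST_MOTIF = re.compile(r"DD..D")      # DDXXD
SECOND_MOTIF = re.compile(r"DD..[ND]")  # DDXXN/D


def matches_first(pentapeptide):
    """True if a five-residue string conforms to DDXXD."""
    return FIRST_MOTIF.fullmatch(pentapeptide) is not None


def matches_second(pentapeptide):
    """True if a five-residue string conforms to DDXXN/D."""
    return SECOND_MOTIF.fullmatch(pentapeptide) is not None


if __name__ == "__main__":
    print("DDISD fits DDXXD:", matches_first("DDISD"))
    # "NDAAN" stands in for the NDXXN motif; the two middle residues are placeholders.
    print("NDAAN fits DDXXN/D:", matches_second("NDAAN"))
```

The leading asparagine is what takes the second PaxC motif outside the consensus, whatever the residues at the X positions are.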
To gain further insight into the evolution and functional relationship of PaxD and related indole dimethylallyl transferases, phylogenetic analysis of 21 related proteins from 15 different species, of which 20 have known functions or predicted biosynthetic products, was carried out based on previous analyses of Liu et al. [37] (Figure 10). A phylogenetic tree based on the entire gene-coding region, of which 265 sites are informative, was used to infer a possible function for PaxD. PaxD clustered closely with AtmD even though these two proteins share only 35% identity (Table 1) and have different modes of prenylation: regular for PaxD versus reverse for AtmD. The proteins within the ergot alkaloid clade, which includes DmaW, group very tightly together even though they have a broad taxonomic distribution [37]. In contrast, the dimethylallyl transferases involved in prenylation of indole-diterpenes such as paxilline or lolitrem B group into two very disparate clades (Figure 10).
Although P. paxilli PaxD and Claviceps paspali IdtF are both able to prenylate the C5-position of an indole-diterpene, resulting in prenylated paxilline and paspalitrem A, respectively, the sequences are quite divergent (sharing only 22% identity) and group in different clades. The two indole-diterpene clades represent members with different prenylation capabilities, with both regular and reverse prenylation as well as prenylation of the diterpene moiety [26] (Table A1). Further analyses would be required to determine whether the differences between these two clades simply represent phylogenetic distances between the species or whether there are implications for functional biochemical differences.
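Percent identity values such as the 35% and 22% quoted above depend on how aligned positions are counted. The sketch below shows one common convention, identical residues divided by the number of alignment columns in which neither sequence has a gap; the text does not state which convention was used, and both the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments, not real PaxD, AtmD or IdtF sequence.
    print(percent_identity("ACDE-G", "ACDFHG"))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give different values for the same alignment, so identities from different sources are only comparable when the denominator is known.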

Molecular Biology
Plasmid DNA was isolated and purified by alkaline lysis using a Bio-Rad Quantum Prep® Plasmid Mini-prep Kit (Bio-Rad, Berkeley, CA, USA). Genomic DNA was isolated using a modification of the method of Yoder (1988) [41] as described previously [4]. PCR conditions were as previously described [4,9,10], using primer sets listed in Table A2. DNA fragments and PCR products were purified using a QIAquick gel extraction and PCR purification kit (Qiagen, Hilden, Germany). DNA fragments were sequenced by the dideoxynucleotide chain-termination method [42] using Big-Dye (Version 3) chemistry (PerkinElmer Life Sciences, Waltham, MA, USA) with oligonucleotide primers (Sigma Genosys, St. Louis, MO, USA). Products were separated on an ABI Prism 377 sequencer (PerkinElmer Life Sciences). Total RNA was isolated from frozen mycelium using TRIzol® reagent (Invitrogen, Carlsbad, CA, USA) and treated with DNase (Invitrogen, Carlsbad, CA, USA), as described previously [10]. RT-PCR conditions were as previously described [4,9,10], except that DNase-treated total RNA (80 ng) was converted to cDNA and amplified for just 27 cycles in a single reaction using Superscript III-RT enzyme (Invitrogen) according to the manufacturer's instructions. Primers used to amplify each of the genes are summarized in Table A2.

Penicillium paxilli Transformation and Screening
Protoplasts of PN2013 were prepared and transformed with PCR-amplified linear products of each of the replacement constructs as previously described [9], except protoplasts transformed with linear products of pBM2, pBM3 and pBM4 were plated on ACM medium supplemented with 0.8 M sucrose, rather than RG medium. Transformants were selected on medium supplemented with either hygromycin (100 μg/mL) or geneticin (150 μg/mL). The resulting stable transformants were maintained on either PD or ACM medium supplemented with either hygromycin or geneticin.
Primary screening of transformants for targeted homologous recombination events was carried out using genomic DNA from conidia as template [4], and primer sets (see above) within, and external to, the gene fragment to be replaced. Putative replacement mutants identified by PCR screening were further analyzed by Southern blotting and hybridization, using methods previously described [6].

Indole-Diterpene Analysis
Indole-diterpenes were extracted from mycelium of P. paxilli in a 2:1 chloroform-methanol mixture and analyzed by normal phase TLC and reverse phase HPLC as previously described [4]. LC-MS/MS analysis was performed on a Thermo Finnigan Surveyor (Thermo Finnigan, San Jose, CA, USA) HPLC system as previously described [4]. Mass spectra were determined with a linear ion trap mass spectrometer (Thermo LTQ, Thermo Finnigan, San Jose, CA, USA) using electrospray ionization (ESI) in positive mode with parameters previously described [4].

Bioinformatic Analyses
Sequences were aligned using ClustalX or ClustalW [43] with sequences retrieved from the NCBI GenBank database or the Broad Institute. Multiple sequence alignments were edited using Jalview.
Putative function of proteins encoded by pax genes and protein domains were identified using InterProScan [44,45]. The predicted transmembrane topologies of PaxA and PaxB were determined using TMHMM version 2, which utilizes a hidden Markov model [46].
Given the low level of sequence identity to proteins of known function, pGenThreader [47] at the University College London website [27] was used to find structures whose pattern of secondary structure elements match those predicted for PaxM. This threading is based on the well-established premise that 2-D structures, for which reliable prediction algorithms exist [48], and resultant 3-D structures, are conserved even where sequence identity has lost significance.
The phylogenetic relationships of PaxD and other known indole dimethylallyl transferases (accession numbers provided in Table A1) were determined with the program MAFFT version 7 [49,50]. Alignments were performed similarly to Liu et al. [37] with the following settings, FFT-NS-I, JTT200 scoring matrix with the gap opening penalty set to 1.0 and gap extension penalty at 0.0.
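For readers who wish to reproduce a comparable alignment, the stated settings can be translated into a MAFFT command line. The sketch below only builds the argument list and does not run MAFFT. The mapping is an assumption based on the MAFFT manual rather than the authors' exact invocation: FFT-NS-i is taken as `--retree 2 --maxiterate 1000`, the JTT200 matrix as `--jtt 200`, the gap opening penalty as `--op`, and the gap extension (offset) value as `--ep`. The function name and the input file name are hypothetical.

```python
def mafft_command(input_fasta):
    """Build a MAFFT argument list approximating the settings named in the text.

    Assumed mapping: FFT-NS-i -> --retree 2 --maxiterate 1000,
    JTT200 -> --jtt 200, gap opening 1.0 -> --op 1.0, gap extension 0.0 -> --ep 0.0.
    """
    return [
        "mafft",
        "--retree", "2",
        "--maxiterate", "1000",
        "--jtt", "200",
        "--op", "1.0",
        "--ep", "0.0",
        input_fasta,
    ]


if __name__ == "__main__":
    # "dmat_proteins.fasta" is a placeholder file name.
    print(" ".join(mafft_command("dmat_proteins.fasta")))
```

MAFFT writes the alignment to standard output, so in practice the printed command would be run with its output redirected to a file.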
The pax gene sequences from P. paxilli are available in the GenBank database under accession number HM171111 (update to AF279808).

Conclusions
A cluster of seven genes (paxG, paxA, paxM, paxB, paxC, paxP and paxQ) is required for paxilline biosynthesis in P. paxilli. One additional gene, paxD, is required for a post-paxilline biosynthetic step resulting in prenylation of the indole group of paxilline. Together, these genes constitute the pax gene cluster, with each gene deleted, functionally characterized, and shown to be transcriptionally co-regulated.