A Possible Trifunctional β-Carotene Synthase Gene Identified in the Draft Genome of Aurantiochytrium sp. Strain KH105

Labyrinthulomycetes have been regarded as a promising industrial source of xanthophylls, including astaxanthin and canthaxanthin, polyunsaturated fatty acids such as docosahexaenoic acid and docosapentaenoic acid, ω-3 oils, and terpenic hydrocarbons, such as sterols and squalene. A Thraustochytrid, Aurantiochytrium sp. KH105 produces carotenoids, including astaxanthin, with strong antioxidant activity. To gain genomic insights into this capacity, we decoded its 97-Mbp genome and characterized genes for enzymes involved in carotenoid biosynthesis. Interestingly, all carotenogenic genes, as well as other eukaryotic genes, appeared duplicated, suggesting that this strain is diploid. In addition, among the five genes involved in the pathway from geranylgeranyl pyrophosphate to astaxanthin, geranylgeranyl phytoene synthase (crtB), phytoene desaturase (crtI) and lycopene cyclase (crtY) were fused into single gene (crtIBY) with no internal stop codons. Functionality of the trifunctional enzyme, CrtIBY, to catalyze the reaction from geranylgeranyl diphosphate to β-carotene was confirmed using a yeast assay system and mass spectrometry. Furthermore, analyses of differential gene expression showed characteristic up-regulation of carotenoid biosynthetic genes during stationary and starvation phases under these culture conditions. This suggests genetic engineering events to promote more efficient production of carotenoids. We also showed an occurrence of crtIBY in other Thraustochytrid species.


Introduction
A marine microbe, Aurantiochytrium, belongs to the order Thraustochytriaceae (so-called Thraustochytrids), pertaining to the class Labyrinthulomycetes, of the phylum Heterokonta, within the kingdom Chromista [1,2]. Thraustochytrids are a group of non-photosynthetic, marine, fungoid protists, characterized by an ectoplasmic net and a cell wall composed of non-cellulosic, sulfated scales [1][2][3]. It has been argued that the biomass of Labyrinthulomycetes in the water column may equal or even exceed that of bacteria [4]. Their high biomass and production of degradative enzymes indicate that Labyrinthulomycetes play a significant ecological role as alternative food sources for picoplankton feeders, as well as being active decomposers and consumers in marine microbial food chains [5].

DNA Preparation
Cells were harvested from culture broth by centrifugation (10,000× g, 20 min at 4 • C), and frozen in liquid nitrogen. Nuclear DNA for sequencing was obtained using a protease, RNAase, and a phenol-chloroform extraction protocol [16]. Packed, frozen cells were pulverized using a sterile mortar and pestle in liquid nitrogen. The powder was treated with 2% cetyltrimethylammonium bromide (CTAB) solution (2% CTAB, 100 mM Tris-HCl, pH 8.0, 20 mM ethylenediaminetetraacetic acid (EDTA), 1.4 M NaCl) for 1 h at 65 • C. Solutions were centrifuged at 1500× g for 5 min at 4 • C. Supernatant was subjected to 20 µg/mL RNase treatment (37 • C, 30 min) and further treated by adding an equal volume of chloroform/isoamyl alcohol (24:1, v/v) with gentle rotation for 1 h. After centrifugation at 1500× g using Phase lock gel tubes (Funakoshi, Tokyo, Japan), supernatants were transferred to new tubes. An additional chloroform treatment was repeated twice. By adding 1.5× volumes of 1% CTAB, genomic DNA was precipitated. Precipitated DNA was treated with 0.1 mg/mL proteinase K (Takara Bio Inc., Shiga, Japan) in 1 M NaCl for 2-3 h at 56 • C. By adding at least 2.5× volumes of ethanol, DNA was precipitated again. After rinsing with 70% ethanol twice, DNA was dissolved overnight in 10 mM Tris-HCl, 1 mM EDTA, pH 7.4, at 4 • C.

Genome Sequencing and Assembly
Genomic DNA was fragmented, and libraries were prepared and sequenced according to the manufacturer's protocols with whole-genome shotgun (WGS) reads using a Roche 454 GS-FLX (Roche Diagnostics, Basel, Switzerland) [17], and Illumina Miseq and GAIIx sequencers (Illumina, San Diego, CA, USA) [18]. To obtain contigs, 454 and Miseq paired-end reads (Illumina, San Diego, CA, USA) were combined and assembled de novo using GS De Novo Assembler version 2.6 (Newbler, Roche, Basel, Switzerland) [17]. Then, subsequent scaffolding was performed with SOPRA [19] and SSPACE [20] using Illumina mate-pair information. At least five consistent read-pairs were needed to form a connection. Gaps inside scaffolds were closed with Illumina paired-end data using GapCloser [21]. Genome size can be roughly estimated by the k-mer value of new-generation sequence reads. Illumina sequences were quality trimmed (QV ≥ 20) and possible polymerase chain reaction (PCR) duplicates were removed with MarkDuplicates in Picard tools (http://picard.sourceforge.net, Broad Institute, Cambridge, MA, USA). K-mers (length: 21, 31, 41, 51, 61, and 71) in the dataset were counted using JELLYFISH [22].

RNA Sequencing Analyses
We performed transcriptome sequencing using an Illumina GAIIx sequencer (Illumina, San Diego, CA, USA). RNA was isolated from cells at three time-points (27,40, and 48 h) during culturing at 28 • C, as described above. At each time-point, three samples were taken from different flasks. After washing cells with distilled water, total RNA was extracted following the manufacturer's instructions (Invitrogen, Carlsbad, CA, USA) and purified using DNase and RNeasy micro kits (QIAGEN, Hilden, Germany). Transcriptome libraries for Illumina GAIIx were prepared with TruSeq Standard mRNA LT Sample Prep kit (Illumina, San Diego, CA, USA) and sequenced as per manufacturer's instruction (paired-end sequences, 150 bp ×2). A total of 113.6 Gbp nucleotide (Gbn) sequence data were obtained, 71.8% of which (81.6 Gbn) had a quality value ≥30. These high-quality sequences were assembled with the Velvet/Oases assembler with a hash length of 37 [23]. Trinity assemblies were also used [24]. As a result, we obtained 112,852 contigs (29.3 Mbp) over 100 bp with an N50 size of 1575 bp. Of these, 88,570 (78%) had a blat alignment to the assembled genome (with default settings). A total of 68,571 of the 88,570 contigs were found to have complete open reading frames (ORFs) from the start to stop codon of at least 450 bp. Of these putatively full-length RNA contigs, 66,342 (97.3%) had a blat alignment to the assembled scaffolds. These data were used to produce gene models and annotation.

Gene Modeling
Gene predictions (Aurantiochytrium KH105 Gene Model ver. 1) were performed using ab initio and complementary DNA (cDNA) sequence alignment prediction derived from Augustus software [25]. Assembled Aurantiochytrium transcriptome sequences were processed with PASA [26], and the 500 longest ORFs were used to train the prediction software. Assembled Illumina RNA sequencing (RNA-seq) data were aligned to the assembled genome and incorporated as an Augustus "hint" on a repeat-masked genome produced using RepeatMasker [27].

Genome Browser
A genome browser was established for the assembled sequences using the Generic Genome Browser (GBrowser) 2.17 [28].

Annotation and Identification of Aurantiochytrium Genes
Protein-coding genes in the Aurantiochytrium KH105 genome were surveyed as follows. (i) Nucleotide and amino acid sequences of well annotated genes of model organisms were used as queries for basic local alignment search tool (BLAST) searches including TBLASTN (expect value < 1 × 10 −10 ) of the Aurantiochytrium KH105 genome; (ii) Pfam domain searches were performed to identify protein domains included in putative proteins from all gene models (Pfam-A.hmm, release 24.0; http://pfam.sanger.ac.uk) [29]. Aurantiochytrium genes with conserved domains were included in a phylogenetic analysis; (iii) In order not to miss divergent homologs specific to Labyrinthula lineages, we confirmed them with at least one gene hit using reciprocal BLAST searches. Divergent homologs were shown as outgroups of annotated genes. RNA-seq reads were mapped to the assembled genome [29]. MAKER2 was used to tune-up the annotation and modeling [30].

Molecular Phylogeny
Molecular phylogenic analyses of phytoene desaturase (CrtI), geranylgeranyl phytoene synthase (CrtB), and geranylgeranyl phytoene synthase (CrtY) domains of CrtIBY were carried out using ClustalW [31]. Their amino acid sequences were aligned using default parameters and the resulting datasets were used to visualize maximum-parsimony trees with Sea View [32].

Analyses of Gene Expression Profiles
Differential gene expression profiles were analyzed using RNA-seq data with TopHat and Cluffinks, according to the method of Trapnell et al. [33]. The expression level of each gene was calculated as fragments per kilobase of transcript per million mapped reads (FPKM) with Cufflinks [34].

Functional Assay of crtIBY
After cultivation of Aurantiochytrium sp. KH105 in GPY medium for 4 days, total genomic DNA was extracted from cells using the phenol-chloroform method. The crtIBY1 gene was obtained using polymerase chain reaction containing 0.15 µg of genomic DNA, 1 unit of KOD FX neo (Toyobo, Osaka, Japan), 0.4 mM of each deoxynucleotide (dNTP), and 0.3 µM of each oligo DNA primer (5'-CAGTGTGCTGGAATTATGGCGCGCAGGGCGTCG-3' and 5'-GATATCTGCAGAATTTCAGGCA TTCTTGTACAGCGGGAGC-3') (Fasmac, Kanagawa, Japan) in 50 µL with incubation at 94 • C for 2 min and then 30 cycles of 98 • C for 10 s, 70 • C for 30 s, and 68 • C for 2 min. The crtIBY1 gene was ligated to the EcoRI-digested vector pYES2 (Thermo Fisher Scientific, Waltham, MA, USA) using the In-fusion HD system (Clontech Laboratories, Mountain View, CA, USA). Stellar competent Escherichia coli cells (Clontech Laboratories) were transformed using a heat shock method. Resultant pYES2-crtIBY and pYES2 plasmids were introduced individually to Saccharomyces cerevisiae INVSc1 (Thermo Fisher Scientific) using the polyethylene glycol/lithium acetate method. Transformants complementing uracil requirement were isolated and cultivated in uracil-defective minimum medium (0.67% yeast nitrogen base without amino acids (Difco Laboratories, Detroit, MI, USA), 0.19% yeast synthetic drop out medium without uracil (Thermo Fisher Scientific), 4% D-raffinose (Thermo Fisher Scientific) at 28 • C for 6 h. The crtIBY1 gene was expressed by addition of galactose (Thermo Fisher Scientific) to a final concentration of 2%, followed by further incubation for 16 h. For carotenoid analysis, cells from a 50-mL culture were collected by centrifugation (3500 rpm, 10 min, 4 • C), washed with water, and freeze-dried. Dried cells were crushed with glass beads (Wako Pure Chemical, Osaka, Japan) in 1 mL of chloroform/methanol (2:1, v/v) using a beads crusher (µT-12, Taitec, Saitama, Japan) at 2500 rpm for 120 s. The extract was recovered in the chloroform phase by centrifugation (3500 rpm, 10 min, 4 • C) of the crushed cells and evaporated under nitrogen gas spray. Carotenoids were dissolved with acetone/methanol (7:3, v/v) and analyzed on a 1260 Infinity HPLC system (Agilent Technologies, Santa Clara, CA, USA) equipped with a carotenoid separation column (S-5, 4.6 × 250 mm; YMC, Kyoto, Japan) and a photo diode array (Agilent Technologies) at 1 mL/min with acetonitrile/dichloromethane/methanol (7:2:1, v/v). Liquid chromatography-mass spectrometric (LC-MS) analysis was performed using LTQ Orbitrap XL (Thermo Fisher Scientific) equipped with a liquid chromatography (LC) column as mentioned above. We did not detect any oxidative decomposition products or other modified compounds.

Whole Genome Sequencing
Roche 454 WGS paired-end sequencing (661 bp on average read-length) yielded 2.9 Gb of nucleotide sequence data (Supplementary Table S1). In addition, Illumina Miseq paired-end sequencing (249 bp on average read-length, ×2) yielded 1.4 Gb of nucleotides, and GAIIx mate-pair sequencing (148 bp on average read-length, 3-kb library) yielded a total of 4.6 Gb. In total, 8.9 Gb sequence data were obtained.

Gene Modeling
Gene models were constructed by combining RNA-seq sequence data (113.6 Gb of paired-end reads from the Illumina GAIIx) (Table S2) and other queries. There were 19,756 complete gene models with both start and stop codons (Table 1). Approximately 80% of them (15,800) were supported by RNA-seq analyses. At this modeling stage, the average gene length was 13,061 bp. The number of exons per gene was 4.7, and the average exon length was 513 bp, suggesting that an average gene consists of exons 2411-bp in length. This means that, on average, each gene contains an intron of nearly 10,565 bp. Here, we compared the gene content of Aurantiochytrium sp. strain KH105 to those of four other Thraustochytrid species (Table 1). The genome size of Aurantiochytrium sp. strain KH105 was almost double that of the other Thraustochytrid species. This suggests that the Aurantiochytrium sp. strain KH105 is diploid (see below).
To determine the ploidy level of the genome, we analyzed a set of 248 core eukaryotic genes (CEGs), which are highly conserved and are present in low copy numbers in higher eukaryotes [37]. For example, the yeasts, Saccharomyces cerevisiae and Schizosaccharomyces pombe, both haploid strains, have an average of 1.10 and 1.11 orthologs per CEG in their genomes, respectively (Supplementary Table S3), indicating that they are haploid. A similar analysis of genomes of Phytophthora infestans [38], Phytophthora ramorum, and Phytophthora sojae [39] showed that their average numbers were 1.25, 1.18, and 1.22, respectively. In contrast, the number of Auranthiochytrium sp. KH105 was 2.24, almost double those of the yeasts and Phytophthora. In addition, 95.6% of the 248 CEGs in the Auranthiochytrium sp. KH105 genome have two or more orthologs. Given the properties of CEGs, it is highly like that Auranthiochytrium sp. KH105 is diploid.

Structure and Function of crtIBY Gene
In addition, we found that crtB, crt I, and crtY were clustered in scaffolds 00221 (tentatively called crtIBY1) and 00288 (crtIBY2), and that the gene order in both clusters was I-B-Y (Figure 2). Both expanded approximately 3.8 kb in the scaffolds and were expected to encode proteins of 1,268 amino acids. Molecular phylogeny indicated an affinity of KH105 CrtI (Figures S3 and S4), CrtB (Supplementary Figures S5 and S6), and CrtY ( Figures S7 and S8) to archaean homologs. Messenger RNAs (mRNAs) transcribed by crtIBY1 and crtIBY2 (for example, base positions 106 to 3807 of crtIBY1 ( Figure 2) had an open reading frame (ORF) without stop codons. This suggests that the mRNAs are translated to produce single polypeptides of 1268 amino acids.  Supplementary Table S5. Therefore, not only are crtB, crtI, and crtY clustered in the genome, but they are also fused into a single gene. Although there are several reports that show a cluster of two genes encoding carotenoid biosynthetic enzymes, for example, carRA in Phycomyces blakesleeanus [39] and crtYB in Xanthophyllomyces dendrorhous [49], this finding of a three-gene fusion appears novel. Since the biosynthetic process from GGPP to β-carotene is usually regulated by four independent enzymatic reactions catalyzed by CrtB, CrtI, and CrtY (green lines in Figure 1a), this gene fusion suggests that Aurantiochytrium has managed to increase efficiency by performing all three reactions with a single, multifunctional enzyme (Figure 1a). In addition, we have been unable to detect lycopene, which is an intermediate product of the pathway when the process is accomplished by four separate enzymatic reactions.
To confirm the function of crtIBY, the gene was heterologously expressed in yeast Saccharomyces cerevisiae INVSc1, which possesses GGPP synthase (BTS1), but lacks carotenoid biosynthetic genes, crtB, crtI, and crtY. Liquid chromatography-mass spectrometric analysis of carotenoids produced by the crtIBY1-expressing yeast identified a newly generated compound as β-carotene, based upon its mass profile, which is similar to that of authentic β-carotene (Figure 3). This indicates that the crtIBY gene encodes β-carotene synthase. The fact that levels of β-carotene increased in BTS1-overexpressing cells suggests that GGPP is the substrate of CrtIBY.

Gene Expression Profile
Carotenoid biosynthesis of Aurantiochytrium sp. KH105 becomes evident after 27 h of culture at 28 • C (mid-log phase of culture; Figure 4a), and carotenoid production plateaus after 72 h (starvation phase). Amounts of β-carotenes, canthaxanthin, and astaxanthin, plateaued at ≈48 h (stationary phase), 72 h, and 96 h, respectively, after initiation of incubation. We thought that these changes might reflect enzymatic gene expression; therefore, we performed a large-scale, gene expression profile analysis of the seven genes involved in carotenoid synthesis. This analysis showed that the concentration of FDPS mRNA decreased after 27 h of incubation (Figure 4b), while the concentrations of crtE, crtIBY, crtO, and crtZ mRNAs steadily increased from 27-48 h of incubation.
Major lipid biosynthetic pathways in Aurantiochytrium include those for polyunsaturated fatty acids (DHA and DPA) from acetyl-CoA, squalene, and sterols from farnesyl pyrophosphate (FPP), and xanthophylls from FPP/GGPP (Supplementary Figure S9). As mentioned above, in strain KH105, increased expression activity of crtE was evident, but overall expression levels of crtIBY, crtO, and crtZ appeared lower compared to that of crtE. Therefore, a key enzymatic reaction to synthesize xanthophylls is the CrtIBY-mediated reaction to produce β-carotene from GGPP. Although future studies should explore regulatory mechanisms of crtIBY gene expression in relation to crtE expression, genetic manipulation of the expression of these genes seems to be a key event that has allowed Aurantiochytrium to produce carotenoids more efficiently.
Since differential gene expression profile analysis showed fewer changes in the concentration of mRNAs for crtIBY, crtO, and crtZ, we further compared mRNA levels of the three genes between growth phases. Expression levels of crtO and crtZ increased from 27 (mid-log phase) to 40 h (late-log to stationary phase), and from 27 to 48 h (starvation phase), respectively (Figure 4c). Up-regulation of crtO and crtZ expression appeared to correspond well to increased concentrations of canthaxanthin and astaxanthin (Figure 4a,c). This suggests that genetic changes that up-regulate crtO and crtZ expression do increase carotenoid production. However, since overall levels of crtO and crtZ expression appear high during log phase compared to starvation phase, this explanation may be too simplistic. In contrast, the expression level of crtIBY appears lower than those of other enzymatic genes. Therefore, a plausible way to increase carotenoid production is to induce higher crtIBY expression during log, stationary, and starvation phases.

Occurrence of the crtIBY Gene in other Thraustochytrid Species
As described above, the genome of Aurantiochytrium sp. strain KH105 contains a fused gene, crtIBY, and the encoded protein likely acts as a trifunctional β-carotene synthase. Because genomes of other Thraustochytrid species have been sequenced (Table 1), we examined whether crtIBY is specific to Aurantiochytrium sp. 105 or whether it is also a feature of other Thraustochytrids. As a query of KH105 crtI, crtB, or crtY nucleotide sequences, we BLASTed genome sequences of the other four species. We found that all four Thraustochytrids possess a protein very similar to KH105 CrtIBY ( Figure 5).
No stop codons were found in the genomic region that includes the sequences. Especially, amino acid sequences of CrtIBY were almost completely identical among Aurantiochytrium sp. strain KH105 and Aurantiochytrium sp. FCC1311 and Schizochytrium sp. CCTCC M209059 ( Figure 5). On the other hand, in CrtI of Aurantiochytrium sp. T66 and Thraustochytrium sp. ATCC 26185, there were three insertions consisting of two, three, or four amino acids. In addition, there were many amino acid substitutions in these two species, compared with the other three ( Figure 5). Therefore, it is highly likely that a fused gene crtIBY is common to all Thraustochytrid species, although its function should be examined in each.

Conclusions
The decoded genome of Aurantiochytrium sp. strain KH105 demonstrated at least two genomic alterations associated with efficient production of carotenoids, gene duplication, and gene fusion. The fusion gene, crtIBY, looks common in all Thraustochytrid species. In addition, a large number of differential expression profiles of genes involved in carotenoid biosynthesis suggest key genetic manipulations that lead to more efficient production of carotenoids in this strain.