Transcription and Maturation of mRNA in Dinoflagellates

Dinoflagellates are of great importance to the marine ecosystem, yet scant details of how gene expression is regulated at the transcriptional level are available. Transcription is of interest in the context of the chromatin structure in the dinoflagellates as it shows many differences from more typical eukaryotic cells. Here we canvass recent transcriptome profiles to identify the molecular building blocks available for the construction of the transcriptional machinery and contrast these with those used by other systems. Dinoflagellates display a clear paucity of specific transcription factors, although surprisingly, the rest of the basic transcriptional machinery is not markedly different from what is found in the close relatives of the dinoflagellates.


Introduction
Dinoflagellates are an important group of unicellular eukaryotes found in both marine and fresh water environments. The marine species are of particular importance on a global scale, as along with the diatoms, they contribute roughly half of the carbon fixed in the oceans, and thus roughly a quarter of the global totals [1]. They also play a role in maintaining the biodiversity surrounding coral reefs, since the coral polyps themselves rely on photosynthetic products supplied by the symbiotic dinoflagellates they harbor for growth in nutrient poor waters [2]. Furthermore, many marine dinoflagellates synthesize potent toxins that accumulate to high concentrations in the algal blooms commonly called "red tides" [3]. Lastly, the nightly bioluminescence of many dinoflagellates, popularly known as the "phosphorescence of the sea", has inspired not only art and literature but also intensive scientific dissection of the bioluminescence phenomenon [4]. Interestingly, in Lingulodinium polyedrum this nightly bioluminescence [5], as well as photosynthesis [6], cell division [7], and diurnal vertical migration [8], are all regulated by an endogenous circadian (daily) clock. L. polyedrum has been studied for over 60 years as a model system for addressing the biochemical links between the internal clock and the observed rhythms [9].
Phylogenetically, dinoflagellates are grouped in the superphylum Alveolata, which contains apicomplexans as their closest relatives as well as ciliates [10]. Members of the Alveolata share a number of features, in particular the presence of flattened vesicles termed cortical alveoli lying just beneath the plasma membrane (Figure 1). However, dinoflagellates also have many unique characteristics compared to their relatives. For example, dinoflagellates typically possess a large quantity of nuclear DNA containing many genes organized in tandem gene arrays, with DNA found in a liquid crystal structure lacking observable nucleosomes [11]. It is unfortunate that dinoflagellates have so far proven refractory to mutational or gene transformational studies, thus hindering the extensive molecular studies needed to understand the mechanisms for regulating gene expression. The mechanisms used to control the expression of different genes have been extensively researched in both prokaryotes and eukaryotes. Critical events in eukaryotes include changes in chromatin organization, transcription of DNA into pre-mRNA, splicing of pre-mRNA into mature mRNA, mRNA transport, mRNA degradation, mRNA editing and covalent modifications of the mRNA, translation of mRNA into protein, and, lastly, post-translational modification of the protein. All these, either individually or collectively, are responsible for regulating gene expression within a cell. In this review, we will focus primarily on transcription and its regulation as they relate to the control of gene expression in the dinoflagellates, as more comprehensive studies on dinoflagellates have been published elsewhere [12][13][14].

cis-Acting Sequences and RNA Polymerase Components
Dinoflagellate chromosomes are permanently condensed at all stages of the cell cycle (Figure 2) and assume a liquid crystalline structure [15,16] with bivalent cations acting as the stabilization matrix [17]. This unusual chromatin structure raises important questions about the accessibility of genes within the structure to the transcriptional machinery. The dinoflagellate Prorocentrum micans was inspected using high resolution electron microscope autoradiography for ³H-adenine incorporation, and this revealed that RNA transcription was prevalent only on extrachromosomal DNA filaments and not on DNA within the main body of the chromosome [18]. It was proposed that this transcriptionally inactive DNA might instead play a role in stabilizing chromosome organization, perhaps by an association with a protein matrix [18].
Given access to the genetic material, transcription initiation in dinoflagellates is likely to require an elaborate set of trans-acting factors and a series of conserved cis-acting sequences, as is the case in other eukaryotes. The complex of trans-acting factors binding the regulatory sequences in the DNA includes, in addition to the RNA polymerases, both general and gene-specific transcription factors, activators and mediators [19]. The cis-acting sequences in eukaryotes can include regulatory elements far from the transcription start site, termed enhancers, although the region just upstream of the start site, termed a promoter and consisting of a core region and other regulatory domains [20,21], is considered the primary site of initiation. There are two major classes of promoters that regulate the expression of protein coding genes, and these contain either a TATA-box (consensus sequence <TATAAA>) or CpG islands, regions rich in CG dinucleotides [22], as their core domains. In Pyrocystis lunula luciferase (lcf) genes, a GC box consensus sequence <GGGCGG> is present, but its location is further upstream than the usual position of −110 (numbered relative to the transcriptional start site at +1) found in many eukaryotes [23]. Furthermore, a GC-rich motif <C(G/C)GCCC> was also found within the upstream region of P. lunula lcf A and L. polyedrum lcf and lbp genes, but its position was not fixed. This GC-rich motif was first reported in the upstream region of the Peridinium bipes ferredoxin gene [24]. However, the role of this motif in gene expression has still not been established. Both TATA-box and CpG island type promoters may include additional sequence elements such as the GC-box <GGGCGG>, the CAAT-box <CCAAT>, and the INR box <(C/T)(C/T)AN(T/A)(C/T)(C/T)> at which transcription is initiated. Interestingly, the TATA box is quite conserved in eukaryotes and is also found in protists as diverse as amoebas (Acanthamoeba), slime molds (Dictyostelium), ciliates (Histriculus cavicola), and apicomplexans (Plasmodium) [25][26][27][28][29][30]. On the other hand, members of the phylum Parabasalia use their own specific promoter element instead of the canonical TATA box [31][32][33].
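As a rough illustration of how the consensus elements named above might be located in an upstream sequence, the following Python sketch converts each consensus into a regular expression and reports the match positions. The example sequence is invented for illustration and is not taken from any dinoflagellate gene; the function name scan_motifs is likewise hypothetical, and only the given strand is searched.

```python
import re

# Consensus sequences quoted in the text, written as regular expressions.
# (C/T) becomes [CT], N becomes [ACGT], and so on.
MOTIFS = {
    "TATA-box": "TATAAA",
    "GC-box": "GGGCGG",
    "CAAT-box": "CCAAT",
    "INR": "[CT][CT]A[ACGT][TA][CT][CT]",
    "GC-rich motif": "C[GC]GCCC",
}


def scan_motifs(seq, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for the given strand."""
    seq = seq.upper()
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Hypothetical upstream fragment, invented purely for illustration.
upstream = "ACGTCCGCCCATTATAAAGGGCGGTTCCAATGC"

for name, positions in scan_motifs(upstream).items():
    print(name, positions)
```

A real survey would also scan the reverse complement and report positions relative to a mapped transcription start site, which is rarely available for dinoflagellate genes.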
Proper understanding of gene organization and structure is required to describe transcription in dinoflagellates. For example, L. polyedrum has multiple copies of peridinin-chlorophyll a-binding protein (pcp), luciferin binding protein (lbp) and luciferase (lcf) genes arranged in long tandem repeats [34][35][36][37]. PCR with Pyrocystis lunula genomic DNA revealed that, among lcf A, lcf B and lcf C isoforms, two (lcf A and B) are in tandem repeat. However, the sequence of the intergenic region between the lcf and pcp coding sequences of L. polyedrum lacks any known promoter elements. The only common feature between the two was a conserved 13 nucleotide sequence, CGTGAACGCAGTG, proposed as a dinoflagellate specific promoter sequence [35], but no further work has been published to firmly establish this result. Moreover, this sequence is not conserved among different dinoflagellate species as it is absent in the intergenic region between P. lunula lcf A and lcf B genes [38]. To test if the tandem repeat structure is a general character of dinoflagellates, PCR was used with primers directed away from one another in Amphidinium carterae [39]. PCR using genomic DNA as a template was expected to produce a band if the genes were found as a tandem repeat, and this strategy revealed that 17 out of the 47 genes tested did indeed have a tandem repeat structure.
The lack of identifiable sequence elements in the intergenic spacers has led to the suggestion that tandem gene repeats may form a polycistronic transcript, in a manner similar to the Trypanosoma gene structure [40]. The trypanosomes transcribe from a single promoter long polycistronic transcripts containing genes coding for different gene products, and the primary transcript is then processed into mature mRNAs by trans splicing of the SL leader at the 5′ end and by polyadenylation at the 3′ end. If true for dinoflagellates, one possibility would place a promoter upstream of each tandem array, thus explaining the lack of recognizable promoter sequences in the intergenic regions. However, the consequences of this hypothesis include the predictions that the intergenic spacer region should be abundant in the transcribed RNAs, and that sequence differences between copies in low copy number arrays should be detected in the mature transcripts at a frequency inversely proportional to the copy number. In a recent transcriptomic study that addressed this issue, neither of these predictions was validated experimentally [41].
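The second prediction can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every copy in an array is co-transcribed and contributes equally to the mature mRNA pool, so that a sequence variant carried by one copy out of n is expected in 1/n of the transcripts. The function name and the copy numbers used are hypothetical.

```python
from fractions import Fraction


def expected_variant_fraction(copy_number, variant_copies=1):
    """Expected fraction of mature transcripts carrying a variant.

    Assumes all copies of the array contribute equally to the mRNA pool
    (an idealization of the polycistronic model, not a measured property).
    """
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    if not 0 <= variant_copies <= copy_number:
        raise ValueError("variant_copies must lie between 0 and copy_number")
    return Fraction(variant_copies, copy_number)


# Illustrative copy numbers only; none of these are taken from the text.
for n in (2, 5, 10, 50):
    print(n, float(expected_variant_fraction(n)))
```

Under this idealization a variant in a two-copy array would appear in half of the reads, which is why low copy number arrays provide the most sensitive test of the hypothesis.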
Eukaryotic and prokaryotic transcription also differ in that three different RNA polymerases (RNAP) are used for the former while only one is used for the latter. The three eukaryotic enzymes have specialized functions, with RNAP I transcribing most ribosomal RNA (rRNA), RNAP II transcribing protein-coding messengers (mRNA), small nuclear RNAs (snRNA) and micro RNA (miRNA), and RNAP III synthesizing transfer RNAs (tRNA) and the 5S rRNA. An assessment of the activity of RNA polymerase in the dinoflagellate Crypthecodinium cohnii, carried out with radiolabeled UTP, revealed that considerable amounts of RNA polymerase activity remained even after inhibition by α-amanitin, a potent inhibitor of RNAP II. This confirmed the presence of multiple forms of DNA dependent RNA polymerase as in other eukaryotes [42]. Curiously, this research also noted a peculiar inhibition of polymerase activity by Mn²⁺, instead of the activation of these enzymes seen in other eukaryotes. It was suggested that dinoflagellate RNAP II activity might differ slightly from the other eukaryotic RNAP II enzymes [42], perhaps analogous to the unusual form of RNAP II found in some trypanosomes [43]. However, the transcriptome of L. polyedrum contains a complete set of core and common elements for all three eukaryotic RNAPs. Furthermore, the specific elements absent from the transcriptome were also missing in other members of the Alveolata (Figure 3). It seems that the alveolates in general can assemble functional RNAPs with a reduced number of components as compared to higher eukaryotes, and there is nothing unique to the dinoflagellates in this part of the transcriptional machinery.

Basal/General Transcription Factors
In addition to RNAP II, an in vitro reconstitution of a functional eukaryotic transcriptional apparatus requires a suite of other basal/general transcriptional factors (TF) [44]. Six multi-subunit complexes, termed TFIIA, TFIIB, TFIID, TFIIE, TFIIF and TFIIH, appear to be among the most important [45][46][47][48][49]. The first step of promoter recognition is performed by TFIID, constituted from the TATA binding protein (TBP) and at least 14 TBP-associated factors (TAFs) [50,51]. TBP binding is considered to be the rate-limiting step in the transcription process [52], although TBP can have relatives, such as TBP-related factors (TRF), which also activate transcription from the same RNAP II promoters that are activated by TBP [53,54]. These TRFs have been found in diverse animals, including fruit fly, nematode, frog, zebrafish, chick, mouse, and human [53]. Interestingly, C. cohnii has been shown to contain a TBP-like factor (TLF), clearly homologous to TBP yet lacking the four phenylalanine residues known to interact with the TATA box. This TLF is unique to dinoflagellates (Figure 4, Supplementary Figure S1) and has a strong affinity for a <TTTT> sequence instead of the consensus TATA-box sequence [55]. Unfortunately, the upstream regions from 6 different genes of two different dinoflagellates did not contain a TTTT element [55]. This suggests a unique promoter recognition mechanism for at least these genes, in keeping with the unusual structure of the chromatin of these organisms.

Figure 3. The number of RNA polymerase components present over a wide phylogenetic range of organisms includes those considered to be core components (red), common components (yellow) and specific components (blue) of RNAP I, II and III. Each bar represents an individual component. The representative sequences for the RNA polymerase I, II and III subunits were selected from an animal (H. sapiens), a plant (A. thaliana), a diatom (T. pseudonana), and two other alveolates (T. thermophila and P. falciparum) and uploaded and maintained as a local database in the Geneious software. Using tBLASTn and an expect E-value cutoff of e−25, the Lingulodinium transcriptome was scanned to obtain the homologues for the RNA polymerase subunits [41]. For all other species the sequences were directly obtained from the KEGG specific pathway database by selecting the specific organism [56].

Figure 4. For L. polyedrum and Symbiodinium, the translated sequences were aligned using MUSCLE, an alignment program built into the tree construction software MEGA5 [57] that was used for this phylogenetic analysis.

The L. polyedrum transcriptome contains two TLF isoforms similar to the TLF found in C. cohnii and, somewhat surprisingly, no TBP at all [41]. The phylogenetic relationship between the consensus TBP and the TLF, found uniquely in the dinoflagellates, clearly indicates the early divergence of TLF from TBP as well as the presence of two distinct TLF clades within the dinoflagellates (Figure 4). In agreement with this lack of TBP, it is perhaps not surprising that L. polyedrum also lacks most other TAFs, although the closely related Alexandrium expresses two proteins with DNA helicase activity, RuvB-like1 and RuvB-like2 [58]. RuvB-like proteins have been shown to co-purify with the human RNA polymerase holoenzyme complex and found to be an extremely important element required for growth [59], suggesting they may also play a role in the dinoflagellates. In particular, L. polyedrum lacks any TFIIA, TFIIB, TFIIE or TFIIF components, and only 3 out of the ten expected TFIIH components are found (the E-value cut-off for the tBLASTn is e−25). It must be noted, however, that ciliate, apicomplexan and diatom genomes contain a single TBP and also lack the TAFs and TFs missing in L. polyedrum [41]. A figurative representation of basal TF status in different eukaryotes (Figure 5) indicates that the poor conservation of TAFs and other basal TFs in L. polyedrum is commensurate with the other related eukaryotes. These properties thus seem more likely to be due to a reduced dependence on these TFs throughout the Alveolata than to the unusual nature of the dinoflagellate chromatin.

Figure 5. The dinoflagellates do not contain the putative TBP (red) but do express a TBP-like factor (TLF; pink). Each bar represents a different component. A pool of basal transcription factor (BTF) protein sequences was selected from the five species and stored as a local database in Geneious. The Lingulodinium transcriptome was scanned using tBLASTn at an expect E-value of <e−25 to obtain the homologues of the BTFs [41].
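The homology screens described in this section rest on a simple rule: a query protein is scored as present only if a tBLASTn hit falls below the E-value cut-off of e−25. A minimal Python sketch of that filtering step is given below. It assumes the hits were exported in the standard 12-column BLAST tabular format (E-value in the eleventh column); the query and contig names are invented, and this is not the authors' actual pipeline, which used a local database in Geneious.

```python
import csv
import io

EVALUE_CUTOFF = 1e-25  # threshold quoted in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Keep, for each query, the lowest E-value hit that is below the cut-off.

    Expects the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, truncated or comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # too weak to count as a homologue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Invented rows for illustration only; these are not real search results.
example = "\n".join([
    "\t".join(["RPB1", "contig_0001", "62.1", "900", "300", "5",
               "1", "900", "10", "2700", "3e-120", "410"]),
    "\t".join(["RPB1", "contig_0042", "35.0", "200", "120", "8",
               "1", "200", "5", "600", "2e-30", "120"]),
    "\t".join(["TFIIB", "contig_0310", "28.4", "150", "100", "6",
               "1", "150", "20", "470", "4e-08", "52"]),
])

print(best_hits(example))
```

With a threshold this stringent, a query with only a weak hit (the hypothetical TFIIB row above) is scored as absent, so divergent homologues may be missed.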

DNA Binding Proteins
Histones are the most abundant and conserved class of basic proteins in the DNA-binding protein class of eukaryotes, and can profoundly affect transcription rates by their ability to alter the degree of chromatin condensation. The classic nucleosome structure, observed microscopically as "beads on a string", forms when 146 bp of DNA wraps 1.65 times around the histone octamer (dimers of each of the four core histone proteins H2A, H2B, H3, and H4) [60,61]. A fifth protein, histone H1, binds to the linker DNA between nucleosomes to induce an even higher structural order to the chromatin [62]. Dinoflagellates have long been thought to lack histone proteins, and there is considerable biochemical evidence to support this view [63]. Dinoflagellate protein extracts do not show the typical pattern of histones after polyacrylamide gel electrophoresis [64,65] and no nucleosomes are visible in dinoflagellate DNA spreads observed under a microscope [66,67]. The only other eukaryotic cells lacking histones are sperm nuclei, which instead employ arginine-rich proteins called protamines to stabilize their DNA structure [15,68,69]. No protamines are found in the L. polyedrum transcriptome.
Although metatranscriptomic analysis with the DinoSL found core histone sequences that were scattered in different dinoflagellate species [70], the presence of the full complement of all core histones in a single dinoflagellate species was first confirmed in L. polyedrum [71]. However, the presence of all the core histone sequences in the transcriptome of two dinoflagellate species, the high sequence conservation of these sequences compared to other eukaryotic histones, and the presence of a wide range of histone modifying enzymes in the L. polyedrum transcriptome all suggest that histone proteins are indeed expressed [56,71], albeit at levels still undetectable by antibody or MS analysis [71].
The total amount of basic proteins in dinoflagellate nuclei (basic protein to DNA ratio of 1:10 [72]) is much lower than generally found in eukaryotes (1:1 ratio [73]) and prokaryotes (1:1.75 ratio [74]) and appears to date to include two different basic protein types. One, a group of histone-like proteins (HLPs) [65], was first found by electrophoretic analysis of acid soluble nuclear proteins in the dinoflagellate C. cohnii and later renamed HCc 1-4 [75,76]. BLAST homology searches with C. cohnii HLP revealed that L. polyedrum also has an HLP, which was named HLp [77], and this protein was shown to have sequence specific DNA binding activity and be subject to post-translational modifications, suggesting that its activity might be regulated in vivo [77]. The presence of HLPs has been confirmed in many other dinoflagellates [63]. A second basic protein called DVNP (dinoflagellate/viral nucleoprotein), recently found in studies of the basal dinoflagellate Hematodinium, can bind DNA as efficiently as histones and can also be post-translationally modified [78]. DVNP is found only in dinoflagellates, including the early diverging lineage Hematodinium, as well as in a family of large algal viruses, the Phycodnaviridae. However, DVNP is not found in Perkinsus, the common ancestor of dinoflagellates and apicomplexans, which has instead the typical eukaryotic chromatin with all core histone proteins and DNA arranged into nucleosomes [79]. The acquisition of DVNP thus occurred at some time following divergence of Hematodinium and the main dinoflagellate lineages from Perkinsus, and thus appears to coincide with the appearance of the unusual core dinoflagellate nuclear morphology. In addition, a substantial proportion of the DNA appears to consist of repeated sequences, and it is possible that this may contribute to genome organization [80].
The nuclear matrix is a network of fibers in the nucleus that also plays a key role in the functional and structural organization of the chromatin. Electron microscopy studies of nuclear matrices in the dinoflagellate Amphidinium carterae, produced in situ by microencapsulation in agarose and sequential extraction coupled with immunoblotting, revealed the presence of two matrix proteins (lamins and topoisomerase II) similar to what is found in higher eukaryotes [81]. The lamins are architectural proteins, a class of intermediate filaments that line the inside of the metazoan nuclear envelope and act as a scaffold to which proteins and chromatin bind [82]. They have a wide range of nuclear functions such as higher-order genome organization, chromatin regulation, transcription, DNA replication and repair [83,84]. Thus, although the dinoflagellate chromatin is arranged differently from other eukaryotes, its nuclear matrix is conserved, perhaps indicative of an ancient evolutionary trait required for nuclear structure.
In pursuit of sequence-specific DNA binding proteins (as opposed to basal or general TFs), a dinoflagellate nuclear associated protein (Dinap1) was found in C. cohnii. Dinap1 does not have any known homologues but does contain two zinc finger domains (known to be present in many transcriptional factors) and two WW domains (known to interact with proline-rich domains) [85]. An interaction study using the Dinap1 WW domains identified five proline-rich Dinap1-interacting proteins (Dip) [86], and screening of a C. cohnii cDNA library with a tagged Dip1 retrieved not only the expected Dinap1 but also other interactants, named DAP (Dip1-associated proteins) [86]. Dinap1, Dip1 and DAP were all found in the nucleus and all have the same pattern of protein expression. Unfortunately, none of the above-described proteins interacted with DNA directly [86], although some as yet unidentified intermediate partners may be involved in DNA recognition. In addition to Dinap1, a homologue of the Tubby-like protein (TUBL) [58], a group of membrane-tethered transcription factors involved in the signaling pathway [87], has been found in Alexandrium, although this protein has not been fully characterized.
Gene specific transcription factors (TFs) are one of the largest families of proteins in most cells, accounting for ~4% of the genome in yeast or ~8% of the genome in plants and mammals [88]. In contrast, proteins with a DNA binding domain account for only 0.15%-0.3% of the total transcripts in each of two different dinoflagellates, Lingulodinium and Symbiodinium [41,56]. Furthermore, in both species, roughly two-thirds of the TFs are represented by a single group, the Cold Shock Domain (CSD) containing proteins. The CSD is relatively uncommon in eukaryotes, and importantly, is more often implicated in posttranscriptional than transcriptional regulation [89]. Whether or not the dinoflagellate version of the CSD proteins will be shown to be bona fide DNA-binding proteins, and the reason for the preferential expansion of this domain in dinoflagellates, remains to be discovered. However, there is a caveat to assuming that dinoflagellates are bereft of most DNA binding domains based on gene sequence data. Apicomplexans were initially also thought to have a low number of DNA binding proteins, yet further research revealed the expansion of a unique family of transcription factors, ApiAP2, in these organisms [90]. An as yet unknown family of factors modulating transcription may remain to be discovered in the dinoflagellates.

Transcriptional Regulation
Methylation of cytosine in the DNA is a well-studied epigenetic modification that plays an important role in several cellular processes such as retrotransposon silencing, genomic imprinting, X-chromosome inactivation, regulation of gene expression, and maintenance of epigenetic memory [91]. Cytosine methylation occurs at roughly 0.5%-4% of cytosines in dinoflagellates [92,93], and is dynamic as it has been shown to change with varying light conditions [94]. It is thus possible that cytosine methylation may structurally regulate the access of DNA to transcription. In addition to 5-methylcytosine (5-MeC), dinoflagellates possess a number of unusual base modifications such as 5-hydroxymethyluracil (5-HMeU) and N6-methyladenine (N6-MeA) [95]. 5-HMeU is formed in DNA as a product of oxidative attack on the methyl group of thymidine [96], and dinoflagellate DNA contains between 12% and 70% of the thymidine as 5-HMeU [97]. The significance of this modification in dinoflagellate DNA is still unclear.
The posttranslational modification of histones plays an important role in regulating gene expression in other eukaryotes, and deserves re-examination in dinoflagellates because of the recent discovery that conserved sequences for core histones and their regulatory enzymes appear in the transcriptomes [56,71]. It is possible that very low levels of histones are associated with gene regulatory sites, much as the low levels of acetylated histone H3 are associated with initiation of polycistronic transcripts in kinetoplastids [98]. The role of HLPs in regulating gene expression is also unclear, although the sequence-specific DNA binding and their existence in several post-translationally modified forms may indicate an involvement in gene regulatory mechanisms [77]. HLP transcript abundance in dinoflagellates appears to be up-regulated during different phases of the cell cycle and in response to nutrient availability, as exemplified by Pyrocystis lunula where HLP transcripts peaked during the S-phase [99], and Alexandrium fundyense where HLP transcripts were up-regulated during G1 phase [99]. However, unlike the higher eukaryotes whose histone mRNA levels increase during S-phase, no difference in histone mRNA abundance was found during S-phase in L. polyedrum [71]. It will be interesting to examine the newly discovered DVNP [78] to see if transcriptional regulation accompanies DNA synthesis in the dinoflagellates.
Most organisms have evolved an ability to respond to environmental changes, including biotic and abiotic stresses such as changes in light or temperature. The signaling pathways involve receptors that sense and transmit the information to regulatory molecules, and changes in gene expression are a frequently observed cellular response [100]. For example, in Amphidinium carterae, Northern blot hybridization revealed that transcript levels of two light harvesting proteins, peridinin chlorophyll a protein (PCP) and a major a/c-containing intrinsic light-harvesting protein (LHC), were, respectively, 86- and 6-fold more abundant under low light conditions than under normal light conditions [94]. Interestingly, this increase in transcript levels coincided with a decrease in DNA cytosine methylation of CpG and CpNpG motifs present near or inside the coding regions of the two genes under low light intensity, although in vitro experiments to link DNA demethylation with transcriptional activation were unsuccessful [94]. Karenia brevis may also have a transcriptional response to low light, as the abundance of 9.8% of the 4269 unique genes in the microarray differed between day and night [101]. In addition to light, temperature is also an important signal, and has been implicated in the loss of cnidarian-dinoflagellate symbiosis, a phenomenon called coral bleaching. Temperature increases induce oxidative stress in Symbiodinium bermudense that results in increased levels of superoxide radicals and hydrogen peroxide [102], and this may be the primary reason for loss of the symbiont [103].
To check the regulation of expression of heat shock protein (hsp) genes in Symbiodinium residing inside its coral host Acropora millepora, qPCR was used with samples that were subjected to elevated temperatures rapidly or gradually [104]. Dinoflagellate hsp70 transcript levels increased from 39% to 57% when temperature increased to 26 °C (moderate) or 29 °C (severe), although when cells were exposed to extreme heat stress hsp70 transcript levels decreased by up to 70%. Curiously, hsp90 transcript levels always decreased under heat stress and were independent of the speed of the temperature increase [104].
Oxidative stress is often able to induce a transcriptional response in organisms. In L. polyedrum, metal-induced oxidative stress resulted in sharp increases in the activity of the defense enzyme superoxide dismutase [105], with the increase in activity dependent on the type of metal, its exposure time and concentration [106,107]. This same stress resulted in an increase in the chloroplastic Fe-SOD transcript level, which accounted for the increased enzymatic activity, clearly demonstrating the transcriptional response [108]. Similarly, a microarray of 3500 genes from P. lunula revealed that 204 and 37 genes increased in abundance by 2- to 4-fold after treatment with 1 mM sodium nitrite or 0.5 mM paraquat, respectively [109]. The transcriptional response of the heat shock protein genes hsp70 and hsp90 to elevated temperature, metal and endocrine disrupting chemicals was tested in the dinoflagellate Prorocentrum minimum. RT-PCR results revealed that Hsp70 transcripts increased in response to each of these stresses, while Hsp90 transcript levels increased only in response to temperature and metals [110,111]. Lastly, 454 pyrosequencing in the basal dinoflagellate Oxyrrhis marina revealed 9 and 21 transcripts to be up- and down-regulated by saline stress, respectively [112]. However, it is worth mentioning that transcript levels of only 11 of these 30 genes varied by more than 2-fold, and among these latter, 10 were in the down-regulated class. Clearly, dinoflagellates respond to a variety of stress conditions.
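Several of the studies above classify transcripts by a fold-change threshold, most often 2-fold. The following Python sketch shows one way such a classification can be expressed, working on the log2 ratio so that up- and down-regulation are treated symmetrically. The gene names and abundance values are invented for illustration, and the function name is hypothetical; real analyses would also require replicate measurements and a statistical test.

```python
import math


def classify_fold_change(control, treated, threshold=2.0):
    """Label a transcript 'up', 'down' or 'unchanged' by fold change.

    A transcript is 'up' if treated/control >= threshold and 'down' if
    treated/control <= 1/threshold; abundances must be positive.
    """
    if control <= 0 or treated <= 0:
        raise ValueError("abundances must be positive")
    log2_ratio = math.log2(treated / control)
    cutoff = math.log2(threshold)
    if log2_ratio >= cutoff:
        return "up"
    if log2_ratio <= -cutoff:
        return "down"
    return "unchanged"


# Invented (control, treated) abundances; not data from any cited study.
abundances = {
    "gene_a": (100.0, 250.0),
    "gene_b": (80.0, 30.0),
    "gene_c": (60.0, 75.0),
}

for gene, (control, treated) in abundances.items():
    print(gene, classify_fold_change(control, treated))
```

The choice of threshold matters: as noted for O. marina, only 11 of 30 responsive transcripts exceeded a 2-fold change.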
The circadian (daily) clock is an endogenous timer that regulates daily rhythms in organisms from all walks of life [113][114][115][116], and although the clock receives timing cues from light/dark cycles or temperature changes [117][118][119][120], it provides signals distinct from these environmental conditions since rhythms can be maintained under constant conditions. Circadian rhythms presumably make organisms more fit by allowing them to specialize for different tasks at different times of day, and, in many cases, the physiological rhythms regulated by the clock are mediated through changes in gene expression. Indeed, microarray studies showed that the proportion of circadian mRNAs varied from 5%-20% in Neurospora, 10% in Arabidopsis, 5%-10% in mice and 30%-65% in the cyanobacterium Synechococcus elongatus [121,122]. In the dinoflagellate P. lunula, 3% of the genes on a microarray were found to exhibit changes in transcript abundance (between 2- and 2.5-fold) [123] while in K. brevis 0.7% of the genes varied in both light/dark and constant light (between 2- and 7-fold) [101]. The fluorescence labeling of total RNAs and ³²P incorporation of ribosomal RNAs in the stationary phase cells of L. polyedrum under constant light, followed by subsequent gel electrophoresis of the labeled RNAs, showed circadian rhythmicity with maximum RNA abundance at CT18 [124], the time corresponding to the peak of S-phase in this species [125,126]. However, when L. polyedrum cells were treated with Actinomycin D (ActD), a drug that inhibits DNA-dependent RNA synthesis, the bioluminescence and photosynthesis rhythms were unaffected for 30 h or more depending on the dose of the treatment [127]. In contrast to the lack of effect using transcription inhibitors, treatment with the translation inhibitor puromycin caused an immediate inhibition of the rhythms [127]. As ActD will also indirectly inhibit protein synthesis when RNA levels have decayed sufficiently, it is possible that the eventual loss of the rhythms following ActD treatment was due to decreasing levels of RNA. Similar tests with high concentrations of other potent inhibitors of RNAP II, such as DRB (5,6-dichloro-1-beta-D-ribofuranosylbenzimidazole) and α-amanitin, confirmed no significant effect on growth, luminescence or rhythmicity in L. polyedrum cultures [128]. Indeed, all circadian changes of protein levels in L. polyedrum have so far proven to be regulated post-transcriptionally [9].
Nutrient availability is also an important environmental cue, and can result in the formation of algal blooms for some dinoflagellates. The nutrients most important for the blooms are nitrogen (N) and phosphorus (P), and thus the transcriptomic response of dinoflagellates to N- and P-deplete and -replete conditions has been of great interest. When Karenia grown in N-deplete and -replete conditions were compared, 1102 genes on a microarray chip of 11,000 genes were found to be differentially expressed [129]. Among the up-regulated genes were type III glutamine synthetases, nitrate/nitrite transporters, and an ammonium transporter, all known to function in the nitrogen uptake and assimilation pathway. The transcriptomic response to P-depletion was not so informative, although 12% of the array showed a different expression profile. However, the activity and transcription levels of alkaline phosphatase were found to be regulated by the availability of the inorganic phosphate source in the dinoflagellates K. brevis and A. carterae [130,131]. Interestingly, N and P concentrations and growth stages have a strong impact on the toxin levels produced by Alexandrium tamarense, suggesting that expression of genes involved in these pathways may be responsive to nutrients [132]. A microarray experiment with 4298 sequences from Alexandrium minutum identified 87 genes that specifically responded to N or P limitation [133], while massively parallel signature sequencing (MPSS) in A. tamarense cultures showed only 2 and 12 out of a total of 40,029 signatures were uniquely expressed under N and P starvation, respectively [134].
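Because these studies use arrays and libraries of very different sizes, the raw counts are easier to compare as percentages of the features examined. The short Python sketch below performs this conversion for the counts quoted in this paragraph; the labels are descriptive shorthand and the function name is hypothetical.

```python
def percent_responding(responding, total):
    """Percentage of examined features that responded to a treatment."""
    if total <= 0 or not 0 <= responding <= total:
        raise ValueError("need 0 <= responding <= total and total > 0")
    return 100.0 * responding / total


# Counts as quoted in the text (responding features, features examined).
STUDIES = [
    ("Karenia, N-deplete vs N-replete microarray [129]", 1102, 11000),
    ("A. minutum, N or P limitation microarray [133]", 87, 4298),
    ("A. tamarense, N starvation MPSS [134]", 2, 40029),
    ("A. tamarense, P starvation MPSS [134]", 12, 40029),
]

for label, responding, total in STUDIES:
    print(f"{label}: {percent_responding(responding, total):.3f}%")
```

Expressed this way, the responding fraction spans several orders of magnitude between studies, which probably reflects differences in method and selection criteria as much as differences in biology.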
The strain and growth stage of dinoflagellate cultures can also affect gene expression. In the microarray study of A. minutum discussed above, 489 of the 4298 sequences examined were found to be differentially expressed when exponentially growing and stationary phase cultures were compared, a number even higher than the response induced by nutrient deprivation [133]. Here, proliferating cells showed a greater abundance of translation pathway gene transcripts and a lower abundance of transcripts from genes involved in intracellular signaling [133]. Similar studies in A. catenella revealed that proliferating cells over-express transcripts from several categories, including transcription and RNA processing, protein synthesis and translational regulation, cell division, transport, photosynthesis and cellular metabolism [58]. In Karenia brevis, five time points representing different growth phases were selected for microarray analysis, and taken together, 21% of the 11,000 features examined had accumulated to different levels in logarithmic compared to stationary phase cells [135]. Interestingly, a comparison of toxic and non-toxic strains of A. minutum has indicated a strain-specific regulation of gene expression [136]. Using microarray chips with a cut-off value of 1.5-fold difference, 145 and 47 sequences were identified as up-regulated in either toxic or non-toxic strains, respectively. While one of the original goals was to identify toxin-related genes in Alexandrium, it is unclear how much reliance can be placed on this line of experiments, as many toxin-related genes could also have unknown and important metabolic functions in the dinoflagellates and thus be similarly regulated in both strains. This view is supported by the observation that a non-toxic strain of Heterocapsa circularisquama transcribes a substantial number of genes thought to be involved in toxin biosynthesis [137]. Lastly, it must be noted that expression of the same gene may be regulated differently in different species. As an example, Rubisco is not subject to transcriptional control in L. polyedrum [138] while a pronounced difference is seen in transcript levels over the diurnal cycle in Prorocentrum donghaiense [139].
Gene expression in the dinoflagellates can also be influenced by biotic factors, as shown by a massively parallel signature sequencing (MPSS) comparison of A. tamarense grown axenically and in normal cultures [134]. From a total of 40,000 signatures, 307 were differentially expressed in the axenic cultures (39% up-regulated and 61% down-regulated). The association of bacteria with the dinoflagellates seems to affect the methionine-homocysteine cycle and photosynthesis, as these categories were enriched in the differentially expressed genes. However, it is likely that the most important biotic factors will be those related to symbioses. The first indication of symbiosis-specific gene expression in dinoflagellates was obtained from a study of Scrippsiella nutricula with and without its radiolarian host Thalassicolla nucleata. It was found that several genes in the dinoflagellate were differentially transcribed depending on symbiotic or free-living growth [140]. The dinoflagellate-cnidarian symbiosis, vital for ocean reef ecology, also presents an excellent model for understanding the regulation of gene expression by biotic factors. In this context, a homologue of the P-type H+-ATPase gene in Symbiodinium was shown to be expressed exclusively during the coral symbiosis [141]. Thermal stress, the primary cause of coral bleaching, induced different responses in the host and the symbiont, with changes in expression much more pronounced in the coral than in the dinoflagellate symbiont [142]. Lastly, copepods (Calanus helgolandicus, Acartia clausii, and Oithona similis) were found to induce a species-specific response of toxin production by the dinoflagellate Alexandrium, and this was associated with the significant and specific regulation of particular sets of genes, especially those involved in signal transduction and in translational and post-translational mechanisms [143].
It must be kept in mind that most of the gene regulation studies performed in dinoflagellates are expression-profiling experiments, which indicate mRNA levels and are thus determined by the balance between mRNA synthesis and degradation rates. Indeed, mRNA degradation may play a major role in determining the transcript abundance [144]. So far, only half-lives of transcripts whose protein synthesis is regulated by the clock in the dinoflagellate L. polyedrum have been measured [128]. Thus, different mRNA levels obtained during the gene expression studies cannot be unambiguously ascribed to result from transcriptional regulation.
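The balance between synthesis and degradation can be made explicit with a simple first-order model. This is only an illustrative sketch: the symbols m (transcript abundance), k_s (synthesis rate) and k_d (degradation rate constant) are assumptions introduced here and are not taken from the studies cited above.

```latex
\frac{dm}{dt} = k_s - k_d\, m, \qquad
m^{*} = \frac{k_s}{k_d}, \qquad
t_{1/2} = \frac{\ln 2}{k_d}
```

Under this model, a twofold increase in the steady-state level m* could arise equally from doubling k_s or from doubling the half-life t_1/2, which is why abundance measurements alone cannot distinguish the two.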

Splicing and the Spliceosome
Several posttranscriptional modifications in the primary transcripts of eukaryotic cells are necessary to create a mature mRNA that can be efficiently translated, and of these, arguably the most important is the removal of the intervening sequences, or "introns", that interrupt the coding sequence, or "exons" [145][146][147]. Mammalian genomes are generally intron-rich, while in contrast, dinoflagellate genes contain very few introns or lack them completely. For example, all the high copy number genes tested in L. polyedrum, such as pcp, lbp and lcf, lack introns [34,35,37]. However, in another bioluminescent dinoflagellate, P. lunula, a comparison of genomic and cDNA PCR products of the lcf C gene identified a 403 bp intron [38]. The form II Rubisco gene lacks introns in Prorocentrum minimum [148], yet contains six introns in Symbiodinium [149]. The saxitoxin pathway gene sxtG in Alexandrium was found to have one intron whose length varied from species to species, ranging from 260 to 750 bp. Sequencing of different sxtG introns showed >90% intraspecies identity and <80% interspecies identity, with no variation observed within a strain [150]. Analyses of hsp90 sequences from the genomic DNA of 17 dinoflagellates reported introns in only three species (97 bp, 134 bp and 289 bp in Peridinium willei, Polarella glacialis and Thecadinium yashimaense, respectively) [151]. A more detailed test, carried out with 31 genes in A. carterae, showed that four genes (encoding polyketide synthase, translation initiation factor 3 subunit 8, small nuclear ribonucleoprotein and psbO) had 6 or more introns, similar to other eukaryotes, another 11 genes had fewer than 5 introns, and the rest had no introns at all [39]. This study also correlated highly expressed genes with a very low intron density and a tandem gene arrangement in the genome [39].
The cellular mechanism that joins exons together by excising the introns is called splicing [146,147]. As expected, splicing must be extremely accurate, as even a single nucleotide frame shift could result in a nonsense mutation or a truncated protein. All introns in the nuclear-encoded pre-mRNAs are delimited by splice sites, which are critical sequences specifying the extremities, and eukaryotic introns are generally bounded by the conserved dinucleotides GU and AG at their 5′ and 3′ ends, respectively. Another important sequence, the branch point, is usually located between 18 and 40 nucleotides upstream from the 3′ end of the intron, but except for a mandatory adenine, which is ligated to the 5′ end of the intron during the splicing reaction, its sequence is only loosely conserved. Interestingly, the dinoflagellate introns typically lack the usual GU-AG splice sites, as exemplified by the AT-TC intron found in lcf C of P. lunula [38], the G(C/A)-AG introns in Symbiodinium Rubisco [149] and the AG-AG intron in the Alexandrium sxtG [150]. Some of these novel splice sites have been shown to function in other eukaryotes, such as the introns with GC-AG boundaries described in animal and plant genomes [151].
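The boundary rule described above can be illustrated with a minimal sketch that reports the terminal dinucleotides of an intron and checks them against the canonical GU-AG pair. The function names and the example sequences are hypothetical and serve only as an illustration; they are not taken from the cited studies.

```python
# Illustrative sketch only: helper names and test sequences are hypothetical.

def boundary_dinucleotides(intron):
    """Return the (5' end, 3' end) dinucleotides of an intron in RNA notation."""
    seq = intron.upper().replace("T", "U")  # accept DNA or RNA input
    if len(seq) < 4:
        raise ValueError("intron too short to have distinct terminal dinucleotides")
    return seq[:2], seq[-2:]


def is_canonical(intron):
    """True when the intron is bounded by the conserved GU ... AG dinucleotides."""
    return boundary_dinucleotides(intron) == ("GU", "AG")


if __name__ == "__main__":
    for intron in ("GTAAGTTTTCAG", "GCAAGTTTTCAG", "ATCCGTAATC"):
        donor, acceptor = boundary_dinucleotides(intron)
        print(f"{donor}-{acceptor}", "canonical" if is_canonical(intron) else "non-canonical")
```

A GC-AG or AT-TC intron, such as those reported for dinoflagellates, would be flagged as non-canonical by this check.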
The splice sites in pre-mRNA introns are recognized by base pairing to short RNA molecules (U1, U2, U4, U5 and U6) termed small nuclear RNAs (snRNA), each of which is bound to a complex of proteins to form small nuclear ribonucleoproteins (snRNPs). These five snRNPs, together with numerous non-snRNP proteins, constitute the spliceosome, a dynamic complex that forms and reforms repeatedly to process pre-mRNAs to mature transcripts [152]. Many of the protein components are highly conserved between mammals and dinoflagellates, as evidenced by the observation that autoimmune antibodies recognizing the so-called Smith antigen (Sm protein) present in all five human snRNP complexes were found to recognize four of the C. cohnii snRNPs [153]. In addition, the L. polyedrum and Symbiodinium transcriptomes contain sequences with significant homology to 70% and 85% of the splicing components, respectively [41,56]. A high degree of sequence conservation was also noticed between the dinoflagellate and mammalian U2, U5 and U6 RNAs and, as in higher eukaryotes, the dinoflagellate Sm tends to protect an AUn region in the snRNAs [153]. Furthermore, the snRNAs of dinoflagellates have a modified 5′ trimethylguanosine (TMG) cap, as do snRNAs of other eukaryotes [153]. Intriguingly, the spatial organization of the splicing process in the nucleus also appears similar in dinoflagellates and other eukaryotes. Several phylogenetically different species, including Prorocentrum micans, Alexandrium fundyense, Akashiwo sanguinea, and Amphidinium carterae, were examined microscopically after immunolabeling with antibodies directed against Sm proteins, DNA and p105-PANA (proliferation associated nuclear antigen) in conjunction with cytochemical staining for RNA, phosphorylated proteins and DNA [154]. These studies revealed a cross-reaction of the anti-Sm with eukaryotic-like perichromosomal granules, structures enriched in splicing factors that are actively involved in splicing, as well as Cajal-like bodies, nuclear regions thought to be involved in the modification and assembly of snRNPs. However, it must be noted that the anti-Sm labeling on Western blots revealed cross-reaction with proteins other than those of the expected molecular weight [154], raising the possibility that atypical Sm antigens may be present in the dinoflagellates.
Despite the paucity of cis-splicing events in dinoflagellates, trans-splicing is now known to be pervasive [155]. In this, dinoflagellates are similar to the kinetoplastid Trypanosoma brucei, where mRNAs were found to contain a consensus sequence of 39 nucleotides (nt) at their 5′ ends. This sequence, termed a spliced leader (SL) sequence [156], is added from a separate SL-donor RNA (an SL RNA) in a process called trans-splicing to all trypanosome mRNAs [157]. Since this initial discovery, many organisms including cnidarians, ctenophores, flatworms, nematodes, crustaceans, Euglena and now dinoflagellates have also been shown to use SL trans-splicing [158][159][160].
The length of the SL exon varies in different species, from 16 nt in Ciona intestinalis [161] to 51 nt in Stylochus zebra [162], and in dinoflagellates, the SL is a 22 nt sequence 5′-DCCGUAGCCAUUUUGGCUCAAG-3′ (D = U, A, or G) [155]. The discovery of the dinoflagellate SL has provided an enormous boost to the study of dinoflagellate molecular biology, in part because full-length sequences of dinoflagellate cDNAs can now be readily retrieved, but more importantly, because dinoflagellate sequences can now be isolated from complex mixtures such as RNA extracted from environmental samples or from organisms in symbiosis [70]. The dinoflagellate SL sequence is derived from SL RNAs of 50-60 nt and contains an Sm binding motif (AUUUUGG) in the exon, unlike all other SL RNAs where this conserved sequence is found in the intron [155]. SL trans-splicing is absent in organelle-encoded transcripts, although a unique type of trans-splicing was recently found in the mitochondria of diverse dinoflagellates. The mitochondrial cox3 gene is encoded in two pieces that are transcribed separately then trans-spliced to form a complete coding cox3 mRNA [163]. SL trans-splicing is evolutionarily ancient for the dinoflagellates, as it is also found in the Perkinsozoa. Perkinsus marinus, which is basal to the dinoflagellate and apicomplexan lineages, has nuclear-encoded transcripts with 5 different SL sequences. Three of the SL are 22 nt long and similar to those of the core dinoflagellates (SL1), while the other two are truncated 21 nt SL with either A or G as the starting nucleotide (SL2) [164]. The function of SL trans-splicing is not clear. It is unlikely to be involved in mRNA stability or translation, as there was no difference in translation efficiency or stability between trans-spliced and non-trans-spliced nematode mRNAs [160]. It has been proposed that in conjunction with polyadenylation it functions in the production of mature monocistronic transcripts from polycistronic transcripts, and it is still possible that it defines the 5′ end of transcripts even though polycistronic transcription now seems limited [41].
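As a rough illustration of how the 22 nt SL can be used to pick out dinoflagellate sequences from a mixture, the sketch below searches the 5′ end of a cDNA read for the SL consensus given above, expanding the degenerate base D to A, G or T. The function names, the max_offset parameter and the example reads are assumptions made for illustration; real pipelines would typically also tolerate truncated or mismatched leaders.

```python
import re

# SL consensus as given in the text (RNA notation); D = U, A or G.
SL_RNA = "DCCGUAGCCAUUUUGGCUCAAG"


def sl_regex(sl_rna=SL_RNA):
    """Compile the SL consensus into a DNA-alphabet regular expression."""
    dna = sl_rna.replace("U", "T")
    iupac = {"D": "[AGT]"}  # only the degenerate code used by this consensus
    return re.compile("".join(iupac.get(base, base) for base in dna))


SL_PATTERN = sl_regex()


def strip_spliced_leader(cdna, max_offset=5):
    """Return the sequence downstream of the SL, or None if no SL is found.

    max_offset (hypothetical parameter) is the largest number of extra
    nucleotides tolerated upstream of the SL, e.g. adapter remnants.
    """
    seq = cdna.upper().replace("U", "T")
    match = SL_PATTERN.search(seq, 0, max_offset + len(SL_RNA))
    return seq[match.end():] if match else None


if __name__ == "__main__":
    read = "ACCGTAGCCATTTTGGCTCAAG" + "ATGGCCAAG"  # made-up example read
    print(strip_spliced_leader(read))
```

Reads for which the function returns None would simply be set aside as lacking a recognizable leader under these assumptions.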
The paucity of introns, as well as the presence of multiple relict sequences related to the SL in the 5′ ends of dinoflagellate genes isolated from genomic DNA, has led to the proposal of an mRNA recycling mechanism whereby mature mRNAs are inserted back into the genome through a recombination process [165]. This hypothesis still requires a more comprehensive enquiry in diverse dinoflagellates, but if true, may shed some light on the origin of the plethora of tandem array genes in dinoflagellates. It is also interesting that alternative splicing, a process by which cells can generate several proteins through permutation and combination of exons from a single pre-mRNA, has been discovered for cyclin transcripts in Perkinsus marinus [166]. Alternative splicing may have been lost after divergence from this basal lineage, as it has not yet been observed in other dinoflagellates.

RNA Transport and mRNA Surveillance Pathways
Nuclear pore complexes (NPC) are enormous protein complexes, ranging from 50 MDa in yeast to 125 MDa in mammals, which are present within the nuclear envelope and mediate nucleo-cytoplasmic transport [167,168]. Though small molecules under 40 kDa can passively diffuse through the NPC, larger mRNA molecules require a more complex energy-dependent and signal-mediated process [169]. The nuclear export pathway has been well characterized in yeast and higher eukaryotes, but does not appear to be conserved in apicomplexans, as many of the important components are either absent or unrecognizable by homology search algorithms [170]. To date, no description of this pathway has been made in any dinoflagellate, and we have thus analyzed the L. polyedrum transcriptome to try to retrieve the components expected for RNA transport. There are three general classes of proteins required: those forming the nuclear pore itself and those soluble in either the nucleus or the cytoplasm. The most marked difference between the alveolates and other eukaryotes appears to lie in the components used for construction of the pore (Table 1). Apart from the conserved integral membrane proteins termed Nups, thought to anchor the pores in the nuclear membrane, it seems that lower eukaryotes either manage to construct this large molecular complex with far fewer elements than are required in mammals, or alternatively, employ some unique and as yet unidentified constituents. It would evidently be of great interest to examine the structure of the pore using electron microscopy to ascertain if the pore retains the eightfold symmetrical structure normally found in higher eukaryotes. In addition to the NPC, a plethora of nuclear and cytoplasmic trans-acting factors are also employed to mediate RNA processing and transport in mammals and higher eukaryotes. The nuclear components include factors common to the different types of RNA as well as other specific factors for processing and maturation that facilitate the nucleo-cytoplasmic transport [171], and these appear to be conserved in the dinoflagellates. In contrast, only a third of the mammalian and half of the plant cytoplasmic components involved in nuclear transport are conserved in L. polyedrum and other alveolates (Table 1, Supplementary Figure S2).
Eukaryotes also employ a multistep "quality control" or surveillance pathway to selectively degrade damaged or mutated mRNAs as a protective mechanism against aberrant protein synthesis. This concerted procedure starts with mRNA capping during transcription within the nucleus, and ends in the cytoplasm with the degradation of abnormal mRNAs. There are three main pathways, the first being nonsense-mediated mRNA decay. In mammals, this pathway interprets stop codons found 50 or more nucleotides upstream from the last exon boundary to be premature stop codons, principally because normal stop codons are typically located in the last exon [172,173]. This process uses factors involved in capping or 3′ end processing of the pre-mRNAs as well as a large complex of nuclear factors comprising the exon-junction complex (EJC) as a scaffold [174]. These mRNAs are then degraded to block synthesis of truncated proteins that might act as dominant negative or gain-of-function mutants. Curiously, despite the conservation of many of the components, intron/exon boundaries are not required to fulfill the same role in invertebrates and yeast, although the involvement of the EJC is not well defined in these systems. Nonsense-mediated decay appears to be operative in dinoflagellates, as many of the generally conserved components are found (Table 2), but the mechanisms used may be more similar to those of yeasts and insects, as dinoflagellate genes have a generally low intron density. The second pathway, termed nonstop-mediated mRNA decay, is used to detect mRNA molecules lacking a stop codon. These transcripts pose a problem in that ribosomes translating into the poly(A) tail stall and have difficulty dissociating from the transcript, thus reducing the number of ribosomes available for general translation [175]. This mechanism requires both a release of the ribosome and a degradation of the mRNA, but the components required for this remain to be fully characterized. Lastly, recognition of stalled ribosomes may also be involved in what is termed no-go mRNA decay [176], where transcripts bearing ribosomes stalled during translation, perhaps because of unusual secondary structure elements in the transcript, are also targeted for degradation [174]. In general, dinoflagellates and other alveolates have a very poor conservation of the nuclear factors required for RNA surveillance (27% as compared to mammals), although the conservation of cytoplasmic factors is better (67% as compared to mammals) (Table 2).
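The mammalian rule for nonsense-mediated decay described above (a stop codon lying 50 or more nucleotides upstream of the last exon boundary is read as premature) can be written as a short sketch. This is a deliberately simplified illustration; the function name, the coordinate convention and the example exon lengths are assumptions, and the real pathway depends on the EJC and other factors not modeled here.

```python
# Simplified sketch of the "50-nucleotide rule"; names and numbers are hypothetical.

def is_premature_stop(stop_end, exon_lengths, threshold=50):
    """Apply the 50-nt rule to a spliced mRNA.

    stop_end      position (nt from the 5' end of the mature mRNA) of the last
                  nucleotide of the stop codon
    exon_lengths  lengths of the exons in 5' to 3' order
    threshold     minimum distance upstream of the last exon-exon junction
    """
    if len(exon_lengths) < 2:
        # An intronless transcript has no exon-exon junction, so this rule
        # cannot flag it.
        return False
    last_junction = sum(exon_lengths[:-1])  # position of the final junction
    return last_junction - stop_end >= threshold


if __name__ == "__main__":
    exons = [200, 150, 100]  # last junction lies after nucleotide 350
    print(is_premature_stop(290, exons))   # stop 60 nt upstream of the junction
    print(is_premature_stop(400, exons))   # stop inside the last exon
    print(is_premature_stop(290, [450]))   # intronless gene
```

The intronless case shows why a junction-based rule of this kind could rarely apply to dinoflagellate genes with low intron density, consistent with the suggestion that a yeast-like or insect-like mechanism may operate instead.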

Conclusions and Perspectives
Considerable progress has been made in the study of dinoflagellate transcription, fuelled in large part by the recent availability of low cost sequencing. We show here that most of the expected players in the transcriptional machinery are found in dinoflagellates, at least with respect to their counterparts among the Alveolata. The exception to this general rule is that the specific transcription factors seem in large part to be reduced in quantity and type in the dinoflagellates. Thus, while general transcription carries on much as expected for the eukaryotes, the specific targeting of genes for transcriptional control may differ as a result of the unusual chromatin organization in this class. Further studies will now be necessary to confirm the biochemical activities of some of the more interesting components identified from the massive influx of sequence information.

Figure 1. Schematic representation of the phylogeny of the superphylum Alveolata, which is marked by the presence of the cortical alveoli. Spliced leader trans-splicing is a common feature in all the members of the dinoflagellate clade, while Oxyrrhis and the core dinoflagellates lack histones and have a dinokaryotic nucleus.

Figure 2. (A) Permanently condensed chromosomes of the dinoflagellate Lingulodinium polyedrum (the cultures were obtained from the National Center for Marine Algae, Maine) as visualized by fluorescence microscopy after DAPI staining. The C-shaped nucleus (n) is surrounded by the small punctate DNA staining of the multiple plastid genomes and lies under two larger spherical PAS bodies (p) at the apical end of the cell. (B) The nucleus viewed by transmission electron microscopy. The cross section shown lies near the back of the C-shaped nucleus (n) and shows chromosomes cut both in cross section (ovals) and longitudinally (cylinders), as well as plastids (p) and numerous diamond-shaped trichocysts. All scale bars are 10 µm.

Figure 3. The number of RNA polymerase components present over a wide phylogenetic range of organisms includes those considered to be core components (red), common components (yellow) and specific components (blue) of RNAP I, II and III. Each bar represents an individual component. The representative sequences for the RNA polymerase I, II and III subunits were selected from an animal (H. sapiens), a plant (A. thaliana), a diatom (T. pseudonana), and two other alveolates (T. thermophila and P. falciparum) and uploaded and maintained as a local database in the Geneious software. Using tBLASTn and an expect E-value cutoff of e−25, the Lingulodinium transcriptome was scanned to obtain the homologues for the RNA polymerase subunits [41]. For all other species the sequences were directly obtained from the KEGG specific pathway database by selecting the specific organism.

Figure 5. Phylogenetic distribution of transcription factors associated with RNA polymerase II shows a marked decrease in the number of TFII members among the apicomplexans. The dinoflagellates do not contain the putative TBP (red) but do express a TBP-like factor (TLF; pink). Each bar represents a different component. A pool of basal transcription factor (BTF) protein sequences was selected from the five species then stored as a local database in Geneious. The Lingulodinium transcriptome was scanned using tBLASTn at an expect E-value of <e−25 to obtain the homologues of the BTFs [41].

Table 1. Number of components involved in nuclear transport found in the L. polyedrum transcriptome. Gene sequences for various Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were tabulated. The alveolates are represented by L. polyedrum (Lp), Plasmodium falciparum (Pf) and Tetrahymena thermophila (Tt). A cutoff value of e−25 was used to assess the presence of components.

Table 2. Number of components involved in mRNA surveillance found in the L. polyedrum transcriptome. Gene sequences for various KEGG pathways were tabulated. The alveolates are represented by L. polyedrum (Lp), Plasmodium falciparum (Pf) and Tetrahymena thermophila (Tt). A cutoff value of e−25 was used to assess the presence of components.
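The presence/absence scoring used for the figures and tables (a component counts as found when a tBLASTn hit passes the e−25 cutoff) can be sketched as a filter over BLAST tabular output. This is a minimal illustration that assumes the standard 12-column tabular format (outfmt 6), in which the query identifier is column 1 and the E-value is column 11; the function name, the inclusive comparison and the example hit lines are made up for illustration and do not reproduce the authors' actual workflow.

```python
# Illustrative sketch; component names and hit lines below are invented examples.

def conserved_components(blast_lines, cutoff=1e-25):
    """Return the set of query components with at least one hit at or below cutoff.

    Expects BLAST tabular lines with 12 tab-separated columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    found = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[10]) <= cutoff:
            found.add(fields[0])
    return found


if __name__ == "__main__":
    hits = [
        "compA\tcontig_1\t61.2\t480\t150\t4\t1\t480\t12\t1451\t3e-80\t290",
        "compB\tcontig_7\t33.0\t120\t70\t3\t5\t124\t40\t399\t2e-06\t48.1",
    ]
    expected = ["compA", "compB", "compC"]
    found = conserved_components(hits)
    print(f"{len(found)} of {len(expected)} components pass the cutoff")
```

Dividing the number of components found by the number expected for a pathway gives conservation percentages of the kind quoted in the text.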