In Silico Identification of Type III PKS Chalcone and Stilbene Synthase Homologs in Marine Photosynthetic Organisms

Marine microalgae are photosynthetic microorganisms at the base of marine food webs. They are characterized by huge taxonomic and metabolic diversity, and several species have been shown to have bioactivities useful for the treatment of human pathologies. However, the compounds and the metabolic pathways responsible for bioactive compound synthesis are often still unknown. In this study, we analysed the microalgal transcriptomes available in the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP) database for an in silico search of type III polyketide synthase (PKS) homologs and, in particular, chalcone synthase (CHS) and stilbene synthase (STS), which are often referred to as the CHS/STS family. These enzymes were selected because they are known to produce compounds with biological properties useful for human health, such as cancer chemopreventive, anti-inflammatory, antioxidant, anti-angiogenic, anti-viral and anti-diabetic activities. In addition, we searched for 4-Coumarate: CoA ligase, an upstream enzyme in the synthesis of chalcones and stilbenes. This study reports for the first time the occurrence of these enzymes in specific microalgal taxa, confirming the importance of these pathways for microalgae and giving new insights into microalgal physiology and possible biotechnological applications for the production of bioactive compounds.


Introduction
Microalgae are photosynthetic organisms adapted to live in many different environments, including marine, freshwater, polar, temperate and tropical ones. They are also known to produce a plethora of metabolites derived from primary and secondary metabolism, which have shown possible applications for human health (e.g., compounds with anti-cancer, anti-microbial, anti-diabetes, anti-epilepsy, anti-inflammatory, anti-atherosclerosis, anti-osteoporosis, immunomodulatory and antioxidant activities; [1][2][3][4][5][6][7][8][9][10]). Various studies have focused on the discovery of metabolic pathways involved in the synthesis of bioactive compounds [11][12][13][14]. Although microalgae can be cultured in large volumes to obtain large amounts of the compounds of interest, heterologous expression of the enzyme responsible for the compound synthesis, and production of the compound in a host, can be a valuable alternative for meeting industrial demands.
From an evolutionary point of view, STS protein sequences do not form a cluster of their own but are grouped with the CHS sequences of the same or related organisms rather than with other STS sequences [23,24,80]. This suggested the hypothesis that there was no ancestral STS gene and that STS genes developed from CHS recurrently and independently [24]. The latter hypothesis is supported by the fact that, currently, STS genes have been isolated from a small number of unrelated higher plants (see above) and that in vitro studies have shown that a few amino acid changes are sufficient to convert a CHS into a protein with STS activity [24,81]. Although most of the known chalcone and stilbene synthase genes have been isolated from gymnosperms and angiosperms, there are also reports from other land plant lineages. For instance, CHS-like genes were found in ferns [82] and liverworts [83]. The occurrence of homologs of some of the land plant flavonoid/stilbenoid pathway genes in other lineages of marine photosynthetic organisms is still controversial. No type III PKS genes have been found in various microalgal genomes, such as those of the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana, or in red algae [84]. This led to the conclusion that these genes were probably acquired after the land conquest, possibly by lateral gene transfer from bacteria or fungi [84]. However, recent studies have confirmed the presence of putative type III PKS enzymes in the brown alga Ectocarpus siliculosus [84] and the dictyochophyte Pseudochattonella farcimen [85], and of some genes involved in the phenylpropanoid pathway in streptophyte algae [86]. On the basis of these new data, it was hypothesised that the lateral gene transfer event of type III PKS genes must have occurred after the separation of diatoms from other ochrophytes, but before the divergence of brown algae from pelagophytes and dictyochophytes [84].
To date, only a few nuclear genomes are available for microalgae to corroborate such assumptions, mostly because of their large size and high complexity, especially in dinoflagellates [87,88], which make them difficult to analyse. As a consequence, the study of gene function and metabolic pathways in these microorganisms has mostly been done through transcriptome sequencing [11,89,90,91,92]. Indeed, by targeting only expressed coding regions, transcriptome sequencing overcomes the issues associated with the sequencing and assembly of introns and of intergenic and repetitive regions common to eukaryotes [93]. Furthermore, thanks to the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP), hundreds of transcriptomes from the most abundant and ecologically significant microbial eukaryotes in the oceans have been made available to the public [93].
In this study, we aimed to shed light on the possible occurrence of chalcone and stilbene synthase genes in marine photosynthetic eukaryotes, with special regard to STS because of the relevant pharmaceutical activities of resveratrol. We searched for CHS/STS homologs in the transcriptomes of photosynthetic marine organisms from MMETSP [93]. We annotated the sequences retrieved from the BLAST search and inferred phylogenetic trees. To further confirm the validity of our findings, we also searched for homologs of 4-Coumarate: CoA ligase (4CL), an upstream enzyme in flavonoid and stilbenoid biosynthesis in land plants.

Sequence Alignment, Trimming and Phylogenetic Inference
To ascertain the evolutionary relationships between the CHS/STS genes of marine photosynthetic organisms and their well-studied terrestrial counterparts (land plants), we included in our analysis the sequences of STS, CHS and CHS-like genes of representative organisms from different taxonomic categories (Table S2). As outgroups, we used the sequences of 3-ketoacyl-CoA synthase (KCS) from representative taxa (Table S2), which have previously been shown to be the sister group of CHS/STSs in the thiolase superfamily [80]. For 4CL, we downloaded ingroup and outgroup sequences of different taxa from GenBank (Table S2) and used the sequences of luciferases as the outgroup [97].
Sequences were aligned using COBALT [98] (available at https://www.ncbi.nlm.nih.gov/tools/cobalt/). This software uses sequence information together with protein-motif regular expressions (PROSITE database) and conserved protein domains (NCBI CDD database) to produce biologically meaningful multiple alignments [98]. Poorly aligned regions were removed with trimAl v1.2 [99] using the automated1 option, which selects the most appropriate trimming mode (based on gaps or on similarity scores) depending on the alignment characteristics. A maximum likelihood phylogenetic tree was inferred for both CHS/STS and 4CL genes in PhyML [100] using the evolutionary model suggested by Smart Model Selection (SMS) [101]. Node support was calculated using the Shimodaira-Hasegawa-like (aLRT SH-like) procedure [102]. The resulting trees were visualised and graphically edited in FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).
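As a rough illustration of what alignment trimming does, the following Python sketch removes alignment columns whose gap fraction exceeds a threshold. It is a deliberately simplified stand-in and not the heuristic implemented by the automated1 option of trimAl; the function name, the 0.5 threshold and the toy alignment are assumptions made only for this example.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop columns whose fraction of gap characters exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Simplified illustration only; trimAl's automated1 mode applies its own
    heuristics to choose between gap-based and similarity-based trimming.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for col in range(length):
        gaps = sum(1 for s in seqs if s[col] == "-")
        if gaps / len(seqs) <= max_gap_fraction:
            keep.append(col)

    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, not taken from the study).
toy = {
    "seq_a": "MK-LV--A",
    "seq_b": "MKTLV--A",
    "seq_c": "MK-LI-GA",
}
trimmed = trim_gappy_columns(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

In this toy case the three columns dominated by gaps are discarded and five columns are retained for every sequence, which mirrors the reduction of a raw alignment to the shorter set of characters used for tree inference.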

Protein Domain Assignment
The microalgal strains exhibiting multiple CHS/STS homologs after functional annotation and phylogenetic inference were further analysed with InterProScan 5 [103] to annotate the conserved domains within each protein sequence. For each protein sequence, we reported the occurrence of the N- and C-terminal domains, which are relevant to biochemical activity [104], and their location along the transcript.

Retrieval, Annotation and Phylogenetic Analysis of STS/CHS Homologs
The BLASTP search against the 112 transcriptomes of MMETSP (103 of photosynthetic microorganisms plus 9 of non-photosynthetic ones) returned 43 homologous sequences in 27 taxa (Table S1). After functional annotation, 31 homologs in 24 taxa were left (Table S1, File S1). The functional filter thus mainly removed protein sequences that share conserved domains but have a different function within the same organisms, since 12 sequences but only 3 taxa were lost. In total, the homologous sequences used for phylogenetic inference (including the ones from GenBank) corresponded to 12 classes/phyla: Bacillariophyta, Chlorophyta, Chrysophyceae, Coccolithophyceae, Cyanobacteria, Dictyochophyceae, Dinophyceae, Oxyrrhidophyceae, Pelagophyceae, Raphidophyceae, Streptophyta and Xanthophyceae (Figure 2).
The final alignment (after the trimming procedure) included 60 sequences and 249 characters (File S2). The best evolutionary model for the protein alignment, selected using the Akaike Information Criterion (AIC), was LG+G+I [105]. The maximum likelihood phylogenetic tree of CHS/STS genes (Figure 2) confirmed that all retrieved and annotated homologs form a monophyletic group that is sister to the 3-ketoacyl-CoA synthases (KCSs), the latter considered the closest outgroup within the thiolase superfamily [80]. Within this group, a first, highly supported (1.00 support value) branching event separated the CHS/STS-like genes of Prymnesium parvum Texoma1 (Coccolithophyceae), Oxyrrhis marina LB1974 (Oxyrrhidophyceae) and Karlodinium micrum CCMP2283 (Dinophyceae) from all other sequences. The next branching event (0.76-0.90 support value) separated two main clades of CHS/STS-like sequences: one containing sequences of Dictyochophyceae, Pelagophyceae, Raphidophyceae and Xanthophyceae, with the sequence of Symbiodinium sp. C1 as sister taxon, and another encompassing sequences of diatoms (Bacillariophyta), green algae (Chlorophyta), dinoflagellates (Dinophyceae), Chrysophyceae and land plants (Streptophyta) (Figure 2). The sequence of a CHS-like gene belonging to the cyanobacterium Synechococcus sp. is sister to this clade. The placement of one of the two sequences found in the transcriptome of the chrysophyte Ochromonas sp. CCMP1393 in this clade next to dinoflagellates (albeit with low support) is likely an artefact of its shorter length, which is less than half that of the other sequences in the alignment (File S1). The CHS/STS sequences of land plants formed, as expected, a highly supported (1.00) clade, in which the CHS and STS sequences do not form separate clusters but are grouped together.
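Model selection by AIC can be made concrete with a short Python sketch. The criterion is AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximised log-likelihood; the model with the lowest AIC is preferred. The log-likelihoods below are hypothetical and are not the values obtained in this study. The parameter counts assume 2n - 3 = 117 branch lengths for an unrooted tree of 60 sequences, plus one parameter each for the gamma shape (+G) and the proportion of invariant sites (+I).

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    candidates: dict mapping model name -> (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical log-likelihoods, for illustration only (NOT results of the study).
# Parameter counts: 117 branch lengths (assumed, 2*60 - 3), +1 for G, +1 for I.
candidates = {
    "LG": (-15250.0, 117),
    "LG+G": (-15020.0, 118),
    "LG+G+I": (-15010.0, 119),
}
scores = {name: aic(*vals) for name, vals in candidates.items()}
selected = best_model(candidates)
print(scores)
print("selected:", selected)
```

With these invented numbers, the extra parameter of each richer model is outweighed by its gain in likelihood, so the most parameterised model has the lowest AIC. With real data the penalty term 2k can just as well favour a simpler model.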

For 4CL, the final alignment (after removal of poorly aligned regions) consisted of 317 aa (File S4). The best model of protein evolution selected using the AIC was LG+G+F [105]. The inferred phylogenetic tree showed that the sequences of the dinoflagellates Karenia brevis SP1 and Kryptoperidinium foliaceum CCMP1326 formed a highly supported clade (support > 0.91) that was separated from all the other sequences (1.00 support value) (Figure 3). Within this latter group, the sequences of marine photosynthetic microorganisms (Bacillariophyta, Pavlovophyceae, Raphidophyceae) formed a monophyletic group (support value > 0.75) sister to the 4CL sequence of the actinobacterium Streptomyces malaysiensis (support > 0.91). Dinoflagellate sequences formed a highly supported clade (1.00), but their phylogenetic relationships with neighbouring taxa were not resolved (support value < 0.50). The sequences of land plants formed a monophyletic group, with the sequences of gymnosperms and angiosperms closely related (support > 0.91) and that of the fern Dryopteris fragrans as sister group (support > 0.75). The sequence of the ciliate Favella taraikaensis was placed close to Streptophyta, but with low support (0.51-0.75).

Identification of Taxa with CHS/STS- and 4CL-Like Enzymes
Our transcriptome survey revealed that, after functional annotation, homologs of both CHS/STS and 4CL genes were found only in the raphidophyte Heterosigma akashiwo (strains CCMP3107 and NB) and the dinoflagellate Kryptoperidinium foliaceum (Dinophyceae) strain CCMP1326 (Table S1). Other strains of Heterosigma akashiwo failed the annotation step for the 4CL gene (Table S1). In many other taxa (e.g., other dinoflagellates, diatoms, Dictyochophyceae, Pelagophyceae, Chrysophyceae and Oxyrrhidophyceae) we found both CHS/STS and 4CL homologs, but the latter were not functionally recognised as 4-Coumarate: CoA ligases (Table S1).
At a higher taxonomic level, homologs of both genes were mostly found in Bacillariophyta, Dinophyceae and Raphidophyceae. In each of the transcriptomes of Pyramimonas parkeae CCMP726 and Chattonella subsalsa CCMP2191 we found three homologs corresponding to CHS/STS-like genes, with different degrees of similarity. In the chlorophyte Pyramimonas parkeae CCMP726, these homologs shared 53-63% similarity, whilst in the raphidophyte Chattonella subsalsa CCMP2191 the similarity was 89-96%.
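The text does not state how the percentages above were calculated. As one common way of quantifying the resemblance between two homologs, the Python sketch below computes percent identity over the ungapped columns of a pairwise alignment. The choice of this measure, the function name and the toy sequences are assumptions made for illustration and do not reproduce the calculation used in the study.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy pairwise alignment (hypothetical sequences, not from the study).
homolog_1 = "MKTLVAG-H"
homolog_2 = "MKSLIAGQH"
identity = percent_identity(homolog_1, homolog_2)
print(f"{identity:.1f}% identity")
```

Here eight columns are compared (the gapped column is skipped) and six of them match, giving 75.0%. Other conventions, such as dividing by the full alignment length or scoring conservative substitutions as similar, would give different figures for the same pair.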

Protein Domain Assignment
Four microalgal strains, Chattonella subsalsa CCMP2191, Heterosigma akashiwo CCMP2393, Ochromonas sp. CCMP1393 and Pyramimonas parkeae CCMP726, contained more than one CHS/STS transcript. The domain analysis revealed that all of these sequences contain the CHS/STS domains at both the N terminus (IPR001099, green bars) and the C terminus (IPR012328, orange bars) (Figure 4). The only exception was the CAMPEP 0190290982 transcript of Ochromonas sp. CCMP1393, which contained only a portion of the CHS/STS domain, of around 100 aa, at the N terminus (Figure 4c). This truncation may explain why the sequence is misplaced in the phylogenetic tree, close to dinoflagellates rather than to phylogenetically related taxa or to the other transcript from the same species (Figure 2). Furthermore, this transcript is likely to be non-functional because it lacks the C-terminal domain. The three CHS/STS transcripts found in Pyramimonas parkeae CCMP726 showed overlapping matches of the N- and C-terminal domains (Figure 4d). This could reflect either complex structural gene rearrangements that occurred during the evolutionary history of the species or conflicting predictions of domain architecture in InterPro.
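The domain check described above can be sketched in Python as a scan of InterProScan tab-separated output for the two InterPro entries named in the text. The sketch assumes the standard InterProScan 5 TSV layout (protein accession in column 1, match start and stop in columns 7 and 8, InterPro accession in column 12). The transcript names, signature accessions and coordinates in the toy input are hypothetical placeholders, not data from the study.

```python
import csv
import io

N_TERMINAL = "IPR001099"  # CHS/STS N-terminal domain (accession given in the text)
C_TERMINAL = "IPR012328"  # CHS/STS C-terminal domain (accession given in the text)


def domains_per_protein(tsv_handle):
    """Map protein ID -> {InterPro accession: [(start, stop), ...]}.

    Assumes the InterProScan 5 TSV layout: protein accession in column 1,
    match start/stop in columns 7-8, InterPro accession in column 12.
    """
    found = {}
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 12:
            continue  # row carries no InterPro accession column
        protein, start, stop, ipr = row[0], int(row[6]), int(row[7]), row[11]
        found.setdefault(protein, {}).setdefault(ipr, []).append((start, stop))
    return found


def classify(domains):
    """Label a protein by which CHS/STS domains were detected."""
    has_n = N_TERMINAL in domains
    has_c = C_TERMINAL in domains
    if has_n and has_c:
        return "both domains"
    if has_n:
        return "N-terminal only"
    if has_c:
        return "C-terminal only"
    return "no CHS/STS domain"


# Hypothetical rows: names, signatures (SIG_N, SIG_C) and coordinates are invented.
rows = [
    ["transcript_A", "md5a", "390", "Pfam", "SIG_N", "N-term", "10", "230",
     "1e-50", "T", "01-01-2020", "IPR001099", "CHS/STS N-terminal"],
    ["transcript_A", "md5a", "390", "Pfam", "SIG_C", "C-term", "240", "385",
     "1e-40", "T", "01-01-2020", "IPR012328", "CHS/STS C-terminal"],
    ["transcript_B", "md5b", "120", "Pfam", "SIG_N", "N-term", "5", "105",
     "1e-20", "T", "01-01-2020", "IPR001099", "CHS/STS N-terminal"],
]
toy_tsv = "\n".join("\t".join(r) for r in rows)

report = {
    protein: classify(domains)
    for protein, domains in domains_per_protein(io.StringIO(toy_tsv)).items()
}
for protein, label in report.items():
    print(protein, label)
```

A transcript flagged as "N-terminal only", like the invented transcript_B, corresponds to the situation described for the truncated Ochromonas sequence. Because the stored coordinates are kept per domain, the same structure could also be used to test for overlapping N- and C-terminal matches.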

Discussion
Type III polyketide synthases were initially thought to be plant-exclusive enzymes with a pivotal role in the biosynthesis of flavonoids and of several secondary metabolites such as chalcones, stilbenes, benzophenones, acridones, phloroglucinols and resorcinols [16,22]. Subsequently, these enzymes were also found in some bacteria [15], fungi [18] and brown algae [84]. Several transcriptome surveys have demonstrated that marine protists may encode several PKS enzymes, especially of type I and II [106][107][108][109]. However, to date, knowledge of the occurrence and function of type III PKS enzymes in marine photosynthetic organisms is still lacking.
In this paper, we focused our attention on a specific class of type III PKSs, the chalcone family, which includes enzymes involved in the production of chalcones and stilbenes (e.g., resveratrol), and on the upstream enzyme 4-Coumarate: CoA ligase (4CL). Our transcriptomic search reveals the occurrence of CHS/STS-like genes in several lineages of marine photosynthetic microorganisms (Bacillariophyta, Chlorophyta, Chrysophyceae, Coccolithophyceae, Dictyochophyceae, Dinophyceae, Oxyrrhidophyceae, Pelagophyceae, Raphidophyceae and Xanthophyceae). From the phylogenetic point of view, the sequences of diatoms (Fragilariopsis kerguelensis, Thalassiosira minuscula and Thalassiosira weissflogii) and some dinoflagellates (Durinskia baltica and Kryptoperidinium foliaceum), as well as those of the green alga Pyramimonas parkeae, can be considered the most likely CHS/STS-like candidates (Figure 2). This is because they fall in the same highly supported clade that contains the true CHS/STS genes of land plants (Streptophyta) and is sister to the cyanobacterium Synechococcus. The sequences of Dictyochophyceae, Pelagophyceae, Raphidophyceae and Xanthophyceae are grouped together in a highly supported clade, and this arrangement is consistent with the traditional phylogeny of these taxa, especially regarding heterokonts [110]. We cannot assert with certainty that they belong to the same CHS family as those of land plants, but they can be considered type III PKS enzymes. The same holds for the homologs found in the haptophyte Prymnesium parvum, the dinoflagellate Karlodinium micrum and the oxyrrhidophyte Oxyrrhis marina. Such sequences are distantly related to those from other dinoflagellates or have homologs within the same species that are closer to other species in the phylogenetic tree (e.g., the case of Oxyrrhis marina).
In general, the finding of such homologs in all the aforementioned taxa is interesting, since so far type III PKS genes from marine organisms were known only in some brown algae [111][112][113], dictyochophytes [85] and ochrophytes [84]. In light of these findings, we support the current evolutionary scenario according to which type III PKS genes were acquired by an ancient lateral gene transfer event (likely from a bacterium) before the divergence of brown algae from pelagophytes and dictyochophytes [84]. Our data also provide evidence that, although type III PKS homologs are absent from the genomes of the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana [84], they occur in the transcriptomes of congeneric (T. minuscula and T. weissflogii) or other (Fragilariopsis kerguelensis) diatom species. Furthermore, we provide evidence for the first time of the occurrence of type III PKS homologs in green algae. Indeed, we found three CHS/STS homologs within the transcriptome of the prasinophyte Pyramimonas parkeae, which were confirmed by domain analyses. Such preliminary results open up new scenarios about the evolution and the metabolic and ecological role of these enzymes in the taxa investigated here.
Since chalcone and stilbene synthases catalyse the sequential decarboxylative addition of three acetate units from malonyl-CoA to a p-coumaroyl-CoA starter molecule [20], we also looked for the occurrence of 4-Coumarate: CoA ligase (4CL), the enzyme responsible for the production of p-coumaroyl-CoA. Our results indicate that homologs annotated as 4CL occurred in a limited number of taxa, namely Bacillariophyta, Dinophyceae, Pavlovophyceae and Raphidophyceae. Since we analysed transcriptomic data, the absence of 4CL homologs (or of CHS/STS homologs in the case above) in some taxa does not necessarily mean that such genes are absent from those organisms; they may simply not have been expressed at the time of sampling. This constitutes a clear limitation of transcriptome mining compared with genome mining when searching for genes or pathways of interest. Nonetheless, transcriptome mining remains a valuable resource, considering that far more microalgal transcriptomes than genomes are currently available. Our knowledge of the occurrence of 4CLs is limited to land plants and some streptophyte algae [86], and little is known about other photosynthetic lineages. In these taxa, this enzyme seems to be involved in the production of lignin-like compounds and in defense mechanisms. However, many land plants also possess several 4CL-like enzymes that are not involved in flavonoid or lignin biosynthesis and whose function is still unknown [114][115][116].
In our analysis, we found only two organisms that expressed both CHS/STS and 4CL-like enzymes: the raphidophyte Heterosigma akashiwo (strains CCMP3107 and NB) and the dinoflagellate Kryptoperidinium foliaceum. These microorganisms are phylogenetically distant, and little information is available on their bioactivities or biosynthetic pathways; they are hence the taxa of choice for further investigations. These results give new insights into the presence of molecular machineries for the production of naringenin chalcone or resveratrol, or at least of the products that their homologs synthesise in land plants. Marine microalgae possessing type I and II PKS enzymes are already known to produce polyketides with applications in human health and biotechnology [117][118][119][120]. We showed that several lineages of microalgae possess type III PKS genes resembling CHS/STS genes, which poses new questions about their possible functions in microalgae. From a biotechnological point of view, this discovery sheds light on new biosynthetic pathways to be considered for the production of bioactive compounds from microalgae.

Conflicts of Interest:
The authors declare no conflict of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.