Monogalactosyldiacylglycerol and Sulfolipid Synthesis in Microalgae

Microalgae, due to their huge taxonomic and metabolic diversity, have been shown to be a valuable and eco-friendly source of bioactive natural products. The increasing number of genomic and transcriptomic data will give a great boost for the study of metabolic pathways involved in the synthesis of bioactive compounds. In this study, we analyzed the presence of the enzymes involved in the synthesis of monogalactosyldiacylglycerols (MGDGs) and sulfoquinovosyldiacylglycerols (SQDG). Both compounds have important biological properties. MGDGs present both anti-inflammatory and anti-cancer activities while SQDGs present immunostimulatory activities and inhibit the enzyme glutaminyl cyclase, which is involved in Alzheimer’s disease. The Ocean Global Atlas (OGA) database and the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP) were used to search MGDG synthase (MGD), UDP-sulfoquinovose synthase (SQD1), and sulfoquinovosyltransferase (SQD2) sequences along microalgal taxa. In silico 3D prediction analyses for the three enzymes were performed by Phyre2 server, while binding site predictions were performed by the COACH server. The analyzed enzymes are distributed across different taxa, which confirms the importance for microalgae of these two pathways for thylakoid physiology. MGD genes have been found across almost all analyzed taxa and can be separated in two different groups, similarly to terrestrial plant MGD. SQD1 and SQD2 genes are widely distributed along the analyzed taxa in a similar way to MGD genes with some exceptions. For Pinguiophyceae, Raphidophyceae, and Synurophyceae, only sequences coding for MGDG were found. On the contrary, sequences assigned to Ciliophora and Eustigmatophyceae were exclusively corresponding to SQD1 and SQD2. This study reports, for the first time, the presence/absence of these enzymes in available microalgal transcriptomes, which gives new insights on microalgal physiology and possible biotechnological applications for the production of bioactive lipids.


Introduction
Microalgae are eukaryotic photosynthetic microorganisms that are adapted to live in ecologically different habitats, which results in a wide diversity of species and natural products [1,2] with pharmaceutical [3], nutraceutical [4], and cosmeceutical [5] interests. Microalgae can be cultivated in huge quantities and this advantage overcomes the bottleneck of drug discovery from marine macro-organisms and destructive collection practices. In addition, many studies have focused on optimizing the culturing conditions in order to obtain the metabolites of interest or produce them Sulfolipids are constituent of the thylakoids in plant and algal chloroplasts [31]. Sulfoquinovose, which is the building block of sulfolipids, is also the major component of the biological sulfur cycle [32] and it is produced by photosynthetic organisms at a rate of 1010 tons per year [33]. The biosynthesis of these lipids proceeds in two reactions. The first reaction is catalysed by the UDP-sulfoquinovose synthase (SQD1), that assembles the UDP-sulfoquinovose from UDP-glucose and sulphite [34]. The second reaction is catalysed by sulfoquinovosyltransferase (SQD2), which transfers the sulfoquinovose to diacylglycerol [35]. This produces sulfoquinovosyldiacylglycerol (SQDG) (Figure 1b).
Different classes of microalgal-derived compounds have been identified and several have shown specific biological activities, such as anti-cancer [13], anti-inflammatory [14], anti-tuberculosis [36], anti-epilepsy [37], anti-microbial [38], immune-regulatory [39], anti-hypertensive [40], anti-atherosclerosis [40], and anti-osteoporosis [40] activity. The systematic studies of the genes involved in the synthesis of the bioactive compounds of interest are increasing [7,9,10] and can open new perspectives for gene-editing and boost the use of microalgae as a source of new marine natural products.
Mar. Drugs 2020, 18, x 3 of 18 types of genes, namely type-A (AtMGD1) and type-B (AtMGD2 and AtMGD3) isoforms, and enzymes are well characterized [25][26][27][28][29][30]. In microalgae, there is no information available yet. Sulfolipids are constituent of the thylakoids in plant and algal chloroplasts [31]. Sulfoquinovose, which is the building block of sulfolipids, is also the major component of the biological sulfur cycle [32] and it is produced by photosynthetic organisms at a rate of 1010 tons per year [33]. The biosynthesis of these lipids proceeds in two reactions. The first reaction is catalysed by the UDPsulfoquinovose synthase (SQD1), that assembles the UDP-sulfoquinovose from UDP-glucose and sulphite [34]. The second reaction is catalysed by sulfoquinovosyltransferase (SQD2), which transfers the sulfoquinovose to diacylglycerol [35]. This produces sulfoquinovosyldiacylglycerol (SQDG) (Figure 1b).
Different classes of microalgal-derived compounds have been identified and several have shown specific biological activities, such as anti-cancer [13], anti-inflammatory [14], anti-tuberculosis [36], anti-epilepsy [37], anti-microbial [38], immune-regulatory [39], anti-hypertensive [40], antiatherosclerosis [40], and anti-osteoporosis [40] activity. The systematic studies of the genes involved in the synthesis of the bioactive compounds of interest are increasing [7,9,10] and can open new perspectives for gene-editing and boost the use of microalgae as a source of new marine natural products. (i.e., UDP-sulfoquinovose synthase or SQD1 and sulfoquinovosyltransferase or SQD2) biosynthesis.

Identification and Taxonomic Assignation of MGD, SQD1, and SQD2 Homologous Sequences
The BLASTp search conducted in the OGA database returned 1,614 annotated eukaryotic sequences putatively attributable to MGD, 1,422 to SQD1 and 1,340 to SQD2. Of these sequences, the following were unambiguously identified as the genes of interest after Blast2GO analysis: 1154 to MGDG, 817 to SQD1, and 273 to SQD2 (Supplementary Data S1-S3, respectively).

Identification and Taxonomic Assignation of MGD, SQD1, and SQD2 Homologous Sequences
The BLASTp search conducted in the OGA database returned 1,614 annotated eukaryotic sequences putatively attributable to MGD, 1,422 to SQD1 and 1,340 to SQD2. Of these sequences, the following were unambiguously identified as the genes of interest after Blast2GO analysis: 1154 to MGDG, 817 to SQD1, and 273 to SQD2 (Supplementary Data S1-S3, respectively).
The taxonomic pattern of these sequences, excluding the ones annotated as only as "Eukaryota" and "Stramenopiles" is illustrated in Figure 2. Of the 15 taxonomic divisions considered, 13 were found in the sequences assigned to MGD, 11 to SQD1, and 7 to SQD2 ( Figure 2). In this database (OGA), we found homologs for all three genes investigated in the following microalgal taxa: Bacillariophyta, Chlorophyta, Dictyochophyceae, Coccolithophyceae, Pavlovophyceae, and Pelagophyceae. In particular, homolog sequences were particularly abundant in Bacillariophyta, Coccolithophyceae, and Pelagophyceae. For Pinguiophyceae, Raphidophyceae, and Synurophyceae, only sequences coding for MGDG were found. On the contrary, sequences assigned to Ciliophora and Eustigmatophyceae were exclusively corresponding to SQD1 and SQD2. Considering that the results are based on transcriptomic data, they can suggest the absence of a specific gene in certain taxa or simply the analyzed organisms were not expressing the specific gene at the time of sampling and fixing for the RNA-sequencing.
Mar. Drugs 2020, 18, x 4 of 18 The taxonomic pattern of these sequences, excluding the ones annotated as only as "Eukaryota" and "Stramenopiles" is illustrated in Figure 2. Of the 15 taxonomic divisions considered, 13 were found in the sequences assigned to MGD, 11 to SQD1, and 7 to SQD2 ( Figure 2). In this database (OGA), we found homologs for all three genes investigated in the following microalgal taxa: Bacillariophyta, Chlorophyta, Dictyochophyceae, Coccolithophyceae, Pavlovophyceae, and Pelagophyceae. In particular, homolog sequences were particularly abundant in Bacillariophyta, Coccolithophyceae, and Pelagophyceae. For Pinguiophyceae, Raphidophyceae, and Synurophyceae, only sequences coding for MGDG were found. On the contrary, sequences assigned to Ciliophora and Eustigmatophyceae were exclusively corresponding to SQD1 and SQD2. Considering that the results are based on transcriptomic data, they can suggest the absence of a specific gene in certain taxa or simply the analyzed organisms were not expressing the specific gene at the time of sampling and fixing for the RNA-sequencing. For MMETSP transcriptomes, we retrieved 313 sequences for MGD, 84 for SQD1, and 266 for SQD2 after performing a blastp search. After functional annotation (and removal of sequences considered not valid), 267 sequences were left for MGD, 80 for SQD1, and 121 for SQD2 (Supplementary Data S4-S6, respectively). For MMETSP transcriptomes, we retrieved 313 sequences for MGD, 84 for SQD1, and 266 for SQD2 after performing a blastp search. After functional annotation (and removal of sequences considered not valid), 267 sequences were left for MGD, 80 for SQD1, and 121 for SQD2 (Supplementary Data S4-S6, respectively).
We reported in Table 1 the species and the strains in which MGD, SQD1, and SQD2 were found. The analyzed genes were found in 96 different species or strains across 14 different taxonomic categories. All the three genes were found in the reported taxa, with the exception of the two species of Chrysophyceae (Dinobryon sp. UTEXLB2267 and Paraphysomonas imperforata PA2, Supplementary Data S7). In addition, considering that the results are based on transcriptomic data, they can be interpreted as either the absence of that specific gene or lack of its expression in that particular condition. Table 1. Species and strains from MMETSP transcriptomes in which MGD, SQD1, and SQD2 homologs were found. Colors refer to taxonomic ranks as in Figure 2. P = presence of validated gene homologs.

Final Dataset and Phylogenetic Inferences
The final alignments consisted of 649 sequences for MGD, 521 for SQD1, and 244 for SQD2 (Supplementary Data S8-S10, respectively), from both the OGA and MMETSP database. The length of each dataset after trimming of poorly aligned regions was as follows: 377 bp for MGD, 198 bp for SQD1, and 295 bp for SQD2.
The MGD phylogenetic tree showed that most of the taxa here investigated contained paralog copies (homolog copies resulting from duplication events) of monogalactosyldiacylglycerol synthase gene ( Figure 3). These paralogs generally occurred in two copies, resulting in two distinct and highly supported clades for most taxa (red circles, Figure 3, Supplementary File S1). A few taxa (e.g., Coccolithophyceae, Cryptophyta, and Xantophyceae) presented only one MGD copy (and, therefore, a single group of sequences), while others (e.g., Dinophyceae) presented several MGD paralogs ( Figure 3 or Supplementary File S1).
Mar. Drugs 2020, 18, x 7 of 18 Coccolithophyceae, Cryptophyta, and Xantophyceae) presented only one MGD copy (and, therefore, a single group of sequences), while others (e.g., Dinophyceae) presented several MGD paralogs ( Figure 3 or Supplementary File S1). . MGD unrooted phylogenetic tree. Colored circles at the base of each node refer to branch support after aLRT SH-like test. Colors of taxa refer to taxonomic groups as in Figure 2 and Table 1.
For the SQD1 gene, the phylogenetic tree showed that all the sequences belonging to the same taxon formed a highly supported monophyletic group in most species (Figure 4, Supplementary File S2). Among the exceptions, there are the diatoms (Bacillariphyta) where, beside a clade containing . MGD unrooted phylogenetic tree. Colored circles at the base of each node refer to branch support after aLRT SH-like test. Colors of taxa refer to taxonomic groups as in Figure 2 and Table 1. For the SQD1 gene, the phylogenetic tree showed that all the sequences belonging to the same taxon formed a highly supported monophyletic group in most species (Figure 4, Supplementary File S2). Among the exceptions, there are the diatoms (Bacillariphyta) where, beside a clade containing the most diatom sequences retrieved from OGA and MMETSP, there are a few others interspersed across the tree (Figure 4). One of these only contains sequences of Thalassiosira rotula (synonym of T. gravida) strain CCMP1093 from MMETSP. The others contain small groups of sequences from OGA ( Figure 4, Supplementary File S2). Similarly, SQD1 dinoflagellates (Dinophyceae) homologs do not form a monophyletic group in the tree. This could be due to the high complexity of the plastid evolution in Dinophyceae. Dinophyceae acquired plastid with four different endosymbiotic events. [41]. In a secondary endosymbiotic event, Dinophyceae acquired plastid by serial endosymbiosis with green microalgae (Chlorophyta) [42]. In the tertiary endosymbiotic event, Dinophyceae acquired tertiary plastids by endosymbiosis with Cryptophyceae-Cryptomonads, Haptophyta, and Bacillariphyta [41].
Mar. Drugs 2020, 18, x 8 of 18 evolution in Dinophyceae. Dinophyceae acquired plastid with four different endosymbiotic events. [41]. In a secondary endosymbiotic event, Dinophyceae acquired plastid by serial endosymbiosis with green microalgae (Chlorophyta) [42]. In the tertiary endosymbiotic event, Dinophyceae acquired tertiary plastids by endosymbiosis with Cryptophyceae-Cryptomonads, Haptophyta, and Bacillariphyta [41]. Figure 4. SQD1 unrooted the phylogenetic tree. Colored circles at the base of each node refer to branch support after an aLRT SH-like test. Colors of taxa refer to taxonomic groups in Figure 2 and Table 1.
For the SQD2 gene, all the sequences from the same taxonomic group formed monophyletic groups except for the Dinophyceae ( Figure 5, Supplementary File S3). All the dinoflagellate Figure 4. SQD1 unrooted the phylogenetic tree. Colored circles at the base of each node refer to branch support after an aLRT SH-like test. Colors of taxa refer to taxonomic groups in Figure 2 and Table 1. For the SQD2 gene, all the sequences from the same taxonomic group formed monophyletic groups except for the Dinophyceae ( Figure 5, Supplementary File S3). All the dinoflagellate sequences from OGA and MMETSP were interspersed across the tree with high support, close to Bacillariophyta, Chlorophyta, Dictyocophyceae, and Coccolithophyceae.
Mar. Drugs 2020, 18, x 9 of 18 Figure 5. SQD2 unrooted phylogenetic tree. Colored circles at the base of each node refer to branch support after the aLRT SH-like test. Colors of taxa refer to taxonomic groups in Figure 2 and Table 1.

Structural Details of MDGs, SQD1, and SQD2 from Thalassiosira Weissflogii CCMP1336 (Conticriba Weissflogii)
We built in silico models for MGDs, SQD1, and SQD2 proteins from the diatom T. weissflogii CCMP1336 (synonym of Conticribra weissflogii). T. weissflogii (C. weissflogii) CCMP1336 has been selected because it is known to have immunostimulatory activity, and both galactolipids and sulfolipids have been found to play a key role in this bioactivity [39,43]. 3D in silico models ( Figure  6) were generated using the amino acid sequences of MGD (CAMPEP_0193073380, CAMPEP_0193064960 and CAMPEP_0193062160) (Figures 6a-c), SQD1 (CAMPEP_0193062736) (Figure 6d), and SQD2 (CAMPEP_0193058822) (Figure 6e), obtained from the MMETSP database. Phyre2 results were summarized in Table 2. The analyses pointed out the homology between T. weissflogii (C. weissflogii) and Arabidopsis thaliana proteins. Structural prediction of MGDs (CAMPEP_0193073380, Supplementary Data S11, CAMPEP_0193064960 Supplementary Data S12, and CAMPEP_0193062160, Supplementary Data S13) found high similarity with the 3D structure of MGD1 from A. thaliana. The percentage of identity between T. weissflogii (C. weissflogii) and A. thaliana MGD1 is higher than 40%. Similar results were obtained for structural prediction of SQD1 from T. weissflogii (C. weissflogii) (Supplementary Data S14) with 45% identity with the 3D structure of SQD1 Figure 5. SQD2 unrooted phylogenetic tree. Colored circles at the base of each node refer to branch support after the aLRT SH-like test. Colors of taxa refer to taxonomic groups in Figure 2 and Table 1.

Structural Details of MDGs, SQD1, and SQD2 from Thalassiosira Weissflogii CCMP1336 (Conticriba weissflogii)
We built in silico models for MGDs, SQD1, and SQD2 proteins from the diatom T. weissflogii CCMP1336 (synonym of Conticribra weissflogii). T. weissflogii (C. weissflogii) CCMP1336 has been selected because it is known to have immunostimulatory activity, and both galactolipids and sulfolipids have been found to play a key role in this bioactivity [39,43]. 3D in silico models ( Figure 6) were generated using the amino acid sequences of MGD (CAMPEP_0193073380, CAMPEP_0193064960 and CAMPEP_0193062160) (Figure 6a-c), SQD1 (CAMPEP_0193062736) (Figure 6d), and SQD2 (CAMPEP_0193058822) (Figure 6e), obtained from the MMETSP database. Phyre2 results were summarized in Table 2. The analyses pointed out the homology between T. weissflogii (C. weissflogii) and Arabidopsis thaliana proteins. Structural prediction of MGDs (CAMPEP_0193073380, Supplementary Data S11, CAMPEP_0193064960 Supplementary Data S12, and CAMPEP_0193062160, Supplementary Data S13) found high similarity with the 3D structure of MGD1 from A. thaliana. The percentage of identity between T. weissflogii (C. weissflogii) and A. thaliana MGD1 is higher than 40%. Similar results were obtained for structural prediction of SQD1 from T. weissflogii (C. weissflogii) (Supplementary Data S14) with 45% identity with the 3D structure of SQD1 from A. thaliana.
Mar. Drugs 2020, 18, x 10 of 18 and SUS1 is 18%, which is lower than the percent of identity of SQD1 and MDGs. SUS1 from A. thaliana catalyzed a reversible sucrose synthesis. It transfers glucose moiety from UDP to fructose and it has been found to form a complex with both UDP-glucose and UDP-fructose [44]. These findings may indicate the presence of a low specificity ligand-binding site in the SQD2 enzyme.  Table 2. Report of Phyre2 analysis. We report the template (protein of known structure used for the prediction analysis) and its protein data bank (PDB) code, confidence (probability that the sequence and template are homologous), and percent id (percent of identity). In order to evaluate whether predicted structures preserved the functional active sites of MGDs, SQD1, and SQD2, they were also analyzed by using the COACH server. MGD analysis was performed for all the structures obtained from Phyre2. The analysis pointed out the presence of a UDP binding pocket for the three MGDs. MGD CAMPEP_0193073380 (Figure 7a) binding site prediction had a C-score 0.32 and the presence of 17 residues in the binding site (i.e., H69, A71, R263, G312, G313, V350, G352, F379, V380, M383, Y396, G398, P399, G400, T401, E40, and E422). MGD CAMPEP_0193064960 (Figure 7b) binding site prediction had a low C-score value (0.18) and 17 amino acid residues were involved in the binding site (i.e., H89, A91, R260, G305, G306, V335, G337, F528, V529, M532, K545, G547, P548, G549, T550, E553, and E571). MGD CAMPEP_0193062160 (Figure 7c) binding site prediction had a low C-score (0.18) and 17 residues involved (i.e. H262, A264, R434, G477, G478, V507, G509, F586, V587, M590, K603, G605, P606, G607, T608, E611, and E629). COACH prediction analysis highlighted the same 17 amino acid residues involved in ligand-binding site of  Table 2. Report of Phyre2 analysis. We report the template (protein of known structure used for the prediction analysis) and its protein data bank (PDB) code, confidence (probability that the sequence and template are homologous), and percent id (percent of identity). Structural prediction for SQD2 (Supplementary Data S15) found a similarity with the 3D structure of sucrose synthase-1 (SUS1) from A. thaliana. The percentage of identity between SQD2 and SUS1 is 18%, which is lower than the percent of identity of SQD1 and MDGs. SUS1 from A. thaliana catalyzed a reversible sucrose synthesis. It transfers glucose moiety from UDP to fructose and it has been found to form a complex with both UDP-glucose and UDP-fructose [44]. These findings may indicate the presence of a low specificity ligand-binding site in the SQD2 enzyme.

Identification of MGD, SQD1, and SQD2 Homologous Sequences
Since our primary interest was to ascertain the occurrence of MGD, SQD1, and SQD2 in microalgae, we used the sequences of MGD (accession number XP002181685), SQD1 (accession number XP002185968), and SQD2 (accession number XP002185276) from the diatom Phaeodactylum tricornutum as queries for a BLAST [50] search against the Ocean Global Atlas (OGA, [51]) database (http://tara-oceans.mio.osupytheas.fr/oceangene-atlas/), and several protist transcriptomes from the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP, [52]) available at https://zenodo.org/record/12125852585. The OGA database contains a collection of more than 116 million eukaryote and 40 million prokaryotic genes gathered during the Tara Oceans [53,54] and the Global Ocean Sampling [55] expeditions. Instead, MMETSP contains the transcriptomes of some of the most abundant and ecologically significant microbial eukaryotes in the oceans. To retrieve homologs, we used the blastp algorithm against the metagenome/metatranscriptome and transcriptome databases contained in OGA and the MMETSP, setting the expect threshold to 1E-10. For OGA, after ascertaining that the sequences obtained from metagenomes and metatranscriptomes were identical, we only used one dataset (metatranscriptomes) for further analyses. Since a sequencebased homology search could have recovered different genes with similar functions, we used Blast2GO [56] to obtain a functional annotation of the homologs retrieved. We used the default settings (i.e., blastx program, using the nr BLAST database and with a BLAST expectation value of 1.0E-3) for the analysis. We considered valid Blast2GO annotations containing the following names:  (Figure 9b) (C-score 0.33) that involved the residues G83, Y84, R87, H190, T191, K249, S354, E355, T356, L357, and G358. These data confirmed the presence of a low specificity ligand binding pocket, and suggest that SQD2 from diatoms could be involved in the synthesis of other diacylglycerols such as glucuronosyldiacylglycerol (GlcADG) in Arabidopsis [47] or flavonoid glycosylation in Oryza sativa [48,49].

Identification of MGD, SQD1, and SQD2 Homologous Sequences
Since our primary interest was to ascertain the occurrence of MGD, SQD1, and SQD2 in microalgae, we used the sequences of MGD (accession number XP002181685), SQD1 (accession number XP002185968), and SQD2 (accession number XP002185276) from the diatom Phaeodactylum tricornutum as queries for a BLAST [50] search against the Ocean Global Atlas (OGA, [51]) database (http://tara-oceans.mio.osupytheas.fr/oceangene-atlas/), and several protist transcriptomes from the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP, [52]) available at https://zenodo.org/record/12125852585. The OGA database contains a collection of more than 116 million eukaryote and 40 million prokaryotic genes gathered during the Tara Oceans [53,54] and the Global Ocean Sampling [55] expeditions. Instead, MMETSP contains the transcriptomes of some of the most abundant and ecologically significant microbial eukaryotes in the oceans. To retrieve homologs, we used the blastp algorithm against the metagenome/metatranscriptome and transcriptome databases contained in OGA and the MMETSP, setting the expect threshold to 1E-10.

Identification of MGD, SQD1, and SQD2 Homologous Sequences
Since our primary interest was to ascertain the occurrence of MGD, SQD1, and SQD2 in microalgae, we used the sequences of MGD (accession number XP002181685), SQD1 (accession number XP002185968), and SQD2 (accession number XP002185276) from the diatom Phaeodactylum tricornutum as queries for a BLAST [50] search against the Ocean Global Atlas (OGA, [51]) database (http://tara-oceans.mio.osupytheas.fr/oceangene-atlas/), and several protist transcriptomes from the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP, [52]) available at https://zenodo.org/record/12125852585. The OGA database contains a collection of more than 116 million eukaryote and 40 million prokaryotic genes gathered during the Tara Oceans [53,54] and the Global Ocean Sampling [55] expeditions. Instead, MMETSP contains the transcriptomes of some of the most abundant and ecologically significant microbial eukaryotes in the oceans. To retrieve homologs, we used the blastp algorithm against the metagenome/metatranscriptome and transcriptome databases contained in OGA and the MMETSP, setting the expect threshold to 1E-10. For OGA, after ascertaining that the sequences obtained from metagenomes and metatranscriptomes were identical, we only used one dataset (metatranscriptomes) for further analyses. Since a sequence-based homology search could have recovered different genes with similar functions, we used Blast2GO [56] to obtain a functional annotation of the homologs retrieved. We used the default settings (i.e., blastx program, using the nr BLAST database and with a BLAST expectation value of 1.0E-3) for the analysis. We considered valid Blast2GO annotations containing the following names: 1,2-diacylglycerol 3-beta-galactosyltransferase and monogalactosyldiacylglycerol synthase for MGD, sulfolipid biosynthesis protein, sulfoquinovosyldiacylglycerol synthesis protein, UDP sulfoquinovose synthase and uridine 5 -diphosphate-sulfoquinovose synthase for SQD1, sulfoquinovosyl transferase SQD2, sulfoquinovosyldiacylglycerol 2, and UDP-sulfoquinovose: DAG sulfoquinovosyltransferase for SQD2. All the sequences identified as only "predicted or hypothetical protein" without specification were a priori discarded.

Taxonomic Overview
All the sequences from OGA that passed the Blast2GO analysis with the criterions specified above were annotated using the annotation file generated during homolog retrieval. We removed all the sequences without a taxonomic annotation, annotated only as "Eukaryota" and, whenever applicable, with a generic annotation that did not allow to discriminate such sequences from others of lower taxonomic rank (e.g., "Stramenopiles"). All the taxa in which the MGD, SQD1, and SQD2 genes were found and were organized into a table. The abundance of such sequences were plotted as histograms using the R [57] working packages scales [58] and ggplot2 [59]. For the homologs retrieved from the MMETSP transcriptomes, we generated a table illustrating the species surveyed and the genes of interest that were found in each of them.

Sequence Alignment and Phylogenetic Inference
The sequences of each gene from OGA and MMETSP that passed Blast2GO annotation and with a length of at least 200 aa were aligned using COBALT [60] available at https://www.ncbi.nlm.nih.gov/ tools/cobalt/). Unlike other common software used for protein alignments (e.g., ClustalW, MAFFT, MUSCLE, ProbCons) that only use sequence information, COBALT also integrates the information of protein-motif regular expressions (PROSITE database) and of conserved protein domains (NCBI CDD database). In doing so, COBALT has a better chance of producing a biologically meaningful multiple alignment compared to tools that do not utilize this information 60]. Poorly aligned regions from each alignment were removed with trimAl v1.2 [61] in order to increase the quality of subsequent phylogenetic analyses. We used the automated1 option to find the most appropriate mode to trim the alignments (use of gaps or similarity scores) depending on the alignment characteristics.
In order to infer reliable phylogenetic trees, all the sequences that, after trimming, were shorter than one quarter (around 60-70 aa) of the final length of the alignment were removed, unless longer sequences were not available for that particular taxon. Maximum likelihood phylogenetic trees were inferred using PhyML [62] using the LG substitution model [63], which turned out to be the best evolution model for the three genes investigated according to the Akaike Information Criterion implemented in SMS [64]. Support to nodes was calculated using the Shimodaira-Hasegawa-like (aLRT SH-like) procedure [65]. Trees were visualised and graphically edited in FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).
3D in silico prediction analyses indicated that the enzymes, maintaining conserved domains, could be effectively involved in the synthesis of compounds with known anticancer and immune-modulatory activities, such as MGDGs and SQDGs. This approach can give preliminary information for the selection of specific microalgal species for drug discovery and for genetic engineering approaches in order to produce huge amounts of bioactive compounds of pharmaceutical interest.
Supplementary Materials: The following are available online at http://www.mdpi.com/1660-3397/18/5/237/s1. Data S1: Blast2GO analysis of MGD genes. BLASTp search conducted in the OGA database highlighted 1614 annotated eukaryotic sequences putatively attributable to MGD. Of these sequences, 1154 were identified as the gene of interest. Data S2: Blast2GO analysis of SQD1 genes. BLASTp search conducted in the OGA database highlighted 1422 annotated eukaryotic sequences putatively attributable to SQD1. Of these sequences, 817 were identified as the gene as the gene of interest. Data S3: Blast2GO analysis of SQD2 genes. BLASTp search conducted in the OGA database highlighted 1340 annotated eukaryotic sequences putatively attributable to SQD2. Of these sequences, 273 were identified as the gene of interest. Data S4: MGD amino acid sequences from MMETSP transcriptomes. After functional annotation, 267 sequences were left for MGD. Data S5: SQD1 amino acid sequences from MMETSP transcriptomes. After functional annotation, 84 sequences were left for MGD. Data S6: SQD2 amino acid sequences from MMETSP transcriptomes. After functional annotation, 121 sequences were left for SQD2. Data S7: Original data for Table 1. All the species and the strains were analyzed from MMETSP. Data S8: Alignments for MGD amino acid sequences. The alignments were performed using 649 amino acid sequences for MGD. Data S9: Alignments for SQD1 amino acid sequences. The alignments were performed using 521 amino acid sequences for SQD1. Data S10: Alignments for SQD2 amino acid sequences. The alignments were performed using 295 amino acid sequences for SQD2. Data S11: PDB file of MGD CAMPEP_0193073380 predicted structure. In silico model generated by Phyre2 for MGD CAMPEP_0193073380. Data S12: PDB file of MGD CAMPEP_0193064960 predicted structure. In silico model generated by Phyre2 for MGD CAMPEP_0193064960. Data S13: PDB file of MGD CAMPEP_0193062160 predicted structure. In silico model generated by Phyre2 for MGD CAMPEP_0193062160. Data S14: PDB file of SQD1 CAMPEP_0193062736 predicted structure. In silico model generated by Phyre2 for SQD1 CAMPEP_0193062736. Additional file 15: Data S15: PDB File of SQD2 CAMPEP_0193058822 predicted structure. In silico model generated by Phyre2 for SQD2 CAMPEP_0193058822. File S1: PDF file of MGD unrooted phylogenetic tree. Colored circles at the base of each node refer to branch support after aLRT SH-like test. Colors of taxa refer to taxonomic groups as in Table 1. File S2: PDF file of SQD1 unrooted phylogenetic tree. Colored circles at the base of each node refer to branch support after aLRT SH-like test. Colors of taxa refer to taxonomic groups in Table 1. File S3: PDF file of SQD2 unrooted phylogenetic tree. Colored circles at the base of each node refer to branch support after the aLRT SH-like test. Colors of taxa refer to taxonomic groups as in Table 1.
Author Contributions: Conceptualization, G.R. and C.L. Methodology and formal analysis, G.R., D.D.L., and C.L. Supervision, C.L., G.R., D.D.L., and C.L. co-wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding:
The "Antitumor Drugs and Vaccines from the Sea (ADViSE)" project (PG/2018/0494374) funded this research.

Conflicts of Interest:
The authors declare no conflict of interest.