Intraspecific Diversity in the Cold Stress Response of Transposable Elements in the Diatom Leptocylindrus aporus

Transposable elements (TEs), activated as a response to unfavorable conditions, have been proposed to contribute to the generation of genetic and phenotypic diversity in diatoms. Here we explore the transcriptome of three warm water strains of the diatom Leptocylindrus aporus, and the possible involvement of TEs in their response to changing temperature conditions. At low temperature (13 °C) several stress response proteins were overexpressed, confirming low temperature to be unfavorable for L. aporus, while TE-related transcripts of the LTR retrotransposon superfamily were the most enriched transcripts. Their expression levels, as well as most of the stress-related proteins, were found to vary significantly among strains, and even within the same strains analysed at different times. The lack of overexpression after many months of culturing suggests a possible role of physiological plasticity in response to growth under controlled laboratory conditions. While further investigation on the possible central role of TEs in the diatom stress response is warranted, the strain-specific responses and possible role of in-culture evolution draw attention to the interplay between the high intraspecific variability and the physiological plasticity of diatoms, which can both contribute to the adaptation of a species to a wide range of conditions in the marine environment.


Introduction
Diatoms are among the most common types of phytoplankton and one of the most successful clades of eukaryotic, single-celled photosynthetic organisms in the modern ocean [1]. They are widespread in the plankton and benthos of freshwater, coastal and oceanic habitats and even in temporarily wet terrestrial environments. The ability of diatoms to thrive under many different conditions in the natural environment could depend on: (1) their phenotypic plasticity, reflected in their physiological diversity that allows them to acclimate and face short-term environmental heterogeneity [2,3], (2) intraspecific genetic variations leading to distinct populations, each with a rather narrow physiological tolerance and response (i.e., adapted populations); numerous cases of cryptic or pseudo-cryptic diatom species have recently been uncovered [4][5][6], and even populations within species that show particular distributions as a result of adaptation to specific ecological niches [7], genetically and epigenetically-based phenotypic variation and have been associated with adaptation to the environment in numerous studies on different organisms [46,47], while two active, diatom-specific retrotransposons, namely Blackbeard and Surcouf, have been proposed to act as environmental sensors in the response to stress in the diatom Phaeodactylum tricornutum [32,47]. Thus, TEs could substantially increase genomic diversity and be a crucial element in the acclimation and even adaptation of diatoms to the ever-changing aquatic environment but when in a constant environment such as the lab, their suppression would have no effects and could hence become permanent through genetic changes, leading to the loss of the diatom adaptability.
Leptocylindrus aporus belongs to an ancient centric diatom lineage, Leptocylindraceae, with a broad distribution, common from polar to sub-tropical coastal regions, and is often a conspicuous component of diatom blooms [48,49]. It grows equally efficiently at medium and high temperature but less at low temperature, where it might experience physiological stress [50]. In the Gulf of Naples, L. aporus has been found to bloom in summer but the species is present during other seasons as well [6,51]. Indications of a cold adapted population resulted from a worldwide high throughput sequencing (HTS) metabarcoding analysis on Leptocylindraceae, offering a possible explanation of the species occurrence through the year [51]. In the current study, (i) we used RNA sequencing of multiple strains of L. aporus to explore expression differences in response to temperature changes, focusing on TEs. (ii) Intraspecific genetic diversity was assessed with a single-nucleotide polymorphism (SNP) analysis. In addition, in order to detect possible genetic changes in laboratory conditions for the oldest strain, an additional SNP analysis was conducted comparing present data to the already available transcriptome in the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [52]. (iii) The effects of experimental conditions were explored with shorter acclimation times as well as with recently isolated L. aporus strains using qPCR.
Our overall goal was to better understand how diatoms acclimate and possibly adapt to different environmental conditions, which is a requisite to interpret their changes in distribution and phenological patterns.

Leptocylindrus aporus Strains
The six L. aporus strains (clonal cultures of L. aporus, i.e., initiated from a single cell) used in the current study were isolated from surface waters at the LTER-MC site (40 • 48.5 N, 14 • 15 E) in the Gulf of Naples (GoN, Mediterranean Sea) on different dates from 2010 to 2016 (Table 1). The cultures were kept in K medium [53] plus silica at 20 • C, under fluorescent light of 100 µmol photons m −2 s −1 with a photoperiod of 12:12 (light: dark). The first three strains, used for RNA sequencing (RNA-Seq), were identical based on the ITS r-DNA marker and were selected in a way that they would cover diversity in terms of seasonality and possibly in physiological characteristics detected by growth experiments. Following the results of RNA-Seq, new isolations were performed in order to obtain strains with a lower time of culture maintenance to be used in qPCR experiments. The new strains were also molecularly identified as L aporus.

Acclimation for RNA-seq
The three strains selected for RNA-seq were acclimated to three temperatures, 13 • C, 19 • C and 26 • C, at a light intensity of 100 µmol photons m −2 s −1 and a photoperiod of L:D, 12:12 ( Table 2). The temperature values were chosen based on the range recorded in the GoN during the year (13.0-27.5 • C, [54]). Starting from the maintenance conditions at 20 • C, cultures were maintained for one week at intermediate temperatures, namely 15 • C and 23 • C ( Figure S1). Subsequently, they were moved to 13 • C and 26 • C and grown in triplicates under the same light conditions, along with one triplicate grown at 19 • C. The following acclimation period was maintained for at least three growth cycles or more until the growth curves were stabilized. Due to the different behavior of each strain, the acclimation duration varied between strains, yet based on their growth rates they were all considered equally acclimated in the end.
Cell density was quantified under the light microscope and chlorophyll fluorescence was measured daily using a Turner 10-005 fluorometer. Growth rate was calculated based on plots of the logarithmic values (base 10) of cells density and/or fluorescence over the days. K10, the slope calculated from the linear trendline applied to the plot, was used in the following equations, where k equals doublings per day and Ke equals the exponential growth rate [55]: k (div./day) = 3.322/K10 (1) and Ke (day-1) = 0.6931 k (2)

RNA Extraction and Sequencing
Starter cultures (100 mL) with a concentration of 2000 cells/mL in exponential phase were transferred to 1-liter volume flasks and grown until a final concentration of 50,000 cells/mL, ensuring they were still at the exponential phase. Cultures were harvested by filtration on 47 mm MF-Millipore mixed cellulose membrane filters (1.2 µm pore size), which were stored in TRIzol Reagent. RNA extraction was performed as described in Nanjappa et al. (2017) [56].
To remove DNA contamination Roche DNase I recombinant, RNase-free (10 u/µL) was used and RNA was then purified using the QIAGEN RNeasy Mini Kit, following the manufacturer's protocol. RNA samples were run on a Bioanalyzer 2100 RNA Nano LabChip (Agilent, Palo Alto, CA, USA) for qualitative control and sent for deep sequencing on an Illumina HiSeq2000 instrument (Illumina: San Diego, CA, USA) (paired-end, 50 bases) to the European Molecular Biology Laboratory (EMBL) Genomics Core Facilities.
The completeness of the assembly and annotation was quantified using the software BUSCO v.2 using the "protists" set of lineage-specific single copy genes [70]. The BUSCO results were compared to the corresponding ones produced from the L. aporus transcriptome available in the Marine Microbial Eukaryote Transcriptome Sequencing Project, which was based on the single strain B651 [56].

Differential Expression Analysis
To assess the distance among samples, non-metric multidimensional scaling (NMDS) analysis was performed based on the expression values (CPMs) of all transcripts. Subsequently, we used EdgeR to evaluate the expression of the transcripts obtained and statistically distinguish the expression of the genes among the experiments. The following comparisons were performed: 13 vs. 19 • C, 19 vs. 26 • C, 13 vs. 26 • C. Different strains were used as replicates. Transcript counts were obtained using samtools idx. Raw counts were transformed in count per million (CPM) with the R library "edgeR", followed by a normalization step with the functions calcNormFactors, estimateCommonDisp and estimateTagwiseDisp. For each couple of sample CPMs, we executed a paired t-test to avoid the effect of the differences among strains using the R function exactTest. Transcripts were considered differentially expressed when the FDR (false discovery rate) ≤ 0.05 and the FC (fold change) > 2. A complementary analysis was performed between strain couples (1A1 vs. B651, 1A1 vs. 3A6, B651 vs. 3A6) using the temperature conditions as replicates. Significantly differentially expressed (DE) transcripts with no annotation were manually blasted in the National Center for Biotechnology Information (NCBI) and CDD databases. All significantly DE transcripts were also annotated for KEGG pathways using GhostKOALA [71]. The KEGG pathway annotation results were further processed in a similar way as in Liang et al. 2019 [17]. KEGG pathways and Brite categories were selected from the following categories: metabolism, genetic information processing, environmental information processing, and cellular processes. A heatmap based on the log transformed expression values was made for the significantly DE transcripts between strains. Pathways with less than five annotated KEGG orthology identifiers (KO IDs) were removed.

Enrichment Analysis and Clustering of DE Transcripts
The gene ontology (GO) terms that were enriched in the list of DE transcripts with respect to the overall transcriptome were identified with the R prop.test (parameters: alternative=prop.alt) function. The parameters used were 10 as the minimum number of transcripts associated to a GO class and a cut-off of adjusted p-value equal to 0.1 in order to consider a class significant.
With the aim of investigating the different functions of genes related to the expression dynamics, a cluster analysis based on unsupervised classification was applied. The software MeV [72] was used for the clustering of DE transcripts using the k-means algorithm with the following parameters: extraction of fifteen clusters (k = 15), 500 clusters re-ordering iteration, Pearson uncentered as distance measure, calculation of medians (k-medians) instead of means.

SNP Analysis
For the intraspecific diversity assessment, the quality checked and trimmed reads of all strains at 19 • C were mapped on the transcriptome using STAR 2.7.0c [73] and further processed with samtools (sorted and indexed). The software Opossum 0.2 was used in order to adjust the mapping of RNA-seq reads for variant calling [74]. The SNPs were called using FreeBayes v1.2.0-4-gd15209e [75] and filtered with bcftools by samtools, removing reads not properly mapped and with mapping quality less than 30. Variants with allele frequency (AF) equal or less than 0.2 were not considered ( Figure 1).
To address possible variations over the culturing time, the same workflow was also followed mapping B651 reads acquired from this study and public B651 reads already available in MMETSP on the B651 MMETSP transcriptome. The B651 MMETSP reads and transcriptome are based on RNA extracted from the same B651 strain, at 19 • C in 2011, three years before the experiments described in this study. For this intra-strain comparison, the transcriptome used for the mapping was the B651 MMETSP transcriptome as it is based only on the B651 strain and would give more accurate results regarding the evolution through time of the strain itself, with no noise introduced by the other strains. °C. An additional analysis (light blue boxes) was performed for B651, including the transcriptome publicly available in MMETSP B651, which is based on RNA extracted from the same strain but three years before the present experiment.

qPCR Validation of Gene Expression Levels from RNA-Seq
Following the investigation of the expression and function of all significantly DE genes, eight of them were selected as targets for validation and further experiments with qPCR (Genbank MN738467-MN738474). Five of the selected transcripts expressed TE-related genes and three expressed heat stress-related genes ( Table 3). Tubulin A1 (TUBA1) and tubulin B6 (TUBB6) were tested for stable expression under the different experimental conditions using NormFinder software [76].
The samples used for validation with qPCR had undergone the same acclimation conditions and time as in the RNA-Seq experiment. The RNA was extracted following the same protocols described above and then reverse transcribed into cDNA using the QuantiTect Reverse Transcription Kit (Qiagen, Venlo, Limburg, the Netherlands). The specificity of all primers was checked by blasting against the whole L. aporus transcriptome. For the TE-related transcripts, a gradient PCR for a temperature range of 59.6-66.9 °C was conducted using randomly selected DNA from the studied strains. The reactions were performed in a final volume of 10 μl: cDNA 1μl, forward primer (10 μM), reverse primer (10 μM), PCR reaction buffer with MgCl2 (10×), dNTP (10×), Taq DNA Polymerase (5 u/μL). The products were run on 1.5% agarose gel in order to specify the size of the amplicon and confirm the presence of a single band, which demonstrates that a single region is amplified for each primer pair ( Figure S2). An additional analysis (light blue boxes) was performed for B651, including the transcriptome publicly available in MMETSP B651, which is based on RNA extracted from the same strain but three years before the present experiment.

qPCR Validation of Gene Expression Levels from RNA-Seq
Following the investigation of the expression and function of all significantly DE genes, eight of them were selected as targets for validation and further experiments with qPCR (Genbank MN738467-MN738474). Five of the selected transcripts expressed TE-related genes and three expressed heat stress-related genes ( Table 3). Tubulin A1 (TUBA1) and tubulin B6 (TUBB6) were tested for stable expression under the different experimental conditions using NormFinder software [76].
The samples used for validation with qPCR had undergone the same acclimation conditions and time as in the RNA-Seq experiment. The RNA was extracted following the same protocols described above and then reverse transcribed into cDNA using the QuantiTect Reverse Transcription Kit (Qiagen, Venlo, Limburg, the Netherlands). The specificity of all primers was checked by blasting against the whole L. aporus transcriptome. For the TE-related transcripts, a gradient PCR for a temperature range of 59.6-66.9 • C was conducted using randomly selected DNA from the studied strains. The reactions were performed in a final volume of 10 µl: cDNA 1µl, forward primer (10 µM), reverse primer (10 µM), PCR reaction buffer with MgCl2 (10×), dNTP (10×), Taq DNA Polymerase (5 u/µL). The products were run on 1.5% agarose gel in order to specify the size of the amplicon and confirm the presence of a single band, which demonstrates that a single region is amplified for each primer pair ( Figure S2). The efficiency of primers was calculated with an at least five point serial 10-fold dilution using the standard curve method of the ViiA™ 7 Real-Time PCR System (Applied Biosystems by Life Technologies, Carlsbad, CA, USA). The efficiency of all primers ranged between 1.85 and 2.41 (perfect efficiency equals 2, which means a double amount of DNA after one cycle). In addition, the melting curves, which ideally should show a single peak corresponding to a single product, were checked in order to verify the specificity of the primers already seen by the agarose gel. TR7186 showed a faint second band in the agarose gel but the qPCR melting curve did not detect any other product except for the targeted one. On the other hand, there were a few exceptions of melting curves that showed a smaller double peak indicating possible primer-dimers or non-specific amplification (TR6356, TR6586, TR6506_i5 in 1A1, 1188A1, 1189A3 and 1189B3 at 13 • C, HSFA in 1A1 at 26 • C). All these cases of extra peaks or slightly different peaks occurred when the amplification was low (see example Figure S3) and could have been related to the much lower level or absence of the target transcripts. This non-specificity would ultimately not have an important impact on the final interpretation of the low expression results. After verifying a satisfactory efficiency of the primers, qPCR was performed for all samples in triplicates with a negative control. Relative Expression Software Tool-Multiple Condition Solver (REST-MCS) was used for the calculation of the relative expression in qPCR. The software uses the pair-wise fixed reallocation randomization test [77]. The relative expression ratio of the targeted genes was computed as the expression variation between high temperature samples, set as control because L. aporus grows well at high temperature, against the low temperature samples, set as the stress condition, normalized over the expression variation of reference genes (TUBA1 and TUBB6) whose expression levels were not regulated in specific experimental conditions.

qPCR Experiments on Acclimation and Cultivation Time
To further explore the expression profile of the target genes in response to different acclimation and cultivation times, two qPCR experiments were designed and performed on two different sets of RNA samples, respectively (Table 4).

Experiment 1
In the first experiment, the same strains used in the RNA-Seq and validation were used but acclimation time was altered in order to investigate the acclimation duration impact. Strains were acclimated for a much shorter period compared with the time used in the RNA-Seq and validation experiments (39-66 days) and then used for qPCR analysis. These will be referred to as "_exp1" samples.

Experiment 2
To test the effect of time spent in culture, new L. aporus strains (1188A1, 1189A3, 1189B3) were isolated and used for qPCR after a short acclimation period (8-43 days). The acclimation duration varied due to the ease or difficulty of each strain that was acclimated (had a stable growth rate after three growth cycles). In order to check the presence of the TE-related sequences in the genomes of all strains we also performed PCR on genomic DNA. Genomic DNA from strains used in both experiments was isolated using cetyltrimethylammonium bromide (CTAB). We added 500 µL CTAB and 12 µL β-mercaptoethanol to the cell pellet, then RNase was added and the reaction was incubated at 65 • C for 45 min. After addition of 500 µL SEVAG (chloroform: isoamyl alcohol 24:1), the sample was centrifuged at 20,000× g for 15 min at 4 • C. Precipitation occurred in 1 volume of ice-cold isopropanol at −20 • C and, after 2 washes in 75% ethanol, the pellet was air-dried and finally resuspended in water. PCRs were performed by using Wonder Taq DNA Polymerase (EuroClone) according to manufacturer's instructions, with an annealing temperature of 57 • C and extension time of 1 min. Primer sequences were the same used for qPCR (Table 3).

Phylogenetic Analysis of the Selected TEs
The selected transcripts coding for transposable elements were screened for interspersed repeats and low complexity DNA sequences, like simple tandem repeats, polypurine and AT-rich regions, in the RepeatMasker (Search Engine: cross match, DNA source: diatom; [78]). The same screening was performed for the CoDi/GyDi sequences, which belong to groups of transposable elements mainly found in marine organisms with most of them being identified only in diatoms to date [32]. The CoDi/GyDi sequences were derived from Genbank using the Genbank accession numbers mentioned in [32]. The results of the two screenings were compared in order to identify whether the selected TEs represent known groups or new CoDis/GyDis. In addition, a phylogenetic analysis of the selected transcripts and the known CoDi/GyDi elements was performed. Multiple sequence alignment was performed with CLUSTAL Omega at EMBL-EBI [79]. Neighbor-joining trees were constructed for amino acid sequences with the bootstrap method set as the test of phylogeny (500 replications) and the Poisson model as a substitution model in MEGA7 [80] with GyDi group as outgroups, as was also done in [32]. All ambiguous positions were removed for each sequence pair. TR7186 and TR6586 were analysed separately in the phylogeny because it was not possible to estimate their pairwise distance.

Leptocylindrus aporus Transcriptome
Preliminary growth experiments were performed on several L. aporus strains in order to identify the best candidates for investigating the species acclimation and adaptation using RNA-Seq (Supplementary File 1). Based on the V9 region included in the beginning of ITS marker, none of the strains belonged to the possible cold adapted population found in the worldwide HTS analysis [51]. Ultimately, one L. aporus strain isolated in summer 2010 (B651) and two isolated in winter 2013-2014 (1A1 and 3A6) were chosen, due to their diversity in terms of the year of isolation, seasonality and physiological characteristics. The final transcriptome of L. aporus (ArrayExpress ID: E-MTAB-8596), obtained using all results from the three strains grown at the three different temperatures, consisted of 19,963 transcripts. Further statistics are presented in Table S1.
The transcriptome obtained in the current study was compared to the previously available MMETSP transcriptome [52], obtained from the oldest strain B651, in order to test whether the set of 215 protist lineage-specific single-copy genes were present in the assemblies (BUSCO analysis). This analysis gave an indication on the completeness of the assemblies as well as their redundancy. While the two assemblies contained a similar number of complete genes from the dataset (100 for the current and 103 for the older assembly), the older assembly had a higher proportion of potentially erroneous assembled transcripts and therefore were classified as duplicated by BUSCO (7 for the current and 34 for the old) (Table S2).

Differential Expression: Heat and Cold Response
The differential expression analysis between the temperatures resulted in 276 significantly DE transcripts between low (13 • C) and high (26 • C) temperature and only nine between low and medium (19 • C) temperature (Table 5, Figure 2), while there was no significant difference in expression between medium and high temperature (Supplementary File 2, 3 and 7).   All nine transcripts that were found to be significantly differentially expressed between low and 355 medium temperature were also contained within the group of significantly differentially expressed 356 genes in the low and high temperature contrast and followed the same direction of expression change 357 as in the less extreme contrast.  All nine transcripts that were found to be significantly differentially expressed between low and medium temperature were also contained within the group of significantly differentially expressed genes in the low and high temperature contrast and followed the same direction of expression change as in the less extreme contrast.

358
A total of 49 significant DE transcripts, representing 17.7% of all DE transcripts, received an annotation related to temperature, referring to oxidative or any other kind of stress, heat/cold response and DNA integration, overall indicating a stress response of L. aporus at low temperature ( Table 6).
The gene ontology (GO) enrichment analysis showed the GO terms enriched with respect to the overall transcriptome. The adjusted p-value cutoff to consider a class as significant was 0.1, which means that 10% of significant classes might be false positives. The analysis indicated that three classes belonging to the biological process GO aspect were significantly enriched among the DE transcripts between high and low temperature, protein dephosphorylation, DNA integration and carbohydrate metabolic process. Interestingly, among them, the most represented was DNA integration (Figure 3), which refers to the incorporation of a segment of DNA, including a transposon, into another, usually larger, DNA molecule such as a chromosome. Manual inspection of annotations revealed 16 DE transcripts related to DNA integration, four of which had no further annotation and 12 had specific protein annotation by at least one of the databases used in Annocript (Table 6). These were annotated as (1) reverse transcriptase (RNA-dependent DNA polymerase), which is usually indicative of a mobile element such as a retrotransposon or retrovirus, (2) ribonuclease of_Ty1/Copia or Ty3/Gypsy family, an endonuclease that cleaves the RNA strand of an RNA/DNA hybrid, which has been observed as adjunct domains to the reverse transcriptase gene in retroviruses, long-term repeat (LTR)-bearing and non-LTR retrotransposons, (3) integrase, which mediates integration of a DNA copy into the host chromosome, or (4) a gag-pol polyprotein, encoding structural proteins and several enzymatic functions in LTR retrotransposons. The gene ontology (GO) enrichment analysis showed the GO terms enriched with respect to the overall transcriptome. The adjusted p-value cutoff to consider a class as significant was 0.1, which means that 10% of significant classes might be false positives. The analysis indicated that three classes belonging to the biological process GO aspect were significantly enriched among the DE transcripts between high and low temperature, protein dephosphorylation, DNA integration and carbohydrate metabolic process. Interestingly, among them, the most represented was DNA integration (Figure 3), which refers to the incorporation of a segment of DNA, including a transposon, into another, usually larger, DNA molecule such as a chromosome. Manual inspection of annotations revealed 16 DE transcripts related to DNA integration, four of which had no further annotation and 12 had specific protein annotation by at least one of the databases used in Annocript (Table 6). These were annotated as (1) reverse transcriptase (RNA-dependent DNA polymerase), which is usually indicative of a mobile element such as a retrotransposon or retrovirus, (2) ribonuclease of_Ty1/Copia or Ty3/Gypsy family, an endonuclease that cleaves the RNA strand of an RNA/DNA hybrid, which has been observed as adjunct domains to the reverse transcriptase gene in retroviruses, long-term repeat (LTR)-bearing and non-LTR retrotransposons, (3) integrase, which mediates integration of a DNA copy into the host chromosome, or (4) a gag-pol polyprotein, encoding structural proteins and several enzymatic functions in LTR retrotransposons. All the significantly DE TE-related transcripts except one were highly expressed at low temperature in the recently isolated strains 1A1 and 3A6, while they were very low at all All the significantly DE TE-related transcripts except one were highly expressed at low temperature in the recently isolated strains 1A1 and 3A6, while they were very low at all temperatures for the old B651 strain. This expression pattern was also the one mostly seen in several of the other stress-related transcripts (HSFs, SYM1, AOX4, ALDH, RaiA), and in three transcripts, manually annotated by blasting in NCBI and CDD databases as likely homologs of the bacterial MAI genes related to biomineralization, which are known to undergo frequent transposition events [81]. The observation of this expression pattern pointed towards a strain specific stress response. Indeed, in an NMDS analysis using the global gene expression matrix, the oldest strain B651 clustered apart from the other two strains, for which samples from the medium and high temperature were interspersed ( Figure S4). All the above led to further investigation of the differences among strains.
Considering couples of strains, significantly DE transcripts between 1A1 and 3A6 were 622; much less compared to the 3,015 between 1A1 and B651 and 2,418 between B651 and 3A6 (Figures 4 and 5, Supplementary Files 4-7).
Genes 2020, 11, 9 14 of 27 temperatures for the old B651 strain. This expression pattern was also the one mostly seen in several of the other stress-related transcripts (HSFs, SYM1, AOX4, ALDH, RaiA), and in three transcripts, manually annotated by blasting in NCBI and CDD databases as likely homologs of the bacterial MAI genes related to biomineralization, which are known to undergo frequent transposition events [81]. The observation of this expression pattern pointed towards a strain specific stress response. Indeed, in an NMDS analysis using the global gene expression matrix, the oldest strain B651 clustered apart from the other two strains, for which samples from the medium and high temperature were interspersed ( Figure S4). All the above led to further investigation of the differences among strains. Considering couples of strains, significantly DE transcripts between 1A1 and 3A6 were 622; much less compared to the 3,015 between 1A1 and B651 and 2,418 between B651 and 3A6 ( Figure 4 and 5, Supplementary Files 4-7).   Genes 2020, 11, 9 14 of 29 temperatures for the old B651 strain. This expression pattern was also the one mostly seen in several 390 of the other stress-related transcripts (HSFs, SYM1, AOX4, ALDH, RaiA), and in three transcripts, 391 manually annotated by blasting in NCBI and CDD databases as likely homologs of the bacterial MAI 392 genes related to biomineralization, which are known to undergo frequent transposition events [81].

393
The observation of this expression pattern pointed towards a strain specific stress response. Indeed,

394
in an NMDS analysis using the global gene expression matrix, the oldest strain B651 clustered apart 395 from the other two strains, for which samples from the medium and high temperature were 396 interspersed ( Figure S4). All the above led to further investigation of the differences among strains.

397
Considering couples of strains, significantly DE transcripts between 1A1 and 3A6 were 622;    DNA integration held a central role in this analysis as well, as it was the biological process that was significantly enriched in all three pairs of strains. The same TE-related transcripts detected in the temperature differential expression analysis were among the significantly DE ones from the strain analysis. Most of the DNA-integration transcripts were expressed in a similar manner, described above, i.e., higher at low temperature and lower in B651; there were even some transcripts completely absent from all B651 samples. Several key genes and pathways were included in the KEGG annotation of the significant DE transcripts among strains, with B651 clustering separately in the expression heatmap ( Figure S5). Overall, the differential expression analysis among strains confirmed the B651-biased, TE-related expression pattern observed in the differential expression analysis among temperatures.

Clustering of DE Transcripts
Based on genes clustered by their expression patterns, the strongest cell response was to the cold conditions ( Figure 6) except for three clusters (clusters 4, 12 and 14) showing an opposite pattern. At high temperature, cluster-4 genes were highly expressed only in the newer strains 3A6 and 1A1, whereas cluster-12 genes showed a higher level of expression in the old strain B651. Two transcripts related to TEs (transposase, ribonuclease) and a heat stress transcription factor (HSFC-1b) were present in cluster 4 and heat shock factor protein 1 in cluster 12.
Genes 2020, 11, 9 15 of 27 DNA integration held a central role in this analysis as well, as it was the biological process that was significantly enriched in all three pairs of strains. The same TE-related transcripts detected in the temperature differential expression analysis were among the significantly DE ones from the strain analysis. Most of the DNA-integration transcripts were expressed in a similar manner, described above, i.e., higher at low temperature and lower in B651; there were even some transcripts completely absent from all B651 samples. Several key genes and pathways were included in the KEGG annotation of the significant DE transcripts among strains, with B651 clustering separately in the expression heatmap ( Figure S5). Overall, the differential expression analysis among strains confirmed the B651-biased, TE-related expression pattern observed in the differential expression analysis among temperatures.

Clustering of DE Transcripts
Based on genes clustered by their expression patterns, the strongest cell response was to the cold conditions ( Figure 6) except for three clusters (clusters 4, 12 and 14) showing an opposite pattern. At high temperature, cluster-4 genes were highly expressed only in the newer strains 3A6 and 1A1, whereas cluster-12 genes showed a higher level of expression in the old strain B651. Two transcripts related to TEs (transposase, ribonuclease) and a heat stress transcription factor (HSFC-1b) were present in cluster 4 and heat shock factor protein 1 in cluster 12. Four out of the five TE-related genes of interest selected for qPCR (see below) were present in cluster 3, which, along with cluster 9, showed an increased expression at lower temperatures, but only for strains 3A6 and 1A1. At least seven more transcripts related to TEs or transposition (reverse Four out of the five TE-related genes of interest selected for qPCR (see below) were present in cluster 3, which, along with cluster 9, showed an increased expression at lower temperatures, but only for strains 3A6 and 1A1. At least seven more transcripts related to TEs or transposition (reverse transcriptase, ribonuclease, MAIs, gag-pol protein, integrase core domain)-eleven in total-and two heat stress-related transcripts (HSFA-1a, SYM1) were also present in these clusters. The GO terms that were significantly enriched in cluster 3 and 9 were DNA integration and antioxidant activity.
Cluster 2 included genes that showed an increase in expression while the temperature decreased but only in strain B651. The GO enrichment analysis in this cluster showed that oxidation-reduction process and DNA integration were significantly enriched terms. This cluster also included the remaining gene of interest that was investigated (TR6586) and two more TE-related transcripts.
Cluster 5 showed transcripts that were expressed more at cold temperature. This behavior was similar in all strains, but more pronounced in strain 1A1. DNA recombination and transport were the GO terms significantly enriched in this cluster.
Cluster 6 and 13 included transcripts highly expressed at low temperature for all strains. Heat stress transcription factor B-2a (HSFB-2a) was present in this cluster. The GO terms that were significantly enriched were metabolic process and catalytic activity.
Cluster 10 showed higher gene expression at low temperature for two strains B651 and 1A1, while in strain 3A6 the gene expression was lower at the extremes and higher at medium temperature. AOX was present in this cluster. The most significantly enriched GO term was oxidation-reduction process.

SNP Analysis and Genetic Diversity
The number of SNPs identified by FreeBayes was similar in 1A1 and 3A6, while it was almost double in the older strain B651 ( Table 7). The heterozygous to homozygous ratio, for both total and SNP specific, was also quite different for B651 compared to 1A1 and 3A6. The intra-strain SNP analysis of B651 showed that there were slightly fewer SNPs in the new transcriptome compared to the original MMETSP transcriptome. Furthermore, the total and SNP specific heterozygous to homozygous ratio has been reduced to half during the three years of culturing (Table 8).

Expression of Genes of Interest: Temperature and Strain Effects
Five of the eight target genes used in qPCR were related to TEs (Table 9), as they were selected based on the most significantly enriched biological process GO term 'DNA integration'. The three remaining target genes were genes related to stress, as L. aporus seemed to go under stress at the low temperature.

qPCR Validation Results
The eight selected transcripts were cross-validated in the same strains, grown under the same conditions (validation set), and all of the results were found to be consistent with the results from the RNA-Seq samples with the single exception of HSFB in strain 1A1, which showed almost no change in expression ( Figures S6 and S7).
As in RNA-Seq, B651 stood out based on its different regulation compared to 1A1 and 3A6. This was especially true in the case of HSFA, TR6506_i3 and TR6586: the former two were highly induced in 1A1 and 3A6 at low temperature but down or not regulated in B651, while the latter followed the opposite trend.

qPCR Results on Acclimation and Cultivation Time
The expression of the selected transcripts was examined in independent experiments in which the exposure to high and low temperature was applied for a shorter time (approx. 41 vs. 155 days as an average), using the same strains as well as newly isolated ones. The results of these qPCR experiments (Figures 7 and 8 and Table 10) were different from those of the RNA-Seq ( Figure S2).
Genes 2020, 11,9 17 of 27 The three remaining target genes were genes related to stress, as L. aporus seemed to go under stress at the low temperature.

qPCR Validation Results
The eight selected transcripts were cross-validated in the same strains, grown under the same conditions (validation set), and all of the results were found to be consistent with the results from the RNA-Seq samples with the single exception of HSFB in strain 1A1, which showed almost no change in expression ( Figure S6 and S7).
As in RNA-Seq, B651 stood out based on its different regulation compared to 1A1 and 3A6. This was especially true in the case of HSFA, TR6506_i3 and TR6586: the former two were highly induced in 1A1 and 3A6 at low temperature but down or not regulated in B651, while the latter followed the opposite trend.

qPCR Results on Acclimation and Cultivation Time
The expression of the selected transcripts was examined in independent experiments in which the exposure to high and low temperature was applied for a shorter time (approx. 41 vs. 155 days as an average), using the same strains as well as newly isolated ones. The results of these qPCR experiments (Figures 7 and 8 and Table 10) were different from those of the RNA-Seq ( Figure S2).  In B651, 1A1 and 3A6, the selected transcripts were downregulated at low temperature, while in the freshly isolated strain 1189B3-acclimated for the shortest time-all genes were either expressed at low levels or not detected. The only transcript that seemed to be unaffected from the acclimation duration and the time since isolation was SYM1, which was found to always be upregulated at low temperature.
Genes 2020, 11, 9 18 of 27 isolated and used for qPCR after a short/very short acclimation period. The fold change is low temperature to high temperature expression values. Standard deviation values are shown.
In B651, 1A1 and 3A6, the selected transcripts were downregulated at low temperature, while in the freshly isolated strain 1189B3-acclimated for the shortest time-all genes were either expressed at low levels or not detected. The only transcript that seemed to be unaffected from the acclimation duration and the time since isolation was SYM1, which was found to always be upregulated at low temperature. On the other hand, looking at the freshly isolated strains 1188A1 and 1189A3 that were acclimated slightly more than 1189B3, most genes were again low or not expressed (TR6506_i3, TR6586, TR7186, HSFA, HSFB) but there were also upregulated TEs (TR6356 in both 1188A1 and 1189A3, TR6586 in 1188A1, TR6506_i3 and TR6506_i5 in 1189A3).   On the other hand, looking at the freshly isolated strains 1188A1 and 1189A3 that were acclimated slightly more than 1189B3, most genes were again low or not expressed (TR6506_i3, TR6586, TR7186, HSFA, HSFB) but there were also upregulated TEs (TR6356 in both 1188A1 and 1189A3, TR6586 in 1188A1, TR6506_i3 and TR6506_i5 in 1189A3).  Amplification of the gene fragments related to TEs from the genomic DNA of the strains used in these experiments revealed that some of the elements were absent from the genome of strain 1189B3 ( Figure S8), indicating that the absence of expression in this strain was due to the absence of the specific transposon in the genome. Similarly, in strain 1189A3, TR7186 appeared to be absent. In all other cases, amplification was successful.

Phylogenetic Analysis
RepeatMasker analysis on the five selected TE related transcripts (Table 10) revealed that two of them matched annotated repeats that were similar to CoDi/GyDi sequences. TR6506_i3 and GyDi2.1 both matched Gypsy3-I_TP, though at different regions each. TR7186 and the diatom-specific retrotransposon Blackbeard matched at the same region, Copia8-I_TP, but also at different ones (Table S3).
The phylogenetic tree gave a clearer picture about the relationship between the selected TEs and CoDis/GyDis. However, it was not possible to estimate the pairwise distance between TR7186 and TR6586 due to the much shorter protein sequence of the first, so two separate NJ trees were built (Figure 9). TR6506_i3 and TR6506_i5 clustered with the GyDi group, and TR7186 with the CoDi I group, while TR6586 and TR6356, named CoDi L.ap, were closely related to the Copia group.
Amplification of the gene fragments related to TEs from the genomic DNA of the strains used in these experiments revealed that some of the elements were absent from the genome of strain 1189B3 ( Figure S8), indicating that the absence of expression in this strain was due to the absence of the specific transposon in the genome. Similarly, in strain 1189A3, TR7186 appeared to be absent. In all other cases, amplification was successful.

Phylogenetic Analysis
RepeatMasker analysis on the five selected TE related transcripts (Table 10) revealed that two of them matched annotated repeats that were similar to CoDi/GyDi sequences. TR6506_i3 and GyDi2.1 both matched Gypsy3-I_TP, though at different regions each. TR7186 and the diatom-specific retrotransposon Blackbeard matched at the same region, Copia8-I_TP, but also at different ones (Table S3).
The phylogenetic tree gave a clearer picture about the relationship between the selected TEs and CoDis/GyDis. However, it was not possible to estimate the pairwise distance between TR7186 and TR6586 due to the much shorter protein sequence of the first, so two separate NJ trees were built (Figure 9). TR6506_i3 and TR6506_i5 clustered with the GyDi group, and TR7186 with the CoDi I group, while TR6586 and TR6356, named CoDi L.ap, were closely related to the Copia group.

Discussion
The analysis of L. aporus expression profiles at different temperatures based on several strains provided new information on the response of this species to cold stress, but also on the related intraspecific variability and its possible causes. In particular, we identified (i) specific genes, with a prominent role of TEs among them, that react to temperature changes and might be involved in acclimation and/or adaptation to environmental conditions, possibly underlying distinct phenological patterns, and (ii) high intraspecific variations in the species heat/cold response. The high numbers of DE genes only between the minimum and maximum temperature tested indicates that L. aporus mainly reacted to the low temperature, while it retained the same functional state at 26 °C and

Discussion
The analysis of L. aporus expression profiles at different temperatures based on several strains provided new information on the response of this species to cold stress, but also on the related intraspecific variability and its possible causes. In particular, we identified (i) specific genes, with a prominent role of TEs among them, that react to temperature changes and might be involved in acclimation and/or adaptation to environmental conditions, possibly underlying distinct phenological patterns, and (ii) high intraspecific variations in the species heat/cold response. The high numbers of DE genes only between the minimum and maximum temperature tested indicates that L. aporus mainly reacted to the low temperature, while it retained the same functional state at 26 • C and 19 • C. However, considering that one of the strains (B651) showed a very different reaction, this result could also be influenced by this high variance. A higher number of samples and replicates would be needed in order to completely clarify this point. In any case, the variability was much higher among strains, with 3622 DE genes, than among different experimental conditions, where only 276 DE genes were found, which highlights considerable intraspecific functional differences.
The functions related to the significant DE genes in response to temperature confirmed the stress reaction of L. aporus at low temperature. Oxidative, cold/heat, environmental or physiological stress responses, which seem to be coordinated by specific factors such as stress-activated kinases, AOX4, ALDHs, HSPs and HSFs, SYM1, MAI, FDH and transposable elements (TEs), were notable components of the species functional profile at 13 • C.
TEs proved to be the most significantly enriched units of the stress response, providing the first evidence for the possible role of TEs in the temperature stress response in L. aporus, as a few others have done recently in diatoms [47,82]. All TEs were of the LTR retrotransposon superfamily, confirming the tendency of relatively high abundance of LTR retrotransposons in diatom genomes [32]. Furthermore, the phylogenetic analysis showed that two of them belong to the GyDi group and one to the CoDi I group, which are both diatom-specific groups, while the CoDi L.ap group is related to the Copia group, which is also found in animals, plants and yeast.
When genes were grouped according to their expression profile in the different conditions, four out of the five TE-related transcripts that were selected for further analysis clustered with genes related to antioxidant activity. Antioxidant responses in plants are initiated after the accumulation of reactive oxygen species (ROS), such as hydrogen peroxide (H 2 O 2 ) and hydroxyl free radical (·OH) [83,84]. The generation of numerous ROS can be linked to low temperature, when photosynthetic enzymes may be degraded and photo-damage may occur, leading to reduced photosynthetic activity and hence, the accumulation of excess energy [85]. Similarly, the overexpression of MAI at low temperature might mean that the mobilization of this gene in L. aporus contributes to genetic plasticity and finally adaptation to physiological stress in the same way that TEs do. Indeed, in bacteria, MAI undergoes frequent rearrangements, or else transposition events, under physiological stress conditions including prolonged storage at 4 • C or exposure to oxidative stress [81].
From the between-strains transcriptomic analysis, TEs again held a central role, with their expression showing strain-specificity as shown for fungi and plants [86][87][88]. The absence of specific TEs from the genomes of a couple of the strains (1189B3 and 1189A3), as well as the markedly different response to low, and for some genes, high, temperature of one of the three strains, B651, point to a high intraspecific variability. TE copy number variations (CNV) could also contribute to differences in expression patterns among strains, an issue that would be worth further investigation as CNV, if present, could be related to the strain response to cold. The clusters representation, among other things, allows one to appreciate the noise caused by B651. It is tempting to consider these differences as a result of adaptation: having been isolated from a summer population, strain B651 could be expected to perform worse at low temperature, with a lower expression level of genes involved in cold/heat response than the other two strains 1A1 and 3A6, belonging to winter populations. The clustering of B651 separately from 1A1 and 3A6 based on pathways including amino acid metabolism, glycolysis and gluconeogenesis, energy metabolism, cell growth and death and protein processing in endoplasmic reticulum could imply temperature adaptation changes in the baseline expression of key genes and pathways to maintain metabolic homeostasis, as highlighted in another centric diatom, Chaetoceros spp. [17]. The strains 1A1 and 3A6 could hence belong to a cold adapted population, though different from the one detected in the worldwide HTS study of the species [51].
Yet another hypothesis is that strain B651 was cultivated for four years longer than the other two strains, during which constant temperature conditions could have caused a shift of baseline gene expression and the downregulation or the loss of specific cold/heat response proteins. Indeed, the SNP analysis pointed at another peculiarity of the old strain, with an important loss of heterozygosity compared to the more recent ones, but also to the same strain back in 2011. This is not the first time that loss of heterozygosity is observed in diatom strains kept in prolonged cultivation [89]. The relevance of cultivation time would be supported by results of growth rate experiments performed on the other two strains isolated in the same year as B651 [51], which also showed worse performance under extreme temperatures. The effect of in-culture evolution could have been a more likely reason for their distinct behavior than the season of their occurrence, because these two strains were isolated in October and November and not in summer like B651, and were kept in the same conditions as B651 for long time. Stable temperature, light conditions and culture medium, along with the absence of any kind of competition or threat, for four years (more than 1000 generations) could have led to the divergence of B651 from its original 'natural' state and affected its ability to respond to temperature changes and stress. The loss of certain stress response mechanisms resulting in the very low expression of TEs and stress-related genes could be one of those effects. A strain of a pennate diatom of the genus Pseudo-nitzschia also displayed contrasting gene expression patterns under different nutrient conditions compared to earlier experiments [90], leading to the idea of in-culture progressive physiological modifications targeting functions that are energetically costly and confer no advantage in culture. In the L. aporus case, the SNP analysis points to a genetic modification following the physiological one, but we prefer to remain cautious about this assumption since genotypes defined only on RNA data can be a result of several confounding pre-and post-transcriptional alternative roads (monoallelic expression, allelic imbalance, DNA editing, RNA editing, etc.). Finally, in support of the in-culture evolution hypothesis, it should also be noted that after two years from isolation, the two strains 1A1 and 3A6 did not show any upregulation, whereas the freshly isolated strains 1188A1 and 1189A3 did show upregulation in a couple of their TEs, even after a short acclimation at low temperature.
This small yet existent difference indicates that the absence of stress response is not strain-related to B651 alone, but, when compared with more freshly isolated strains, the same observation can be made for all strains maintained in culture.
In addition to intraspecific diversity and possible in-culture evolution, the response to a stressful environment can also vary in relation with the duration of the exposure to stressful conditions. The contrasting expression patterns noticed in the most recently cultured strains that underwent a shorter period of acclimation indicate a possible role of the exposure time to the stressing factor in L. aporus: transcripts related to cold stress were not at all expressed (with 22 days acclimation), or, with a few exceptions, expressed at a very low level (with 26-43 days acclimation). Similar changes were seen in the cold water coralline algae Lithothamnion glaciale, where two phases were identified in a long term experiment under the stress of elevated CO 2 treatment: the "passive" phase during the first 3 months and the "active" phase by the end of 10 months when energy was allocated from cell growth to structural support, showing a clear adaptive plasticity response of the algae [91]. It therefore seems that the "passive" phase in L. aporus lasts about 10-20 days, after which these TEs are expressed. Nevertheless, the response to a short and long-term changing environment may be not only species-but also strain-specific [92], and the relative importance of strain-specific reaction versus time of exposure to stress is hard to disentangle in this study as the strains isolated more recently were also the ones that underwent the shorter acclimation.

Conclusions
Variability within species is a relevant issue especially when studying the potential for adaptation of phytoplankton, which instead is often extrapolated from single strain responses of representative species. In the present study, gene expression patterns of the warm water diatom L. aporus have revealed profound differences among strain responses, which were as important, if not of greater importance, than those among temperature conditions. In addition, time spent by the strains under stable conditions and in-culture evolution seem to play an important role in the differences observed.
TE-related sequences appear to play a role in the adaptation to cold conditions, but they may get silenced when the cells remain for a long period in the same environmental conditions. This result points at the possible role of TEs in the generation of the phenotypic plasticity that can lead to genetic diversity and ultimately to the success of diatoms under different and variable environmental conditions. More experiments are needed to demonstrate the actual importance of TEs in the response of L. aporus to stress, e.g., through the assessment of methylation, which is thought to be involved in heterochromatin formation and maintenance that controls TE mobility. The role of the TEs of interest in transcriptional regulation under cold stress should also be investigated, while experimental evolution approaches or single cell based strategies targeting specific TEs could provide evidence of de novo TE insertions over different time scales and of any changes in expression that can improve fitness.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/1/9/s1, Figure  S1: Acclimation scheme of L. aporus strains at the three different temperatures, Figure S2: Agarose gel of the gradient PCR, Figure S3: qPCR amplification plot and melting curves, Figure S4: NMDS analysis, Figure S5: Heatmap of key genes and pathways, Figure S6: qPCR validation results for the TEs, Figure S7: qPCR validation results for the heat stress-related transcripts, Figure S8: Amplification of the TE-related gene fragments from genomic DNA, Table S1: Transcriptome statistics,