An Oil Hyper-Accumulator Mutant Highlights Peroxisomal ATP Import as a Regulatory Step for Fatty Acid Metabolism in Aurantiochytrium limacinum

Thraustochytrids are marine protists that naturally accumulate triacylglycerol with long chains of polyunsaturated fatty acids, such as ω3-docosahexaenoic acid (DHA). They represent a sustainable response to the increasing demand for these “essential” fatty acids (FAs). Following an attempt to transform a strain of Aurantiochytrium limacinum, we serendipitously isolated a clone that did not incorporate any recombinant DNA but contained two to three times more DHA than the original strain. Metabolic analyses indicated a deficit in FA catabolism. However, whole transcriptome analysis did not show down-regulation of genes involved in FA catabolism. Genome sequencing revealed extensive DNA deletion in one allele encoding a putative peroxisomal adenylate transporter. Phylogenetic analyses and yeast complementation experiments confirmed the gene as a peroxisomal adenylate nucleotide transporter (AlANT1), homologous to yeast ScANT1 and plant peroxisomal adenylate nucleotide carrier AtPNC genes. In yeast and plants, a deletion of the peroxisomal adenylate transporter inhibits FA breakdown and induces FA accumulation, a phenotype similar to that described here. In response to this metabolic event, several compensatory mechanisms were observed. In particular, genes involved in FA biosynthesis were upregulated, also contributing to the high FA accumulation. These results support AlANT1 as a promising target for enhancing DHA production in Thraustochytrids.


Introduction
Aurantiochytrium limacinum is an obligate heterotrophic marine unicellular protist belonging to the Thraustochytrid family. Thraustochytrids are present almost everywhere in the oceans, from tropical to polar areas, and from the surface down to 2000 m below sea level [1][2][3][4]. Because they are obligate heterotrophs, Thraustochytrids are more abundant in habitats containing decaying biological material, such as superficial sediment layers, mangroves, or river effluents [3]. In the past decade, Thraustochytrids have attracted biotechnological interest because they naturally accumulate high levels of triacylglycerols (TAGs). TAGs represent a valuable source of fatty acids (FAs) for either human and animal health or green chemistry purposes [5,6]. The peculiarity of Thraustochytrids compared with many other microalgae is their strikingly high content of very long chain polyunsaturated fatty acids (VLCPUFA), mainly ω3-docosahexaenoic acid (DHA, 22:6) [7][8][9][10][11]. DHA is synthesized at extremely low levels in animals and is therefore considered as a 3 of 23 Erlen-Meyer flasks filled with 50 mL of rich (R) or poor (P) media [11] with a 100-rpm orbital shaking. P medium is an R medium containing 40 times less glucose and yeast extract. For all experiments, 6-day-old axenic cultures grown on R were transferred to fresh culture media at an initial cell concentration of 5 × 10 5 cells·mL −1 . Growth was monitored following the increase of DW. To estimate the role of β-oxidation in A. limacinum growth, the glucose carbon source in the R medium was replaced by 2 mM palmitic acid (C16:0). Stock solutions of palmitic acid were made in 100% TWEEN 80 at a concentration of 50 mM). Two-day-old cells in the exponential phase of growth were centrifuged at low speed and then transferred to glucose-free medium for 24 h to remove all glucose resulting from the subculture process. Cells were then transferred to new glucose-free medium containing 2 mM C16:0 at a final concentration of 5 × 10 5 cells·mL −1 . The growth was followed by measuring the DW increase over 3 days.

Lipid Analyses
Lipids were extracted according to Folch et al. [34]. FAs were converted into methyl esters (FAME), then analyzed by GC-MS/FID (Clarus 80, Perkin Elmer, Waltham, MA, USA) on a BPX70 (SGE) column, as previously described [11,35]. FAMEs were identified by comparison of their retention times with those of standards (Sigma, Saint-Louis, MO, USA) and by their mass fragmentation spectra. They were quantified using C21:0 as internal standard.

Genomic, Transcriptomic, and qRT-PCR Analyses
Nucleic acid extraction, sequencing, and bioinformatics methods are described in Morabito et al. [36] and in Appendix A. RNA samples from 3-day-old cells were reverse transcribed using the SuperScript IV VILO Mastermix with ezDNAse kit (ThermoFisher, Waltham, MA, USA) according to the manufacturer's instructions. The qPCR reactions were run in a final volume of 10 µL containing 20 ng of cDNA, 5 µL of Power SYBR ® Green PCR Master Mix (Applied Biosystems, ThermoFisher), and 600 nM of each primer. Reactions were performed in a CFX ConnectTM Real-Time System (BioRad, Hercules, CA, USA) with the following program: initial denaturation step at 95 • C for 10 min and 40 denaturationamplification-elongation cycles (95 • C, 10 s; 60 • C, 10 s; 72 • C, 30 s), followed by melting curve assessment (65 • C to 95 • C, with a 0.5 • C increment). All primer sequences are available in the Supplementary Data and Methods file. Two sets of primers were used to estimate the expression level of AlANT1 (e_gw1.9.603.1) at day 3, targeting two different regions of the gene. Transcript levels were normalized against the Cystein desulfurase NFS1 (estExt_fgenesh1_kg.C_30063) gene. NSF1 expression did not vary in the WT or in LAS strains in RNA-seq experiment. The differential expression analyses and statistics were performed using the Pair Wise Fixed Reallocation Randomisation Test method developed in the Relative Expression Software Tool REST © [37]. Biological triplicates and technical triplicates of each reaction were performed.
The mutant ∆ant1 was complemented with either ScANT1, g11073, or the truncated form of g11073. For the transformations, the DNA fragments were in vivo cloned by homology into a PstI-BamHI-digested YEplac195 backbone under the control of the promoter Cells 2021, 10, 2680 4 of 23 PMA1 and with the ADH1 terminator. A 6× His tag was cloned in frame with the genes of interest in 3 , upstream and in frame with the stop codon. First, the promoter PMA1, terminator ADH1, and coding sequences of ScANT1, g11073, and truncated g11073 were amplified by PCR. Primers are listed in the Supplementary data and methods file. Yeast cells were directly transformed using the lithium acetate method [38] in the presence of the digested plasmid and the purified PCR fragments, as fully described in the Supplementary data and methods file. Transformed cells were selected in CSM plates without URA. Positive clones were grown in CSM medium. Plasmids were extracted from yeast cells with the kit ZymoPrep D2004, (Zymo Research, Irvine, CA, USA), amplified in DH5α competent cells, and then sequenced for validation. Complemented yeast strains and WT were inoculated in biological triplicate at an initial OD of ca. 0.15 in 50 mL of fresh medium in 250-mL Pyrex ® Erlen-Meyer flasks. Cultures were incubated at 30 • C under agitation at 250 rpm.

Sample Preparation for Electronic Microscopy
Samples were prepared as previously described [39]. Briefly, cells were fixed in 0.1 M phosphate buffer (PB) (pH 7.4) containing 2.5% (v/v) glutaraldehyde for 2 h at room temperature and then stored overnight at 4 • C. Samples were then washed five times in 0.1 M PB (pH 7.4). Samples were fixed by a 1-h incubation on ice in 0.1 M PB (pH 7.4) containing 2% osmium and 1.5% ferricyanide potassium before they were washed five times with 0.1 M PB (pH 7.4). Samples were resuspended in 0.1 M PB (pH 7.4) containing 0.1% tannic acid and incubated for 30 min in the dark at room temperature. Samples were again washed five times with 0.1 M PB (pH 7.4), dehydrated in ascending sequences of ethanol, and infiltrated with ethanol/Epon resin mixture. Finally, the samples were embedded in Epon. Ultrathin sections (50-70 nm) were prepared with a diamond knife on a PowerTome ultramicrotome (RMC products, Tucson, AZ, USA) and collected on 200-µm nickel grids. Samples were visualized by scanning transmission electron microscopy (STEM) using a MERLIN microscope (Zeiss, Oberkochen, Germany) set up at 30 KV and 240 pA.

Phylogenetic Analyses
The nucleotide coding sequence of the g11073 locus was in silico translated and the obtained amino acid sequence was aligned in the data set by Arai et al. [40] The alignment was performed using a customized pipeline in NGphylogeny.fr [41] using MUSCLE v3.8.1551 software. The ambiguously aligned regions were curated using the Block Mapping and Gathering with Entropy (BMGE v1.12_1) software implemented in NGphylogeny.fr using default settings. MEGA X v10.0.5 software was fed with the aligned and curated data set. The best evolutionary model was evaluated and a Maximum Likelihood phylogenetic analysis was performed. In order to define the best evolutionary model, MEGA X was used to compare the 56 models implemented. The LG+G model was chosen (lowest Bayesian Information Criterion (BIC) score). The phylogeny was inferred by Maximum Likelihood with 2000 bootstrap pseudoreplicates. The tree with the highest log likelihood (−10,454.51) was retained. Initial trees for the heuristic search were obtained by applying the BioNJ method to a matrix of pairwise distances estimated using a JTT model. A discrete Gamma distribution was used to model evolutionary rate differences among sites (five categories (+G, parameter = 2.9679)).

Results
We attempted to transform Aurantiochytrium limacinum CCAP 4062/1 [11] to produce a zeocin-resistant strain. A commercial pUC19 vector was modified to introduce the ShBle gene under the control of the promoter and terminator sequences of an endogenous polyubiquitin gene. Following a biolistic transformation, potentially transformed clones were selected on agarose plates containing 200 µg mL −1 zeocin. One of the selected clones appeared later on as a false positive since the initial zeocin-resistant phenotype disappeared after a few subculturing steps. In addition, PCR screening revealed no fragments corresponding to the ShBle gene or any part of the vector plasmid. Whole genome sequencing confirmed the absence of any exogenous DNA. However, this clone showed a strong lipid accumulation phenotype, with a higher lipid concentration than the WT strain. This clone was termed LAS, for Lipid Accumulating Strain.

Growth and Fatty Acid Content of LAS and WT
The WT and LAS growth curves were similar during the first 2 days corresponding to the exponential phase of growth [11] (Supplementary Figure S1). Thereafter, the biomass (expressed as DW·mL −1 ) increased more in LAS than in WT, likely because the former accumulated significantly more lipids (TAGs) than the latter. Indeed, as shown in Figure 1A-D, scanning transmission electron microscopy (STEM) clearly showed that LAS contained more lipid droplets than WT. This was confirmed by FA analyses (Figure 2A), with LAS displaying 2-3 times more FAs than WT, either at the end of the exponential phase of growth (Day 2) or later in the stationary phase (Day 4). In addition, the FA profile in LAS revealed significant differences compared with WT ( Figure 2B); the proportion of C16:0 in LAS was higher than in WT, whereas the proportion of C22:6 was lower. Then, the ratio of FA chains synthesized by the FAS system (C14:0 + C15:0 + C16:0) to FA chains synthesized by the PUFA synthase (C22:5 + C22:6) was monitored throughout growth. a zeocin-resistant strain. A commercial pUC19 vector was modified to introduce the ShBl gene under the control of the promoter and terminator sequences of an endogenou polyubiquitin gene. Following a biolistic transformation, potentially transformed clone were selected on agarose plates containing 200 µg mL −1 zeocin. One of the selected clone appeared later on as a false positive since the initial zeocin-resistant phenotype disap peared after a few subculturing steps. In addition, PCR screening revealed no fragment corresponding to the ShBle gene or any part of the vector plasmid. Whole genome se quencing confirmed the absence of any exogenous DNA. However, this clone showed strong lipid accumulation phenotype, with a higher lipid concentration than the W strain. This clone was termed LAS, for Lipid Accumulating Strain.

Growth and Fatty Acid Content of LAS and WT
The WT and LAS growth curves were similar during the first 2 days correspondin to the exponential phase of growth [11] (Supplementary Figure S1). Thereafter, the bio mass (expressed as DW·mL −1 ) increased more in LAS than in WT, likely because the for mer accumulated significantly more lipids (TAGs) than the latter. Indeed, as shown i Figure 1A-D, scanning transmission electron microscopy (STEM) clearly showed tha LAS contained more lipid droplets than WT. This was confirmed by FA analyses (Figur 2A), with LAS displaying 2-3 times more FAs than WT, either at the end of the exponentia phase of growth (Day 2) or later in the stationary phase (Day 4). In addition, the FA profil in LAS revealed significant differences compared with WT ( Figure 2B); the proportion o C16:0 in LAS was higher than in WT, whereas the proportion of C22:6 was lower. Then the ratio of FA chains synthesized by the FAS system (C14:0 + C15:0 + C16:0) to FA chain synthesized by the PUFA synthase (C22:5 + C22:6) was monitored throughout growth.     It was previously observed [35] that cells inoculated in a rich (R) medium first consumed their TAGs to sustain their rapid growth, and then, upon nitrogen limitation, started to accumulate TAGs at the expense of glucose to build up new storages. As shown in Figure 3A, the total amount of FAs in WT decreased 3-4 times during the first 2 days. This was associated with a decrease of the short to long FA ratio, suggesting that shorter FAs, essentially C16:0, were more rapidly consumed than longer ones (C22:6). This ratio returned to the initial value with the accumulation of newly synthesized FAs. In LAS (Figure 3B), the total amount of FAs decreased less after 2 days of culture, suggesting a slower consumption of storage lipids. Concomitantly, the short to long FA ratio remained high throughout the experiment. In a poor (P) medium, nitrogen was limiting right at the beginning of the experiment and TAGs were immediately synthesized in WT until glucose was exhausted [35] ( Figure 3C). Then, after 2 days, TAGs were consumed to sustain the energy demand. Here again, the short to long FA ratio in WT was modified during the initial FA accumulation phase, suggesting that C16:0 was more rapidly synthesized than C22:6. It declined back to the initial value with the consumption of FAs. In LAS ( Figure  3D), the FA level also increased after 2 days but remained high even after 4 days. This suggests that FAs were poorly utilized to sustain cell basal activity, unlike WT. The short to long FA ratio increased after 2 days and stayed high thereafter. Taken as a whole, these results strongly suggest a slower catabolism of FAs in LAS compared with WT, a metabolic change that also affects the short to long FA ratio. It was previously observed [35] that cells inoculated in a rich (R) medium first consumed their TAGs to sustain their rapid growth, and then, upon nitrogen limitation, started to accumulate TAGs at the expense of glucose to build up new storages. As shown in Figure 3A, the total amount of FAs in WT decreased 3-4 times during the first 2 days. This was associated with a decrease of the short to long FA ratio, suggesting that shorter FAs, essentially C16:0, were more rapidly consumed than longer ones (C22:6). This ratio returned to the initial value with the accumulation of newly synthesized FAs. In LAS ( Figure 3B), the total amount of FAs decreased less after 2 days of culture, suggesting a slower consumption of storage lipids. Concomitantly, the short to long FA ratio remained high throughout the experiment. In a poor (P) medium, nitrogen was limiting right at the beginning of the experiment and TAGs were immediately synthesized in WT until glucose was exhausted [35] ( Figure 3C). Then, after 2 days, TAGs were consumed to sustain the energy demand. Here again, the short to long FA ratio in WT was modified during the initial FA accumulation phase, suggesting that C16:0 was more rapidly synthesized than C22:6. It declined back to the initial value with the consumption of FAs. In LAS ( Figure 3D), the FA level also increased after 2 days but remained high even after 4 days. This suggests that FAs were poorly utilized to sustain cell basal activity, unlike WT. The short to long FA ratio increased after 2 days and stayed high thereafter. Taken as a whole, these results strongly suggest a slower catabolism of FAs in LAS compared with WT, a metabolic change that also affects the short to long FA ratio.
To validate this hypothesis, C16:0 replaced glucose as carbon source in the growth medium ( Figure 4). LAS was not able to grow after 24 h, whereas WT did, although at a slower pace than in the presence of glucose (Supplementary Figure S1). The growth recorded during the first 24 h was probably due to minor sources of carbon present in the medium, such as the amino acids contained in the yeast extract. To validate this hypothesis, C16:0 replaced glucose as carbon source in the growth medium ( Figure 4). LAS was not able to grow after 24 h, whereas WT did, although at a slower pace than in the presence of glucose (Supplementary Figure S1). The growth recorded during the first 24 h was probably due to minor sources of carbon present in the medium, such as the amino acids contained in the yeast extract. Each value is the mean of three biological replicates, and error bars indicate the standard deviation. Graphs and statistics were performed with GraphPad Prism ® software.

Transcriptomic and Genomic Analyses of LAS
To get clues about the modifications affecting the lipid metabolism in LAS, transcriptomic (Illumina RNA-seq) and genomic (PacBio sequencing) analyses on LAS and WT were carried out. The transcriptomic and genomic sequencing data are accessible from  To validate this hypothesis, C16:0 replaced glucose as carbon source in the growth medium ( Figure 4). LAS was not able to grow after 24 h, whereas WT did, although at a slower pace than in the presence of glucose (Supplementary Figure S1). The growth recorded during the first 24 h was probably due to minor sources of carbon present in the medium, such as the amino acids contained in the yeast extract. Each value is the mean of three biological replicates, and error bars indicate the standard deviation. Graphs and statistics were performed with GraphPad Prism ® software.

Transcriptomic and Genomic Analyses of LAS
To get clues about the modifications affecting the lipid metabolism in LAS, transcriptomic (Illumina RNA-seq) and genomic (PacBio sequencing) analyses on LAS and WT were carried out. The transcriptomic and genomic sequencing data are accessible from Each value is the mean of three biological replicates, and error bars indicate the standard deviation. Graphs and statistics were performed with GraphPad Prism ® software.

Transcriptomic and Genomic Analyses of LAS
To get clues about the modifications affecting the lipid metabolism in LAS, transcriptomic (Illumina RNA-seq) and genomic (PacBio sequencing) analyses on LAS and WT were carried out. The transcriptomic and genomic sequencing data are accessible from the following references: Morabito et al., 2020 [36] and Appendix A. The complete list of differentially expressed (DE) genes involved in lipid metabolism is shown in the Supplementary Table S1. In Table 1 the DE genes involved in FA and glycerolipid metabolisms (LAS versus WT, Log 2 fold change > |1|) are shown. Few genes were significantly DE, most of them were upregulated. Among them, upregulation of genes involved in FA biosynthesis (PUFA synthase and FAS) could possibly explain the FA increase observed in LAS. However, none of them could explain the lower FA catabolism suggested by our experiments. Indeed, none of the genes involved in glycerolipid degradation (phospholipases, DAG and TAG lipases), transport of FAs into the peroxisomes (peroxisomal ABC carriers), esterification of FAs (acyl-CoA synthetases/ligases, including three putative acyl-CoA synthetases displaying the peroxisomal targeting signal-SKL), and β-oxidation were significantly downregulated (Table 1 and  Supplementary Table S1). On the contrary, some of them were upregulated, suggesting a complex metabolic response.
To elucidate what happened during the biolistic transformation experiment, we searched for potential structural genomic variations, i.e., for variants displaying insertions/deletions (InDels) > 40 bp. The LAS and WT PacBio reads were compared to identify background variants not due to true biological variation or to the noise of the sequencing technology. About 6400 variants were found in the two samples, but only 13 were specific to LAS, proving the effectiveness of this approach (Appendix A). The filtered variants were manually inspected to exclude additional false positives, eventually producing a final list of five variants (Supplementary Table S2). Three corresponded to unexpressed genes in WT as well as LAS. Of the remaining genes, the locus g1125 encoded for a protein containing a putative transmembrane PAN_1 domain and carried a 46-bp insertion in the promoter region. Proteins containing a PAN_1 domain are found in many organisms, including parasites such as Toxoplasma gondii or viruses. The PAN_1 domain has functional versatility fulfilling diverse biological roles by mediating protein-protein and protein-carbohydrate interactions. In human, PAN_1 domains are present in hepatocyte growth factors, in plasminogen and coagulation factors [42,43]. In Saccharomyces cerevisiae, PAN_1 is present in an actin-cytoskeleton-associated protein likely involved in protein-protein interactions essential for endocytosis [44]. To our knowledge, no protein containing a PAN_1 domain is involved in lipid metabolism. Surprisingly, a BLAST search against the Aurli1 assembly (the reference genome for Aurantiochytrium limacinum) and the NCBI database did not produce any match with g1125.
The locus g11073 encoded for a protein predicted to belong to the 'mitochondrial carrier' family, possibly a 'peroxisomal adenine nucleotide carrier' (jgi|Aurli1|76398|e_gw1.9.603.1). In the reference genome Aurli1, the coding sequence annotation was shorter than g11073, possibly missing the first methionine (Supplementary data and methods file). As shown in Figure 5A, the coverage of aligned PacBio reads displayed a gap in the coding region of g11073 in LAS, but not in WT. The gap corresponded to a 50% decrease in coverage, suggesting that only one allele was affected. Thus, the g11073 sequence in LAS appeared truncated, showing a heterozygous 843-bp deletion in the promoter/gene body region of one of the two alleles. About 400 bp were missing upstream of the coding sequence and 440 bp were missing within the 5 -end of the gene. A 2-fold downregulation of g11073 (Amp3 Log 2 FC = −0.84 ± 0.22, Amp4 Log 2 FC = −1.25 ± 0.18) in LAS vs. WT was recorded by qRT-PCR, suggesting that the truncated allele was not or only minimally expressed ( Figure 5B). If expressed, however, it can be estimated from these data that the size of the truncated protein would approximately be half of the native form ( Figure 5C).
The contribution of the peroxisomal ATP carrier to FA catabolism is reported for several organisms, including plants and yeast. In Saccharomyces cerevisiae, disruption of the peroxisomal adenylate carrier ANT1 (YPR128c) resulted in an impaired growth when medium chain FA laurate (C12:0) was provided as the only carbon source [45,46], and the same holds true for Candida boidinii [47]. In the oleaginous yeast Yarrowia lipolytica, a ∆Ylant1 mutant contained 20% more FAs on a dry weight basis than the wild type [48]. In plants, Arabidopsis lines repressing the two PNC genes displayed an impaired peroxisomal ATP import associated with a 10-fold increase of all FAs [49]. Because the peroxisomal adenylate carrier plays an important role in plant and yeast β-oxidation, we hypothesized that the deletion event recorded in LAS might be responsible for the observed lipid phenotype.  [42,43]. In Saccharomyces cerevisiae, PAN_1 is present in an actin-cytoskeleton-associated protein likely involved in protein-protein interactions essential for endocytosis [44]. To our knowledge, no protein containing a PAN_1 domain is involved in lipid metabolism. Surprisingly, a BLAST search against the Aurli1 assembly (the reference genome for Aurantiochytrium limacinum) and the NCBI database did not produce any match with g1125. The locus g11073 encoded for a protein predicted to belong to the 'mitochondrial carrier' family, possibly a 'peroxisomal adenine nucleotide carrier' (jgi|Aurli1|76398|e_gw1.9.603.1). In the reference genome Aurli1, the coding sequence annotation was shorter than g11073, possibly missing the first methionine (Supplementary data and methods file). As shown in Figure 5A, the coverage of aligned PacBio reads displayed a gap in the coding region of g11073 in LAS, but not in WT. The gap corresponded to a 50% decrease in coverage, suggesting that only one allele was affected. Thus, the g11073 sequence in LAS appeared truncated, showing a heterozygous 843-bp deletion in the promoter/gene body region of one of the two alleles. About 400 bp were missing upstream of the coding sequence and 440 bp were missing within the 5′-end of the gene. A 2-fold downregulation of g11073 (Amp3 Log2FC = −0.84 ± 0.22, Amp4 Log2FC = −1.25 ± 0.18) in LAS vs. WT was recorded by qRT-PCR, suggesting that the truncated allele was not or only minimally expressed ( Figure 5B). If expressed, however, it can be estimated from these data that the size of the truncated protein would approximately be half of the native form ( Figure 5C).

Functional Characterization of g11073
The g11073-translated product consisted of 316 amino acids with a theoretical molecular weight of~35 KD. It contains the conserved solute carrier repeat profile (Solcar Prosite accession: PS50920), shaded in gray in Supplementary Figure S2. The protein sequence was analyzed using DeepLoc2 [50] and it was predicted to be a membrane-bound (likelihood 0.98) peroxisomal (likelihood 0.89) protein. To infer a possible role of g11073 as AlANT1, the protein sequence and that of the closely related species Hondaea fermentalgiana [11] (strain FCC1311 locus 007482) were aligned with protein sequences in the data set published by Arai et al., 2008 [41] and a maximum likelihood (ML) phylogenetic analysis was performed ( Figure 6A). Both the putative AlANT1 and HfANT1 grouped with the PEROXISOMAL ADENINE NUCLEOTIDE CARRIER proteins (PNCs) from Arabidopsis thaliana (AtPNC1 and AtPNC2) and Glycine max (GmPNC), with ML bootstrap value of 59. The Saccharomyces cerevisiae ANT1 protein is found at a basal position of this group (ML 88).
Then, we aimed to confirm the physiological function of g11073. All our attempts to obtain genetically modified strains of A. limacinum CCAP 4062/1 failed, and, thus, a yeast complementation approach was used to validate the function of g11073. The yeast ∆ant1 strain devoid of peroxisomal adenylate carrier was complemented with a YEplac195 plasmid containing either the ANT1 gene, the complete g11073 gene, or the truncated one. The different strains displayed similar growth rates when grown in a glucose-containing medium (Supplementary Figure S3A). However, only ScANT1 and full-length g11073 proteins were detected by Western blot in protein extracts from the different complemented strains (Supplementary Figure S3B). No truncated protein was observed in any of the corresponding clones. This suggests that the truncated gene was not translated or the truncated protein was quickly degraded. As shown in Figure 6B, ∆ant1 mutant cannot grow in the presence of C12:0, in agreement with previous work [45], whereas complementation with the native ANT1 gene or the complete g11073 gene reversed the ∆ant1 growth phenotype. This confirmed a peroxisomal adenylate transporter role for g11073, which was subsequently named AlANT1. The short version of g11073 was unable to complement the ∆ant1 mutant, as expected. All positions with less than 95% site coverage were eliminated (partial deletion option). (B) Yeast Δant1 (in purple) was complemented either with its native ANT1 gene (in red) or with A. limacinum full sequence g11073 (in green) or with A. limacinum short sequence g11073 (in orange). Then, growth was measured in the presence of C12:0 as a carbon source. Neither Δant1 nor the complemented Δant1 with the truncated g11073 could grow in the presence of C12:0. The other complemented strains could grow in the presence of C12:0, although at a lower rate than Sc WT (in blue), demonstrating that g11073 is homologous to ANT1. Each value is the mean of three biological replicates, and error bars indicate the standard deviation. Graphs and statistics were performed with GraphPad Prism ® software.
Then, we aimed to confirm the physiological function of g11073. All our attempts to obtain genetically modified strains of A. limacinum CCAP 4062/1 failed, and, thus, a yeast complementation approach was used to validate the function of g11073. The yeast Δant1 strain devoid of peroxisomal adenylate carrier was complemented with a YEplac195 plasmid containing either the ANT1 gene, the complete g11073 gene, or the truncated one. The different strains displayed similar growth rates when grown in a glucose-containing me- All positions with less than 95% site coverage were eliminated (partial deletion option). (B) Yeast ∆ant1 (in purple) was complemented either with its native ANT1 gene (in red) or with A. limacinum full sequence g11073 (in green) or with A. limacinum short sequence g11073 (in orange). Then, growth was measured in the presence of C12:0 as a carbon source. Neither ∆ant1 nor the complemented ∆ant1 with the truncated g11073 could grow in the presence of C12:0. The other complemented strains could grow in the presence of C12:0, although at a lower rate than Sc WT (in blue), demonstrating that g11073 is homologous to ANT1. Each value is the mean of three biological replicates, and error bars indicate the standard deviation. Graphs and statistics were performed with GraphPad Prism ® software.

Discussion
After a biolistic treatment aimed to produce zeocin-resistant mutants, we unexpectedly selected a clone displaying a strong lipid phenotype. Metabolic analyses pointed out a significant lipid accumulation resulting from a deficit in FA catabolism. Surprisingly, the whole genome and transcriptome sequencing did not identify any recombinant DNA, indicating that the phenotype was not the result of genetic material transfer. Rather, it likely resulted from the biolistic method itself, the particle projections provoking some DNA damages and rearrangements in nuclei [32]. Expression analyses of genes directly involved in lipid metabolism could not corroborate our observations either. In particular, the genes involved in the mobilization, transport, and oxidation of FAs were not significantly downregulated in the mutant, as one would have expected if they were to be responsible for the lower FA catabolism. However, a genomic search for structural variants identified a sizeable deletion in one of the two alleles coding for a putative peroxisomal adenylate transporter. Phylogenetic analyses and yeast complementation experiments demonstrated that the affected gene was indeed a peroxisomal adenylate carrier.
An impaired ability to import ATP into peroxisomes upholds the observed phenotype. Indeed, the catabolism of FAs in peroxisomes involves several steps. First, FAs must be transferred from the cytosolic compartment to the peroxisomes. Two independent pathways are likely involved to transport FAs across peroxisomal membranes. The first pathway transports acyl-CoA chains through an ATP Binding Cassette (ABC) transporter, such as Pxa1p/Pxa2p in yeast or CTS in plants [51,52]. Although acyl-CoA chains may enter yeast peroxisomes directly as CoA esters, it is also possible that the CoA moiety is released during the transfer, either into the cytosol or into the peroxisomal lumen [48,51,53]. CoA release prevails in plants [49,52]. Once inside the peroxisomal lumen, the corresponding free fatty acids (FFAs) are activated as CoA-esters before entering the β-oxidation cycle. This reaction is catalyzed by a peroxisomal acyl-CoA synthetase (Faa2p in yeast). A second pathway involves an unidentified transporter able to import short to medium FFA chains [45,48,52]. Once in the peroxisome, the short to medium FFAs are activated as acyl-CoAs. Peroxisomal acyl-CoA synthetases catalyze ATP-dependent reactions and require the import of ATP from the cytosol. The peroxisomal adenylate carrier [PNC in Arabidopsis thaliana, ANT1 (YPR128c) in Saccharomyces cerevisiae] belongs to the Mitochondrial Carrier Family and catalyzes the import of ATP into peroxisomes in a strict counter exchange with AMP or ADP [46,49]. In Saccharomyces cerevisiae, which does not accumulate lipids, a mutant devoid of peroxisomal ATP carrier (∆Scant1) displayed an impaired growth when C12:0 was the unique source of carbon, whereas normal growth was observed with longer chains (C18:0) [45]. In the oleaginous yeast Yarrowia lipolytica, a ∆Ylant1 displayed a 20% increase of total FAs, whereas a mutant ∆Ylant1∆Ylpxa1∆Ylpxa2 accumulated twice as many FAs as the WT [48]. This indicates that both FFAs and acyl-CoA are likely to enter yeast peroxisomes. In Arabidopsis thaliana, suppressing the ATP import into the peroxisomes strongly impaired the breakdown of storage oil. Consequently, the total level of FAs increased 10-fold in the seedlings, and oil bodies accumulated. Mutant plants were defective in seedling growth and development, unless in the presence of sucrose [49]. This illustrates the preponderant role of the peroxisomal adenylate carrier in plant β-oxidation and suggests that mainly FFAs enter plant peroxisomes.
We did not succeed so far to stably transform A. limacinum strain CCAP 4062/1, thus reverse genetics' strategies could not be used to demonstrate that the lipid phenotype described here resulted solely from a peroxisomal deficiency in ATP import. However, a number of pieces of evidence pinpointed the peroxisomal adenylate carrier as a very likely candidate. First, one of the two alleles coding for AlANT1 was seriously damaged during the biolistic treatment. The mutant displayed a lower expression of AlANT1, suggesting that the truncated gene was not or only minimally expressed. In addition, if translated, the truncated protein would lack two out of the three transmembrane domains, and would, therefore, most likely be inactive. Noteworthy, the truncated gene expressed in S. cerevisiae did not produce any truncated protein (Supplementary Figure S3), suggesting that such a protein was either not translated or rapidly degraded. Second, the lipid phenotype in LAS was associated with a lower FA catabolism activity. This phenotype is similar to those described for plant pnc and yeast ant1 mutants. In response to this metabolic event, several compensatory mechanisms could be observed. In particular, genes involved in TAG degradation and acyl-CoA production in the cytosol (DAG lipase, acyl-CoA synthetase/ligase), genes for peroxisomal β-oxidation (acyl-CoA synthetase/ligase, the bifunctional enzyme Ehhadh), and those for mitochondrial β-oxidation (carnitine palmitoyltransferase, acyl-CoA dehydrogenase) were upregulated. More surprisingly, the genes involved in FA synthesis (acetyl-CoA carboxylase, FAS, PUFA synthase) were also upregulated. We have no clear explanation for this, but it is possible that the decrease in FA catabolism is perceived as a deficit in FA availability. The upregulation of lipid synthesis genes could contribute, together with the decrease in AlANT1 activity, to the large FA accumulation. These potential metabolic events are summarized in Figure 7.
cerevisiae did not produce any truncated protein (Supplementary Figure S3), suggesting that such a protein was either not translated or rapidly degraded. Second, the lipid phenotype in LAS was associated with a lower FA catabolism activity. This phenotype is similar to those described for plant pnc and yeast ant1 mutants. In response to this metabolic event, several compensatory mechanisms could be observed. In particular, genes involved in TAG degradation and acyl-CoA production in the cytosol (DAG lipase, acyl-CoA synthetase/ligase), genes for peroxisomal β-oxidation (acyl-CoA synthetase/ligase, the bifunctional enzyme Ehhadh), and those for mitochondrial β-oxidation (carnitine palmitoyltransferase, acyl-CoA dehydrogenase) were upregulated. More surprisingly, the genes involved in FA synthesis (acetyl-CoA carboxylase, FAS, PUFA synthase) were also upregulated. We have no clear explanation for this, but it is possible that the decrease in FA catabolism is perceived as a deficit in FA availability. The upregulation of lipid synthesis genes could contribute, together with the decrease in AlANT1 activity, to the large FA accumulation. These potential metabolic events are summarized in Figure 7.

FA degradation
Cpt Acad

FA synthesis
Acc PUFA synthase FAS --+ Figure 7. Scheme illustrating the cascade of events likely associated with a decrease of ATP import into peroxisomes. FAs produced from the degradation of TAGs enter peroxisomes through an ABC carrier or an uncharacterized free FA (FFA) transporter. Regarding the ABC carrier, it is not clear whether the CoA moiety is released in the cytosol or in the peroxisomal lumen, and if some acyl-CoAs can be transported as such. In peroxisomes, FFA are converted to acyl-CoAs, consuming ATP, before being oxidized through the β-oxidation cycle. The products of this oxidation cycle are shuttled to mitochondria for a complete oxidation to CO 2 and H 2 O. A deficiency in ATP import into peroxisomes slows down the β-oxidation activity, resulting in lower degradation of TAGs, which accumulate. As a consequence, genes involved in FA and glycerolipid degradation are upregulated, together with genes involved in FA biosynthesis (arrows). Acsl: acyl-CoA synthase/ligase; Acc: acetyl-CoA carboxylase; Cpt: carnitine palmitoyl transferase; Acad: acyl-CoA dehydrogenase; Ehhadh: peroxisomal bifunctional enzyme of the β-oxidation; FAS: fatty acid synthase; PUFA synthase: polyunsaturated fatty acid synthase.
Interestingly, the C16:0/C22:6 ratio fluctuated along the growth in both strains. In WT, C16:0 decreased faster than C22:6 during the rapid oil breakdown phase, whereas it increased faster than C22:6 in the oil accumulation phase. This could reflect a faster dynamics/turnover for C16:0 than for C22:6, with C16:0 being more rapidly mobilized/oxidized than 22:6, and more rapidly synthesized by the FAS system than C22:6 by the PUFA synthase complex. A higher proportion of C16:0 was also observed in LAS (35% of total FAs) compared with WT (20% of total FAs). In LAS, the genes encoding FAS and PUFA synthase were similarly upregulated (log 2 FC = 1.3 and 1.4-2, respectively), and it is difficult to estimate whether or not these regulations can support the observed change in the C16:0/C22:6 ratio. However, it is also possible that a sorting in the import of FAs into peroxisomes exists in A. limacinum, as it does in S. cerevisiae. If this is true, more C16:0 than C22:6 would enter peroxisomes as FFAs, and, thus, the β-oxidation of C16:0 in peroxisomes could be more ATP dependent than C22:6.
Biolistic transformation is a convenient method, well adapted to organisms having thick cell walls such as plants [54,55] or silica shells such as diatoms [30]. However, as already reported [32,33,56], it must be kept in mind that interpretation of a biolistic transformation needs caution. The results presented here appear relatively simple since no transgenic material was transferred into the genome of A. limacinum. In this particular case, the biolistic experiment can be seen as a random mutagenesis experiment, possibly enhanced by the action of the selection agent, the zeocin, which induces double-strand breaks [57]. In cases where the transfer of genetic material is effective, it may be difficult to determine, without sequencing the entire genome, what is due to transformation itself and what is due to collateral damages. The parallel study of independent, transformed lines is a pragmatic way to address this technical issue. Sequence breakage and reassembly are common events with biolistic transformation [56,58]. DNA repair involves mechanisms such as non-homologous end joining (NHEJ) or homology directed repair (HDR), eventually leading either to simple insertion with no other evidence of genome damage, chromothripsis-like rearrangements, or genomic deletions [32]. It is also possible that the specific nature of A. limacinum CCAP 4062/1 contributes to increase the probability to get genomic impairment events during biolistic treatments. Indeed, several cell types characterize the A. limacinum life cycle, including mononucleated and multinucleated cells [6,35]. The presence of cells with several nuclei might increase the complexity of events resulting from the transformation process and the bombardment procedure.

Conclusions
In conclusion, we selected by serendipity a mutant of A. limacinum presenting a large deletion in one of the two alleles coding for a peroxisomal adenylate carrier. This deletion resulted from biolistic damage. The large lipid increase associated with this event and the inheritability of this phenotype strongly support the hypothesis that the peroxisomal adenylate carrier is a very promising target to engineer the lipid content in Thraustochytrids.

. Genomic and Transcriptomic Approach
Here we report the genome sequencing, assembling and annotation of a transgenic line isolated by serendipity after biolistic genetic transformation with a zeocin expression plasmid of the parent Aurantiochytrium limacinum strain CCAP_4062/1. The strain turned out to be a false positive because it did not contain any trace of the plasmid bombarded on the cells. Nevertheless, the mutant showed a strong lipid accumulation phenotype (henceforth referred to as Lipid Accumulator Strain, LAS). In previous works, we demonstrated that the parent strain we used to produce LAS mutant belongs to Aurantiochytrium limacinum species [11] and that its genome [36] shows a high degree of collinearity and an Average Nucleotide Identity (ANI) of 98.89% with A. limacinum ATCC ® MYA1381 TM reference genome (https://phycocosm.jgi.doe.gov/Aurli1/Aurli1.info.html, accessed on 1 April 2020). Noteworthy, A. limacinum ATCC ® MYA1381 TM is the type species of the genus Aurantiochytrium (as strain SR21) [59].

Appendix A.2. Genome Assembly
The raw sequencing dataset included long reads produced with PacBio Sequel I (Table A1). About 676,000 polymerase reads with an N50 of 15,440 bp and a total of about 6 Gbp were produced. Considering that the estimated genome size of A. limacinum is about 60 Mbp, the PacBio data corresponded to an estimated 97× coverage. PacBio subreads were used to generate a complete LAS genome assembly. For this purpose, the reads were first corrected applying three iterations of the self-correction algorithm MECAT [60] with three k-mers length. A draft assembly was then generated with the software Canu [61]. In order to correct residual sequencing errors, RNA-seq data from three LAS samples were used in conjunction with the Pilon algorithm [62] to perform four rounds of corrections. A total of 33,351 corrections were applied, 94% of which were performed during the first iteration.
The final LAS assembly included 147 scaffolds with a total size of 59.28 Mbp (Table A2). The assembly showed high contiguity having an N50 of 1.2 Mbp and an L50 of 16 scaffolds. About 98.69% of the PacBio corrected reads could be mapped back to the assembly. The BUSCO [63] analysis showed that 79% and 91% of Eukaryotic and Stramenopile conserved single copy genes were detected in the assembly ( Figure A1). the software Canu [61]. In order to correct residual sequencing errors, RNA-seq data from three LAS samples were used in conjunction with the Pilon algorithm [62] to perform four rounds of corrections. A total of 33,351 corrections were applied, 94% of which were performed during the first iteration. The final LAS assembly included 147 scaffolds with a total size of 59.28 Mbp (Table  A2). The assembly showed high contiguity having an N50 of 1.2 Mbp and an L50 of 16 scaffolds. About 98.69% of the PacBio corrected reads could be mapped back to the assembly. The BUSCO [63] analysis showed that 79% and 91% of Eukaryotic and Stramenopile conserved single copy genes were detected in the assembly ( Figure A1).   On average 94.5% of the reads could be mapped on the reference genome of A. limacinum ATCC ® MYA1381 TM with STAR aligner [64] (Table A4). Most of the reads mapped uniquely on the genome. Gene expression quantification and the expression normalization were performed in order to carry out a differential expression analysis between LAS and CCAP_4062/1. First, lowly expressed genes were filtered out using the HTSFilter algorithm ( Figure A2) [65] which produced a dataset of about 10,000 genes to be further analysed. A Principal Component Analysis (PCA) using the normalised gene expression values as input was carried out in order to estimate the quality of the experiment in terms of reproducibility of the replicates ( Figure A3). The WT replicates group together and well differentiated along the principal component 1 (amount of variance explained 99.39%) Figure A2. Results of the low expression filtering step performed with HTSFilter. The algorithm calculated a Global Jaccard index (blue line) of similarity (on the Y-axis) between the samples in function of different minimum normalized read counts (on the X-axis, red crosses indicate the minimum and the maximum values). The graphic shows that for s = 31.274 the replicates have the highest similarity; thus, this value was used as a threshold.
A Principal Component Analysis (PCA) using the normalised gene expression values as input was carried out in order to estimate the quality of the experiment in terms of reproducibility of the replicates ( Figure A3). The WT replicates group together and well differentiated along the principal component 1 (amount of variance explained 99.39%) from the group of LAS replicates. Figure A2. Results of the low expression filtering step performed with HTSFilter. The algorithm calculated a Global Jaccard index (blue line) of similarity (on the Y-axis) between the samples in function of different minimum normalized read counts (on the X-axis, red crosses indicate the minimum and the maximum values). The graphic shows that for s = 31.274 the replicates have the highest similarity; thus, this value was used as a threshold.
A Principal Component Analysis (PCA) using the normalised gene expression values as input was carried out in order to estimate the quality of the experiment in terms of reproducibility of the replicates ( Figure A3). The WT replicates group together and well differentiated along the principal component 1 (amount of variance explained 99.39%) from the group of LAS replicates.  About 7300 genes were found to be significantly differentially expressed between LAS and CCAP_4062/1, 1974 of which with an absolute logarithmic fold change higher than 1.

Appendix A.4. Structural Variant Calling
In order to identify all the possible structural variations between LAS and its parent CCAP_4062/1, a structural variant calling was performed. As a first step the PacBio reads of LAS and CCAP_4062/1 were mapped against the reference genome of CCAP_4062/1 [36]. The statistics of the mapping for the two samples are reported in Table A5. The software SVIM [66] was used to perform the structural variation (SV) analysis. Small variants, such as SNPs and small InDels (up to 40 bp) were not considered with this approach. The CCAP_4062/1 reads were also analysed to identify background variants that are not due to true biological variation but to the noise of the sequencing technology. By using this approach, the LAS variants shared with CCAP_4062/1 were filtered out as false