The Terrestrial Carnivorous Plant Utricularia reniformis Sheds Light on Environmental and Life-Form Genome Plasticity

Utricularia belongs to Lentibulariaceae, a widespread family of carnivorous plants that possess ultra-small and highly dynamic nuclear genomes. It has been shown that the Lentibulariaceae genomes have been shaped by transposable elements expansion and loss, and multiple rounds of whole-genome duplications (WGD), making the family a platform for evolutionary and comparative genomics studies. To explore the evolution of Utricularia, we estimated the chromosome number and genome size, as well as sequenced the terrestrial bladderwort Utricularia reniformis (2n = 40, 1C = 317.1-Mpb). Here, we report a high quality 304 Mb draft genome, with a scaffold NG50 of 466-Kb, a BUSCO completeness of 87.8%, and 42,582 predicted genes. Compared to the smaller and aquatic U. gibba genome (101 Mb) that has a 32% repetitive sequence, the U. reniformis genome is highly repetitive (56%). The structural differences between the two genomes are the result of distinct fractionation and rearrangements after WGD, and massive proliferation of LTR-retrotransposons. Moreover, GO enrichment analyses suggest an ongoing gene birth–death–innovation process occurring among the tandem duplicated genes, shaping the evolution of carnivory-associated functions. We also identified unique patterns of developmentally related genes that support the terrestrial life-form and body plan of U. reniformis. Collectively, our results provided additional insights into the evolution of the plastic and specialized Lentibulariaceae genomes.


Introduction
The carnivorous plants from the Lentibulariaceae Rich. family exhibit different life-forms according to their habitats (terrestrial, aquatic, lithophytes, epiphytes, and rheophytes), showing an enormous diversity and worldwide distribution [1]. The family is composed of three genera:  We generated 350 gigabytes of genomic and transcriptomic data for U. reniformis ( Table 1). The k-mer frequency analysis demonstrated that U. reniformis exhibits a high heterozygosity rate of 1.8% ( Figure 2 and Figure S1). In addition, the k-mer frequency plot was consistent with a tetraploid genome with the haploid at 149 Mb. Moreover, the predicted genome unique content and repeated content lengths are ~88 Mb (29%) and ~220 Mb (71%), respectively.  We generated 350 gigabytes of genomic and transcriptomic data for U. reniformis ( Table 1). The k-mer frequency analysis demonstrated that U. reniformis exhibits a high heterozygosity rate of 1.8% (Figure 2 and Figure S1). In addition, the k-mer frequency plot was consistent with a tetraploid genome with the haploid at 149 Mb. Moreover, the predicted genome unique content and repeated content lengths are~88 Mb (29%) and~220 Mb (71%), respectively. The assembled draft genome spanned 304 Mb (96% of the flow cytometry-estimated genome size), with an average coverage of 145× (based on the number and length of aligned reads, divided by the assembled genome size), had an NG50 of 466-Kb and a BUSCO completeness of 87.8%, showing a duplication rate of~26%, which might correspond with a partial genome duplication (Table S2) and 42,582 gene model predictions. The estimated number of error-free bases was 93% (283 of 304 Mb), which also indicated that multiple repeated genomic regions (e.g., non-autonomous LTR elements, centromeres, and telomeres) remained partial or non-assembled. However, this finding also supported that a large amount of the heterozygous content present in U. reniformis genome was represented in the assembled draft genome (Table 2 and Figure 2). . Genome assembly analysis: Read k-mer frequency versus Utricularia reniformis assembly copy number stacked histograms generated by the KAT comp tool. Read content in black is absent from the assembly, red occurs once, purple twice, and green occurs three times, indicating a scenario of both heterozygous and polyploidy genome. The heterozygous content is represented by the first peak at x = 40 and the homozygous content in the second peak at x = 80. The hidden content (the black peak) represents the heterozygous content that is lost when the assembly bubbles are collapsed. The genome assembly contains most (but not all) of the heterozygous content and introduces duplications on both heterozygous and homozygous content.
The assembled draft genome spanned 304 Mb (96% of the flow cytometry-estimated genome size), with an average coverage of 145× (based on the number and length of aligned reads, divided by the assembled genome size), had an NG50 of 466-Kb and a BUSCO completeness of 87.8%, showing a duplication rate of ~26%, which might correspond with a partial genome duplication (Table S2) and 42,582 gene model predictions. The estimated number of error-free bases was 93% (283 of 304 Mb), which also indicated that multiple repeated genomic regions (e.g., non-autonomous LTR elements, centromeres, and telomeres) remained partial or non-assembled. However, this finding also supported that a large amount of the heterozygous content present in U. reniformis genome was represented in the assembled draft genome (Table 2 and Figure 2).
Moreover, several cpDNA-and mtDNA-derived regions were also identified, thus, providing support for the lateral transfer between the organelles and the nuclear genome, as previously reported [26,27]. We also detected simple sequence repeat (SSRs) markers already developed for U. reniformis [25], in our draft genome. Among all markers, for two SSR loci, we found corresponding regions in U. gibba and U. reniformis (Table S3). In both cases, we recovered two scaffolds in U. reniformis and only one syntenic genomic region in U. gibba ( Figure S2). For all other markers, we were not able to detect loci shared in both genomes. In addition to the Arabidopsis-type telomeric repeats identified at the scaffolds ends, we also found Chlamydomonas-and Genlisea-type telomeric repeats in their vicinity, as previously observed for U. gibba [5]. Furthermore, U. reniformis exhibited a considerable amount of TE-derived sequences, when compared to U. gibba (56% vs. 32%).

Figure 2.
Genome assembly analysis: Read k-mer frequency versus Utricularia reniformis assembly copy number stacked histograms generated by the KAT comp tool. Read content in black is absent from the assembly, red occurs once, purple twice, and green occurs three times, indicating a scenario of both heterozygous and polyploidy genome. The heterozygous content is represented by the first peak at x = 40 and the homozygous content in the second peak at x = 80. The hidden content (the black peak) represents the heterozygous content that is lost when the assembly bubbles are collapsed. The genome assembly contains most (but not all) of the heterozygous content and introduces duplications on both heterozygous and homozygous content. Moreover, several cpDNA-and mtDNA-derived regions were also identified, thus, providing support for the lateral transfer between the organelles and the nuclear genome, as previously reported [26,27]. We also detected simple sequence repeat (SSRs) markers already developed for U. reniformis [25], in our draft genome. Among all markers, for two SSR loci, we found corresponding regions in U. gibba and U. reniformis (Table S3). In both cases, we recovered two scaffolds in U. reniformis and only one syntenic genomic region in U. gibba ( Figure S2). For all other markers, we were not able to detect loci shared in both genomes. In addition to the Arabidopsis-type telomeric repeats identified at the scaffolds ends, we also found Chlamydomonasand Genlisea-type telomeric repeats in their vicinity, as previously observed for U. gibba [5]. Furthermore, U. reniformis exhibited a considerable amount of TE-derived sequences, when compared to U. gibba (56% vs. 32%).

Structural Comparative Analysis
In order to infer U. reniformis genome structure and WGD history, we compared the draft genome to U. gibba and other angiosperms. In spite of the low assembly contiguity, we were able to detect at least one WGD round since the core eudicot γ whole-genome triplication (WGT), according to the Ks values for estimates of individual genome duplication events ( Figure S3A). Moreover, blockwise relationships between U. reniformis and U. gibba were about 4:2 ( Figure S3B), supporting that U. reniformis is a tetraploid, as compared to U. gibba. This finding also suggested that the two Utricularia genomes responded distinctly after the WGD events.
The U. reniformis and U. gibba structural genome differences become more obvious by comparative pairwise alignment. We observed 77% (SD:4%) average nucleotide identity (ANI) between the U. reniformis and U. gibba assemblies by reciprocal best blast (RBB). Additionally, we observed low global collinearity, which indicated that the U. reniformis genome presents a mosaic structure in comparison to U. gibba ( Figure 3A,B). Approximately 114 Mb of the U. reniformis genome exhibited partial matches (average length of 84-Kb) to U. gibba (Table S4). The unaligned blocks corresponded to the segments exclusive to U. reniformis, which were mostly related to the TE-related regions ( Figure 3B). Additionally, we mapped less than 37% of the U. reniformis trimmed paired-end reads consistently with the right distance and orientation onto the U. gibba assembly, which also supported the considerable structural differences between these species (Table S5). As a result, it was not possible to employ the reference-assisted assembly procedure to anchor the U. reniformis scaffolds into the U. gibba fully-assembled chromosomes to reconstruct potential pseudo-molecules.
Moreover, in comparison to other angiosperms, U. reniformis also showed a lower percentage of macrosyntenic blocks. For instance, U. reniformis retained at least 47 Mb (average length of 76-Kb and ANI of 82%), 35 Mb (average length of 46-Kb and ANI of 76%), and 53 Mb (average range of 63-Kbp and ANI of 77%) of shared blocks to A. thaliana, V. vinifera, and S. lycopersicum. In contrast, its closest relative, U. gibba, retained 71 Mb (average length of 44-Kb and ANI of 80%), 57 Mb (average length of 26-Kb and ANI of 75%), and 58 Mb (average length of 35-Kb and ANI of 77%) blocks, respectively (Table S4). We further evaluated the microsyntenic level and verified that U. reniformis and U. gibba present distinct patterns of retention and alternative deletion of duplicated genes between the polyploid subgenomes, which is consistent with the distinct post-speciation evolutionary paths ( Figure 4A-C). This feature is also observed among the Utricularia species and A. thaliana, yet with fewer microsynteny blocks ( Figure 4D). When both Utricularia genomes are compared to the phylogenetically distant eudicotyledons, such as V. vinifera and S. lycopersicum, a significant Additionally, we mapped less than 37% of the U. reniformis trimmed paired-end reads consistently with the right distance and orientation onto the U. gibba assembly, which also supported the considerable structural differences between these species (Table S5). As a result, it was not possible to employ the reference-assisted assembly procedure to anchor the U. reniformis scaffolds into the U. gibba fully-assembled chromosomes to reconstruct potential pseudo-molecules.
Moreover, in comparison to other angiosperms, U. reniformis also showed a lower percentage of macrosyntenic blocks. For instance, U. reniformis retained at least 47 Mb (average length of 76-Kb and ANI of 82%), 35 Mb (average length of 46-Kb and ANI of 76%), and 53 Mb (average range of 63-Kbp and ANI of 77%) of shared blocks to A. thaliana, V. vinifera, and S. lycopersicum. In contrast, its closest relative, U. gibba, retained 71 Mb (average length of 44-Kb and ANI of 80%), 57 Mb (average length of 26-Kb and ANI of 75%), and 58 Mb (average length of 35-Kb and ANI of 77%) blocks, respectively (Table S4). We further evaluated the microsyntenic level and verified that U. reniformis and U. gibba present distinct patterns of retention and alternative deletion of duplicated genes between the polyploid subgenomes, which is consistent with the distinct post-speciation evolutionary paths ( Figure 4A-C). This feature is also observed among the Utricularia species and A. thaliana, yet with fewer microsynteny blocks ( Figure 4D). When both Utricularia genomes are compared to the phylogenetically distant eudicotyledons, such as V. vinifera and S. lycopersicum, a significant fractionation is prominent ( Figure 4E,F).  However, in general, the two Utricularia species were far more similar in gene order to each other than either were to A. thaliana, V. vinifera, and S. lycopersicum. The phylogenetic and ANI analyses based on 336 core eukaryotic genes corroborated our microsynteny analysis, supporting the same common ancestor for U. reniformis and U. gibba, and also for the Utricularia and Genlisea clades ( Figure 5). However, in general, the two Utricularia species were far more similar in gene order to each other than either were to A. thaliana, V. vinifera, and S. lycopersicum. The phylogenetic and ANI analyses based on 336 core eukaryotic genes corroborated our microsynteny analysis, supporting the same common ancestor for U. reniformis and U. gibba, and also for the Utricularia and Genlisea clades ( Figure 5). Arabidopsis thaliana as an out-group. Numbers above and below the lines indicate the maximum likelihood bootstrap values and the Bayesian posterior probabilities for each clade. The G. pygmaea and G. repens genomes used in this analysis are yet to be published. However, the concatenated core nucleotide gene fasta sequence used to construct this dataset is freely available at http://doi.org/10.5281/zenodo.3268745.

Utricularia reniformis Comparative Annotation
In general, U. reniformis and U. gibba show different arrays of gene duplication that might directly reflect support that they are retaining genes in a biased manner. This was exemplified by the more significant number of tandem, proximal, and dispersed gene pairs identified in U. reniformis, when compared to U. gibba (Table 3). In contrast, U. gibba displayed more singletons and segmental duplicates. However, the more significant number of segmental copies identified in U. gibba might be related to the assembled genome using long-reads, which permitted a better identification of large syntenic blocks, in comparison to the U. reniformis genome based on short-reads. Moreover, the considerable number of singletons in U. gibba might also represent retained duplicates that have reverted to a single copy after the WGD, which supported the genome size reduction observed in this species.
It is noteworthy that the gene space in U. reniformis represents ~26% of the genome, which also demonstrated that the intergenic regions are replete with TEs-related sequences constituting more than half of the genome size. Conversely, TEs covered ~32% of U. gibba genome, while the gene space represented ~51%. Overall similarity heatmap and maximum likelihood and Bayesian inference phylogenetic tree of 336 core-eukaryotic genes shared among seven carnivorous plant genomes (Utricularia reniformis, U. gibba, Genlisea aurea, G. nigrocaulis, G. pygmaea, G. repens, and G. hispidula) and Arabidopsis thaliana as an out-group. Numbers above and below the lines indicate the maximum likelihood bootstrap values and the Bayesian posterior probabilities for each clade. The G. pygmaea and G. repens genomes used in this analysis are yet to be published. However, the concatenated core nucleotide gene fasta sequence used to construct this dataset is freely available at http://doi.org/10.5281/zenodo.3268745.

Utricularia reniformis Comparative Annotation
In general, U. reniformis and U. gibba show different arrays of gene duplication that might directly reflect support that they are retaining genes in a biased manner. This was exemplified by the more significant number of tandem, proximal, and dispersed gene pairs identified in U. reniformis, when compared to U. gibba (Table 3). In contrast, U. gibba displayed more singletons and segmental duplicates. However, the more significant number of segmental copies identified in U. gibba might be related to the assembled genome using long-reads, which permitted a better identification of large syntenic blocks, in comparison to the U. reniformis genome based on short-reads. Moreover, the considerable number of singletons in U. gibba might also represent retained duplicates that have reverted to a single copy after the WGD, which supported the genome size reduction observed in this species.
It is noteworthy that the gene space in U. reniformis represents~26% of the genome, which also demonstrated that the intergenic regions are replete with TEs-related sequences constituting more than half of the genome size. Conversely, TEs covered~32% of U. gibba genome, while the gene space represented~51%.

Utricularia reniformis Shows a Massive Expansion of LTR from the Gypsy Superfamily
Utricularia reniformis presents a prominent LTR-retrotransposons expansion, representing up to 145 Mb ( Figure 6 and Table S6). In general, a separate array of expansion and contraction of distinct evolutionary lineages from the Copia and Gypsy superfamilies stand as the main differences between U. reniformis and U. gibba. The most noticeable increase of the Gypsy superfamily was observed within the Ogre evolutionary lineage, accounting up to 72 Mb of U. reniformis genome. It is noteworthy that the LTR-LARDs, Angela, CRM, are more prevalent in U. gibba, including the Athila evolutionary lineage, which was exclusively found in U. gibba. In contrast, from the Copia superfamily, the evolutionary lineages Alesia, Bianca, Ikeros, and SIRE were found solely in U. reniformis. As observed in other plant genomes, the Class II elements were less prominent in both species, and except for the Helitron super-family which we were not able to identify in U. gibba, mostly known super-families and evolutionary lineages commonly found in plant genomes were present.  Table S6. The y-axis on the left indicated the total length occupied in the megabases of each evolutionary lineage, whereas the y-axis on the right meant the % of the corresponding genome. Blue and Grey asterisk correspond to the absent evolutionary lineages in U. gibba and U. reniformis, respectively.
It was previously observed in many angiosperm genomes that secondary metabolic function and transcriptional function are enriched among tandems and segmental duplicates, respectively [5,[28][29][30]. Our results corroborate this observation for secondary metabolic function in the tandem repeats of both Utricularia, and the transcriptional functions only in U. gibba segmental duplicates. Interestingly, and opposite to what has been previously observed, the transcriptional features were augmented in U. reniformis dispersed duplicates genes. Since the dispersed copies could arise from  Table S6. The y-axis on the left indicated the total length occupied in the megabases of each evolutionary lineage, whereas the y-axis on the right meant the % of the corresponding genome. Blue and Grey asterisk correspond to the absent evolutionary lineages in U. gibba and U. reniformis, respectively.
It was previously observed in many angiosperm genomes that secondary metabolic function and transcriptional function are enriched among tandems and segmental duplicates, respectively [5,[28][29][30]. Our results corroborate this observation for secondary metabolic function in the tandem repeats of both Utricularia, and the transcriptional functions only in U. gibba segmental duplicates. Interestingly, and opposite to what has been previously observed, the transcriptional features were augmented in U. reniformis dispersed duplicates genes. Since the dispersed copies could arise from TEs movements, and the U. reniformis genome had a high level of LTR-elements, this result might suggest a central role of LTR transposition in molding these genomic differences.
We also checked the potential role of tandem and proximal duplicates with the evolution of novel functions, such as those related to carnivory and the ROS metabolism, which revealed an enrichment of many GO terms and KEGG enzymes in both species (Figure 7). For instance, catalytic activity (GO:0003824), biotic stimulus (GO:0009607), transferase activity (GO:0016740), and the cellular components-lysosome (GO:0005764), lytic vacuole (GO:0000323), and the cell wall (GO:0005618) were overrepresented. The enrichment of the cell wall (GO:0005618) term, could be related to the trap movement during the prey capture, which demands profound and dynamic cell-wall changes [5], and the lysosome and lytic vacuole enrichment might be related to the digestion of preys. However, we also identified distinct GO and KEGG enrichment arrangement among species, strongly suggesting a functional divergence, subfunctionalization, or neofunctionalization among the tandem and proximal duplicates (Figure 7). Comparatively, U. reniformis showed an enrichment of response to chemicals (GO:0042221), a response to external stimulus (GO:0009605), and signaling (GO:0023052), whereas, as observed in the dispersed duplicates, the lipid metabolic process (GO:0006629) was also augmented in U. gibba tandem duplicates. Regarding the well-known carnivory [4,5] and ROS enzymatic functions, U. gibba presented chitinase (EC:3.2.1.14), cysteine protease (EC:3.4.22), and peroxidases (EC:1.11.1.7) enriched in tandem duplicates regions, as previously described [5], whereas U. reniformis exhibited glutathione transferases (EC:3.4.16), carboxylesterase (EC:2.5.1.18), acylglycerol lipase (EC:3.1.1.23), pectinesterase (EC:3.1.1.11), and enzymes that acted on the paired donors, with an incorporation or reduction of molecular oxygen (EC:1.14.13). Therefore, these findings indicated that both species presented distinct enrichment patterns of GO terms and KEGG enzymes, supporting an ongoing gene birth-death-innovation process that included metabolism and carnivory-associated functions, among the tandem and proximal duplicated genes.
TEs movements, and the U. reniformis genome had a high level of LTR-elements, this result might suggest a central role of LTR transposition in molding these genomic differences.
We also checked the potential role of tandem and proximal duplicates with the evolution of novel functions, such as those related to carnivory and the ROS metabolism, which revealed an enrichment of many GO terms and KEGG enzymes in both species (Figure 7). For instance, catalytic activity (GO:0003824), biotic stimulus (GO:0009607), transferase activity (GO:0016740), and the cellular components-lysosome (GO:0005764), lytic vacuole (GO:0000323), and the cell wall (GO:0005618) were overrepresented. The enrichment of the cell wall (GO:0005618) term, could be related to the trap movement during the prey capture, which demands profound and dynamic cell-wall changes [5], and the lysosome and lytic vacuole enrichment might be related to the digestion of preys. However, we also identified distinct GO and KEGG enrichment arrangement among species, strongly suggesting a functional divergence, subfunctionalization, or neofunctionalization among the tandem and proximal duplicates (Figure 7). Comparatively, U. reniformis showed an enrichment of response to chemicals (GO:0042221), a response to external stimulus (GO:0009605), and signaling (GO:0023052), whereas, as observed in the dispersed duplicates, the lipid metabolic process (GO:0006629) was also augmented in U. gibba tandem duplicates. Regarding the well-known carnivory [4,5] and ROS enzymatic functions, U. gibba presented chitinase (EC:3.2.1.14), cysteine protease (EC:3.4.22), and peroxidases (EC:1.11.1.7) enriched in tandem duplicates regions, as previously described [5], whereas U. reniformis exhibited glutathione transferases (EC:3.4.16), carboxylesterase (EC:2.5.1.18), acylglycerol lipase (EC:3.1.1.23), pectinesterase (EC:3.1.1.11), and enzymes that acted on the paired donors, with an incorporation or reduction of molecular oxygen (EC:1.14.13). Therefore, these findings indicated that both species presented distinct enrichment patterns of GO terms and KEGG enzymes, supporting an ongoing gene birth-death-innovation process that included metabolism and carnivory-associated functions, among the tandem and proximal duplicated genes.
Interestingly, among the segmental duplicates, similar patterns of functions related to the primary metabolism, and the plant developmental process were enriched in both species, for example the amide biosynthetic process (GO:0043604), anatomical structure development (GO:0048856), reproduction (GO:0000003), nitrogen compound metabolic process (GO:0006807), peptide metabolic process (GO:0006518), and the primary metabolic process (GO:0044238).  Interestingly, among the segmental duplicates, similar patterns of functions related to the primary metabolism, and the plant developmental process were enriched in both species, for example the amide biosynthetic process (GO:0043604), anatomical structure development (GO:0048856), reproduction (GO:0000003), nitrogen compound metabolic process (GO:0006807), peptide metabolic process (GO:0006518), and the primary metabolic process (GO:0044238).

Utricularia reniformis Displays Unique Patterns of Carnivory-Associated, Land Adaptation, and Developmentally Related Genes
Previous studies, including literature review and comparative annotation approaches, accurately identified GO terms and KEGG enzymes strictly related to the carnivory-associated functions among the sequenced carnivorous plants available in public databases [31,32]. Based on this data, we investigated our functional annotation results and observed that U. reniformis had an extensive gene repertoire of carnivory-associated and ROS-related functions ( Table 4). In addition, we also identified an extensive repertoire of plant development, reproduction, and life-form adaption GO terms in U. reniformis (Table 5), which might be related to the different life-forms in comparison to U. gibba. Moreover, distinct patterns of ABC transporters were observed between U. reniformis and U. gibba ( Figure 8A,B). Terrestrial plants exhibited, on average, a repertoire of 128 ABC transporters, while the aquatic plants had fewer copies [33]. At least 132 and 86 ABC transporters were identified in U. reniformis and U. gibba, respectively ( Figure 8A and Table S10), suggesting that U. reniformis had more copies due to its terrestrial adaptation. This was especially apparent from the expansion of the ABC transporter B ( Figure 8A and Figure S4) and ABC transporter C families, which are commonly related to developmental processes and functions necessary for life on dry land, and the ABC G family ( Figure 8A and Figure S5), which are related to response to biotic stress [33]. Furthermore, plant developmental-related transcription factor (TFs) containing the wuschel-like homeobox (WOX), homeobox-leucine zipper (HD-Zip), agamous-like (MADS-box), TCP, WRKY, RADIALIS, DIVARICATA, DICHOTOMA, and the scarecrow-like gene families present distinct patterns of expansion and contraction among both species ( Figure 8B and Tables S11-S14), and thus, provide support for the two distinct life-forms between these species. Table 4. Carnivory-associated and reactive oxygen species (ROS) detoxification GO terms distribution. Utricularia reniformis (7200 unique genes: Singletons-3.5%, Dispersed-65.5%, Tandem and Proximal-10.5%, and Segmental-20.5%) and U. gibba (3918 unique genes: Singletons-10%, Dispersed-48%, Tandem and Proximal-6%, and Segmental-36%).

Comparative Annotation Among Other Angiosperms Genomes Reveals Species-Specific Genes Strictly Related to the Environment and Life-Form Adaptations
The comparative orthologous gene cluster analysis revealed that approximately 60% of U. reniformis and U. gibba genes are shared, indicating that they were inherited from the last common ancestor. Interestingly, U. reniformis presents an almost duplicated core gene set from that observed in U. gibba, V. vinifera, A. thaliana, and S. lycopersicum, and thus, supports a species-specific WGD event ( Figure 8C and Table S15). In general, a significant fraction of the species-specific genes (genes for which no orthologs could be found in any of the other species compared) and single-copy gene clusters from U. reniformis and U. gibba encode uncharacterized or hypothetical proteins (63% for U. reniformis and 50% for U. gibba). Among the species-specific genes, we highlighted ABC transporters, and several developmental-related TF gene families ( Figure 8A,B), supporting the role of the species-specific genes with plant development, reproduction, prey strategies, and life-from adaptations. GO enrichment analysis (p < 0.05, Fisher's exact test, Bonferroni corrected) among the species-specific genes revealed an augmented functions related to the reproductive process (

The Role of WGD in Utricularia reniformis Genome Evolution
Polyploid occurred multiple times during angiosperms evolution, and this is not an exception for Utricularia lineages. For instance, multi-way microsynteny analysis revealed three sequential WGD events in U. gibba, in addition to the γ WGT event shared by all core eudicots [4,5,34]. Additionally, out-crossing events might be broad among the Utricularia clade, and a comparative analyses revealed that the most recent U. gibba WGD was derived from an allopolyploidization event [5]. Allopolyploidization is recognized as a driving evolutionary innovation and is often

The Role of WGD in Utricularia reniformis Genome Evolution
Polyploid occurred multiple times during angiosperms evolution, and this is not an exception for Utricularia lineages. For instance, multi-way microsynteny analysis revealed three sequential WGD events in U. gibba, in addition to the γ WGT event shared by all core eudicots [4,5,34]. Additionally, out-crossing events might be broad among the Utricularia clade, and a comparative analyses revealed that the most recent U. gibba WGD was derived from an allopolyploidization event [5]. Allopolyploidization is recognized as a driving evolutionary innovation and is often associated with speciation when resulting in a duplicated genome structure [35][36][37]. Therefore, the current hypothesis is that not only was the U. gibba genome shaped by WGD events, but other Utricularia genomes and specifically the U. reniformis genome were shaped by WGD-induced variation and adaptation.
After a polyploidy event, each lineage undergoes distinct patterns of gene retention and fractionation, which generally leads to the genome returning to the diploid state [38]. Our results for U. reniformis and U. gibba support that each species has distinct patterns of deletions, duplicated genes, and rearrangements, suggesting that after speciation and WGD, they took distinct evolutionary paths that were consistent with their life-histories. This is exemplified by the significant number of tandem, proximal, and dispersed gene pairs identified in U. reniformis, when compared to U. gibba. Indeed, gene duplication is recognized as one of the major sources of evolutionary innovation [39]. For instance, tandem and proximal duplicated genes are often associated with the emergence of functional novelty, which commonly originated from isolated events in which an individual gene gets duplicated by unequal crossing-over between similar alleles [39]. Together with a large number of dispersed duplicates identified, the tandem duplicated genes might also arise by translocations mediated by TEs [39], generating single-gene transposition-duplication, and therefore, indicating for an essential role of TEs molding the genome structure and evolution. Additionally, WGD analysis confirmed that U. reniformis experienced at least one WGD round since the core eudicot WGT. However, it should be taken into account that the low assembly contiguity based on short-read technology hindered the identification of large syntenic blocks, and thus, complicated the evaluation of the complete history of U. reniformis WGD events.

LTR-Retrotransposons are Key Agents Governing Utricularia Genome Size Changes
One of the most common repetitive content of U. reniformis is related to TEs-related sequences, corresponding to at least 56% of the genome. The TEs drive the genome evolution, mainly by rearrangements, gene promoters repression, enhancers disruption, epigenetic regulation, and also by inflating the genome size by its copy-and-paste mechanism [40,41]. Apart from their deleterious effects, TEs also provide genome variation that can lead to the plant adapting more rapidly to new environmental conditions, and thus, leading to speciation [42,43]. The increased number of TEs in U. reniformis vs. U. gibba suggests that TEs might be playing a role in the genome innovation we are observing, but it could also just reflect a difference in the ability of the two genomes to purge mobile TEs.
The most prominent TE expansion in U. reniformis is related to LTR-retrotransposons elements from the Gypsy superfamily, which was also observed in the Solanaceae and Brassicaceae families [44,45]. In particular, elements from the Tat lineage were the most prominent, which are commonly broader in other plants, constituting up to 40% of the genome of some species [46].
Except for LTR-LARDs, all other TE groups are expanded in U. reniformis. The LTR-LARD and CRM are generally located in complex and highly repeated genomic loci, such as heterochromatic and pericentromeric regions [47,48]. These regions are often partially assembled using short-read technology, and we verified that most LTR-LARDs identified in U. reniformis are incompletely assembled, suggesting that they were partially determined and underestimated. Therefore, LTR-LARD and CRM elements might potentially represent a considerable fraction of the genome that remains unresolved in the U. reniformis assembly, and the results presented might be biased due to the assembly contiguity limits.

The Genomic Landmarks for Terrestrial Adaptation
Adaptation to the environment and life-form plasticity are hallmarks of the selective pressures that govern genome evolution in the Utricularia lineage. A common reproductive strategy for several Utricularia is to produce clones by stolon fragmentation, which is a usual asexual reproduction of aquatic species from the section Utricularia (the section in which U. gibba is nested), with some species being sterile and rarely flowering [49,50]. In contrast, U. reniformis presents a specialized sexual reproductive strategy that is entirely dependent on pollinators, and is specifically dependent on self-pollination [51]. Moreover, U. gibba seems to have a more severe degree of Fuzzy Arberian Morphology, such as no clear delimitation of distinct vegetative organs [52]. In contrast, U. reniformis presents a more traditional vegetative organ delimitation (as stems and leaves), similar to other angiosperms.
Consequently, the vegetative plasticity found in different Utricularia species might be essential strategies used to colonize the distinct habitats. Considering the different environments (water, soil, rock surface, etc.), the various trap designs [53][54][55] might be shaped as adaptation for different prey faunas [56][57][58][59]. Ultimately this might impact the absorption and assimilation of nutrients. At the genomic scale, both species display distinct patterns of development-related genes and TFs, such as WOX, HD-Zip, MADS-box, WRKY, TCP, RADIALIS, DIVARICATA, DICHOTOMA, and scarecrow-like gene families. These TFs play specialized roles in plant growth and development by regulating cell division and differentiation, in particular, the trap and floral meristem and trichome development, and might also develop an important role in the regulation of flowering time and the ontogeny of reproductive organs [16,[60][61][62][63][64]. For instance, the WOX1 (3 vs. 5 species-specific genes) from the WOX family, is possibly associated with the trap development [16] SOC1 (11 vs. 6 species-specific genes) from the MADS-box family, which might develop a role with phosphorus scavenging from the trapped prey [4], and ATHB51 (5 vs. 6 species-specific genes) related to the leaf morphogenesis. These findings are suggestive to a different array of target genes, and the processes that these TFs control, and in addition to the distinct number of developmental genes identified in both species, might apply in different body plan adaptations and prey strategies.
Moreover, both species display a distinct repertoire of ABC transporters that are considered essential to terrestrial life-form adaption [33]. Some ABC transporters members are exclusively found in U. reniformis. For instance, the ABCB member 9 is related to plant hormone transport, regulation of growth, and in development [65]. Additionally, the mitochondrial ABCB family member 25 was directly involved with the molybdenum cofactor biosynthesis, and nitrogen assimilation from the soil [66]. Interestingly, aquatic plants mainly rely on ammonium as their primary nitrogen source, and the molybdenum cofactor biosynthesis deficiency is commonly associated with the use of ammonium as an alternative nitrogen source in habitats with increased relative humidity [66]. This suggests that the absence of ABCB members 25 in U. gibba might be directly related to their aquatic adaptation. Finally, the ABCG member 36, exclusively found in U. reniformis, is related to the export of related defence compounds and callose deposition at the site of pathogen contact to restrict pathogen development [67,68], supporting that U. reniformis needs extra protection for cellular damage, which is a common characteristic of terrestrial plants.
It is known that carnivorous plants experience different and variable levels of nutritional stress [69], and the emergence of the carnivory-associated function in each lineage was a result of multiple evolutionary paths [31]. Interestingly, U. reniformis presents a distinct and augmented set of well-known carnivory-associated functions. Among them, alpha-galactosidase, amidase, actin filament, aspartic-type endopeptidase, beta-galactosidase, cellulase, chitinase and cysteine-type peptidase, myosin heavy chain kinase, and polygalacturonases might be related to carbon and energy cycles maintenance with cellulose and its associated polymers-decomposing activity, suggesting a potential role in the breakdown of prey polysaccharides. Moreover, the enrichment of ATPase activities might be correlated with the trap acidity and molecular transport functions, such as the release of digestive enzymes and the absorption of digested material from the preys [5,31,70].
Conversely, in addition to carnivory-associated functions, the ROS-related enzymes are also augmented in U. reniformis. For instance, catalase, superoxide dismutase, peroxidase, and glutathione transferase might contribute to cellular homeostasis and impacting the control of the high respiration rates possibly originated through the carnivory process. In addition, U. reniformis exhibits much more dispersed, tandem, and proximal duplicated genes. The emergence of tandem duplicated genes might contribute to the evolution of novel functions and adaptation. Our results provide support for an ongoing gene birth-death-innovation, occurring mainly among tandem and dispersed duplicates, impacting a number of different GO terms and KEGG enzymes, including carnivory-associated functions across both Utricularia. This process might give a fine-tuning of carnivory-associated functions in each selective environment (aquatic and terrestrial). For instance, a high diversity of bacteria, fungi, algae, and protozoa compose the ecosystem inside the Utricularia trap and act synergistically to convert complicated organic matter into a nutrients source for the plants [71]. The trap ecosystem and prey biota can significantly vary between the terrestrial and aquatic habitat. The aquatic Utricularia species usually have a prey spectra dominated by copepodids, cladocerans, ostracods, rotifers, and aquatic insect larvae, while terrestrial species can also have acari, nematodes, and other terrestrial microorganisms [56,58,69,72,73]. Moreover, the environment and the capacity to capture preys can be determined by the trap and prey size [74]. Thus, the trap size difference between the two species (U. gibba possess traps c. 0.7-1.5 mm and U. reniformis c. 1-2.5 mm long; [1]) can also be the result of selection pressure for the different adaptation to aquatic (U. gibba) and terrestrial and epiphytic life-forms. Therefore, both plants might present a distinct trap ecosystem and an enzymatic digestive repertoire to uptake nutrients, and our results supported this hypothesis at a genomic scale. However, this hypothesis warrants further functional studies for confirmation. For the genome size estimation, approximately 25 mg of leaf tissue was macerated in 1 mL of cold Ebihara buffer [75] supplied with 0.025 µg mL −1 RNAse, using a scalpel blade to release the nuclei into suspension with the same mass of the internal reference standard Raphanus sativus var. Saxa (2C = 1.11 pg; [76]). The nuclei suspensions were stained by adding 25 µL of a 1 mg mL −1 solution of propidium iodide (PI, Sigma). The analysis was performed using the FACSCanto II cytometer (Becton Dickinson, San Jose, CA, USA). The histograms were obtained through the FACSDiva software based on 5000 events, and the statistical evaluation was performed using the Flowing Software 2.5.1 (http://www.flowingsoftware.com/). Three individuals were analyzed in triplicates. The quality control of samples was based on the coefficient of variation (CV) of each measurement (which should be below 5%) and the standard deviation (SD) among the 2C-values (which should be below 3%). Such limits ensure that the variations observed inside and among measurements are due to technical factors and do not represent intraspecific variation among individuals [77].

Genome and Transcriptome Sequencing
One individual was sampled, producing the three different DNA libraries used in this study. DNA was extracted using the QIAGEN DNeasy Plant Maxi Kit extraction protocol (Qiagen, Hilden, Germany) to prepare libraries for Illumina TruSeq PCR-free and Nextera XT paired-end reads, with 350 bp (2 × 100 bp) and~450 bp (2 × 300 bp), insert sizes, and Illumina Nextera mate-pair gel free with insert sizes ranging from 3.5 to 10 Kb (2 × 100 bp). Illumina libraries were sequenced on HiScanSQ and MiSeq machines. The paired reads (2 × 100 bp and 2 × 300 bp) adapters were removed with Trimmomatic v0.27 [78], while the mate-pair reads (2 × 100 bp) internal adapters were trimmed with the NxTrim [79]. Sequences with phred value below 24 (<Q24) and spanning less than 50 bp were removed. For transcriptome sequencing, we used samples from leaves, stolon, and utricles. The RNA extraction and sequencing was performed according with the U. reniformis previous research [26,27]. Trimmed RNAseq reads from the leaves, stolon, and utricles were concatenated, and de novo assembled using Trinity v2.7.0 [80], and were exclusively used for gene model predictions (see below).
The predicted WGDs events that occurred in the U. reniformis evolutionary history were predicted on the basis of the distribution of synonymous substitutions per synonymous site (Ks) of gene pairs, using the SynMap2 tool from the Comparative Genomics Platform (CoGe) [92]. MCScanX [93] was employed to identify dispersed, proximal, and tandem (small-scale duplication) and segmental duplicates (WGDs). Tandem duplicates are defined as paralogs that are adjacent to each other, and the maximum distance to call a proximal copy was set to 10 genes. The anchor genes in collinear blocks inferred segmental duplicates. Structural and comparative analyses (pairwise alignment) were carried out with D-GENIES [94]. Macrosynteny and microsynteny analyses were carried out with VGSC2 and MCscan (Python version) (https://github.com/tanghaibao/jcvi.wiki.git).

Transposable Elements Identification, Classification, and Genome Annotation
The de novo detection and classification of TEs were carried out by the REPET v2.5 [95] with the "-struct" parameter. The identified elements from the REPET package were manually validated and characterized into super-families and evolutionary lineages with the usage of the Domain-based ANnotation of Transposable Elements (DANTE) with the Viridiplantae database v3.0 [96]. The TE masking was performed with the RepeatMasker Open-v4.0.7 (http://www.repeatmasker.org) using the parameter: "-s -cutoff 260".
The gene prediction was performed using the BRAKER v2 pipeline [97], and the TE-related genes were discarded. PASA pipeline [98] was used to produce spliced alignment assemblies based on RNA-seq data. The gene predictions and transcript alignments were combined with the EVidence Modeler software [99]. Finally, two PASA pipeline iterations were used to update the EVidence Modeler consensus predictions, adding UTR annotations and models for the alternatively spliced isoforms. Functional annotation was performed using Blast2GO v5 [100], InterProScan 5 [101], and EggNOG-mapper v1.0.3 with the EggNOG 5.0 database [102]. For the protein assignment, the UniProtKB/TrEMBL [103] viridiplantae database was used. The final predicted gene models were scanned against the DANTE and RepeatMasker predictions results, and all TE-related genes were further discarded.
For consistency in the comparative analysis, Utricularia gibba transcriptomes read from NCBI Sequence Read Archive (SRA): SRX2368915, SRX247091, and SRX038704 were assembled, and the U. gibba genome (Genbank Accession Number: NEEC01000001-NEEC01000518) was re-annotated using the same pipeline.

Phylogenetic Analysis
The phylogenetic analyses were performed using maximum likelihood (ML), and Bayesian inference (BI) approaches. For the ML, RAxML v. 8.2.10 [104] was used with default parameters, and the clade support estimates were calculated using rapid bootstrapping of 1000 pseudoreplicates. The BI was performed with Mr. Bayes v.3.2.6 [105] using MCMCMC, with random starting trees, of about 5,000,000 generations, with sampling for each 100 trees, using two runs and four chains and the trees were considered only after a stationary position was reached, with 25% of initially elaborated trees discarded. The GTR + G + I was used as the best-of-fit model according to the AIC (Akaike Information Criterion) with the jModeltest v. 2.1.10 [106].

Comparative Analysis
OrthoVenn v2 [107] was used to determine the orthologous gene families among Utricularia reniformis, U. gibba, and other angiosperms (Arabidopsis thaliana, Vitis vinifera, and Solanum lycopersicum) retrieved from the Phytozome database [108]. Functional enrichment analyses of the GO terms and the KEGG enzymes were conducted using Fisher's exact test and was corrected for multiple testing using two methods employed by the Blast2GO (GO terms and KEGG enzymes) and GOATOOLS (GO terms) [109]. The p-values were determined after the correction for multiple tests through False Discovery Rate (FDR) control, according to the Benjamini-Hochberg. The WOX, HD-Zip, MADS-Box Transcription Factors (TFs), and ABC transporters comparative analyses were based on previous [16,33] and manual annotation using the A. thaliana reviewed entries from UniProt Database as reference [110].

Availability of Supporting Data
The raw sequencing reads have been deposited into GenBank, BioProject PRJNA290588, linked directly to the SRA under the accession numbers SRR8289569, SRR8289570, and SRR8289571 for the Illumina datasets and SRR8289572 for Ion Torrent's RNA-Seq data. This Whole Genome Shotgun project has been stored at DDBJ/ENA/GenBank under the accession RWGZ00000000. The version described in this paper is version RWGZ01000000. The gene models and genome browser of Utricularia reniformis are available at https://genomevolution.org/coge/GenomeInfo.pl?gid=54799. The Gene Ontology annotation (Blast2GO) and all the raw data generated for this study are freely available at the Zenodo platform at http://doi.org/10.5281/zenodo.3268745.

Conclusions
In this work, we generated a draft genome sequence and annotation for the terrestrial carnivorous plant U. reniformis and compared it against its sister, aquatic species U. gibba. Our results demonstrated that a massive proliferation and loss of lineage-specific LTR-retrotransposons, predominantly from the Gypsy super-family and WGDs are the main governing agents of the genome size changes and evolution between these species. Interestingly, our results strongly suggest that both genomes responded distinctly after the WGD events. This is mainly observed by specific patterns of gene fractionation and retention after the lineage split, which might be reflected in an ongoing gene birth-death-innovation process among the duplicated genes. Additionally, each species carries a diversified set of plant ontogeny and carnivory-associated gene repertoire, supporting that the reproductive isolation and terrestrial and aquatic life-form constraints (e.g., different nutrients repertoire and availability, prey diversity, and trap microbiome ecosystem) in an evolutionary time scale are associated with speciation.