A Global Landscape of Miniature Inverted-Repeat Transposable Elements in the Carrot Genome

Miniature inverted-repeat transposable elements (MITEs) are the most abundant group of Class II mobile elements in plant genomes. Their presence in genic regions may alter gene structure and expression, providing a new source of functional diversity. Owing to their small size and lack of coding capacity, the identification of MITEs has been demanding. However, the increasing availability of reference genomes and bioinformatic tools provides better means for the genome-wide identification and analysis of MITEs and for the elucidation of their contribution to the evolution of plant genomes. We mined MITEs in the carrot reference genome DH1 using MITE-hunter and developed a curated carrot MITE repository comprising 428 families. Of the 31,025 MITE copies spanning 10.34 Mbp of the carrot genome, 54% were positioned in genic regions. Stowaways and Tourists were frequently present in the vicinity of genes, while Mutator-like MITEs were relatively more enriched in introns. hAT-like MITEs were relatively more frequently associated with transcribed regions, including untranslated regions (UTRs). Some carrot MITE families were shared with other Apiaceae species. We showed that hAT-like MITEs were involved in the formation of new splice variants of insertion-harboring genes. Thus, carrot MITEs contributed to the accretion of new diversity by altering transcripts and possibly affecting the regulation of many genes.


Introduction
Miniature inverted-repeat transposable elements (MITEs) are non-autonomous Class II mobile elements, widespread and abundant in plant genomes. MITEs constitute an artificial group within class II elements characterized by high copy numbers, small size (>700 bp) and lack of any coding capacity [1]. They are further divided into sub-groups based on the similarity of their terminal inverted repeats (TIRs) and target site duplications (TSDs) to those of certain elements representing major superfamilies of Class II transposons.
Tourists and Stowaways were the first described and the most thoroughly characterized MITE sub-groups derived from PIF/Harbinger and Tc1/Mariner elements, respectively [2,3]. Subsequently, MITEs related to other DNA transposons, i.e., Mutators and hATs, were also reported [4,5]. The similarity of MITE TIRs to those of autonomous DNA transposons allows MITEs to be mobilized by transposases encoded by their autonomous relatives, resulting in the transposition and formation of characteristic TSDs upon insertion.
MITEs are often inserted in the vicinity of genes. Such a location entails the possibility of a functional impact on the nearby gene. Besides the knockout mutations resulting from a disruption of the gene structure [6][7][8], numerous reports have been published indicating the possibility that MITE insertions may alter the expression of adjacent genes [9][10][11][12][13][14]. MITEs can also give rise to siRNAs, and drive gene silencing through RNA-directed methylation (RdDM) [14,15]. It has been shown that genes containing MITE insertions in their promoters may be epigenetically down-regulated, e.g., MITE-associated methylation of ZmCCT and ZmNAC111 promoters in maize resulted in an early-flowering phenotype and a drought-sensitive phenotype of maize seedlings, respectively [16,17]. MITEs may also

Identification and Genomic Localization of MITEs in the Carrot Reference Genome
Carrot MITE families were identified with MITE-Hunter [37], using default parameters. The final dataset, comprising 522 MITE consensus sequences, was manually curated and grouped into 428 MITE families using the 80-80-80 similarity rule [1]. Those sequences were used as queries for a blastn search [38] against the carrot DH1 reference genome (GenBank assembly accession number: GCA_001625215.1; [30]). The blastn output was parsed using a custom Perl script, with parameters allowing the extraction of coordinates of copies following 80-80-80 similarity rule, and for which similarity at both ends started between 1-10 nucleotides and continued over the whole query sequence. Genomic regions meeting those criteria were used to produce two bed files, one with exact coordinates and the other with coordinates extended by 50 nt at both ends. The bed file with the MITE-flanking regions was used to extract sequences that were manually inspected to verify/identify MITE TIRs and TSDs. Subsequently, TSDs were used to combine families into groups related to DNA transposons [Class II], i.e., Stowaways (related to Tc1/mariner), Tourists (related to PIF/Harbinger), hAT-like, and Mutator-like MITEs.
To mine for autonomous elements related to MiMs (MITEs inserted in microsatellite) [26], we used blastx implemented in blastall (v 1 -b 1) to search for Mutator transposases in the carrot reference genome. As queries, we used Mutator proteins identified in Oryza sativa (AAX92869.1), Rosa chinensis (PRQ51974.1), Helianthus annuus (OTF91787.1), Solanum tuberosum (ABI34394.1), Arabidopsis thaliana (AAG51216.1), and Medicago truncatula [39]. Genomic regions with similarity to the Mutator proteins were extracted with 4 kb flanking sequences and used to identify TIRs. TIRs were recognized based on the results of the selfblast of extracted sequences (blastn; -v 1 -b 1), from which the coordinates of hits present on both flanks of the putative transposase coding region were in complementary orientation and shared at least 50% identity. They were used to define the boundaries of putative autonomous elements. Subsequently, we used blastn with default parameters to compare 50-bp-long ends of consensus MiM sequences with the mined Mutator-like elements.
A bed file containing coordinates of MITEs was used to determine the positions of MITE copies in the context of genic regions, as described by Macko-Podgórni et al. [33]. MITE copies were categorized based on the NCBI carrot genome annotation (GCF_001625215.1 _ASM162521v1_genomic.gff) as 5 UTR (MITE coordinates overlapping annotated 5 UTRs), 3 UTR (MITE coordinates overlapping annotated 3 UTRs), cds (MITE coordinates overlapping annotated cds), intron (MITE coordinates located within an annotated intron), upstream (MITE present within 1 kb upstream a gene), downstream (MITE present within 1 kb downstream a gene), while copies not fulfilling any of those criteria were considered as intergenic. All calculations were performed using the bed file in R [40]. A gff3 file showing the annotation of carrot MITEs in the reference genome was developed (File S1 in the Supplementary Materials).

Mining for MITEs in the Reference Genomes of Other Asterid Species
The mining strategy used to identify carrot MITEs was applied to eight reference genomes representing members of the Asterid clade (Supplementary Table S2 [47][48][49][50][51][52][53]). Then, blastn [38] was used for among-species comparisons of MITE consensus sequences to identify MITE families shared among Asterids. The presence of a region covering a minimum of 60% MITE length with identity equal to or higher than 60% was used as a threshold to define related MITE families from different genomes. A graphical representation of levels of similarity among related MITEs was drawn using Circoletto [54], with the following blastn parameters: -F F -e 1e-10 -E -1 -v 200 -b 200, and "score/max" ratio coloring with blue ≤ 0.25, green ≤ 0.50, orange ≤ 0.75, red > 0.75.

Abundance and Genomic Localization of Carrot MITEs
We identified 428 MITE families in the carrot reference genome and divided them into groups on the basis of their relationships to Class II TE superfamilies, as revealed by their TIR and TSD similarity ( Figure 1). We used consensus sequences representing each family to identify individual copies along the reference assembly of the carrot genome [30]. In total, MITEs were estimated to occupy around 2% of the carrot genome (31,025 copies spanning 10.34 Mbp). The largest genome fraction was attributed to Stowaways (2.97 Mbp) followed by Tourists (2.78 Mbp) and Mutator-like MITEs (2.71 Mbp). hAT-like MITEs were the least numerous but the most diverse, comprising one-third of all identified MITE families. Families derived from Mutator-like elements accounted for ca. 30%, while Tourists and Stowaways grouped 19% and 13% of MITE families, respectively (Table 1). More than half of all MITEs (16,693 copies; 54%) were inserted within genic regions, defined as 1 kb upstream and downstream of genes and including the gene body (Table 1). We also identified 2156 copies belonging to 15 families flanked by "TA" stretches and having distinguishable but usually poorly conserved TIRs. For three of those families, we were able to identify related putative autonomous Mutator elements, confirming that they should be attributed to the group of Mutator-like MITEs, and classified them as MiMs, following the nomenclature of Chen et al. [26]. The MITE groups differed in terms of their copy number, the average length of the element and genomic localization. Stowaways were on average 254 bp-long, and constituted the most abundant group of MITEs in the carrot genome, reaching 11,674 copies. The largest family, comprising 1452 copies, was attributed to that group. The most numerous Stowaway families included previously described DcSto families [33]. Stowaway elements were relatively enriched in intergenic regions and upstream from genes. Tourists were represented by 8731 copies with an average element length of 319 bp. The largest family in that group comprised 1353 copies. A previously described Krak family [31], comprising 333 copies in the carrot reference genome, was the seventh-largest family of Tourists.
Tourists were more frequently positioned upstream and downstream from genes and in UTRs. Mutator-like MITEs comprised 6308 copies, with a mean length of 429 bp. The largest family within that group contained 542 elements. Mutator-like MITEs, including MiMs, were present mainly in intergenic regions and in introns. The group of hAT-like elements was the least numerous, represented by 3637 copies with an average length of 454 bp and the largest family comprising 217 copies. hAT-like elements were relatively more frequently located in coding regions and UTRs, as compared to the other MITE groups (Table 1, Figure 2, Figure S1, File S2 in the Supplementary Materials).

Carrot MITEs Are Co-Transcribed with Genes and Are Involved in Tissue-Specific Alternative Splicing
In order to determine whether MITEs localized in the vicinity of genes were cotranscribed, we used RNAseq reads from 20 carrot tissues [30] and mapped them onto MITE sequences to search for MITEs covered by more than ten reads. In total, 3469 MITE copies, representing 336 families, were retrieved. Of those, 60% were localized in introns, 16% and 14% 1 Kb upstream or downstream from genes, respectively, 4% in 5'UTR, 5% in 3'UTR, and 1% in cds (Table S3, Figure S2 in the Supplementary Materials). The number of copies belonging to particular MITE families, residing in transcribed regions, correlated with the total number of copies in those families (p-value = 2.2 × 10 −16 (Supplementary Figure S3)). hAT-like elements, which were shown to be relatively more frequent in gene bodies, were also more frequently co-transcribed. Interestingly, hAT-like elements located in introns were frequently retained in transcripts, while intronic Stowaways and Mutator-like MITEs were usually spliced out together with introns and did not significantly contribute to transcripts ( Figure S4 in the Supplementary Materials). Thus, we were prompted to investigate if hAT-like MITEs might be involved in the formation of novel gene-splicing variants.
In some tissues, we observed more reads mapping to MITEs, possibly indicating a higher expression of MITE-containing isoforms. The highest normalized RPK values were shown for "stressed leaves of 7-8 cm at reversible wilting point", "stressed leaves of 2-2.5 cm at reversible wilting point" and "fibrous roots" ( Figure S5 in the Supplementary Materials). Because such differences might have reflected general differences in gene expression, not related to the presence of MITEs, we calculated the MITE vs. gene (M/g) expression ratio, in which a normalized number of reads mapping to the co-transcribed MITE segment was divided by a normalized number of reads mapping to the whole gene, providing means to determine if the MITE-derived isoform was indeed differentially expressed in a tissue-specific manner. In total, we identified 3022 genes with normalized RPK > 1 and M/g > 1.9. Of those, 63%, 26% and 11% of genes carried MITEs localized in introns, upstream or downstream regions of genes, and transcripts, respectively (Table S4, File S2 in the Supplementary Materials).
To test the possible impact of MITEs on the differential usage of exons in different tissues, we retrieved genes for which differences in the maximum and minimum M/g values were higher than 1.5. From those, we randomly selected 40 genes carrying hAT-like elements in their UTRs, cds or introns (Table S5 in the Supplementary Materials). We found five genes for which the differential exon usage was reported, for isoforms containing hATlike MITEs (Figure S6-S10; File S3 in the Supplementary Materials). In the case of two genes (LOC108223373 and LOC108201852), the presence of hAT-like MITEs led to the formation of isoforms encompassing the whole MITE. The hAT-like MITE present in LOC108227539 was partially incorporated into the new exon, while MITEs within the other two genes (LOC108220652 and LOC108222651) were co-transcribed as alternative 5'UTR and 3'UTR, respectively. In general, regardless of the tissue, we observed the prevalence of one isoform. In the case of LOC108222651 and LOC108227539, the non-MITE-containing variant was generally more abundant; however, the expression of MITE-containing isoforms differed slightly among tissues. The expression of LOC108223373 was lower than the other genes ( Figure 3a

Identification of MITEs in Asterids
In order to compare the abundance of MITEs in the carrot genome to those in other Asterid species, we mined MITEs from eight Asterid reference genomes. The number of MITE families and the genome fraction occupied by MITEs were not correlated with the genome size (p-values = 0.37 and 0.79, respectively). The highest abundance and diversity of MITEs (428 families) was observed in the carrot genome, while only 83 MITE families were identified in the celery genome ( Figure 4; Table 2).
In addition, the largest proportion of the genome was attributed to MITE elements in the carrot (    MITEs present in genomes of Asterids from different families (Apiaceae, Solanaceae, Rubiaceae, Asteraceae) revealed no similarity above the threshold of 60% sequence identity over 60% of the sequence length. More similar MITEs were identified within Solanaceae (tomato, potato and pepper) and within Apiaceae (carrot, fennel, celery, and Java water dropwort) ( Table S6 in the Supplementary Materials). Thus, the number of related MITEs in different genomes corresponded to phylogenetic relationships among the species. In total, similar counterparts in Apiaceae were found for 150 carrot MITE families (Table S7, File S4 in the Supplementary Materials). Of those, nine MITE families were present in the four Apiaceae species, with six families representing Stowaways (DcSto1, DcSto4, DcSto6, DcSto10, DcSto29, and DcSto37), two families representing Tourists, and one hAT-like MITE family ( Figure S12 in the Supplementary Materials). Forty-three carrot MITE families were also present in two other Apiaceae species (20 shared with Java water dropwort and fennel, 18 with celery and fennel, and five with Java water dropwort and celery). The remaining 95 carrot MITE families were similar to MITEs from one other Apiaceae species (53, 27 and 15 were shared with fennel, Java water dropwort and celery, respectively).

Diversity and Abundance of Carrot MITEs
MITEs are the most abundant group of DNA transposons in plant genomes. Higher diversity and abundance of MITEs have also been reported in monocots than in dicots [26]. We identified 428 families comprising 31,025 copies in the carrot genome. Thus, the carrot stands out from most other dicot species and can be placed among few species, having relatively small genomes but containing large numbers of diverse MITEs, along with Medicago truncatula and mulberry as the most prominent examples. In the M. truncatula genome of 307 Mbp, 288 MITE families and 132,834 copies were identified [26] while the 357 Mb-long mulberry genome was reported to contain 90,789 full-length copies divided into 232 families [20]. Thus, even though the MITE copy number in the carrot is lower, they show remarkable diversity.
We identified 15 families, grouping more than 2000 MiMs residing in "TA" stretches. MITEs preferably inserted into microsatellites have been described in other plant genomes. Chen et al. [26] reported that MiMs were present in ten of 41 plant species [26]. MiMs were first described in rice as high copy-number MITEs having poorly recognizable TIRs and preferentially inserting into "TA" repeats [55]. MiMs from other species were characterized by similar features and, based on the similarity of their TIRs to related autonomous elements, they were attributed to the Mutator superfamily [39,56]. It has been suggested that the propensity to insert into "TA" stretches might reflect a strategy adopted to avoid removal from the genome by positioning into regions not subject to strong selective pressure [39]. However, it remains unclear how those elements propagate in the genome, whether autonomous Mutator elements drive their mobilization, and how mobilized copies are directed to "TA" microsatellites. An alternative mechanism assuming the propagation of MiMs via homologous recombination, preceded by the formation of extrachromosomal circular DNA (eccDNA) intermediates with microsatellite sequences at MiM ends, was proposed by Franco et al. [56].

Distribution of MITEs in the Carrot Genome
We showed that 16,693 carrot MITEs (54%) reside in genic regions, and 21% of those are co-transcribed with the nearby gene, possibly affecting gene expression or the fate of MITE-containing transcripts. MITEs can interact with genes in multiple ways. Upon insertion in regulatory regions, they may alter gene expression by providing transcription factor binding sites [19]. The presence of polyadenylation signals or splicing sites in the MITE sequence may lead to the formation of new isoforms. On the other hand, MITEs may down-regulate genes by altering methylation through RNA-directed methylation (RdDM) [14,15]. We observed that different carrot MITE groups were enriched in different genic contexts, which may suggest their possible involvement in different regulatory mechanisms. In general, elements inserted into the gene body (UTRs, cds or introns) were often co-transcribed. It was the most apparent in the case of hAT-like MITEs, which were inserted into genes relatively more frequently. Here, we demonstrated their engagement in the formation of new isoforms. In the case of two genes, MITE-containing isoforms were prevalent. Such a variability of transcripts may result in a measurable phenotypic effect. For instance, the insertion of a copia retrotransposon into the intron of a tomato gene Solyc02g079490 produced multiple isoforms without affecting the overall gene expression, but leading to overexpression of non-functional transcripts, ultimately resulting in the accumulation of 2-phenylethanol and giving the tomato fruit a pleasant floral aroma [57]. Carrot Tourist elements were preferentially localized in UTRs and regions upstream or downstream from genes, but only insertions residing in UTRs were present in transcripts. In contrast, Mutator-like MITEs, enriched not only in intergenic regions but also in introns, were rarely retained in transcripts. Stowaways were relatively rarely present in transcripts, despite being the most numerous group of carrot MITEs. However, it should be stressed that the overview of general trends observed for each MITE group was certainly biased by the behavior of the most numerous families. A recent report on carrot DcStos indicated strong family-specific preferences for their genomic positioning. Some DcSto families were shown to be enriched in genic regions, especially in the vicinity of genes encoding transcription factors [33]. Thus, a detailed characterization of individual families, including information about the expression of nearby genes, is necessary to understand and resolve the involvement of MITEs in the regulation of gene expression.

MITEs in Other Asterid Species
An in silico identification of transposable elements can be biased, and the number of identified repetitive sequences largely depends on the pipeline and stringency of parameters used for analysis. This was why we tested eight Asterid species using the same pipeline we applied for the carrot. The number of families mined from the tomato and potato genomes proved to be consistent with previous reports [26], confirming that the method was adequate, and the carrot genome was indeed characterized by a large diversity and abundance of MITEs. We showed that MITEs were species-specific; however, some MITE families were shared among species from the same botanical family. The number of MITE families shared among species reflected their phylogenetic relationships. We identified 150 carrot MITE families sharing a similarity with MITEs from at least one other Apiaceae species. The intra-family identity in sequence and length decreases over time due to random mutations. The approach we used was based on the 80-80-80 rule for grouping copies into families within the genome, while a less stringent criterion, requiring an identity of at least 60% over 60% of the sequence length, was applied for interspecific comparisons. Detection of nine MITE families shared among Apiaceae, which diverged by about 26 Mya [58], raises a question about their origin. It is possible that they have been retained owing to their functional role in the host genomes. The TE conservation may be related to wiring new transcriptional networks. For example, similar TFBS-carrying MITEs were reported to be shared among species within the Prunus and Solanum genera [19]. Similarly, Helitrons in A. thaliana are enriched in PHE1 binding sites, and their insertions up-regulate a number of genes in the endosperm. Some of those Helitron insertions are also present in orthologous genes of other Brassicaceae [59]. It was also shown that some ancient TEs identified in Brassicaceae were enriched in gene regulatory networks, e.g., the flowering gene network, suggesting their domestication and conservation for more than 100 million years [60]. Even though the preservation of MITEs in related species is possible, a horizontal transfer cannot be ruled out, as there has been evidence of TE horizontal transfers in animals and plants, and between viruses and their hosts [61][62][63][64].
Our results suggest that some MITEs may play regulatory functions in the carrot and other Apiaceae. We reported a carrot MITE repository and genome annotation providing means for further analysis of MITEs and their effect on the structure and function of carrot genes and genomes. Besides the analysis of the functional role and impact on the genome evolution, insertional polymorphism of MITEs belonging to newly described families can be utilized as molecular markers. Until now, MITE-based molecular markers were successfully used to study carrot diversity and population structure [35,36]. Identification of new MITE families, especially those associated with genes, provides a rich source of transposable element-associated structural variants (TEASVs) that can be used for TE-based genome-wide association mapping (TE-GWAS), an approach recently successfully applied in rice [65].

Conclusions
Carrot MITEs are exceptionally diverse and abundant. Their localization in the vicinity of genes and presence in transcribed regions points to their possible involvement in the regulation of gene expression and the formation of novel isoforms. The identification and characterization of carrot MITEs provide a basis for further studies on their functional impact. It also facilitates the use of MITE insertion polymorphisms to identify genetic factors associated with important agronomic traits of the carrot.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/genes12060859/s1, Figure S1: Box-plot showing the number of copies in each group of carrot MITEs. Figure S2: Abundance and genomic localization of carrot MITEs present in transcripts. Figure S3: Relationship between the number of copies in transcribed regions and the total number of copies representing each MITE family. Figure S4: Relative abundance of MITE groups in different genomic regions, divided into present (bolded) or absent in transcripts (p-value < 0.001). Figure S5: Number of normalized RNAseq reads (RPM) attributed to each MITE group. Figure S6: Splice variants and differential exon usage for LOC108223373. Figure S7: Splice variants and differential exon usage for LOC108201852. Figure S8: Splice variants and differential exon usage for LOC108227539. Figure S9: Splice variants and differential exon usage for LOC108220652. Figure S10: Splice variants and differential exon usage for LOC108222651. Figure S11: Plot showing the number of MITE copies present in Asterid species (y axis) related to the genome size (x axis). Figure S12: Relationships among MITE families shared by four Apiaceae species. Table S1: Carrot DH1 transcriptomes used for identification of MITEs in transcripts. Table S2: Asterid species used for identification of MITEs. Table S3: Carrot MITEs identified in transcripts. Table S4: Summary of MITEs present in transcripts and inserted in the vicinity/within genes with RPK > 1 and M/g > 1.9. Table S5: List of randomly selected genes containing hAT-like MITE insertions assigned to UTRs, cds or introns, tested for MITE-related differential exon usage.