Transposable elements (TEs) are DNA sequences able to replicate and insert within genomes [1
]. Since their first discovery [3
], TEs have been known to affect host genome evolution and cell functions in several ways; for example, they can modify gene regulatory networks, mediate genomic rearrangements or even be exapted as new genes or gene components [2
]. They occur in the genomes of all living organisms and can be classified on the basis of their replication mechanisms and sequence structures [1
]. Class I TEs replicate via an RNA intermediate, through copy-and-paste mechanisms, while class II TEs proliferate through a DNA intermediate, using different molecular strategies. Both classes include autonomous and non-autonomous elements: the former are able to code for the protein(s) necessary for their replication/integration, while the latter have to exploit the enzymatic machinery of an autonomous partner element.
Short interspersed elements (SINEs) and miniature inverted-repeat transposable elements (MITEs), which belong to class I and class II, respectively, are the most successful non-autonomous elements [4
]. SINEs are widespread among vertebrates and they were found well-represented in cartilaginous fishes, the coelacanth and mammals [6
]. They are very short sequences (200–600 bp) composed of three “modules”: an RNA-related head, which may originate from tRNA, 7SL, or 5S rDNA genes; a tail homologous to that of long interspersed elements (LINEs), which allows them to exploit the LINE reverse transcriptase; and a body linking the head and the tail [4
]. The origin of the body is unknown in the majority of SINEs, although a LINE-derived origin has been proposed [7
]. Surprisingly, it has been observed that some SINEs, having different origins and with different taxonomic distributions, may share a highly-conserved domain (HCD) [4
]. Nine HCDs have been discovered so far in vertebrates, echinoderms, arthropods, molluscs, annelids and cnidarians [8
]. The discovery of HCDs in fast-evolving sequences like SINEs raised the question of why they are so highly conserved. Many hypotheses have been put forward in the last two decades, including that HCDs mediate the SINE-LINE tail exchange [17
], mediate module exchange [8
], or even confer an advantage on the host genome [18
]. In this respect, it should be noted that some HCD-SINEs underwent exaptation, being domesticated as gene enhancers [19
]. Moreover, a role as regulatory elements has been suggested for some HCDs, such as the CORE, Deu and V domains, based on analyses of mRNAs and miRNA genes in fishes [23].
SINEs carrying the ~120 bp long V HCD are among the most widespread in metazoans, but they are apparently absent from amniote genomes; on the other hand, the V domain has been found within a DNA transposon, called MER6, in the human genome [10
]. Ogiwara and co-workers [10
] concluded that V-homologous sequences could either be genomic debris derived from a V-SINE inserted into the DNA element, or have been conserved for some as yet uncharacterized functional advantage to the host genome. The DNA transposon MER6 has been classified in Repbase Update, the database of repeated DNAs [25
], and in previous literature [26
] as a Tc1/Mariner DNA element. MER6 does not show any protein-coding potential, as also suggested by its short length (865 bp); it has two 24 bp long terminal inverted repeats (TIRs), and its insertions are flanked by TA target site duplications [26
]. Overall, these features help to identify MER6 as a MITE [27
]. Moreover, MER6 has a homologous element, called MER6A: the consensus sequences of the two MITEs diverge by 0.3%, and the only structural difference is a 261 bp long internal deletion in the MER6A repeat.
As both MER6 and MER6A (henceforth collectively referred to simply as MER6) carry the conserved V domain, it is of interest to investigate their origin, including that of the V HCD, and to clarify their evolutionary dynamics. For this purpose, a genome-wide analysis was undertaken on representative genomes of amniotes (mammals, reptiles and birds) and non-amniote vertebrates (amphibians). The actual taxonomic distribution of MER6, as well as the parental TEs that gave rise to it, are reported here. Moreover, similarly to some HCD-carrying SINEs, MER6 elements were also found embedded in messenger RNAs (mRNAs).
In this work, the first detailed genome-wide analysis of the MER6 and MER6A non-autonomous DNA transposons is reported. Although they are so far the only known mobile elements carrying the V conserved domain in mammals (and in amniotes in general) [10
], the evolution and genomic dynamics of these elements have never been investigated.
MER6 and MER6A were first isolated in H. sapiens and later labelled as primate-specific elements [26
]. Data presented here depict a far wider distribution for the variant MER6A, which is also present in bats (order Chiroptera) and in eulipotyphlans (order Eulipotyphla). Looking at mammalian phylogeny (Figure 5
), MER6A is present in two different lineages: the Euarchontoglires, to which primates belong, and the Laurasiatheria, which includes bats and eulipotyphlans. The differential distribution of MER6 and MER6A suggests that their divergence could date back to 96 million years ago (Mya) (C.I. = 91–102 Mya), when the two mammalian clades split, and that in the Laurasiatheria lineage only MER6A survived. Moreover, within Laurasiatheria, the element further diversified, as suggested by the slightly different sequence structure in eulipotyphlan genomes. On the other hand, MER6A was not found in non-primate euarchontoglires or in other laurasiatherian lineages (Figure 5
). In their evaluation of the timing of DNA transposon activity in the human genome, Pace and Feschotte [28
] estimated the age of MER6 and MER6A at between 50 and 70 Mya, roughly corresponding to the initial amplification wave of Tc1/Mariner elements. Although TE age estimates based on sequence divergence should be taken cautiously, because of possible bias and variation in substitution rates, it is interesting to note that, assuming the split at 96 Mya, the substitution rate experienced by MER6 elements varies between around 2.1 × 10⁻⁹ (MER6) and 2.4 × 10⁻⁹ (MER6A) substitutions/site/year. Notably, these estimates overlap the neutral nucleotide substitution rates estimated in mammals, 2.2 × 10⁻⁹
], or in bats, 2.37 × 10⁻⁹
]. Therefore, considering the phylogenetic distribution of these elements and the concordance with rough estimates of substitution rates, it is likely that the main activity of MER6 elements dates back to the origin of the Boreoeutheria crown group (Figure 5
). As an alternative hypothesis, which might reconcile the taxonomic distribution and the suggested later activity [28
], it could be suggested that MER6A might have been acquired in the early Laurasiatheria lineage by horizontal transfer. TEs’ horizontal transfers in mammals are quite rare, although some of these events, involving hAT, piggyBac and Tc1/Mariner elements, have been scored in vespertilionid bats (which include the presently analysed M. lucifugus
]. On the other hand, the separation between Eulipotyphla and Chiroptera, which both harbour MER6A, occurred between 81 Mya and 96 Mya, earlier than previously estimated repeats’ age of 50–70 Mya [28
]. This would imply, therefore, that multiple horizontal transfers occurred, which appears even more unlikely. Finally, it should be noted that some technical differences may also explain the discrepancy between Pace and Feschotte’s age estimates [28
] and those provided in the present analysis. Their age estimates have been calculated with a different evolutionary model, using substitution rates estimated on the taxonomic distribution of transposons’ orthologous insertions and nested insertions within primate-specific retrotransposons [28
]. However, their comparative genomics analysis was more limited than that presently reported and did not include Eulipotyphla and Chiroptera [28
]: thus, they may have obtained biased substitution rate estimates. Moreover, Pace and Feschotte did not rule out the possibility that some of the analysed transposon families were actually older and that they were analysing just primate-specific transposon clades [28].
The phylogenetic analysis of the 8588 repeat sequences failed to retrieve a meaningful clustering pattern, i.e., one based on host species (or on another taxonomic level) or on repeat variant (MER6 vs. MER6A). The use of either the Jukes–Cantor or the GTR model, the latter being more parameter-rich than the former, did not result in a different clustering pattern, suggesting that the model choice does not significantly affect the analysis. The lack of phylogenetic structure is even more striking considering the structural difference between MER6A elements from bats and eulipotyphlans. The absence of phylogenetic signal could be explained in different, alternative ways. Long-term inactivity could lead to the accumulation of random mutations in existing TE copies, blurring the phylogenetic signal. In the present analysis there was no evidence of recent activity: in fact, if the calculated substitution rate is applied to the youngest copies found among all genomes (>12% of Jukes–Cantor divergence from the consensus), a rough age estimate of about 54.5 Mya is obtained for the end of the wave of MER6 element activity. Generally speaking, DNA elements are poorly represented in mammalian genomes and, in agreement with the present rough estimate, previous data indicated that their activity ceased almost completely in the anthropoid lineage around 40 Mya [28
]. It should be noted, though, that clear signs of recent activity have been found in the genomes of the primate Microcebus murinus and the microbat Myotis lucifugus, even if this evidence refers to DNA elements other than MER6 [30
]. An alternative view is that some form of selection is acting on these repeats, maintaining a substantial homogeneity, but the extent of variability scored among their sequences actually suggests the opposite. Overall, data presented here suggest that MER6 and MER6A were active during the early phase of Boreoeutheria diversification, and that the two repeats differentially invaded mammalian genomes.
The anatomical dissection of MER6 element structure further depicts a complex scenario for their origin and evolution. Based on terminal inverted repeat similarity, the parental autonomous Tc1/Mariner element was identified as the Mariner-2_CPB element from the genome of the turtle C. picta. Moreover, the source of the V HCD was also identified, as a sequence fragment homologous to a nearly full-length fish-specific V-SINE, namely Ac1, was found within MER6 elements. This evidence suggests that MER6 first emerged as an internal deletion derivative of a Tc1/Mariner element homologous to Mariner-2_CPB, and that an Ac1-like element then retrotransposed into the new MITE (Figure 6).
While the chimeric origin of a TE is not unexpected [35
], the absence of the parental components of MER6 elements from amniote genomes is striking. In fact, the possible chimeric origin clearly requires that both the Mariner and the SINE components were simultaneously present in the same genome at the moment of the chimera assembly. On the other hand, this element, or any other related element, was not found in the mammalian genome (with the exception of obvious sequence similarities at the transposase protein coding region level). The taxonomic distribution of MER6 elements suggests that they most likely originated within the ancestral Boreoeutheria lineage or, at least, during the differentiation of Eutheria (Figure 5
). However, it cannot be excluded that they simply went extinct in the other mammalian lineages, as well as in birds and other reptiles. In this case, an alternative evolutionary scenario could be that both the Tc1/Mariner and the SINE elements were present in the reptile genome, where the ancestral MER6 sequence was assembled. However, neither Ac1-like elements nor the V HCD hosted in other SINE families were found in the turtle or in other reptile genomes. Yet, the clear-cut divergence between Ac1-like SINEs and their homologous region within MER6 elements points either to an ancestral split, before the SINE replication wave occurred in bony fishes, or to a past horizontal transfer event from an as yet unidentified fish species to the early amniote lineage.
Although the present analysis cannot provide direct evidence of the origin of MER6 elements, the identification of the original structural components of these MITEs highlights a unique, complex evolutionary history that developed across vertebrates.
It is worth noting that about 40% of MER6 element insertions were found within genes or in their flanking regions (±5000 bp) in all assayed genomes, with the exception of C. syrichta, which has a significantly lower proportion of insertions within genic regions. This difference is difficult to explain without invoking an ad hoc hypothesis, such as some selective pressure preventing the conservation of MER6 element insertions in this particular genome. On the other hand, a technical bias cannot be excluded. The C. syrichta genome is, in fact, by far the most fragmented among those analysed here (Supplementary Table S2): therefore, it is likely that several selected flanking regions (and, possibly, also some genes) were only partially recovered, unlike in more contiguous genomes. Apart from the C. syrichta genome, though, the similar proportion of MER6 element insertions across primates, chiropterans and eulipotyphlans appears to indicate the absence of differential selective pressure against or in favour of insertion within genes. A more detailed analysis of the presence of MER6 element insertions within transcripts revealed several mRNAs and lncRNAs including these MITEs. Interestingly, MER6-homologous regions constitute a portion of the protein-coding region in two mRNAs. This suggests that MER6 sequences underwent exaptation, an event known to produce evolutionary novelties and already observed for both DNA transposons and retrotransposons (reviewed in [2
]). However, apart from these two examples, the remaining MER6 element insertions were found in the 5′ and 3′ UTRs. In a survey of V-SINEs in fish genomes, these elements were frequently found in mRNAs and, therefore, it has been hypothesized that the V HCD could have some regulatory function [23
]. A similar hypothesis has been also suggested for CORE and Deu HCDs found in the elephant shark’s SINEs [24
]. It is, therefore, possible that the presence of MER6/MER6A insertions in mRNAs could have a similar role.
The V HCD has never been found in amniote genomes as a part of active V-SINE elements [10
]: in fact, the V-SINE included in MER6 MITEs lost its head module, which prevents the transcription and, therefore, the independent replication of the retroelement itself [4
]. The replication of the analysed MITEs helped to distribute the V HCD within primate, chiropteran and eulipotyphlan lineages, contributing to its spread within these genomes. On the other hand, in the majority of analysed mammalian lineages, including primates, DNA elements are generally less represented and less active [28
]. Therefore, the genomic distribution of the V HCD within mammalian genomes appears more limited than that observed in non-amniote genomes.
A metazoan-wide survey of HCDs distribution, including the V one, suggested that their high conservation cannot be attributed to horizontal transfer events: on the contrary, phylogenetic and age vs. divergence analyses indicated a pattern of vertical inheritance [16
]. Our data on the taxonomic distribution and evolutionary history of MER6 MITEs seem to confirm this pattern and provide evidence of the widespread occurrence and conservation of the V HCD in three different mammalian lineages. The presence of MER6 insertions within genes and within mRNAs could suggest functional roles that are worth investigating further in future analyses. Overall, the data presented here clearly show how complex interplays between different TEs can give rise to new forms of elements with the potential to affect the host genome and cell functions.
4. Materials and Methods
Consensus sequences of MER6 and MER6A were downloaded from Repbase Update [25
] (accessed in January 2019). They were used as query sequences to probe 85 representative tetrapod genomes (43 mammals, 21 birds, 13 reptiles and eight amphibians; Supplementary Table S3
) with BLAST [37
], using the blastn algorithm with default parameters and an e-value threshold of <1 × 10⁻¹⁰. Standard BLAST output files were transformed into FASTA alignment files through Mview v. 1.6 [38
] and the query coverage was calculated as the number of nucleotides aligned at each position using the profile function of the R library seqinr [39].
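As an illustration only, the per-position query coverage described above can be sketched in Python. The analysis itself relied on the R library seqinr; the function below is a stand-in rather than the procedure actually used, and its name, the toy alignment rows and the choice of gap characters are assumptions.

```python
from typing import List


def column_coverage(aligned_hits: List[str], gap_chars: str = "-.") -> List[int]:
    """Count, for each alignment column, how many hits carry a nucleotide.

    `aligned_hits` are equal-length rows of a query-anchored alignment
    (assumed here to be padded with '-' or '.' where a hit does not align).
    """
    if not aligned_hits:
        return []
    length = len(aligned_hits[0])
    if any(len(row) != length for row in aligned_hits):
        raise ValueError("all aligned rows must have the same length")

    coverage = [0] * length
    for row in aligned_hits:
        for position, symbol in enumerate(row):
            if symbol not in gap_chars:
                coverage[position] += 1
    return coverage


# Toy example with three hypothetical hits over a 6-column query.
hits = ["ACGT--", "-CGTAA", "--GT--"]
print(column_coverage(hits))  # [1, 2, 3, 3, 1, 1]
```

Plotting the resulting counts along the query would show which regions of the consensus are most frequently recovered in a given genome.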
MER6 and MER6A copies were obtained after running RepeatMasker [40
] on genome assemblies, with default parameters. Only positive hits covering >50% of the query sequence were taken into account, in order to avoid overlapping similarities between MER6 and MER6A. Then, to refine the copy number estimation, the Onecodetofindthemall.pl script [41
] was used to merge adjacent fragments and isolate each copy from the respective genome (using the built-in --fasta
The putative parental autonomous element was searched for using BLAST, with parameters as described above, in the Tc1/Mariner collection found in Repbase (accessed in January 2019). Moreover, the internal region carrying the V HCD was compared with the SINE collection found in Repbase (accessed in January 2019) using the same search method.
SINEs matching the MER6 SINE-homologous region were retrieved by searching the GenBank nucleotide database (nt/nr; accessed in August 2019) using a BLAST search as described above.
Sequence alignments were performed using MAFFT v.7 [42
], using the FFT-NS-1 parameter set. The analysis of sequence divergence from the relative species-specific consensus sequence was used to tentatively estimate the MITEs activity through time, considering the divergence a proxy for time since the insertion. According to this principle, highly divergent copies from the consensus indicate past activity, while less divergent copies would indicate a more recent activity [43
]. The divergence was estimated using the Jukes–Cantor nucleotide substitution model, which accounts for possible multiple substitutions. Phylogenetic analyses were carried out using FastTree v. 2.1.11 [44
] with default parameters, setting either the Jukes–Cantor or the GTR (General Time Reversible) nucleotide substitution models with the CAT approximation for the among site variation.
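The Jukes–Cantor correction mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function name is hypothetical, and skipping columns that contain gaps or ambiguous bases is an assumption. The observed proportion of differing sites p is converted into the distance d = −(3/4) ln(1 − 4p/3), which accounts for multiple substitutions at the same site.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned sequences of equal length.

    Columns where either sequence has a gap or a non-ACGT symbol are ignored
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")

    p = mismatches / compared
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One mismatch over ten sites: p = 0.1, corrected distance slightly above 0.1.
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
print(round(d, 4))
```

The corrected distance is always at least as large as the raw proportion of differences, and the gap between the two grows as copies become more divergent from the consensus.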
MER6 elements within genes or their ±5000 bp flanking regions were identified by analysing the overlap between insertions and gene annotations, obtained from GenBank, through BEDTools v. 2.17 [45
] using the intersectBed
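The logic of the overlap test can be sketched in a few lines of Python. This is an illustration of the idea, not a replacement for BEDTools, which was the tool actually used; the function name, the coordinate convention (0-based, half-open, as in BED files) and the toy coordinates are all assumptions.

```python
from typing import List, Tuple

# (chromosome, start, end), 0-based half-open coordinates as in BED files.
Interval = Tuple[str, int, int]


def insertions_near_genes(
    insertions: List[Interval], genes: List[Interval], flank: int = 5000
) -> List[Interval]:
    """Return insertions overlapping a gene extended by `flank` bp on each side."""
    hits = []
    for chrom, start, end in insertions:
        for gene_chrom, gene_start, gene_end in genes:
            if chrom != gene_chrom:
                continue
            window_start = max(0, gene_start - flank)
            window_end = gene_end + flank
            # Two half-open intervals overlap if each starts before the other ends.
            if start < window_end and end > window_start:
                hits.append((chrom, start, end))
                break  # count each insertion once
    return hits


# Hypothetical coordinates: one gene, five candidate insertions.
genes = [("chr1", 10000, 20000)]
insertions = [
    ("chr1", 4000, 4800),    # upstream of the flanking window
    ("chr1", 5500, 6000),    # inside the 5' flank
    ("chr1", 24900, 25100),  # straddles the end of the 3' flank
    ("chr1", 26000, 26500),  # downstream of the flanking window
    ("chr2", 12000, 12500),  # different chromosome
]
print(insertions_near_genes(insertions, genes))
```

The nested loop is quadratic and only suitable for small examples; dedicated interval tools work on sorted or indexed data and scale to whole-genome annotations.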
Mammalian phylogeny and time estimates of cladogenetic events were obtained from TimeTree.org (last accessed on August 2019) [46
]. The substitution rate of MER6 elements was derived from the formula T = D/s, where T is the time of the cladogenetic event, D is the divergence from the consensus and s is the substitution rate [47
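The relationship between time, divergence and substitution rate can be made concrete with a short Python sketch. It assumes the relation T = D/s between the three quantities defined above, which is a reading of the text rather than a confirmed formula; the function names are hypothetical. The example uses a divergence of 12% and, as a further assumption, a rate of 2.2 × 10⁻⁹ substitutions/site/year, a value within the range reported in this work.

```python
def substitution_rate(divergence: float, time_years: float) -> float:
    """Rate s (substitutions/site/year) from divergence D and time T, s = D / T."""
    if time_years <= 0:
        raise ValueError("time must be positive")
    return divergence / time_years


def insertion_age(divergence: float, rate: float) -> float:
    """Age T (years) from divergence D and rate s, T = D / s."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return divergence / rate


# Assumed inputs: 12% divergence from the consensus, 2.2e-9 subs/site/year.
age_years = insertion_age(0.12, 2.2e-9)
print(f"{age_years / 1e6:.1f} Mya")  # 54.5 Mya
```

With these inputs the sketch returns an age of about 54.5 Mya, in line with the rough estimate given for the end of MER6 element activity.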