Article

Of Short Interspersed Nuclear Elements, Long Interspersed Nuclear Elements and Leeches: Identification and Molecular Characterization of Transposable Elements in Leech Genomes

by
Christian Müller
Animal Physiology, Zoological Institute and Museum, University of Greifswald, Felix-Hausdorff-Str. 1, D-17489 Greifswald, Germany
Submission received: 25 March 2025 / Revised: 11 May 2025 / Accepted: 15 May 2025 / Published: 10 June 2025

Abstract

Background/Objectives: Mobile genetic elements (MGEs) in general, and transposable elements (TEs) in particular, constitute a major part of almost every eukaryotic genome, and several types of such elements have been classified based on size, genetic structure and transposition intermediate. Methods: The fast-growing availability of whole genome sequences of species across the living world provides almost unlimited possibilities for in-depth molecular analyses of all kinds, including the search for TEs. The aim of the present study was to provide the first molecular description and characterization of selected MGEs in leeches, namely, short interspersed nuclear element (SINE), long interspersed nuclear element (LINE) and long terminal repeat (LTR) retrotransposons. Results: Several representatives of all three groups of TEs were identified, and some of the newly described elements display unique structural features compared to the archetype elements of the respective groups. Conclusions: Non-model organisms like leeches are an excellent source of new information on long-studied objects like TEs and may provide new insights into the diversity and putative biological impact of these MGEs.


1. Introduction

Transposable elements (TEs) are present in almost every complex eukaryotic genome and comprise up to 20% of the total genome size in fungi, up to 50% in metazoans and up to 90% in plants [1]. TEs, hence, not only shape the structure of a genome but may also change it due to their ability to move and replicate. As a consequence, TEs contribute to genomic plasticity and are major drivers of genome evolution [2]. As such, their mobility may result in beneficial (gain or enhancement of traits) or detrimental (loss or change of traits, disease development) outcomes for the affected cell and the whole individual.
TEs have been classified according to their transposition intermediate (RNA or DNA) as class I elements (retrotransposons) and class II elements (DNA transposons) [3]. Later, Wicker et al. [1] introduced a unified classification system for eukaryotic TEs that includes subclasses, orders, superfamilies, families and subfamilies. Elements transpose either autonomously or non-autonomously, depending on whether or not they encode the proteins that catalyze the transposition event [2,4]. Autonomous non-long terminal repeat (non-LTR) retrotransposons include the superfamily of long interspersed nuclear elements (LINEs), whereas members of the superfamily of short interspersed nuclear elements (SINEs) transpose non-autonomously and depend on the trans activity of their respective LINE counterparts for mobility [5,6]. To be mobilized, SINEs have to share a 3′ end sequence with their corresponding LINE. The shared sequence can be highly specific, forming a stringent SINE/LINE pair, or an unspecific polyA tail, forming a relaxed SINE/LINE pair [7,8]. PolyA tails are usually generated during mRNA synthesis by RNA polymerase II (pol II) but can also be generated by RNA polymerase III (pol III) during the transcription of the respective SINEs. The latter process requires the presence of a polyadenylation signal (pAS, AATAAA) and a pol III terminator sequence (TCTTT) within the SINE sequence [8,9,10]. SINEs carrying both signals belong to the T+-class, whereas SINEs lacking both signals belong to the T-class and already contain an A-rich tail [11]. Both T+- and T-class SINEs can hence “cooperate” with various LINEs as long as the respective LINE itself encodes a polyA tail.
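The T+/T- distinction described above rests on a few sequence features that are easy to check programmatically. The following Python sketch is not the procedure used in this study; it simply scans a candidate SINE for the pAS (AATAAA) and pol III terminator (TCTTT) named above and measures the terminal A run. The function name `sine_tail_features` and the `min_a_tail` threshold of 10 are illustrative assumptions, and for simplicity the scan covers the whole sequence rather than only the 3′ region.

```python
import re

PAS = "AATAAA"       # polyadenylation signal (pAS)
POL3_TERM = "TCTTT"  # pol III terminator motif


def sine_tail_features(seq: str, min_a_tail: int = 10) -> dict:
    """Report the 3' signals that separate T+- from T-class SINEs.

    min_a_tail is an illustrative cutoff, not a published value.
    """
    s = seq.upper()
    has_pas = PAS in s
    has_term = POL3_TERM in s
    tail = re.search(r"A+$", s)  # uninterrupted A run at the very 3' end
    a_tail = len(tail.group()) if tail else 0
    a_rich = a_tail >= min_a_tail
    if has_pas and has_term:
        sine_class = "T+"
    elif not has_pas and not has_term and a_rich:
        sine_class = "T-"
    else:
        sine_class = "unclear"
    return {
        "pAS": has_pas,
        "pol3_terminator": has_term,
        "terminal_A_run": a_tail,
        "A_rich_tail": a_rich,
        "class": sine_class,
    }
```

An element that carries both signals and, in addition, a long A-rich tail would be reported as `"T+"` with `A_rich_tail=True`, which is exactly the hybrid situation discussed for HmSINE_V2 later in the article.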
LINEs are about 3–6 kb in size and comprise one or more open reading frames (ORFs) that code for at least two proteins facilitating reverse transcription and transposition, a reverse transcriptase (RT) and an endonuclease; additional domains, such as an RNase H (RH), may be present. The presence and arrangement of these domains form the basis for LINE classification [1]. In addition to the domains, several sequence motifs are frequently present in LINE-encoded proteins, including the CCHC zinc finger knuckle and an RNA recognition motif (RRM) [12,13]. A full RRM, in turn, is composed of two short motifs, known as the ribonucleoprotein (RNP) motifs RNP1 and RNP2 [14]. SINEs, in contrast, are much smaller (approx. 80–500 bp in size) [1,15] and are composed of a head structure that contains a pol III promoter region, a central core domain and a LINE-related segment [16]. SINE promoters can be derived from tRNA, 5S rRNA or 7SL RNA genes [15]. As a basic rule, both LINEs and SINEs generate target site duplications (TSDs) of variable length upon insertion that can be used to annotate and classify the respective element [17,18], but TSDs may mutate and degrade over time [19].
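The CCHC knuckle and the two RNP motifs mentioned above are short enough to be expressed as regular expressions. The sketch below uses commonly cited consensus patterns (CCHC: C-X2-C-X4-H-X4-C; RNP1: [RK]-G-[FY]-[GA]-F-V-X-[FY]; RNP2: [ILV]-[FY]-[ILV]-X-N-L); these patterns and the function name `scan_motifs` are assumptions for illustration. Because real motifs are degenerate, a strict scan like this will miss divergent copies that profile-based tools would still detect.

```python
import re

# Commonly cited consensus patterns (strict; real motifs are more degenerate)
MOTIFS = {
    "CCHC": r"C.{2}C.{4}H.{4}C",      # zinc finger knuckle C-X2-C-X4-H-X4-C
    "RNP1": r"[RK]G[FY][GA]FV.[FY]",  # RRM ribonucleoprotein motif 1
    "RNP2": r"[ILV][FY][ILV].NL",     # RRM ribonucleoprotein motif 2
}


def scan_motifs(protein: str) -> list[tuple[str, int, str]]:
    """Return (motif, 1-based start, matched residues), sorted by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        # wrapping the pattern in a lookahead also reports overlapping matches
        for m in re.finditer(f"(?=({pattern}))", protein.upper()):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])
```

For example, `scan_motifs("AAKGFGFVRFAA")` reports an RNP1 match starting at residue 3.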
Leeches (Figure 1) belong to the phylum Annelida (segmented worms) and the class Clitellata. They are distributed globally, with the exception of Antarctica, and about 700 leech species have been described so far [20]. However, the actual diversity of leeches might be much higher [21]. Whereas some leeches are predators, others are hematophagous and require regular blood meals for growth, development and reproduction [22]. Leeches have been used for medical purposes for thousands of years in many cultures worldwide [23]. In 2004, leeches were approved as a medical device by the US Food and Drug Administration (FDA). In Germany, leeches for medical purposes are listed as drugs. To ensure an undisturbed and saturating blood meal, leeches secrete a great variety of bioactive substances into the bite, among them factors that interfere with the coagulation cascade, inhibit inflammation and prevent pain sensation [24,25]. Despite this great biopharmaceutical potential, the thrombin inhibitor hirudin is the only leech-derived compound that has found its way from nature to clinical application [26]. In recent years, the whole genome sequences of a few leech species have been determined, namely, Helobdella robusta Shankland, Bissen and Weisblat, 1992 [27], Hirudinaria manillensis Lesson, 1842 [28], H. medicinalis [29,30], Whitmania pigra Blanchard, 1887 [31] and Hirudo verbana Carena, 1820 [32]. A detailed analysis of the H. medicinalis and H. manillensis genomes using the RepeatMasker and RepeatModeler pipelines revealed the presence of a variety of putative TEs, including DNA transposons, LTR retrotransposons, LINEs and SINEs, with copy numbers ranging from three to several thousand per element (Supplementary Information S1 in [29]; Supplementary Materials Table S2 in [28]). According to Zhao et al. [33], repeat elements make up about 30% of the total genome size of Hirudo nipponia Whitman, 1886, and Hirudo tianjinensis Liu, sp. nov.
However, a more detailed analysis and in-depth characterization of leech-derived putative TEs are still missing. In the present study, the author describes the identification and molecular characterization of SINEs that transposed into hirudin genes of H. verbana and H. manillensis and analyzes their relationship to other TEs (SINEs, LINEs and LTR elements) of different leech taxa. The aim of this study was, hence, to obtain a first impression of the presence, distribution and molecular structure of different types of TEs in leeches.

2. Methods and Materials

2.1. Genome and Transcriptome Data

Leech genome data for H. robusta (GenBank accession number GCA_000326865.1), H. manillensis (GCA_034509925.1), H. medicinalis (GCA_011800805.1), W. pigra (GCA_041430665.1) and H. verbana (GCA_020137395.1) are freely accessible and searchable through the NCBI database. Available transcriptome data were used to complement the genome-based investigations when necessary.

2.2. Sources of Reference Sequences

The following references were used to identify the molecular signatures of TE-related domains and motifs:
| Domain/motif | Reference |
|---|---|
| Apurinic endonuclease (APE) | [34,35] |
| Aspartic protease (AP) | [36,37] |
| Integrase (IN) | [38,39] |
| Restriction-like endonuclease (RLE) | [35] |
| Reverse transcriptase (RT) | [38,40,41,42] |
| RNA recognition motif (RRM) | [12,43] |
| RNase H (RH) | [44,45,46] |
| Tyrosine recombinase (YR) | [47,48,49] |
| Zinc finger knuckle motif (CCHC) | [50] |

2.3. Bioinformatics Tools

Basic Local Alignment Search Tool (BLAST) analyses were performed using the NCBI web portal with the following settings for nucleotide (blastn) searches: expect threshold, 0.05; word size, 11.
Multiple sequence alignments were generated using the CLC Sequence Viewer software package v8.0 (CLC bio) and the following settings: gap open cost: 5.0; gap extension cost: 2.0; and end gap cost: free.
Phylogenetic trees were generated using the CLC Sequence Viewer software package v8.0 and the following parameters: tree construction method: Unweighted Pair Group Method with Arithmetic mean (UPGMA) algorithm [51]; nucleotide distance measure: Jukes–Cantor model [52]; and replicates for bootstrap analysis: 10,000.
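The distance-based approach listed above can be sketched in a few lines of Python. This is not the CLC implementation, whose internals are not described here; it is a minimal illustration of the Jukes–Cantor correction, d = −(3/4)·ln(1 − (4/3)·p), where p is the proportion of differing sites, of UPGMA clustering, and of drawing one bootstrap replicate by resampling alignment columns with replacement. Skipping gapped columns and all function names are assumptions of this sketch.

```python
import math
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the JC correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(names, dist) -> str:
    """UPGMA clustering; dist maps (name_x, name_y) pairs to distances. Returns Newick."""
    key = lambda x, y: (x, y) if x < y else (y, x)
    d = {key(x, y): v for (x, y), v in dist.items()}
    size = {n: 1 for n in names}
    height = {n: 0.0 for n in names}
    active = set(names)
    while len(active) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        h = dab / 2  # ultrametric height of the new node
        new = f"({a}:{h - height[a]:.4f},{b}:{h - height[b]:.4f})"
        for c in active - {a, b}:
            # size-weighted average distance from the merged cluster to c
            d[key(new, c)] = (size[a] * d[key(a, c)] + size[b] * d[key(b, c)]) / (size[a] + size[b])
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        active -= {a, b}
        active.add(new)
        size[new], height[new] = size[a] + size[b], h
    return active.pop() + ";"


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """One bootstrap pseudo-alignment: resample columns with replacement."""
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[i] for i in cols) for seq in alignment]
```

A full bootstrap analysis would repeat `bootstrap_replicate`, recompute all pairwise distances and the UPGMA tree for each replicate (10,000 times in this study) and count how often each clade reappears.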
Putative TSDs were identified using the Web-based tool “Repeats Finder for DNA/Protein Sequences” (https://www.novoprolabs.com/tools/repeats-sequences-finder) (accessed several times between August 2022 and August 2024).
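The TSD search was carried out with a web tool; the principle behind it can be illustrated with a minimal sketch that looks for the longest exact sequence shared by the two flanks of an annotated insertion. The window size, the minimum length and the function names are assumptions chosen for illustration, and exact matching ignores the mutational decay of older TSDs mentioned in the Introduction.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest exact substring shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def candidate_tsd(genome: str, start: int, end: int, window: int = 25, min_len: int = 5):
    """Look for a direct repeat flanking an element at genome[start:end] (0-based, half-open)."""
    left = genome[max(0, start - window):start].upper()
    right = genome[end:end + window].upper()
    tsd = longest_common_substring(left, right)
    return tsd if len(tsd) >= min_len else None
```

Short repeats can occur in random flanks by chance, so any hit from such a scan is only a candidate that still needs inspection.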

3. Results

3.1. Identification and Characterization of HvSINE1

In previous studies, we determined the gene structures of several hirudin and hirudin-like factor (HLF) genes, including hirudin variants HV1, HV2 and HV3 of Hirudo medicinalis [53] and Hirudo verbana [54]. In all cases, the genes shared a highly conserved structure, not only in exon and intron number but also in terms of their position and size. Our findings were confirmed once whole genome data of H. medicinalis became available [29,30]. Recently, whole genome data of H. verbana became accessible via GenBank (BioProject PRJNA55103, GenBank accession number GCA_020137395.1), and a detailed analysis revealed remarkable differences in the structure of the HV1 gene of the particular H. verbana biosample used by Paulsen et al. [32] compared both to the biosamples of our own studies (GenBank accession numbers KX215734.1 and KX215735.1 for H. verbana and KR066930.1 and KR066931.1 for H. medicinalis) and to those investigated by Kvist et al. [29] and Babenko et al. [30]. The sizes of introns 2 and 3 differ by 210 and 25 bp, respectively (Table 1). Since the observed differences are limited to introns 2 and 3, whereas sequencing errors (e.g., due to the higher error rates of NGS approaches compared to Sanger sequencing) would be expected to affect exons and introns alike, sequencing-technology-based explanations seem very unlikely.
Whereas the alterations in intron 3 seem to be randomly distributed, a multiple sequence alignment revealed the presence of an additional continuous sequence stretch in intron 2 (Figure 2, marked in cyan in Supplementary Material Figure S1).
A nucleotide BLAST search (blastn) against the whole genomes of both H. verbana and H. medicinalis using the sequence of the putative TE revealed thousands of copies in both leech species. The best 50 hits within the genome of H. verbana were extracted and aligned to determine the consensus sequence of the putative TE (Supplementary Material Figure S2). Strikingly, no evidence of TSDs could be found.
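Deriving a consensus from the aligned top hits follows a simple column-wise majority rule. The sketch below shows one common variant, not necessarily the rule applied by the alignment software used here: the threshold of more than 50%, the use of N for undecided columns and the removal of gap-majority columns are assumptions.

```python
from collections import Counter


def majority_consensus(aligned: list[str], min_fraction: float = 0.5, gap: str = "-") -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*aligned):
        base, count = Counter(c.upper() for c in column).most_common(1)[0]
        if base == gap:
            continue  # gap-majority column: an insertion present in only a minority of copies
        consensus.append(base if count / len(column) > min_fraction else "N")
    return "".join(consensus)
```

For three aligned copies `["ACGT-", "ACGTA", "ACTT-"]`, this yields `"ACGT"`.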
To determine which class of TEs the element might belong to, the promoter sequences of the tRNA, 5S and 7SL genes of H. verbana or H. medicinalis were determined and compared with the putative TE. The predicted promoter sequence does not perfectly match any of the archetype sequences, but its structure (Box 1 and Box 2), its sequence and the spacing between the boxes strongly point to a tRNA-gene-derived promoter (Table 2).
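The assignment of a tRNA-derived promoter relies on locating the two internal pol III boxes and measuring their spacing. The following sketch scores every window against the commonly cited consensus sequences of the tRNA A box (TRGCNNAGYNGG) and B box (GGTTCGANTCC), allowing a few mismatches. The consensus strings, the mismatch cutoff and the function names are assumptions for illustration and were not taken from Table 2.

```python
# Commonly cited consensus sequences of the internal tRNA pol III promoter boxes
A_BOX = "TRGCNNAGYNGG"
B_BOX = "GGTTCGANTCC"

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def best_window(seq: str, motif: str, start: int = 0):
    """(0-based position, mismatches) of the best-matching window at or after start."""
    best = None
    for i in range(start, len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mm = sum(base not in IUPAC[code] for base, code in zip(window, motif))
        if best is None or mm < best[1]:
            best = (i, mm)
    return best


def find_pol3_boxes(seq: str, max_mismatches: int = 2):
    """Locate an A box followed by a B box and report the spacing between them."""
    s = seq.upper()
    a = best_window(s, A_BOX)
    if a is None or a[1] > max_mismatches:
        return None
    b = best_window(s, B_BOX, a[0] + len(A_BOX))  # the B box must lie downstream
    if b is None or b[1] > max_mismatches:
        return None
    return {"A_box": a, "B_box": b, "spacing": b[0] - (a[0] + len(A_BOX))}
```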
Taken together, the putative TE very likely represents a SINE and was hence termed HvSINE1.

3.2. Identification and Characterization of HvSINE2–4

Using the core domain of HvSINE1 as the query sequence, additional nucleotide BLAST (blastn) searches revealed evidence for the presence of related SINEs in the H. verbana genome. The respective elements were termed HvSINE2–4; their nucleotide sequences, a multiple sequence alignment and schematic drawings of all four elements are provided in Supplementary Material Figure S3. The four SINEs share overall sequence similarities of 35–77%, with HvSINE4 being the most distinct member of the family (Figure S3). All four elements contain tRNA-gene-derived promoters (Table 2) and share a common core domain of 55 bp (underlined in Figure S3) but differ in their putative LINE-specific segments. Strikingly, only HvSINE4 contains a short repeat sequence at the 3′ end (highlighted in Figure S3), which is otherwise a typical feature of SINEs [16].
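Pairwise similarity values such as the 35–77% range reported for HvSINE1–4 depend on how gaps are treated. The sketch below computes percent identity from an existing alignment, counting every column in which at least one sequence has a nucleotide; this convention and the function names are assumptions, and other tools may use different denominators.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical positions divided by columns in which at least one sequence has a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == gap and y == gap)]
    same = sum(x == y and x != gap for x, y in cols)
    return 100.0 * same / len(cols)


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise percent identity for every pair of named, aligned sequences."""
    return {
        (n1, n2): percent_identity(s1, s2)
        for (n1, s1), (n2, s2) in combinations(aligned.items(), 2)
    }
```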
The abundances of HvSINE1–4 differ markedly in the genomes of both H. verbana and H. medicinalis: whereas HvSINE1 is present in very high numbers, HvSINE2 and HvSINE3 each occur as a single copy only. For HvSINE4, 21 copies in H. verbana and 14 copies in H. medicinalis cover the entire sequence, whereas about 200 copies in each genome contain the head and core domains but lack the putative LINE-specific segment (Table 3).
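Distinguishing full-length copies from copies that retain only the head and core domains amounts to classifying BLAST hits by their query coverage. In this sketch, hits are given as 1-based query coordinates, as in tabular BLAST output; the coverage cutoff, the tolerance and the position where the core domain ends (`core_end`) are hypothetical parameters that would have to be set from the actual element.

```python
from collections import Counter


def classify_hit(qstart, qend, query_len, core_end, min_cov=0.9, tol=10):
    """Assign one BLAST hit to a coverage class (1-based query coordinates)."""
    lo, hi = min(qstart, qend), max(qstart, qend)
    if (hi - lo + 1) / query_len >= min_cov:
        return "full-length"
    if lo <= 1 + tol and hi >= core_end - tol:
        return "head+core"  # starts at the 5' end and reaches the end of the core
    return "other"


def tally(hits, query_len, core_end):
    """Count hits per coverage class."""
    return Counter(classify_hit(s, e, query_len, core_end) for s, e in hits)
```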

3.3. Tissue-Specific Expression of HvSINE Sequences

Like all retrotransposons, SINEs transpose via a “copy–paste mechanism”, including transcription of the element [11]. It should, hence, be possible to detect the respective SINE sequences in transcriptome datasets as well. Several tissue-specific transcriptome datasets of either H. verbana or H. medicinalis, including muscle (SRX3875125), salivary gland (SRX3875124), central nervous system (CNS) (SRX3742574), ganglion (SRX9699081, SRX9699082, SRX9699083) and head (SRX5257616), were analyzed. Both HvSINE1 and HvSINE4 sequences could be detected in all datasets, whereas the expression of HvSINE2 and HvSINE3 seemed to be restricted to neuronal tissue (Table 4).

3.4. Presence of HvSINE1-like Elements in Leech and Annelid Species

Nucleotide BLAST (blastn) searches in genome and/or transcriptome datasets of various leech and annelid species using HvSINE1 as the query sequence indicated that HvSINE1-like elements are restricted to merely a handful of Eurasian members of the family Hirudinidae. Among them are two non-hematophagous leeches, namely, Haemopis sanguisuga Linnaeus, 1758, and Whitmania pigra. As in H. verbana and H. medicinalis, several distinct SINEs could be identified in each leech species. The sequence data for all elements are summarized in Supplementary Material Figure S4 (Hirudinaria manillensis), Figure S5 (W. pigra), Figure S6 (H. sanguisuga) and Figure S7 (Hirudo nipponia). Detailed information about the phylogenetic relationships among the selected leech and annelid species can be obtained from Phillips and Siddall [58] and Phillips et al. [59].

3.5. Phylogenetic Analyses Based on the HvSINE1 Sequence

The presence of SINEs in almost all vertebrate and invertebrate taxa makes them promising candidates as markers for molecular phylogeny and systematics [60,61,62]. To assess whether leech SINEs might be useful tools for phylogenetic analyses as well, trees were constructed based either on partial cytochrome c oxidase subunit I (coi) nucleotide sequences, which are commonly used for DNA barcoding, or on HvSINE1-like sequences. The best matches to HvSINE1 in every leech species were selected and included in the analysis. The coi sequence of Lumbricus terrestris Linnaeus, 1758, was chosen as the outgroup for the coi-derived tree. Since L. terrestris does not contain an HvSINE1-like element (see Table 5), the sequence of HvSINE4 was selected as the outgroup for the HvSINE1-derived tree. Both multiple sequence alignments and the original trees are provided in Supplementary Material Figure S17. The resulting trees were manually redrawn to illustrate the basic principles, not to display the actual distances. As can be seen in Figure 3, the topologies of the two trees do not agree well. To evaluate and possibly improve the results, corresponding trees were constructed using the coi and SINE1 sequences of Crassostrea (Magallana) gigas Thunberg, 1793 (additional file 10: Supplementary Data S1 of Martelossi et al. [63] for the SINE1 sequence), as outgroups instead of HvSINE4 and the coi sequence of L. terrestris. The resulting trees, however, had exactly the same topology and almost the same bootstrap values as the original trees (the multiple sequence alignments and the original trees are provided in Supplementary Material Figure S17). The information content of SINE nucleotide sequences might, hence, be too limited to draw reliable conclusions compared to traditional molecular markers, like coi sequences. Instead, the presence/absence patterns (see Table 5) may provide more robust information.
Taken together, SINEs might be considered useful additional molecular markers for phylogenetic analyses in leeches.

3.6. Identification and Characterization of HmSINE_V2

In a recent manuscript, we described the identification of a Tandem-hirudin (TH), including the corresponding gene, in H. manillensis [64]. In contrast to the archetype hirudin gene, the TH gene is composed of six exons and five introns. Within the fifth exon (565 bp in size), an unusual stretch of 18 thymine residues, which corresponds to a polyA tail in the reverse-complementary orientation, caught our attention. SINEs of the T-class contain an A-rich tail, and a thorough analysis of exon 5 of the TH gene indeed revealed strong evidence for the presence of yet another SINE in H. manillensis. The putative TE is very different from the HvSINE1-like elements described above and was termed HmSINE_V2; its sequence and a schematic drawing are provided in Supplementary Material Figure S8. About 50 copies of HmSINE_V2 are present in the genome of H. manillensis. Interestingly, searches against the genomes of H. verbana/medicinalis and W. pigra revealed similar elements that, however, did not cover the entire sequence of HmSINE_V2 but started only at position 151 (the respective matching sequence is underlined in Figure S8). Another curious aspect of HmSINE_V2 is the presence of both a pAS and a pol III terminator sequence in addition to the A-rich tail. The element, hence, combines features of both T+- and T-class SINEs. To the author's knowledge, similar elements have not been described before.
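Recognizing the T stretch in exon 5 as a polyA tail only requires reading the locus on the opposite strand. The sketch below shows the two operations involved, reverse complementation and locating the longest homopolymer run; the function names and the toy fragment used in the example are illustrative and not taken from the TH gene.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_run(seq: str, base: str) -> tuple[int, int]:
    """(length, 0-based start) of the longest homopolymer run of base."""
    best_len, best_start, cur, cur_start = 0, -1, 0, 0
    for i, c in enumerate(seq.upper()):
        if c == base:
            if cur == 0:
                cur_start = i
            cur += 1
            if cur > best_len:
                best_len, best_start = cur, cur_start
        else:
            cur = 0
    return best_len, best_start
```

For a toy fragment `"GGC" + "T" * 18 + "CAG"`, the reverse complement is `"CTG" + "A" * 18 + "GCC"`, so the run of 18 T residues becomes a run of 18 A residues on the opposite strand.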

3.7. Identification of Corresponding LINEs

SINEs are non-autonomous TEs and depend on the trans activity of their respective LINE counterparts for mobility, but the author failed to identify the corresponding LINEs for HvSINE1–3. However, for both HvSINE4 and HmSINE_V2, putative matching LINEs could be identified. The elements were termed HvLINE1 and HmLINE1, respectively; their sequences (nucleotide sequences and the derived amino acid sequences of the predicted ORFs) are provided in Supplementary Material Figures S9 and S10. The elements are about 4.4 kb (HvLINE1) and 3.5 kb (HmLINE1) in size. HmLINE1 is flanked by a putative 8 bp TSD and contains a single ORF that encodes a protein of 946 amino acid residues. For HvLINE1, no putative TSD could be determined. The element contains four ORFs, the first in reverse-complementary orientation. The structures of both elements are represented in Figure 4. HvSINE4 and HvLINE1 share a stretch of 48 bp (double underlined in Supplementary Material Figures S3 and S9), a typical size for LINE-related segments in SINEs [16]. In contrast, the entire sequence of HmSINE_V2 is present in HmLINE1, including the stop codon of the ORF (underlined in Supplementary Material Figure S10).
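Predicting ORFs in a LINE candidate, including ORFs on the reverse-complementary strand such as ORF1 of HvLINE1, can be sketched as a six-frame scan for ATG-to-stop stretches. This is a simplified illustration, not the ORF prediction tool used in the study: it uses the standard genetic code, reports coordinates on the scanned strand (0-based, end-exclusive), ignores ORFs without a stop codon and uses a hypothetical `min_aa` cutoff.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
_COMP = str.maketrans("ACGTN", "TGCAN")


def find_orfs(seq: str, min_aa: int = 100):
    """Six-frame scan for ATG...stop ORFs.

    Returns (strand, frame, start, end, aa_length) tuples; coordinates are
    0-based and end-exclusive on the scanned strand (the reverse complement
    for '-'), and aa_length excludes the stop codon.
    """
    fwd = seq.upper()
    rev = fwd.translate(_COMP)[::-1]
    orfs = []
    for strand, s in (("+", fwd), ("-", rev)):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF; later in-frame ATGs are ignored
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3
                    if aa_len >= min_aa:
                        orfs.append((strand, frame, start, i + 3, aa_len))
                    start = None
    return orfs
```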
Both HvLINE1 and HmLINE1 encode APE, RT and RH domains and a CCHC motif, but only HvLINE1 encodes a full RRM. Strikingly, the APE domain and the two RNP motifs of HvLINE1 are encoded by different ORFs that are oriented in opposite directions (see Figure 4).
Further analyses led to the identification of additional putative LINEs in H. verbana, named HvLINE2–4. The nucleotide and amino acid sequences of HvLINE2–4 are provided in Supplementary Material Figures S11–S13. All four LINEs of H. verbana differ in overall size, ORF number and size, presence and localization of CCHC and RRM motifs and presence of TSDs (Figure 4), highlighting the great diversity of such TEs, even within a single species.
Only HvLINE2 displays the “classical” architecture of a LINE encompassing two ORFs: the first encodes a basic protein (pI 9.41) with three CCHC motifs, and the second encodes a multi-domain protein with putative APE, RT and RH domains and a single C-terminal CCHC motif (Figure 4 and Figure S11). The presence and order of domains permit classification into either the L1 group or the I group (Kojima 2019). HvLINE3 is largely comparable to HvLINE2, with the exception of the first ORF, which is split into two separate ORFs (Figure 4 and Figure S12). It is very likely that the proteins encoded by these two ORFs form a heterodimer: the molecular mass of the putative heterodimer (44.7 kDa) is almost identical to the molecular mass of the protein encoded by ORF1 of HvLINE1 (44.8 kDa). In contrast, HvLINE4 belongs to a different superfamily of LINEs. The element comprises an RLE domain instead of an APE domain, a feature typical of the so-called “early branched non-LTR retrotransposons” [35].
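The heterodimer argument rests on simple arithmetic: the mass of a complex is the sum of the masses of its subunits, and each subunit mass is the sum of its average residue masses plus one water molecule. The sketch below uses standard average residue masses (in Da); the function names are illustrative, and the predicted ORF translations themselves are in the Supplementary Material and are not reproduced here.

```python
# Average residue masses in Da (free amino acid minus one water)
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524


def protein_mass_da(seq: str) -> float:
    """Average molecular mass of one polypeptide chain in Da."""
    s = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
    return sum(RESIDUE_MASS[aa] for aa in s) + WATER


def complex_mass_kda(*chains: str) -> float:
    """Mass of a complex as the sum of its subunit masses, in kDa."""
    return sum(protein_mass_da(c) for c in chains) / 1000
```

Applied to the two ORF products of HvLINE3, `complex_mass_kda(orf1a, orf1b)` would give the heterodimer mass that is compared with the single ORF1 product above.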
For HmLINE1, nucleotide BLAST (blastn) analyses revealed the presence of a similar element in W. pigra, named WpLINE1. In contrast to HmLINE1, WpLINE1 contains four ORFs instead of one. However, ORFs 1–3 most likely belong to a putative LTR retrotransposon (named WpLTRE1) that integrated into WpLINE1, disrupting the “original” single ORF of WpLINE1 and creating a “patchwork” TE (Figure 4). WpLTRE1 itself is flanked by direct repeats of 105 bp (highlighted in purple in Figure 4 and Supplementary Material Figure S14). When WpLTRE1 is removed and the ORF of WpLINE1 is manually reconstructed, the element encodes two putative APE domains (Figure 4 and Figure S14). Both HmLINE1 and WpLINE1 contain an A-rich tail immediately downstream of the pol III terminator sequence (Figures S10 and S14).
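Whether the ends of an insertion such as WpLTRE1 form direct repeats can be checked by comparing its two terminal segments. The sketch below computes ungapped percent identity between the first and last `length` nucleotides of an element and, alternatively, finds the longest exactly identical pair of termini; it assumes the repeat length is roughly known (105 bp for WpLTRE1), the function names are illustrative, and diverged repeats containing indels would need a proper alignment instead.

```python
def terminal_repeat_identity(element: str, length: int) -> float:
    """Ungapped % identity between the 5' and 3' terminal segments of an element."""
    if not 0 < length <= len(element) // 2:
        raise ValueError("length must be positive and at most half the element")
    five, three = element[:length].upper(), element[-length:].upper()
    return 100.0 * sum(a == b for a, b in zip(five, three)) / length


def longest_exact_terminal_repeat(element: str, max_len: int) -> int:
    """Largest L <= max_len for which the first and last L nucleotides are identical."""
    s = element.upper()
    for n in range(min(max_len, len(s) // 2), 0, -1):
        if s[:n] == s[-n:]:
            return n
    return 0
```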

3.8. Abundance of WpLTRs

The entire sequence of WpLTRE1, including the LTRs, was used to evaluate whether additional copies of the element are present in the genome of W. pigra, which turned out to be the case. All copies are of about the same size (approx. 2.2 kb) and comprise the full-length LTRs, confirming that the LTRs are an integral part of WpLTRE1. No target site duplications and no evidence of target site specificity for integration could be observed. The sequences of the “original” WpLTRE1 (named copy 1) and four additional copies (named copies 2–5), including a multiple sequence alignment of the deduced amino acid sequences, are shown in Supplementary Material Figure S15. Strikingly, all copies of WpLTRE1 except copy 1 (the “original”) contained only a single ORF, spanning almost the full length of the element and comprising the three ORFs of copy 1 (see Supplementary Material Figure S15). Hence, copy 1 most likely represents an atypical WpLTRE1, and copies 2–5 likely represent the archetype WpLTRE1. The deduced amino acid sequences of all copies revealed an interesting feature of WpLTRE1. WpLTRE1 copies 2–5 encode a protein of 637 amino acid residues that encompasses a putative YR domain, including the highly conserved RHRY tetrad, a CCHC motif, a putative AP domain and a putative IN domain, including the highly conserved DDE(K) triad (see Figure 4 and Figure S15). In WpLTRE1 copy 1, these domains are distributed over the three ORFs (Figure 4). The YR domain is a characteristic feature of the DIRS1 group of retrotransposons [40], a group that does not contain LTRs, whereas the IN domain is part of LTR-retrotransposon-encoded proteins [65]. Hence, WpLTRE1 combines features of both types of TEs, a phenomenon that raises questions about its actual mode of transposition.
WpLTRE1 copy 2 was used as the query sequence for a nucleotide BLAST (blastn) search to determine whether WpLTRE1-like elements are also present in the genome of H. verbana. The search revealed several copies of a similar element, named HvLTR1. The element is flanked by LTRs of 186 bp in size, and its overall length (about 4.1 kb) is also larger than that of WpLTRE1 (about 2.2 kb). The sequences of two copies of HvLTR1 are presented in Supplementary Material Figure S16. HvLTR1 copies 1 and 2 differ in the length of the 5′ LTR (with respect to the orientation of the ORF) due to an internal deletion of 43 bp in HvLTR1 copy 2. The individual genomic contigs that comprise the two copies of HvLTR1 both contain a stretch of unresolved nucleotides at the same position. By identifying a contig that covered this “region of uncertainty”, including the flanking regions on both sides, it was possible to determine the length (118 bp) and sequence of the undetermined nucleotides. The cured sequence of HvLTR1 contains a single ORF that encodes a protein of 1188 amino acid residues. The reconstructed region of 39 amino acid residues almost perfectly fits, both in size and sequence, hypothetical proteins of Caenorhabditis brenneri Sudhaus and Kiontke, 2007, and Ancylostoma ceylanicum Looss, 1911 (see Supplementary Material Figure S16), making it very likely that the reconstruction is correct.
The hypothetical proteins encoded by HvLTR1 and WpLTRE1 are in part homologous. The first 444/446 amino acid residues of the two proteins share 85% sequence identity and 93% similarity; this region includes the putative YR domain, the CCHC motif and the AP domain but not the putative IN domain. The remaining parts, however, differ markedly. Nevertheless, the hypothetical protein of HvLTR1 also comprises a putative IN domain close to its C-terminus (see Supplementary Material Figure S16). The two putative IN domains differ in sequence, but the localization and spacing of the canonical DDE(K) motifs are comparable. In addition, the HvLTR1-encoded protein contains both a putative RH domain and an RT domain, but, as for WpLTRE1, no evidence of a Gag-encoding ORF could be found.
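Identity and similarity percentages such as 85%/93% are computed over an alignment: identity counts identical residues, whereas similarity also counts conservative substitutions. BLAST derives the latter from positive BLOSUM62 scores; the sketch below substitutes a much coarser grouping of physico-chemically related residues, which is a simplifying assumption and will not reproduce BLAST values exactly. The group definitions and the function name are illustrative.

```python
# Coarse groups of related residues; a stand-in for positive BLOSUM62 scores.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
GROUP_OF = {aa: i for i, g in enumerate(GROUPS) for aa in g}


def identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and similarity over aligned columns (gaps count as mismatches)."""
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    ident = sum(x == y and x != "-" for x, y in cols)
    simil = sum(
        x != "-" and y != "-"
        and (x == y or (x in GROUP_OF and GROUP_OF.get(x) == GROUP_OF.get(y)))
        for x, y in cols
    )
    return 100.0 * ident / len(cols), 100.0 * simil / len(cols)
```

For the aligned fragments `"ILKD"` and `"VLRD"`, two positions are identical and the other two are conservative (I/V, K/R), giving 50% identity and 100% similarity.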

4. Discussion

The number of described TEs is growing constantly, and the classification system is becoming increasingly complex [66,67]. Many of these elements have been identified in “model organisms”, like Drosophila [68] or Arabidopsis [69], but the progress in sequencing technology and assembly methods allows for the rapid and cost-effective determination of the whole genome sequences of all kinds of organisms. The respective datasets can subsequently be used to address questions on the presence of TEs and their impact on biological processes (e.g., [63,70]). However, both the correct annotation and the characterization of putative TEs in genome sequence datasets are challenging and time-consuming tasks. Tools like the RepeatMasker [71], RepeatModeler [72] and Generic Repeat Finder [73] pipelines provide initial information, but more targeted analyses [17,74] and even a final manual editing (or “curation”) of the output [18] are mandatory. The present study provides the first detailed description of non-LTR (SINEs and LINEs) and LTR retrotransposons in leeches.

4.1. Classification of SINEs

SINEs were identified in six leech species, namely, H. verbana, H. medicinalis, H. nipponia, H. sanguisuga, H. manillensis and W. pigra. Based on their promoter sequences, they very likely all belong to the tRNA head superfamily [1], but the core sequences differ markedly between the HvSINE1–3 group, on the one hand, and HvSINE4, on the other hand (Supplementary Material Figure S3). HvSINE1-like elements can be found in closely related Eurasian species of the leech family Hirudinidae but not in family members of other geographical origin or in representatives of other leech families (Table 5). Leech SINEs may hence be used as accessory molecular markers for phylogenetic and phylogeographic analyses. However, since SINEs are very short genetic elements, the information content of a single element is rather limited (see Figure 3), and only the combination of various elements in one analysis may yield reliable conclusions [75]. An even more robust phylogenetic marker is the presence/absence pattern of SINE insertions [62,76]. The H. verbana individuals used in the studies by Müller et al. [54] (H. verbana_HGW) and Paulsen et al. [32] (H. verbana_USA) display 99.5% coi sequence identity, and both belong to the Eastern subgroup of H. verbana [77]. Since the HvSINE1 insertion in the hirudin HV1 gene is present in H. verbana_USA but absent in this closely related individual, the integration must be a very recent event, and HvSINE1 is very likely still an active TE. The latter assumption is strongly supported by its very high abundance in the genomes of both H. verbana and H. medicinalis (Table 3). Interestingly, the expression pattern of the HvSINE elements in different organs/tissues of H. verbana/medicinalis is not uniform but displays remarkable differences (Table 4). To the author's knowledge, this is the first investigation of this kind.
The data are far too preliminary to draw any further conclusions but may illustrate the need to pay attention to tissue-specific expression patterns of TEs in the future.
For HvSINE1–3, no corresponding LINE could be identified. The very high copy number of HvSINE1, in combination with its short LINE-related segment, impeded all BLAST search attempts. In contrast, for HvSINE4, a corresponding LINE, named HvLINE1, was identified. HvSINE4 and HvLINE1 share a common segment of 48 bp, including a short A-rich tail and a stretch of simple repeats (Supplementary Material Figure S9). The further properties of HvLINE1 are discussed below.
A second SINE that was identified based on its presence in a hirudin gene is HmSINE_V2 in H. manillensis. Whereas the HvSINE1-like elements and HvSINE4 lack both a pAS and a pol III terminator sequence and hence belong to the T-class, HmSINE_V2 contains both signals and therefore belongs to the T+-class [8]. In addition, the element contains a long A-rich tail, a typical structure of T-class elements, making it a hybrid element. The most striking feature of HmSINE_V2, however, is its relationship with the respective LINE: the complete sequence of HmSINE_V2 is present in HmLINE1 (underlined in Supplementary Material Figure S10), whereas the LINE-related segment of a SINE usually comprises only the 3′ part of the SINE sequence [16]. Nevertheless, the presence of approx. 50 copies of HmSINE_V2 in the genome of H. manillensis indicates that the element is not an artifact but very likely still retains its ability to transpose and is, hence, functional.

4.2. Classification of LINEs

Six new LINEs were identified and characterized in the present investigations. Two of them, HvLINE1 and HmLINE1, could so far be attributed to respective SINEs (see above). Only HvLINE2 displays the “classical” architecture of a LINE encompassing two ORFs. HvLINE3 is in large parts comparable to HvLINE2, with the exception of the first ORF that is split into two separate ORFs (Figure 4 and Supplementary Material Figure S12).
HvLINE4 belongs to a different superfamily of LINEs. The element encodes an RLE domain instead of an APE domain, a feature typical of the so-called “early branched non-LTR retrotransposons” [35]. Elements of this type belong to the R2 group and usually contain a single ORF that encodes an RH domain and an RLE domain [67]. Strikingly, HvLINE4 encodes two putative RH domains in addition to the RLE domain (Figure 4 and Supplementary Material Figure S13). To the author's knowledge, no such duplication has been described in LINEs so far.
HvLINE1 has a rather complex structure. The element is composed of four ORFs, the first of which is oriented opposite to ORFs 2–4; no such structure has been described for LINEs so far. That ORF1 is an integral part of the element is supported by the observation that the two RNP motifs of a complete RRM and a putative APE domain are encoded jointly by ORFs 1 and 2 (Figure 4 and Supplementary Material Figure S9). Overall, the domain order of HvLINE1 resembles that of HvLINE2 and HvLINE3, and the four proteins encoded by ORFs 1–4 likely form a functional hetero-tetramer. The presence of multiple copies of the related HvSINE4 element points to an intact transposition machinery of HvLINE1, despite the lack of an ORF equivalent to ORF1 of HvLINE2 and HvLINE3. The classification of the element, however, remains uncertain.
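Whether an ORF lies on the same strand as its neighbors or, as for ORF1 of HvLINE1, on the opposite strand can be checked with a six-frame scan. A minimal Python sketch, assuming ATG-initiated ORFs and the standard stop codons; LINE ORFs are often delimited stop-to-stop instead, and the default minimum of 100 codons is an arbitrary, hypothetical threshold. Minus-strand coordinates refer to the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement; characters other than A, C, G, T, N are left unchanged."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Six-frame scan for ATG-initiated ORFs.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based and
    half-open on the scanned strand (the reverse complement for '-'); the end
    and the length in codons both include the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None                   # look for the next ATG
    return orfs


toy = "ATGAAACCCTAA" + "TCAAAACCCCAT"   # one short ORF on each strand
print(find_orfs(toy, min_codons=4))      # [('+', 0, 0, 12), ('-', 0, 0, 12)]
```

Scanning both strands in one pass makes an oppositely oriented ORF show up with the strand label "-", which is the kind of arrangement drawn with reversed arrows in Figure 4.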
HmLINE1, the element corresponding to HmSINE_V2 (see above), also has unique features and is therefore difficult to assign to any of the groups defined by Kojima [67]. The element contains only a single ORF, which encodes a protein with putative APE, RH and RT domains, in that order (Figure 4 and Supplementary Material Figure S10), but it lacks an ORF1-like domain. Remarkably, the putative RH and RT domains have switched positions compared to the canonical structure of LINEs [7]. Several copies of HmLINE1 are present in the genome of H. manillensis, indicating that the element is competent for transposition.
WpLINE1 of W. pigra is damaged by the insertion of another TE, an LTR retrotransposon, near its 5′ end. The intact element very likely contained a single ORF encoding two putative APE domains as well as putative RT and RH domains. Whereas the domain order resembles that of HvLINEs 1–3, the distances between the domains are rather short (Figure 4 and Supplementary Material Figure S14). A unique feature of WpLINE1 is its second putative APE domain: so far, two endonuclease domains in a LINE have only been described for the elements Dualen (APE and RLE domains [35]) and Helitron (APE and HUH domains [78]). No intact copies of WpLINE1 are present in the genome of W. pigra; the element is likely non-functional.

4.3. Classification of LTR-Retrotransposons

Both WpLTR and HvLTR display remarkable structural features that distinguish them from all superfamilies of LTR retrotransposons defined so far by Wicker et al. [1] and Kojima [67]: first, neither element encodes a Gag protein; second, WpLTR has lost the RT and RH domains; and third, both have gained a putative YR domain in addition to the canonical IN domain.
The Gag (group-specific antigen) polyprotein comprises the retroviral matrix (MA), capsid (CA) and nucleocapsid (NC) proteins [79]. In LTR retrotransposons, the Gag-encoded proteins are required to form a ribonucleoprotein or virus-like particle (VLP), in which reverse transcription takes place [80,81]. Like WpLTR and HvLTR, the LTR retrotransposon Morgane lacks Gag. Unlike them, however, Morgane also lacks a functional ORF encoding the remaining domains of LTR retrotransposons, such as the RT domain. Morgane is therefore very likely a non-autonomous TE whose transposition may depend on proteins supplied in trans by another LTR retrotransposon [82]. The lack of Gag can indeed be compensated, as described for the BARE-2 element [83]. The presence of several copies of WpLTR and HvLTR in the genomes of their respective hosts indicates that these elements are functional; whether they transpose autonomously or non-autonomously, however, remains elusive.
Based on its domain order, HvLTR belongs to either the Gypsy or the Bel-Pao superfamily of LTR retrotransposons, but not to the Copia superfamily [1]. WpLTR shows essentially the same domain order but lacks the RT and RH domains (Figure 4 and Supplementary Material Figures S15 and S16). Strikingly, the three remaining domains of WpLTR (YR, AP and IN) are encoded either by individual ORFs (copy 1) or by a single ORF (copies 2–5). A comparable split was observed for HvLINE1 and HvLINE3 relative to HvLINE2 (see Figure 4 for details).
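The exclusion of Copia follows from the position of the IN domain: in the canonical domain orders summarized by Wicker et al. [1], IN lies upstream of RT in Copia elements (AP, IN, RT, RH), whereas it lies downstream of RT and RH in Gypsy and Bel-Pao elements (AP, RT, RH, IN). The rule can be written as a small Python function. This is a hypothetical helper that takes domain labels in 5′→3′ order and ignores all other evidence (phylogeny, LTR structure), so it cannot separate Gypsy from Bel-Pao.

```python
def infer_superfamily(domains: list) -> str:
    """Tentative LTR-retrotransposon superfamily from domain order (5'->3').

    Canonical orders (Wicker et al. [1]):
      Copia:          AP, IN, RT, RH
      Gypsy/Bel-Pao:  AP, RT, RH, IN
    """
    position = {name: i for i, name in enumerate(domains)}  # last occurrence wins
    if "RT" not in position or "IN" not in position:
        return "undetermined"        # the rule needs both RT and IN
    if position["IN"] < position["RT"]:
        return "Copia-like"
    return "Gypsy/Bel-Pao-like"


print(infer_superfamily(["AP", "RT", "RH", "IN"]))  # Gypsy/Bel-Pao-like
print(infer_superfamily(["AP", "IN", "RT", "RH"]))  # Copia-like
print(infer_superfamily(["YR", "AP", "IN"]))        # undetermined
```

The last call illustrates why a domain set lacking RT, like the remaining domains of WpLTR, cannot be assigned by this rule alone.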
The most striking feature of both WpLTR and HvLTR is the presence of a YR domain in addition to the canonical IN domain. The YR domain is a structural hallmark of DIRS-like elements [40,47] and of the Crypton element (a DNA transposon [84]) but is not present in any other group of LTR retrotransposons. DIRS1-like elements occur in a broad variety of eukaryotic taxa, including annelids [85]. YR-mediated transposition proceeds via the integration of a circular DNA intermediate by site-specific recombination and does not generate a TSD [49,86], whereas IN-mediated transposition proceeds via the integration of a blunt-ended DNA intermediate by a DNA cutting-and-joining reaction and typically generates a TSD [87]. The integration sites of all copies of WpLTR and HvLTR provide clear evidence for integration via YR-mediated recombination. Together, their structure and mode of transposition justify defining WpLTR and HvLTR as the first members of a new superfamily of LTR retrotransposons.
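Because IN-mediated integration typically leaves a TSD while YR-mediated integration does not, the flanks of each copy are the key evidence. A minimal Python sketch of such a flank check, assuming that the element boundaries are known exactly and that a TSD is a perfect direct repeat of 4–20 bp; in real data, boundaries are often uncertain by a few bases and TSDs can carry mismatches. The function and its parameters are hypothetical.

```python
from typing import Optional


def find_tsd(genome: str, start: int, end: int,
             min_len: int = 4, max_len: int = 20) -> Optional[str]:
    """Longest perfect direct repeat flanking the insertion genome[start:end].

    For k from max_len down to min_len, compares the k bases immediately
    upstream of `start` with the k bases immediately downstream of `end`.
    Returns the repeat, or None if no k in the range gives identical flanks.
    """
    g = genome.upper()
    for k in range(max_len, min_len - 1, -1):
        if start - k < 0 or end + k > len(g):
            continue                      # flank would run off the sequence
        left, right = g[start - k:start], g[end:end + k]
        if left == right:
            return left
    return None


with_tsd = "CCCCCGATTAC" + "TGTGTGTGTG" + "GATTACAAAAA"  # element at [11, 21)
no_tsd = "C" * 11 + "TGTGTGTGTG" + "A" * 11
print(find_tsd(with_tsd, 11, 21))  # GATTAC
print(find_tsd(no_tsd, 11, 21))    # None
```

A consistent absence of flanking repeats across all copies of an element is compatible with YR-mediated integration, whereas a recurring short duplication would point to IN-mediated integration.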

4.4. Biological Significance

TEs have a deep impact on genome diversity [88]: they are drivers of genome evolution [2,89] and may influence biological processes up to speciation [90]. However, both the presence and the activity of TEs may also have deleterious effects on their hosts [91]. The impact of TEs on leeches remains speculative, mainly because detailed information on TEs in leech genomes is almost completely lacking. The present study provides only a glimpse of the likely diversity of TEs in leeches. The presence of two of these elements, namely, HvSINE1 and HmSINE_V2, in hirudin genes (albeit in introns) suggests an explanation for the remarkable redundancy of hirudin and HLF genes in leech species [53,54]. Hematophagous leeches critically depend on the presence and activity of hirudin, a central inhibitor of blood coagulation, to ensure the uptake of a blood meal [92]. A loss-of-function mutation, e.g., due to the random integration of a TE into the coding region of a hirudin gene, would certainly have an immediate negative impact on the fitness of the respective organism. Redundancy can hence be seen as a strategy to compensate for putative gene losses caused by such deleterious events.

5. Conclusions

Individual TEs of different types have been identified and structurally characterized in leeches; some of them are unique in structure compared to canonical TEs. However, the actual diversity of TEs in leeches is likely much higher. Non-model organisms are hence an excellent source for new information, even on long-term studied objects like TEs, and may provide new insights into the diversity and the putative biological impact of these fascinating genetic elements.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/dna5020030/s1: Figure S1: Multiple sequence alignment of hirudin HV1 gene sequences. Figure S2: Multiple sequence alignment of the 50 best HvSINE1 hits in the genome of H. verbana. Figure S3: Sequences, a multiple sequence alignment and a schematic drawing of putative SINEs of Hirudo verbana. Figure S4: Putative HvSINE1-like elements of Hirudinaria manillensis. Figure S5: Putative HvSINE1-like elements of Whitmania pigra. Figure S6: Putative HvSINE1-like elements of Haemopis sanguisuga. Figure S7: Putative HvSINE1-like elements of Hirudo nipponia. Figure S8: HmSINE_V2 of Hirudinaria manillensis. Figure S9: Sequence and structure of HvLINE1. Figure S10: Sequence and structure of HmLINE1. Figure S11: Sequence and structure of HvLINE2. Figure S12: Sequence and structure of HvLINE3. Figure S13: Sequence and structure of HvLINE4. Figure S14: Sequence and structure of WpLINE1. Figure S15: Sequences and structures of WpLTR1 copies in W. pigra. Figure S16: Sequences and structures of HvLTR1 copies in H. verbana. Figure S17: Multiple sequence alignments and phylogenetic trees derived from partial coi sequences and SINE sequences.

Funding

This research received no external funding.

Institutional Review Board Statement

The author declares that the investigations described in this paper comply with the current laws in Germany.

Data Availability Statement

Leech genome and transcriptome data for H. robusta, H. manillensis, H. medicinalis, W. pigra and H. verbana are freely accessible and searchable through public databases like GenBank. Specific sequences are provided in the Supplementary Materials; further inquiries can be directed to the corresponding author.

Acknowledgments

The author would like to thank all members of the Animal Physiology working group at the University of Greifswald for their help and support. A special thanks goes to Jan-Peter Hildebrandt and Undine Lauf for their critical reading of this manuscript.

Conflicts of Interest

The author declares that he has no conflicts of interest.

References

  1. Wicker, T.; Sabot, F.; Hua-Van, A.; Bennetzen, J.L.; Capy, P.; Chalhoub, B.; Flavell, A.; Leroy, P.; Morgante, M.; Panaud, O.; et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 2007, 8, 973–982.
  2. Kazazian, H.H., Jr. Mobile elements: Drivers of genome evolution. Science 2004, 303, 1626–1632.
  3. Finnegan, D.J. Eukaryotic transposable elements and genome evolution. Trends Genet. 1989, 5, 103–107.
  4. Finnegan, D.J. Retrotransposons. Curr. Biol. 2012, 22, R432–R437.
  5. Wells, J.N.; Feschotte, C. A field guide to eukaryotic transposable elements. Annu. Rev. Genet. 2020, 54, 539–561.
  6. Bourque, G.; Burns, K.H.; Gehring, M.; Gorbunova, V.; Seluanov, A.; Hammell, M.; Imbeault, M.; Izsvák, Z.; Levin, H.L.; Macfarlan, T.S.; et al. Ten things you should know about transposable elements. Genome Biol. 2018, 19, 199.
  7. Okada, N.; Hamada, M.; Ogiwara, I.; Ohshima, K. SINEs and LINEs share common 3′ sequences: A review. Gene 1997, 205, 229–243.
  8. Roy-Engel, A.M. A tale of an A-tail: The lifeline of a SINE. Mob. Genet. Elem. 2012, 2, 282–286.
  9. Borodulina, O.R.; Kramerov, D.A. Transcripts synthesized by RNA polymerase III can be polyadenylated in an AAUAAA-dependent manner. RNA 2008, 14, 1865–1873.
  10. Borodulina, O.R.; Golubchikova, J.S.; Ustyantsev, I.G.; Kramerov, D.A. Polyadenylation of RNA transcribed from mammalian SINEs by RNA polymerase III: Complex requirements for nucleotide sequences. Biochim. Biophys. Acta 2016, 1859, 355–365.
  11. Borodulina, O.R.; Kramerov, D.A. Short interspersed elements (SINEs) from insectivores. Two classes of mammalian SINEs distinguished by A-rich tail structure. Mamm. Genome 2001, 12, 779–786.
  12. Khazina, E.; Weichenrieder, O. Non-LTR retrotransposons encode noncanonical RRM domains in their first open reading frame. Proc. Natl. Acad. Sci. USA 2009, 106, 731–736.
  13. Metcalfe, C.J.; Casane, D. Modular organization and reticulate evolution of the ORF1 of Jockey superfamily transposable elements. Mob. DNA 2014, 5, 19.
  14. SenGupta, D. RNA-Binding Domains in Proteins. In Brenner’s Encyclopedia of Genetics, 2nd ed.; Academic Press: New York, NY, USA, 2013; pp. 274–276.
  15. Kramerov, D.A.; Vassetzky, N.S. Origin and evolution of SINEs in eukaryotic genomes. Heredity 2011, 107, 487–495.
  16. Gilbert, N.; Labuda, D. CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs. Proc. Natl. Acad. Sci. USA 1999, 96, 2869–2874.
  17. Li, Y.; Jiang, N.; Sun, Y. AnnoSINE: A short interspersed nuclear elements annotation tool for plant genomes. Plant Physiol. 2022, 188, 955–970.
  18. Goubert, C.; Craig, R.J.; Bilat, A.F.; Peona, V.; Vogan, A.A.; Protasio, A.V. A beginner’s guide to manual curation of transposable elements. Mob. DNA 2022, 13, 7.
  19. Kanhayuwa, L.; Coutts, R.H.A. Short Interspersed Nuclear Element (SINE) sequences in the genome of the human pathogenic fungus Aspergillus fumigatus Af293. PLoS ONE 2016, 11, e0163215.
  20. Sket, B.; Trontelj, P. Global diversity of leeches (Hirudinea) in freshwater. Hydrobiologia 2008, 595, 129–137.
  21. Kvist, S.; Utevsky, S.; Marrone, F.; Ben Ahmed, R.; Gajda, Ł.; Grosser, C.; Huseynov, M.; Jueg, U.; Khomenko, A.; Oceguera-Figueroa, A.; et al. Extensive sampling sheds light on species-level diversity in Palearctic Placobdella (Annelida: Clitellata: Glossiphoniiformes). Hydrobiologia 2022, 849, 1239–1259.
  22. Sawyer, R.T. Leech Biology and Behaviour: Feeding, Biology, Ecology and Systematics; Clarendon Press: Oxford, UK, 1986; ISBN 978-019-857-622-8.
  23. Abdualkader, A.M.; Ghawi, A.M.; Alaama, M.; Awang, M.; Merzouk, A. Leech therapeutic applications. Indian J. Pharm. Sci. 2013, 75, 127–137.
  24. Hildebrandt, J.-P.; Lemke, S. Small bite, large impact–saliva and salivary molecules in the medicinal leech, Hirudo medicinalis. Naturwissenschaften 2011, 98, 995–1008.
  25. Lemke, S.; Vilcinskas, A. European Medicinal leeches-new roles in modern medicine. Biomedicines 2020, 8, 99.
  26. Greinacher, A.; Warkentin, T.E. The direct thrombin inhibitor hirudin. Thromb. Haemost. 2008, 99, 819–829.
  27. Simakov, O.; Marletaz, F.; Cho, S.-J.; Edsinger-Gonzales, E.; Havlak, P.; Hellsten, U.; Kuo, D.-H.; Larsson, T.; Lv, J.; Arendt, D.; et al. Insights into bilaterian evolution from three spiralian genomes. Nature 2013, 493, 526–531.
  28. Guan, D.L.; Yang, J.; Liu, Y.K.; Li, Y.; Mi, D.; Ma, L.B.; Wang, Z.Z.; Xu, S.Q.; Qiu, Q. Draft Genome of the Asian buffalo leech Hirudinaria manillensis. Front. Genet. 2020, 10, 1321.
  29. Kvist, S.; Manzano-Marín, A.; de Carle, D.; Trontelj, P.; Siddall, M.E. Draft genome of the European medicinal leech Hirudo medicinalis (Annelida, Clitellata, Hirudiniformes) with emphasis on anticoagulants. Sci. Rep. 2020, 10, 9885.
  30. Babenko, V.V.; Podgorny, O.V.; Manuvera, V.A.; Kasianov, A.S.; Manolov, A.I.; Grafskaia, E.N.; Shirokov, D.A.; Kurdyumov, A.S.; Vinogradov, D.V.; Nikitina, A.S.; et al. Draft genome sequences of Hirudo medicinalis and salivary transcriptome of three closely related medicinal leeches. BMC Genomics 2020, 21, 331.
  31. Tong, L.; Dai, S.-X.; Kong, D.-J.; Yang, P.-P.; Tong, X.; Tong, X.-R.; Bi, X.-X.; Su, Y.; Zhao, Y.-Q.; Liu, Z.-C. The genome of medicinal leech (Whitmania pigra) and comparative genomic study for exploration of bioactive ingredients. BMC Genomics 2022, 23, 76.
  32. Paulsen, R.T.; Agany, D.D.M.; Petersen, J.; Davis, C.M.; Ehli, E.A.; Gnimpieba, E.; Burrell, B.S. A draft genome for Hirudo verbana, the Medicinal leech. bioRxiv 2020.
  33. Zhao, F.; Huang, Z.; He, B.; Liu, K.; Li, J.; Liu, Z.; Lin, G. Comparative genomics of two Asian medicinal leeches Hirudo nipponia and Hirudo tianjinensis: With emphasis on antithrombotic genes and their corresponding proteins. Int. J. Biol. Macromol. 2024, 270, 132278.
  34. Fillingham, J.S.; Thing, T.A.; Vythilingum, N.; Keuroghlian, A.; Bruno, D.; Golding, G.B.; Pearlman, R.E. A non-long terminal repeat retrotransposon family is restricted to the germ line micronucleus of the ciliated protozoan Tetrahymena thermophila. Eukaryot. Cell 2004, 3, 157–169.
  35. Kojima, K.K.; Fujiwara, H. An extraordinary retrotransposon family encoding dual endonucleases. Genome Res. 2005, 15, 1106–1117.
  36. Tözsér, J. Comparative studies on retroviral proteases: Substrate specificity. Viruses 2010, 2, 147–165.
  37. Gazda, L.D.; Matúz, K.J.; Nagy, T.; Mótyán, J.A.; Tőzsér, J. Biochemical characterization of Ty1 retrotransposon protease. PLoS ONE 2020, 15, e0227062.
  38. Evgen’ev, M.B.; Zelentsova, H.; Shostak, N.; Kozitsina, M.; Barskyi, V.; Lankenau, D.H.; Corces, V.G. Penelope, a new family of transposable elements and its possible role in hybrid dysgenesis in Drosophila virilis. Proc. Natl. Acad. Sci. USA 1997, 94, 196–201.
  39. Ohta, S.; Tsuchida, K.; Choi, S.; Sekine, Y.; Shiga, Y.; Ohtsubo, E. Presence of a characteristic D-D-E motif in IS1 transposase. J. Bacteriol. 2002, 184, 6146–6154.
  40. Goodwin, T.J.D.; Poulter, R.T.M. The DIRS1 group of retrotransposons. Mol. Biol. Evol. 2001, 18, 2067–2082.
  41. Arkhipova, I.R. Distribution and phylogeny of Penelope-like elements in eukaryotes. Syst. Biol. 2006, 55, 875–885.
  42. Meier, B.; Clejan, I.; Liu, Y.; Lowden, M.; Gartner, A.; Hodgkin, J.; Ahmed, S. trt-1 is the Caenorhabditis elegans catalytic subunit of telomerase. PLoS Genet. 2006, 2, e18.
  43. Maris, C.; Dominguez, C.; Allain, F.H.-T. The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS J. 2005, 272, 2118–2131.
  44. Lingner, J.; Hughes, T.R.; Shevchenko, A.; Mann, M.; Lundblad, V.; Cech, T.R. Reverse transcriptase motifs in the catalytic subunit of telomerase. Science 1997, 276, 561–567.
  45. Wu, S.; Zhang, X.; Han, J. A computational model for predicting RNase H domain of retrovirus. PLoS ONE 2016, 11, e0161913.
  46. Moelling, K.; Broecker, F.; Russo, G.; Sunagawa, S. RNase H as gene modifier, driver of evolution and antiviral defense. Front. Microbiol. 2017, 8, 1745.
  47. Goodwin, T.J.D.; Poulter, R.T.M. A new group of tyrosine recombinase-encoding retrotransposons. Mol. Biol. Evol. 2004, 21, 746–759.
  48. Poulter, R.T.M.; Goodwin, T.J.D. DIRS-1 and the other tyrosine recombinase retrotransposons. Cytogenet. Genome Res. 2005, 110, 575–588.
  49. Poulter, R.T.M.; Butler, M.I. Tyrosine recombinase retrotransposons and transposons. Microbiol. Spectr. 2015, 3, MDNA3-0036-2014.
  50. Krishna, S.S.; Majumdar, I.; Grishin, N.V. Structural classification of zinc fingers: Survey and summary. Nucleic Acids Res. 2003, 31, 532–550.
  51. Sokal, R.R.; Michener, C.D. A statistical method for evaluating systematic relationships. Kans. Univ. Sci. Bull. 1958, 38, 1409–1438.
  52. Jukes, T.H.; Cantor, C.R. Evolution of protein molecules. In Mammalian Protein Metabolism; Munro, H.N., Ed.; Academic Press: New York, NY, USA, 1969; pp. 21–132.
  53. Müller, C.; Mescke, K.; Liebig, S.; Mahfoud, H.; Lemke, S.; Hildebrandt, J.-P. More than just one: Multiplicity of hirudins and hirudin-like factors in the medicinal leech Hirudo medicinalis. Mol. Genet. Genomics 2016, 291, 227–240.
  54. Müller, C.; Haase, M.; Lemke, S.; Hildebrandt, J.-P. Hirudins and hirudin-like factors in Hirudinidae: Implications for function and phylogenetic relationships. Parasitol. Res. 2017, 116, 313–325.
  55. Geiduschek, E.P.; Tocchini-Valentini, G.P. Transcription by RNA polymerase III. Annu. Rev. Biochem. 1988, 57, 873–914.
  56. Dieci, G.; Giuliodori, S.; Catellani, M.; Percudani, R.; Ottonello, S. Intragenic promoter adaptation and facilitated RNA polymerase III recycling in the transcription of SCR1, the 7SL RNA gene of Saccharomyces cerevisiae. J. Biol. Chem. 2002, 277, 6903–6914.
  57. Traboni, C.; Ciliberto, G.; Cortese, R. A novel method for site-directed mutagenesis: Its application to an eukaryotic tRNAPro gene promoter. EMBO J. 1982, 1, 415–420.
  58. Phillips, A.J.; Siddall, M. Poly-paraphyly of Hirudinidae: Many lineages of medicinal leeches. BMC Evol. Biol. 2009, 9, 246.
  59. Phillips, A.J.; Dornburg, A.; Zapfe, K.L.; Anderson, F.E.; James, S.W.; Erséus, C.; Lemmon, E.M.; Lemmon, A.R.; Williams, B.W. Phylogenomic analysis of a putative missing link sparks reinterpretation of leech evolution. Genome Biol. Evol. 2019, 11, 3082–3093.
  60. Miyamoto, M.M. Molecular systematics: Perfect SINEs of evolutionary history? Curr. Biol. 1999, 9, R816–R819.
  61. Ray, D.A.; Xing, J.; Salem, A.-H.; Batzer, M.A. SINEs of a nearly perfect character. Syst. Biol. 2006, 55, 928–935.
  62. Korstian, J.M.; Paulat, N.S.; Platt, R.N., 2nd; Stevens, R.D.; Ray, D.A. SINE-based phylogenomics reveal extensive introgression and incomplete lineage sorting in Myotis. Genes 2022, 13, 399.
  63. Martelossi, J.; Iannello, M.; Ghiselli, F.; Luchetti, A. Widespread HCD-tRNA derived SINEs in bivalves rely on multiple LINE partners and accumulate in genic regions. Mob. DNA 2024, 15, 22.
  64. Lukas, P.; Melikian, G.; Hildebrandt, J.-P.; Müller, C. Make it double: Identification and characterization of a Tandem-Hirudin from the Asian medicinal leech Hirudinaria manillensis. Parasitol. Res. 2022, 121, 2995–3006.
  65. Haren, L.; Ton-Hoang, B.; Chandler, M. Integrating DNA: Transposases and retroviral integrases. Annu. Rev. Microbiol. 1999, 53, 245–281.
  66. Arkhipova, I.R. Using bioinformatic and phylogenetic approaches to classify transposable elements and understand their complex evolutionary histories. Mob. DNA 2017, 8, 19.
  67. Kojima, K.K. Structural and sequence diversity of eukaryotic transposable elements. Genes Genet. Syst. 2019, 94, 233–252.
  68. McCullers, T.J.; Steiniger, M. Transposable elements in Drosophila. Mob. Genet. Elem. 2017, 7, 1–18.
  69. Quesneville, H. Twenty years of transposable element analysis in the Arabidopsis thaliana genome. Mob. DNA 2020, 11, 28.
  70. Han, G.; Zhang, N.; Jiang, H.; Meng, X.; Qian, K.; Zheng, Y.; Xu, J.; Wang, J. Diversity of short interspersed nuclear elements (SINEs) in lepidopteran insects and evidence of horizontal SINE transfer between baculovirus and lepidopteran hosts. BMC Genomics 2021, 22, 226.
  71. Tarailo-Graovac, M.; Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. 2009, 4, 4.10.1–4.10.14.
  72. Flynn, J.M.; Hubley, R.; Goubert, C.; Rosen, J.; Clark, A.G.; Feschotte, C.; Smit, A.F. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 2020, 117, 9451–9457.
  73. Shi, J.; Liang, C. Generic Repeat Finder: A high-sensitivity tool for genome-wide de novo repeat detection. Plant Physiol. 2019, 180, 1803–1815.
  74. Bell, E.A.; Butler, C.L.; Oliveira, C.; Marburger, S.; Yant, L.; Taylor, M.I. Transposable element annotation in non-model species: The benefits of species-specific repeat libraries using semi-automated EDTA and DeepTE de novo pipelines. Mol. Ecol. Resour. 2022, 22, 823–833.
  75. Deragon, J.-M.; Zhang, X. Short interspersed elements (SINEs) in plants: Origin, classification, and use as phylogenetic markers. Syst. Biol. 2006, 55, 949–956.
  76. Nikaido, M.; Rooney, A.P.; Okada, N. Phylogenetic relationships among cetartiodactyls based on insertions of short and long interpersed elements: Hippopotamuses are the closest extant relatives of whales. Proc. Natl. Acad. Sci. USA 1999, 96, 10261–10266.
  77. Trontelj, P.; Utevsky, S.Y. Phylogeny and phylogeography of medicinal leeches (genus Hirudo): Fast dispersal and shallow genetic structure. Mol. Phylogenet. Evol. 2012, 63, 475–485.
  78. Poulter, R.T.M.; Goodwin, T.J.D.; Butler, M.I. Vertebrate helentrons and other novel Helitrons. Gene 2003, 313, 201–212.
  79. Karn, J. Retroviruses. In Brenner’s Encyclopedia of Genetics, 2nd ed.; Academic Press: New York, NY, USA, 2013; pp. 211–215.
  80. Havecker, E.R.; Gao, X.; Voytas, D.F. The diversity of LTR retrotransposons. Genome Biol. 2004, 5, 225.
  81. Sabot, F.; Schulman, A.H. Parasitism and the retrotransposon life cycle in plants: A hitchhiker’s guide to the genome. Heredity 2006, 97, 381–388.
  82. Sabot, F.; Sourdille, P.; Chantret, N.; Bernard, M. Morgane, a new LTR retrotransposon group, and its subfamilies in wheats. Genetica 2006, 128, 439–447.
  83. Tanskanen, J.A.; Sabot, F.; Vicient, C.; Schulman, A.H. Life without GAG: The BARE-2 retrotransposon as a parasite’s parasite. Gene 2007, 390, 166–174.
  84. Goodwin, T.J.D.; Butler, M.I.; Poulter, R.T.M. Cryptons: A group of tyrosine-recombinase-encoding DNA transposons from pathogenic fungi. Microbiology 2003, 149, 3099–3109.
  85. Piednoël, M.; Gonçalves, I.R.; Higuet, D.; Bonnivard, E. Eukaryote DIRS1-like retrotransposons: An overview. BMC Genomics 2011, 12, 621.
  86. Curcio, M.J.; Derbyshire, K.M. The outs and ins of transposition: From mu to kangaroo. Nat. Rev. Mol. Cell Biol. 2003, 4, 865–877.
  87. Nefedova, L.; Kim, A. Mechanisms of LTR-retroelement transposition: Lessons from Drosophila melanogaster. Viruses 2017, 9, 81.
  88. Warren, I.A.; Naville, M.; Chalopin, D.; Levin, P.; Berger, C.S.; Galiana, D.; Volff, J.-N. Evolutionary impact of transposable elements on genomic diversity and lineage-specific innovation in vertebrates. Chromosome Res. 2015, 23, 505–531.
  89. Nishihara, H. Transposable elements as genetic accelerators of evolution: Contribution to genome size, gene regulatory network rewiring and morphological innovation. Genes Genet. Syst. 2020, 94, 269–281.
  90. Serrato-Capuchina, A.; Matute, D.R. The role of transposable elements in speciation. Genes 2018, 9, 254.
  91. Platt, R.N., 2nd; Vandewege, M.W.; Ray, D.A. Mammalian transposable elements and their impacts on genome evolution. Chromosome Res. 2018, 26, 25–43.
  92. Gross, U.; Roth, M. The biochemistry of leech saliva. In Medicinal Leech Therapy; Michalsen, A., Roth, M., Dobos, G., Eds.; Georg Thieme Verlag KG: Stuttgart, Germany, 2007.
Figure 1. Hirudo verbana, the Hungarian medicinal leech. (left): dorsal view; (right): lateral view during a blood meal. Images taken by Ch. Müller.
Figure 2. Schematic representation of the hirudin HV1 gene structures of Hirudo verbana.
Figure 3. Schematic representation of phylogenetic trees based on HvSINE1-like elements (left) and coi (right) sequences. Numbers indicate the respective bootstrap values. Original trees were redrawn; branch lengths do not represent phylogenetic distances. Lter: Lumbricus terrestris; Hver: Hirudo verbana; Hmed: Hirudo medicinalis; Hman: Hirudinaria manillensis; Hsan: Haemopis sanguisuga; Wpig: Whitmania pigra; Hnip: Hirudo nipponia; Hver1: sequence of HvSINE1; Hver4: sequence of HvSINE4.
Figure 4. Schematic drawings of putative long interspersed nuclear elements (LINEs) of H. verbana (HvLINE1–4), H. manillensis (HmLINE1) and W. pigra (WpLINE1) and of long terminal repeat (LTR) elements of W. pigra and H. verbana. The size and orientation of open reading frames are indicated by open arrows. Colored marks indicate the position of characteristic sequence motifs like the CCHC motif (green), the RRM motif (dark and light blue), a polyA motif (orange), putative TSDs (red) and LTRs (purple). In addition, the characteristic functional domains of each type of element are indicated.
Table 1. Comparison of HV1 gene structures of Hirudo verbana and Hirudo medicinalis. Red colored numbers indicate the prominent differences in the size of exon 2 and exon 3. Sources of sequence data: Hv_HGW: Müller et al. [54]; Hm_HGW: Müller et al. [53]; Hm_Kvist: Kvist et al. [29]; and Hv_USA: Paulsen et al. [32].
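| Gene | Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3 | Intron 3 | Exon 4 | GenBank |
|---|---|---|---|---|---|---|---|---|
| Hv_HGW1 | 61 | 103 | 50 | 62 | 76 | 199 | 71 | KX215734.1 |
| Hv_HGW2 | 61 | 103 | 50 | 62 | 76 | 199 | 71 | KX215735.1 |
| Hv_US | 61 | 103 | 50 | 272 | 76 | 224 | 71 | GCA_020137395.1; JAIQDV010043103.1; Hv_contig_43718, position 3801-2954 |
| Hm_HGW1 | 61 | 103 | 50 | 62 | 76 | 199 | 71 | KR066930.1 |
| Hm_HGW2 | 61 | 103 | 50 | 62 | 76 | 199 | 71 | KR066931.1 |
| Hm_Kvist | 61 | 103 | 50 | 62 | 76 | 199 | 71 | GCA_903470615.1; CAGKPE010009153.1; contig SCF_090848, position 35665-36286 |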
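The exon and intron sizes in Table 1 can be cross-checked against the genomic coordinates given for the assembly-derived genes. A small worked example for the Hm_Kvist row, assuming the listed positions are 1-based and inclusive (the usual GenBank convention), shows that the seven feature lengths add up to exactly the span of positions 35665-36286 on contig SCF_090848. The dictionary layout and the helper `span_length` are illustrative, not part of the original analysis.

```python
# Exon and intron sizes (bp) of the Hm_Kvist HV1 gene, copied from Table 1.
HM_KVIST = {
    "exon 1": 61, "intron 1": 103, "exon 2": 50, "intron 2": 62,
    "exon 3": 76, "intron 3": 199, "exon 4": 71,
}
# "position 35665-36286" on contig SCF_090848 (assumed 1-based, inclusive).
CONTIG_START, CONTIG_END = 35665, 36286


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval; works for either orientation."""
    return abs(end - start) + 1


gene_length = sum(HM_KVIST.values())
exon_length = sum(size for name, size in HM_KVIST.items() if name.startswith("exon"))

print(gene_length, span_length(CONTIG_START, CONTIG_END))  # 622 622
print(exon_length, exon_length % 3)  # 258 0
```

The exon total of 258 bp is divisible by three, which is what one would expect if all four exons were entirely coding; the table does not state whether untranslated regions are included, so this is only a consistency check.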
Table 2. Consensus sequences for promoter regions of 5S rRNA genes, tRNA and 7SL genes and tRNA genes of Hirudo verbana and Hirudo medicinalis and putative HvSINE1–4 promoters. * [55]; + [56]; # [57].
5S rRNA genes

| Species | Aligned sequence (Box A, 16 (13) bp, Box B/IE, 4 bp, Box C) |
|---|---|
| Homo sapiens | TTGGAAGCTAAGCAGGGTCAGGCCTGGTTGGTACCT-GATGGGAGAGAG |
| Plutella xylostella | ACCGAAGTCAAGCAACGTCGGGC----GTAGTCATTGGATGGGTGACCG |
| Urechis unicinctus | ACTGAAGTTAAGCAACGTCGGGCCCGGTTAGTACTTGGATGGGTGACCG |
| Hirudo medicinalis | ACCGAAGTTAAGCAACGTCGAGCCCGGTTAGTACTTGGATGGGTGACCG |
| Hirudo verbana | ACCGAAGTTAAGCAACGTCGAGCCCGGTTAGTACTTGGATGGGTGACCG |

tRNA and 7SL genes

| | Box 1 | Spacer | Box 2 | Source |
|---|---|---|---|---|
| tRNA consensus | TRGYBYAGTGG | 33 bp | RGTTCGADYCY | + |
| | TRGCNNAGYGG | 33 bp | GGTTCGANTCC | * |
| Human tRNAPro | TGGTCTAGTGG | 31 bp | GGTTCAA_TCC | # |
| Human 7SL RNA | GGGCGCGGTGG | 47 bp | GCTTGAG_TCC | |
| D. melanogaster 7SL RNA | TGGAAGGTTGG | 49 bp | GGCTGGGATCT | |
| H. medicinalis 7SL RNA | TGGAGTCGTAG | 44 bp | GTTTGAGGTCG | |
| H. verbana 7SL RNA | TGGAGTCGTAG | 44 bp | GTTTGAGGTCG | |

Putative HvSINE1–4 promoters

| | Box 1 | Spacer | Box 2 |
|---|---|---|---|
| Hirudo sp. tRNA promoter | TGGTCTAATGG | 29–32 bp | GAATCGAATCC |
| HvSINE1 | TATCCCAATGG | 31 bp | TATATAGCGCC |
| HvSINE2 | GATCCGGGTTGG | 30 bp | TATATAGCACC |
| HvSINE3 | TGGATGCGAAGG | 31 bp | TGTGTGGATCA |
| HvSINE4 | TGCGCGGAGGG | 29 bp | TGTTTTAATCG |
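One simple way to compare the candidate Box 1 sequences in Table 2 with the tRNA Box 1 consensus TRGYBYAGTGG is to count the positions at which a base is not allowed by the IUPAC code at that position. The Python sketch below does this for the 11 nt Box 1 candidates; it is a plain position-by-position (Hamming-style) score that assumes an ungapped alignment, not the procedure used to derive the table. HvSINE2 and HvSINE3 list 12 nt Box 1 candidates, so they would need a gapped alignment and are left out. The helper name `iupac_mismatches` is hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_mismatches(seq: str, consensus: str) -> int:
    """Count positions where seq is incompatible with an IUPAC consensus.

    Assumes an ungapped alignment, so both strings must have equal length.
    """
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must have equal length")
    return sum(base not in IUPAC[code]
               for base, code in zip(seq.upper(), consensus.upper()))


BOX1_CONSENSUS = "TRGYBYAGTGG"  # tRNA Box 1 consensus from Table 2
candidates = {
    "Human tRNAPro": "TGGTCTAGTGG",
    "Hirudo sp. tRNA promoter": "TGGTCTAATGG",
    "HvSINE1": "TATCCCAATGG",
    "HvSINE4": "TGCGCGGAGGG",
}
for name, box1 in candidates.items():
    print(name, iupac_mismatches(box1, BOX1_CONSENSUS))
# Human tRNAPro 0, Hirudo sp. tRNA promoter 1, HvSINE1 2, HvSINE4 6
```

A low mismatch count only indicates sequence similarity to the consensus; whether a candidate box actually recruits the RNA polymerase III machinery would have to be tested experimentally.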
Table 3. The abundance of short interspersed nuclear elements (HvSINEs) in the genomes of H. verbana and H. medicinalis. For HvSINE4, 21 (H. verbana) and 14 (H. medicinalis) copies contain the whole sequence, whereas about 200 copies in each genome contain the head and core domains but lack the putative LINE-specific segment.
| | Hirudo verbana | Hirudo medicinalis |
|---|---|---|
| HvSINE1 | >1000 copies | >1000 copies |
| HvSINE2 | 1 copy | 1 copy |
| HvSINE3 | 1 copy | 1 copy |
| HvSINE4 | 21 (about 200) copies | 14 (about 200) copies |
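Table 3 distinguishes full-length HvSINE4 copies from copies that carry only the head and core domains. The search strategy behind these counts is not described here; as an assumption, the sketch below takes similarity-search hits already mapped onto consensus coordinates (as a BLAST-like tool would report them) and sorts them by which segments they cover. The segment boundaries in `SEGMENTS`, the 80% coverage threshold and the names `Hit`, `covered_segments` and `classify` are hypothetical placeholders, not HvSINE4 values or the author's code.

```python
from dataclasses import dataclass

# HYPOTHETICAL segment boundaries on a SINE consensus (1-based, inclusive).
# Real HvSINE4 coordinates are not given in this section.
SEGMENTS = {"head": (1, 80), "core": (81, 150), "LINE-specific": (151, 260)}


@dataclass
class Hit:
    """One genomic copy, expressed in consensus (query) coordinates."""
    q_start: int
    q_end: int


def covered_segments(hit: Hit, min_fraction: float = 0.8) -> set[str]:
    """Return the segments of which the hit covers at least min_fraction."""
    lo, hi = sorted((hit.q_start, hit.q_end))  # tolerate reverse-strand hits
    covered = set()
    for name, (start, end) in SEGMENTS.items():
        overlap = min(hi, end) - max(lo, start) + 1  # <= 0 when disjoint
        if overlap / (end - start + 1) >= min_fraction:
            covered.add(name)
    return covered


def classify(hits: list[Hit]) -> dict[str, int]:
    """Count full-length copies versus copies with head and core only."""
    counts = {"full-length": 0, "head+core only": 0, "other": 0}
    for hit in hits:
        segs = covered_segments(hit)
        if segs == set(SEGMENTS):
            counts["full-length"] += 1
        elif segs == {"head", "core"}:
            counts["head+core only"] += 1
        else:
            counts["other"] += 1
    return counts


print(classify([Hit(1, 260), Hit(5, 150), Hit(150, 260)]))
# {'full-length': 1, 'head+core only': 1, 'other': 1}
```

Substituting real segment coordinates and a real hit table would yield counts in the same "whole sequence" versus "head and core only" categories used in the table caption.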
Table 4. Expression pattern of HvSINE-RNAs in different Hirudo verbana and/or Hirudo medicinalis tissues. ✓ indicates presence, - indicates absence of expression.
HvSINE1HvSINE2HvSINE3HvSINE4
salivary gland--
muscle--
ganglion-
central nervous system
head
Table 5. Presence (+) or absence (−) of HvSINE1-like sequences in genomes of various leech and annelid species. The taxonomic classifications (family level) and the geographic distributions are provided.
| Species | HvSINE1-like sequences | Family | Geographic distribution |
|---|---|---|---|
| Hirudo medicinalis | + | Hirudinidae | Europe |
| Hirudinaria manillensis | + | Hirudinidae | Southeast Asia |
| Whitmania pigra | + | Hirudinidae | East Asia |
| Hirudo nipponia | + | Hirudinidae | East Asia |
| Haemopis sanguisuga | + | Hirudinidae | Europe, North Africa |
| Limnobdella mexicana | − | Hirudinidae | North America |
| Macrobdella decora | − | Hirudinidae | North America |
| Asiaticobdella fenestrata | − | Hirudinidae | Southern Africa |
| Haemadipsa interrupta | − | Haemadipsidae | Southern Asia |
| Haementeria vizzotoi | − | Glossiphoniidae | South America |
| Helobdella robusta | − | Glossiphoniidae | North America |
| Erpobdella octoculata | − | Erpobdellidae | Europe |
| Piscicola geometra | − | Piscicolidae | Europe |
| Enchytraeus crypticus | − | Enchytraeidae | globally |
| Lumbricus terrestris | − | Lumbricidae | Europe |
| Eisenia fetida | − | Lumbricidae | Europe |
| Capitella teleta | − | Capitellidae | |

Share and Cite

MDPI and ACS Style

Müller, C. Of Short Interspersed Nuclear Elements, Long Interspersed Nuclear Elements and Leeches: Identification and Molecular Characterization of Transposable Elements in Leech Genomes. DNA 2025, 5, 30. https://doi.org/10.3390/dna5020030

AMA Style

Müller C. Of Short Interspersed Nuclear Elements, Long Interspersed Nuclear Elements and Leeches: Identification and Molecular Characterization of Transposable Elements in Leech Genomes. DNA. 2025; 5(2):30. https://doi.org/10.3390/dna5020030

Chicago/Turabian Style

Müller, Christian. 2025. "Of Short Interspersed Nuclear Elements, Long Interspersed Nuclear Elements and Leeches: Identification and Molecular Characterization of Transposable Elements in Leech Genomes" DNA 5, no. 2: 30. https://doi.org/10.3390/dna5020030

APA Style

Müller, C. (2025). Of Short Interspersed Nuclear Elements, Long Interspersed Nuclear Elements and Leeches: Identification and Molecular Characterization of Transposable Elements in Leech Genomes. DNA, 5(2), 30. https://doi.org/10.3390/dna5020030
