Revealing Landscapes of Transposable Elements in Apis Species by Meta-Analysis

Simple Summary Studies on the detection of transposable elements and their annotations have posed several challenges. For example, simple comparisons of transposable elements in different species using different methods can lead to misinterpretations. Thus, assembling data for transposable elements analyzed by unified methods is important for comparison purposes. Therefore, we performed a meta-analysis of transposable elements identified using genome datasets from five Apis species (11 sets of genome data) and specific software to detect the transposable elements, which revealed the landscapes of transposable elements. We examined the types and locations of transposable elements in the Apis genomes. The landscapes of transposable elements showed that four to seven transposable element families among 13 and 15 families of TEs detected in classes I and II, respectively, consisted mainly of Apis-associated transposable elements. These families include DNA/TcMar-Mariner and DNA/CMC-EnSpm. In addition, more DNA/TcMar-Mariner consensus sequences and copies were detected in Apis mellifera than in other Apis species. These data suggest that TcMar-Mariner might exert A. mellifera-specific effects in the host A. mellifera species. Our landscape data provide new insights into Apis transposable elements; furthermore, detailed analyses of our data could pave the way for new biological insights in this field. Abstract Transposable elements (TEs) are grouped into several families with diverse sequences. Owing to their diversity, studies involving the detection, classification, and annotation of TEs are difficult tasks. Moreover, simple comparisons of TEs among different species with different methods can lead to misinterpretations. The genome data of several honey bee (Apis) species are available in public databases. Therefore, we conducted a meta-analysis of TEs, using 11 sets of genome data for Apis species, in order to establish data of “landscape of TEs”. Consensus TE sequences were constructed and their distributions in the Apis genomes were determined. Our results showed that TEs belonged to four to seven TE families among 13 and 15 families of TEs detected in classes I and II respectively mainly consisted of Apis TEs and that more DNA/TcMar-Mariner consensus sequences and copies were present in all Apis genomes tested. In addition, more consensus sequences and copy numbers of DNA/TcMar-Mariner were detected in Apis mellifera than in other Apis species. These results suggest that TcMar-Mariner might exert A. mellifera-specific effects on the host A. mellifera species. In conclusion, our unified approach enabled comparison of Apis genome sequences to determine the TE landscape, which provide novel evolutionary insights into Apis species.


Introduction
Transposable elements (TEs) are mobile DNA sequences that undergo a change in their positions within a genome [1]. TEs occur in diverse forms and are found in the genomes of many species. Numerous effects of TEs on the host species have been reported. To mention some specific effects of TEs, they can serve as a source of mutations, lead to host-genome rearrangements, and change gene expression at the level of transcription. TEs can be divided into two classes: class I and class II (sometimes referred to as retrotransposons and DNA transposons, respectively) [1][2][3]. Class I TEs use an RNA intermediate and a "copy-and-paste" mechanism [1]. Class I TEs are further divided into subclasses (referred to as "order" in [3]), namely long terminal repeats (LTRs), Dictyostelium intermediate repeat sequence (DIRS), and non-LTRs. LTRs are divided into several superfamilies (e.g., Copia, Gypsy, and ERV) while non-LTRs are divided into other several superfamilies (e.g., long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs) and Penelope), several superfamilies of which some are divided into several families. Class II TEs move using a "cut-and-paste" mechanism through a DNA intermediate [1,[3][4][5]; however, the Helitron type moves in a "peel-and-paste" manner [6]. Class II TEs are divided into subclasses (orders): terminal inverted repeat (TIR) (possessing transposase in its coding region), Crypton, Helitron, and Marverick. Each category is further classified into subfamilies, of which some are divided into several families [1,3]. For example, Tc1/Mariner is one of the subfamilies belonging to the TIR subclass, and Tc1/Mariner is further classified into Tc1 or Mariner. The TE distributions of each species have specific features. Thus, performing a comparative analysis of the distributions of TEs among several species can potentially uncover some new insights into these species related to their TEs.
Due to their ability to "transpose" within the genome, TEs have increased in number within the genome during evolution. In addition, new TEs enter the genome via horizontal transmission from other species. TE sequences have high diversity, due to the accumulation of mutations, which leads to many variants [1,21]. TE insertion and removal can indirectly cause rearrangements in host-genome sequences, leading to duplications or reshuffling around TEs in the host genome. These events can occur in genes or expression-regulation sites. Moreover, TEs can cause genome structural diversities long after TE could lose the capacity to move. Therefore, accurate TE detection and annotation are difficult to achieve. Although the basal TE status of each genome report is important (e.g., simply showing percentages or numbers of TE families or classes present in the genome), comparisons among multiple species are suboptimal because the TE statuses were constructed using different methods and different software versions. Instead, new knowledge related to TEs could be obtained by studying the landscapes of TEs (the types of TEs and their positions in different Apis genomes), applying a unified TE analysis to the Apis genome data, and comparing the TE status between different species.
Comparing TE composition data among different species is important for genomic and evolutionary research, as indicated above. Recently, one report provided basal TE data for various insect species, suggesting that the content and diversity of TEs and genome sizes are related [22]. Other reports have provided evolutionary insights into pogo and Tc1/mariner by comparing the status data for these TE families in Apoidea genomes [23]. In this study, to obtain landscape data for such comparisons, a meta-analysis was performed using genome data from the 11 Apis genome data (5 Apis species) listed in Table 1, which are available in a public database ( Figure 1). Specifically, we first performed de novo TE detection and then constructed consensus sequences for TEs with the same parameters, using RepeatModeler2 with the Apis genome data [24]. RepeatModeler2 runs multiple software packages to search for TEs and repetitive sequences, enabling accurate searches for TEs. To perform a detailed classification of the consensus sequences belonging to the Mariner or MLE family (the most prevalent among TE families in Apis genomes), a phylogenetic analysis of MLE consensus sequences was performed. Finally, the distributions of repetitive elements, including the detected TEs, were investigated in all 11 Apis genomes using RepeatMasker, and the TE landscapes of Apis species were drawn. By comparing the TE statuses of different Apis species and making use of the landscape data, we obtained new insights into TEs in Apis species.  (Table 1) from NCBI genome database (URL: https://www.ncbi. nlm.nih.gov/genome/ accessed on 1 June 2022) using RepeatModeler2 [24]. Phylogenetic analysis revealed MLE relationships, where the most abundant consensus sequences were detected among the TE families in Apis species. The distributions of repetitive elements, including the TEs detected by RepeatModeler2, were investigated using RepeatMasker. The landscapes of TEs in Apis species were obtained using both sets of results, which led to new insights into TEs in Apis species. The images in Figure 1 were obtained from TogoTV (© 2016 DBCLS TogoTV).

Genome Data Used in This Study
All genome data used in this study were downloaded from the NCBI Assembly section (https://www.ncbi.nlm.nih.gov/assembly/, accessed on 2 August 2022). The GenBank assembly accession IDs, genome sizes, N50 values, and abbreviations of each genome data point are presented in Table 1.

De novo Detection of Transposable Elements Consensus Family Sequences
De novo detection of TE consensus family sequences was performed using Repeat Mod-eler2 (version DEV) with the default settings and the genome data indicated in Table 1 [24].

Phylogenetic Analysis
The detected TE sequences of some families were aligned using Clustal Omega (version 1.2.4) [25]. To construct approximately maximum-likelihood phylogenetic trees, aln files and Clustal Omega output files were further analyzed using FastTree (version 2.1.10) [26]. To visualize the phylogenetic trees, the FastTree output files (newick files) were loaded into MEGAX (version 10.1.7) [27].

Distribution Analysis of Repetitive Elements in Apis Genomes
The distributions of repetitive elements (including the TEs detected with RepeatMod-eler2) were investigated using the TE sequences as libraries and RepeatMasker (version 4.1.2-p1), with the default settings [28] 3. Results

Detection of Transposable Elements in Apis Genomes
To determine the types of TEs in the Apis genomes, De novo TE detection was performed, and consensus TE family sequences were constructed with RepeatModeler2 using the Apis genomes shown in Table 1. The detection procedure used with RepeatModeler2 was described in detail previously [24]. Briefly, RepeatModeler2 runs different de novo repeat-detection programs such as RECON [29], RepeatScout [30], LtrHarvest [31], and Ltr_retriever [32]. The constructed family models from each software program are merged, redundancies are removed, and consensus sequences are constructed. The consensus sequences are annotated using RepeatClassifier, which compares the consensus sequences to several databases, including Dfam [24]. The output files from RepeatModeler2 are provided in Supplement data S1. The numbers of consensus sequences for each family are shown in Table 2. More consensus sequences were for class II TEs than for class I TEs. Among the class II TEs, DNA/TcMar-Mariner, that is MLE, DNA/TcMar-Tc1, DNA/hAT-Ac, DNA/CMC-EnSpm, and DNA/CMC-PiggyBac consensus sequences were constructed for all or 10 of the 11 Apis genomes studied, whereas the consensus sequences of other families were constructed in less than three Apis genomes. With class I TEs, the consensus sequences of three LTRs (LTR/Copia, LTR/Gypsy, and LTR/Pao) were constructed in more than 9 of the 11 Apis genomes.   per species)  34  17  24  21  35  33  48  39  51  34  38 The consensus sequences not clearly annotated as a family (i.e., "unknown" sequences) are excluded (all-inclusive count result data are available in Supplemental Data S2). The nomenclatures of the TE families were defined previously [2]. Family names belonging to class II TEs and class I TEs are represented with red and blue text, respectively. The degree of red shading indicates the number of the consensus sequences found where darker shading indicates higher numbers.
Next, we investigated the differences in the numbers of consensus TE sequences among Apis species.. As shown in the "Total (per species)" row of Table 2, more consensus TE sequences were constructed with the A. mellifera species (Am, Ami, Amm, Amcar, and Amcau: 48, 39, 51, 34, and 38, respectively) than with the other Apis species (Acc, Acj, Ack, Ad, and Af: 34, 17, 24 21, and 35, respectively). Furthermore, more DNA/TcMar-Mariner sequences were detected with the A. mellifera species and Acc, representing the highest numbers of a consensus sequence constructed among the TE families. In addition, relatively high numbers of DNA/TcMar-Tc1 sequences were constructed with the A. mellifera species. These findings indicate that differences in the total number of consensus sequences among the A. mellifera species and other Apis species were mainly due to differences in DNA/TcMar-Mariner and DNA/TcMar-Tc1 consensus sequences.

Sequence Analysis of TcMar-Mariner Consensus Sequences
As indicated in the previous section, the highest numbers of consensus sequences were constructed for the TcMar-Mariner family, among the TE families detected with RepeatModeler2. To obtain a more detailed classification, multiple sequence alignments were performed using the TcMar-Mariner consensus sequences (Supplemental Data S3), Ammar1-6 (which were previously reported as A. mellifera MLEs [7]), and MLE consensus sequences of other species (mentioned in an MLE-related report [20]; Supplemental Data S4), where the subfamilies have been annotated. Based on the alignment results, a phylogenetic tree was constructed ( Figure 2 and Supplemental Data S5 contain the raw data and related files). As shown in Figure 2 several clusters formed in the phylogenetic tree. MLEs annotated as a subfamily were expected to be located in a cluster; however, the phylogenetic tree showed that no MLEs belonging to a single subfamily were located in a single cluster. These results showed that the classifications of the MLE subfamilies, which are based on the amino acid sequences of transposase in MLEs, conflicted with the results of the nucleotide sequence-based analyses we performed. All-inclusive count result data are available in Supplemental Data S2.

Distribution Analysis of Transposable Elements in Apis Genome
To determine the distributions of the TEs detected with RepeatModeler2 in Apis genomes (Section 3.1), we ran RepeatMasker with the Apis reference genome as the input data (Table 1) and the consensus TE sequence data as libraries using Repeat Modeler2 (Supplemental Data S1). RepeatMasker was used to screen the TE sequences (registered in Dfam or Repbase) or the consensus sequences as input data (mainly from RepeatModeler2) and simple repeat sequences as genomic query data (for greater detail, see [28]). Because of these software features, high numbers of short TE sequences were detected. The output files are provided in Supplemental Data S6. The percentages of repetitive elements, including TEs, present in the Apis genomes are shown in Table 3. Our findings indicate that repetitive elements comprised approximately 7 to 12% of the Apis genome regions. The A. mellifera genomes (except for Ami) had higher percentages than the other Apis genomes. The percentages in the Apis genomes were lower than those reported for other insect species. (e.g., approximately 46.8% for B. mori [17], 20% for T. castaneum [20], 65% for A. aegypti [16], and 20% for D. melanogaster [20]).
The RepeatMasker results are summarized in tbl files (RepeatMasker output files) and are available in Supplemental Data S6. Because the summary files did not show the number of copies in the individual TE families, we counted them using. out files and other RepeatMasker output files (Supplemental Data S6). The number of copies belonging to the TE families that were clearly annotated as a TE family member (e.g., DNA and SINE?) plus other repetitive elements (e.g., Simple repeat) in all Apis genomes are given in Supplemental Data S7. The total copy numbers of class II and class I TE families in all Apis genomes are shown in Tables 4 and 5, respectively. Overall, several TE families had multiple copies. Among these TE families, the TEs of class II had many more copies than those of class I. With regard to class II, more total number of copies (except for Ami) were observed in A. mellifera genomes than in the other Apis genomes (Table 4). In contrast, among the class I TEs, Acj, Ack, and Ami showed lower copy numbers, whereas Am had a higher copy number than the other Apis species (Table 5).

Figure 2.
Phylogenetic tree of Apis TcMar-Mariner consensus sequences identified in this study. The MLE sequences of other species and A. mellifera were annotated with Mariner subfamilies in previous reports [8,20]. Blue, orange, green, purple, and yellow circles located at end of each node (MLE sequences from the previous reports) indicate the MLE subfamilies. The red circles indicate consensus sequences detected with more than 200 copies. The green semicircular shading encompasses a clade including many sequences with over 200 copies. The numbers at the branches indicate bootstrap values. A high-resolution phylogenetic tree data is available in Supplemental Data S5.  EnSpm  1387  1684  1797  880  1060  692  2761  538  2200  1305  1518  DNA/IS3EU  0  0  0  0  0  107  0  0  The total numbers of TE families (detected using RepeatModeler2) were calculated using output files from RepeatMasker. The degree of red shading reflects the copy numbers found, where darker shading indicates higher copy numbers. Family names belonging to class II TEs and class I TEs are represented with red. Table 5. Total copy numbers of class I TE families in the Apis genomes listed in Table 1. The total numbers of TE families (detected using RepeatModeler2) were calculated using output files from RepeatMasker. The degree of red shading indicates the number of the consensus sequences found, where darker shading indicates higher numbers. Family names belonging to class II TEs and class I TEs are represented with blue text.
Among the class II TE families, copies of DNA/CMC-EnSpm, DNA/TcMar-Mariner, and DNA/hAT-Ac were detected in all Apis genomes tested (Table 4). Over 1000 copies of DNA/TcMar-Mariner were detected in all A. mellifera species and in Acc and Al. In addition, 1000 DNA/CMC-EnSpm copies were detected in all Apis species tested, except for Ad, Al and Ami, whereas 1000 copies of DNA/hAT-Ac were detected in Am and Amcau genomes. In the case of DNA/TcMar-Tc1, over 1000 copies were detected in all A. mellifera species, but no copies were detected in Ad. Over 400 DNA/PiggyBac copies were detected in some Apis genomes, but no copies were detected in the Acj genome. Among the class I TE families, copies of LTR-Copia were detected in all Apis genomes tested, and copies of LTR-Gypsy and LTR/Pao were detected in 10 and 9 Apis species, respectively.
As shown above, abundant copies of DNA/TcMar-Mariner and DNA/CMC-EnSpm were detected in all Apis genomes tested. To investigate this phenomenon in greater detail, the copy numbers of both TE families in each of the Apis genomes are shown in graphically in Figure 3. In the case of DNA/TcMar-Mariner, A. mellifera species (especially Amm, Amcar, and Amcau) had higher copy numbers than other Apis species ( Figure 3A). In the case of DNA/CMC-EnSpm, Am and Amm had higher copy numbers than other Apis species, whereas Ad, Af, Al, and Ami had fewer copies ( Figure 3B). We further investigated which consensus TcMar-Mariner sequences, in particular, had many copies ( Table 2). As shown in Figure 2, consensus sequences with more than 200 copies (indicated with red circles) were scattered over the trees, and several sequences with red circles were located in a single clade (represented with the green semicircular object). This clade contained the sequences of all Apis species tested, except for Af.  Table 1. Both TE families were detected using Repeat Modeler 2 and the total numbers of TEs were calculated using. out files from Repeat Masker. Abbreviations of names of Apis species in the figure are shown in Table 1.

Discussion
In this study, we investigated the landscapes of TEs in Apis species using Apis genome data available in public databases; TE consensus sequences were also constructed. Sequence analysis was performed, and phylogenetic trees were constructed to reveal more detailed relationships for the MLEs, the consensus sequences of which are the most diverse among the TE families detected. Consequently, the distributions of repetitive elements, including the constructed consensus TE sequences within the corresponding Apis genomes, were revealed. Our landscapes showed that several limited TE families (from four to seven families among 13 and 15 families detected of classes I and II, respectively, in each Apis genome: see Tables 4 and 5) are mainly found in Apis genomes.
As described above, detecting TEs in genome sequences is a difficult task because TE sequences have many variants and deletions [21]. Therefore, the results related to TEs can be varied can vary when different methods are adopted. Our meta-analysis was performed using two major software packages that are commonly employed. RepeatModeler2 is commonly used for de novo TE detection with genome data [24]. This software package can also be used to construct consensus sequences. Multiple repeat-searching programs can be run, and merging the results of the program enables accurate detection of TEs (for benchmarking the results, see a previous article that described RepeatModeler2 [24]). RepeatMasker searches for simple repeats and TEs in queried genome data, with consensus sequences serving as the input data [28]. A series of analyses can provide accurate landscape data for TEs in the queried genome. These landscape data can be utilized for further detailed analyses (e.g., comparing the TE status between different species).
The genome assembly level of genome data can affect TE detections. As shown in Table 2, over half of Apis genome data we used from the public database are not "chromosome-level genome assembly data". TE data in detailed points using scaffold-level genome assembly data and chromosome-level genome assembly data in the same species can be different. However, we think that the genome assembly level can not affect on main features of Apis TEs we showed; for example, limited TE families consisted mainly of TEs and much more copies of DNA/CMC-EnSpm, DNA/TcMar-Mariner in all Apis species genomes, which included various genome assembly levels. It is very interesting to analyze the two levels separately. However, since our goal is to clarify the landscape of TEs in the genus Apis, which is not affected by differences in genome assembly, we did not analyze them separately in this study.
There is adequate and reliable software for de novo TE detection in genome sequences, such as EDTA [33] and REPET [34]). The benchmarking results showed that RepeatMod-eler2 produced the output file which was similar to the curated libraries using several model species genome data, and showed better status related to the detected family quality and the detected sequences of fragmentation and redundancy than other software tested while these software showed better status related to some cases [24]. Considering these results, we decided to choose RepeatModeler2 for de novo TE detection.
As mentioned in the introduction part, there are more than 10 known species of honey bees, and by examining the four main species (Af, Ad, A. cerana, and Am) and one closely related species (Al) [7] with RepeatModeler2 and RepeatMasker, which were used for de novo TE detections and revealing distributions of the detected TE families respectively, we were able to characterize the TEs common to the genus Apis without using all species, thus providing a "landscape" of the TEs in the genus Apis, which is our goal of this article. Interestingly, although Ad and Al are closely related species, the landscapes showed that there were several different features of TEs between the two species.
With both class II and class I TEs, several families have diverse consensus sequences, whereas the other families had a few consensus sequences in Apis genomes, implying that these TE families might exert several effects on host Apis species through several mechanisms (e.g., gene insertions or alterations at the transcription level) [35,36]. Comparisons of the consensus TE sequences among Apis species revealed that more consensus sequences were constructed for A. mellifera than for the other Apis species, which was mainly due to DNA/Tc-Mariner and DNA/Tc-Tc1 (which have many consensus sequences). These results suggest that some of the TEs could have had effects on A. mellifera species that might not have occurred in other Apis species.
Among the several characteristics of honey bee TEs revealed by the landscape data, it is worth noting the patchy distribution of each TE. Some TEs are identified only in certain Apis species. For example, DNA/MULE-MuDR was only found only in Af and TcMar-Tigger was found only in Acc. Moreover, RC/Heliton was found only in Am and not in any other Am subspecies. This biased and patchy distribution of the TEs is well known in other species [22,23]. The most famous example of such a distribution is the P element, which is present only in certain strains (e.g., P strains) of Drosophila melanogaster [37]. Using this landscape data, we plan to conduct a detailed comparative analysis in the future.
Many Mariner or MLE consensus sequences were constructed for Apis species in this study. As described above, these consensus sequences were constructed using RepeatMod-eler2, which runs repeat detection programs and annotates the constructed sequences using several databases including Dfam [2,24]. A further detailed classification of these MLEs was performed. This was done by generating alignments and constructing phylogenetic trees using MLE consensus nucleotide sequences that were previously annotated with MLE consensus sequences. Our results revealed that MLE sequences annotated as part of the same MLE subfamily did not form a single clade. MLEs, which have a DD34D catalytic motif in their encoded transposase, are classified into subfamilies based on their transposase amino acid sequences [38,39]. This classification principle must be respected; however, we believe that nucleotide-based classification may also be required. As shown in this study, an enormous number of TE nucleotide sequences can be detected in target genomes because the whole-genome data of many species are available in public databases, and sophisticated TE detection software, such as RepeatModeler2, are now available [24]. Some detected MLEs do not encode transposases of sufficient length because of mutations or deletions in their sequences. Therefore, annotations based on amino acid sequences cannot be used to study such MLE sequences. According to a previous report, annotation methods for studying subfamilies are fraught with problems such as a lack of reproducibility [21]. The development of nucleotide-based annotations of MLE subfamilies is essential for future genome analysis, and our data could lead to future research in this field.
Nucleic acid-based analysis using RepeatModeler2 and RepeatMasker (in this study), and analysis using the consensus amino acid sequence of transposase have yielded several different results [22,23]. However, even if the same genome data are used for nucleic acid-based analysis, the results will differ slightly depending on the method used, and the number of each TE found differs depending on the software used. For example, as mentioned above, we identified many copies of many DNA/TcMar-Tc1 types. However, in a previous analysis using the tblastn method with amino acid sequences against Apoidea genomes, including some Apis genomes [23], these TEs were not found in the Apis genomes. This discrepancy may reflect our method used, which recognizes the Tc1 and Mariner types as different, whereas the previous analysis considered them to be the same type of TEs. Another example is the detection of DNA/CMC-EnSpm in all Apis genomes tested, whereas previous findings indicated that DNA/CMC-EnSpm was absent from the Am genome [40]. This may be because the TEs annotated as DNA/CMC-EnSpm in this study were classified as putative elements, unclassified, or classified Class II TEs. Indeed, Nasonia vitripennis DNA/CMC-EnSpm was registered in Repbase (e.g., EnSpm-2_NVi) [41], and another report showed that CMC TEs were detected in the Am genome [22]. This discrepancy illustrates the difficulty of classifying TEs. However, our landscape was successful in providing a general framework for the TEs of the Apis genus. Further evolutionary studies of TEs will require analysis of the individual TEs found. Recent advances in bioinformatics have made this possible.
RepeatMasker results for the Apis species showed that repetitive elements comprise approximately 7 to 12% of Apis genomes, which is lower than that of many other insect species [20]. However, these percentages are consistent with previous reports [9,10,12,[14][15][16], which validate the accuracy of our datasets and the analytical methods used in this study. Comparing the numbers of TEs among Apis species showed that A. mellifera species, with the exception of Ami, have more TEs than other Apis species. Ami showed lower percentages of repetitive elements, perhaps because the N50 value of Ami was much lower than those of other Apis species. Thus, we conclude that A. mellifera species have more repeat regions and TEs than other Apis species.
RepeatMasker detected high numbers of short TEs in Apis genomes. We assume that while some of them are false positives by RepeatMasker, they are TE footprints [39,42], or fragmented sequences of TEs by insertion or deletions. By detailed analysis of such sequences in our landscape data, the dynamics of Apis TEs could be revealed, leading to the biological interpretation of the TEs.
The total number of TE copies in each TE family showed that families with a higher number of consensus sequences had a higher number of elements. In addition, more copies of class II TEs than class I TEs were detected. Furthermore, these results revealed that the TEs of several limited families in both classes (II and I) consisted of Apis TEs. Most of these results have the same tendencies as those of the consensus TE sequences. These results suggest that TEs belonging to limited TE families mostly consist of Apis TEs. A more detailed investigation also revealed more class II TEs in A. mellifera genomes, except for Ami, than in other Apis genomes. TcMar-Mariner/MLEs were identified as a family with a high number of copies in all Apis species tested. The phylogenetic tree revealed that, although several MLE consensus sequences of all Apis species tested (except for Af, which had over 200 copies) were located in a clade, these sequences were scattered in the trees, suggesting that the abundant MLEs may have been copied from many consensus sequences rather than from a very limited number of consensus sequences.
Although clear differences were found in the number and type of TEs between species, it is interesting to note that variation has occurred within species. This may be due to differences in the quality of the genome data. Among the genome data used in this study, the Ami genome data showed a much lower contig N50 number than the other genome data. No significant correlations were found between the contig N50 numbers for the Apis genome data and the numbers and types of TEs. It would be interesting from an evolutionary point of view if the intraspecific variation observed here was not due to differences in the quality of the genome data. Our findings indicate that many TEs increase in number, shift, or propagate horizontally in the genome after subspeciation. Further studies are required to elucidate these differences.
In this study, we performed a meta-analysis of Apis TEs using Apis whole-genome data and TE-detection software. Through this analysis, we determined the landscape data of TEs showing the specific types of TEs and their positions in the Apis genomes. We also showed that several limited TE families exist in Apis genomes and that A. mellifera species have more TEs, mainly due to MLEs. The findings of this study provide several new insights into the genomes of Apis species. The landscape data obtained in this study can be compared to TE data for other species, including Hymenoptera or other insects [20,22,23], leading to findings related to the evolution of TEs between these species. In addition, analyzing our landscape data in greater detail could help elucidate new TE-related biological insights for Apis species.